Pluggable Storage - Andres's take

Started by Andres Freund over 7 years ago, 179 messages
#1 Andres Freund
andres@anarazel.de

Hi,

As I've previously mentioned I had planned to spend some time to polish
Haribabu's version of the pluggable storage patch and rebase it on the
vtable based slot approach from [1]. While doing so I found more and
more things that I previously hadn't noticed. I started rewriting things
into something closer to what I think we want architecturally.

The current state of my version of the patch is *NOT* ready for proper
review (it doesn't even pass all tests, there's FIXME / elog()s). But I
think it's getting close enough to its eventual shape that more eyes,
and potentially more hands on keyboards, can be useful.

The most fundamental issues I had with Haribabu's last version from [2]
are the following:

- The use of TableTuple, a typedef from void *, is bad from multiple
fronts. For one it reduces just about all type safety. There were
numerous bugs in the patch where things were just cast from HeapTuple
to TableTuple to HeapTuple (and even to TupleTableSlot). I think it's
a really, really bad idea to introduce a vague type like this for
development purposes alone, it makes it way too hard to refactor -
essentially throwing the biggest benefit of type safe languages out of
the window.

Additionally I think it's also the wrong approach architecturally. We
shouldn't assume that a tuple can efficiently be represented as a
single palloc'ed chunk. In fact, we should move *away* from relying on
that so much.

I've thus removed the TableTuple type entirely.
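
To illustrate the hazard (a contrived snippet, not taken from the
patch; types as in access/htup.h and executor/tuptable.h): with such a
typedef, any pointer converts to and from TableTuple, so the compiler
accepts nonsense like this without a single diagnostic:

  typedef void *TableTuple;       /* the removed typedef */

  static TupleTableSlot *
  oops(HeapTuple htup)
  {
      TableTuple tt = htup;       /* implicit conversion, no warning */

      /* wrong - tt actually holds a HeapTuple - but compiles cleanly */
      return (TupleTableSlot *) tt;
  }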

- Previous versions of the patchset exposed Buffers in the tableam.h API,
performed buffer locking / pinning / ExecStoreTuple() calls outside of
it. That is wrong in my opinion, as various AMs will deal very
differently with buffer pinning & locking. The relevant logic is
largely moved within the AM. Bringing me to the next point:

- tableam exposed various operations based on HeapTuple/TableTuple's
(and their Buffers). This all needs to be slot based, as we can't
represent the way each AM will deal with this. I've largely converted
the API to be slot based. That has some fallout, but I think largely
works. Lots of outdated comments.
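
Roughly, the conversion turns callback shapes like the first of these
into ones like the second (simplified; the second follows
table_fetch_row_version() in the tree):

  /* before: the AM hands back a HeapTuple and exposes its Buffer */
  typedef HeapTuple (*TupleFetch_function)(Relation rel, ItemPointer tid,
                                           Snapshot snapshot,
                                           Buffer *userbuf);

  /* after: the AM materializes the row into a caller-provided slot */
  typedef bool (*TupleFetchRowVersion_function)(Relation rel,
                                                ItemPointer tid,
                                                Snapshot snapshot,
                                                TupleTableSlot *slot,
                                                Relation stats_relation);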

- I think the move of the indexing from outside the table layer into the
storage layer isn't a good idea. It led to having to pass EState into
the tableam, a callback API to perform index updates, etc. This seems
to have at least partially been triggered by the speculative insertion
codepaths. I've reverted this part of the changes. The speculative
insertion / confirm codepaths are now exposed to tableam.h - I think
that's the right thing because we'll likely want to have that
functionality across more than a single tuple in the future.

- The visibility functions relied on the *caller* performing buffer
locking. That's not a great idea, because generic code shouldn't know
about the locking scheme a particular AM needs. I've changed the
external visibility functions to instead take a slot, and perform the
necessary locking inside.
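
A sketch of the converted shape (function name invented here; the
buffer-carrying slot type and HeapTupleSatisfies() appear in the
patches later in this thread):

  static bool
  heapam_satisfies_visibility(TupleTableSlot *slot, Snapshot snapshot)
  {
      BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
      bool        res;

      /* the AM, not generic code, takes the content lock it needs */
      LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
      res = HeapTupleSatisfies(bslot->base.tuple, snapshot, bslot->buffer);
      LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);

      return res;
  }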

- There were numerous tableam callback uses inside heapam.c - that makes
no sense, we know what the storage is therein. The relevant

- The integration between index lookups and heap lookups based on the
results of an index lookup was IMO too tight. The index code dealt
with heap tuples, which isn't great. I've introduced a new concept, a
'IndexFetchTableData' scan. It's initialized when building an index
scan, and provides the necessary state (say current heap buffer), to
do table lookups from within a heap.
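
Roughly (field names illustrative), the AM-independent part plus the
heap AM's private extension look like:

  /* generic handle, set up once when the index scan is built */
  typedef struct IndexFetchTableData
  {
      Relation    rel;
  } IndexFetchTableData;

  /* heap's version additionally keeps the currently pinned buffer */
  typedef struct IndexFetchHeapData
  {
      IndexFetchTableData xs_base;
      Buffer              xs_cbuf;  /* current heap buffer, may be invalid */
  } IndexFetchHeapData;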

- The am of relations required for bootstrapping was set to 0 - I don't
think that's a good idea. I changed it so it's set to the heap AM as
well.

- HOT was encoded in the API in a bunch of places. That doesn't look
right to me. I tried to improve a bit on that, but I'm not yet quite
sure I like it. Needs written explanation & arguments...

- the heap tableam did a heap_copytuple() nearly everywhere. Leading to
a higher memory usage, because the resulting tuples weren't freed or
anything. There might be a reason for doing such a change - we've
certainly discussed that before - but I'm *vehemently* against doing
that at the same time we introduce pluggable storage. Analyzing the
performance effects will be hard enough without changes like this.

- I've for now backed out the heap rewrite changes, partially. Mostly
because I didn't like the way the abstraction looks, but haven't quite
figured out what it should look like.

- I did not like that speculative tokens were moved to slots. There's
really no reason for them to live outside parameters to tableam.h
functions.

- lotsa additional smaller changes.

- lotsa new bugs

My current working state is at [3] (urls to clone repo are at [4]).
This is *HEAVILY WIP*. I plan to continue working on it over the next
days, but I'll temporarily focus on v11 work. If others want, I could
move the repo to github and grant others write access.

I think the patchseries should eventually look like:

- move vacuumlazy.c (and other similar files) into access/heap, there's
really nothing generic here. This is a fairly independent task.
- slot'ify FDW RefetchForeignRow_function
- vtable based slot API, based on [1]
- slot'ify trigger API
- redo EPQ based on slots (prototyped in my git tree)
- redo trigger API to be slot based
- tuple traversal API changes
- tableam infrastructure, with error if a non-builtin AM is chosen
- move heap and calling code to be tableam based
- make vacuum callback based (not vacuum.c, just vacuumlazy.c)
- [other patches]
- allow other AMs
- introduce test AM

Tasks / Questions:

- split up patch
- Change heap table AM to not allocate handler function for each table,
instead allocate it statically. Avoids a significant amount of data
duplication, and allows for a few more compiler optimizations.
- Merge tableam.h and tableamapi.h and make most tableam.c functions
small inline functions. Having one-line tableam.c wrappers makes this
more expensive than necessary. We'll have enough trouble not
regressing performance-wise.
- change scan level slot creation to use tableam function for doing so
- get rid of slot->tts_tid, tts_tupleOid and potentially tts_tableOid
- COPY's multi_insert path should probably deal with a bunch of slots,
rather than forming HeapTuples (a possible callback shape is sketched
after this list)
- bitmap index scans probably need a new tableam.h callback, abstracting
bitgetpage()
- suspect IndexBuildHeapScan might need to move into the tableam.h API -
it's not clear to me that it's realistically possible to do this in a
generic manner.
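
For the COPY multi_insert item, a possible callback shape, purely
hypothetical and merely modelled on heap_multi_insert()'s current
parameters, could be:

  /* hypothetical sketch - not a signature from any posted patch */
  typedef void (*MultiInsert_function)(Relation rel,
                                       TupleTableSlot **slots,
                                       int nslots,
                                       CommandId cid,
                                       int options,
                                       struct BulkInsertStateData *bistate);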

Greetings,

Andres Freund

[1]: http://archives.postgresql.org/message-id/20180220224318.gw4oe5jadhpmcdnm%40alap3.anarazel.de
[2]: http://archives.postgresql.org/message-id/CAJrrPGcN5A4jH0PJ-s=6k3+SLA4pozC4HHRdmvU1ZBuA20TE-A@mail.gmail.com
[3]: https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/pluggable-storage
[4]: https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=summary

#2 Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#1)
Re: Pluggable Storage - Andres's take

On Tue, Jul 3, 2018 at 5:06 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

As I've previously mentioned I had planned to spend some time to polish
Haribabu's version of the pluggable storage patch and rebase it on the
vtable based slot approach from [1]. While doing so I found more and
more things that I previously hadn't noticed. I started rewriting things
into something closer to what I think we want architecturally.

Thanks for the deep review and changes.

The current state of my version of the patch is *NOT* ready for proper
review (it doesn't even pass all tests, there's FIXME / elog()s). But I
think it's getting close enough to its eventual shape that more eyes,
and potentially more hands on keyboards, can be useful.

I will try to update it to make sure that it passes all the tests and
also try to reduce the FIXMEs.

The most fundamental issues I had with Haribabu's last version from [2]
are the following:

- The use of TableTuple, a typedef from void *, is bad from multiple
fronts. For one it reduces just about all type safety. There were
numerous bugs in the patch where things were just cast from HeapTuple
to TableTuple to HeapTuple (and even to TupleTableSlot). I think it's
a really, really bad idea to introduce a vague type like this for
development purposes alone, it makes it way too hard to refactor -
essentially throwing the biggest benefit of type safe languages out of
the window.

My earlier intention was to remove the HeapTuple usage entirely and
replace it with slots everywhere outside the tableam, but it ended up
as TableTuple before reaching that stage because of the heavy use of
HeapTuple.

Additionally I think it's also the wrong approach architecturally. We
shouldn't assume that a tuple can efficiently be represented as a
single palloc'ed chunk. In fact, we should move *away* from relying on
that so much.

I've thus removed the TableTuple type entirely.

Thanks for the changes. I didn't check the code yet; so for now,
whenever a HeapTuple is required it will be generated from the slot?

- Previous versions of the patchset exposed Buffers in the tableam.h API,
performed buffer locking / pinning / ExecStoreTuple() calls outside of
it. That is wrong in my opinion, as various AMs will deal very
differently with buffer pinning & locking. The relevant logic is
largely moved within the AM. Bringing me to the next point:

- tableam exposed various operations based on HeapTuple/TableTuple's
(and their Buffers). This all needs to be slot based, as we can't
represent the way each AM will deal with this. I've largely converted
the API to be slot based. That has some fallout, but I think largely
works. Lots of outdated comments.

Yes, I agree with you.

- I think the move of the indexing from outside the table layer into the
storage layer isn't a good idea. It led to having to pass EState into
the tableam, a callback API to perform index updates, etc. This seems
to have at least partially been triggered by the speculative insertion
codepaths. I've reverted this part of the changes. The speculative
insertion / confirm codepaths are now exposed to tableam.h - I think
that's the right thing because we'll likely want to have that
functionality across more than a single tuple in the future.

- The visibility functions relied on the *caller* performing buffer
locking. That's not a great idea, because generic code shouldn't know
about the locking scheme a particular AM needs. I've changed the
external visibility functions to instead take a slot, and perform the
necessary locking inside.

When I first moved all the visibility functions into tableam, I found
this problem, so it will be good if the API takes care of buffer
locking etc.

- There were numerous tableam callback uses inside heapam.c - that makes
no sense, we know what the storage is therein. The relevant

- The integration between index lookups and heap lookups based on the
results of an index lookup was IMO too tight. The index code dealt
with heap tuples, which isn't great. I've introduced a new concept, a
'IndexFetchTableData' scan. It's initialized when building an index
scan, and provides the necessary state (say current heap buffer), to
do table lookups from within a heap.

I agree that the new concept for accessing the heap from the index code
will be good.

- The am of relations required for bootstrapping was set to 0 - I don't
think that's a good idea. I changed it so it's set to the heap AM as
well.

- HOT was encoded in the API in a bunch of places. That doesn't look
right to me. I tried to improve a bit on that, but I'm not yet quite
sure I like it. Needs written explanation & arguments...

- the heap tableam did a heap_copytuple() nearly everywhere. Leading to
a higher memory usage, because the resulting tuples weren't freed or
anything. There might be a reason for doing such a change - we've
certainly discussed that before - but I'm *vehemently* against doing
that at the same time we introduce pluggable storage. Analyzing the
performance effects will be hard enough without changes like this.

How about using a slot instead of a tuple and reusing it? I don't know
yet whether it is possible everywhere.

- I've for now backed out the heap rewrite changes, partially. Mostly
because I didn't like the way the abstraction looks, but haven't quite
figured out what it should look like.

- I did not like that speculative tokens were moved to slots. There's
really no reason for them to live outside parameters to tableam.h
functions.

- lotsa additional smaller changes.

- lotsa new bugs

Thanks for all the changes.

My current working state is at [3] (urls to clone repo are at [4]).
This is *HEAVILY WIP*. I plan to continue working on it over the next
days, but I'll temporarily focus on v11 work. If others want, I could
move the repo to github and grant others write access.

Yes, I want to access the code and do further development on it.

Tasks / Questions:

- split up patch

How about generating refactoring changes as patches first based on
the code in your repo as discussed here[1]?

- Change heap table AM to not allocate handler function for each table,
instead allocate it statically. Avoids a significant amount of data
duplication, and allows for a few more compiler optimizations.

Some kind of static handler variable for each tableam, but I need to
check how we can access that static handler from the relation.

- Merge tableam.h and tableamapi.h and make most tableam.c functions
small inline functions. Having one-line tableam.c wrappers makes this
more expensive than necessary. We'll have enough trouble not
regressing performance-wise.

OK.

- change scan level slot creation to use tableam function for doing so
- get rid of slot->tts_tid, tts_tupleOid and potentially tts_tableOid

So with this there shouldn't be a slot-to-tid mapping, or there
should be some other way?

- COPY's multi_insert path should probably deal with a bunch of slots,
rather than forming HeapTuples

OK.

- bitmap index scans probably need a new tableam.h callback, abstracting
bitgetpage()

OK.

Regards,
Haribabu Kommi
Fujitsu Australia

#3 Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Andres Freund (#1)
Re: Pluggable Storage - Andres's take

Hi!

On Tue, Jul 3, 2018 at 10:06 AM Andres Freund <andres@anarazel.de> wrote:

As I've previously mentioned I had planned to spend some time to polish
Haribabu's version of the pluggable storage patch and rebase it on the
vtable based slot approach from [1]. While doing so I found more and
more things that I previously hadn't noticed. I started rewriting things
into something closer to what I think we want architecturally.

Great, thank you for working on this patchset!

The current state of my version of the patch is *NOT* ready for proper
review (it doesn't even pass all tests, there's FIXME / elog()s). But I
think it's getting close enough to its eventual shape that more eyes,
and potentially more hands on keyboards, can be useful.

The most fundamental issues I had with Haribabu's last version from [2]
are the following:

- The use of TableTuple, a typedef from void *, is bad from multiple
fronts. For one it reduces just about all type safety. There were
numerous bugs in the patch where things were just cast from HeapTuple
to TableTuple to HeapTuple (and even to TupleTableSlot). I think it's
a really, really bad idea to introduce a vague type like this for
development purposes alone, it makes it way too hard to refactor -
essentially throwing the biggest benefit of type safe languages out of
the window.

Additionally I think it's also the wrong approach architecturally. We
shouldn't assume that a tuple can efficiently be represented as a
single palloc'ed chunk. In fact, we should move *away* from relying on
that so much.

I've thus removed the TableTuple type entirely.

+1, TableTuple was a vague concept.

- Previous versions of the patchset exposed Buffers in the tableam.h API,
performed buffer locking / pinning / ExecStoreTuple() calls outside of
it. That is wrong in my opinion, as various AMs will deal very
differently with buffer pinning & locking. The relevant logic is
largely moved within the AM. Bringing me to the next point:

- tableam exposed various operations based on HeapTuple/TableTuple's
(and their Buffers). This all needs to be slot based, as we can't
represent the way each AM will deal with this. I've largely converted
the API to be slot based. That has some fallout, but I think largely
works. Lots of outdated comments.

Makes sense to me. I like passing TupleTableSlot to the tableam API
functions much more.

- I think the move of the indexing from outside the table layer into the
storage layer isn't a good idea. It led to having to pass EState into
the tableam, a callback API to perform index updates, etc. This seems
to have at least partially been triggered by the speculative insertion
codepaths. I've reverted this part of the changes. The speculative
insertion / confirm codepaths are now exposed to tableam.h - I think
that's the right thing because we'll likely want to have that
functionality across more than a single tuple in the future.

I agree that passing EState into tableam doesn't look good. But I
believe that tableam needs way more control over indexes than it has
in your version of the patch. Even if tableam-independent insertion
into indexes on tuple insert is more or less OK, on update we need
something smarter than just inserting index tuples depending on an
"update_indexes" flag. A tableam-specific update method may decide to
update only some of the indexes. For example, when zheap performs an
update in-place, it inserts only into those indexes whose fields were
updated. And I think any undo-log based storage would have similar
behavior. Moreover, it might be required to do something with existing
index tuples (for instance, as far as I know, zheap sets a "deleted"
flag on index tuples related to previous values of the updated fields).

If we would like to move indexing outside of tableam, then we might
turn "update_indexes" from a bool into an enum with values like "don't
insert index tuples", "insert all index tuples", "insert index tuples
only for updated fields", and so on (sketched below). But that looks
more like a set of hardcoded cases for particular implementations than
a proper API. So probably we shouldn't move indexing outside of
tableam, but rather provide better wrappers for doing that in tableam?
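
I.e. something like this (a sketch, with names invented here):

  /* sketch of a richer "update_indexes" result */
  typedef enum UpdateIndexes
  {
      UPDATE_INDEXES_NONE,      /* don't insert index tuples */
      UPDATE_INDEXES_ALL,       /* insert all index tuples */
      UPDATE_INDEXES_MODIFIED   /* only for updated fields */
  } UpdateIndexes;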

- The visibility functions relied on the *caller* performing buffer
locking. That's not a great idea, because generic code shouldn't know
about the locking scheme a particular AM needs. I've changed the
external visibility functions to instead take a slot, and perform the
necessary locking inside.

Makes sense to me. But would it cause extra locking/unlocking and, in
turn, a performance impact?

- There were numerous tableam callback uses inside heapam.c - that makes
no sense, we know what the storage is therein. The relevant

Ok.

- The integration between index lookups and heap lookups based on the
results of an index lookup was IMO too tight. The index code dealt
with heap tuples, which isn't great. I've introduced a new concept, a
'IndexFetchTableData' scan. It's initialized when building an index
scan, and provides the necessary state (say current heap buffer), to
do table lookups from within a heap.

+1

- The am of relations required for bootstrapping was set to 0 - I don't
think that's a good idea. I changed it so it's set to the heap AM as
well.

+1

- HOT was encoded in the API in a bunch of places. That doesn't look
right to me. I tried to improve a bit on that, but I'm not yet quite
sure I like it. Needs written explanation & arguments...

Yes, HOT is a heapam-specific feature. Other tableams might not have
HOT. But it appears that we still expose the hot_search_buffer()
function in the tableam API. That function has no users, so it's just
redundant and can be removed.

- the heap tableam did a heap_copytuple() nearly everywhere. Leading to
a higher memory usage, because the resulting tuples weren't freed or
anything. There might be a reason for doing such a change - we've
certainly discussed that before - but I'm *vehemently* against doing
that at the same time we introduce pluggable storage. Analyzing the
performance effects will be hard enough without changes like this.

I think once we've switched to slots, doing heap_copytuple() so
frequently is not required anymore.

- I've for now backed out the heap rewrite changes, partially. Mostly
because I didn't like the way the abstraction looks, but haven't quite
figured out what it should look like.

Yeah, it's a hard part, but we need to invent something in this area...

- I did not like that speculative tokens were moved to slots. There's
really no reason for them to live outside parameters to tableam.h
functions.

Good.

My current working state is at [3] (urls to clone repo are at [4]).
This is *HEAVILY WIP*. I plan to continue working on it over the next
days, but I'll temporarily focus on v11 work. If others want, I could
move the repo to github and grant others write access.

Github would be more convenient for me.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#4 Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Alexander Korotkov (#3)
Re: Pluggable Storage - Andres's take

On Thu, Jul 5, 2018 at 3:25 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

My current working state is at [3] (urls to clone repo are at [4]).
This is *HEAVILY WIP*. I plan to continue working on it over the next
days, but I'll temporarily focus on v11 work. If others want, I could
move the repo to github and grant others write access.

Github would be more convenient for me.

I have another note. It appears that you left my patch for locking the
last version of a tuple in one call (the heapam_lock_tuple() function)
almost without changes. I remember that during the PGCon 2018 Developer
Meeting you were somewhat unhappy with this approach. So, do you have
any notes about that for now?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#5 Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#2)
Re: Pluggable Storage - Andres's take

Hi,

I've pushed up a new version to
https://github.com/anarazel/postgres-pluggable-storage which now passes
all the tests.

Besides a lot of bugfixes, I've rebased the tree, moved TriggerData to
be primarily slot based (with a conversion roundtrip when calling
trigger functions), and a lot of other small things.

On 2018-07-04 20:11:21 +1000, Haribabu Kommi wrote:

On Tue, Jul 3, 2018 at 5:06 PM Andres Freund <andres@anarazel.de> wrote:

The current state of my version of the patch is *NOT* ready for proper
review (it doesn't even pass all tests, there's FIXME / elog()s). But I
think it's getting close enough to its eventual shape that more eyes,
and potentially more hands on keyboards, can be useful.

I will try to update it to make sure that it passes all the tests and
also try to reduce the FIXMEs.

Cool.

Alexander, Haribabu, if you give me (privately) your github accounts,
I'll give you write access to that repository.

The most fundamental issues I had with Haribabu's last version from [2]
are the following:

- The use of TableTuple, a typedef from void *, is bad from multiple
fronts. For one it reduces just about all type safety. There were
numerous bugs in the patch where things were just cast from HeapTuple
to TableTuple to HeapTuple (and even to TupleTableSlot). I think it's
a really, really bad idea to introduce a vague type like this for
development purposes alone, it makes it way too hard to refactor -
essentially throwing the biggest benefit of type safe languages out of
the window.

My earlier intention was to remove the HeapTuple usage entirely and
replace it with slots everywhere outside the tableam, but it ended up
as TableTuple before reaching that stage because of the heavy use of
HeapTuple.

I don't think that's necessary - a lot of the system catalog accesses
are going to continue to be heap tuple accesses. And the converted code
largely continued to access TableTuples as heap tuples - it was just
that the compiler didn't warn about it anymore.

A prime example of that is the way the rewriteheap / cluster
integration was done. Cluster continued to sort tuples as heap tuples -
even though that's likely incompatible with other tuple formats which
need different state.

Additionally I think it's also the wrong approach architecturally. We
shouldn't assume that a tuple can efficiently be represented as a
single palloc'ed chunk. In fact, we should move *away* from relying on
that so much.

I've thus removed the TableTuple type entirely.

Thanks for the changes. I didn't check the code yet; so for now,
whenever a HeapTuple is required it will be generated from the slot?

Pretty much.

- the heap tableam did a heap_copytuple() nearly everywhere. Leading to
a higher memory usage, because the resulting tuples weren't freed or
anything. There might be a reason for doing such a change - we've
certainly discussed that before - but I'm *vehemently* against doing
that at the same time we introduce pluggable storage. Analyzing the
performance effects will be hard enough without changes like this.

How about using a slot instead of a tuple and reusing it? I don't know
yet whether it is possible everywhere.

Not quite sure what you mean?

Tasks / Questions:

- split up patch

How about generating refactoring changes as patches first based on
the code in your repo as discussed here[1]?

Sure - it was just at the moment too much work ;)

- Change heap table AM to not allocate handler function for each table,
instead allocate it statically. Avoids a significant amount of data
duplication, and allows for a few more compiler optimizations.

Some kind of static handler variable for each tableam, but I need to
check how we can access that static handler from the relation.

I'm not sure what you mean by "how can we access"? We can just return a
pointer to the constant data from the current handler function? Other
than adding a bunch of consts, which would be good, the external
interface wouldn't need to change?
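
I.e. something along these lines (a sketch; the callback and function
names here are illustrative):

  /* one statically allocated routine table shared by all heap tables */
  static const TableAmRoutine heapam_methods = {
      .scan_begin = heapam_scan_begin,
      .scan_end = heapam_scan_end,
      /* ... remaining callbacks ... */
  };

  /* the handler then just returns the constant data */
  const TableAmRoutine *
  GetHeapamTableAmRoutine(void)
  {
      return &heapam_methods;
  }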

- change scan level slot creation to use tableam function for doing so
- get rid of slot->tts_tid, tts_tupleOid and potentially tts_tableOid

So with this there shouldn't be a slot-to-tid mapping, or there
should be some other way?

I'm not sure I follow?

- bitmap index scans probably need a new tableam.h callback, abstracting
bitgetpage()

OK.

Any chance you could try to tackle this? I'm going to be mostly out
this week, so we'd probably not run across each other's feet...

Greetings,

Andres Freund

#6 Andres Freund
andres@anarazel.de
In reply to: Alexander Korotkov (#3)
Re: Pluggable Storage - Andres's take

Hi,

On 2018-07-05 15:25:25 +0300, Alexander Korotkov wrote:

- I think the move of the indexing from outside the table layer into the
storage layer isn't a good idea. It led to having to pass EState into
the tableam, a callback API to perform index updates, etc. This seems
to have at least partially been triggered by the speculative insertion
codepaths. I've reverted this part of the changes. The speculative
insertion / confirm codepaths are now exposed to tableam.h - I think
that's the right thing because we'll likely want to have that
functionality across more than a single tuple in the future.

I agree that passing EState into tableam doesn't look good. But I
believe that tableam needs way more control over indexes than it has
in your version of the patch. Even if tableam-independent insertion
into indexes on tuple insert is more or less OK, on update we need
something smarter than just inserting index tuples depending on an
"update_indexes" flag. A tableam-specific update method may decide to
update only some of the indexes. For example, when zheap performs an
update in-place, it inserts only into those indexes whose fields were
updated. And I think any undo-log based storage would have similar
behavior. Moreover, it might be required to do something with existing
index tuples (for instance, as far as I know, zheap sets a "deleted"
flag on index tuples related to previous values of the updated fields).

I agree that we probably need more - I'm just inclined to think that we
need a more concrete target to work against. Currently zheap's indexing
logic still is fairly naive, I don't think we'll get the interface right
without having worked further on the zheap side of things.

- The visibility functions relied on the *caller* performing buffer
locking. That's not a great idea, because generic code shouldn't know
about the locking scheme a particular AM needs. I've changed the
external visibility functions to instead take a slot, and perform the
necessary locking inside.

Makes sense to me. But would it cause extra locking/unlocking and, in
turn, a performance impact?

I don't think so - nearly all the performance-relevant cases do all the
visibility logic inside the AM, where the underlying functions that do
not do the locking can be used. Pretty much all the converted places
just had manual LockBuffer calls.

- HOT was encoded in the API in a bunch of places. That doesn't look
right to me. I tried to improve a bit on that, but I'm not yet quite
sure I like it. Needs written explanation & arguments...

Yes, HOT is a heapam-specific feature. Other tableams might not have
HOT. But it appears that we still expose the hot_search_buffer()
function in the tableam API. That function has no users, so it's just
redundant and can be removed.

Yea, that was a leftover.

- the heap tableam did a heap_copytuple() nearly everywhere. Leading to
a higher memory usage, because the resulting tuples weren't freed or
anything. There might be a reason for doing such a change - we've
certainly discussed that before - but I'm *vehemently* against doing
that at the same time we introduce pluggable storage. Analyzing the
performance effects will be hard enough without changes like this.

I think once we've switched to slots, doing heap_copytuple() so
frequently is not required anymore.

It's mostly gone now.

- I've for now backed out the heap rewrite changes, partially. Mostly
because I didn't like the way the abstraction looks, but haven't quite
figured out what it should look like.

Yeah, it's a hard part, but we need to invent something in this area...

I agree. But I really don't yet quite know what. I somewhat wonder if we
should just add a cluster_rel() callback to the tableam and let it deal
with everything :(. As previously proposed the interface wouldn't have
worked with anything not losslessly encodable into a heaptuple, which is
unlikely to be sufficient.
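
For instance (a hypothetical shape, with the horizon parameters taken
from what cluster currently computes):

  /* hand the whole CLUSTER/VACUUM FULL rewrite to the AM */
  typedef void (*RelationCluster_function)(Relation OldHeap,
                                           Relation NewHeap,
                                           Relation OldIndex,
                                           bool use_sort,
                                           TransactionId OldestXmin,
                                           TransactionId FreezeXid,
                                           MultiXactId MultiXactCutoff);

That would keep tuple-format knowledge out of cluster.c, at the price
of each AM having to provide its own rewrite machinery.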

FWIW, I plan to be mostly out until Thursday this week, and then I'll
rebase onto the new version of the abstract slot patch and then try to
split up the patchset. Once that's done, I'll do a prototype conversion
of zheap, which I'm sure will show up a lot of weaknesses in the current
abstraction. Once that's done I hope we can collaborate / divide &
conquer to get the individual pieces into commit shape.

If either of you wants to get a head start separating something out,
let's try to organize who would do what? The EPQ and trigger
slotification are probably good candidates.

Greetings,

Andres Freund

#7 Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#5)
Re: Pluggable Storage - Andres's take

On Mon, Jul 16, 2018 at 11:35 PM Andres Freund <andres@anarazel.de> wrote:

On 2018-07-04 20:11:21 +1000, Haribabu Kommi wrote:

On Tue, Jul 3, 2018 at 5:06 PM Andres Freund <andres@anarazel.de> wrote:

The most fundamental issues I had with Haribabu's last version from [2]
are the following:

- The use of TableTuple, a typedef from void *, is bad from multiple
fronts. For one it reduces just about all type safety. There were
numerous bugs in the patch where things were just cast from HeapTuple
to TableTuple to HeapTuple (and even to TupleTableSlot). I think it's
a really, really bad idea to introduce a vague type like this for
development purposes alone, it makes it way too hard to refactor -
essentially throwing the biggest benefit of type safe languages out of
the window.

My earlier intention was to remove the HeapTuple usage entirely and
replace it with slots everywhere outside the tableam, but it ended up
as TableTuple before reaching that stage because of the heavy use of
HeapTuple.

I don't think that's necessary - a lot of the system catalog accesses
are going to continue to be heap tuple accesses. And the converted code
largely continued to access TableTuples as heap tuples - it was just
that the compiler didn't warn about it anymore.

A prime example of that is the way the rewriteheap / cluster
integration was done. Cluster continued to sort tuples as heap tuples -
even though that's likely incompatible with other tuple formats which
need different state.

OK. Understood.

- the heap tableam did a heap_copytuple() nearly everywhere. Leading to
a higher memory usage, because the resulting tuples weren't freed or
anything. There might be a reason for doing such a change - we've
certainly discussed that before - but I'm *vehemently* against doing
that at the same time we introduce pluggable storage. Analyzing the
performance effects will be hard enough without changes like this.

How about using a slot instead of a tuple and reusing it? I don't know
yet whether it is possible everywhere.

Not quite sure what you mean?

I thought that using slots everywhere could reduce the use of
heap_copytuple(); I understand from your reply in the other mail that
you already did those changes.

Tasks / Questions:

- split up patch

How about generating refactoring changes as patches first based on
the code in your repo as discussed here[1]?

Sure - it was just at the moment too much work ;)

Yes, it is too much work. How about doing this once most of the
open items are finished?

- Change heap table AM to not allocate handler function for each table,
instead allocate it statically. Avoids a significant amount of data
duplication, and allows for a few more compiler optimizations.

Some kind of static handler variable for each tableam, but I need to
check how we can access that static handler from the relation.

I'm not sure what you mean by "how can we access"? We can just return a
pointer to the constant data from the current handler function? Other
than adding a bunch of consts, which would be good, the external
interface wouldn't need to change?

I mean we may need to store some tableam ID in each table, so that
based on that ID we can get the static tableam handler, because at any
given time there may be tables using different tableam methods.

- change scan level slot creation to use tableam function for doing so
- get rid of slot->tts_tid, tts_tupleOid and potentially tts_tableOid

So with this there shouldn't be a slot-to-tid mapping, or there
should be some other way?

I'm not sure I follow?

Replacing heaptuple with slot: currently only the tid in the slot is
passed to the tableam methods. To get rid of the tid from the slot, we
may need some other way of passing it?

- bitmap index scans probably need a new tableam.h callback, abstracting
bitgetpage()

OK.

Any chance you could try to tackle this? I'm going to be mostly out
this week, so we'd probably not run across each other's feet...

OK, I will take care of the above point.

Regards,
Haribabu Kommi
Fujitsu Australia

#8 Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#7)
2 attachment(s)
Re: Pluggable Storage - Andres's take

On Tue, Jul 17, 2018 at 11:01 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Mon, Jul 16, 2018 at 11:35 PM Andres Freund <andres@anarazel.de> wrote:

On 2018-07-04 20:11:21 +1000, Haribabu Kommi wrote:

On Tue, Jul 3, 2018 at 5:06 PM Andres Freund <andres@anarazel.de> wrote:

- bitmap index scans probably need a new tableam.h callback, abstracting
bitgetpage()

OK.

Any chance you could try to tackle this? I'm going to be mostly out
this week, so we'd probably not run across each other's feet...

OK, I will take care of the above point.

I added a new API in tableam.h to get all the visible tuples from a
page, abstracting the bitgetpage() function.

- Merge tableam.h and tableamapi.h and make most tableam.c functions
small inline functions. Having one-line tableam.c wrappers makes this
more expensive than necessary. We'll have enough trouble not
regressing performance-wise.

I merged tableam.h and tableamapi.h into tableam.h and changed all the
functions to inlines. This change may have added some additional header
dependencies; I will check whether I can remove the need for them.

Attached are the updated patches on top of your github tree.

Currently I am working on the following.
- I observed that there is a crash when running isolation tests.
- COPY's multi_insert path should probably deal with a bunch of slots,
rather than forming HeapTuples

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0002-New-API-to-get-heap-page-tuples.patch (application/octet-stream)
From e96746d07b8dfcdb17082b54524505163f5e237a Mon Sep 17 00:00:00 2001
From: Hari Babu <haribabuk@fast.au.fujitsu.com>
Date: Tue, 24 Jul 2018 23:18:15 +1000
Subject: [PATCH 2/2] New API to get heap page tuples

This API is used in bitmap scan to get all the visible tuples
from the page.
---
 src/backend/access/heap/heapam_handler.c  | 105 ++++++++++++++++++++++
 src/backend/executor/nodeBitmapHeapscan.c | 100 ++-------------------
 src/include/access/tableam.h              |  20 +++++
 3 files changed, 131 insertions(+), 94 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 42eec2a2ab..a3fe110efe 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,9 +33,13 @@
 
 #include "storage/bufpage.h"
 #include "storage/bufmgr.h"
+#include "storage/predicate.h"
 #include "access/xact.h"
 
 
+static bool heapam_fetch_tuple_from_offset(TableScanDesc sscan, BlockNumber blkno,
+										   OffsetNumber offset, TupleTableSlot *slot);
+
 /* ----------------------------------------------------------------
  *				storage AM support routines for heapam
  * ----------------------------------------------------------------
@@ -462,6 +466,106 @@ heapam_get_heappagescandesc(TableScanDesc sscan)
 	return &scan->rs_pagescan;
 }
 
+static void
+heapam_scan_get_page_tuples(TableScanDesc scan,
+						   HeapPageScanDesc pagescan,
+						   TupleTableSlot *slot,
+						   BlockNumber page,
+						   int ntuples,
+						   OffsetNumber *offsets)
+{
+	Buffer		buffer;
+	Snapshot	snapshot;
+	int			ntup;
+
+	/*
+	 * Acquire pin on the target heap page, trading in any pin we held before.
+	 */
+	Assert(page < pagescan->rs_nblocks);
+
+	scan->rs_cbuf = ReleaseAndReadBuffer(scan->rs_cbuf,
+										 scan->rs_rd,
+										 page);
+	buffer = scan->rs_cbuf;
+	snapshot = scan->rs_snapshot;
+
+	ntup = 0;
+
+	/*
+	 * Prune and repair fragmentation for the whole page, if possible.
+	 */
+	heap_page_prune_opt(scan->rs_rd, buffer);
+
+	/*
+	 * We must hold share lock on the buffer content while examining tuple
+	 * visibility.  Afterwards, however, the tuples we have found to be
+	 * visible are guaranteed good as long as we hold the buffer pin.
+	 */
+	LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+	/*
+	 * We need two separate strategies for lossy and non-lossy cases.
+	 */
+	if (ntuples >= 0)
+	{
+		/*
+		 * Bitmap is non-lossy, so we just look through the offsets listed in
+		 * tbmres; but we have to follow any HOT chain starting at each such
+		 * offset.
+		 */
+		int			curslot;
+
+		for (curslot = 0; curslot < ntuples; curslot++)
+		{
+			OffsetNumber offnum = offsets[curslot];
+			ItemPointerData tid;
+			HeapTupleData heapTuple;
+
+			ItemPointerSet(&tid, page, offnum);
+			if (heap_hot_search_buffer(&tid, scan->rs_rd, buffer, snapshot,
+									   &heapTuple, NULL, true))
+				pagescan->rs_vistuples[ntup++] = ItemPointerGetOffsetNumber(&tid);
+		}
+	}
+	else
+	{
+		/*
+		 * Bitmap is lossy, so we must examine each item pointer on the page.
+		 * But we can ignore HOT chains, since we'll check each tuple anyway.
+		 */
+		Page		dp = (Page) BufferGetPage(buffer);
+		OffsetNumber maxoff = PageGetMaxOffsetNumber(dp);
+		OffsetNumber offnum;
+
+		for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum = OffsetNumberNext(offnum))
+		{
+			ItemId		lp;
+			bool		valid;
+			BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+
+			lp = PageGetItemId(dp, offnum);
+			if (!ItemIdIsNormal(lp))
+				continue;
+
+			/* FIXME: unnecessarily pins */
+			heapam_fetch_tuple_from_offset(scan, page, offnum, slot);
+			valid = HeapTupleSatisfies(bslot->base.tuple, snapshot, bslot->buffer);
+			if (valid)
+			{
+				pagescan->rs_vistuples[ntup++] = offnum;
+				PredicateLockTuple(scan->rs_rd, bslot->base.tuple, snapshot);
+			}
+			CheckForSerializableConflictOut(valid, scan->rs_rd, bslot->base.tuple,
+											buffer, snapshot);
+		}
+	}
+
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+	Assert(ntup <= MaxHeapTuplesPerPage);
+	pagescan->rs_ntuples = ntup;
+}
+
 static bool
 heapam_fetch_tuple_from_offset(TableScanDesc sscan, BlockNumber blkno, OffsetNumber offset, TupleTableSlot *slot)
 {
@@ -683,6 +787,7 @@ heap_tableam_handler(PG_FUNCTION_ARGS)
 	 * BitmapHeap and Sample Scans
 	 */
 	amroutine->scan_get_heappagescandesc = heapam_get_heappagescandesc;
+	amroutine->scan_get_page_tuples = heapam_scan_get_page_tuples;
 	amroutine->sync_scan_report_location = ss_report_location;
 
 	amroutine->tuple_fetch_row_version = heapam_fetch_row_version;
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 8521e45132..be123728eb 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -371,100 +371,12 @@ BitmapHeapNext(BitmapHeapScanState *node)
 static void
 bitgetpage(BitmapHeapScanState *node, TBMIterateResult *tbmres)
 {
-	TableScanDesc scan = node->ss.ss_currentScanDesc;
-	HeapPageScanDesc pagescan = node->pagescan;
-	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
-	BlockNumber page = tbmres->blockno;
-	Buffer		buffer;
-	Snapshot	snapshot;
-	int			ntup;
-
-	/*
-	 * Acquire pin on the target heap page, trading in any pin we held before.
-	 */
-	Assert(page < pagescan->rs_nblocks);
-
-	scan->rs_cbuf = ReleaseAndReadBuffer(scan->rs_cbuf,
-										 scan->rs_rd,
-										 page);
-	buffer = scan->rs_cbuf;
-	snapshot = scan->rs_snapshot;
-
-	ntup = 0;
-
-	/*
-	 * Prune and repair fragmentation for the whole page, if possible.
-	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
-
-	/*
-	 * We must hold share lock on the buffer content while examining tuple
-	 * visibility.  Afterwards, however, the tuples we have found to be
-	 * visible are guaranteed good as long as we hold the buffer pin.
-	 */
-	LockBuffer(buffer, BUFFER_LOCK_SHARE);
-
-	/*
-	 * We need two separate strategies for lossy and non-lossy cases.
-	 */
-	if (tbmres->ntuples >= 0)
-	{
-		/*
-		 * Bitmap is non-lossy, so we just look through the offsets listed in
-		 * tbmres; but we have to follow any HOT chain starting at each such
-		 * offset.
-		 */
-		int			curslot;
-
-		for (curslot = 0; curslot < tbmres->ntuples; curslot++)
-		{
-			OffsetNumber offnum = tbmres->offsets[curslot];
-			ItemPointerData tid;
-			HeapTupleData heapTuple;
-
-			ItemPointerSet(&tid, page, offnum);
-			if (heap_hot_search_buffer(&tid, scan->rs_rd, buffer, snapshot,
-									   &heapTuple, NULL, true))
-				pagescan->rs_vistuples[ntup++] = ItemPointerGetOffsetNumber(&tid);
-		}
-	}
-	else
-	{
-		/*
-		 * Bitmap is lossy, so we must examine each item pointer on the page.
-		 * But we can ignore HOT chains, since we'll check each tuple anyway.
-		 */
-		Page		dp = (Page) BufferGetPage(buffer);
-		OffsetNumber maxoff = PageGetMaxOffsetNumber(dp);
-		OffsetNumber offnum;
-
-		for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum = OffsetNumberNext(offnum))
-		{
-			ItemId		lp;
-			bool		valid;
-			BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
-
-			lp = PageGetItemId(dp, offnum);
-			if (!ItemIdIsNormal(lp))
-				continue;
-
-			/* FIXME: unnecessarily pins */
-			table_tuple_fetch_from_offset(scan, page, offnum, slot);
-			valid = HeapTupleSatisfies(bslot->base.tuple, snapshot, bslot->buffer);
-			if (valid)
-			{
-				pagescan->rs_vistuples[ntup++] = offnum;
-				PredicateLockTuple(scan->rs_rd, bslot->base.tuple, snapshot);
-			}
-			CheckForSerializableConflictOut(valid, scan->rs_rd, bslot->base.tuple,
-											buffer, snapshot);
-		}
-	}
-
-	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-
-	Assert(ntup <= MaxHeapTuplesPerPage);
-	pagescan->rs_ntuples = ntup;
+	table_scan_get_page_tuples(node->ss.ss_currentScanDesc,
+			node->pagescan,
+			node->ss.ss_ScanTupleSlot,
+			tbmres->blockno,
+			tbmres->ntuples,
+			tbmres->offsets);
 }
 
 /*
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index bf675ff881..df60ba3316 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -172,6 +172,13 @@ typedef bool (*TupleFetchFollow_function)(struct IndexFetchTableData *scan,
 										  TupleTableSlot *slot,
 										  bool *call_again, bool *all_dead);
 
+typedef void (*ScanGetPageTuples_function)(TableScanDesc scan,
+										   HeapPageScanDesc pagescan,
+										   TupleTableSlot *slot,
+										   BlockNumber page,
+										   int ntuples,
+										   OffsetNumber *offsets);
+
 /*
  * API struct for a table AM.  Note this must be stored in a single palloc'd
  * chunk of memory.
@@ -224,6 +231,7 @@ typedef struct TableAmRoutine
 	ScanGetpage_function scan_getpage;
 	ScanRescan_function scan_rescan;
 	ScanUpdateSnapshot_function scan_update_snapshot;
+	ScanGetPageTuples_function scan_get_page_tuples;
 
 	BeginIndexFetchTable_function begin_index_fetch;
 	EndIndexFetchTable_function reset_index_fetch;
@@ -472,6 +480,18 @@ table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot)
 	scan->rs_rd->rd_tableamroutine->scan_update_snapshot(scan, snapshot);
 }
 
+static inline void
+table_scan_get_page_tuples(TableScanDesc scan,
+		   HeapPageScanDesc pagescan,
+		   TupleTableSlot *slot,
+		   BlockNumber page,
+		   int ntuples,
+		   OffsetNumber *offsets)
+{
+	scan->rs_rd->rd_tableamroutine->scan_get_page_tuples(scan, pagescan, slot, page, ntuples, offsets);
+}
+
+
 static inline TupleTableSlot *
 table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot)
 {
-- 
2.18.0.windows.1

0001-Merge-tableam.h-and-tableamapi.h.patch (application/octet-stream)
From 4bb12ad1e73927ee31683d728d5cdaebca6a53a6 Mon Sep 17 00:00:00 2001
From: Hari Babu <haribabuk@fast.au.fujitsu.com>
Date: Mon, 23 Jul 2018 14:44:35 +1000
Subject: [PATCH 1/2] Merge tableam.h and tableamapi.h

And also make most tableam.c functions small inline functions.
Having one-line tableam.c wrappers makes this more expensive
than necessary.

The above change may have added some internal headers that are now
exposed via tableam.h; may need another check.
---
 contrib/pgrowlocks/pgrowlocks.c          |   2 +-
 src/backend/access/heap/heapam.c         |   2 +-
 src/backend/access/heap/heapam_handler.c |   2 +-
 src/backend/access/heap/rewriteheap.c    |   1 +
 src/backend/access/nbtree/nbtsort.c      |   1 +
 src/backend/access/table/Makefile        |   2 +-
 src/backend/access/table/tableam.c       | 472 ---------------
 src/backend/access/table/tableamapi.c    |   2 +-
 src/backend/commands/cluster.c           |   2 +-
 src/backend/executor/execIndexing.c      |   1 +
 src/backend/executor/nodeSamplescan.c    |   1 +
 src/backend/optimizer/util/plancat.c     |   2 +-
 src/backend/postmaster/autovacuum.c      |   1 +
 src/backend/storage/lmgr/predicate.c     |   1 +
 src/backend/utils/adt/ri_triggers.c      |   1 +
 src/backend/utils/adt/selfuncs.c         |   1 +
 src/backend/utils/cache/relcache.c       |   2 +-
 src/include/access/relscan.h             |   1 -
 src/include/access/tableam.h             | 738 ++++++++++++++++++++---
 src/include/access/tableamapi.h          | 212 -------
 src/include/nodes/nodes.h                |   2 +-
 src/include/utils/tqual.h                |  15 -
 22 files changed, 669 insertions(+), 795 deletions(-)
 delete mode 100644 src/backend/access/table/tableam.c
 delete mode 100644 src/include/access/tableamapi.h

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index 3995a88397..959f5e7dc8 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -26,7 +26,7 @@
 
 #include "access/multixact.h"
 #include "access/relscan.h"
-#include "access/tableamapi.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 5b8155c911..40c1a5432d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -45,7 +45,7 @@
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/relscan.h"
-#include "access/tableamapi.h"
+#include "access/tableam.h"
 #include "access/sysattr.h"
 #include "access/transam.h"
 #include "access/tuptoaster.h"
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 94c64d8387..42eec2a2ab 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -23,7 +23,7 @@
 #include "access/heapam.h"
 #include "access/relscan.h"
 #include "access/rewriteheap.h"
-#include "access/tableamapi.h"
+#include "access/tableam.h"
 #include "catalog/pg_am_d.h"
 #include "pgstat.h"
 #include "storage/lmgr.h"
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 2ddb421eb0..5dad191ab2 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -110,6 +110,7 @@
 #include "access/heapam.h"
 #include "access/heapam_xlog.h"
 #include "access/rewriteheap.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/tuptoaster.h"
 #include "access/xact.h"
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 74f8e1bbeb..be74041df4 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -60,6 +60,7 @@
 #include "access/nbtree.h"
 #include "access/parallel.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
diff --git a/src/backend/access/table/Makefile b/src/backend/access/table/Makefile
index fe22bf9208..ff0989ed24 100644
--- a/src/backend/access/table/Makefile
+++ b/src/backend/access/table/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/access/table
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = tableam.o tableamapi.o tableam_common.o
+OBJS = tableamapi.o tableam_common.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
deleted file mode 100644
index 77c04aaa27..0000000000
--- a/src/backend/access/table/tableam.c
+++ /dev/null
@@ -1,472 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * tableam.c
- *	  table access method code
- *
- * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
- * Portions Copyright (c) 1994, Regents of the University of California
- *
- *
- * IDENTIFICATION
- *	  src/backend/access/table/tableam.c
- *
- *-------------------------------------------------------------------------
- */
-#include "postgres.h"
-
-#include "access/tableam.h"
-#include "access/tableamapi.h"
-#include "access/relscan.h"
-#include "storage/bufmgr.h"
-#include "utils/rel.h"
-#include "utils/tqual.h"
-
-TupleTableSlot*
-table_gimmegimmeslot(Relation relation, List **reglist)
-{
-	TupleTableSlot *slot;
-
-	slot = relation->rd_tableamroutine->gimmegimmeslot(relation);
-
-	if (reglist)
-		*reglist = lappend(*reglist, slot);
-
-	return slot;
-}
-
-/*
- *	table_fetch_row_version		- retrieve tuple with given tid
- *
- *  XXX: This shouldn't just take a tid, but tid + additional information
- */
-bool
-table_fetch_row_version(Relation r,
-						ItemPointer tid,
-						Snapshot snapshot,
-						TupleTableSlot *slot,
-						Relation stats_relation)
-{
-	return r->rd_tableamroutine->tuple_fetch_row_version(r, tid,
-														 snapshot, slot,
-														 stats_relation);
-}
-
-
-/*
- *	table_lock_tuple - lock a tuple in shared or exclusive mode
- *
- *  XXX: This shouldn't just take a tid, but tid + additional information
- */
-HTSU_Result
-table_lock_tuple(Relation relation, ItemPointer tid, Snapshot snapshot,
-				 TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
-				 LockWaitPolicy wait_policy, uint8 flags,
-				 HeapUpdateFailureData *hufd)
-{
-	return relation->rd_tableamroutine->tuple_lock(relation, tid, snapshot, slot,
-												cid, mode, wait_policy,
-												flags, hufd);
-}
-
-/* ----------------
- *		heap_beginscan_parallel - join a parallel scan
- *
- *		Caller must hold a suitable lock on the correct relation.
- * ----------------
- */
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelHeapScanDesc parallel_scan)
-{
-	Snapshot	snapshot;
-
-	Assert(RelationGetRelid(relation) == parallel_scan->phs_relid);
-
-	if (!parallel_scan->phs_snapshot_any)
-	{
-		/* Snapshot was serialized -- restore it */
-		snapshot = RestoreSnapshot(parallel_scan->phs_snapshot_data);
-		RegisterSnapshot(snapshot);
-	}
-	else
-	{
-		/* SnapshotAny passed by caller (not serialized) */
-		snapshot = SnapshotAny;
-	}
-
-	return relation->rd_tableamroutine->scan_begin(relation, snapshot, 0, NULL, parallel_scan,
-												true, true, true, false, false, !parallel_scan->phs_snapshot_any);
-}
-
-ParallelHeapScanDesc
-tableam_get_parallelheapscandesc(TableScanDesc sscan)
-{
-	return sscan->rs_rd->rd_tableamroutine->scan_get_parallelheapscandesc(sscan);
-}
-
-HeapPageScanDesc
-tableam_get_heappagescandesc(TableScanDesc sscan)
-{
-	/*
-	 * Planner should have already validated whether the current storage
-	 * supports Page scans are not? This function will be called only from
-	 * Bitmap Heap scan and sample scan
-	 */
-	Assert(sscan->rs_rd->rd_tableamroutine->scan_get_heappagescandesc != NULL);
-
-	return sscan->rs_rd->rd_tableamroutine->scan_get_heappagescandesc(sscan);
-}
-
-void
-table_syncscan_report_location(Relation rel, BlockNumber location)
-{
-	return rel->rd_tableamroutine->sync_scan_report_location(rel, location);
-}
-
-/*
- * heap_setscanlimits - restrict range of a heapscan
- *
- * startBlk is the page to start at
- * numBlks is number of pages to scan (InvalidBlockNumber means "all")
- */
-void
-table_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks)
-{
-	sscan->rs_rd->rd_tableamroutine->scansetlimits(sscan, startBlk, numBlks);
-}
-
-
-/* ----------------
- *		heap_beginscan	- begin relation scan
- *
- * heap_beginscan is the "standard" case.
- *
- * heap_beginscan_catalog differs in setting up its own temporary snapshot.
- *
- * heap_beginscan_strat offers an extended API that lets the caller control
- * whether a nondefault buffer access strategy can be used, and whether
- * syncscan can be chosen (possibly resulting in the scan not starting from
- * block zero).  Both of these default to true with plain heap_beginscan.
- *
- * heap_beginscan_bm is an alternative entry point for setting up a
- * TableScanDesc for a bitmap heap scan.  Although that scan technology is
- * really quite unlike a standard seqscan, there is just enough commonality
- * to make it worth using the same data structure.
- *
- * heap_beginscan_sampling is an alternative entry point for setting up a
- * TableScanDesc for a TABLESAMPLE scan.  As with bitmap scans, it's worth
- * using the same data structure although the behavior is rather different.
- * In addition to the options offered by heap_beginscan_strat, this call
- * also allows control of whether page-mode visibility checking is used.
- * ----------------
- */
-TableScanDesc
-table_beginscan(Relation relation, Snapshot snapshot,
-				  int nkeys, ScanKey key)
-{
-	return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
-												true, true, true, false, false, false);
-}
-
-TableScanDesc
-table_beginscan_catalog(Relation relation, int nkeys, ScanKey key)
-{
-	Oid			relid = RelationGetRelid(relation);
-	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
-
-	return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
-												true, true, true, false, false, true);
-}
-
-TableScanDesc
-table_beginscan_strat(Relation relation, Snapshot snapshot,
-						int nkeys, ScanKey key,
-						bool allow_strat, bool allow_sync)
-{
-	return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
-												allow_strat, allow_sync, true,
-												false, false, false);
-}
-
-TableScanDesc
-table_beginscan_bm(Relation relation, Snapshot snapshot,
-					 int nkeys, ScanKey key)
-{
-	return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
-												false, false, true, true, false, false);
-}
-
-TableScanDesc
-table_beginscan_sampling(Relation relation, Snapshot snapshot,
-						   int nkeys, ScanKey key,
-						   bool allow_strat, bool allow_sync, bool allow_pagemode)
-{
-	return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
-												allow_strat, allow_sync, allow_pagemode,
-												false, true, false);
-}
-
-/* ----------------
- *		heap_rescan		- restart a relation scan
- * ----------------
- */
-void
-table_rescan(TableScanDesc scan,
-			   ScanKey key)
-{
-	scan->rs_rd->rd_tableamroutine->scan_rescan(scan, key, false, false, false, false);
-}
-
-/* ----------------
- *		heap_rescan_set_params	- restart a relation scan after changing params
- *
- * This call allows changing the buffer strategy, syncscan, and pagemode
- * options before starting a fresh scan.  Note that although the actual use
- * of syncscan might change (effectively, enabling or disabling reporting),
- * the previously selected startblock will be kept.
- * ----------------
- */
-void
-table_rescan_set_params(TableScanDesc scan, ScanKey key,
-						  bool allow_strat, bool allow_sync, bool allow_pagemode)
-{
-	scan->rs_rd->rd_tableamroutine->scan_rescan(scan, key, true,
-											 allow_strat, allow_sync, (allow_pagemode && IsMVCCSnapshot(scan->rs_snapshot)));
-}
-
-/* ----------------
- *		heap_endscan	- end relation scan
- *
- *		See how to integrate with index scans.
- *		Check handling if reldesc caching.
- * ----------------
- */
-void
-table_endscan(TableScanDesc scan)
-{
-	scan->rs_rd->rd_tableamroutine->scan_end(scan);
-}
-
-
-/* ----------------
- *		heap_update_snapshot
- *
- *		Update snapshot info in heap scan descriptor.
- * ----------------
- */
-void
-table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot)
-{
-	scan->rs_rd->rd_tableamroutine->scan_update_snapshot(scan, snapshot);
-}
-
-TupleTableSlot *
-table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot)
-{
-	return sscan->rs_rd->rd_tableamroutine->scan_getnextslot(sscan, direction, slot);
-}
-
-bool
-table_tuple_fetch_from_offset(TableScanDesc sscan, BlockNumber blkno, OffsetNumber offset, TupleTableSlot *slot)
-{
-	return sscan->rs_rd->rd_tableamroutine->scan_fetch_tuple_from_offset(sscan, blkno, offset, slot);
-}
-
-
-IndexFetchTableData*
-table_begin_index_fetch_table(Relation rel)
-{
-	return rel->rd_tableamroutine->begin_index_fetch(rel);
-}
-
-void
-table_reset_index_fetch_table(IndexFetchTableData* scan)
-{
-	scan->rel->rd_tableamroutine->reset_index_fetch(scan);
-}
-
-void
-table_end_index_fetch_table(IndexFetchTableData* scan)
-{
-	scan->rel->rd_tableamroutine->end_index_fetch(scan);
-}
-
-/*
- * Insert a tuple from a slot into table AM routine
- */
-Oid
-table_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
-			   int options, BulkInsertState bistate)
-{
-	return relation->rd_tableamroutine->tuple_insert(relation, slot, cid, options,
-													 bistate);
-}
-
-Oid
-table_insert_speculative(Relation relation, TupleTableSlot *slot, CommandId cid,
-						 int options, BulkInsertState bistate, uint32 specToken)
-{
-	return relation->rd_tableamroutine->tuple_insert_speculative(relation, slot, cid, options,
-																 bistate, specToken);
-}
-
-void table_complete_speculative(Relation relation, TupleTableSlot *slot, uint32 specToken,
-								bool succeeded)
-{
-	return relation->rd_tableamroutine->tuple_complete_speculative(relation, slot, specToken, succeeded);
-}
-
-/*
- * Delete a tuple from tid using table AM routine
- */
-HTSU_Result
-table_delete(Relation relation, ItemPointer tid, CommandId cid,
-			 Snapshot crosscheck, bool wait,
-			 HeapUpdateFailureData *hufd, bool changingPart)
-{
-	return relation->rd_tableamroutine->tuple_delete(relation, tid, cid,
-												  crosscheck, wait, hufd, changingPart);
-}
-
-/*
- * update a tuple from tid using table AM routine
- */
-HTSU_Result
-table_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
-			 CommandId cid, Snapshot crosscheck, bool wait,
-			 HeapUpdateFailureData *hufd, LockTupleMode *lockmode,
-			 bool *update_indexes)
-{
-	return relation->rd_tableamroutine->tuple_update(relation, otid, slot,
-												  cid, crosscheck, wait, hufd,
-												  lockmode, update_indexes);
-}
-
-bool
-table_fetch_follow(struct IndexFetchTableData *scan,
-				   ItemPointer tid,
-				   Snapshot snapshot,
-				   TupleTableSlot *slot,
-				   bool *call_again, bool *all_dead)
-{
-
-	return scan->rel->rd_tableamroutine->tuple_fetch_follow(scan, tid, snapshot,
-														   slot, call_again,
-														   all_dead);
-}
-
-bool
-table_fetch_follow_check(Relation rel,
-						 ItemPointer tid,
-						 Snapshot snapshot,
-						 bool *all_dead)
-{
-	IndexFetchTableData *scan = table_begin_index_fetch_table(rel);
-	TupleTableSlot *slot = table_gimmegimmeslot(rel, NULL);
-	bool call_again = false;
-	bool found;
-
-	found = table_fetch_follow(scan, tid, snapshot, slot, &call_again, all_dead);
-
-	table_end_index_fetch_table(scan);
-	ExecDropSingleTupleTableSlot(slot);
-
-	return found;
-}
-
-/*
- *	table_multi_insert	- insert multiple tuple into a table
- */
-void
-table_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
-					 CommandId cid, int options, BulkInsertState bistate)
-{
-	relation->rd_tableamroutine->multi_insert(relation, tuples, ntuples,
-										   cid, options, bistate);
-}
-
-tuple_data
-table_tuple_get_data(Relation relation, TupleTableSlot *slot, tuple_data_flags flags)
-{
-	return relation->rd_tableamroutine->get_tuple_data(slot, flags);
-}
-
-void
-table_get_latest_tid(Relation relation,
-					   Snapshot snapshot,
-					   ItemPointer tid)
-{
-	relation->rd_tableamroutine->tuple_get_latest_tid(relation, snapshot, tid);
-}
-
-
-void
-table_vacuum_rel(Relation rel, int options,
-			 struct VacuumParams *params, BufferAccessStrategy bstrategy)
-{
-	rel->rd_tableamroutine->relation_vacuum(rel, options, params, bstrategy);
-}
-
-/*
- *	table_sync		- sync a heap, for use when no WAL has been written
- */
-void
-table_sync(Relation rel)
-{
-	rel->rd_tableamroutine->relation_sync(rel);
-}
-
-/*
- * -------------------
- * storage Bulk Insert functions
- * -------------------
- */
-BulkInsertState
-table_getbulkinsertstate(Relation rel)
-{
-	return rel->rd_tableamroutine->getbulkinsertstate();
-}
-
-void
-table_freebulkinsertstate(Relation rel, BulkInsertState bistate)
-{
-	rel->rd_tableamroutine->freebulkinsertstate(bistate);
-}
-
-void
-table_releasebulkinsertstate(Relation rel, BulkInsertState bistate)
-{
-	rel->rd_tableamroutine->releasebulkinsertstate(bistate);
-}
-
-/*
- * -------------------
- * storage tuple rewrite functions
- * -------------------
- */
-RewriteState
-table_begin_rewrite(Relation OldHeap, Relation NewHeap,
-				   TransactionId OldestXmin, TransactionId FreezeXid,
-				   MultiXactId MultiXactCutoff, bool use_wal)
-{
-	return NewHeap->rd_tableamroutine->begin_heap_rewrite(OldHeap, NewHeap,
-			OldestXmin, FreezeXid, MultiXactCutoff, use_wal);
-}
-
-void
-table_end_rewrite(Relation rel, RewriteState state)
-{
-	rel->rd_tableamroutine->end_heap_rewrite(state);
-}
-
-void
-table_rewrite_tuple(Relation rel, RewriteState state, HeapTuple oldTuple,
-				   HeapTuple newTuple)
-{
-	rel->rd_tableamroutine->rewrite_heap_tuple(state, oldTuple, newTuple);
-}
-
-bool
-table_rewrite_dead_tuple(Relation rel, RewriteState state, HeapTuple oldTuple)
-{
-	return rel->rd_tableamroutine->rewrite_heap_dead_tuple(state, oldTuple);
-}
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index f94660e306..91e5774a6e 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -13,7 +13,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
-#include "access/tableamapi.h"
+#include "access/tableam.h"
 #include "catalog/pg_am.h"
 #include "catalog/pg_proc.h"
 #include "utils/syscache.h"
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 14ed2aa393..34f815c28f 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -21,7 +21,7 @@
 #include "access/multixact.h"
 #include "access/relscan.h"
 #include "access/rewriteheap.h"
-#include "access/tableamapi.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/tuptoaster.h"
 #include "access/xact.h"
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 542210b29f..80b604821b 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -107,6 +107,7 @@
 #include "postgres.h"
 
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/index.h"
 #include "executor/executor.h"
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 566dabaa00..b5d02983c5 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -16,6 +16,7 @@
 
 #include "access/hash.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "access/tsmapi.h"
 #include "executor/executor.h"
 #include "executor/nodeSamplescan.h"
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index f3cd64cf62..8fe8312f29 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -21,7 +21,7 @@
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/nbtree.h"
-#include "access/tableamapi.h"
+#include "access/tableam.h"
 #include "access/sysattr.h"
 #include "access/transam.h"
 #include "access/xlog.h"
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 4455b42875..7142a54ce9 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -69,6 +69,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/reloptions.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
 #include "catalog/dependency.h"
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index e8390311d0..2960e21340 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -188,6 +188,7 @@
 #include "access/htup_details.h"
 #include "access/slru.h"
 #include "access/subtrans.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/twophase_rmgr.h"
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index a661f4b047..254041cea7 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -31,6 +31,7 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/sysattr.h"
 #include "access/xact.h"
 #include "catalog/pg_collation.h"
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 203b83ad06..7fcf077426 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -103,6 +103,7 @@
 #include "access/brin.h"
 #include "access/gin.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/sysattr.h"
 #include "catalog/index.h"
 #include "catalog/pg_am.h"
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 6360371493..ece332bd44 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -36,7 +36,7 @@
 #include "access/nbtree.h"
 #include "access/reloptions.h"
 #include "access/sysattr.h"
-#include "access/tableamapi.h"
+#include "access/tableam.h"
 #include "access/tupdesc_details.h"
 #include "access/xact.h"
 #include "access/xlog.h"
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index e9d8eed541..97208d4c44 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -16,7 +16,6 @@
 
 #include "access/genam.h"
 #include "access/heapam.h"
-#include "access/tableam.h"
 #include "access/htup_details.h"
 #include "access/itup.h"
 #include "access/tupdesc.h"
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index fd05018ee8..bf675ff881 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -14,10 +14,18 @@
 #ifndef TABLEAM_H
 #define TABLEAM_H
 
+#include "postgres.h"
+
 #include "access/heapam.h"
+#include "access/relscan.h"
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
+#include "nodes/nodes.h"
+#include "fmgr.h"
 #include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/snapshot.h"
+#include "utils/tqual.h"
 
 #define DEFAULT_TABLE_ACCESS_METHOD	"heap_tableam"
 
@@ -37,103 +45,661 @@ typedef enum tuple_data_flags
 	CTID
 }			tuple_data_flags;
 
-extern TupleTableSlot* table_gimmegimmeslot(Relation relation, List **reglist);
-extern TableScanDesc table_beginscan_parallel(Relation relation, ParallelHeapScanDesc parallel_scan);
-extern ParallelHeapScanDesc tableam_get_parallelheapscandesc(TableScanDesc sscan);
-extern HeapPageScanDesc tableam_get_heappagescandesc(TableScanDesc sscan);
-extern void table_syncscan_report_location(Relation rel, BlockNumber location);
-extern void table_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks);
-extern TableScanDesc table_beginscan(Relation relation, Snapshot snapshot,
-				  int nkeys, ScanKey key);
-extern TableScanDesc table_beginscan_catalog(Relation relation, int nkeys, ScanKey key);
-extern TableScanDesc table_beginscan_strat(Relation relation, Snapshot snapshot,
-						int nkeys, ScanKey key,
-						bool allow_strat, bool allow_sync);
-extern TableScanDesc table_beginscan_bm(Relation relation, Snapshot snapshot,
-					 int nkeys, ScanKey key);
-extern TableScanDesc table_beginscan_sampling(Relation relation, Snapshot snapshot,
-						   int nkeys, ScanKey key,
-						   bool allow_strat, bool allow_sync, bool allow_pagemode);
 
-extern struct IndexFetchTableData* table_begin_index_fetch_table(Relation rel);
-extern void table_reset_index_fetch_table(struct IndexFetchTableData* scan);
-extern void table_end_index_fetch_table(struct IndexFetchTableData* scan);
+/*
+ * Storage routine function hooks
+ */
+typedef bool (*SnapshotSatisfies_function) (TupleTableSlot *slot, Snapshot snapshot);
+typedef HTSU_Result (*SnapshotSatisfiesUpdate_function) (TupleTableSlot *slot, CommandId curcid);
+typedef HTSV_Result (*SnapshotSatisfiesVacuum_function) (TupleTableSlot *slot, TransactionId OldestXmin);
 
-extern void table_endscan(TableScanDesc scan);
-extern void table_rescan(TableScanDesc scan, ScanKey key);
-extern void table_rescan_set_params(TableScanDesc scan, ScanKey key,
-						  bool allow_strat, bool allow_sync, bool allow_pagemode);
-extern void table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot);
+typedef Oid (*TupleInsert_function) (Relation rel, TupleTableSlot *slot, CommandId cid,
+									 int options, BulkInsertState bistate);
 
-extern TupleTableSlot *table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot);
-extern bool table_tuple_fetch_from_offset(TableScanDesc sscan, BlockNumber blkno, OffsetNumber offset, TupleTableSlot *slot);
+typedef Oid (*TupleInsertSpeculative_function) (Relation rel,
+												 TupleTableSlot *slot,
+												 CommandId cid,
+												 int options,
+												 BulkInsertState bistate,
+												 uint32 specToken);
 
-extern void storage_get_latest_tid(Relation relation,
-					   Snapshot snapshot,
-					   ItemPointer tid);
-
-extern bool table_fetch_row_version(Relation relation,
-			  ItemPointer tid,
-			  Snapshot snapshot,
-			  TupleTableSlot *slot,
-			  Relation stats_relation);
-
-extern bool table_fetch_follow(struct IndexFetchTableData *scan,
-							   ItemPointer tid,
-							   Snapshot snapshot,
-							   TupleTableSlot *slot,
-							   bool *call_again, bool *all_dead);
-
-extern bool table_fetch_follow_check(Relation rel,
-									 ItemPointer tid,
-									 Snapshot snapshot,
-									 bool *all_dead);
-
-extern HTSU_Result table_lock_tuple(Relation relation, ItemPointer tid, Snapshot snapshot,
-				   TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
-				   LockWaitPolicy wait_policy, uint8 flags,
-				   HeapUpdateFailureData *hufd);
-
-extern Oid table_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
-			   int options, BulkInsertState bistate);
-extern Oid table_insert_speculative(Relation relation, TupleTableSlot *slot, CommandId cid,
-									int options, BulkInsertState bistate, uint32 specToken);
-extern void table_complete_speculative(Relation relation, TupleTableSlot *slot, uint32 specToken,
-									   bool succeeded);
-
-extern HTSU_Result table_delete(Relation relation, ItemPointer tid, CommandId cid,
-			   Snapshot crosscheck, bool wait, HeapUpdateFailureData *hufd,
-			   bool changingPart);
-
-extern HTSU_Result table_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
-			   CommandId cid, Snapshot crosscheck, bool wait,
-			   HeapUpdateFailureData *hufd, LockTupleMode *lockmode,
-			   bool *upddate_indexes);
-
-extern void table_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
-					 CommandId cid, int options, BulkInsertState bistate);
-
-extern tuple_data table_tuple_get_data(Relation relation, TupleTableSlot *slot, tuple_data_flags flags);
-
-extern void table_get_latest_tid(Relation relation,
-					   Snapshot snapshot,
-					   ItemPointer tid);
 
-extern void table_sync(Relation rel);
+typedef void (*TupleCompleteSpeculative_function) (Relation rel,
+												  TupleTableSlot *slot,
+												  uint32 specToken,
+												  bool succeeded);
+
+typedef HTSU_Result (*TupleDelete_function) (Relation relation,
+											 ItemPointer tid,
+											 CommandId cid,
+											 Snapshot crosscheck,
+											 bool wait,
+											 HeapUpdateFailureData *hufd,
+											 bool changingPart);
+
+typedef HTSU_Result (*TupleUpdate_function) (Relation relation,
+											 ItemPointer otid,
+											 TupleTableSlot *slot,
+											 CommandId cid,
+											 Snapshot crosscheck,
+											 bool wait,
+											 HeapUpdateFailureData *hufd,
+											 LockTupleMode *lockmode,
+											 bool *update_indexes);
+
+typedef bool (*TupleFetchRowVersion_function) (Relation relation,
+											   ItemPointer tid,
+											   Snapshot snapshot,
+											   TupleTableSlot *slot,
+											   Relation stats_relation);
+
+typedef HTSU_Result (*TupleLock_function) (Relation relation,
+										   ItemPointer tid,
+										   Snapshot snapshot,
+										   TupleTableSlot *slot,
+										   CommandId cid,
+										   LockTupleMode mode,
+										   LockWaitPolicy wait_policy,
+										   uint8 flags,
+										   HeapUpdateFailureData *hufd);
+
+typedef void (*MultiInsert_function) (Relation relation, HeapTuple *tuples, int ntuples,
+									  CommandId cid, int options, BulkInsertState bistate);
+
+typedef void (*TupleGetLatestTid_function) (Relation relation,
+											Snapshot snapshot,
+											ItemPointer tid);
+
+typedef tuple_data(*GetTupleData_function) (TupleTableSlot *slot, tuple_data_flags flags);
+
 struct VacuumParams;
-extern void table_vacuum_rel(Relation onerel, int options,
+typedef void (*RelationVacuum_function)(Relation onerel, int options,
 				struct VacuumParams *params, BufferAccessStrategy bstrategy);
 
-extern BulkInsertState table_getbulkinsertstate(Relation rel);
-extern void table_freebulkinsertstate(Relation rel, BulkInsertState bistate);
-extern void table_releasebulkinsertstate(Relation rel, BulkInsertState bistate);
+typedef void (*RelationSync_function) (Relation relation);
 
-extern RewriteState table_begin_rewrite(Relation OldHeap, Relation NewHeap,
+typedef BulkInsertState (*GetBulkInsertState_function) (void);
+typedef void (*FreeBulkInsertState_function) (BulkInsertState bistate);
+typedef void (*ReleaseBulkInsertState_function) (BulkInsertState bistate);
+
+typedef RewriteState (*BeginHeapRewrite_function) (Relation OldHeap, Relation NewHeap,
 				   TransactionId OldestXmin, TransactionId FreezeXid,
 				   MultiXactId MultiXactCutoff, bool use_wal);
-extern void table_end_rewrite(Relation rel, RewriteState state);
-extern void table_rewrite_tuple(Relation rel, RewriteState state, HeapTuple oldTuple,
+typedef void (*EndHeapRewrite_function) (RewriteState state);
+typedef void (*RewriteHeapTuple_function) (RewriteState state, HeapTuple oldTuple,
 				   HeapTuple newTuple);
-extern bool table_rewrite_dead_tuple(Relation rel, RewriteState state, HeapTuple oldTuple);
+typedef bool (*RewriteHeapDeadTuple_function) (RewriteState state, HeapTuple oldTuple);
+
+typedef TupleTableSlot* (*Slot_function) (Relation relation);
+
+typedef TableScanDesc (*ScanBegin_function) (Relation relation,
+											Snapshot snapshot,
+											int nkeys, ScanKey key,
+											ParallelHeapScanDesc parallel_scan,
+											bool allow_strat,
+											bool allow_sync,
+											bool allow_pagemode,
+											bool is_bitmapscan,
+											bool is_samplescan,
+											bool temp_snap);
+
+typedef struct IndexFetchTableData* (*BeginIndexFetchTable_function) (Relation relation);
+typedef void (*ResetIndexFetchTable_function) (struct IndexFetchTableData* data);
+typedef void (*EndIndexFetchTable_function) (struct IndexFetchTableData* data);
+
+typedef ParallelHeapScanDesc (*ScanGetParallelheapscandesc_function) (TableScanDesc scan);
+typedef HeapPageScanDesc(*ScanGetHeappagescandesc_function) (TableScanDesc scan);
+typedef void (*SyncScanReportLocation_function) (Relation rel, BlockNumber location);
+typedef void (*ScanSetlimits_function) (TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks);
+
+typedef TupleTableSlot *(*ScanGetnextSlot_function) (TableScanDesc scan,
+													 ScanDirection direction, TupleTableSlot *slot);
+
+typedef bool (*ScanFetchTupleFromOffset_function) (TableScanDesc scan,
+												   BlockNumber blkno, OffsetNumber offset, TupleTableSlot *slot);
+
+typedef void (*ScanEnd_function) (TableScanDesc scan);
+
+
+typedef void (*ScanGetpage_function) (TableScanDesc scan, BlockNumber page);
+typedef void (*ScanRescan_function) (TableScanDesc scan, ScanKey key, bool set_params,
+									 bool allow_strat, bool allow_sync, bool allow_pagemode);
+typedef void (*ScanUpdateSnapshot_function) (TableScanDesc scan, Snapshot snapshot);
+
+typedef bool (*TupleFetchFollow_function)(struct IndexFetchTableData *scan,
+										  ItemPointer tid,
+										  Snapshot snapshot,
+										  TupleTableSlot *slot,
+										  bool *call_again, bool *all_dead);
+
+/*
+ * API struct for a table AM.  Note this must be stored in a single palloc'd
+ * chunk of memory.
+ */
+typedef struct TableAmRoutine
+{
+	NodeTag		type;
+
+	Slot_function gimmegimmeslot;
+
+	SnapshotSatisfies_function snapshot_satisfies;
+	SnapshotSatisfiesUpdate_function snapshot_satisfiesUpdate;
+	SnapshotSatisfiesVacuum_function snapshot_satisfiesVacuum;
+
+	/* Operations on physical tuples */
+	TupleInsert_function tuple_insert;
+	TupleInsertSpeculative_function tuple_insert_speculative;
+	TupleCompleteSpeculative_function tuple_complete_speculative;
+	TupleUpdate_function tuple_update;
+	TupleDelete_function tuple_delete;
+	TupleFetchRowVersion_function tuple_fetch_row_version;
+	TupleLock_function tuple_lock;
+	MultiInsert_function multi_insert;
+	TupleGetLatestTid_function tuple_get_latest_tid;
+	TupleFetchFollow_function tuple_fetch_follow;
+
+	GetTupleData_function get_tuple_data;
+
+	RelationVacuum_function relation_vacuum;
+	RelationSync_function relation_sync;
+
+	GetBulkInsertState_function getbulkinsertstate;
+	FreeBulkInsertState_function freebulkinsertstate;
+	ReleaseBulkInsertState_function releasebulkinsertstate;
+
+	BeginHeapRewrite_function begin_heap_rewrite;
+	EndHeapRewrite_function end_heap_rewrite;
+	RewriteHeapTuple_function rewrite_heap_tuple;
+	RewriteHeapDeadTuple_function rewrite_heap_dead_tuple;
+
+	/* Operations on relation scans */
+	ScanBegin_function scan_begin;
+	ScanGetParallelheapscandesc_function scan_get_parallelheapscandesc;
+	ScanGetHeappagescandesc_function scan_get_heappagescandesc;
+	SyncScanReportLocation_function sync_scan_report_location;
+	ScanSetlimits_function scansetlimits;
+	ScanGetnextSlot_function scan_getnextslot;
+	ScanFetchTupleFromOffset_function scan_fetch_tuple_from_offset;
+	ScanEnd_function scan_end;
+	ScanGetpage_function scan_getpage;
+	ScanRescan_function scan_rescan;
+	ScanUpdateSnapshot_function scan_update_snapshot;
+
+	BeginIndexFetchTable_function begin_index_fetch;
+	EndIndexFetchTable_function reset_index_fetch;
+	EndIndexFetchTable_function end_index_fetch;
+
+}			TableAmRoutine;
+
+/*
+ * INLINE functions
+ */
+static inline TupleTableSlot*
+table_gimmegimmeslot(Relation relation, List **reglist)
+{
+	TupleTableSlot *slot;
+
+	slot = relation->rd_tableamroutine->gimmegimmeslot(relation);
+
+	if (reglist)
+		*reglist = lappend(*reglist, slot);
+
+	return slot;
+}
+
+/*
+ *	table_fetch_row_version		- retrieve tuple with given tid
+ *
+ *  XXX: This shouldn't just take a tid, but tid + additional information
+ */
+static inline bool
+table_fetch_row_version(Relation r,
+						ItemPointer tid,
+						Snapshot snapshot,
+						TupleTableSlot *slot,
+						Relation stats_relation)
+{
+	return r->rd_tableamroutine->tuple_fetch_row_version(r, tid,
+														 snapshot, slot,
+														 stats_relation);
+}
+
+
+/*
+ *	table_lock_tuple - lock a tuple in shared or exclusive mode
+ *
+ *  XXX: This shouldn't just take a tid, but tid + additional information
+ */
+static inline HTSU_Result
+table_lock_tuple(Relation relation, ItemPointer tid, Snapshot snapshot,
+				 TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
+				 LockWaitPolicy wait_policy, uint8 flags,
+				 HeapUpdateFailureData *hufd)
+{
+	return relation->rd_tableamroutine->tuple_lock(relation, tid, snapshot, slot,
+												cid, mode, wait_policy,
+												flags, hufd);
+}
+
+/* ----------------
+ *		table_beginscan_parallel - join a parallel scan
+ *
+ *		Caller must hold a suitable lock on the correct relation.
+ * ----------------
+ */
+static inline TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelHeapScanDesc parallel_scan)
+{
+	Snapshot	snapshot;
+
+	Assert(RelationGetRelid(relation) == parallel_scan->phs_relid);
+
+	if (!parallel_scan->phs_snapshot_any)
+	{
+		/* Snapshot was serialized -- restore it */
+		snapshot = RestoreSnapshot(parallel_scan->phs_snapshot_data);
+		RegisterSnapshot(snapshot);
+	}
+	else
+	{
+		/* SnapshotAny passed by caller (not serialized) */
+		snapshot = SnapshotAny;
+	}
+
+	return relation->rd_tableamroutine->scan_begin(relation, snapshot, 0, NULL, parallel_scan,
+												true, true, true, false, false, !parallel_scan->phs_snapshot_any);
+}
+
+static inline ParallelHeapScanDesc
+tableam_get_parallelheapscandesc(TableScanDesc sscan)
+{
+	return sscan->rs_rd->rd_tableamroutine->scan_get_parallelheapscandesc(sscan);
+}
+
+static inline HeapPageScanDesc
+tableam_get_heappagescandesc(TableScanDesc sscan)
+{
+	/*
+	 * The planner should already have validated that the current storage
+	 * supports page scans; this function is only called from bitmap heap
+	 * scans and sample scans.
+	 */
+	Assert(sscan->rs_rd->rd_tableamroutine->scan_get_heappagescandesc != NULL);
+
+	return sscan->rs_rd->rd_tableamroutine->scan_get_heappagescandesc(sscan);
+}
+
+static inline void
+table_syncscan_report_location(Relation rel, BlockNumber location)
+{
+	rel->rd_tableamroutine->sync_scan_report_location(rel, location);
+}
+
+/*
+ * table_setscanlimits - restrict range of a table scan
+ *
+ * startBlk is the page to start at
+ * numBlks is number of pages to scan (InvalidBlockNumber means "all")
+ */
+static inline void
+table_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks)
+{
+	sscan->rs_rd->rd_tableamroutine->scansetlimits(sscan, startBlk, numBlks);
+}
+
+
+/* ----------------
+ *		table_beginscan	- begin relation scan
+ *
+ * table_beginscan is the "standard" case.
+ *
+ * table_beginscan_catalog differs in setting up its own temporary snapshot.
+ *
+ * table_beginscan_strat offers an extended API that lets the caller control
+ * whether a nondefault buffer access strategy can be used, and whether
+ * syncscan can be chosen (possibly resulting in the scan not starting from
+ * block zero).  Both of these default to true with plain table_beginscan.
+ *
+ * table_beginscan_bm is an alternative entry point for setting up a
+ * TableScanDesc for a bitmap heap scan.  Although that scan technology is
+ * really quite unlike a standard seqscan, there is just enough commonality
+ * to make it worth using the same data structure.
+ *
+ * table_beginscan_sampling is an alternative entry point for setting up a
+ * TableScanDesc for a TABLESAMPLE scan.  As with bitmap scans, it's worth
+ * using the same data structure although the behavior is rather different.
+ * In addition to the options offered by table_beginscan_strat, this call
+ * also allows control of whether page-mode visibility checking is used.
+ * ----------------
+ */
+static inline TableScanDesc
+table_beginscan(Relation relation, Snapshot snapshot,
+				  int nkeys, ScanKey key)
+{
+	return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
+												true, true, true, false, false, false);
+}
+
+static inline TableScanDesc
+table_beginscan_catalog(Relation relation, int nkeys, ScanKey key)
+{
+	Oid			relid = RelationGetRelid(relation);
+	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
+
+	return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
+												true, true, true, false, false, true);
+}
+
+static inline TableScanDesc
+table_beginscan_strat(Relation relation, Snapshot snapshot,
+						int nkeys, ScanKey key,
+						bool allow_strat, bool allow_sync)
+{
+	return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
+												allow_strat, allow_sync, true,
+												false, false, false);
+}
+
+static inline TableScanDesc
+table_beginscan_bm(Relation relation, Snapshot snapshot,
+					 int nkeys, ScanKey key)
+{
+	return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
+												false, false, true, true, false, false);
+}
+
+static inline TableScanDesc
+table_beginscan_sampling(Relation relation, Snapshot snapshot,
+						   int nkeys, ScanKey key,
+						   bool allow_strat, bool allow_sync, bool allow_pagemode)
+{
+	return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
+												allow_strat, allow_sync, allow_pagemode,
+												false, true, false);
+}
+
+/* ----------------
+ *		table_rescan		- restart a relation scan
+ * ----------------
+ */
+static inline void
+table_rescan(TableScanDesc scan,
+			   ScanKey key)
+{
+	scan->rs_rd->rd_tableamroutine->scan_rescan(scan, key, false, false, false, false);
+}
+
+/* ----------------
+ *		table_rescan_set_params	- restart a relation scan after changing params
+ *
+ * This call allows changing the buffer strategy, syncscan, and pagemode
+ * options before starting a fresh scan.  Note that although the actual use
+ * of syncscan might change (effectively, enabling or disabling reporting),
+ * the previously selected startblock will be kept.
+ * ----------------
+ */
+static inline void
+table_rescan_set_params(TableScanDesc scan, ScanKey key,
+						  bool allow_strat, bool allow_sync, bool allow_pagemode)
+{
+	scan->rs_rd->rd_tableamroutine->scan_rescan(scan, key, true,
+											 allow_strat, allow_sync, (allow_pagemode && IsMVCCSnapshot(scan->rs_snapshot)));
+}
+
+/* ----------------
+ *		table_endscan	- end relation scan
+ *
+ *		See how to integrate with index scans.
+ *		Check handling of reldesc caching.
+ * ----------------
+ */
+static inline void
+table_endscan(TableScanDesc scan)
+{
+	scan->rs_rd->rd_tableamroutine->scan_end(scan);
+}
+
+
+/* ----------------
+ *		table_scan_update_snapshot
+ *
+ *		Update snapshot info in a table scan descriptor.
+ * ----------------
+ */
+static inline void
+table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot)
+{
+	scan->rs_rd->rd_tableamroutine->scan_update_snapshot(scan, snapshot);
+}
+
+static inline TupleTableSlot *
+table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot)
+{
+	return sscan->rs_rd->rd_tableamroutine->scan_getnextslot(sscan, direction, slot);
+}
+
+static inline bool
+table_tuple_fetch_from_offset(TableScanDesc sscan, BlockNumber blkno, OffsetNumber offset, TupleTableSlot *slot)
+{
+	return sscan->rs_rd->rd_tableamroutine->scan_fetch_tuple_from_offset(sscan, blkno, offset, slot);
+}
+
+
+static inline IndexFetchTableData*
+table_begin_index_fetch_table(Relation rel)
+{
+	return rel->rd_tableamroutine->begin_index_fetch(rel);
+}
+
+static inline void
+table_reset_index_fetch_table(struct IndexFetchTableData* scan)
+{
+	scan->rel->rd_tableamroutine->reset_index_fetch(scan);
+}
+
+static inline void
+table_end_index_fetch_table(struct IndexFetchTableData* scan)
+{
+	scan->rel->rd_tableamroutine->end_index_fetch(scan);
+}
+
+/*
+ * Insert a tuple from a slot into a table, via the table AM routine
+ */
+static inline Oid
+table_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
+			   int options, BulkInsertState bistate)
+{
+	return relation->rd_tableamroutine->tuple_insert(relation, slot, cid, options,
+													 bistate);
+}
+
+static inline Oid
+table_insert_speculative(Relation relation, TupleTableSlot *slot, CommandId cid,
+						 int options, BulkInsertState bistate, uint32 specToken)
+{
+	return relation->rd_tableamroutine->tuple_insert_speculative(relation, slot, cid, options,
+																 bistate, specToken);
+}
+
+static inline void
+table_complete_speculative(Relation relation, TupleTableSlot *slot, uint32 specToken,
+								bool succeeded)
+{
+	relation->rd_tableamroutine->tuple_complete_speculative(relation, slot, specToken, succeeded);
+}
+
+/*
+ * Delete the tuple identified by tid, via the table AM routine
+ */
+static inline HTSU_Result
+table_delete(Relation relation, ItemPointer tid, CommandId cid,
+			 Snapshot crosscheck, bool wait,
+			 HeapUpdateFailureData *hufd, bool changingPart)
+{
+	return relation->rd_tableamroutine->tuple_delete(relation, tid, cid,
+												  crosscheck, wait, hufd, changingPart);
+}
+
+/*
+ * Update the tuple identified by otid, via the table AM routine
+ */
+static inline HTSU_Result
+table_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
+			 CommandId cid, Snapshot crosscheck, bool wait,
+			 HeapUpdateFailureData *hufd, LockTupleMode *lockmode,
+			 bool *update_indexes)
+{
+	return relation->rd_tableamroutine->tuple_update(relation, otid, slot,
+												  cid, crosscheck, wait, hufd,
+												  lockmode, update_indexes);
+}
+
+static inline bool
+table_fetch_follow(struct IndexFetchTableData *scan,
+				   ItemPointer tid,
+				   Snapshot snapshot,
+				   TupleTableSlot *slot,
+				   bool *call_again, bool *all_dead)
+{
+	return scan->rel->rd_tableamroutine->tuple_fetch_follow(scan, tid, snapshot,
+														   slot, call_again,
+														   all_dead);
+}
+
+static inline bool
+table_fetch_follow_check(Relation rel,
+						 ItemPointer tid,
+						 Snapshot snapshot,
+						 bool *all_dead)
+{
+	IndexFetchTableData *scan = table_begin_index_fetch_table(rel);
+	TupleTableSlot *slot = table_gimmegimmeslot(rel, NULL);
+	bool call_again = false;
+	bool found;
+
+	found = table_fetch_follow(scan, tid, snapshot, slot, &call_again, all_dead);
+
+	table_end_index_fetch_table(scan);
+	ExecDropSingleTupleTableSlot(slot);
+
+	return found;
+}
+
+/*
+ *	table_multi_insert	- insert multiple tuples into a table
+ */
+static inline void
+table_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
+					 CommandId cid, int options, BulkInsertState bistate)
+{
+	relation->rd_tableamroutine->multi_insert(relation, tuples, ntuples,
+										   cid, options, bistate);
+}
+
+static inline tuple_data
+table_tuple_get_data(Relation relation, TupleTableSlot *slot, tuple_data_flags flags)
+{
+	return relation->rd_tableamroutine->get_tuple_data(slot, flags);
+}
+
+static inline void
+table_get_latest_tid(Relation relation,
+					   Snapshot snapshot,
+					   ItemPointer tid)
+{
+	relation->rd_tableamroutine->tuple_get_latest_tid(relation, snapshot, tid);
+}
+
+
+static inline void
+table_vacuum_rel(Relation rel, int options,
+			 struct VacuumParams *params, BufferAccessStrategy bstrategy)
+{
+	rel->rd_tableamroutine->relation_vacuum(rel, options, params, bstrategy);
+}
+
+/*
+ *	table_sync		- sync a table, for use when no WAL has been written
+ */
+static inline void
+table_sync(Relation rel)
+{
+	rel->rd_tableamroutine->relation_sync(rel);
+}
+
+/*
+ * -------------------
+ * storage Bulk Insert functions
+ * -------------------
+ */
+static inline BulkInsertState
+table_getbulkinsertstate(Relation rel)
+{
+	return rel->rd_tableamroutine->getbulkinsertstate();
+}
+
+static inline void
+table_freebulkinsertstate(Relation rel, BulkInsertState bistate)
+{
+	rel->rd_tableamroutine->freebulkinsertstate(bistate);
+}
+
+static inline void
+table_releasebulkinsertstate(Relation rel, BulkInsertState bistate)
+{
+	rel->rd_tableamroutine->releasebulkinsertstate(bistate);
+}
+
+/*
+ * -------------------
+ * storage tuple rewrite functions
+ * -------------------
+ */
+static inline RewriteState
+table_begin_rewrite(Relation OldHeap, Relation NewHeap,
+				   TransactionId OldestXmin, TransactionId FreezeXid,
+				   MultiXactId MultiXactCutoff, bool use_wal)
+{
+	return NewHeap->rd_tableamroutine->begin_heap_rewrite(OldHeap, NewHeap,
+			OldestXmin, FreezeXid, MultiXactCutoff, use_wal);
+}
+
+static inline void
+table_end_rewrite(Relation rel, RewriteState state)
+{
+	rel->rd_tableamroutine->end_heap_rewrite(state);
+}
+
+static inline void
+table_rewrite_tuple(Relation rel, RewriteState state, HeapTuple oldTuple,
+				   HeapTuple newTuple)
+{
+	rel->rd_tableamroutine->rewrite_heap_tuple(state, oldTuple, newTuple);
+}
+
+static inline bool
+table_rewrite_dead_tuple(Relation rel, RewriteState state, HeapTuple oldTuple)
+{
+	return rel->rd_tableamroutine->rewrite_heap_dead_tuple(state, oldTuple);
+}
+
+/*
+ * HeapTupleSatisfiesVisibility
+ *		True iff heap tuple satisfies a time qual.
+ *
+ * Notes:
+ *	Assumes heap tuple is valid.
+ *	Beware of multiple evaluations of snapshot argument.
+ *	Hint bits in the HeapTuple's t_infomask may be updated as a side effect;
+ *	if so, the tuple's underlying buffer is marked dirty.
+ */
+#define HeapTupleSatisfiesVisibility(method, slot, snapshot) \
+	(((method)->snapshot_satisfies) (slot, snapshot))
+
+extern TableAmRoutine * GetTableAmRoutine(Oid amhandler);
+extern TableAmRoutine * GetTableAmRoutineByAmId(Oid amoid);
+extern TableAmRoutine * GetHeapamTableAmRoutine(void);
 
 #endif		/* TABLEAM_H */
diff --git a/src/include/access/tableamapi.h b/src/include/access/tableamapi.h
deleted file mode 100644
index a4a6e7fd23..0000000000
--- a/src/include/access/tableamapi.h
+++ /dev/null
@@ -1,212 +0,0 @@
-/*---------------------------------------------------------------------
- *
- * tableamapi.h
- *		API for Postgres table access methods
- *
- * Copyright (c) 2017, PostgreSQL Global Development Group
- *
- * src/include/access/tableamapi.h
- *---------------------------------------------------------------------
- */
-#ifndef TABLEEAMAPI_H
-#define TABLEEAMAPI_H
-
-#include "access/heapam.h"
-#include "access/tableam.h"
-#include "nodes/execnodes.h"
-#include "nodes/nodes.h"
-#include "fmgr.h"
-#include "utils/snapshot.h"
-
-struct IndexFetchTableData;
-
-/*
- * Storage routine function hooks
- */
-typedef bool (*SnapshotSatisfies_function) (TupleTableSlot *slot, Snapshot snapshot);
-typedef HTSU_Result (*SnapshotSatisfiesUpdate_function) (TupleTableSlot *slot, CommandId curcid);
-typedef HTSV_Result (*SnapshotSatisfiesVacuum_function) (TupleTableSlot *slot, TransactionId OldestXmin);
-
-typedef Oid (*TupleInsert_function) (Relation rel, TupleTableSlot *slot, CommandId cid,
-									 int options, BulkInsertState bistate);
-
-typedef Oid (*TupleInsertSpeculative_function) (Relation rel,
-												 TupleTableSlot *slot,
-												 CommandId cid,
-												 int options,
-												 BulkInsertState bistate,
-												 uint32 specToken);
-
-
-typedef void (*TupleCompleteSpeculative_function) (Relation rel,
-												  TupleTableSlot *slot,
-												  uint32 specToken,
-												  bool succeeded);
-
-typedef HTSU_Result (*TupleDelete_function) (Relation relation,
-											 ItemPointer tid,
-											 CommandId cid,
-											 Snapshot crosscheck,
-											 bool wait,
-											 HeapUpdateFailureData *hufd,
-											 bool changingPart);
-
-typedef HTSU_Result (*TupleUpdate_function) (Relation relation,
-											 ItemPointer otid,
-											 TupleTableSlot *slot,
-											 CommandId cid,
-											 Snapshot crosscheck,
-											 bool wait,
-											 HeapUpdateFailureData *hufd,
-											 LockTupleMode *lockmode,
-											 bool *update_indexes);
-
-typedef bool (*TupleFetchRowVersion_function) (Relation relation,
-											   ItemPointer tid,
-											   Snapshot snapshot,
-											   TupleTableSlot *slot,
-											   Relation stats_relation);
-
-typedef HTSU_Result (*TupleLock_function) (Relation relation,
-										   ItemPointer tid,
-										   Snapshot snapshot,
-										   TupleTableSlot *slot,
-										   CommandId cid,
-										   LockTupleMode mode,
-										   LockWaitPolicy wait_policy,
-										   uint8 flags,
-										   HeapUpdateFailureData *hufd);
-
-typedef void (*MultiInsert_function) (Relation relation, HeapTuple *tuples, int ntuples,
-									  CommandId cid, int options, BulkInsertState bistate);
-
-typedef void (*TupleGetLatestTid_function) (Relation relation,
-											Snapshot snapshot,
-											ItemPointer tid);
-
-typedef tuple_data(*GetTupleData_function) (TupleTableSlot *slot, tuple_data_flags flags);
-
-struct VacuumParams;
-typedef void (*RelationVacuum_function)(Relation onerel, int options,
-				struct VacuumParams *params, BufferAccessStrategy bstrategy);
-
-typedef void (*RelationSync_function) (Relation relation);
-
-typedef BulkInsertState (*GetBulkInsertState_function) (void);
-typedef void (*FreeBulkInsertState_function) (BulkInsertState bistate);
-typedef void (*ReleaseBulkInsertState_function) (BulkInsertState bistate);
-
-typedef RewriteState (*BeginHeapRewrite_function) (Relation OldHeap, Relation NewHeap,
-				   TransactionId OldestXmin, TransactionId FreezeXid,
-				   MultiXactId MultiXactCutoff, bool use_wal);
-typedef void (*EndHeapRewrite_function) (RewriteState state);
-typedef void (*RewriteHeapTuple_function) (RewriteState state, HeapTuple oldTuple,
-				   HeapTuple newTuple);
-typedef bool (*RewriteHeapDeadTuple_function) (RewriteState state, HeapTuple oldTuple);
-
-typedef TupleTableSlot* (*Slot_function) (Relation relation);
-
-typedef TableScanDesc (*ScanBegin_function) (Relation relation,
-											Snapshot snapshot,
-											int nkeys, ScanKey key,
-											ParallelHeapScanDesc parallel_scan,
-											bool allow_strat,
-											bool allow_sync,
-											bool allow_pagemode,
-											bool is_bitmapscan,
-											bool is_samplescan,
-											bool temp_snap);
-
-typedef struct IndexFetchTableData* (*BeginIndexFetchTable_function) (Relation relation);
-typedef void (*ResetIndexFetchTable_function) (struct IndexFetchTableData* data);
-typedef void (*EndIndexFetchTable_function) (struct IndexFetchTableData* data);
-
-typedef ParallelHeapScanDesc (*ScanGetParallelheapscandesc_function) (TableScanDesc scan);
-typedef HeapPageScanDesc(*ScanGetHeappagescandesc_function) (TableScanDesc scan);
-typedef void (*SyncScanReportLocation_function) (Relation rel, BlockNumber location);
-typedef void (*ScanSetlimits_function) (TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks);
-
-typedef TupleTableSlot *(*ScanGetnextSlot_function) (TableScanDesc scan,
-													 ScanDirection direction, TupleTableSlot *slot);
-
-typedef bool (*ScanFetchTupleFromOffset_function) (TableScanDesc scan,
-												   BlockNumber blkno, OffsetNumber offset, TupleTableSlot *slot);
-
-typedef void (*ScanEnd_function) (TableScanDesc scan);
-
-
-typedef void (*ScanGetpage_function) (TableScanDesc scan, BlockNumber page);
-typedef void (*ScanRescan_function) (TableScanDesc scan, ScanKey key, bool set_params,
-									 bool allow_strat, bool allow_sync, bool allow_pagemode);
-typedef void (*ScanUpdateSnapshot_function) (TableScanDesc scan, Snapshot snapshot);
-
-typedef bool (*TupleFetchFollow_function)(struct IndexFetchTableData *scan,
-										  ItemPointer tid,
-										  Snapshot snapshot,
-										  TupleTableSlot *slot,
-										  bool *call_again, bool *all_dead);
-
-/*
- * API struct for a table AM.  Note this must be stored in a single palloc'd
- * chunk of memory.
- */
-typedef struct TableAmRoutine
-{
-	NodeTag		type;
-
-	Slot_function gimmegimmeslot;
-
-	SnapshotSatisfies_function snapshot_satisfies;
-	SnapshotSatisfiesUpdate_function snapshot_satisfiesUpdate;
-	SnapshotSatisfiesVacuum_function snapshot_satisfiesVacuum;
-
-	/* Operations on physical tuples */
-	TupleInsert_function tuple_insert;
-	TupleInsertSpeculative_function tuple_insert_speculative;
-	TupleCompleteSpeculative_function tuple_complete_speculative;
-	TupleUpdate_function tuple_update;
-	TupleDelete_function tuple_delete;
-	TupleFetchRowVersion_function tuple_fetch_row_version;
-	TupleLock_function tuple_lock;
-	MultiInsert_function multi_insert;
-	TupleGetLatestTid_function tuple_get_latest_tid;
-	TupleFetchFollow_function tuple_fetch_follow;
-
-	GetTupleData_function get_tuple_data;
-
-	RelationVacuum_function relation_vacuum;
-	RelationSync_function relation_sync;
-
-	GetBulkInsertState_function getbulkinsertstate;
-	FreeBulkInsertState_function freebulkinsertstate;
-	ReleaseBulkInsertState_function releasebulkinsertstate;
-
-	BeginHeapRewrite_function begin_heap_rewrite;
-	EndHeapRewrite_function end_heap_rewrite;
-	RewriteHeapTuple_function rewrite_heap_tuple;
-	RewriteHeapDeadTuple_function rewrite_heap_dead_tuple;
-
-	/* Operations on relation scans */
-	ScanBegin_function scan_begin;
-	ScanGetParallelheapscandesc_function scan_get_parallelheapscandesc;
-	ScanGetHeappagescandesc_function scan_get_heappagescandesc;
-	SyncScanReportLocation_function sync_scan_report_location;
-	ScanSetlimits_function scansetlimits;
-	ScanGetnextSlot_function scan_getnextslot;
-	ScanFetchTupleFromOffset_function scan_fetch_tuple_from_offset;
-	ScanEnd_function scan_end;
-	ScanGetpage_function scan_getpage;
-	ScanRescan_function scan_rescan;
-	ScanUpdateSnapshot_function scan_update_snapshot;
-
-	BeginIndexFetchTable_function begin_index_fetch;
-	EndIndexFetchTable_function reset_index_fetch;
-	EndIndexFetchTable_function end_index_fetch;
-
-}			TableAmRoutine;
-
-extern TableAmRoutine * GetTableAmRoutine(Oid amhandler);
-extern TableAmRoutine * GetTableAmRoutineByAmId(Oid amoid);
-extern TableAmRoutine * GetHeapamTableAmRoutine(void);
-
-#endif							/* TABLEEAMAPI_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 945bbc3ddf..c69ca99435 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -502,7 +502,7 @@ typedef enum NodeTag
 	T_InlineCodeBlock,			/* in nodes/parsenodes.h */
 	T_FdwRoutine,				/* in foreign/fdwapi.h */
 	T_IndexAmRoutine,			/* in access/amapi.h */
-	T_TableAmRoutine,			/* in access/tableamapi.h */
+	T_TableAmRoutine,			/* in access/tableam.h */
 	T_TsmRoutine,				/* in access/tsmapi.h */
 	T_ForeignKeyCacheInfo,		/* in utils/rel.h */
 	T_CallContext				/* in nodes/parsenodes.h */
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 9739bed9e0..1fe9cc6402 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -16,10 +16,8 @@
 #define TQUAL_H
 
 #include "utils/snapshot.h"
-#include "access/tableamapi.h"
 #include "access/xlogdefs.h"
 
-
 /* Static variables representing various special snapshot semantics */
 extern PGDLLIMPORT SnapshotData SnapshotSelfData;
 extern PGDLLIMPORT SnapshotData SnapshotAnyData;
@@ -33,19 +31,6 @@ extern PGDLLIMPORT SnapshotData CatalogSnapshotData;
 	((snapshot)->visibility_type == MVCC_VISIBILITY || \
 	 (snapshot)->visibility_type == HISTORIC_MVCC_VISIBILITY)
 
-/*
- * HeapTupleSatisfiesVisibility
- *		True iff heap tuple satisfies a time qual.
- *
- * Notes:
- *	Assumes heap tuple is valid.
- *	Beware of multiple evaluations of snapshot argument.
- *	Hint bits in the HeapTuple's t_infomask may be updated as a side effect;
- *	if so, the indicated buffer is marked dirty.
- */
-#define HeapTupleSatisfiesVisibility(method, slot, snapshot) \
-	(((method)->snapshot_satisfies) (slot, snapshot))
-
 /*
  * To avoid leaking too much knowledge about reorderbuffer implementation
  * details this is implemented in reorderbuffer.c not tqual.c.
-- 
2.18.0.windows.1
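
As a minimal sketch of how a caller drives the slot-based scan API defined
in the patch above (entry-point names taken from tableam.h as patched;
error paths and snapshot management are elided, and TupIsNull() covers both
a NULL result and an empty slot at end of scan):

#include "postgres.h"
#include "access/tableam.h"
#include "executor/tuptable.h"

/*
 * Count the tuples visible to "snapshot" in "rel", going only through the
 * table AM wrappers.  A sketch, not part of the patch itself.
 */
static uint64
count_visible_tuples(Relation rel, Snapshot snapshot)
{
	TableScanDesc scan = table_beginscan(rel, snapshot, 0, NULL);
	TupleTableSlot *slot = table_gimmegimmeslot(rel, NULL);
	uint64		ntuples = 0;

	while (!TupIsNull(table_scan_getnextslot(scan, ForwardScanDirection, slot)))
		ntuples++;

	table_endscan(scan);
	ExecDropSingleTupleTableSlot(slot);

	return ntuples;
}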

#9Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#8)
2 attachment(s)
Re: Pluggable Storage - Andres's take

On Tue, Jul 24, 2018 at 11:31 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Tue, Jul 17, 2018 at 11:01 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

I added a new API in tableam.h to get all the visible tuples on a page,
to abstract the bitgetpage() function.
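
As a sketch of what such an abstraction enables on the caller side (purely
illustrative, not the patch's actual bitgetpage() replacement; in a bitmap
heap scan the offsets would come from the TID bitmap, and the scan is
assumed to have been set up with table_beginscan_bm()):

#include "postgres.h"
#include "access/tableam.h"

/*
 * Fetch each candidate offset of a page through the AM, letting it do
 * the visibility checking instead of the caller touching the page.
 * Assumes table_tuple_fetch_from_offset() returns false for absent or
 * invisible tuples, as in the patch above.
 */
static int
fetch_visible_offsets(TableScanDesc scan, BlockNumber blkno,
					  OffsetNumber *offsets, int noffsets,
					  TupleTableSlot *slot)
{
	int			i;
	int			nvisible = 0;

	for (i = 0; i < noffsets; i++)
	{
		if (table_tuple_fetch_from_offset(scan, blkno, offsets[i], slot))
			nvisible++;			/* slot now holds a visible tuple */
	}

	return nvisible;
}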

- Merge tableam.h and tableamapi.h and make most tableam.c functions
small inline functions. Having one-line tableam.c wrappers makes this
more expensive than necessary. We'll have enough trouble not regressing
performance-wise.

I merged tableam.h and tableamapi.h into tableam.h and changed all the
functions to inline. This change may have added some additional headers;
I will check whether they can be removed.
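
For illustration, the change boils down to this pattern, shown here for
table_insert() (the body is unchanged; only the linkage moves from
tableam.c into the header, so the wrapper itself can be compiled away):

/* Before: out-of-line wrapper in tableam.c -- every call pays for the
 * wrapper call plus the indirect call through the AM routine. */
Oid
table_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
			 int options, BulkInsertState bistate)
{
	return relation->rd_tableamroutine->tuple_insert(relation, slot, cid,
													 options, bistate);
}

/* After: static inline in tableam.h -- only the indirect call through
 * rd_tableamroutine remains at each call site. */
static inline Oid
table_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
			 int options, BulkInsertState bistate)
{
	return relation->rd_tableamroutine->tuple_insert(relation, slot, cid,
													 options, bistate);
}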

Attached are the updated patches on top of your github tree.

Currently I am working on the following:
- I observed that there is a crash when running the isolation tests.

While investigating the crash, I observed that it is caused by the many
FIXMEs in the code. So for now I have made only minimal fixes, and I am
looking into correcting the FIXMEs first.

One thing I observed is that a missing relation pointer leads to a crash
in the EvalPlan* code path, because not all ROW_MARK types contain a
relation pointer.

I will continue working through the FIXME fixes.
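
A guard of roughly the following shape would express that contract (a
sketch only, relying on the existing convention that only some row-mark
types carry an opened Relation; "erm" stands for the ExecRowMark being
processed inside a loop over the row marks):

	if (erm->relation == NULL)
	{
		/* ROW_MARK_COPY: no physical table access, nothing to re-fetch */
		Assert(!RowMarkRequiresRowShareLock(erm->markType));
		continue;
	}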

- COPY's multi_insert path should probably deal with a bunch of slots,
rather than forming HeapTuples

Implemented support for slots in the COPY multi-insert path.
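
On the caller side that change allows a loop of roughly this shape (a
sketch assuming the 0001 patch's heap_multi_insert() signature below;
next_input_slot(), mycid and hi_options stand in for CopyFrom()'s real
bookkeeping):

#include "postgres.h"
#include "access/heapam.h"
#include "executor/tuptable.h"

#define MAX_BUFFERED_SLOTS 1000

extern TupleTableSlot *next_input_slot(void);	/* hypothetical input source */

static void
copy_like_loop(Relation rel, CommandId mycid, int hi_options,
			   BulkInsertState bistate)
{
	TupleTableSlot *slots[MAX_BUFFERED_SLOTS];
	TupleTableSlot *slot;
	int			nbuffered = 0;

	while ((slot = next_input_slot()) != NULL)
	{
		slots[nbuffered++] = slot;
		if (nbuffered == MAX_BUFFERED_SLOTS)
		{
			/* flush one batch through the slot-based multi insert */
			heap_multi_insert(rel, slots, nbuffered, mycid,
							  hi_options, bistate);
			nbuffered = 0;
		}
	}
	if (nbuffered > 0)
		heap_multi_insert(rel, slots, nbuffered, mycid, hi_options, bistate);
}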

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0002-Isolation-test-fixes-1.patch (application/octet-stream)
From 3e4d02fd2ade9b9b116b013899b4b81b435379a8 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Fri, 3 Aug 2018 11:29:05 +1000
Subject: [PATCH 2/2] Isolation test fixes -1

---
 src/backend/access/heap/heapam_handler.c | 9 ++++++---
 src/backend/executor/execMain.c          | 5 +++--
 src/backend/executor/nodeModifyTable.c   | 6 +++++-
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index a3fe110efe..cce2123416 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -317,7 +317,8 @@ retry:
 				if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple->t_data),
 										 priorXmax))
 				{
-					ReleaseBuffer(buffer);
+					if (BufferIsValid(buffer))
+						ReleaseBuffer(buffer);
 					return HeapTupleDeleted;
 				}
 
@@ -336,7 +337,8 @@ retry:
 				if (ItemPointerEquals(&tuple->t_self, &tuple->t_data->t_ctid))
 				{
 					/* deleted, so forget about it */
-					ReleaseBuffer(buffer);
+					if (BufferIsValid(buffer))
+						ReleaseBuffer(buffer);
 					return HeapTupleDeleted;
 				}
 
@@ -344,7 +346,8 @@ retry:
 				*tid = tuple->t_data->t_ctid;
 				/* updated row should have xmin matching this xmax */
 				priorXmax = HeapTupleHeaderGetUpdateXid(tuple->t_data);
-				ReleaseBuffer(buffer);
+				if (BufferIsValid(buffer))
+					ReleaseBuffer(buffer);
 				/* loop back to fetch next in chain */
 			}
 		}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ea60e588a5..fd3e53d1ee 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2552,8 +2552,9 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
 	 * datums that may be present in copyTuple).  As with the next step, this
 	 * is to guard against early re-use of the EPQ query.
 	 */
-	if (!TupIsNull(slot))
-		ExecMaterializeSlot(slot);
+	/*if (!TupIsNull(slot))
+	 *	ExecMaterializeSlot(slot);
+	 */
 
 #if FIXME
 	/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 5ae0bab9f5..71150ad32e 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1177,7 +1177,11 @@ lreplace:;
 
 		if (result == HeapTupleUpdated && !IsolationUsesXactSnapshot())
 		{
-			TupleTableSlot *inputslot = EvalPlanQualSlot(epqstate, resultRelationDesc, resultRelInfo->ri_RangeTableIndex);
+			TupleTableSlot *inputslot;
+
+			EvalPlanQualBegin(epqstate, estate);
+
+			inputslot = EvalPlanQualSlot(epqstate, resultRelationDesc, resultRelInfo->ri_RangeTableIndex);
 
 			result = table_lock_tuple(resultRelationDesc, tupleid,
 									  estate->es_snapshot,
-- 
2.18.0.windows.1

0001-COPY-s-multi_insert-path-deal-with-bunch-of-slots.patch (application/octet-stream)
From 2bcbe0ffc5df81b38e6c6f0093eb486669c7a3b2 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Fri, 3 Aug 2018 11:07:52 +1000
Subject: [PATCH 1/2] COPY's multi_insert path deal with bunch of slots

Support passing slots instead of tuples when doing
multi inserts.
---
 src/backend/access/heap/heapam.c |  31 ++++++----
 src/backend/commands/copy.c      | 100 +++++++++++++++----------------
 src/include/access/heapam.h      |   3 +-
 src/include/access/tableam.h     |   6 +-
 4 files changed, 74 insertions(+), 66 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 40c1a5432d..7d0d1dc234 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2648,7 +2648,7 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
  * temporary context before calling this, if that's a problem.
  */
 void
-heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
+heap_multi_insert(Relation relation, TupleTableSlot **slots, int nslots,
 				  CommandId cid, int options, BulkInsertState bistate)
 {
 	TransactionId xid = GetCurrentTransactionId();
@@ -2666,11 +2666,18 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	saveFreeSpace = RelationGetTargetPageFreeSpace(relation,
 												   HEAP_DEFAULT_FILLFACTOR);
 
-	/* Toast and set header data in all the tuples */
-	heaptuples = palloc(ntuples * sizeof(HeapTuple));
-	for (i = 0; i < ntuples; i++)
-		heaptuples[i] = heap_prepare_insert(relation, tuples[i],
+	/* Toast and set header data for each slot's tuple */
+	heaptuples = palloc(nslots * sizeof(HeapTuple));
+	for (i = 0; i < nslots; i++)
+	{
+		heaptuples[i] = heap_prepare_insert(relation, ExecGetHeapTupleFromSlot(slots[i]),
 											xid, cid, options);
+		if (slots[i]->tts_tupleOid != InvalidOid)
+			HeapTupleSetOid(heaptuples[i], slots[i]->tts_tupleOid);
+
+		if (slots[i]->tts_tableOid != InvalidOid)
+			heaptuples[i]->t_tableOid = slots[i]->tts_tableOid;
+	}
 
 	/*
 	 * Allocate some memory to use for constructing the WAL record. Using
@@ -2706,7 +2713,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
 
 	ndone = 0;
-	while (ndone < ntuples)
+	while (ndone < nslots)
 	{
 		Buffer		buffer;
 		Buffer		vmbuffer = InvalidBuffer;
@@ -2732,7 +2739,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 		 * Put that on the page, and then as many other tuples as fit.
 		 */
 		RelationPutHeapTuple(relation, buffer, heaptuples[ndone], false);
-		for (nthispage = 1; ndone + nthispage < ntuples; nthispage++)
+		for (nthispage = 1; ndone + nthispage < nslots; nthispage++)
 		{
 			HeapTuple	heaptup = heaptuples[ndone + nthispage];
 
@@ -2841,7 +2848,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 			 * emitted by this call to heap_multi_insert(). Needed for logical
 			 * decoding so it knows when to cleanup temporary data.
 			 */
-			if (ndone + nthispage == ntuples)
+			if (ndone + nthispage == nslots)
 				xlrec->flags |= XLH_INSERT_LAST_IN_MULTI;
 
 			if (init)
@@ -2904,7 +2911,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	 */
 	if (IsCatalogRelation(relation))
 	{
-		for (i = 0; i < ntuples; i++)
+		for (i = 0; i < nslots; i++)
 			CacheInvalidateHeapTuple(relation, heaptuples[i], NULL);
 	}
 
@@ -2913,10 +2920,10 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	 * nothing for untoasted tuples (tuples[i] == heaptuples[i)], but it's
 	 * probably faster to always copy than check.
 	 */
-	for (i = 0; i < ntuples; i++)
-		tuples[i]->t_self = heaptuples[i]->t_self;
+	for (i = 0; i < nslots; i++)
+		slots[i]->tts_tid = heaptuples[i]->t_self;
 
-	pgstat_count_heap_insert(relation, ntuples);
+	pgstat_count_heap_insert(relation, nslots);
 }
 
 /*
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index d49734ddab..62ee8cfea7 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -306,9 +306,9 @@ static void CopyOneRowTo(CopyState cstate, Oid tupleOid,
 			 Datum *values, bool *nulls);
 static void CopyFromInsertBatch(CopyState cstate, EState *estate,
 					CommandId mycid, int hi_options,
-					ResultRelInfo *resultRelInfo, TupleTableSlot *myslot,
+					ResultRelInfo *resultRelInfo,
 					BulkInsertState bistate,
-					int nBufferedTuples, HeapTuple *bufferedTuples,
+					int nslots, TupleTableSlot **slots,
 					uint64 firstBufferedLineNo);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
@@ -2310,7 +2310,6 @@ CopyFrom(CopyState cstate)
 	EState	   *estate = CreateExecutorState(); /* for ExecConstraints() */
 	ModifyTableState *mtstate;
 	ExprContext *econtext;
-	TupleTableSlot *myslot;
 	MemoryContext oldcontext = CurrentMemoryContext;
 
 	ErrorContextCallback errcallback;
@@ -2319,12 +2318,11 @@ CopyFrom(CopyState cstate)
 	void       *bistate;
 	uint64		processed = 0;
 	bool		useHeapMultiInsert;
-	int			nBufferedTuples = 0;
+	int			nslots = 0;
 	int			prev_leaf_part_index = -1;
 
-#define MAX_BUFFERED_TUPLES 1000
-	HeapTuple  *bufferedTuples = NULL;	/* initialize to silence warning */
-	Size		bufferedTuplesSize = 0;
+#define MAX_BUFFERED_SLOTS 1000
+	TupleTableSlot  **slots = NULL;	/* initialize to silence warning */
 	uint64		firstBufferedLineNo = 0;
 
 	Assert(cstate->rel);
@@ -2467,10 +2465,6 @@ CopyFrom(CopyState cstate)
 	estate->es_result_relation_info = resultRelInfo;
 	estate->es_range_table = cstate->range_table;
 
-	/* Set up a tuple slot too */
-	myslot = ExecInitExtraTupleSlot(estate, tupDesc,
-									TTS_TYPE_HEAPTUPLE);
-
 	/*
 	 * Set up a ModifyTableState so we can let FDW(s) init themselves for
 	 * foreign-table result relation(s).
@@ -2541,7 +2535,7 @@ CopyFrom(CopyState cstate)
 	else
 	{
 		useHeapMultiInsert = true;
-		bufferedTuples = palloc(MAX_BUFFERED_TUPLES * sizeof(HeapTuple));
+		slots = palloc(MAX_BUFFERED_SLOTS * sizeof(TupleTableSlot *));
 	}
 
 	/*
@@ -2569,11 +2563,17 @@ CopyFrom(CopyState cstate)
 		TupleTableSlot *slot;
 		bool		skip_tuple;
 		Oid			loaded_oid = InvalidOid;
+		int			natts = resultRelInfo->ri_RelationDesc->rd_att->natts;
+		int			cnt;
 
 		CHECK_FOR_INTERRUPTS();
 
-		if (nBufferedTuples == 0)
+		if (nslots == 0)
 		{
+			/* Reset the tuple table slots, if any */
+			ExecResetTupleTable(estate->es_tupleTable, false);
+			estate->es_tupleTable = NIL;
+
 			/*
 			 * Reset the per-tuple exprcontext. We can only do this if the
 			 * tuple buffer is empty. (Calling the context the per-tuple
@@ -2588,25 +2588,32 @@ CopyFrom(CopyState cstate)
 		if (!NextCopyFrom(cstate, econtext, values, nulls, &loaded_oid))
 			break;
 
-		/* And now we can form the input tuple. */
-		tuple = heap_form_tuple(tupDesc, values, nulls);
+		slot = ExecInitExtraTupleSlot(estate,
+						RelationGetDescr(resultRelInfo->ri_RelationDesc),
+						useHeapMultiInsert ? TTS_TYPE_VIRTUAL : TTS_TYPE_HEAPTUPLE);
+
+		/* Directly store the values/nulls array in the slot */
+		memcpy(slot->tts_isnull, nulls, sizeof(bool) * natts);
+		for (cnt = 0; cnt < natts; cnt++)
+		{
+			if (!slot->tts_isnull[cnt])
+				slot->tts_values[cnt] = values[cnt];
+		}
+
+		ExecStoreVirtualTuple(slot);
 
 		if (loaded_oid != InvalidOid)
-			HeapTupleSetOid(tuple, loaded_oid);
+			slot->tts_tupleOid = loaded_oid;
 
 		/*
 		 * Constraints might reference the tableoid column, so initialize
 		 * t_tableOid before evaluating them.
 		 */
-		tuple->t_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+		slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 
 		/* Triggers and stuff need to be invoked in query context. */
 		MemoryContextSwitchTo(oldcontext);
 
-		/* Place tuple in tuple slot --- but slot shouldn't free it */
-		slot = myslot;
-		ExecStoreTuple(tuple, slot, InvalidBuffer, false);
-
 		/* Determine the partition to heap_insert the tuple into */
 		if (cstate->partition_tuple_routing)
 		{
@@ -2659,6 +2666,9 @@ CopyFrom(CopyState cstate)
 			 */
 			estate->es_result_relation_info = resultRelInfo;
 
+			/* FIXME: Get the HeapTuple from slot */
+			tuple = ExecGetHeapTupleFromSlot(slot);
+
 			/*
 			 * If we're capturing transition tuples, we might need to convert
 			 * from the partition rowtype to parent rowtype.
@@ -2699,6 +2709,7 @@ CopyFrom(CopyState cstate)
 											  &slot);
 
 			tuple->t_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+			slot->tts_tableOid = tuple->t_tableOid;
 		}
 
 		skip_tuple = false;
@@ -2709,8 +2720,6 @@ CopyFrom(CopyState cstate)
 		{
 			if (!ExecBRInsertTriggers(estate, resultRelInfo, slot))
 				skip_tuple = true;	/* "do nothing" */
-			else				/* trigger might have changed tuple */
-				tuple = ExecGetHeapTupleFromSlot(slot);
 		}
 
 		if (!skip_tuple)
@@ -2746,10 +2755,9 @@ CopyFrom(CopyState cstate)
 				if (useHeapMultiInsert)
 				{
 					/* Add this tuple to the tuple buffer */
-					if (nBufferedTuples == 0)
+					if (nslots == 0)
 						firstBufferedLineNo = cstate->cur_lineno;
-					bufferedTuples[nBufferedTuples++] = tuple;
-					bufferedTuplesSize += tuple->t_len;
+					slots[nslots++] = slot;
 
 					/*
 					 * If the buffer filled up, flush it.  Also flush if the
@@ -2757,15 +2765,13 @@ CopyFrom(CopyState cstate)
 					 * large, to avoid using large amounts of memory for the
 					 * buffer when the tuples are exceptionally wide.
 					 */
-					if (nBufferedTuples == MAX_BUFFERED_TUPLES ||
-						bufferedTuplesSize > 65535)
+					if (nslots == MAX_BUFFERED_SLOTS)
 					{
 						CopyFromInsertBatch(cstate, estate, mycid, hi_options,
-											resultRelInfo, myslot, bistate,
-											nBufferedTuples, bufferedTuples,
+											resultRelInfo, bistate,
+											nslots, slots,
 											firstBufferedLineNo);
-						nBufferedTuples = 0;
-						bufferedTuplesSize = 0;
+						nslots = 0;
 					}
 				}
 				else
@@ -2783,15 +2789,12 @@ CopyFrom(CopyState cstate)
 						if (slot == NULL)	/* "do nothing" */
 							goto next_tuple;
 
-						/* FDW might have changed tuple */
-						tuple = ExecGetHeapTupleFromSlot(slot);
-
 						/*
 						 * AFTER ROW Triggers might reference the tableoid
 						 * column, so initialize t_tableOid before evaluating
 						 * them.
 						 */
-						tuple->t_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+						slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
 					else
 					{
@@ -2834,10 +2837,10 @@ next_tuple:
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (nBufferedTuples > 0)
+	if (nslots > 0)
 		CopyFromInsertBatch(cstate, estate, mycid, hi_options,
-							resultRelInfo, myslot, bistate,
-							nBufferedTuples, bufferedTuples,
+							resultRelInfo, bistate,
+							nslots, slots,
 							firstBufferedLineNo);
 
 	/* Done, clean up */
@@ -2900,8 +2903,7 @@ next_tuple:
 static void
 CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 					int hi_options, ResultRelInfo *resultRelInfo,
-					TupleTableSlot *myslot, BulkInsertState bistate,
-					int nBufferedTuples, HeapTuple *bufferedTuples,
+					BulkInsertState bistate, int nslots, TupleTableSlot **slots,
 					uint64 firstBufferedLineNo)
 {
 	MemoryContext oldcontext;
@@ -2921,8 +2923,8 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 	 */
 	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
 	table_multi_insert(cstate->rel,
-					   bufferedTuples,
-					   nBufferedTuples,
+					   slots,
+					   nslots,
 					   mycid,
 					   hi_options,
 					   bistate);
@@ -2934,16 +2936,15 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 	 */
 	if (resultRelInfo->ri_NumIndices > 0)
 	{
-		for (i = 0; i < nBufferedTuples; i++)
+		for (i = 0; i < nslots; i++)
 		{
 			List	   *recheckIndexes;
 
 			cstate->cur_lineno = firstBufferedLineNo + i;
-			ExecStoreTuple(bufferedTuples[i], myslot, InvalidBuffer, false);
 			recheckIndexes =
-				ExecInsertIndexTuples(myslot, estate, false, NULL, NIL);
+				ExecInsertIndexTuples(slots[i], estate, false, NULL, NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
-								 myslot,
+								 slots[i],
 								 recheckIndexes, cstate->transition_capture);
 			list_free(recheckIndexes);
 		}
@@ -2957,12 +2958,11 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 			 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
 			  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
 	{
-		for (i = 0; i < nBufferedTuples; i++)
+		for (i = 0; i < nslots; i++)
 		{
 			cstate->cur_lineno = firstBufferedLineNo + i;
-			ExecStoreTuple(bufferedTuples[i], myslot, InvalidBuffer, false);
 			ExecARInsertTriggers(estate, resultRelInfo,
-								 myslot,
+								 slots[i],
 								 NIL, cstate->transition_capture);
 		}
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 5f89b5b174..a36a0c49e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -16,6 +16,7 @@
 
 #include "access/sdir.h"
 #include "access/skey.h"
+#include "executor/tuptable.h"
 #include "nodes/lockoptions.h"
 #include "nodes/primnodes.h"
 #include "storage/bufpage.h"
@@ -168,7 +169,7 @@ extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
 
 extern Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 			int options, BulkInsertState bistate);
-extern void heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
+extern void heap_multi_insert(Relation relation, TupleTableSlot **slots, int nslots,
 				  CommandId cid, int options, BulkInsertState bistate);
 extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index df60ba3316..9912a171fb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -103,7 +103,7 @@ typedef HTSU_Result (*TupleLock_function) (Relation relation,
 										   uint8 flags,
 										   HeapUpdateFailureData *hufd);
 
-typedef void (*MultiInsert_function) (Relation relation, HeapTuple *tuples, int ntuples,
+typedef void (*MultiInsert_function) (Relation relation, TupleTableSlot **slots, int nslots,
 									  CommandId cid, int options, BulkInsertState bistate);
 
 typedef void (*TupleGetLatestTid_function) (Relation relation,
@@ -611,10 +611,10 @@ table_fetch_follow_check(Relation rel,
  *	table_multi_insert	- insert multiple tuple into a table
  */
 static inline void
-table_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
+table_multi_insert(Relation relation, TupleTableSlot **slots, int nslots,
 					 CommandId cid, int options, BulkInsertState bistate)
 {
-	relation->rd_tableamroutine->multi_insert(relation, tuples, ntuples,
+	relation->rd_tableamroutine->multi_insert(relation, slots, nslots,
 										   cid, options, bistate);
 }
 
-- 
2.18.0.windows.1

#10Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#9)
Re: Pluggable Storage - Andres's take

Hi,

I'm currently in the process of rebasing zheap onto the pluggable
storage work. The goal, which seems to work surprisingly well, is to
find issues that the current pluggable storage patch doesn't yet deal
with. I plan to push a tree including a lot of fixes and improvements
soon.

On 2018-08-03 12:35:50 +1000, Haribabu Kommi wrote:

While investigating the crash, I observed that it is due to the many
FIXMEs in the code. So I made some minimal fixes and am looking into
correcting the FIXMEs first.

One thing I observed is that a missing relation pointer leads to a crash
in the EvalPlan* code path, because not all ROW_MARK types carry a
relation pointer.

I will continue to work through the remaining FIXMEs.

Thanks.

- COPY's multi_insert path should probably deal with a bunch of slots,
rather than forming HeapTuples

Implemented support for slots in the COPY multi-insert path.

Cool. I've not yet looked at it, but I plan to do so soon. Will have to
rebase over the other copy changes first :(

- Andres

#11Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#10)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

I'm currently in the process of rebasing zheap onto the pluggable
storage work. The goal, which seems to work surprisingly well, is to
find issues that the current pluggable storage patch doesn't yet deal
with. I plan to push a tree including a lot of fixes and improvements
soon.

Sorry for coming late to this thread.

That's good. Did you find any problems in porting zheap into pluggable
storage? Does it need any API changes, or require new APIs?

On 2018-08-03 12:35:50 +1000, Haribabu Kommi wrote:

While investigating the crash, I observed that it is due to the many
FIXMEs in the code. So I made some minimal fixes and am looking into
correcting the FIXMEs first.

One thing I observed is that a missing relation pointer leads to a crash
in the EvalPlan* code path, because not all ROW_MARK types carry a
relation pointer.

I will continue to work through the remaining FIXMEs.

Thanks.

I fixed some of the isolation test failures. All of the issues are related
to EPQ slot handling; more still needs to be fixed.
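
For reference, the core of the EPQ fix in the attached patch is a guard in
EvalPlanQualSlot(): row marks without a relation have no table AM to ask
for a slot, so it falls back to the original slot's tuple descriptor
(distilled from the patch below):

	/* Not every ROW_MARK type carries a relation, so only ask the
	 * table AM for a slot when we actually have one. */
	if (relation)
		*slot = table_gimmegimmeslot(relation,
									 &epqstate->estate->es_tupleTable);
	else
		*slot = MakeTupleTableSlot(epqstate->origslot->tts_tupleDescriptor,
								   TTS_TYPE_BUFFER);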

Have the new TupleTableSlot abstraction patches in the recent thread [1]
fixed any of these issues? If so, I can look into changing the FDW API to
return a slot instead of a tuple.

- COPY's multi_insert path should probably deal with a bunch of slots,
rather than forming HeapTuples

Implemented support for slots in the COPY multi-insert path.

Cool. I've not yet looked at it, but I plan to do so soon. Will have to
rebase over the other copy changes first :(

OK, understood. There are many changes in the COPY flow that conflict
with my changes. Please let me know once you are done with the rebase;
I can then fix those conflicts and regenerate the patch.

Attached is the patch with further fixes.

[1]: /messages/by-id/CAFjFpRcNPQ1oOL41-HQYaEF=Nq6Vbg0eHeFgopJhHw_X2usA5w@mail.gmail.com

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0001-isolation-test-fixes-2.patch
From a064078f3cc917cd548f20cc7327516c3905b35b Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Tue, 21 Aug 2018 16:28:46 +1000
Subject: [PATCH] isolation test fixes -2

---
 src/backend/commands/trigger.c         | 6 +++++-
 src/backend/executor/execMain.c        | 9 +++++++--
 src/backend/executor/nodeModifyTable.c | 4 ++++
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index b2951a237e..801a3fee25 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3326,7 +3326,11 @@ GetTupleForTrigger(EState *estate,
 					if (TupIsNull(epqslot))
 						return false;
 
-					ExecCopySlot(newslot, epqslot);
+					if (newslot)
+						ExecCopySlot(newslot, epqslot);
+					else
+						ExecCopySlot(oldslot, epqslot);
+
 					*is_epqtuple = true;
 				}
 				break;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index fd3e53d1ee..dbbebca045 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2477,8 +2477,13 @@ EvalPlanQualSlot(EPQState *epqstate,
 		MemoryContext oldcontext;
 
 		oldcontext = MemoryContextSwitchTo(epqstate->estate->es_query_cxt);
-		*slot = table_gimmegimmeslot(relation,
-									 &epqstate->estate->es_tupleTable);
+
+		if (relation)
+			*slot = table_gimmegimmeslot(relation, &epqstate->estate->es_tupleTable);
+		else
+			*slot = MakeTupleTableSlot(epqstate->origslot->tts_tupleDescriptor, TTS_TYPE_BUFFER);
+
+		epqstate->estate->es_epqTupleSet[rti - 1] = true;
 		MemoryContextSwitchTo(oldcontext);
 	}
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 71150ad32e..14ca3b976e 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -702,6 +702,9 @@ ldelete:;
 
 		if (result == HeapTupleUpdated && !IsolationUsesXactSnapshot())
 		{
+			EvalPlanQualBegin(epqstate, estate);
+			slot = EvalPlanQualSlot(epqstate, resultRelationDesc, resultRelInfo->ri_RangeTableIndex);
+
 			result = table_lock_tuple(resultRelationDesc, tupleid,
 									  estate->es_snapshot,
 									  slot, estate->es_output_cid,
@@ -1182,6 +1185,7 @@ lreplace:;
 			EvalPlanQualBegin(epqstate, estate);
 
 			inputslot = EvalPlanQualSlot(epqstate, resultRelationDesc, resultRelInfo->ri_RangeTableIndex);
+			ExecCopySlot(inputslot, slot);
 
 			result = table_lock_tuple(resultRelationDesc, tupleid,
 									  estate->es_snapshot,
-- 
2.18.0.windows.1

#12Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#11)
Re: Pluggable Storage - Andres's take

Hi,

On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:

On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <andres@anarazel.de> wrote:

I'm currently in the process of rebasing zheap onto the pluggable
storage work. The goal, which seems to work surprisingly well, is to
find issues that the current pluggable storage patch doesn't yet deal
with. I plan to push a tree including a lot of fixes and improvements
soon.

Sorry for coming late to this thread.

No worries.

That's good. Did you find any problems in porting zheap into pluggable
storage? Does it need any API changes, or require new APIs?

A lot, yes. The big changes are:
- removal of HeapPageScanDesc
- introduction of explicit support functions for tablesample & bitmap scans
- introduction of callbacks for vacuum_rel, cluster

And quite a bit more along those lines.
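
To make the shape of these changes concrete, here is a small
self-contained sketch (plain C, illustrative names only, not PostgreSQL
code) of the dispatch pattern they all follow: generic code calls through
a per-relation routine table, and each new operation is one more callback
in that table.

#include <stdio.h>

typedef struct Relation Relation;

/* Per-AM routine table: every new operation is another callback here. */
typedef struct TableAmRoutine
{
	void		(*multi_insert) (Relation *rel, int ntuples);
	void		(*vacuum_rel) (Relation *rel);
} TableAmRoutine;

struct Relation
{
	const char *relname;
	const TableAmRoutine *rd_tableamroutine;
};

static void
heap_multi_insert_cb(Relation *rel, int ntuples)
{
	printf("heap AM: bulk-inserting %d tuples into %s\n", ntuples, rel->relname);
}

static void
heap_vacuum_cb(Relation *rel)
{
	printf("heap AM: vacuuming %s\n", rel->relname);
}

static const TableAmRoutine heap_routine = {heap_multi_insert_cb, heap_vacuum_cb};

/* Generic wrapper, analogous to table_multi_insert() in tableam.h below. */
static void
table_multi_insert(Relation *rel, int ntuples)
{
	rel->rd_tableamroutine->multi_insert(rel, ntuples);
}

int
main(void)
{
	Relation	rel = {"mytable", &heap_routine};

	table_multi_insert(&rel, 3);
	rel.rd_tableamroutine->vacuum_rel(&rel);
	return 0;
}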

Have the new TupleTableSlot abstraction patches in the recent thread [1]
fixed any of these issues? If so, I can look into changing the FDW API to
return a slot instead of a tuple.

Yea, that'd be a good thing to start with.

Greetings,

Andres Freund

#13Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#12)
2 attachment(s)
Re: Pluggable Storage - Andres's take

On Tue, Aug 21, 2018 at 6:59 PM Andres Freund <andres@anarazel.de> wrote:

On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:

On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <andres@anarazel.de> wrote:

I'm currently in the process of rebasing zheap onto the pluggable
storage work. The goal, which seems to work surprisingly well, is to
find issues that the current pluggable storage patch doesn't yet deal
with. I plan to push a tree including a lot of fixes and improvements
soon.

That's good. Did you find any problems in porting zheap into pluggable
storage? Does it need any API changes, or require new APIs?

A lot, yes. The big changes are:
- removal of HeapPageScanDesc
- introduction of explicit support functions for tablesample & bitmap scans
- introduction of callbacks for vacuum_rel, cluster

And quite a bit more along those lines.

OK. Those are quite a bit of changes.

Have the new TupleTableSlot abstraction patches in the recent thread [1]
fixed any of these issues? If so, I can look into changing the FDW API to
return a slot instead of a tuple.

Yea, that'd be a good thing to start with.

I found that only the RefetchForeignRow API needs the change, and I have
done so. Along with that, I fixed all the issues with running make
check-world. Patches for both are attached.
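
To illustrate the new contract from the FDW side, here is a minimal sketch
of a callback under the slot-based signature (the typedef is in the
attached 0001 patch; the FDW name and body are hypothetical, and error
handling is omitted):

/* Hypothetical FDW callback under the new slot-based API: the executor
 * supplies the slot to fill, and we return it, or NULL if the row lock
 * could not be obtained, instead of a palloc'ed HeapTuple. */
static TupleTableSlot *
exampleRefetchForeignRow(EState *estate, ExecRowMark *erm, Datum rowid,
						 TupleTableSlot *slot, bool *updated)
{
	ExecClearTuple(slot);

	/* ... fetch the remote row identified by "rowid" and fill in
	 * slot->tts_values / slot->tts_isnull here ... */

	ExecStoreVirtualTuple(slot);
	*updated = false;		/* row was not modified before we locked it */
	return slot;
}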

Now I will look into the remaining FIXMEs that don't conflict with your
further changes.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0002-check-world-fixes.patch
From 58b1bcd28991a1be04d13783e1c46a1ebf5de51c Mon Sep 17 00:00:00 2001
From: Kommi <haribabuk@fast.au.fujitsu.com>
Date: Fri, 24 Aug 2018 11:21:34 +1000
Subject: [PATCH 2/2] check-world fixes

---
 contrib/amcheck/verify_nbtree.c        |  1 +
 contrib/pg_visibility/pg_visibility.c  |  5 +++--
 contrib/pgstattuple/pgstatapprox.c     | 10 +++++++++-
 contrib/pgstattuple/pgstattuple.c      |  9 ++++++++-
 src/backend/executor/execMain.c        | 11 +----------
 src/backend/executor/execTuples.c      | 22 ++++++++++++++++++++++
 src/backend/executor/nodeModifyTable.c | 12 ++++++++++--
 src/include/executor/tuptable.h        |  1 +
 8 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 04102dd3df..cb4294af7d 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -25,6 +25,7 @@
 
 #include "access/htup_details.h"
 #include "access/nbtree.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
 #include "catalog/index.h"
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index dce5262e34..88ca4fd2af 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -563,12 +563,13 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 
 	rel = relation_open(relid, AccessShareLock);
 
+	/* Only some relkinds have a visibility map */
+	check_relation_relkind(rel);
+
 	if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
 		ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 						errmsg("only heap AM is supported")));
 
-	/* Only some relkinds have a visibility map */
-	check_relation_relkind(rel);
 
 	nblocks = RelationGetNumberOfBlocks(rel);
 
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index e805981bb9..6aee2ce8ac 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
 
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/visibilitymap.h"
 #include "access/xact.h"
@@ -69,6 +70,7 @@ statapprox_heap(Relation rel, output_type *stat)
 	Buffer		vmbuffer = InvalidBuffer;
 	BufferAccessStrategy bstrategy;
 	TransactionId OldestXmin;
+	TupleTableSlot *slot;
 
 	OldestXmin = GetOldestXmin(rel, PROCARRAY_FLAGS_VACUUM);
 	bstrategy = GetAccessStrategy(BAS_BULKREAD);
@@ -76,6 +78,8 @@ statapprox_heap(Relation rel, output_type *stat)
 	nblocks = RelationGetNumberOfBlocks(rel);
 	scanned = 0;
 
+	slot = MakeSingleTupleTableSlot(RelationGetDescr(rel), TTS_TYPE_BUFFER);
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -153,13 +157,15 @@ statapprox_heap(Relation rel, output_type *stat)
 			tuple.t_len = ItemIdGetLength(itemid);
 			tuple.t_tableOid = RelationGetRelid(rel);
 
+			ExecStoreTuple(&tuple, slot, buf, false);
+
 			/*
 			 * We follow VACUUM's lead in counting INSERT_IN_PROGRESS tuples
 			 * as "dead" while DELETE_IN_PROGRESS tuples are "live".  We don't
 			 * bother distinguishing tuples inserted/deleted by our own
 			 * transaction.
 			 */
-			switch (rel->rd_tableamroutine->snapshot_satisfiesVacuum(&tuple, OldestXmin, buf))
+			switch (rel->rd_tableamroutine->snapshot_satisfiesVacuum(slot, OldestXmin))
 			{
 				case HEAPTUPLE_LIVE:
 				case HEAPTUPLE_DELETE_IN_PROGRESS:
@@ -210,6 +216,8 @@ statapprox_heap(Relation rel, output_type *stat)
 		ReleaseBuffer(vmbuffer);
 		vmbuffer = InvalidBuffer;
 	}
+
+	ExecDropSingleTupleTableSlot(slot);
 }
 
 /*
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 53fdff6c2c..5965ecbcf8 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -28,6 +28,7 @@
 #include "access/hash.h"
 #include "access/nbtree.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_am.h"
 #include "funcapi.h"
@@ -326,12 +327,14 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 	Buffer		buffer;
 	pgstattuple_type stat = {0};
 	SnapshotData SnapshotDirty;
+	TupleTableSlot *slot;
 	TableAmRoutine *method = rel->rd_tableamroutine;
 
 	/* Disable syncscan because we assume we scan from block zero upwards */
 	scan = table_beginscan_strat(rel, SnapshotAny, 0, NULL, true, false);
 	InitDirtySnapshot(SnapshotDirty);
 
+	slot = MakeSingleTupleTableSlot(RelationGetDescr(rel), TTS_TYPE_BUFFER);
 	pagescan = tableam_get_heappagescandesc(scan);
 	nblocks = pagescan->rs_nblocks; /* # blocks to be scanned */
 
@@ -343,7 +346,10 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 		/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-		if (HeapTupleSatisfiesVisibility(method, tuple, &SnapshotDirty, scan->rs_cbuf))
+		/* FIXME: change to get the slot directly, instead of tuple */
+		ExecStoreTuple(tuple, slot, scan->rs_cbuf, false);
+
+		if (HeapTupleSatisfiesVisibility(method, slot, &SnapshotDirty))
 		{
 			stat.tuple_len += tuple->t_len;
 			stat.tuple_count++;
@@ -393,6 +399,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 	relation_close(rel, AccessShareLock);
 
 	stat.table_len = (uint64) nblocks * BLCKSZ;
+	ExecDropSingleTupleTableSlot(slot);
 
 	return build_pgstattuple_type(&stat, fcinfo);
 }
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 1237e809f3..e349a22e94 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2765,16 +2765,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 			if (isNull)
 				continue;
 
-			elog(ERROR, "frak, need to implement ROW_MARK_COPY");
-#ifdef FIXME
-			// FIXME: this should just deform the tuple and store it as a
-			// virtual one.
-			tuple = table_tuple_by_datum(erm->relation, datum, erm->relid);
-
-			/* store tuple */
-			EvalPlanQualSetTuple(epqstate, erm->rti, tuple);
-#endif
-
+			ExecForceStoreHeapTupleDatum(datum, slot);
 		}
 	}
 }
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 59ccc1a626..046f7b23dc 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -1385,6 +1385,27 @@ ExecForceStoreHeapTuple(HeapTuple tuple,
 
 }
 
+void
+ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot)
+{
+	HeapTuple	tuple;
+	HeapTupleHeader td;
+
+	td = DatumGetHeapTupleHeader(data);
+
+	tuple = (HeapTuple) palloc(HEAPTUPLESIZE + HeapTupleHeaderGetDatumLength(td));
+	tuple->t_len = HeapTupleHeaderGetDatumLength(td);
+	tuple->t_self = td->t_ctid;
+	tuple->t_data = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE);
+	memcpy((char *) tuple->t_data, (char *) td, tuple->t_len);
+
+	ExecClearTuple(slot);
+
+	heap_deform_tuple(tuple, slot->tts_tupleDescriptor,
+					  slot->tts_values, slot->tts_isnull);
+	ExecStoreVirtualTuple(slot);
+}
+
 /* --------------------------------
  *		ExecStoreMinimalTuple
  *
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 14ca3b976e..d7d79f845c 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -609,7 +609,7 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool canSetTag,
 		   bool changingPart,
 		   bool *tupleDeleted,
-		   TupleTableSlot **epqslot)
+		   TupleTableSlot **epqreturnslot)
 {
 	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
@@ -634,7 +634,7 @@ ExecDelete(ModifyTableState *mtstate,
 		bool		dodelete;
 
 		dodelete = ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
-										tupleid, oldtuple, epqslot);
+										tupleid, oldtuple, epqreturnslot);
 
 		if (!dodelete)			/* "do nothing" */
 			return NULL;
@@ -727,6 +727,14 @@ ldelete:;
 					/* Tuple no more passing quals, exiting... */
 					return NULL;
 				}
+
+				/* Pass back the EPQ slot so the caller can re-process the tuple. */
+				if (epqreturnslot)
+				{
+					*epqreturnslot = epqslot;
+					return NULL;
+				}
+
 				goto ldelete;
 			}
 		}
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index dec3e87a1e..4e49304ba7 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -313,6 +313,7 @@ extern TupleTableSlot *ExecStoreMinimalTuple(MinimalTuple mtup,
 
 extern void ExecForceStoreHeapTuple(HeapTuple tuple,
 			   TupleTableSlot *slot);
+extern void ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot);
 
 extern TupleTableSlot *ExecStoreVirtualTuple(TupleTableSlot *slot);
 extern TupleTableSlot *ExecStoreAllNullTuple(TupleTableSlot *slot);
-- 
2.18.0.windows.1

0001-FDW-RefetchForeignRow-API-prototype-change.patch
From 71a9811c742dfe7ac8cfac053d02869540452f92 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 22 Aug 2018 16:45:02 +1000
Subject: [PATCH 1/2] FDW RefetchForeignRow API prototype change

With pluggable storage, the tuple usage is minimized
and all the extenal API's must deal with TupleTableSlot.
---
 doc/src/sgml/fdwhandler.sgml        | 10 ++++++----
 src/backend/executor/execMain.c     | 16 ++++++++--------
 src/backend/executor/nodeLockRows.c | 20 +++++++-------------
 src/include/foreign/fdwapi.h        |  9 +++++----
 4 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 4ce88dd77c..12769f3288 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -988,23 +988,25 @@ GetForeignRowMarkType(RangeTblEntry *rte,
 
     <para>
 <programlisting>
-HeapTuple
+TupleTableSlot *
 RefetchForeignRow(EState *estate,
                   ExecRowMark *erm,
                   Datum rowid,
+                  TupleTableSlot *slot,
                   bool *updated);
 </programlisting>
 
-     Re-fetch one tuple from the foreign table, after locking it if required.
+     Re-fetch one tuple slot from the foreign table, after locking it if required.
      <literal>estate</literal> is global execution state for the query.
      <literal>erm</literal> is the <structname>ExecRowMark</structname> struct describing
      the target foreign table and the row lock type (if any) to acquire.
      <literal>rowid</literal> identifies the tuple to be fetched.
-     <literal>updated</literal> is an output parameter.
+     <literal>slot</literal> contains nothing useful upon call, but can be used to
+     hold the returned tuple. <literal>updated</literal> is an output parameter.
     </para>
 
     <para>
-     This function should return a palloc'ed copy of the fetched tuple,
+     This function should return a slot containing the fetched tuple
      or <literal>NULL</literal> if the row lock couldn't be obtained.  The row lock
      type to acquire is defined by <literal>erm-&gt;markType</literal>, which is the
      value previously returned by <function>GetForeignRowMarkType</function>.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index dbbebca045..1237e809f3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2704,23 +2704,24 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 			/* fetch requests on foreign tables must be passed to their FDW */
 			if (erm->relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
 			{
-				elog(ERROR, "frak, need to change fdw API");
-#ifdef FIXME
 				FdwRoutine *fdwroutine;
 				bool		updated = false;
 
 				fdwroutine = GetFdwRoutineForRelation(erm->relation, false);
+
 				/* this should have been checked already, but let's be safe */
 				if (fdwroutine->RefetchForeignRow == NULL)
 					ereport(ERROR,
 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 							 errmsg("cannot lock rows in foreign table \"%s\"",
 									RelationGetRelationName(erm->relation))));
-				tuple = fdwroutine->RefetchForeignRow(epqstate->estate,
-													  erm,
-													  datum,
-													  &updated);
-				if (tuple == NULL)
+
+				slot = fdwroutine->RefetchForeignRow(epqstate->estate,
+													 erm,
+													 datum,
+													 slot,
+													 &updated);
+				if (slot == NULL)
 					elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck");
 
 				/*
@@ -2728,7 +2729,6 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 				 * assumes that FDWs can track that exactly, which they might
 				 * not be able to.  So just ignore the flag.
 				 */
-#endif
 			}
 			else
 			{
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index e58e0919d8..3a4071f2e3 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -128,33 +128,27 @@ lnext:
 		{
 			FdwRoutine *fdwroutine;
 			bool		updated = false;
-			HeapTuple	copyTuple;
-
-			elog(ERROR, "frak, tuple based API needs to be rewritten");
 
 			fdwroutine = GetFdwRoutineForRelation(erm->relation, false);
+
 			/* this should have been checked already, but let's be safe */
 			if (fdwroutine->RefetchForeignRow == NULL)
 				ereport(ERROR,
 						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 						 errmsg("cannot lock rows in foreign table \"%s\"",
 								RelationGetRelationName(erm->relation))));
-			copyTuple = fdwroutine->RefetchForeignRow(estate,
-													  erm,
-													  datum,
-													  &updated);
 
-			if (copyTuple == NULL)
+			markSlot = fdwroutine->RefetchForeignRow(estate,
+													 erm,
+													 datum,
+													 markSlot,
+													 &updated);
+			if (markSlot == NULL)
 			{
 				/* couldn't get the lock, so skip this row */
 				goto lnext;
 			}
 
-			elog(ERROR, "frak: slotify");
-
-			/* save locked tuple for possible EvalPlanQual testing below */
-			//*testTuple = copyTuple;
-
 			/*
 			 * if FDW says tuple was updated before getting locked, we need to
 			 * perform EPQ testing to see if quals are still satisfied
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index c14eb546c6..508b0eece8 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -121,10 +121,11 @@ typedef void (*EndDirectModify_function) (ForeignScanState *node);
 typedef RowMarkType (*GetForeignRowMarkType_function) (RangeTblEntry *rte,
 													   LockClauseStrength strength);
 
-typedef HeapTuple (*RefetchForeignRow_function) (EState *estate,
-												 ExecRowMark *erm,
-												 Datum rowid,
-												 bool *updated);
+typedef TupleTableSlot *(*RefetchForeignRow_function) (EState *estate,
+													   ExecRowMark *erm,
+													   Datum rowid,
+													   TupleTableSlot *slot,
+													   bool *updated);
 
 typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
 											 struct ExplainState *es);
-- 
2.18.0.windows.1

#14Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#13)
Re: Pluggable Storage - Andres's take

Hi,

On 2018-08-24 11:55:41 +1000, Haribabu Kommi wrote:

On Tue, Aug 21, 2018 at 6:59 PM Andres Freund <andres@anarazel.de> wrote:

On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:

On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <andres@anarazel.de> wrote:

I'm currently in the process of rebasing zheap onto the pluggable
storage work. The goal, which seems to work surprisingly well, is to
find issues that the current pluggable storage patch doesn't yet deal
with. I plan to push a tree including a lot of fixes and improvements
soon.

That's good. Did you find any problems in porting zheap into pluggable
storage? Does it needs any API changes or new API requirement?

A lot, yes. The big changes are:
- removal of HeapPageScanDesc
- introduction of explicit support functions for tablesample & bitmap scans
- introduction of callbacks for vacuum_rel, cluster

And quite a bit more along those lines.

OK. Those are quite a bit of changes.

I've pushed a current version of that to my git tree to the
pluggable-storage branch. It's not really a version that I think makes
sense to review or such, but it's probably more useful if you work based
on that. There's also the pluggable-zheap branch, which I found
extremely useful to develop against.

There's a few further changes since last time:
- Pluggable handlers are now stored in static global variables, and thus
do not need to be copied anymore (see the sketch below)
- VACUUM FULL / CLUSTER is moved into one callback that does the actual
copying. The various previous rewrite callbacks imo integrated at the
wrong level.
- there's a GUC that allows to change the default table AM
- moving COPY to use slots (roughly based on your / Haribabu's patch)
- removed the AM specific shmem initialization callbacks - various AMs are
going to need the syncscan APIs, so moving that into AM callbacks doesn't
make sense.
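
As a sketch of the first item (illustrative only, not the committed code;
heap_multi_insert is the function from the patches earlier in this thread,
and the handler name is assumed): the handler simply returns a pointer to
one static const routine table, so nothing is copied per relation.

/* Illustrative sketch: one static const routine table per AM; the
 * handler hands out a pointer to it rather than copying it around. */
static const TableAmRoutine heapam_methods = {
	.multi_insert = heap_multi_insert,
	/* ... remaining callbacks ... */
};

const TableAmRoutine *
GetHeapamTableAmRoutine(void)
{
	return &heapam_methods;
}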

Missing:
- callback for the second scan of CREATE INDEX CONCURRENTLY
- commands/analyze.c integration (Working on it)
- fixing your (Haribabu's) slotification of copy patch to compute memory
usage somehow
- table creation callback, currently the pluggable-zheap patch has a few
conditionals outside of access/zheap for that purpose (see RelationTruncate)
- header structure cleanup

And then:
- lotsa cleanups
- rebasing onto a newer version of the abstract slot patchset
- splitting out smaller patches

You'd moved the bulk insert into tableam callbacks - I don't quite get
why? There's not really anything AM specific in that code?

Does the new TupleTableSlot abstraction patches has fixed any of these
issues in the recent thread [1]? so that I can look into the change of
FDW API to return slot instead of tuple.

Yea, that'd be a good thing to start with.

I found out only the RefetchForeignRow API needs the change and done the
same.
Along with that, I fixed all the issues of running make check-world.
Attached patches
for the same.

Thanks, that's really helpful! I'll try to merge these soon.

I'm starting to think that we're getting closer to something that
looks right from a high level, even though there's a lot of details to
clean up.

Greetings,

Andres Freund

#15Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#14)
Re: Pluggable Storage - Andres's take

On Fri, Aug 24, 2018 at 12:50 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-08-24 11:55:41 +1000, Haribabu Kommi wrote:

On Tue, Aug 21, 2018 at 6:59 PM Andres Freund <andres@anarazel.de> wrote:

On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:

On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <andres@anarazel.de> wrote:

I'm currently in the process of rebasing zheap onto the pluggable
storage work. The goal, which seems to work surprisingly well, is to
find issues that the current pluggable storage patch doesn't yet deal
with. I plan to push a tree including a lot of fixes and improvements
soon.

That's good. Did you find any problems in porting zheap into pluggable
storage? Does it need any API changes, or require new APIs?

A lot, yes. The big changes are:
- removal of HeapPageScanDesc
- introduction of explicit support functions for tablesample & bitmap scans
- introduction of callbacks for vacuum_rel, cluster

And quite a bit more along those lines.

OK. Those are quite a bit of changes.

I've pushed a current version of that to my git tree to the
pluggable-storage branch. It's not really a version that I think makes
sense to review or such, but it's probably more useful if you work based
on that. There's also the pluggable-zheap branch, which I found
extremely useful to develop against.

OK. Thanks, will check that also.

There's a few further changes since last time:
- Pluggable handlers are now stored in static global variables, and thus
do not need to be copied anymore
- VACUUM FULL / CLUSTER is moved into one callback that does the actual
copying. The various previous rewrite callbacks imo integrated at the
wrong level.
- there's a GUC that allows to change the default table AM
- moving COPY to use slots (roughly based on your / Haribabu's patch)
- removed the AM specific shmem initialization callbacks - various AMs are
going to need the syncscan APIs, so moving that into AM callbacks doesn't
make sense.

OK.

Missing:
- callback for the second scan of CREATE INDEX CONCURRENTLY
- commands/analyze.c integration (Working on it)
- fixing your (Haribabu's) slotification of copy patch to compute memory
usage somehow

I will check it.

- table creation callback, currently the pluggable-zheap patch has a few
conditionals outside of access/zheap for that purpose (see RelationTruncate)

I will check it.

And then:
- lotsa cleanups
- rebasing onto a newer version of the abstract slot patchset
- splitting out smaller patches

You'd moved the bulk insert into tableam callbacks - I don't quite get
why? There's not really anything AM specific in that code?

The main reason for adding them to the AM is to give the specific AM
control over whether it supports bulk inserts or not.

The current framework doesn't allow AM-specific bulk insert state to be
passed from one function to another, and its structure is fixed. This needs
to be enhanced so that AM-specific private members can be added as well.
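
To make that concrete, here is a sketch of what an extensible bulk-insert
state could look like (purely illustrative: strategy and current_buf exist
in today's heap BulkInsertStateData, while am_private is a hypothetical
addition):

/* Illustrative sketch only: keep the common part fixed and let each AM
 * hang its own state off an opaque pointer, so bulk-insert state can be
 * threaded through generic code without it knowing the AM's layout. */
typedef struct BulkInsertStateData
{
	BufferAccessStrategy strategy;	/* common: buffer access strategy */
	Buffer		current_buf;		/* common: current insertion target */
	void	   *am_private;			/* hypothetical: AM-specific state */
} BulkInsertStateData;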

Have the new TupleTableSlot abstraction patches in the recent thread [1]
fixed any of these issues? If so, I can look into changing the FDW API to
return a slot instead of a tuple.

Yea, that'd be a good thing to start with.

I found that only the RefetchForeignRow API needs the change, and I have
done so. Along with that, I fixed all the issues with running make
check-world. Patches for both are attached.

Thanks, that's really helpful! I'll try to merge these soon.

I can share the rebased patches for the fixes, so that they will be easy
to merge.

I'm starting to think that we're getting closer to something that
looks right from a high level, even though there's a lot of details to
clean up.

That's good.

Regards,
Haribabu Kommi
Fujitsu Australia

#16Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#15)
3 attachment(s)
Re: Pluggable Storage - Andres's take

On Tue, Aug 28, 2018 at 1:48 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

On Fri, Aug 24, 2018 at 12:50 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-08-24 11:55:41 +1000, Haribabu Kommi wrote:

On Tue, Aug 21, 2018 at 6:59 PM Andres Freund <andres@anarazel.de> wrote:

On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:

On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <andres@anarazel.de> wrote:

I'm currently in the process of rebasing zheap onto the pluggable
storage work. The goal, which seems to work surprisingly well, is to
find issues that the current pluggable storage patch doesn't yet deal
with. I plan to push a tree including a lot of fixes and improvements
soon.

That's good. Did you find any problems in porting zheap into pluggable
storage? Does it need any API changes, or require new APIs?

A lot, yes. The big changes are:
- removal of HeapPageScanDesc
- introduction of explicit support functions for tablesample & bitmap scans
- introduction of callbacks for vacuum_rel, cluster

And quite a bit more along those lines.

OK. Those are quite a bit of changes.

I've pushed a current version of that to my git tree to the
pluggable-storage branch. It's not really a version that I think makes
sense to review or such, but it's probably more useful if you work based
on that. There's also the pluggable-zheap branch, which I found
extremely useful to develop against.

OK. Thanks, will check that also.

- fixing your (Haribabu's) slotification of copy patch to compute memory
usage somehow

I will check it.

Attached is the copy patch that brings back the size validation. It
computes the tuple size from the first tuple in the batch and uses that
size for the rest of the tuples in the batch. This also reduces the
calculation overhead. There is a chance that the first tuple is very small
while the rest are very large, but I expect that to be rare.
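
As a self-contained illustration of the accounting this patch implements
(a plain C toy: the 1000-slot and 64kB limits mirror the patch, the size
function is a stand-in for heap_compute_data_size()): the first tuple's
size is measured once and reused as the estimate for every tuple in its
batch, and the buffer flushes on whichever limit is hit first.

#include <stdio.h>

#define MAX_BUFFERED_SLOTS 1000
#define MAX_BUFFERED_BYTES 65535

/* stand-in for measuring the first tuple of a batch */
static size_t first_tuple_size(int batchno) { return 100 + 50 * batchno; }

int main(void)
{
	int		nslots = 0;
	int		nbatches = 0;
	size_t	size = 0, tup_size = 0;

	for (int row = 0; row < 5000; row++)
	{
		if (nslots == 0)			/* first tuple of a new batch */
			tup_size = first_tuple_size(nbatches);
		nslots++;
		size += tup_size;			/* estimate: reuse first tuple's size */

		if (nslots == MAX_BUFFERED_SLOTS || size > MAX_BUFFERED_BYTES)
		{
			printf("flush %d slots, ~%zu bytes\n", nslots, size);
			nslots = 0;
			size = 0;
			nbatches++;
		}
	}
	if (nslots > 0)
		printf("final flush %d slots, ~%zu bytes\n", nslots, size);
	return 0;
}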

- table creation callback, currently the pluggable-zheap patch has a few
conditionals outside of access/zheap for that purpose (see RelationTruncate)

I will check it.

I found a couple of places where zheap uses some extra logic to verify
whether the AM is zheap and, based on that, takes some extra decisions.
I am analyzing all of that extra code to see whether callbacks can handle
it, and how. I will come back with more details later.
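
The kind of check being discussed looks roughly like this (a hypothetical
fragment; neither function name is real), and the goal is to move the
decision behind a tableam callback:

/* Before (hypothetical): generic code special-cases one AM. */
if (RelationStorageIsZheap(rel))
	zheap_set_new_filenode(rel);

/* After: every AM implements the callback, so generic code stays generic. */
rel->rd_tableamroutine->relation_set_new_filenode(rel);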

And then:
- lotsa cleanups
- rebasing onto a newer version of the abstract slot patchset
- splitting out smaller patches

You'd moved the bulk insert into tableam callbacks - I don't quite get
why? There's not really anything AM specific in that code?

The main reason for adding them to the AM is to give the specific AM
control over whether it supports bulk inserts or not.

The current framework doesn't allow AM-specific bulk insert state to be
passed from one function to another, and its structure is fixed. This needs
to be enhanced so that AM-specific private members can be added as well.

Do you want me to work on making this generic, so that AM methods can
extend the structure?

Have the new TupleTableSlot abstraction patches in the recent thread [1]
fixed any of these issues? If so, I can look into changing the FDW API to
return a slot instead of a tuple.

Yea, that'd be a good thing to start with.

I found that only the RefetchForeignRow API needs the change, and I have
done so. Along with that, I fixed all the issues with running make
check-world. Patches for both are attached.

Thanks, that's really helpful! I'll try to merge these soon.

I can share the rebased patches for the fixes, so that they will be easy
to merge.

The rebased FDW and check-world fix patches are attached.
I will continue working on the rest of the missing items.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0002-copy-memory-limit-fix.patch
From fd6fb3028c1c9f7fcb41d651a324b1b1eb4ab2ce Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 29 Aug 2018 13:52:39 +1000
Subject: [PATCH 2/2] copy memory limit fix

To limit the memory used by COPY FROM after slotification, calculate
the tuple size of the first tuple in each batch and use it for the
rest of the batch, which roughly bounds the memory usage of the COPY
command.
---
 src/backend/commands/copy.c | 61 ++++++++++++++++++++++++-------------
 1 file changed, 39 insertions(+), 22 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c9272b344a..1e2d5ebb50 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -308,7 +308,7 @@ static void CopyFromInsertBatch(CopyState cstate, EState *estate,
 					CommandId mycid, int hi_options,
 					ResultRelInfo *resultRelInfo,
 					BulkInsertState bistate,
-					int nBufferedTuples, TupleTableSlot **bufferedSlots,
+					int nBufferedSlots, TupleTableSlot **bufferedSlots,
 					uint64 firstBufferedLineNo);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
@@ -2309,11 +2309,13 @@ CopyFrom(CopyState cstate)
 	void       *bistate;
 	uint64		processed = 0;
 	bool		useHeapMultiInsert;
-	int			nBufferedTuples = 0;
+	int			nBufferedSlots = 0;
 	int			prev_leaf_part_index = -1;
 
-#define MAX_BUFFERED_TUPLES 1000
+#define MAX_BUFFERED_SLOTS 1000
 	TupleTableSlot  **bufferedSlots = NULL;	/* initialize to silence warning */
+	Size		bufferedSlotsSize = 0;
+	Size		tup_size = 0;	/* size of a batch's first tuple, reused for the batch */
 	uint64		firstBufferedLineNo = 0;
 
 	Assert(cstate->rel);
@@ -2524,7 +2526,7 @@ CopyFrom(CopyState cstate)
 	else
 	{
 		useHeapMultiInsert = true;
-		bufferedSlots = palloc0(MAX_BUFFERED_TUPLES * sizeof(TupleTableSlot *));
+		bufferedSlots = palloc0(MAX_BUFFERED_SLOTS * sizeof(TupleTableSlot *));
 	}
 
 	/*
@@ -2562,7 +2564,7 @@ CopyFrom(CopyState cstate)
 
 		CHECK_FOR_INTERRUPTS();
 
-		if (nBufferedTuples == 0)
+		if (nBufferedSlots == 0)
 		{
 			/*
 			 * Reset the per-tuple exprcontext. We can only do this if the
@@ -2577,14 +2579,14 @@ CopyFrom(CopyState cstate)
 			myslot = singleslot;
 			Assert(myslot != NULL);
 		}
-		else if (bufferedSlots[nBufferedTuples] == NULL)
+		else if (bufferedSlots[nBufferedSlots] == NULL)
 		{
 			myslot = table_gimmegimmeslot(resultRelInfo->ri_RelationDesc,
 										  &estate->es_tupleTable);
-			bufferedSlots[nBufferedTuples] = myslot;
+			bufferedSlots[nBufferedSlots] = myslot;
 		}
 		else
-			myslot = bufferedSlots[nBufferedTuples];
+			myslot = bufferedSlots[nBufferedSlots];
 
 		/* Switch into its memory context */
 		MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -2750,27 +2752,41 @@ CopyFrom(CopyState cstate)
 
 				if (useHeapMultiInsert)
 				{
 					/* Add this tuple to the tuple buffer */
-					if (nBufferedTuples == 0)
+					if (nBufferedSlots == 0)
+					{
 						firstBufferedLineNo = cstate->cur_lineno;
-					Assert(bufferedSlots[nBufferedTuples] == myslot);
-					nBufferedTuples++;
+
+						/*
+						 * Compute the size of the first tuple in the batch and use
+						 * it as the estimate for the rest of the batch.  The first
+						 * tuple may be unusually small while the rest are large, but
+						 * that should be rare, so this works for most scenarios.
+						 */
+						tup_size = heap_compute_data_size(myslot->tts_tupleDescriptor,
+														  myslot->tts_values,
+														  myslot->tts_isnull);
+					}
+
+					Assert(bufferedSlots[nBufferedSlots] == myslot);
+					nBufferedSlots++;
+					bufferedSlotsSize += tup_size;
 
 					/*
 					 * If the buffer filled up, flush it.  Also flush if the
 					 * total size of all the tuples in the buffer becomes
 					 * large, to avoid using large amounts of memory for the
 					 * buffer when the tuples are exceptionally wide.
-					 *
-					 * PBORKED: Re-introduce size limit
 					 */
-					if (nBufferedTuples == MAX_BUFFERED_TUPLES)
+					if (nBufferedSlots == MAX_BUFFERED_SLOTS ||
+						bufferedSlotsSize > 65535)
 					{
 						CopyFromInsertBatch(cstate, estate, mycid, hi_options,
 											resultRelInfo, bistate,
-											nBufferedTuples, bufferedSlots,
+											nBufferedSlots, bufferedSlots,
 											firstBufferedLineNo);
-						nBufferedTuples = 0;
+						nBufferedSlots = 0;
+						bufferedSlotsSize = 0;
 					}
 				}
 				else
@@ -2836,10 +2852,10 @@ next_tuple:
 	}
 
 	/* Flush any remaining buffered tuples */
-	if (nBufferedTuples > 0)
+	if (nBufferedSlots > 0)
 		CopyFromInsertBatch(cstate, estate, mycid, hi_options,
 							resultRelInfo, bistate,
-							nBufferedTuples, bufferedSlots,
+							nBufferedSlots, bufferedSlots,
 							firstBufferedLineNo);
 
 	/* Done, clean up */
@@ -2899,7 +2915,7 @@ next_tuple:
 static void
 CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 					int hi_options, ResultRelInfo *resultRelInfo,
-					BulkInsertState bistate, int nBufferedTuples, TupleTableSlot **bufferedSlots,
+					BulkInsertState bistate, int nBufferedSlots, TupleTableSlot **bufferedSlots,
 					uint64 firstBufferedLineNo)
 {
 	MemoryContext oldcontext;
@@ -2920,7 +2936,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
 	table_multi_insert(cstate->rel,
 					   bufferedSlots,
-					   nBufferedTuples,
+					   nBufferedSlots,
 					   mycid,
 					   hi_options,
 					   bistate);
@@ -2932,7 +2948,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 	 */
 	if (resultRelInfo->ri_NumIndices > 0)
 	{
-		for (i = 0; i < nBufferedTuples; i++)
+		for (i = 0; i < nBufferedSlots; i++)
 		{
 			List	   *recheckIndexes;
 
@@ -2954,7 +2970,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 			 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
 			  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
 	{
-		for (i = 0; i < nBufferedTuples; i++)
+		for (i = 0; i < nBufferedSlots; i++)
 		{
 			cstate->cur_lineno = firstBufferedLineNo + i;
 			ExecARInsertTriggers(estate, resultRelInfo,
-- 
2.18.0.windows.1

0001-check-world-fixes.patch
From e64936a923fa2772fa9151a0e51bb4a042bbd36b Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 29 Aug 2018 14:22:07 +1000
Subject: [PATCH 1/2] check-world fixes

---
 contrib/pg_visibility/pg_visibility.c  |  5 +++--
 src/backend/access/heap/heapam.c       |  2 --
 src/backend/executor/execExprInterp.c  |  3 +++
 src/backend/executor/execMain.c        | 13 ++-----------
 src/backend/executor/execTuples.c      | 21 +++++++++++++++++++++
 src/backend/executor/nodeModifyTable.c | 12 ++++++++++--
 src/include/executor/tuptable.h        |  1 +
 7 files changed, 40 insertions(+), 17 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index dce5262e34..88ca4fd2af 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -563,12 +563,13 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 
 	rel = relation_open(relid, AccessShareLock);
 
+	/* Only some relkinds have a visibility map */
+	check_relation_relkind(rel);
+
 	if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
 		ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 						errmsg("only heap AM is supported")));
 
-	/* Only some relkinds have a visibility map */
-	check_relation_relkind(rel);
 
 	nblocks = RelationGetNumberOfBlocks(rel);
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b35dcea75e..6d516ccc0b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -83,8 +83,6 @@
 /* GUC variable */
 bool		synchronize_seqscans = true;
 
-static void heap_parallelscan_startblock_init(HeapScanDesc scan);
-static BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 					TransactionId xid, CommandId cid, int options);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 5ed9273d32..d91d0f25a1 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -570,6 +570,9 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 				Assert(TTS_IS_HEAPTUPLE(scanslot) ||
 					   TTS_IS_BUFFERTUPLE(scanslot));
 
+				if (hslot->tuple == NULL)
+					ExecMaterializeSlot(scanslot);
+
 				d = heap_getsysattr(hslot->tuple, attnum,
 									scanslot->tts_tupleDescriptor,
 									op->resnull);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f430733b69..faeb960e1d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2559,7 +2559,7 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
 	 * datums that may be present in copyTuple).  As with the next step, this
 	 * is to guard against early re-use of the EPQ query.
 	 */
-	if (!TupIsNull(slot))
+	if (!TupIsNull(slot) && !TTS_IS_VIRTUAL(slot))
 		ExecMaterializeSlot(slot);
 
 #if FIXME
@@ -2766,16 +2766,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 			if (isNull)
 				continue;
 
-			elog(ERROR, "frak, need to implement ROW_MARK_COPY");
-#ifdef FIXME
-			// FIXME: this should just deform the tuple and store it as a
-			// virtual one.
-			tuple = table_tuple_by_datum(erm->relation, datum, erm->relid);
-
-			/* store tuple */
-			EvalPlanQualSetTuple(epqstate, erm->rti, tuple);
-#endif
-
+			ExecForceStoreHeapTupleDatum(datum, slot);
 		}
 	}
 }
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 4921835c31..7628799d41 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -1397,6 +1397,27 @@ ExecStoreAllNullTuple(TupleTableSlot *slot)
 	return ExecFinishStoreSlotValues(slot);
 }
 
+void
+ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot)
+{
+	HeapTuple	tuple;
+	HeapTupleHeader td;
+
+	td = DatumGetHeapTupleHeader(data);
+
+	tuple = (HeapTuple) palloc(HEAPTUPLESIZE + HeapTupleHeaderGetDatumLength(td));
+	tuple->t_len = HeapTupleHeaderGetDatumLength(td);
+	tuple->t_self = td->t_ctid;
+	tuple->t_data = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE);
+	memcpy((char *) tuple->t_data, (char *) td, tuple->t_len);
+
+	ExecClearTuple(slot);
+
+	heap_deform_tuple(tuple, slot->tts_tupleDescriptor,
+					  slot->tts_values, slot->tts_isnull);
+	ExecFinishStoreSlotValues(slot);
+}
+
 /* --------------------------------
  *		ExecFetchSlotTuple
  *			Fetch the slot's regular physical tuple.
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 226ba5fc21..0b38259387 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -607,7 +607,7 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool canSetTag,
 		   bool changingPart,
 		   bool *tupleDeleted,
-		   TupleTableSlot **epqslot)
+		   TupleTableSlot **epqreturnslot)
 {
 	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
@@ -632,7 +632,7 @@ ExecDelete(ModifyTableState *mtstate,
 		bool		dodelete;
 
 		dodelete = ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
-										tupleid, oldtuple, epqslot);
+										tupleid, oldtuple, epqreturnslot);
 
 		if (!dodelete)			/* "do nothing" */
 			return NULL;
@@ -724,6 +724,14 @@ ldelete:;
 					/* Tuple no more passing quals, exiting... */
 					return NULL;
 				}
+
+				/* Return the EPQ-rechecked tuple to the caller, if requested */
+				if (epqreturnslot)
+				{
+					*epqreturnslot = epqslot;
+					return NULL;
+				}
+
 				goto ldelete;
 			}
 		}
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index 1d316deed3..97aa26f5e0 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -476,6 +476,7 @@ extern TupleTableSlot *ExecCopySlot(TupleTableSlot *dstslot,
 
 extern void ExecForceStoreHeapTuple(HeapTuple tuple,
 			   TupleTableSlot *slot);
+extern void ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot);
 
 extern void slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum);
 extern Datum slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull);
-- 
2.18.0.windows.1

0003-FDW-RefetchForeignRow-API-prototype-change.patch (application/octet-stream)
From c1a1ca617f344b8e4d6094da50585783508de0c2 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 22 Aug 2018 16:45:02 +1000
Subject: [PATCH 3/3] FDW RefetchForeignRow API prototype change

With pluggable storage, the tuple usage is minimized
and all the external APIs must deal with TupleTableSlot.
---
 doc/src/sgml/fdwhandler.sgml        | 10 ++++++----
 src/backend/executor/execMain.c     | 16 ++++++++--------
 src/backend/executor/nodeLockRows.c | 20 +++++++-------------
 src/include/foreign/fdwapi.h        |  9 +++++----
 4 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 4ce88dd77c..12769f3288 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -988,23 +988,25 @@ GetForeignRowMarkType(RangeTblEntry *rte,
 
     <para>
 <programlisting>
-HeapTuple
+TupleTableSlot *
 RefetchForeignRow(EState *estate,
                   ExecRowMark *erm,
                   Datum rowid,
+                  TupleTableSlot *slot,
                   bool *updated);
 </programlisting>
 
-     Re-fetch one tuple from the foreign table, after locking it if required.
+     Re-fetch one tuple slot from the foreign table, after locking it if required.
      <literal>estate</literal> is global execution state for the query.
      <literal>erm</literal> is the <structname>ExecRowMark</structname> struct describing
      the target foreign table and the row lock type (if any) to acquire.
      <literal>rowid</literal> identifies the tuple to be fetched.
-     <literal>updated</literal> is an output parameter.
+     <literal>slot</literal> contains nothing useful upon call, but can be used to
+     hold the returned tuple. <literal>updated</literal> is an output parameter.
     </para>
 
     <para>
-     This function should return a palloc'ed copy of the fetched tuple,
+     This function should return a slot containing the fetched tuple
      or <literal>NULL</literal> if the row lock couldn't be obtained.  The row lock
      type to acquire is defined by <literal>erm-&gt;markType</literal>, which is the
      value previously returned by <function>GetForeignRowMarkType</function>.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index faeb960e1d..674569a586 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2705,23 +2705,24 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 			/* fetch requests on foreign tables must be passed to their FDW */
 			if (erm->relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
 			{
-				elog(ERROR, "frak, need to change fdw API");
-#ifdef FIXME
 				FdwRoutine *fdwroutine;
 				bool		updated = false;
 
 				fdwroutine = GetFdwRoutineForRelation(erm->relation, false);
+
 				/* this should have been checked already, but let's be safe */
 				if (fdwroutine->RefetchForeignRow == NULL)
 					ereport(ERROR,
 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 							 errmsg("cannot lock rows in foreign table \"%s\"",
 									RelationGetRelationName(erm->relation))));
-				tuple = fdwroutine->RefetchForeignRow(epqstate->estate,
-													  erm,
-													  datum,
-													  &updated);
-				if (tuple == NULL)
+
+				slot = fdwroutine->RefetchForeignRow(epqstate->estate,
+													 erm,
+													 datum,
+													 slot,
+													 &updated);
+				if (slot == NULL)
 					elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck");
 
 				/*
@@ -2729,7 +2730,6 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 				 * assumes that FDWs can track that exactly, which they might
 				 * not be able to.  So just ignore the flag.
 				 */
-#endif
 			}
 			else
 			{
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 668f5fa7a2..e52394a65c 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -128,33 +128,27 @@ lnext:
 		{
 			FdwRoutine *fdwroutine;
 			bool		updated = false;
-			HeapTuple	copyTuple;
-
-			elog(ERROR, "frak, tuple based API needs to be rewritten");
 
 			fdwroutine = GetFdwRoutineForRelation(erm->relation, false);
+
 			/* this should have been checked already, but let's be safe */
 			if (fdwroutine->RefetchForeignRow == NULL)
 				ereport(ERROR,
 						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 						 errmsg("cannot lock rows in foreign table \"%s\"",
 								RelationGetRelationName(erm->relation))));
-			copyTuple = fdwroutine->RefetchForeignRow(estate,
-													  erm,
-													  datum,
-													  &updated);
 
-			if (copyTuple == NULL)
+			markSlot = fdwroutine->RefetchForeignRow(estate,
+													 erm,
+													 datum,
+													 markSlot,
+													 &updated);
+			if (markSlot == NULL)
 			{
 				/* couldn't get the lock, so skip this row */
 				goto lnext;
 			}
 
-			elog(ERROR, "frak: slotify");
-
-			/* save locked tuple for possible EvalPlanQual testing below */
-			//*testTuple = copyTuple;
-
 			/*
 			 * if FDW says tuple was updated before getting locked, we need to
 			 * perform EPQ testing to see if quals are still satisfied
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index c14eb546c6..508b0eece8 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -121,10 +121,11 @@ typedef void (*EndDirectModify_function) (ForeignScanState *node);
 typedef RowMarkType (*GetForeignRowMarkType_function) (RangeTblEntry *rte,
 													   LockClauseStrength strength);
 
-typedef HeapTuple (*RefetchForeignRow_function) (EState *estate,
-												 ExecRowMark *erm,
-												 Datum rowid,
-												 bool *updated);
+typedef TupleTableSlot *(*RefetchForeignRow_function) (EState *estate,
+													   ExecRowMark *erm,
+													   Datum rowid,
+													   TupleTableSlot *slot,
+													   bool *updated);
 
 typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
 											 struct ExplainState *es);
-- 
2.18.0.windows.1

#17Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#16)
Re: Pluggable Storage - Andres's take

Hi,

Thanks for the patches!

On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote:

I found a couple of places where zheap is using some extra logic to verify
whether it is the zheap AM or not, and based on that it takes some extra
decisions. I am analyzing all the extra code, to see whether any callbacks
can handle it or not, and how. I can come back with more details later.

Yea, I think some of them will need to stay (particularly around
integrating undo) and some other ones we'll need to abstract.

And then:
- lotsa cleanups
- rebasing onto a newer version of the abstract slot patchset
- splitting out smaller patches

You'd moved the bulk insert into tableam callbacks - I don't quite get
why? There's not really anything AM specific in that code?

The main reason for adding them to the AM is just to provide control to
the specific AM to decide whether it can support the bulk insert or not.

The current framework doesn't support AM-specific bulk insert state being
passed from one function to another, and its structure is fixed. This needs
to be enhanced to add AM-specific private members also.

Do you want me to work on it to make it generic to AM methods to extend
the structure?

I think the best thing here would be to *remove* all AM abstraction for
bulk insert, until it's actually needed. The likelihood of us getting
the interface right and useful without an actual user seems low. Also,
this already is a huge patch...

@@ -308,7 +308,7 @@ static void CopyFromInsertBatch(CopyState cstate, EState *estate,
 					CommandId mycid, int hi_options,
 					ResultRelInfo *resultRelInfo,
 					BulkInsertState bistate,
-					int nBufferedTuples, TupleTableSlot **bufferedSlots,
+					int nBufferedSlots, TupleTableSlot **bufferedSlots,
 					uint64 firstBufferedLineNo);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
@@ -2309,11 +2309,12 @@ CopyFrom(CopyState cstate)
 	void	   *bistate;
 	uint64		processed = 0;
 	bool		useHeapMultiInsert;
-	int			nBufferedTuples = 0;
+	int			nBufferedSlots = 0;
 	int			prev_leaf_part_index = -1;
-#define MAX_BUFFERED_TUPLES 1000
+#define MAX_BUFFERED_SLOTS 1000

What's the point of these renames? We're still dealing in tuples. Just
seems to make the patch larger.

 					if (useHeapMultiInsert)
 					{
+					int			tup_size;
+
 					/* Add this tuple to the tuple buffer */
-					if (nBufferedTuples == 0)
+					if (nBufferedSlots == 0)
+					{
 						firstBufferedLineNo = cstate->cur_lineno;
-					Assert(bufferedSlots[nBufferedTuples] == myslot);
-					nBufferedTuples++;
+
+						/*
+						 * Find out the Tuple size of the first tuple in a batch and
+						 * use it for the rest tuples in a batch. There may be scenarios
+						 * where the first tuple is very small and rest can be large, but
+						 * that's rare and this should work for majority of the scenarios.
+						 */
+						tup_size = heap_compute_data_size(myslot->tts_tupleDescriptor,
+														  myslot->tts_values,
+														  myslot->tts_isnull);
+					}

This seems too expensive to me. I think it'd be better if we instead
used the amount of input data consumed for the tuple as a proxy. Does that
sound reasonable?

Greetings,

Andres Freund

#18Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#17)
Re: Pluggable Storage - Andres's take

On Tue, Sep 4, 2018 at 10:33 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

Thanks for the patches!

On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote:

I found a couple of places where zheap is using some extra logic to verify
whether it is the zheap AM or not, and based on that it takes some extra
decisions. I am analyzing all the extra code, to see whether any callbacks
can handle it or not, and how. I can come back with more details later.

Yea, I think some of them will need to stay (particularly around
integrating undo) and some other ones we'll need to abstract.

OK. I will list all the areas that I found, with my observations on how to
abstract each one or leave it, and then implement around that.

And then:
- lotsa cleanups
- rebasing onto a newer version of the abstract slot patchset
- splitting out smaller patches

You'd moved the bulk insert into tableam callbacks - I don't quite get
why? There's not really anything AM specific in that code?

The main reason for adding them to the AM is just to provide control to
the specific AM to decide whether it can support the bulk insert or not.

The current framework doesn't support AM-specific bulk insert state being
passed from one function to another, and its structure is fixed. This needs
to be enhanced to add AM-specific private members also.

Do you want me to work on it to make it generic to AM methods to extend
the structure?

I think the best thing here would be to *remove* all AM abstraction for
bulk insert, until it's actually needed. The likelihood of us getting
the interface right and useful without an actual user seems low. Also,
this already is a huge patch...

OK. Will remove them and share the patch.

@@ -308,7 +308,7 @@ static void CopyFromInsertBatch(CopyState cstate, EState *estate,
 					CommandId mycid, int hi_options,
 					ResultRelInfo *resultRelInfo,
 					BulkInsertState bistate,
-					int nBufferedTuples, TupleTableSlot **bufferedSlots,
+					int nBufferedSlots, TupleTableSlot **bufferedSlots,
 					uint64 firstBufferedLineNo);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
@@ -2309,11 +2309,12 @@ CopyFrom(CopyState cstate)
 	void	   *bistate;
 	uint64		processed = 0;
 	bool		useHeapMultiInsert;
-	int			nBufferedTuples = 0;
+	int			nBufferedSlots = 0;
 	int			prev_leaf_part_index = -1;
-#define MAX_BUFFERED_TUPLES 1000
+#define MAX_BUFFERED_SLOTS 1000

What's the point of these renames? We're still dealing in tuples. Just
seems to make the patch larger.

OK. I will correct it.

 					if (useHeapMultiInsert)
 					{
+					int			tup_size;
+
 					/* Add this tuple to the tuple buffer */
-					if (nBufferedTuples == 0)
+					if (nBufferedSlots == 0)
+					{
 						firstBufferedLineNo = cstate->cur_lineno;
-					Assert(bufferedSlots[nBufferedTuples] == myslot);
-					nBufferedTuples++;
+
+						/*
+						 * Find out the Tuple size of the first tuple in a batch and
+						 * use it for the rest tuples in a batch. There may be scenarios
+						 * where the first tuple is very small and rest can be large, but
+						 * that's rare and this should work for majority of the scenarios.
+						 */
+						tup_size = heap_compute_data_size(myslot->tts_tupleDescriptor,
+														  myslot->tts_values,
+														  myslot->tts_isnull);
+					}

This seems too expensive to me. I think it'd be better if we instead
used the amount of input data consumed for the tuple as a proxy. Does that
sound reasonable?

Yes, the cstate structure contains the line_buf member, which holds the
line length of the row; this can be used as the tuple length to limit the
memory usage. Comments?
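
For illustration, a minimal sketch of the flush condition that this gives
(the size cap below is just a placeholder value):

	/*
	 * Accumulate the raw input length of each buffered line as a cheap
	 * proxy for the tuple size.
	 */
	bufferedSlotsSize += cstate->line_buf.len;

	/* Flush on tuple count, or once the accumulated input gets large. */
	if (nBufferedTuples == MAX_BUFFERED_TUPLES ||
		bufferedSlotsSize > 65535)
	{
		CopyFromInsertBatch(cstate, estate, mycid, hi_options,
							resultRelInfo, bistate,
							nBufferedTuples, bufferedSlots,
							firstBufferedLineNo);
		nBufferedTuples = 0;
		bufferedSlotsSize = 0;
	}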

Regards,
Haribabu Kommi
Fujitsu Australia

#19Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#18)
3 attachment(s)
Re: Pluggable Storage - Andres's take

On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Tue, Sep 4, 2018 at 10:33 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

Thanks for the patches!

On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote:

I found a couple of places where zheap is using some extra logic to verify
whether it is the zheap AM or not, and based on that it takes some extra
decisions. I am analyzing all the extra code, to see whether any callbacks
can handle it or not, and how. I can come back with more details later.

Yea, I think some of them will need to stay (particularly around
integrating undo) and some other ones we'll need to abstract.

OK. I will list all the areas that I found, with my observations on how to
abstract each one or leave it, and then implement around that.

The following are the changes where the code specifically checks whether
it is a zheap relation or not.

Overall I found that it needs four new APIs, at the following locations:
1. RelationSetNewRelfilenode
2. heap_create_init_fork
3. estimate_rel_size
4. A facility to provide handler options (like skip WAL, etc.)
_hash_vacuum_one_page:
	xlrec.flags = RelationStorageIsZHeap(heapRel) ?
		XLOG_HASH_VACUUM_RELATION_STORAGE_ZHEAP : 0;

_bt_delitems_delete:
	xlrec_delete.flags = RelationStorageIsZHeap(heapRel) ?
		XLOG_BTREE_DELETE_RELATION_STORAGE_ZHEAP : 0;

Storing the type of the handler, and adding a new API for special handling
when checking for these new types, can remove the need for the above code.
RelationAddExtraBlocks:
	if (RelationStorageIsZHeap(relation))
	{
		ZheapInitPage(page, BufferGetPageSize(buffer));
		freespace = PageGetZHeapFreeSpace(page);
	}

Adding a new API for PageInit and PageGetHeapFreeSpace to redirect the calls
to the specific table AM handlers.

visibilitymap_set:
	if (RelationStorageIsZHeap(rel))
	{
		recptr = log_zheap_visible(rel->rd_node, heapBuf, vmBuf,
								   cutoff_xid, flags);
		/*
		 * We do not have a page wise visibility flag in zheap.
		 * So no need to set LSN on zheap page.
		 */
	}

Handler options may remove the need for the above code.

validate_index_heapscan:
	/* Set up for predicate or expression evaluation */
	/* For zheap relations, the tuple is locally allocated, so free it. */
	ExecStoreHeapTuple(heapTuple, slot, RelationStorageIsZHeap(heapRelation));

This will be solved after changing the validate_index_heapscan function to
be slot-based.

RelationTruncate:
	/* Create the meta page for zheap */
	if (RelationStorageIsZHeap(rel))
		RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
								  InvalidTransactionId,
								  InvalidMultiXactId);
	if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
		rel->rd_rel->relkind != 'p')
	{
		heap_create_init_fork(rel);
		if (RelationStorageIsZHeap(rel))
			ZheapInitMetaPage(rel, INIT_FORKNUM);
	}

A new API in RelationSetNewRelfilenode and heap_create_init_fork can solve it.

cluster:
	if (RelationStorageIsZHeap(rel))
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("cannot cluster a zheap table")));

No change required.

copyFrom:
	/*
	 * In zheap, we don't support the optimization for HEAP_INSERT_SKIP_WAL.
	 * See zheap_prepare_insert for details.
	 * PBORKED / ZBORKED: abstract
	 */
	if (!RelationStorageIsZHeap(cstate->rel) && !XLogIsNeeded())
		hi_options |= HEAP_INSERT_SKIP_WAL;

How about requesting the table AM handler to provide options and using them
here?
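
As a rough sketch of that idea (all names here are hypothetical, not
something present in the posted patches), the AM could expose a callback
that filters the insert options a caller wants to use:

	/* Hypothetical callback: let the AM adjust or veto insert options. */
	typedef int (*GetInsertOptions_function) (Relation rel,
											  int requested_options);

	static inline int
	table_get_insert_options(Relation rel, int requested_options)
	{
		return rel->rd_tableamroutine->get_insert_options(rel,
														  requested_options);
	}

A caller like CopyFrom() would then ask the AM for its options instead of
testing RelationStorageIsZHeap() directly, and a zheap implementation would
simply strip HEAP_INSERT_SKIP_WAL.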

ExecuteTruncateGuts:

	// PBORKED: Need to abstract this
	minmulti = GetOldestMultiXactId();

	/*
	 * Need the full transaction-safe pushups.
	 *
	 * Create a new empty storage file for the relation, and assign it
	 * as the relfilenode value. The old storage file is scheduled for
	 * deletion at commit.
	 *
	 * PBORKED: needs to be a callback
	 */
	if (RelationStorageIsZHeap(rel))
		RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
								  InvalidTransactionId, InvalidMultiXactId);
	else
		RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
								  RecentXmin, minmulti);
	if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
	{
		heap_create_init_fork(rel);
		if (RelationStorageIsZHeap(rel))
			ZheapInitMetaPage(rel, INIT_FORKNUM);
	}

New API inside RelationSetNewRelfilenode can handle it.

ATRewriteCatalogs:
	/* Inherit the storage_engine reloption from the parent table. */
	if (RelationStorageIsZHeap(rel))
	{
		static char *validnsps[] = HEAP_RELOPT_NAMESPACES;
		DefElem    *storage_engine;

		storage_engine = makeDefElemExtended("toast", "storage_engine",
											 (Node *) makeString("zheap"),
											 DEFELEM_UNSPEC, -1);
		reloptions = transformRelOptions((Datum) 0,
										 list_make1(storage_engine),
										 "toast",
										 validnsps, true, false);
	}

I don't think anything can be done in the API for this one.

ATRewriteTable:
	/*
	 * In zheap, we don't support the optimization for HEAP_INSERT_SKIP_WAL.
	 * See zheap_prepare_insert for details.
	 *
	 * ZFIXME / PFIXME: We probably need a different abstraction for this.
	 */
	if (!RelationStorageIsZHeap(newrel) && !XLogIsNeeded())
		hi_options |= HEAP_INSERT_SKIP_WAL;

Options can solve this also.

estimate_rel_size:
	if (curpages < 10 &&
		(rel->rd_rel->relpages == 0 ||
		 (RelationStorageIsZHeap(rel) &&
		  rel->rd_rel->relpages == ZHEAP_METAPAGE + 1)) &&
		!rel->rd_rel->relhassubclass &&
		rel->rd_rel->relkind != RELKIND_INDEX)
		curpages = 10;

	/* report estimated # pages */
	*pages = curpages;
	/* quick exit if rel is clearly empty */
	if (curpages == 0 || (RelationStorageIsZHeap(rel) &&
						  curpages == ZHEAP_METAPAGE + 1))
	{
		*tuples = 0;
		*allvisfrac = 0;
		break;
	}
	/* coerce values in pg_class to more desirable types */
	relpages = (BlockNumber) rel->rd_rel->relpages;
	reltuples = (double) rel->rd_rel->reltuples;
	relallvisible = (BlockNumber) rel->rd_rel->relallvisible;

	/*
	 * If it's a zheap relation, then subtract the pages
	 * to account for the metapage.
	 */
	if (relpages > 0 && RelationStorageIsZHeap(rel))
	{
		curpages--;
		relpages--;
	}

An API may be needed to find out the estimated size based on the handler type.
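
A sketch of what such an API could look like (the signature is illustrative
only, mirroring the outputs estimate_rel_size currently computes):

	/*
	 * Hypothetical callback: let the AM report its own size estimate, so
	 * estimate_rel_size() doesn't need to know about e.g. zheap's metapage.
	 */
	typedef void (*EstimateRelSize_function) (Relation rel,
											  int32 *attr_widths,
											  BlockNumber *pages,
											  double *tuples,
											  double *allvisfrac);
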
pg_stat_get_tuples_hot_updated and others:
	/*
	 * Counter tuples_hot_updated stores number of hot updates for heap table
	 * and the number of inplace updates for zheap table.
	 */
	if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL ||
		RelationStorageIsZHeap(rel))
		result = 0;
	else
		result = (int64) (tabentry->tuples_hot_updated);

Is the special condition needed? The values should be 0 because of zheap,
right?

RelationSetNewRelfilenode:
	/* Initialize the metapage for zheap relation. */
	if (RelationStorageIsZHeap(relation))
		ZheapInitMetaPage(relation, MAIN_FORKNUM);

A new API in RelationSetNewRelfilenode can solve this problem.
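
Since several of the items above come back to RelationSetNewRelfilenode()
and the init fork, a hedged sketch of the callback they could share (names
are illustrative only):

	/*
	 * Hypothetical callback, invoked after a new relfilenode (or init
	 * fork) has been created, so the AM can set up AM-specific content
	 * such as zheap's metapage.
	 */
	typedef void (*RelationInitNewFilenode_function) (Relation rel,
													  ForkNumber forknum);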

And then:
- lotsa cleanups
- rebasing onto a newer version of the abstract slot patchset
- splitting out smaller patches

You'd moved the bulk insert into tableam callbacks - I don't quite get
why? There's not really anything AM specific in that code?

The main reason for adding them to the AM is just to provide control to
the specific AM to decide whether it can support the bulk insert or not.

The current framework doesn't support AM-specific bulk insert state being
passed from one function to another, and its structure is fixed. This needs
to be enhanced to add AM-specific private members also.

Do you want me to work on it to make it generic to AM methods to extend
the structure?

I think the best thing here would be to *remove* all AM abstraction for
bulk insert, until it's actually needed. The likelihood of us getting
the interface right and useful without an actual user seems low. Also,
this already is a huge patch...

OK. Will remove them and share the patch.

Bulk insert API changes are removed.

@@ -308,7 +308,7 @@ static void CopyFromInsertBatch(CopyState cstate, EState *estate,
 					CommandId mycid, int hi_options,
 					ResultRelInfo *resultRelInfo,
 					BulkInsertState bistate,
-					int nBufferedTuples, TupleTableSlot **bufferedSlots,
+					int nBufferedSlots, TupleTableSlot **bufferedSlots,
 					uint64 firstBufferedLineNo);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
@@ -2309,11 +2309,12 @@ CopyFrom(CopyState cstate)
 	void	   *bistate;
 	uint64		processed = 0;
 	bool		useHeapMultiInsert;
-	int			nBufferedTuples = 0;
+	int			nBufferedSlots = 0;
 	int			prev_leaf_part_index = -1;
-#define MAX_BUFFERED_TUPLES 1000
+#define MAX_BUFFERED_SLOTS 1000

What's the point of these renames? We're still dealing in tuples. Just
seems to make the patch larger.

OK. I will correct it.

 					if (useHeapMultiInsert)
 					{
+					int			tup_size;
+
 					/* Add this tuple to the tuple buffer */
-					if (nBufferedTuples == 0)
+					if (nBufferedSlots == 0)
+					{
 						firstBufferedLineNo = cstate->cur_lineno;
-					Assert(bufferedSlots[nBufferedTuples] == myslot);
-					nBufferedTuples++;
+
+						/*
+						 * Find out the Tuple size of the first tuple in a batch and
+						 * use it for the rest tuples in a batch. There may be scenarios
+						 * where the first tuple is very small and rest can be large, but
+						 * that's rare and this should work for majority of the scenarios.
+						 */
+						tup_size = heap_compute_data_size(myslot->tts_tupleDescriptor,
+														  myslot->tts_values,
+														  myslot->tts_isnull);
+					}

This seems too expensive to me. I think it'd be better if we instead
used the amount of input data consumed for the tuple as a proxy. Does that
sound reasonable?

Yes, the cstate structure contains the line_buf member, which holds the
line length of the row; this can be used as the tuple length to limit the
memory usage. Comments?

Attached are the COPY FROM batch insert memory usage limit fix, and grammar
support for the USING method in CREATE TABLE AS as well.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0001-copy-memory-limit-fix.patch (application/octet-stream)
From 67018b04b7e11ec0f0644afbbd451f5fbaf0a6d6 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 29 Aug 2018 13:52:39 +1000
Subject: [PATCH 1/3] copy memory limit fix

To limit the memory used by COPY FROM after slotification,
accumulate the input line length of each buffered tuple and flush
the batch once the total grows large, so that the memory usage of
the COPY command stays bounded.
---
 src/backend/commands/copy.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c9272b344a..a82389b1a8 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2314,6 +2314,7 @@ CopyFrom(CopyState cstate)
 
 #define MAX_BUFFERED_TUPLES 1000
 	TupleTableSlot  **bufferedSlots = NULL;	/* initialize to silence warning */
+	Size		bufferedSlotsSize = 0;
 	uint64		firstBufferedLineNo = 0;
 
 	Assert(cstate->rel);
@@ -2753,24 +2754,26 @@ CopyFrom(CopyState cstate)
 					/* Add this tuple to the tuple buffer */
 					if (nBufferedTuples == 0)
 						firstBufferedLineNo = cstate->cur_lineno;
+
 					Assert(bufferedSlots[nBufferedTuples] == myslot);
 					nBufferedTuples++;
+					bufferedSlotsSize += cstate->line_buf.len;
 
 					/*
 					 * If the buffer filled up, flush it.  Also flush if the
 					 * total size of all the tuples in the buffer becomes
 					 * large, to avoid using large amounts of memory for the
 					 * buffer when the tuples are exceptionally wide.
-					 *
-					 * PBORKED: Re-introduce size limit
 					 */
-					if (nBufferedTuples == MAX_BUFFERED_TUPLES)
+					if (nBufferedTuples == MAX_BUFFERED_TUPLES ||
+						bufferedSlotsSize > 65535)
 					{
 						CopyFromInsertBatch(cstate, estate, mycid, hi_options,
 											resultRelInfo, bistate,
 											nBufferedTuples, bufferedSlots,
 											firstBufferedLineNo);
 						nBufferedTuples = 0;
+						bufferedSlotsSize = 0;
 					}
 				}
 				else
-- 
2.18.0.windows.1

0003-CREATE-AS-USING-method-grammer-support.patch (application/octet-stream)
From 8b59dbc51a15fe769e29f22d5049fa42bf8eebfc Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 10 Sep 2018 17:20:56 +1000
Subject: [PATCH 3/3] CREATE AS USING method grammar support

This change was missed during the earlier USING grammar support.
---
 src/backend/parser/gram.y     | 11 ++++++-----
 src/include/nodes/primnodes.h |  1 +
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 8f6f9ddae2..a9c5450a37 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4038,7 +4038,6 @@ CreateStatsStmt:
  *
  *****************************************************************************/
 
-// PBORKED: storage option
 CreateAsStmt:
 		CREATE OptTemp TABLE create_as_target AS SelectStmt opt_with_data
 				{
@@ -4069,14 +4068,16 @@ CreateAsStmt:
 		;
 
 create_as_target:
-			qualified_name opt_column_list OptWith OnCommitOption OptTableSpace
+			qualified_name opt_column_list table_access_method_clause
+			OptWith OnCommitOption OptTableSpace
 				{
 					$$ = makeNode(IntoClause);
 					$$->rel = $1;
 					$$->colNames = $2;
-					$$->options = $3;
-					$$->onCommit = $4;
-					$$->tableSpaceName = $5;
+					$$->accessMethod = $3;
+					$$->options = $4;
+					$$->onCommit = $5;
+					$$->tableSpaceName = $6;
 					$$->viewQuery = NULL;
 					$$->skipData = false;		/* might get changed later */
 				}
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..ffc788c4a3 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -108,6 +108,7 @@ typedef struct IntoClause
 
 	RangeVar   *rel;			/* target relation name */
 	List	   *colNames;		/* column names to assign, or NIL */
+	char	   *accessMethod;	/* table access method */
 	List	   *options;		/* options from WITH clause */
 	OnCommitAction onCommit;	/* what do we do at COMMIT? */
 	char	   *tableSpaceName; /* table space to use, or NULL */
-- 
2.18.0.windows.1

0002-Remove-of-Bulk-insert-state-API.patch (application/octet-stream)
From b01172f17c561b89c79a88733241e020ecf946e3 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 5 Sep 2018 17:18:09 +1000
Subject: [PATCH 2/3] Remove of Bulk insert state API

There is currently no requirement to expose the Bulk Insert state
APIs, as nothing makes use of them yet.
---
 src/backend/access/heap/heapam_handler.c |  4 ---
 src/backend/commands/copy.c              |  6 ++---
 src/backend/commands/createas.c          |  4 +--
 src/backend/commands/matview.c           |  4 +--
 src/backend/commands/tablecmds.c         |  4 +--
 src/include/access/tableam.h             | 32 ------------------------
 6 files changed, 9 insertions(+), 45 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 382148ff1d..2d5074734b 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1771,10 +1771,6 @@ static const TableAmRoutine heapam_methods = {
 	.relation_copy_for_cluster = heap_copy_for_cluster,
 	.relation_sync = heap_sync,
 
-	.getbulkinsertstate = GetBulkInsertState,
-	.freebulkinsertstate = FreeBulkInsertState,
-	.releasebulkinsertstate = ReleaseBulkInsertStatePin,
-
 	.begin_index_fetch = heapam_begin_index_fetch,
 	.reset_index_fetch = heapam_reset_index_fetch,
 	.end_index_fetch = heapam_end_index_fetch,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index a82389b1a8..49e654e4ee 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2546,7 +2546,7 @@ CopyFrom(CopyState cstate)
 	 */
 	ExecBSInsertTriggers(estate, resultRelInfo);
 
-	bistate = table_getbulkinsertstate(resultRelInfo->ri_RelationDesc);
+	bistate = GetBulkInsertState();
 	econtext = GetPerTupleExprContext(estate);
 
 	/* Set up callback to identify error line number */
@@ -2639,7 +2639,7 @@ CopyFrom(CopyState cstate)
 			 */
 			if (prev_leaf_part_index != leaf_part_index)
 			{
-				table_releasebulkinsertstate(resultRelInfo->ri_RelationDesc, bistate);
+				ReleaseBulkInsertStatePin(bistate);
 				prev_leaf_part_index = leaf_part_index;
 			}
 
@@ -2848,7 +2848,7 @@ next_tuple:
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	table_freebulkinsertstate(resultRelInfo->ri_RelationDesc, bistate);
+	FreeBulkInsertState(bistate);
 
 	MemoryContextSwitchTo(oldcontext);
 
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index f84ef0a65e..852c6becba 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -572,7 +572,7 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->hi_options = HEAP_INSERT_SKIP_FSM |
 		(XLogIsNeeded() ? 0 : HEAP_INSERT_SKIP_WAL);
-	myState->bistate = table_getbulkinsertstate(intoRelationDesc);
+	myState->bistate = GetBulkInsertState();
 
 	/* Not using WAL requires smgr_targblock be initially invalid */
 	Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
@@ -611,7 +611,7 @@ intorel_shutdown(DestReceiver *self)
 {
 	DR_intorel *myState = (DR_intorel *) self;
 
-	table_freebulkinsertstate(myState->rel, myState->bistate);
+	FreeBulkInsertState(myState->bistate);
 
 	/* If we skipped using WAL, must heap_sync before commit */
 	if (myState->hi_options & HEAP_INSERT_SKIP_WAL)
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 83ee2f725e..80828ed4a6 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -466,7 +466,7 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	myState->hi_options = HEAP_INSERT_SKIP_FSM | HEAP_INSERT_FROZEN;
 	if (!XLogIsNeeded())
 		myState->hi_options |= HEAP_INSERT_SKIP_WAL;
-	myState->bistate = table_getbulkinsertstate(transientrel);
+	myState->bistate = GetBulkInsertState();
 
 	/* Not using WAL requires smgr_targblock be initially invalid */
 	Assert(RelationGetTargetBlock(transientrel) == InvalidBlockNumber);
@@ -499,7 +499,7 @@ transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
 
-	table_freebulkinsertstate(myState->transientrel, myState->bistate);
+	FreeBulkInsertState(myState->bistate);
 
 	/* If we skipped using WAL, must heap_sync before commit */
 	if (myState->hi_options & HEAP_INSERT_SKIP_WAL)
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index ff6e4486f0..d44d865ec7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -4616,7 +4616,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	if (newrel)
 	{
 		mycid = GetCurrentCommandId(true);
-		bistate = table_getbulkinsertstate(newrel);
+		bistate = GetBulkInsertState();
 
 		hi_options = HEAP_INSERT_SKIP_FSM;
 		if (!XLogIsNeeded())
@@ -4901,7 +4901,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	heap_close(oldrel, NoLock);
 	if (newrel)
 	{
-		table_freebulkinsertstate(newrel, bistate);
+		FreeBulkInsertState(bistate);
 
 		/* If we skipped writing WAL, then we need to sync the heap. */
 		if (hi_options & HEAP_INSERT_SKIP_WAL)
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 6d50410166..5f6b39c0e0 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -125,10 +125,6 @@ typedef void (*RelationCopyForCluster_function)(Relation NewHeap, Relation OldHe
 
 typedef void (*RelationSync_function) (Relation relation);
 
-typedef BulkInsertState (*GetBulkInsertState_function) (void);
-typedef void (*FreeBulkInsertState_function) (BulkInsertState bistate);
-typedef void (*ReleaseBulkInsertState_function) (BulkInsertState bistate);
-
 typedef const TupleTableSlotOps* (*SlotCallbacks_function) (Relation relation);
 
 typedef TableScanDesc (*ScanBegin_function) (Relation relation,
@@ -217,10 +213,6 @@ typedef struct TableAmRoutine
 	RelationCopyForCluster_function relation_copy_for_cluster;
 	RelationSync_function relation_sync;
 
-	GetBulkInsertState_function getbulkinsertstate;
-	FreeBulkInsertState_function freebulkinsertstate;
-	ReleaseBulkInsertState_function releasebulkinsertstate;
-
 	/* Operations on relation scans */
 	ScanBegin_function scan_begin;
 	ScanSetlimits_function scansetlimits;
@@ -650,30 +642,6 @@ table_sync(Relation rel)
 	rel->rd_tableamroutine->relation_sync(rel);
 }
 
-/*
- * -------------------
- * storage Bulk Insert functions
- * -------------------
- */
-static inline BulkInsertState
-table_getbulkinsertstate(Relation rel)
-{
-	return rel->rd_tableamroutine->getbulkinsertstate();
-}
-
-static inline void
-table_freebulkinsertstate(Relation rel, BulkInsertState bistate)
-{
-	rel->rd_tableamroutine->freebulkinsertstate(bistate);
-}
-
-static inline void
-table_releasebulkinsertstate(Relation rel, BulkInsertState bistate)
-{
-	rel->rd_tableamroutine->releasebulkinsertstate(bistate);
-}
-
-
 static inline double
 table_index_build_scan(Relation heapRelation,
 					   Relation indexRelation,
-- 
2.18.0.windows.1

#20Amit Kapila
amit.kapila16@gmail.com
In reply to: Haribabu Kommi (#19)
Re: Pluggable Storage - Andres's take

On Mon, Sep 10, 2018 at 1:12 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

pg_stat_get_tuples_hot_updated and others:
/*
* Counter tuples_hot_updated stores number of hot updates for heap table
* and the number of inplace updates for zheap table.
*/
if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL ||
RelationStorageIsZHeap(rel))
result = 0;
else
result = (int64) (tabentry->tuples_hot_updated);

Is the special condition needed? The values should be 0 because of zheap, right?

I also think so. Beena/Mithun have worked on this part of the code, so
it is better if they also confirm once.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#21Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Amit Kapila (#20)
Re: Pluggable Storage - Andres's take

On Mon, Sep 10, 2018 at 7:33 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Sep 10, 2018 at 1:12 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

pg_stat_get_tuples_hot_updated and others:
/*
* Counter tuples_hot_updated stores number of hot updates for heap table
* and the number of inplace updates for zheap table.
*/
if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL ||
RelationStorageIsZHeap(rel))
result = 0;
else
result = (int64) (tabentry->tuples_hot_updated);

Is the special condition needed? The values should be 0 because of zheap, right?

I also think so. Beena/Mithun have worked on this part of the code, so
it is better if they also confirm once.

Yes, pg_stat_get_tuples_hot_updated should return 0 for zheap.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#22Beena Emerson
memissemerson@gmail.com
In reply to: Amit Kapila (#20)
Re: Pluggable Storage - Andres's take

Hello,

On Mon, 10 Sep 2018, 19:33 Amit Kapila, <amit.kapila16@gmail.com> wrote:

On Mon, Sep 10, 2018 at 1:12 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

pg_stat_get_tuples_hot_updated and others:
/*
* Counter tuples_hot_updated stores number of hot updates for heap table
* and the number of inplace updates for zheap table.
*/
if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL ||
RelationStorageIsZHeap(rel))
result = 0;
else
result = (int64) (tabentry->tuples_hot_updated);

Is the special condition needed? The values should be 0 because of
zheap, right?

I also think so. Beena/Mithun have worked on this part of the code, so
it is better if they also confirm once.

We have used the hot_updated counter to count the number of inplace updates
for zheap, to avoid introducing a new counter. Though, technically, hot
updates are 0 for zheap, the counter could hold a non-zero value indicating
the inplace updates.

Thank you

#23Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#19)
Re: Pluggable Storage - Andres's take

On Mon, Sep 10, 2018 at 5:42 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Tue, Sep 4, 2018 at 10:33 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

Thanks for the patches!

On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote:

I found a couple of places where zheap is using some extra logic to verify
whether it is the zheap AM or not, and based on that it takes some extra
decisions. I am analyzing all the extra code, to see whether any callbacks
can handle it or not, and how. I can come back with more details later.

Yea, I think some of them will need to stay (particularly around
integrating undo) and some other ones we'll need to abstract.

OK. I will list all the areas that I found, with my observations on how to
abstract each one or leave it, and then implement around that.

The following are the changes where the code specifically checks whether
it is a zheap relation or not.

Overall I found that it needs four new APIs, at the following locations:
1. RelationSetNewRelfilenode
2. heap_create_init_fork
3. estimate_rel_size
4. A facility to provide handler options (like skip WAL, etc.)

During the porting of the Fujitsu in-memory columnar store on top of
pluggable storage, I found that the callers of "heap_beginscan" expect
that the returned data always contains all the records.

For example, in a sequential scan, the heap returns the slot with
the tuple or with a values array of all the columns; the data then gets
filtered, and the unnecessary columns are later removed with projection.
This works fine for row-based storage. For columnar storage, if
the storage knows that the upper layers need only particular columns,
then it can directly return the specified columns and there is no
need for the projection step. This will also help the columnar storage
to return the proper columns in a faster way.

Is it good to pass the plan to the storage, so that it can find out
the columns that need to be returned? And if the projection can be
handled in the storage itself for some scenarios, the callers need to
be informed that there is no need to perform the extra projection.

comments?

Regards,
Haribabu Kommi
Fujitsu Australia

#24Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#23)
Re: Pluggable Storage - Andres's take

Hi,

On 2018-09-21 16:57:43 +1000, Haribabu Kommi wrote:

During the porting of the Fujitsu in-memory columnar store on top of
pluggable storage, I found that the callers of "heap_beginscan" expect
that the returned data always contains all the records.

Right.

For example, in a sequential scan, the heap returns the slot with
the tuple or with a values array of all the columns; the data then gets
filtered, and the unnecessary columns are later removed with projection.
This works fine for row-based storage. For columnar storage, if
the storage knows that the upper layers need only particular columns,
then it can directly return the specified columns and there is no
need for the projection step. This will also help the columnar storage
to return the proper columns in a faster way.

I think this is an important feature, but I feel fairly strongly that we
should only tackle it in a second version. This patchset is already
pretty darn large. It's imo not just helpful for columnar, but even for
heap - we e.g. spend a lot of time deforming columns that are never
accessed. That's particularly harmful when the leading columns are all
NOT NULL and fixed width, but even if not, it's painful.

Is it good to pass the plan to the storage, so that it can find out
the columns that need to be returned?

I don't think that's the right approach - this should be a level *below*
plan nodes, not reference them. I suspect we're going to have to have a
new table_scan_set_columnlist() option or such.
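
Purely to illustrate the shape (nothing like this exists in the posted
patches; all names are made up), such an option might look like:

	/* Hypothetical: tell the AM which columns the scan will actually read. */
	typedef void (*ScanSetColumnlist_function) (TableScanDesc scan,
												Bitmapset *needed_columns);

	static inline void
	table_scan_set_columnlist(Relation rel, TableScanDesc scan,
							  Bitmapset *needed_columns)
	{
		rel->rd_tableamroutine->scan_set_columnlist(scan, needed_columns);
	}

A row store could use this to skip deforming trailing columns, and a column
store could avoid reading the unneeded column files at all.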

And if the projection can be handled in the storage itself for some
scenarios, the callers need to be informed that there is no need to
perform the extra projection.

I don't think that should be done in the storage layer - that's probably
better done by introducing custom scan nodes and such. This has costing
implications etc, so this needs to happen *before* planning is finished.

Greetings,

Andres Freund

#25Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#24)
Re: Pluggable Storage - Andres's take

On Fri, Sep 21, 2018 at 5:05 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-09-21 16:57:43 +1000, Haribabu Kommi wrote:

For example, in a sequential scan, the heap returns the slot with
the tuple or with a values array of all the columns; the data then gets
filtered, and the unnecessary columns are later removed with projection.
This works fine for row-based storage. For columnar storage, if
the storage knows that the upper layers need only particular columns,
then it can directly return the specified columns and there is no
need for the projection step. This will also help the columnar storage
to return the proper columns in a faster way.

I think this is an important feature, but I feel fairly strongly that we
should only tackle it in a second version. This patchset is already
pretty darn large. It's imo not just helpful for columnar, but even for
heap - we e.g. spend a lot of time deforming columns that are never
accessed. That's particularly harmful when the leading columns are all
NOT NULL and fixed width, but even if not, it's painful.

OK. Thanks for your opinion.
Then I will first try to clean up the open items of the existing patch.

Is it good to pass the plan to the storage, so that it can find out
the columns that need to be returned?

I don't think that's the right approach - this should be a level *below*
plan nodes, not reference them. I suspect we're going to have to have a
new table_scan_set_columnlist() option or such.

The table_scan_set_columnlist() API can be a good solution to share
the columns that are expected.

And if the projection can be handled in the storage itself for some
scenarios, the callers need to be informed that there is no need to
perform the extra projection.

I don't think that should be done in the storage layer - that's probably
better done by introducing custom scan nodes and such. This has costing
implications etc, so this needs to happen *before* planning is finished.

Sorry, my explanation was wrong. Assume a scenario where the target list
contains only plain columns of a table, these columns are already passed
to the storage using the above proposed new API, and there is a one-to-one
mapping between them. Based on that information, deciding whether the
projection is required or not is good.
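
To make that concrete, a hedged sketch of the kind of check that would let
the executor skip the projection step (similar in spirit to the
tlist-matching the executor already does when setting up scan projections):

	/*
	 * Sketch: projection can be skipped when every target entry is a plain
	 * Var referencing the scanned columns in order (a one-to-one mapping).
	 */
	static bool
	tlist_is_one_to_one(List *targetlist)
	{
		AttrNumber	expected = 1;
		ListCell   *lc;

		foreach(lc, targetlist)
		{
			TargetEntry *tle = (TargetEntry *) lfirst(lc);
			Var		   *var = (Var *) tle->expr;

			if (!IsA(tle->expr, Var) || var->varattno != expected++)
				return false;
		}
		return true;
	}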

Regards,
Haribabu Kommi
Fujitsu Australia

#26Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Andres Freund (#14)
Re: Pluggable Storage - Andres's take

On Fri, Aug 24, 2018 at 5:50 AM Andres Freund <andres@anarazel.de> wrote:

I've pushed a current version of that to my git tree to the
pluggable-storage branch. It's not really a version that I think makes
sense to review or such, but it's probably more useful if you work based
on that. There's also the pluggable-zheap branch, which I found
extremely useful to develop against.

BTW, I'm going to take a look at the current shape of this patch and share
my thoughts. But where are the branches you're referring to? On your
postgres.org git repository the pluggable-storage branch was last updated
at June 7. And on github the branches were updated at August 5
and 14, which is still much older than your email (August 24)...

1. https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/pluggable-storage
2. https://github.com/anarazel/postgres-pluggable-storage
3. https://github.com/anarazel/postgres-pluggable-storage/tree/pluggable-zheap

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#27Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Alexander Korotkov (#26)
Re: Pluggable Storage - Andres's take

On Mon, Sep 24, 2018 at 5:02 AM Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:

On Fri, Aug 24, 2018 at 5:50 AM Andres Freund <andres@anarazel.de> wrote:

I've pushed a current version of that to my git tree to the
pluggable-storage branch. It's not really a version that I think makes
sense to review or such, but it's probably more useful if you work based
on that. There's also the pluggable-zheap branch, which I found
extremely useful to develop against.

BTW, I'm going to take a look at the current shape of this patch and share
my thoughts. But where are the branches you're referring to? On your
postgres.org git repository the pluggable-storage branch was last updated
at June 7. And on github the branches were updated at August 5
and 14, which is still much older than your email (August 24)...

The code is the latest, but the commit time is older; I feel that is because
of a commit squash.

pluggable-storage is the branch where the pluggable storage code is present,
and pluggable-zheap is the branch where zheap is rebased on top of pluggable
storage.

Regards,
Haribabu Kommi
Fujitsu Australia

#28Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Haribabu Kommi (#27)
Re: Pluggable Storage - Andres's take

On Mon, Sep 24, 2018 at 8:04 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

On Mon, Sep 24, 2018 at 5:02 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

On Fri, Aug 24, 2018 at 5:50 AM Andres Freund <andres@anarazel.de> wrote:

I've pushed a current version of that to my git tree to the
pluggable-storage branch. It's not really a version that I think makes
sense to review or such, but it's probably more useful if you work based
on that. There's also the pluggable-zheap branch, which I found
extremely useful to develop against.

BTW, I'm going to take a look at the current shape of this patch and share
my thoughts. But where are the branches you're referring to? On your
postgres.org git repository the pluggable-storage branch was last updated
at June 7. And on github the branches were updated at August 5
and 14, which is still much older than your email (August 24)...

The code is the latest, but the commit time is older; I feel that is because
of a commit squash.

pluggable-storage is the branch where the pluggable storage code is present,
and pluggable-zheap is the branch where zheap is rebased on top of pluggable
storage.

Got it, thanks!

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#29Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#25)
3 attachment(s)
Re: Pluggable Storage - Andres's take

On Fri, Sep 21, 2018 at 5:40 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Fri, Sep 21, 2018 at 5:05 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-09-21 16:57:43 +1000, Haribabu Kommi wrote:

For example, in a sequential scan, the heap returns the slot with
the tuple or with a values array of all the columns; the data then gets
filtered, and the unnecessary columns are later removed with projection.
This works fine for row-based storage. For columnar storage, if
the storage knows that the upper layers need only particular columns,
then it can directly return the specified columns and there is no
need for the projection step. This will also help the columnar storage
to return the proper columns in a faster way.

I think this is an important feature, but I feel fairly strongly that we
should only tackle it in a second version. This patchset is already
pretty darn large. It's imo not just helpful for columnar, but even for
heap - we e.g. spend a lot of time deforming columns that are never
accessed. That's particularly harmful when the leading columns are all
NOT NULL and fixed width, but even if not, it's painful.

OK. Thanks for your opinion.
Then I will first try to clean up the open items of the existing patch.

Here I attached further cleanup patches.
1. Re-arrange the GUC variable.
2. Added a check function hook for the default_table_access_method GUC.
3. Added a new hook, validate_index. I tried to change the function
validate_index_heapscan to be slot-based, but that has many problems, as
it accesses some internals of the HeapScanDesc structure, the buffer, etc.

So I added a new hook and provided a callback to handle the index insert.
Please check and let me know your comments.

I will further add the new APIs that were discussed for zheap storage and
share the patch.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0003-validate-index-scan-hook-addition.patch (application/octet-stream)
From 3a783b0e62c6f93eba808e6a3c6be3c479484a5d Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Fri, 28 Sep 2018 11:25:07 +1000
Subject: [PATCH 3/3] validate index scan hook addition

Slotifying validate_index has problems, as it tries to access
the buffer stored in the scan descriptor, so a callback was added to
hand control back.

This may need revisiting, as the callback may need further
abstraction
---
 src/backend/access/heap/heapam_handler.c | 243 ++++++++++++++++-
 src/backend/catalog/index.c              | 318 +++--------------------
 src/include/access/tableam.h             |  27 ++
 src/include/catalog/index.h              |  48 ++++
 4 files changed, 352 insertions(+), 284 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 2d5074734b..ee8a658c6d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1120,6 +1120,246 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 	return reltuples;
 }
 
+/*
+ * validate_index_heapscan - second table scan for concurrent index build
+ *
+ * This has much code in common with IndexBuildHeapScan, but it's enough
+ * different that it seems cleaner to have two routines not one.
+ */
+static uint64
+validate_index_heapscan(Relation heapRelation,
+						Relation indexRelation,
+						IndexInfo *indexInfo,
+						Snapshot snapshot,
+						Tuplesortstate *tuplesort,
+						IndexValidateCallback callback,
+						void *callback_state)
+{
+	TableScanDesc sscan;
+	HeapScanDesc scan;
+	HeapTuple	heapTuple;
+	Datum		values[INDEX_MAX_KEYS];
+	bool		isnull[INDEX_MAX_KEYS];
+	ExprState  *predicate;
+	TupleTableSlot *slot;
+	EState	   *estate;
+	ExprContext *econtext;
+	BlockNumber root_blkno = InvalidBlockNumber;
+	OffsetNumber root_offsets[MaxHeapTuplesPerPage];
+	bool		in_index[MaxHeapTuplesPerPage];
+
+	/* state variables for the merge */
+	ItemPointer indexcursor = NULL;
+	ItemPointerData decoded;
+	bool		tuplesort_empty = false;
+	uint64 		nhtups = 0;
+
+	/*
+	 * sanity checks
+	 */
+	Assert(OidIsValid(indexRelation->rd_rel->relam));
+
+	/*
+	 * Need an EState for evaluation of index expressions and partial-index
+	 * predicates.  Also a slot to hold the current tuple.
+	 */
+	estate = CreateExecutorState();
+	econtext = GetPerTupleExprContext(estate);
+	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
+									&TTSOpsHeapTuple);
+
+	/* Arrange for econtext's scan tuple to be the tuple under test */
+	econtext->ecxt_scantuple = slot;
+
+	/* Set up execution state for predicate, if any. */
+	predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
+
+	/*
+	 * Prepare for scan of the base relation.  We need just those tuples
+	 * satisfying the passed-in reference snapshot.  We must disable syncscan
+	 * here, because it's critical that we read from block zero forward to
+	 * match the sorted TIDs.
+	 */
+	sscan = heap_beginscan(heapRelation,	/* relation */
+						  snapshot,	/* snapshot */
+						  0,	/* number of keys */
+						  NULL,	/* scan key */
+						  NULL,
+						  true,	/* buffer access strategy OK */
+						  false,	/* syncscan not OK */
+						  true,
+						  false,
+						  false,
+						  false);
+
+	scan = (HeapScanDesc)sscan;
+
+	/*
+	 * Scan all tuples matching the snapshot.
+	 */
+	while ((heapTuple = heap_getnext(sscan, ForwardScanDirection)) != NULL)
+	{
+		ItemPointer heapcursor = &heapTuple->t_self;
+		ItemPointerData rootTuple;
+		OffsetNumber root_offnum;
+
+		CHECK_FOR_INTERRUPTS();
+
+		nhtups += 1;
+
+		/*
+		 * As commented in IndexBuildHeapScan, we should index heap-only
+		 * tuples under the TIDs of their root tuples; so when we advance onto
+		 * a new heap page, build a map of root item offsets on the page.
+		 *
+		 * This complicates merging against the tuplesort output: we will
+		 * visit the live tuples in order by their offsets, but the root
+		 * offsets that we need to compare against the index contents might be
+		 * ordered differently.  So we might have to "look back" within the
+		 * tuplesort output, but only within the current page.  We handle that
+		 * by keeping a bool array in_index[] showing all the
+		 * already-passed-over tuplesort output TIDs of the current page. We
+		 * clear that array here, when advancing onto a new heap page.
+		 */
+		if (scan->rs_cblock != root_blkno)
+		{
+			Page		page = BufferGetPage(scan->rs_cbuf);
+
+			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+			heap_get_root_tuples(page, root_offsets);
+			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+
+			memset(in_index, 0, sizeof(in_index));
+
+			root_blkno = scan->rs_cblock;
+		}
+
+		/* Convert actual tuple TID to root TID */
+		rootTuple = *heapcursor;
+		root_offnum = ItemPointerGetOffsetNumber(heapcursor);
+
+		if (HeapTupleIsHeapOnly(heapTuple))
+		{
+			root_offnum = root_offsets[root_offnum - 1];
+			if (!OffsetNumberIsValid(root_offnum))
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
+										 ItemPointerGetBlockNumber(heapcursor),
+										 ItemPointerGetOffsetNumber(heapcursor),
+										 RelationGetRelationName(heapRelation))));
+			ItemPointerSetOffsetNumber(&rootTuple, root_offnum);
+		}
+
+		/*
+		 * "merge" by skipping through the index tuples until we find or pass
+		 * the current root tuple.
+		 */
+		while (!tuplesort_empty &&
+			   (!indexcursor ||
+				ItemPointerCompare(indexcursor, &rootTuple) < 0))
+		{
+			Datum		ts_val;
+			bool		ts_isnull;
+
+			if (indexcursor)
+			{
+				/*
+				 * Remember index items seen earlier on the current heap page
+				 */
+				if (ItemPointerGetBlockNumber(indexcursor) == root_blkno)
+					in_index[ItemPointerGetOffsetNumber(indexcursor) - 1] = true;
+			}
+
+			tuplesort_empty = !tuplesort_getdatum(tuplesort, true,
+												  &ts_val, &ts_isnull, NULL);
+			Assert(tuplesort_empty || !ts_isnull);
+			if (!tuplesort_empty)
+			{
+				itemptr_decode(&decoded, DatumGetInt64(ts_val));
+				indexcursor = &decoded;
+
+				/* If int8 is pass-by-ref, free (encoded) TID Datum memory */
+#ifndef USE_FLOAT8_BYVAL
+				pfree(DatumGetPointer(ts_val));
+#endif
+			}
+			else
+			{
+				/* Be tidy */
+				indexcursor = NULL;
+			}
+		}
+
+		/*
+		 * If the tuplesort has overshot *and* we didn't see a match earlier,
+		 * then this tuple is missing from the index, so insert it.
+		 */
+		if ((tuplesort_empty ||
+			 ItemPointerCompare(indexcursor, &rootTuple) > 0) &&
+			!in_index[root_offnum - 1])
+		{
+			MemoryContextReset(econtext->ecxt_per_tuple_memory);
+
+			/* Set up for predicate or expression evaluation */
+			ExecStoreHeapTuple(heapTuple, slot, false);
+
+			/*
+			 * In a partial index, discard tuples that don't satisfy the
+			 * predicate.
+			 */
+			if (predicate != NULL)
+			{
+				if (!ExecQual(predicate, econtext))
+					continue;
+			}
+
+			/*
+			 * For the current heap tuple, extract all the attributes we use
+			 * in this index, and note which are null.  This also performs
+			 * evaluation of any expressions needed.
+			 */
+			FormIndexDatum(indexInfo,
+						   slot,
+						   estate,
+						   values,
+						   isnull);
+
+			/*
+			 * You'd think we should go ahead and build the index tuple here,
+			 * but some index AMs want to do further processing on the data
+			 * first. So pass the values[] and isnull[] arrays, instead.
+			 */
+
+			/*
+			 * If the tuple is already committed dead, you might think we
+			 * could suppress uniqueness checking, but this is no longer true
+			 * in the presence of HOT, because the insert is actually a proxy
+			 * for a uniqueness check on the whole HOT-chain.  That is, the
+			 * tuple we have here could be dead because it was already
+			 * HOT-updated, and if so the updating transaction will not have
+			 * thought it should insert index entries.  The index AM will
+			 * check the whole HOT-chain and correctly detect a conflict if
+			 * there is one.
+			 */
+
+			callback(indexRelation, values, isnull, &rootTuple, heapRelation,
+							indexInfo, callback_state);
+		}
+	}
+
+	table_endscan(sscan);
+
+	ExecDropSingleTupleTableSlot(slot);
+
+	FreeExecutorState(estate);
+
+	/* These may have been pointing to the now-gone estate */
+	indexInfo->ii_ExpressionsState = NIL;
+	indexInfo->ii_PredicateState = NULL;
+
+	return nhtups;
+}
 
 static bool
 heapam_scan_bitmap_pagescan(TableScanDesc sscan,
@@ -1775,7 +2015,8 @@ static const TableAmRoutine heapam_methods = {
 	.reset_index_fetch = heapam_reset_index_fetch,
 	.end_index_fetch = heapam_end_index_fetch,
 
-	.index_build_range_scan = IndexBuildHeapRangeScan
+	.index_build_range_scan = IndexBuildHeapRangeScan,
+	.validate_index_scan = validate_index_heapscan
 };
 
 const TableAmRoutine *
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 2fe66972a1..a0096e60ca 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -81,7 +81,7 @@
 /* Potentially set by pg_upgrade_support functions */
 Oid			binary_upgrade_next_index_pg_class_oid = InvalidOid;
 
-/* state info for validate_index bulkdelete callback */
+/*  state info for validate_index bulkdelete callback */
 typedef struct
 {
 	Tuplesortstate *tuplesort;	/* for sorting the index TIDs */
@@ -134,11 +134,13 @@ static void IndexCheckExclusion(Relation heapRelation,
 static inline int64 itemptr_encode(ItemPointer itemptr);
 static inline void itemptr_decode(ItemPointer itemptr, int64 encoded);
 static bool validate_index_callback(ItemPointer itemptr, void *opaque);
-static void validate_index_heapscan(Relation heapRelation,
-						Relation indexRelation,
-						IndexInfo *indexInfo,
-						Snapshot snapshot,
-						v_i_state *state);
+static void validate_index_scan_callback(Relation indexRelation,
+					Datum *values,
+					bool *isnull,
+					ItemPointer rootTuple,
+					Relation heapRelation,
+					IndexInfo *indexInfo,
+					void *opaque);
 static bool ReindexIsCurrentlyProcessingIndex(Oid indexOid);
 static void SetReindexProcessing(Oid heapOid, Oid indexOid);
 static void ResetReindexProcessing(void);
@@ -2638,11 +2640,13 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	/*
 	 * Now scan the heap and "merge" it with the index
 	 */
-	validate_index_heapscan(heapRelation,
-							indexRelation,
-							indexInfo,
-							snapshot,
-							&state);
+	state.htups = table_validate_index(heapRelation,
+									   indexRelation,
+									   indexInfo,
+									   snapshot,
+									   state.tuplesort,
+									   validate_index_scan_callback,
+									   &state);
 
 	/* Done with tuplesort object */
 	tuplesort_end(state.tuplesort);
@@ -2662,45 +2666,6 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	heap_close(heapRelation, NoLock);
 }
 
-/*
- * itemptr_encode - Encode ItemPointer as int64/int8
- *
- * This representation must produce values encoded as int64 that sort in the
- * same order as their corresponding original TID values would (using the
- * default int8 opclass to produce a result equivalent to the default TID
- * opclass).
- *
- * As noted in validate_index(), this can be significantly faster.
- */
-static inline int64
-itemptr_encode(ItemPointer itemptr)
-{
-	BlockNumber block = ItemPointerGetBlockNumber(itemptr);
-	OffsetNumber offset = ItemPointerGetOffsetNumber(itemptr);
-	int64		encoded;
-
-	/*
-	 * Use the 16 least significant bits for the offset.  32 adjacent bits are
-	 * used for the block number.  Since remaining bits are unused, there
-	 * cannot be negative encoded values (We assume a two's complement
-	 * representation).
-	 */
-	encoded = ((uint64) block << 16) | (uint16) offset;
-
-	return encoded;
-}
-
-/*
- * itemptr_decode - Decode int64/int8 representation back to ItemPointer
- */
-static inline void
-itemptr_decode(ItemPointer itemptr, int64 encoded)
-{
-	BlockNumber block = (BlockNumber) (encoded >> 16);
-	OffsetNumber offset = (OffsetNumber) (encoded & 0xFFFF);
-
-	ItemPointerSet(itemptr, block, offset);
-}
 
 /*
  * validate_index_callback - bulkdelete callback to collect the index TIDs
@@ -2717,242 +2682,29 @@ validate_index_callback(ItemPointer itemptr, void *opaque)
 }
 
 /*
- * validate_index_heapscan - second table scan for concurrent index build
- *
- * This has much code in common with IndexBuildHeapScan, but it's enough
- * different that it seems cleaner to have two routines not one.
+ * validate_index_scan_callback - callback to insert into the index
  */
 static void
-validate_index_heapscan(Relation heapRelation,
-						Relation indexRelation,
-						IndexInfo *indexInfo,
-						Snapshot snapshot,
-						v_i_state *state)
+validate_index_scan_callback(Relation indexRelation,
+							Datum *values,
+							bool *isnull,
+							ItemPointer rootTuple,
+							Relation heapRelation,
+							IndexInfo *indexInfo,
+							void *opaque)
 {
-	TableScanDesc sscan;
-	HeapScanDesc scan;
-	HeapTuple	heapTuple;
-	Datum		values[INDEX_MAX_KEYS];
-	bool		isnull[INDEX_MAX_KEYS];
-	ExprState  *predicate;
-	TupleTableSlot *slot;
-	EState	   *estate;
-	ExprContext *econtext;
-	BlockNumber root_blkno = InvalidBlockNumber;
-	OffsetNumber root_offsets[MaxHeapTuplesPerPage];
-	bool		in_index[MaxHeapTuplesPerPage];
-
-	/* state variables for the merge */
-	ItemPointer indexcursor = NULL;
-	ItemPointerData decoded;
-	bool		tuplesort_empty = false;
-
-	/*
-	 * sanity checks
-	 */
-	Assert(OidIsValid(indexRelation->rd_rel->relam));
-
-	/*
-	 * Need an EState for evaluation of index expressions and partial-index
-	 * predicates.  Also a slot to hold the current tuple.
-	 */
-	estate = CreateExecutorState();
-	econtext = GetPerTupleExprContext(estate);
-	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
-									&TTSOpsHeapTuple);
-
-	/* Arrange for econtext's scan tuple to be the tuple under test */
-	econtext->ecxt_scantuple = slot;
-
-	/* Set up execution state for predicate, if any. */
-	predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
-
-	/*
-	 * Prepare for scan of the base relation.  We need just those tuples
-	 * satisfying the passed-in reference snapshot.  We must disable syncscan
-	 * here, because it's critical that we read from block zero forward to
-	 * match the sorted TIDs.
-	 */
-	sscan = table_beginscan_strat(heapRelation,	/* relation */
-								   snapshot,	/* snapshot */
-								   0,	/* number of keys */
-								   NULL,	/* scan key */
-								   true,	/* buffer access strategy OK */
-								   false);	/* syncscan not OK */
-	scan = (HeapScanDesc) sscan;
-
-	/*
-	 * Scan all tuples matching the snapshot.
-	 */
-	// PBORKED: slotify
-	while ((heapTuple = heap_scan_getnext(sscan, ForwardScanDirection)) != NULL)
-	{
-		ItemPointer heapcursor = &heapTuple->t_self;
-		ItemPointerData rootTuple;
-		OffsetNumber root_offnum;
-
-		CHECK_FOR_INTERRUPTS();
-
-		state->htups += 1;
-
-		/*
-		 * As commented in IndexBuildHeapScan, we should index heap-only
-		 * tuples under the TIDs of their root tuples; so when we advance onto
-		 * a new heap page, build a map of root item offsets on the page.
-		 *
-		 * This complicates merging against the tuplesort output: we will
-		 * visit the live tuples in order by their offsets, but the root
-		 * offsets that we need to compare against the index contents might be
-		 * ordered differently.  So we might have to "look back" within the
-		 * tuplesort output, but only within the current page.  We handle that
-		 * by keeping a bool array in_index[] showing all the
-		 * already-passed-over tuplesort output TIDs of the current page. We
-		 * clear that array here, when advancing onto a new heap page.
-		 */
-		if (scan->rs_cblock != root_blkno)
-		{
-			Page		page = BufferGetPage(scan->rs_cbuf);
-
-			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
-			heap_get_root_tuples(page, root_offsets);
-			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-			memset(in_index, 0, sizeof(in_index));
-
-			root_blkno = scan->rs_cblock;
-		}
-
-		/* Convert actual tuple TID to root TID */
-		rootTuple = *heapcursor;
-		root_offnum = ItemPointerGetOffsetNumber(heapcursor);
-
-		if (HeapTupleIsHeapOnly(heapTuple))
-		{
-			root_offnum = root_offsets[root_offnum - 1];
-			if (!OffsetNumberIsValid(root_offnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
-										 ItemPointerGetBlockNumber(heapcursor),
-										 ItemPointerGetOffsetNumber(heapcursor),
-										 RelationGetRelationName(heapRelation))));
-			ItemPointerSetOffsetNumber(&rootTuple, root_offnum);
-		}
-
-		/*
-		 * "merge" by skipping through the index tuples until we find or pass
-		 * the current root tuple.
-		 */
-		while (!tuplesort_empty &&
-			   (!indexcursor ||
-				ItemPointerCompare(indexcursor, &rootTuple) < 0))
-		{
-			Datum		ts_val;
-			bool		ts_isnull;
-
-			if (indexcursor)
-			{
-				/*
-				 * Remember index items seen earlier on the current heap page
-				 */
-				if (ItemPointerGetBlockNumber(indexcursor) == root_blkno)
-					in_index[ItemPointerGetOffsetNumber(indexcursor) - 1] = true;
-			}
-
-			tuplesort_empty = !tuplesort_getdatum(state->tuplesort, true,
-												  &ts_val, &ts_isnull, NULL);
-			Assert(tuplesort_empty || !ts_isnull);
-			if (!tuplesort_empty)
-			{
-				itemptr_decode(&decoded, DatumGetInt64(ts_val));
-				indexcursor = &decoded;
-
-				/* If int8 is pass-by-ref, free (encoded) TID Datum memory */
-#ifndef USE_FLOAT8_BYVAL
-				pfree(DatumGetPointer(ts_val));
-#endif
-			}
-			else
-			{
-				/* Be tidy */
-				indexcursor = NULL;
-			}
-		}
-
-		/*
-		 * If the tuplesort has overshot *and* we didn't see a match earlier,
-		 * then this tuple is missing from the index, so insert it.
-		 */
-		if ((tuplesort_empty ||
-			 ItemPointerCompare(indexcursor, &rootTuple) > 0) &&
-			!in_index[root_offnum - 1])
-		{
-			MemoryContextReset(econtext->ecxt_per_tuple_memory);
-
-			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
-
-			/*
-			 * In a partial index, discard tuples that don't satisfy the
-			 * predicate.
-			 */
-			if (predicate != NULL)
-			{
-				if (!ExecQual(predicate, econtext))
-					continue;
-			}
-
-			/*
-			 * For the current heap tuple, extract all the attributes we use
-			 * in this index, and note which are null.  This also performs
-			 * evaluation of any expressions needed.
-			 */
-			FormIndexDatum(indexInfo,
-						   slot,
-						   estate,
-						   values,
-						   isnull);
-
-			/*
-			 * You'd think we should go ahead and build the index tuple here,
-			 * but some index AMs want to do further processing on the data
-			 * first. So pass the values[] and isnull[] arrays, instead.
-			 */
-
-			/*
-			 * If the tuple is already committed dead, you might think we
-			 * could suppress uniqueness checking, but this is no longer true
-			 * in the presence of HOT, because the insert is actually a proxy
-			 * for a uniqueness check on the whole HOT-chain.  That is, the
-			 * tuple we have here could be dead because it was already
-			 * HOT-updated, and if so the updating transaction will not have
-			 * thought it should insert index entries.  The index AM will
-			 * check the whole HOT-chain and correctly detect a conflict if
-			 * there is one.
-			 */
-
-			index_insert(indexRelation,
-						 values,
-						 isnull,
-						 &rootTuple,
-						 heapRelation,
-						 indexInfo->ii_Unique ?
-						 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
-						 indexInfo);
-
-			state->tups_inserted += 1;
-		}
-	}
-
-	table_endscan(sscan);
-
-	ExecDropSingleTupleTableSlot(slot);
-
-	FreeExecutorState(estate);
-
-	/* These may have been pointing to the now-gone estate */
-	indexInfo->ii_ExpressionsState = NIL;
-	indexInfo->ii_PredicateState = NULL;
+	v_i_state *state = (v_i_state *)opaque;
+
+	index_insert(indexRelation,
+				 values,
+				 isnull,
+				 rootTuple,
+				 heapRelation,
+				 indexInfo->ii_Unique ?
+				 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
+				 indexInfo);
+
+	state->tups_inserted += 1;
 }
 
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 691f687ade..27bf57a486 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -173,6 +173,14 @@ typedef double (*IndexBuildRangeScan_function)(Relation heapRelation,
 											   void *callback_state,
 											   TableScanDesc scan);
 
+typedef uint64 (*ValidateIndexscan_function)(Relation heapRelation,
+									   Relation indexRelation,
+									   IndexInfo *indexInfo,
+									   Snapshot snapshot,
+									   Tuplesortstate *tuplesort,
+									   IndexValidateCallback callback,
+									   void *callback_state);
+
 typedef bool (*BitmapPagescan_function)(TableScanDesc scan,
 										TBMIterateResult *tbmres);
 
@@ -236,6 +244,7 @@ typedef struct TableAmRoutine
 
 
 	IndexBuildRangeScan_function index_build_range_scan;
+	ValidateIndexscan_function validate_index_scan;
 }			TableAmRoutine;
 
 static inline const TupleTableSlotOps*
@@ -691,6 +700,24 @@ table_index_build_range_scan(Relation heapRelation,
 		scan);
 }
 
+static inline uint64
+table_validate_index(Relation heapRelation,
+					 Relation indexRelation,
+					 IndexInfo *indexInfo,
+					 Snapshot snapshot,
+					 Tuplesortstate *tuplesort,
+					 IndexValidateCallback callback,
+					 void *callback_state)
+{
+	return heapRelation->rd_tableamroutine->validate_index_scan(heapRelation,
+			indexRelation,
+			indexInfo,
+			snapshot,
+			tuplesort,
+			callback,
+			callback_state);
+}
+
 extern BlockNumber table_parallelscan_nextpage(TableScanDesc scan);
 extern void table_parallelscan_startblock_init(TableScanDesc scan);
 extern Size table_parallelscan_estimate(Snapshot snapshot);
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 376907b616..874e956c8e 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -28,6 +28,15 @@ typedef void (*IndexBuildCallback) (Relation index,
 									bool tupleIsAlive,
 									void *state);
 
+/* Typedef for callback function for table_validate_index_scan */
+typedef void (*IndexValidateCallback) (Relation indexRelation,
+									   Datum *values,
+									   bool *isnull,
+									   ItemPointer heap_t_ctid,
+									   Relation heapRelation,
+									   IndexInfo *indexInfo,
+									   void *state);
+
 /* Action code for index_set_state_flags */
 typedef enum
 {
@@ -37,6 +46,45 @@ typedef enum
 	INDEX_DROP_SET_DEAD
 } IndexStateFlagsAction;
 
+/*
+ * itemptr_encode - Encode ItemPointer as int64/int8
+ *
+ * This representation must produce values encoded as int64 that sort in the
+ * same order as their corresponding original TID values would (using the
+ * default int8 opclass to produce a result equivalent to the default TID
+ * opclass).
+ *
+ * As noted in validate_index(), this can be significantly faster.
+ */
+static inline int64
+itemptr_encode(ItemPointer itemptr)
+{
+	BlockNumber block = ItemPointerGetBlockNumber(itemptr);
+	OffsetNumber offset = ItemPointerGetOffsetNumber(itemptr);
+	int64		encoded;
+
+	/*
+	 * Use the 16 least significant bits for the offset.  32 adjacent bits are
+	 * used for the block number.  Since remaining bits are unused, there
+	 * cannot be negative encoded values (We assume a two's complement
+	 * representation).
+	 */
+	encoded = ((uint64) block << 16) | (uint16) offset;
+
+	return encoded;
+}
+
+/*
+ * itemptr_decode - Decode int64/int8 representation back to ItemPointer
+ */
+static inline void
+itemptr_decode(ItemPointer itemptr, int64 encoded)
+{
+	BlockNumber block = (BlockNumber) (encoded >> 16);
+	OffsetNumber offset = (OffsetNumber) (encoded & 0xFFFF);
+
+	ItemPointerSet(itemptr, block, offset);
+}
 
 extern void index_check_primary_key(Relation heapRel,
 						IndexInfo *indexInfo,
-- 
2.18.0.windows.1

0001-Moving-GUC-variable-declaration-to-proper-place.patch (application/octet-stream)
From 02340b6422a324eddbaa096fbeef95ed3a4cd6df Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 27 Sep 2018 15:54:10 +1000
Subject: [PATCH 1/3] Moving GUC variable declaration to proper place

---
 src/backend/access/heap/heapam.c   | 3 ---
 src/backend/access/table/tableam.c | 6 ++----
 src/include/access/tableam.h       | 1 +
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6d516ccc0b..bff7049214 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -80,9 +80,6 @@
 #include "nodes/execnodes.h"
 #include "executor/executor.h"
 
-/* GUC variable */
-bool		synchronize_seqscans = true;
-
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 					TransactionId xid, CommandId cid, int options);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 96c5325ddb..af99264df9 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -15,10 +15,8 @@
 #include "storage/bufmgr.h"
 #include "storage/shmem.h"
 
-
-// PBORKED: move to header
-extern bool synchronize_seqscans;
-
+/* GUC variable */
+bool		synchronize_seqscans = true;
 
 char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 5f6b39c0e0..d0a5f59aa9 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -48,6 +48,7 @@ typedef enum tuple_data_flags
 }			tuple_data_flags;
 
 extern char *default_table_access_method;
+extern bool synchronize_seqscans;
 
 /*
  * Storage routine function hooks
-- 
2.18.0.windows.1

0002-check_default_table_access_method-hook-to-verify-the.patch (application/octet-stream)
From 8df9ac913811220bbcbfefa57e2204362c481cb9 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 27 Sep 2018 15:54:49 +1000
Subject: [PATCH 2/3] check_default_table_access_method hook to verify the
 access method

---
 src/backend/access/table/tableamapi.c | 88 +++++++++++++++++++++++++++
 src/backend/utils/misc/guc.c          |  3 +-
 src/include/access/tableam.h          |  4 ++
 3 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 7f08500ef4..d4eb889bd2 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -14,11 +14,14 @@
 
 #include "access/htup_details.h"
 #include "access/tableam.h"
+#include "access/xact.h"
 #include "catalog/pg_am.h"
 #include "catalog/pg_proc.h"
+#include "utils/fmgroids.h"
 #include "utils/syscache.h"
 #include "utils/memutils.h"
 
+static Oid get_table_am_oid(const char *tableamname, bool missing_ok);
 
 TupleTableSlot*
 table_gimmegimmeslot(Relation relation, List **reglist)
@@ -97,3 +100,88 @@ GetTableAmRoutineByAmId(Oid amoid)
 	/* And finally, call the handler function to get the API struct. */
 	return GetTableAmRoutine(amhandler);
 }
+
+/*
+ * get_table_am_oid - given a table access method name, look up the OID
+ *
+ * If missing_ok is false, throw an error if table access method name not
+ * found. If true, just return InvalidOid.
+ */
+static Oid
+get_table_am_oid(const char *tableamname, bool missing_ok)
+{
+	Oid			result;
+	Relation	rel;
+	TableScanDesc scandesc;
+	HeapTuple	tuple;
+	ScanKeyData entry[1];
+
+	/*
+	 * Search pg_am.  We use a heapscan here even though there is an
+	 * index on name, on the theory that pg_am will usually have just
+	 * a few entries and so an indexed lookup is a waste of effort.
+	 */
+	rel = heap_open(AccessMethodRelationId, AccessShareLock);
+
+	ScanKeyInit(&entry[0],
+				Anum_pg_am_amname,
+				BTEqualStrategyNumber, F_NAMEEQ,
+				CStringGetDatum(tableamname));
+	scandesc = table_beginscan_catalog(rel, 1, entry);
+	tuple = heap_scan_getnext(scandesc, ForwardScanDirection);
+
+	/* We assume that there can be at most one matching tuple */
+	if (HeapTupleIsValid(tuple) &&
+			((Form_pg_am) GETSTRUCT(tuple))->amtype == AMTYPE_TABLE)
+		result = HeapTupleGetOid(tuple);
+	else
+		result = InvalidOid;
+
+	table_endscan(scandesc);
+	heap_close(rel, AccessShareLock);
+
+	if (!OidIsValid(result) && !missing_ok)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("table access method \"%s\" does not exist",
+						 tableamname)));
+
+	return result;
+}
+
+/* check_hook: validate new default_table_access_method */
+bool
+check_default_table_access_method(char **newval, void **extra, GucSource source)
+{
+	/*
+	 * If we aren't inside a transaction, we cannot do database access so
+	 * cannot verify the name.  Must accept the value on faith.
+	 */
+	if (IsTransactionState())
+	{
+		if (**newval != '\0' &&
+			!OidIsValid(get_table_am_oid(*newval, true)))
+		{
+			/*
+			 * When source == PGC_S_TEST, don't throw a hard error for a
+			 * nonexistent table access method, only a NOTICE.
+			 * See comments in guc.h.
+			 */
+			if (source == PGC_S_TEST)
+			{
+				ereport(NOTICE,
+						(errcode(ERRCODE_UNDEFINED_OBJECT),
+						 errmsg("table access method \"%s\" does not exist",
+								*newval)));
+			}
+			else
+			{
+				GUC_check_errdetail("Table access method \"%s\" does not exist.",
+									*newval);
+				return false;
+			}
+		}
+	}
+
+	return true;
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3b996c8088..94b135a48b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3336,8 +3336,7 @@ static struct config_string ConfigureNamesString[] =
 		},
 		&default_table_access_method,
 		DEFAULT_TABLE_ACCESS_METHOD,
-		/* PBORKED: a check hook would be good */
-		NULL, NULL, NULL
+		check_default_table_access_method, NULL, NULL
 	},
 
 	{
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index d0a5f59aa9..691f687ade 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -23,6 +23,7 @@
 #include "nodes/execnodes.h"
 #include "nodes/nodes.h"
 #include "fmgr.h"
+#include "utils/guc.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 #include "utils/snapshot.h"
@@ -714,4 +715,7 @@ extern const TableAmRoutine * GetTableAmRoutine(Oid amhandler);
 extern const TableAmRoutine * GetTableAmRoutineByAmId(Oid amoid);
 extern const TableAmRoutine * GetHeapamTableAmRoutine(void);
 
+extern bool check_default_table_access_method(char **newval, void **extra,
+									GucSource source);
+
 #endif		/* TABLEAM_H */
-- 
2.18.0.windows.1

#30Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#29)
Re: Pluggable Storage - Andres's take

On 2018-09-28 12:21:08 +1000, Haribabu Kommi wrote:

Here I have attached further cleanup patches:
1. Re-arranged the GUC variable declaration
2. Added a check function hook for the default_table_access_method GUC

Cool.

3. Added a new hook, validate_index. I tried to slotify the function
validate_index_heapscan, but that has many problems, as it accesses
some internals of the HeapScanDesc structure, the buffer, and so on.

Oops, I also did that locally, in a way. I also made validate a
callback, as the validation logic is going to be specific to the AMs.
Sorry for not pushing that up earlier. I'll try to do that soon,
there's a fair amount of change.

Greetings,

Andres Freund

#31Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#30)
Re: Pluggable Storage - Andres's take

On 2018-09-27 20:03:58 -0700, Andres Freund wrote:

On 2018-09-28 12:21:08 +1000, Haribabu Kommi wrote:

Here I have attached further cleanup patches:
1. Re-arranged the GUC variable declaration
2. Added a check function hook for the default_table_access_method GUC

Cool.

3. Added a new hook, validate_index. I tried to slotify the function
validate_index_heapscan, but that has many problems, as it accesses
some internals of the HeapScanDesc structure, the buffer, and so on.

Oops, I also did that locally, in a way. I also made validate a
callback, as the validation logic is going to be specific to the AMs.
Sorry for not pushing that up earlier. I'll try to do that soon,
there's a fair amount of change.

I've pushed an updated version, with a fair amount of pending changes,
and I hope all your pending patches (those not made redundant by our
concurrent development) are merged.

There are currently 3 regression test failures that I'll look into
tomorrow:
- partition_prune shows a few additional Heap Blocks: exact=1 lines. I'm
a bit confused as to why, but haven't really investigated yet.
- fast_default fails because I've undone most of 7636e5c60fea83a9f3c;
I'll have to redo that in a different way.
- I occasionally see failures in aggregates.sql - I've not figured out
what's going on there.

Amit Khandekar said he'll publish a new version of the slot-abstraction
patch tomorrow, so I'll rebase it onto that ASAP.

My next planned steps are a) to try to commit parts of the
slot-abstraction work, and b) to try to break a few more pieces out of
the large pluggable storage patch.

Greetings,

Andres Freund

#32Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#31)
2 attachment(s)
Re: Pluggable Storage - Andres's take

On Wed, Oct 3, 2018 at 3:16 PM Andres Freund <andres@anarazel.de> wrote:

On 2018-09-27 20:03:58 -0700, Andres Freund wrote:

On 2018-09-28 12:21:08 +1000, Haribabu Kommi wrote:

Here I have attached further cleanup patches:
1. Re-arranged the GUC variable declaration
2. Added a check function hook for the default_table_access_method GUC

Cool.

3. Added a new hook, validate_index. I tried to slotify the function
validate_index_heapscan, but that has many problems, as it accesses
some internals of the HeapScanDesc structure, the buffer, and so on.

Oops, I also did that locally, in a way. I also made validate a
callback, as the validation logic is going to be specific to the AMs.
Sorry for not pushing that up earlier. I'll try to do that soon,
there's a fair amount of change.

I've pushed an updated version, with a fair amount of pending changes,
and I hope all your pending patches (those not made redundant by our
concurrent development) are merged.

Yes, all the patches are merged.

There are currently 3 regression test failures that I'll look into
tomorrow:
- partition_prune shows a few additional Heap Blocks: exact=1 lines. I'm
a bit confused as to why, but haven't really investigated yet.
- fast_default fails because I've undone most of 7636e5c60fea83a9f3c;
I'll have to redo that in a different way.
- I occasionally see failures in aggregates.sql - I've not figured out
what's going on there.

I also observed the aggregates.sql failure; I will look into it.

Amit Khandekar said he'll publish a new version of the slot-abstraction
patch tomorrow, so I'll rebase it onto that ASAP.

OK.
Here I have attached two new API patches:
1. Set new relfilenode
2. Create init fork

There is another patch, "External Relations", in the older patch set
that is not included in the current git tree. That patch lets
extensions create external relations for their own internal purposes
(e.g., columnar relations for columnar storage). The new relkind can
be used for those relations; this way it distinguishes normal
relations from columnar ones. Do you have any other idea for
supporting those types of relations?

I also want to create a new API for heap_create_with_catalog to let
the pluggable storage engine create additional relations. This API is
not required for every storage engine, so instead of turning the
entire function into an API, how about adding an API at the end of
the function that is called only when it is set, like the hook
functions? If a storage engine doesn't need any of the
heap_create_with_catalog functionality, then making the entire
function an API would be better.

Comments?
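
To illustrate the hook-style alternative, here is a rough sketch; the
CreateExtraRelations member and its signature are invented by me for
illustration, not taken from any posted patch:

/*
 * Hypothetical optional callback: an AM that needs auxiliary relations
 * (e.g. per-column storage) provides it; all other AMs leave it NULL.
 */
typedef void (*CreateExtraRelations_function) (Relation rel,
											   Datum reloptions);

/* at the end of heap_create_with_catalog(), before returning relid: */
if (new_rel_desc->rd_tableamroutine != NULL &&
	new_rel_desc->rd_tableamroutine->CreateExtraRelations != NULL)
	new_rel_desc->rd_tableamroutine->CreateExtraRelations(new_rel_desc,
														  reloptions);

With this shape, storage engines that don't need the hook pay nothing,
and no engine has to duplicate the rest of heap_create_with_catalog.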

My next planned steps are a) to try to commit parts of the

slot-abstraction work b) to try to break out a few more pieces out of
the large pluggable storage patch.

OK. Let me know which pieces you consider stable, so that I can
separate them from the larger patch.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0002-init-fork-API.patch (application/octet-stream)
From 3f1340364236b22f5a2b505e359083494b276b95 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 8 Oct 2018 15:08:59 +1100
Subject: [PATCH 2/2] init fork API

API to create INIT_FORKNUM file with wrapper
table_create_init_fork.
---
 src/backend/access/heap/heapam_handler.c | 26 +++++++++++++++++++++++-
 src/backend/catalog/heap.c               | 24 ++--------------------
 src/backend/commands/tablecmds.c         |  4 ++--
 src/include/access/tableam.h             |  8 ++++++++
 src/include/catalog/heap.h               |  2 --
 5 files changed, 37 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 313ed319fc..87d3331ba1 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
 #include "catalog/indexing.h"
 #include "catalog/pg_am_d.h"
 #include "catalog/storage.h"
+#include "catalog/storage_xlog.h"
 #include "executor/executor.h"
 #include "pgstat.h"
 #include "storage/lmgr.h"
@@ -2240,6 +2241,28 @@ RelationSetNewRelfilenode(Relation relation, char persistence,
 	EOXactListAdd(relation);
 }
 
+/*
+ * Set up an init fork for an unlogged table so that it can be correctly
+ * reinitialized on restart.  An immediate sync is required even if the
+ * page has been logged, because the write did not go through
+ * shared_buffers and therefore a concurrent checkpoint may have moved
+ * the redo pointer past our xlog record.  Recovery may as well remove it
+ * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
+ * record. Therefore, logging is necessary even if wal_level=minimal.
+ */
+static void
+heap_create_init_fork(Relation rel)
+{
+	Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
+		   rel->rd_rel->relkind == RELKIND_MATVIEW ||
+		   rel->rd_rel->relkind == RELKIND_TOASTVALUE);
+	RelationOpenSmgr(rel);
+	smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
+	log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
+	smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
+}
+
+
 static const TableAmRoutine heapam_methods = {
 	.type = T_TableAmRoutine,
 
@@ -2289,7 +2312,8 @@ static const TableAmRoutine heapam_methods = {
 
 	.index_validate_scan = validate_index_heapscan,
 
-	.SetNewFileNode = RelationSetNewRelfilenode
+	.SetNewFileNode = RelationSetNewRelfilenode,
+	.CreateInitFork = heap_create_init_fork
 };
 
 const TableAmRoutine *
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 38b368f916..8e7c8ce684 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -32,6 +32,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/sysattr.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
@@ -1425,7 +1426,7 @@ heap_create_with_catalog(const char *relname,
 	 */
 	if (relpersistence == RELPERSISTENCE_UNLOGGED &&
 		relkind != RELKIND_PARTITIONED_TABLE)
-		heap_create_init_fork(new_rel_desc);
+		table_create_init_fork(new_rel_desc);
 
 	/*
 	 * ok, the relation has been cataloged, so close our relations and return
@@ -1437,27 +1438,6 @@ heap_create_with_catalog(const char *relname,
 	return relid;
 }
 
-/*
- * Set up an init fork for an unlogged table so that it can be correctly
- * reinitialized on restart.  An immediate sync is required even if the
- * page has been logged, because the write did not go through
- * shared_buffers and therefore a concurrent checkpoint may have moved
- * the redo pointer past our xlog record.  Recovery may as well remove it
- * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
- * record. Therefore, logging is necessary even if wal_level=minimal.
- */
-void
-heap_create_init_fork(Relation rel)
-{
-	Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
-		   rel->rd_rel->relkind == RELKIND_MATVIEW ||
-		   rel->rd_rel->relkind == RELKIND_TOASTVALUE);
-	RelationOpenSmgr(rel);
-	smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
-	log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
-	smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
-}
-
 /*
  *		RelationRemoveInheritance
  *
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6db214309e..e107afc786 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1647,7 +1647,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 			table_set_new_filenode(rel, rel->rd_rel->relpersistence,
 									  RecentXmin, minmulti);
 			if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
-				heap_create_init_fork(rel);
+				table_create_init_fork(rel);
 
 			heap_relid = RelationGetRelid(rel);
 			toast_relid = rel->rd_rel->reltoastrelid;
@@ -1661,7 +1661,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 				table_set_new_filenode(rel, rel->rd_rel->relpersistence,
 										  RecentXmin, minmulti);
 				if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
-					heap_create_init_fork(rel);
+					table_create_init_fork(rel);
 				heap_close(rel, NoLock);
 			}
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 4d5b11c294..f3e36368db 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -196,6 +196,7 @@ typedef bool (*SampleScanNextTuple_function)(TableScanDesc scan, struct SampleSc
 
 typedef void (*SetNewFileNode_function)(Relation relation, char persistence,
 										TransactionId freezeXid, MultiXactId minmulti);
+typedef void (*CreateInitFork_function)(Relation rel);
 
 /*
  * API struct for a table AM.  Note this must be allocated in a
@@ -255,6 +256,7 @@ typedef struct TableAmRoutine
 	IndexValidateScan_function index_validate_scan;
 
 	SetNewFileNode_function	SetNewFileNode;
+	CreateInitFork_function CreateInitFork;
 }			TableAmRoutine;
 
 static inline const TupleTableSlotOps*
@@ -754,6 +756,12 @@ table_set_new_filenode(Relation relation, char persistence,
 									freezeXid, minmulti);
 }
 
+static inline void
+table_create_init_fork(Relation relation)
+{
+	relation->rd_tableamroutine->CreateInitFork(relation);
+}
+
 extern BlockNumber table_parallelscan_nextpage(TableScanDesc scan);
 extern void table_parallelscan_startblock_init(TableScanDesc scan);
 extern Size table_parallelscan_estimate(Snapshot snapshot);
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 4584b3473c..c0e706ecc9 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -77,8 +77,6 @@ extern Oid heap_create_with_catalog(const char *relname,
 						 Oid relrewrite,
 						 ObjectAddress *typaddress);
 
-extern void heap_create_init_fork(Relation rel);
-
 extern void heap_drop_with_catalog(Oid relid);
 
 extern void heap_truncate(List *relids);
-- 
2.18.0.windows.1

0001-New-API-setNewfilenode.patch (application/octet-stream)
From d8489cf06b9cd186f5dac801879e604bb330f79a Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 8 Oct 2018 14:33:49 +1100
Subject: [PATCH 1/2] New API setNewfilenode

This API can be used to set the filenode of a relation.
The wrapper function for this API is table_set_new_filenode;
it is used by sequences and indexes to create storage. The
wrapper function name can be updated if required.
---
 src/backend/access/heap/heapam_handler.c | 128 ++++++++++++++++++++-
 src/backend/catalog/index.c              |   2 +-
 src/backend/commands/sequence.c          |   5 +-
 src/backend/commands/tablecmds.c         |   6 +-
 src/backend/utils/cache/relcache.c       | 135 ++---------------------
 src/include/access/tableam.h             |  13 +++
 src/include/utils/relcache.h             |   9 +-
 7 files changed, 157 insertions(+), 141 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91f..313ed319fc 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -22,6 +22,7 @@
 
 #include "miscadmin.h"
 
+#include "access/multixact.h"
 #include "access/heapam.h"
 #include "access/relscan.h"
 #include "access/rewriteheap.h"
@@ -29,12 +30,17 @@
 #include "access/tsmapi.h"
 #include "catalog/catalog.h"
 #include "catalog/index.h"
+#include "catalog/indexing.h"
 #include "catalog/pg_am_d.h"
+#include "catalog/storage.h"
 #include "executor/executor.h"
 #include "pgstat.h"
 #include "storage/lmgr.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
+#include "utils/relcache.h"
+#include "utils/relmapper.h"
+#include "utils/syscache.h"
 #include "utils/tqual.h"
 #include "storage/bufpage.h"
 #include "storage/bufmgr.h"
@@ -2116,6 +2122,124 @@ heap_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
 	pfree(isnull);
 }
 
+/*
+ * RelationSetNewRelfilenode
+ *
+ * Assign a new relfilenode (physical file name) to the relation.
+ *
+ * This allows a full rewrite of the relation to be done with transactional
+ * safety (since the filenode assignment can be rolled back).  Note however
+ * that there is no simple way to access the relation's old data for the
+ * remainder of the current transaction.  This limits the usefulness to cases
+ * such as TRUNCATE or rebuilding an index from scratch.
+ *
+ * Caller must already hold exclusive lock on the relation.
+ *
+ * The relation is marked with relfrozenxid = freezeXid (InvalidTransactionId
+ * must be passed for indexes and sequences).  This should be a lower bound on
+ * the XIDs that will be put into the new relation contents.
+ *
+ * The new filenode's persistence is set to the given value.  This is useful
+ * for the cases that are changing the relation's persistence; other callers
+ * need to pass the original relpersistence value.
+ */
+static void
+RelationSetNewRelfilenode(Relation relation, char persistence,
+						  TransactionId freezeXid, MultiXactId minmulti)
+{
+	Oid			newrelfilenode;
+	RelFileNodeBackend newrnode;
+	Relation	pg_class;
+	HeapTuple	tuple;
+	Form_pg_class classform;
+
+	/* Indexes, sequences must have Invalid frozenxid; other rels must not */
+	Assert((relation->rd_rel->relkind == RELKIND_INDEX ||
+			relation->rd_rel->relkind == RELKIND_SEQUENCE) ?
+		   freezeXid == InvalidTransactionId :
+		   TransactionIdIsNormal(freezeXid));
+	Assert(TransactionIdIsNormal(freezeXid) == MultiXactIdIsValid(minmulti));
+
+	/* Allocate a new relfilenode */
+	newrelfilenode = GetNewRelFileNode(relation->rd_rel->reltablespace, NULL,
+									   persistence);
+
+	/*
+	 * Get a writable copy of the pg_class tuple for the given relation.
+	 */
+	pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCacheCopy1(RELOID,
+								ObjectIdGetDatum(RelationGetRelid(relation)));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "could not find tuple for relation %u",
+			 RelationGetRelid(relation));
+	classform = (Form_pg_class) GETSTRUCT(tuple);
+
+	/*
+	 * Create storage for the main fork of the new relfilenode.
+	 *
+	 * NOTE: any conflict in relfilenode value will be caught here, if
+	 * GetNewRelFileNode messes up for any reason.
+	 */
+	newrnode.node = relation->rd_node;
+	newrnode.node.relNode = newrelfilenode;
+	newrnode.backend = relation->rd_backend;
+	RelationCreateStorage(newrnode.node, persistence);
+	smgrclosenode(newrnode);
+
+	/*
+	 * Schedule unlinking of the old storage at transaction commit.
+	 */
+	RelationDropStorage(relation);
+
+	/*
+	 * Now update the pg_class row.  However, if we're dealing with a mapped
+	 * index, pg_class.relfilenode doesn't change; instead we have to send the
+	 * update to the relation mapper.
+	 */
+	if (RelationIsMapped(relation))
+		RelationMapUpdateMap(RelationGetRelid(relation),
+							 newrelfilenode,
+							 relation->rd_rel->relisshared,
+							 false);
+	else
+		classform->relfilenode = newrelfilenode;
+
+	/* These changes are safe even for a mapped relation */
+	if (relation->rd_rel->relkind != RELKIND_SEQUENCE)
+	{
+		classform->relpages = 0;	/* it's empty until further notice */
+		classform->reltuples = 0;
+		classform->relallvisible = 0;
+	}
+	classform->relfrozenxid = freezeXid;
+	classform->relminmxid = minmulti;
+	classform->relpersistence = persistence;
+
+	CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+
+	heap_freetuple(tuple);
+
+	heap_close(pg_class, RowExclusiveLock);
+
+	/*
+	 * Make the pg_class row change visible, as well as the relation map
+	 * change if any.  This will cause the relcache entry to get updated, too.
+	 */
+	CommandCounterIncrement();
+
+	/*
+	 * Mark the rel as having been given a new relfilenode in the current
+	 * (sub) transaction.  This is a hint that can be used to optimize later
+	 * operations on the rel in the same transaction.
+	 */
+	relation->rd_newRelfilenodeSubid = GetCurrentSubTransactionId();
+
+	/* Flag relation as needing eoxact cleanup (to remove the hint) */
+	EOXactListAdd(relation);
+}
+
 static const TableAmRoutine heapam_methods = {
 	.type = T_TableAmRoutine,
 
@@ -2163,7 +2287,9 @@ static const TableAmRoutine heapam_methods = {
 
 	.index_build_range_scan = IndexBuildHeapRangeScan,
 
-	.index_validate_scan = validate_index_heapscan
+	.index_validate_scan = validate_index_heapscan,
+
+	.SetNewFileNode = RelationSetNewRelfilenode
 };
 
 const TableAmRoutine *
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 55477bd995..df213dc07d 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2865,7 +2865,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 		}
 
 		/* We'll build a new physical relation for the index */
-		RelationSetNewRelfilenode(iRel, persistence, InvalidTransactionId,
+		table_set_new_filenode(iRel, persistence, InvalidTransactionId,
 								  InvalidMultiXactId);
 
 		/* Initialize the index and rebuild */
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 89122d4ad7..107f9a0176 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -17,6 +17,7 @@
 #include "access/bufmask.h"
 #include "access/htup_details.h"
 #include "access/multixact.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
@@ -315,7 +316,7 @@ ResetSequence(Oid seq_relid)
 	 * sequence's relfrozenxid at 0, since it won't contain any unfrozen XIDs.
 	 * Same with relminmxid, since a sequence will never contain multixacts.
 	 */
-	RelationSetNewRelfilenode(seq_rel, seq_rel->rd_rel->relpersistence,
+	table_set_new_filenode(seq_rel, seq_rel->rd_rel->relpersistence,
 							  InvalidTransactionId, InvalidMultiXactId);
 
 	/*
@@ -485,7 +486,7 @@ AlterSequence(ParseState *pstate, AlterSeqStmt *stmt)
 		 * at 0, since it won't contain any unfrozen XIDs.  Same with
 		 * relminmxid, since a sequence will never contain multixacts.
 		 */
-		RelationSetNewRelfilenode(seqrel, seqrel->rd_rel->relpersistence,
+		table_set_new_filenode(seqrel, seqrel->rd_rel->relpersistence,
 								  InvalidTransactionId, InvalidMultiXactId);
 
 		/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f3526b267d..6db214309e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1643,10 +1643,8 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 			 * Create a new empty storage file for the relation, and assign it
 			 * as the relfilenode value. The old storage file is scheduled for
 			 * deletion at commit.
-			 *
-			 * PBORKED: needs to be a callback
 			 */
-			RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
+			table_set_new_filenode(rel, rel->rd_rel->relpersistence,
 									  RecentXmin, minmulti);
 			if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
 				heap_create_init_fork(rel);
@@ -1660,7 +1658,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 			if (OidIsValid(toast_relid))
 			{
 				rel = relation_open(toast_relid, AccessExclusiveLock);
-				RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
+				table_set_new_filenode(rel, rel->rd_rel->relpersistence,
 										  RecentXmin, minmulti);
 				if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
 					heap_create_init_fork(rel);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 0d6e5a189f..0592fdc750 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -160,13 +160,6 @@ static Oid	eoxact_list[MAX_EOXACT_LIST];
 static int	eoxact_list_len = 0;
 static bool eoxact_list_overflowed = false;
 
-#define EOXactListAdd(rel) \
-	do { \
-		if (eoxact_list_len < MAX_EOXACT_LIST) \
-			eoxact_list[eoxact_list_len++] = (rel)->rd_id; \
-		else \
-			eoxact_list_overflowed = true; \
-	} while (0)
 
 /*
  * EOXactTupleDescArray stores TupleDescs that (might) need AtEOXact
@@ -292,6 +285,14 @@ static void unlink_initfile(const char *initfilename, int elevel);
 static bool equalPartitionDescs(PartitionKey key, PartitionDesc partdesc1,
 					PartitionDesc partdesc2);
 
+void
+EOXactListAdd(Relation rel)
+{
+	if (eoxact_list_len < MAX_EOXACT_LIST)
+		eoxact_list[eoxact_list_len++] = (rel)->rd_id;
+	else
+		eoxact_list_overflowed = true;
+}
 
 /*
  *		ScanPgRelation
@@ -3392,126 +3393,6 @@ RelationBuildLocalRelation(const char *relname,
 	return rel;
 }
 
-
-/*
- * RelationSetNewRelfilenode
- *
- * Assign a new relfilenode (physical file name) to the relation.
- *
- * This allows a full rewrite of the relation to be done with transactional
- * safety (since the filenode assignment can be rolled back).  Note however
- * that there is no simple way to access the relation's old data for the
- * remainder of the current transaction.  This limits the usefulness to cases
- * such as TRUNCATE or rebuilding an index from scratch.
- *
- * Caller must already hold exclusive lock on the relation.
- *
- * The relation is marked with relfrozenxid = freezeXid (InvalidTransactionId
- * must be passed for indexes and sequences).  This should be a lower bound on
- * the XIDs that will be put into the new relation contents.
- *
- * The new filenode's persistence is set to the given value.  This is useful
- * for the cases that are changing the relation's persistence; other callers
- * need to pass the original relpersistence value.
- */
-void
-RelationSetNewRelfilenode(Relation relation, char persistence,
-						  TransactionId freezeXid, MultiXactId minmulti)
-{
-	Oid			newrelfilenode;
-	RelFileNodeBackend newrnode;
-	Relation	pg_class;
-	HeapTuple	tuple;
-	Form_pg_class classform;
-
-	/* Indexes, sequences must have Invalid frozenxid; other rels must not */
-	Assert((relation->rd_rel->relkind == RELKIND_INDEX ||
-			relation->rd_rel->relkind == RELKIND_SEQUENCE) ?
-		   freezeXid == InvalidTransactionId :
-		   TransactionIdIsNormal(freezeXid));
-	Assert(TransactionIdIsNormal(freezeXid) == MultiXactIdIsValid(minmulti));
-
-	/* Allocate a new relfilenode */
-	newrelfilenode = GetNewRelFileNode(relation->rd_rel->reltablespace, NULL,
-									   persistence);
-
-	/*
-	 * Get a writable copy of the pg_class tuple for the given relation.
-	 */
-	pg_class = heap_open(RelationRelationId, RowExclusiveLock);
-
-	tuple = SearchSysCacheCopy1(RELOID,
-								ObjectIdGetDatum(RelationGetRelid(relation)));
-	if (!HeapTupleIsValid(tuple))
-		elog(ERROR, "could not find tuple for relation %u",
-			 RelationGetRelid(relation));
-	classform = (Form_pg_class) GETSTRUCT(tuple);
-
-	/*
-	 * Create storage for the main fork of the new relfilenode.
-	 *
-	 * NOTE: any conflict in relfilenode value will be caught here, if
-	 * GetNewRelFileNode messes up for any reason.
-	 */
-	newrnode.node = relation->rd_node;
-	newrnode.node.relNode = newrelfilenode;
-	newrnode.backend = relation->rd_backend;
-	RelationCreateStorage(newrnode.node, persistence);
-	smgrclosenode(newrnode);
-
-	/*
-	 * Schedule unlinking of the old storage at transaction commit.
-	 */
-	RelationDropStorage(relation);
-
-	/*
-	 * Now update the pg_class row.  However, if we're dealing with a mapped
-	 * index, pg_class.relfilenode doesn't change; instead we have to send the
-	 * update to the relation mapper.
-	 */
-	if (RelationIsMapped(relation))
-		RelationMapUpdateMap(RelationGetRelid(relation),
-							 newrelfilenode,
-							 relation->rd_rel->relisshared,
-							 false);
-	else
-		classform->relfilenode = newrelfilenode;
-
-	/* These changes are safe even for a mapped relation */
-	if (relation->rd_rel->relkind != RELKIND_SEQUENCE)
-	{
-		classform->relpages = 0;	/* it's empty until further notice */
-		classform->reltuples = 0;
-		classform->relallvisible = 0;
-	}
-	classform->relfrozenxid = freezeXid;
-	classform->relminmxid = minmulti;
-	classform->relpersistence = persistence;
-
-	CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
-
-	heap_freetuple(tuple);
-
-	heap_close(pg_class, RowExclusiveLock);
-
-	/*
-	 * Make the pg_class row change visible, as well as the relation map
-	 * change if any.  This will cause the relcache entry to get updated, too.
-	 */
-	CommandCounterIncrement();
-
-	/*
-	 * Mark the rel as having been given a new relfilenode in the current
-	 * (sub) transaction.  This is a hint that can be used to optimize later
-	 * operations on the rel in the same transaction.
-	 */
-	relation->rd_newRelfilenodeSubid = GetCurrentSubTransactionId();
-
-	/* Flag relation as needing eoxact cleanup (to remove the hint) */
-	EOXactListAdd(relation);
-}
-
-
 /*
  *		RelationCacheInitialize
  *
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7fe6ff6c22..4d5b11c294 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -194,6 +194,9 @@ struct SampleScanState;
 typedef bool (*SampleScanNextBlock_function)(TableScanDesc scan, struct SampleScanState *scanstate);
 typedef bool (*SampleScanNextTuple_function)(TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
 
+typedef void (*SetNewFileNode_function)(Relation relation, char persistence,
+										TransactionId freezeXid, MultiXactId minmulti);
+
 /*
  * API struct for a table AM.  Note this must be allocated in a
  * server-lifetime manner, typically as a static const struct.
@@ -250,6 +253,8 @@ typedef struct TableAmRoutine
 
 	IndexBuildRangeScan_function index_build_range_scan;
 	IndexValidateScan_function index_validate_scan;
+
+	SetNewFileNode_function	SetNewFileNode;
 }			TableAmRoutine;
 
 static inline const TupleTableSlotOps*
@@ -741,6 +746,14 @@ table_index_build_range_scan(Relation heapRelation,
 		scan);
 }
 
+static inline void
+table_set_new_filenode(Relation relation, char persistence,
+						  TransactionId freezeXid, MultiXactId minmulti)
+{
+	relation->rd_tableamroutine->SetNewFileNode(relation, persistence,
+									freezeXid, minmulti);
+}
+
 extern BlockNumber table_parallelscan_nextpage(TableScanDesc scan);
 extern void table_parallelscan_startblock_init(TableScanDesc scan);
 extern Size table_parallelscan_estimate(Snapshot snapshot);
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 858a7b30d2..1482dae904 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -33,6 +33,9 @@ typedef struct RelationData *Relation;
  */
 typedef Relation *RelationPtr;
 
+/* Function to register a relation for eoxact cleanup */
+extern void EOXactListAdd(Relation rel);
+
 /*
  * Routines to open (lookup) and close a relcache entry
  */
@@ -109,12 +112,6 @@ extern Relation RelationBuildLocalRelation(const char *relname,
 						   char relpersistence,
 						   char relkind);
 
-/*
- * Routine to manage assignment of new relfilenode to a relation
- */
-extern void RelationSetNewRelfilenode(Relation relation, char persistence,
-						  TransactionId freezeXid, MultiXactId minmulti);
-
 /*
  * Routines for flushing/rebuilding relcache entries in various scenarios
  */
-- 
2.18.0.windows.1

#33Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Andres Freund (#31)
2 attachment(s)
Re: Pluggable Storage - Andres's take

Hi!

On Wed, Oct 3, 2018 at 8:16 AM Andres Freund <andres@anarazel.de> wrote:

I've pushed an updated version, with a fair amount of pending changes,
and I hope all your pending patches (those not made redundant by our
concurrent development) are merged.

I'd like to also share some patches. I've used the current state of
pluggable-zheap as the base for my patches.

* 0001-remove-extra-snapshot-functions.patch – removes the
snapshot_satisfiesUpdate() and snapshot_satisfiesVacuum() functions
from the tableam API. snapshot_satisfiesUpdate() was completely
unused. snapshot_satisfiesVacuum() was used only in
heap_copy_for_cluster(), so I've replaced it with a direct
heapam_satisfies_vacuum() call.

* 0002-add-costing-function-to-API.patch – adds functions for costing
sequential and table sample scans to the tableam API. The zheap costing
functions are currently copies of the heap costing functions; this
should be adjusted in the future. Estimation of heap lookups during
index scans should also be pluggable, but is not yet implemented
(TODO). A usage sketch follows below.
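
For illustration, here is a minimal sketch (not part of the patchset;
the "myam" names are invented, and only the callback signature is taken
from the attached patch) of how a table AM would plug into the new
costing callbacks:

static void
myam_cost_scan(Path *path, PlannerInfo *root,
               RelOptInfo *baserel, ParamPathInfo *param_info)
{
    /* Start from the heap estimate; a real AM would adjust this for
     * its own page and tuple layout. */
    heapam_cost_scan(path, root, baserel, param_info);
}

static const TableAmRoutine myam_methods = {
    /* ... other callbacks elided ... */
    .cost_scan = myam_cost_scan,
    .cost_samplescan = heapam_cost_samplescan, /* reuse heap's estimate */
};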

I've examined the code in the pluggable-zheap branch and the EDB github
[1], and I didn't find anything related to "delete-marking" indexes as
stated on slide #25 of the presentation [2]. So, basically, the
contract between heap and indexes remains unchanged: once you update
one indexed field, you have to update all the indexes. Did I
understand correctly that this is postponed?

And a couple more notes from me:
* Right now table_fetch_row_version() is called in most places with
SnapshotAny. That might work in the majority of cases, because in heap
there can't be multiple tuples residing at the same TID, while zheap
always returns the most recent tuple residing at that TID. But I think
it would be better to provide some meaningful snapshot instead of
SnapshotAny. Even if the best thing we can do is ask for the most
recent tuple at some TID, we need a more consistent way of asking the
table AM for this (see the sketch after this list). I'm going to
elaborate more on this.
* I'm not really sure we need the ability to iterate over multiple
tuples referenced by an index entry. It seems that the only place
which really needs this is heap_copy_for_cluster(), which is itself
table AM specific. Also, zheap doesn't seem to be able to return more
than one tuple from zheapam_fetch_follow(). So, I'm going to
investigate this more, and if this iteration is really unneeded, I'll
propose a patch to delete it.
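
To make the SnapshotAny point concrete, here is a purely hypothetical
sketch (the wrapper name is invented, and the table_fetch_row_version()
signature is assumed from the WIP branch):

/*
 * Hypothetical: callers that really mean "give me the latest row
 * version at this TID" could say so explicitly, instead of spelling
 * that intent as SnapshotAny at every call site.
 */
static inline bool
table_fetch_latest_row_version(Relation rel, ItemPointer tid,
                               TupleTableSlot *slot)
{
    /* For heap, SnapshotAny happens to behave this way; a real
     * implementation would let the AM interpret the request itself. */
    return table_fetch_row_version(rel, tid, SnapshotAny, slot);
}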

1. https://github.com/EnterpriseDB/zheap
2. http://www.pgcon.org/2018/schedule/attachments/501_zheap-a-new-storage-format-postgresql-5.pdf

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-remove-extra-snapshot-functions.patch (application/octet-stream)
commit fea0ea9bceb698dbe88bee42d5fc5f3332a658dd
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date:   Wed Sep 26 15:29:43 2018 +0300

    Remove some snapshot functions from TableAmRoutine
    
    snapshot_satisfiesUpdate was unused.  snapshot_satisfiesVacuum was used only
    inside heap_copy_for_cluster, so it was replaced with a direct
    heapam_satisfies_vacuum() call.

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91fd..28c475e7bdc 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -483,21 +483,6 @@ heapam_satisfies(TupleTableSlot *slot, Snapshot snapshot)
 	return res;
 }
 
-static HTSU_Result
-heapam_satisfies_update(TupleTableSlot *slot, CommandId curcid)
-{
-	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
-	HTSU_Result res;
-
-	LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
-	res = HeapTupleSatisfiesUpdate(bslot->base.tuple,
-								   curcid,
-								   bslot->buffer);
-	LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
-
-	return res;
-}
-
 static HTSV_Result
 heapam_satisfies_vacuum(TupleTableSlot *slot, TransactionId OldestXmin)
 {
@@ -2003,7 +1988,7 @@ heap_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
 				break;
 		}
 
-		switch (OldHeap->rd_tableamroutine->snapshot_satisfiesVacuum(slot, OldestXmin))
+		switch (heapam_satisfies_vacuum(slot, OldestXmin))
 		{
 			case HEAPTUPLE_DEAD:
 				/* Definitely dead */
@@ -2122,8 +2107,6 @@ static const TableAmRoutine heapam_methods = {
 	.slot_callbacks = heapam_slot_callbacks,
 
 	.snapshot_satisfies = heapam_satisfies,
-	.snapshot_satisfiesUpdate = heapam_satisfies_update,
-	.snapshot_satisfiesVacuum = heapam_satisfies_vacuum,
 
 	.scan_begin = heap_beginscan,
 	.scansetlimits = heap_setscanlimits,
diff --git a/src/backend/access/zheap/zheapam_handler.c b/src/backend/access/zheap/zheapam_handler.c
index bec9b16f7d6..e707baa1b5d 100644
--- a/src/backend/access/zheap/zheapam_handler.c
+++ b/src/backend/access/zheap/zheapam_handler.c
@@ -486,40 +486,6 @@ zheapam_satisfies(TupleTableSlot *slot, Snapshot snapshot)
 #endif
 }
 
-static HTSU_Result
-zheapam_satisfies_update(TupleTableSlot *slot, CommandId curcid)
-{
-	elog(ERROR, "would need to track buffer or refetch");
-#if ZBORKED
-	BufferHeapTupleTableSlot *zslot = (BufferHeapTupleTableSlot *) slot;
-	HTSU_Result res;
-
-
-	LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
-	res = HeapTupleSatisfiesUpdate(bslot->base.tuple,
-								   curcid,
-								   bslot->buffer);
-	LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
-
-	return res;
-#endif
-}
-
-static HTSV_Result
-zheapam_satisfies_vacuum(TupleTableSlot *slot, TransactionId OldestXmin)
-{
-	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
-	HTSV_Result res;
-
-	LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
-	res = HeapTupleSatisfiesVacuum(bslot->base.tuple,
-								   OldestXmin,
-								   bslot->buffer);
-	LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
-
-	return res;
-}
-
 static IndexFetchTableData*
 zheapam_begin_index_fetch(Relation rel)
 {
@@ -1621,8 +1587,6 @@ static const TableAmRoutine zheapam_methods = {
 	.slot_callbacks = zheapam_slot_callbacks,
 
 	.snapshot_satisfies = zheapam_satisfies,
-	.snapshot_satisfiesUpdate = zheapam_satisfies_update,
-	.snapshot_satisfiesVacuum = zheapam_satisfies_vacuum,
 
 	.scan_begin = zheap_beginscan,
 	.scansetlimits = zheap_setscanlimits,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7fe6ff6c221..fb37a739918 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -55,8 +55,6 @@ extern bool synchronize_seqscans;
  * Storage routine function hooks
  */
 typedef bool (*SnapshotSatisfies_function) (TupleTableSlot *slot, Snapshot snapshot);
-typedef HTSU_Result (*SnapshotSatisfiesUpdate_function) (TupleTableSlot *slot, CommandId curcid);
-typedef HTSV_Result (*SnapshotSatisfiesVacuum_function) (TupleTableSlot *slot, TransactionId OldestXmin);
 
 typedef Oid (*TupleInsert_function) (Relation rel, TupleTableSlot *slot, CommandId cid,
 									 int options, BulkInsertState bistate);
@@ -205,8 +203,6 @@ typedef struct TableAmRoutine
 	SlotCallbacks_function slot_callbacks;
 
 	SnapshotSatisfies_function snapshot_satisfies;
-	SnapshotSatisfiesUpdate_function snapshot_satisfiesUpdate;
-	SnapshotSatisfiesVacuum_function snapshot_satisfiesVacuum;
 
 	/* Operations on physical tuples */
 	TupleInsert_function tuple_insert;
0002-add-costing-function-to-API.patch (application/octet-stream)
commit fa33fbebe3e33f09fdb32961b2ffdf5c7262b74a
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date:   Mon Oct 15 21:33:31 2018 +0300

    Add costing function to tableam interface
    
    Costs of sequential scans and table sample scans are estimated using
    cost_seqscan() and cost_samplescan().  But they should be table access method
    specific, because different table AMs could have different costs.  This commit
    introduces zheap cost functions that are copies of the heap ones.  That should
    be adjusted in the future.  Making the cost of heap lookups during index scans
    pluggable is a TODO.

diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index aee7bfd8346..e13b0e0b8fa 100644
--- a/src/backend/access/heap/Makefile
+++ b/src/backend/access/heap/Makefile
@@ -13,6 +13,7 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = heapam.o heapam_handler.o heapam_visibility.o hio.o pruneheap.o \
-	rewriteheap.o syncscan.o tuptoaster.o vacuumlazy.o visibilitymap.o
+	rewriteheap.o syncscan.o tuptoaster.o vacuumlazy.o visibilitymap.o \
+	heapam_cost.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 28c475e7bdc..35b230c3606 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2146,7 +2146,10 @@ static const TableAmRoutine heapam_methods = {
 
 	.index_build_range_scan = IndexBuildHeapRangeScan,
 
-	.index_validate_scan = validate_index_heapscan
+	.index_validate_scan = validate_index_heapscan,
+
+	.cost_scan = heapam_cost_scan,
+	.cost_samplescan = heapam_cost_samplescan
 };
 
 const TableAmRoutine *
diff --git a/src/backend/access/zheap/Makefile b/src/backend/access/zheap/Makefile
index 75b0ff69ebf..4458e1a4238 100644
--- a/src/backend/access/zheap/Makefile
+++ b/src/backend/access/zheap/Makefile
@@ -13,6 +13,6 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = prunetpd.o prunezheap.o tpd.o tpdxlog.o zheapam.o zheapam_handler.o zheapamutils.o zheapamxlog.o \
-	zhio.o zmultilocker.o ztuptoaster.o ztqual.o zvacuumlazy.o
+	zhio.o zmultilocker.o ztuptoaster.o ztqual.o zvacuumlazy.o zheapam_cost.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/zheap/zheapam_handler.c b/src/backend/access/zheap/zheapam_handler.c
index e707baa1b5d..da95c881a77 100644
--- a/src/backend/access/zheap/zheapam_handler.c
+++ b/src/backend/access/zheap/zheapam_handler.c
@@ -1627,6 +1627,8 @@ static const TableAmRoutine zheapam_methods = {
 	.index_build_range_scan = IndexBuildZHeapRangeScan,
 	.index_validate_scan = validate_index_zheapscan,
 
+	.cost_scan = zheapam_cost_scan,
+	.cost_samplescan = zheapam_cost_samplescan
 };
 
 Datum
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7bf67a05295..abdddacf89a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -75,6 +75,7 @@
 
 #include "access/amapi.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/tsmapi.h"
 #include "executor/executor.h"
 #include "executor/nodeHash.h"
@@ -150,9 +151,6 @@ static MergeScanSelCache *cached_scansel(PlannerInfo *root,
 static void cost_rescan(PlannerInfo *root, Path *path,
 			Cost *rescan_startup_cost, Cost *rescan_total_cost);
 static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
-static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
-						  ParamPathInfo *param_info,
-						  QualCost *qpqual_cost);
 static bool has_indexed_join_quals(NestPath *joinpath);
 static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
 				   List *quals);
@@ -174,7 +172,6 @@ static Cost append_nonpartial_cost(List *subpaths, int numpaths,
 static void set_rel_width(PlannerInfo *root, RelOptInfo *rel);
 static double relation_byte_size(double tuples, int width);
 static double page_size(double tuples, int width);
-static double get_parallel_divisor(Path *path);
 
 
 /*
@@ -209,70 +206,9 @@ void
 cost_seqscan(Path *path, PlannerInfo *root,
 			 RelOptInfo *baserel, ParamPathInfo *param_info)
 {
-	Cost		startup_cost = 0;
-	Cost		cpu_run_cost;
-	Cost		disk_run_cost;
-	double		spc_seq_page_cost;
-	QualCost	qpqual_cost;
-	Cost		cpu_per_tuple;
-
-	/* Should only be applied to base relations */
-	Assert(baserel->relid > 0);
-	Assert(baserel->rtekind == RTE_RELATION);
-
-	/* Mark the path with the correct row estimate */
-	if (param_info)
-		path->rows = param_info->ppi_rows;
-	else
-		path->rows = baserel->rows;
-
-	if (!enable_seqscan)
-		startup_cost += disable_cost;
-
-	/* fetch estimated page cost for tablespace containing table */
-	get_tablespace_page_costs(baserel->reltablespace,
-							  NULL,
-							  &spc_seq_page_cost);
-
-	/*
-	 * disk costs
-	 */
-	disk_run_cost = spc_seq_page_cost * baserel->pages;
-
-	/* CPU costs */
-	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
-
-	startup_cost += qpqual_cost.startup;
-	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
-	cpu_run_cost = cpu_per_tuple * baserel->tuples;
-	/* tlist eval costs are paid per output row, not per tuple scanned */
-	startup_cost += path->pathtarget->cost.startup;
-	cpu_run_cost += path->pathtarget->cost.per_tuple * path->rows;
-
-	/* Adjust costing for parallelism, if used. */
-	if (path->parallel_workers > 0)
-	{
-		double		parallel_divisor = get_parallel_divisor(path);
-
-		/* The CPU cost is divided among all the workers. */
-		cpu_run_cost /= parallel_divisor;
-
-		/*
-		 * It may be possible to amortize some of the I/O cost, but probably
-		 * not very much, because most operating systems already do aggressive
-		 * prefetching.  For now, we assume that the disk run cost can't be
-		 * amortized at all.
-		 */
-
-		/*
-		 * In the case of a parallel plan, the row count needs to represent
-		 * the number of tuples processed per worker.
-		 */
-		path->rows = clamp_row_est(path->rows / parallel_divisor);
-	}
-
-	path->startup_cost = startup_cost;
-	path->total_cost = startup_cost + cpu_run_cost + disk_run_cost;
+	TableScanCost_function cost_func = (TableScanCost_function) baserel->cost_scan;
+	Assert(cost_func != NULL);
+	return cost_func(path, root, baserel, param_info);
 }
 
 /*
@@ -286,65 +222,9 @@ void
 cost_samplescan(Path *path, PlannerInfo *root,
 				RelOptInfo *baserel, ParamPathInfo *param_info)
 {
-	Cost		startup_cost = 0;
-	Cost		run_cost = 0;
-	RangeTblEntry *rte;
-	TableSampleClause *tsc;
-	TsmRoutine *tsm;
-	double		spc_seq_page_cost,
-				spc_random_page_cost,
-				spc_page_cost;
-	QualCost	qpqual_cost;
-	Cost		cpu_per_tuple;
-
-	/* Should only be applied to base relations with tablesample clauses */
-	Assert(baserel->relid > 0);
-	rte = planner_rt_fetch(baserel->relid, root);
-	Assert(rte->rtekind == RTE_RELATION);
-	tsc = rte->tablesample;
-	Assert(tsc != NULL);
-	tsm = GetTsmRoutine(tsc->tsmhandler);
-
-	/* Mark the path with the correct row estimate */
-	if (param_info)
-		path->rows = param_info->ppi_rows;
-	else
-		path->rows = baserel->rows;
-
-	/* fetch estimated page cost for tablespace containing table */
-	get_tablespace_page_costs(baserel->reltablespace,
-							  &spc_random_page_cost,
-							  &spc_seq_page_cost);
-
-	/* if NextSampleBlock is used, assume random access, else sequential */
-	spc_page_cost = (tsm->NextSampleBlock != NULL) ?
-		spc_random_page_cost : spc_seq_page_cost;
-
-	/*
-	 * disk costs (recall that baserel->pages has already been set to the
-	 * number of pages the sampling method will visit)
-	 */
-	run_cost += spc_page_cost * baserel->pages;
-
-	/*
-	 * CPU costs (recall that baserel->tuples has already been set to the
-	 * number of tuples the sampling method will select).  Note that we ignore
-	 * execution cost of the TABLESAMPLE parameter expressions; they will be
-	 * evaluated only once per scan, and in most usages they'll likely be
-	 * simple constants anyway.  We also don't charge anything for the
-	 * calculations the sampling method might do internally.
-	 */
-	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
-
-	startup_cost += qpqual_cost.startup;
-	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
-	run_cost += cpu_per_tuple * baserel->tuples;
-	/* tlist eval costs are paid per output row, not per tuple scanned */
-	startup_cost += path->pathtarget->cost.startup;
-	run_cost += path->pathtarget->cost.per_tuple * path->rows;
-
-	path->startup_cost = startup_cost;
-	path->total_cost = startup_cost + run_cost;
+	TableScanCost_function cost_func = (TableScanCost_function) baserel->cost_samplescan;
+	Assert(cost_func != NULL);
+	return cost_func(path, root, baserel, param_info);
 }
 
 /*
@@ -3988,7 +3868,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
  * some of the quals.  We assume baserestrictcost was previously set by
  * set_baserel_size_estimates().
  */
-static void
+void
 get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
 						  ParamPathInfo *param_info,
 						  QualCost *qpqual_cost)
@@ -5343,7 +5223,7 @@ page_size(double tuples, int width)
  * Estimate the fraction of the work that each worker will do given the
  * number of workers budgeted for the path.
  */
-static double
+double
 get_parallel_divisor(Path *path)
 {
 	double		parallel_divisor = path->parallel_workers;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 86029cd1327..45f3b0372b9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -93,6 +93,8 @@ static void set_baserel_partition_key_exprs(Relation relation,
  *	pages		number of pages
  *	tuples		number of tuples
  *	rel_parallel_workers user-defined number of parallel workers
+ *	cost_scan	costing function for sequential scan
+ *	cost_samplescan costing function for sample scan
  *
  * Also, add information about the relation's foreign keys to root->fkey_list.
  *
@@ -443,6 +445,18 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 		rel->fdwroutine = NULL;
 	}
 
+	/* Get costing functions */
+	if (relation->rd_tableamroutine != NULL)
+	{
+		rel->cost_scan = relation->rd_tableamroutine->cost_scan;
+		rel->cost_samplescan = relation->rd_tableamroutine->cost_samplescan;
+	}
+	else
+	{
+		rel->cost_scan = NULL;
+		rel->cost_samplescan = NULL;
+	}
+
 	/* Collect info about relation's foreign keys, if relevant */
 	get_relation_foreign_keys(root, rel, relation, inhparent);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index dba50178887..2f342c7fef1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -17,6 +17,7 @@
 #include "access/genham.h"
 #include "nodes/lockoptions.h"
 #include "nodes/primnodes.h"
+#include "optimizer/cost.h"
 #include "storage/bufpage.h"
 #include "storage/lockdefs.h"
 #include "utils/relcache.h"
@@ -59,6 +60,13 @@ extern Relation heap_openrv_extended(const RangeVar *relation,
 
 #define heap_close(r,l)  relation_close(r,l)
 
+/* in heap/heapam_cost.c */
+extern void heapam_cost_scan(Path *path, PlannerInfo *root,
+							 RelOptInfo *baserel, ParamPathInfo *param_info);
+extern void heapam_cost_samplescan(Path *path, PlannerInfo *root,
+							 RelOptInfo *baserel, ParamPathInfo *param_info);
+
+
 /* struct definitions appear in relscan.h */
 typedef struct TableScanDescData *TableScanDesc;
 typedef struct HeapScanDescData *HeapScanDesc;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index fb37a739918..beea954885a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
 #include "nodes/nodes.h"
+#include "optimizer/cost.h"
 #include "fmgr.h"
 #include "utils/guc.h"
 #include "utils/rel.h"
@@ -192,6 +193,8 @@ struct SampleScanState;
 typedef bool (*SampleScanNextBlock_function)(TableScanDesc scan, struct SampleScanState *scanstate);
 typedef bool (*SampleScanNextTuple_function)(TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
 
+typedef void (*TableScanCost_function)(Path *path, PlannerInfo *root, RelOptInfo *baserel, ParamPathInfo *param_info);
+
 /*
  * API struct for a table AM.  Note this must be allocated in a
  * server-lifetime manner, typically as a static const struct.
@@ -246,6 +249,10 @@ typedef struct TableAmRoutine
 
 	IndexBuildRangeScan_function index_build_range_scan;
 	IndexValidateScan_function index_validate_scan;
+
+	/* Costing functions */
+	TableScanCost_function cost_scan;
+	TableScanCost_function cost_samplescan;
 }			TableAmRoutine;
 
 static inline const TupleTableSlotOps*
diff --git a/src/include/access/zheap.h b/src/include/access/zheap.h
index c657c728ec3..583cf25f965 100644
--- a/src/include/access/zheap.h
+++ b/src/include/access/zheap.h
@@ -20,6 +20,7 @@
 #include "access/hio.h"
 #include "access/undoinsert.h"
 #include "access/zhtup.h"
+#include "optimizer/cost.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
 
@@ -211,4 +212,11 @@ typedef struct ZHeapFreeOffsetRanges
 	int nranges;
 } ZHeapFreeOffsetRanges;
 
+/* Zheap costing functions */
+extern void zheapam_cost_scan(Path *path, PlannerInfo *root,
+							  RelOptInfo *baserel, ParamPathInfo *param_info);
+extern void zheapam_cost_samplescan(Path *path, PlannerInfo *root,
+							  RelOptInfo *baserel, ParamPathInfo *param_info);
+
+
 #endif   /* ZHEAP_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index adb42650479..6c51fe27460 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -704,6 +704,10 @@ typedef struct RelOptInfo
 	List	  **partexprs;		/* Non-nullable partition key expressions. */
 	List	  **nullable_partexprs; /* Nullable partition key expressions. */
 	List	   *partitioned_child_rels; /* List of RT indexes. */
+
+	/* Rather than include tableam.h here, we declare costing functions like this */
+	void		(*cost_scan) ();	/* sequential scan cost estimator */
+	void		(*cost_samplescan) ();	/* sample scan cost estimator */
 } RelOptInfo;
 
 /*
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 77ca7ff8371..0af574c41fe 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -166,6 +166,9 @@ extern void cost_gather(GatherPath *path, PlannerInfo *root,
 extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
 extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
 extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
+extern void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
+						  ParamPathInfo *param_info,
+						  QualCost *qpqual_cost);
 extern void compute_semi_anti_join_factors(PlannerInfo *root,
 							   RelOptInfo *joinrel,
 							   RelOptInfo *outerrel,
@@ -198,6 +201,7 @@ extern void set_tablefunc_size_estimates(PlannerInfo *root, RelOptInfo *rel);
 extern void set_namedtuplestore_size_estimates(PlannerInfo *root, RelOptInfo *rel);
 extern void set_foreign_size_estimates(PlannerInfo *root, RelOptInfo *rel);
 extern PathTarget *set_pathtarget_cost_width(PlannerInfo *root, PathTarget *target);
+extern double get_parallel_divisor(Path *path);
 extern double compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel,
 					 Path *bitmapqual, int loop_count, Cost *cost, double *tuple);
 
#34Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#32)
Re: Pluggable Storage - Andres's take

On Tue, Oct 9, 2018 at 1:46 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Wed, Oct 3, 2018 at 3:16 PM Andres Freund <andres@anarazel.de> wrote:

On 2018-09-27 20:03:58 -0700, Andres Freund wrote:

On 2018-09-28 12:21:08 +1000, Haribabu Kommi wrote:

Here I attached further cleanup patches.
1. Re-arrange the GUC variable
2. Added a check function hook for default_table_access_method GUC

Cool.

3. Added a new hook validate_index. I tried to change the function
validate_index_heapscan to slotify, but that has many problems, as it
accesses some internals of the HeapScanDesc structure, the buffer, etc.

Oops, I also did that locally, in a way. I also made validate a
callback, as the validation logic is going to be specific to the AMs.
Sorry for not pushing that up earlier. I'll try to do that soon,
there's a fair amount of change.

I've pushed an updated version, with a fair amount of pending changes,
and I hope all your pending patches (those not made redundant by our
concurrent development) are merged.

Yes, all the patches are merged.

There are currently 3 regression test failures that I'll look into
tomorrow:
- partition_prune shows a few additional Heap Blocks: exact=1 lines. I'm
a bit confused as to why, but haven't really investigated yet.
- fast_default fails, because I've undone most of 7636e5c60fea83a9f3c;
I'll have to redo that in a different way.
- I occasionally see failures in aggregates.sql - I've not figured out
what's going on there.

I also observed the failure of aggregates.sql; I will look into it.

Amit Khandekar said he'll publish a new version of the slot-abstraction
patch tomorrow, so I'll rebase it onto that ASAP.

OK.
Here I've attached two new API patches:
1. Set New Rel File node
2. Create Init fork

The above patches have a problem: while testing, they lead to a crash.
Sorry for not testing that earlier. Index relations also get assigned a
new relfilenode, and because that function was moved into the pluggable
table access method, and index relations have no tableam routine, the
call crashes.

So moving the storage creation methods into the table access method
doesn't work as-is; we may need common access methods that are shared
across both tables and indexes. A sketch of that idea follows below.
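
A minimal sketch (purely illustrative: both function names below are
invented, and RelationSetNewRelfilenodeCommon() would retain the old
relcache.c logic that the patch removed):

void
RelationSetNewRelfilenodeAny(Relation relation, char persistence,
                             TransactionId freezeXid, MultiXactId minmulti)
{
    /* Only relations with a table AM may dispatch through tableam;
     * index relations have no rd_tableamroutine, which is what caused
     * the crash described above. */
    if (relation->rd_tableamroutine != NULL)
        table_set_new_filenode(relation, persistence, freezeXid, minmulti);
    else
        RelationSetNewRelfilenodeCommon(relation, persistence,
                                        freezeXid, minmulti);
}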

Regards,
Haribabu Kommi
Fujitsu Australia

#35Amit Kapila
amit.kapila16@gmail.com
In reply to: Alexander Korotkov (#33)
Re: Pluggable Storage - Andres's take

On Tue, Oct 16, 2018 at 12:37 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

I've examined the code in the pluggable-zheap branch and the EDB github
[1], and I didn't find anything related to "delete-marking" indexes as
stated on slide #25 of the presentation [2]. So, basically, the
contract between heap and indexes remains unchanged: once you update
one indexed field, you have to update all the indexes.

Yes, this will be the behavior for the first version.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#36Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#32)
Re: Pluggable Storage - Andres's take

On Tue, Oct 9, 2018 at 1:46 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Wed, Oct 3, 2018 at 3:16 PM Andres Freund <andres@anarazel.de> wrote:

On 2018-09-27 20:03:58 -0700, Andres Freund wrote:

On 2018-09-28 12:21:08 +1000, Haribabu Kommi wrote:

Here I attached further cleanup patches.
1. Re-arrange the GUC variable
2. Added a check function hook for default_table_access_method GUC

Cool.

3. Added a new hook validate_index. I tried to change the function
validate_index_heapscan to slotify, but that has many problems, as it
accesses some internals of the HeapScanDesc structure, the buffer, etc.

Oops, I also did that locally, in a way. I also made validate a
callback, as the validation logic is going to be specific to the AMs.
Sorry for not pushing that up earlier. I'll try to do that soon,
there's a fair amount of change.

I've pushed an updated version, with a fair amount of pending changes,
and I hope all your pending patches (those not made redundant by our
concurrent development) are merged.

Yes, all the patches are merged.

There are currently 3 regression test failures that I'll look into
tomorrow:
- partition_prune shows a few additional Heap Blocks: exact=1 lines. I'm
a bit confused as to why, but haven't really investigated yet.
- fast_default fails, because I've undone most of 7636e5c60fea83a9f3c;
I'll have to redo that in a different way.
- I occasionally see failures in aggregates.sql - I've not figured out
what's going on there.

I also observed the failure of aggregates.sql; I will look into it.

The random failure of aggregates.sql is as follows:

SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
! avg_32
! ---------------------
! 32.6666666666666667
(1 row)

  -- In 7.1, avg(float4) is computed using float8 arithmetic.
--- 8,16 ----
  (1 row)

SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
! avg_32
! --------
!
(1 row)

The same NULL result appears for another aggregate query, on column b.

The aggtest table is accessed by two tests that run in parallel,
aggregates.sql and transactions.sql. In transactions.sql, all the
records in the aggtest table are deleted inside a transaction, and the
transaction is then aborted. I suspect that some visibility check has a
race condition that leads to no records being visible in aggtest, thus
returning the NULL result.

If I try the scenario manually, by opening a transaction and deleting
the records, the issue does not occur.

I have yet to find the cause of this problem.

Regards,
Haribabu Kommi

#37Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Alexander Korotkov (#33)
Re: Pluggable Storage - Andres's take

On Tue, Oct 16, 2018 at 6:06 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

Hi!

On Wed, Oct 3, 2018 at 8:16 AM Andres Freund <andres@anarazel.de> wrote:

I've pushed an updated version, with a fair amount of pending changes,
and I hope all your pending patches (those not made redundant by our
concurrent development) are merged.

I'd like to also share some patches. I've used the current state of
pluggable-zheap as the base for my patches.

Thanks for the review and patches.

* 0001-remove-extra-snapshot-functions.patch – removes the
snapshot_satisfiesUpdate() and snapshot_satisfiesVacuum() functions
from the tableam API. snapshot_satisfiesUpdate() was completely
unused. snapshot_satisfiesVacuum() was used only in
heap_copy_for_cluster(), so I've replaced it with a direct
heapam_satisfies_vacuum() call.

Thanks for the correction.

* 0002-add-costing-function-to-API.patch – adds functions for costing
sequential and table sample scans to the tableam API. The zheap costing
functions are currently copies of the heap costing functions; this
should be adjusted in the future.

This patch is missing the new *_cost.c files that contain the
AM-specific cost functions.

Estimation of heap lookups during index scans should also be pluggable,
but is not yet implemented (TODO).

Yes. Is it possible to use the same API that is added by the above
patch?

Regards,
Haribabu Kommi
Fujitsu Australia

#38Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Haribabu Kommi (#37)
2 attachment(s)
Re: Pluggable Storage - Andres's take

On Thu, Oct 18, 2018 at 6:28 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

On Tue, Oct 16, 2018 at 6:06 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

* 0002-add-costing-function-to-API.patch – adds functions for costing
sequential and table sample scans to the tableam API. The zheap costing
functions are currently copies of the heap costing functions; this
should be adjusted in the future.

This patch is missing the new *_cost.c files that contain the
AM-specific cost functions.

Thank you for noticing. A revised patchset is attached.

Estimation of heap lookups during index scans should also be pluggable,
but is not yet implemented (TODO).

Yes. Is it possible to use the same API that is added by the above
patch?

I'm not sure yet; I'll elaborate more on that. I'd like to keep the
number of costing functions small. Handling the costing of index scan
heap fetches will probably require a function signature change, perhaps
along the lines of the sketch below.
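
As a purely hypothetical illustration of such a signature change (none
of these names exist in the patchset), costing the heap fetches of an
index scan needs per-fetch information that the current
TableScanCost_function signature doesn't carry:

typedef void (*TableIndexFetchCost_function) (PlannerInfo *root,
                                              RelOptInfo *baserel,
                                              double tuples_fetched,
                                              double loop_count,
                                              Cost *startup_cost,
                                              Cost *run_cost);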

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0002-add-costing-function-to-API-2.patch (application/octet-stream)
commit 3a0868018d84db24024e2731e5bd6972b66d10e4
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date:   Mon Oct 15 21:33:31 2018 +0300

    Add costing function to tableam interface
    
    Costs of sequential scans and table sample scans are estimated using
    cost_seqscan() and cost_samplescan().  But they should be table access method
    specific, because different table AMs could have different costs.  This commit
    introduces zheap cost functions that are copies of the heap ones.  That should
    be adjusted in the future.  Making the cost of heap lookups during index scans
    pluggable is a TODO.

diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index aee7bfd8346..e13b0e0b8fa 100644
--- a/src/backend/access/heap/Makefile
+++ b/src/backend/access/heap/Makefile
@@ -13,6 +13,7 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = heapam.o heapam_handler.o heapam_visibility.o hio.o pruneheap.o \
-	rewriteheap.o syncscan.o tuptoaster.o vacuumlazy.o visibilitymap.o
+	rewriteheap.o syncscan.o tuptoaster.o vacuumlazy.o visibilitymap.o \
+	heapam_cost.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam_cost.c b/src/backend/access/heap/heapam_cost.c
new file mode 100644
index 00000000000..5c685c262a2
--- /dev/null
+++ b/src/backend/access/heap/heapam_cost.c
@@ -0,0 +1,187 @@
+/*-------------------------------------------------------------------------
+ *
+ * heapam_cost.c
+ *	  costing functions for heap access method
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/heap/heapam_cost.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "access/amapi.h"
+#include "access/htup_details.h"
+#include "access/tableam.h"
+#include "access/tsmapi.h"
+#include "executor/executor.h"
+#include "executor/nodeHash.h"
+#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "parser/parsetree.h"
+#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/spccache.h"
+#include "utils/tuplesort.h"
+
+/*
+ * heapam_cost_scan
+ *	  Determines and returns the cost of scanning a relation sequentially.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+heapam_cost_scan(Path *path, PlannerInfo *root,
+				 RelOptInfo *baserel, ParamPathInfo *param_info)
+{
+	Cost		startup_cost = 0;
+	Cost		cpu_run_cost;
+	Cost		disk_run_cost;
+	double		spc_seq_page_cost;
+	QualCost	qpqual_cost;
+	Cost		cpu_per_tuple;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (param_info)
+		path->rows = param_info->ppi_rows;
+	else
+		path->rows = baserel->rows;
+
+	if (!enable_seqscan)
+		startup_cost += disable_cost;
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  NULL,
+							  &spc_seq_page_cost);
+
+	/*
+	 * disk costs
+	 */
+	disk_run_cost = spc_seq_page_cost * baserel->pages;
+
+	/* CPU costs */
+	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+	startup_cost += qpqual_cost.startup;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+	cpu_run_cost = cpu_per_tuple * baserel->tuples;
+	/* tlist eval costs are paid per output row, not per tuple scanned */
+	startup_cost += path->pathtarget->cost.startup;
+	cpu_run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+	/* Adjust costing for parallelism, if used. */
+	if (path->parallel_workers > 0)
+	{
+		double		parallel_divisor = get_parallel_divisor(path);
+
+		/* The CPU cost is divided among all the workers. */
+		cpu_run_cost /= parallel_divisor;
+
+		/*
+		 * It may be possible to amortize some of the I/O cost, but probably
+		 * not very much, because most operating systems already do aggressive
+		 * prefetching.  For now, we assume that the disk run cost can't be
+		 * amortized at all.
+		 */
+
+		/*
+		 * In the case of a parallel plan, the row count needs to represent
+		 * the number of tuples processed per worker.
+		 */
+		path->rows = clamp_row_est(path->rows / parallel_divisor);
+	}
+
+	path->startup_cost = startup_cost;
+	path->total_cost = startup_cost + cpu_run_cost + disk_run_cost;
+}
+
+/*
+ * heapam_cost_samplescan
+ *	  Determines and returns the cost of scanning a relation using sampling.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+heapam_cost_samplescan(Path *path, PlannerInfo *root,
+					   RelOptInfo *baserel, ParamPathInfo *param_info)
+{
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	RangeTblEntry *rte;
+	TableSampleClause *tsc;
+	TsmRoutine *tsm;
+	double		spc_seq_page_cost,
+				spc_random_page_cost,
+				spc_page_cost;
+	QualCost	qpqual_cost;
+	Cost		cpu_per_tuple;
+
+	/* Should only be applied to base relations with tablesample clauses */
+	Assert(baserel->relid > 0);
+	rte = planner_rt_fetch(baserel->relid, root);
+	Assert(rte->rtekind == RTE_RELATION);
+	tsc = rte->tablesample;
+	Assert(tsc != NULL);
+	tsm = GetTsmRoutine(tsc->tsmhandler);
+
+	/* Mark the path with the correct row estimate */
+	if (param_info)
+		path->rows = param_info->ppi_rows;
+	else
+		path->rows = baserel->rows;
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  &spc_seq_page_cost);
+
+	/* if NextSampleBlock is used, assume random access, else sequential */
+	spc_page_cost = (tsm->NextSampleBlock != NULL) ?
+		spc_random_page_cost : spc_seq_page_cost;
+
+	/*
+	 * disk costs (recall that baserel->pages has already been set to the
+	 * number of pages the sampling method will visit)
+	 */
+	run_cost += spc_page_cost * baserel->pages;
+
+	/*
+	 * CPU costs (recall that baserel->tuples has already been set to the
+	 * number of tuples the sampling method will select).  Note that we ignore
+	 * execution cost of the TABLESAMPLE parameter expressions; they will be
+	 * evaluated only once per scan, and in most usages they'll likely be
+	 * simple constants anyway.  We also don't charge anything for the
+	 * calculations the sampling method might do internally.
+	 */
+	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+	startup_cost += qpqual_cost.startup;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+	run_cost += cpu_per_tuple * baserel->tuples;
+	/* tlist eval costs are paid per output row, not per tuple scanned */
+	startup_cost += path->pathtarget->cost.startup;
+	run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+	path->startup_cost = startup_cost;
+	path->total_cost = startup_cost + run_cost;
+}
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 28c475e7bdc..35b230c3606 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2146,7 +2146,10 @@ static const TableAmRoutine heapam_methods = {
 
 	.index_build_range_scan = IndexBuildHeapRangeScan,
 
-	.index_validate_scan = validate_index_heapscan
+	.index_validate_scan = validate_index_heapscan,
+
+	.cost_scan = heapam_cost_scan,
+	.cost_samplescan = heapam_cost_samplescan
 };
 
 const TableAmRoutine *
diff --git a/src/backend/access/zheap/Makefile b/src/backend/access/zheap/Makefile
index 75b0ff69ebf..4458e1a4238 100644
--- a/src/backend/access/zheap/Makefile
+++ b/src/backend/access/zheap/Makefile
@@ -13,6 +13,6 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = prunetpd.o prunezheap.o tpd.o tpdxlog.o zheapam.o zheapam_handler.o zheapamutils.o zheapamxlog.o \
-	zhio.o zmultilocker.o ztuptoaster.o ztqual.o zvacuumlazy.o
+	zhio.o zmultilocker.o ztuptoaster.o ztqual.o zvacuumlazy.o zheapam_cost.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/zheap/zheapam_cost.c b/src/backend/access/zheap/zheapam_cost.c
new file mode 100644
index 00000000000..613ee1fb6e7
--- /dev/null
+++ b/src/backend/access/zheap/zheapam_cost.c
@@ -0,0 +1,187 @@
+/*-------------------------------------------------------------------------
+ *
+ * zheapam_cost.c
+ *	  costing functions for zheap access method
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/zheap/zheapam_cost.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "access/amapi.h"
+#include "access/htup_details.h"
+#include "access/tableam.h"
+#include "access/tsmapi.h"
+#include "executor/executor.h"
+#include "executor/nodeHash.h"
+#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "parser/parsetree.h"
+#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/spccache.h"
+#include "utils/tuplesort.h"
+
+/*
+ * zheapam_cost_scan
+ *	  Determines and returns the cost of scanning a relation sequentially.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+zheapam_cost_scan(Path *path, PlannerInfo *root,
+				  RelOptInfo *baserel, ParamPathInfo *param_info)
+{
+	Cost		startup_cost = 0;
+	Cost		cpu_run_cost;
+	Cost		disk_run_cost;
+	double		spc_seq_page_cost;
+	QualCost	qpqual_cost;
+	Cost		cpu_per_tuple;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (param_info)
+		path->rows = param_info->ppi_rows;
+	else
+		path->rows = baserel->rows;
+
+	if (!enable_seqscan)
+		startup_cost += disable_cost;
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  NULL,
+							  &spc_seq_page_cost);
+
+	/*
+	 * disk costs
+	 */
+	disk_run_cost = spc_seq_page_cost * baserel->pages;
+
+	/* CPU costs */
+	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+	startup_cost += qpqual_cost.startup;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+	cpu_run_cost = cpu_per_tuple * baserel->tuples;
+	/* tlist eval costs are paid per output row, not per tuple scanned */
+	startup_cost += path->pathtarget->cost.startup;
+	cpu_run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+	/* Adjust costing for parallelism, if used. */
+	if (path->parallel_workers > 0)
+	{
+		double		parallel_divisor = get_parallel_divisor(path);
+
+		/* The CPU cost is divided among all the workers. */
+		cpu_run_cost /= parallel_divisor;
+
+		/*
+		 * It may be possible to amortize some of the I/O cost, but probably
+		 * not very much, because most operating systems already do aggressive
+		 * prefetching.  For now, we assume that the disk run cost can't be
+		 * amortized at all.
+		 */
+
+		/*
+		 * In the case of a parallel plan, the row count needs to represent
+		 * the number of tuples processed per worker.
+		 */
+		path->rows = clamp_row_est(path->rows / parallel_divisor);
+	}
+
+	path->startup_cost = startup_cost;
+	path->total_cost = startup_cost + cpu_run_cost + disk_run_cost;
+}
+
+/*
+ * zheapam_cost_samplescan
+ *	  Determines and returns the cost of scanning a relation using sampling.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+zheapam_cost_samplescan(Path *path, PlannerInfo *root,
+						RelOptInfo *baserel, ParamPathInfo *param_info)
+{
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	RangeTblEntry *rte;
+	TableSampleClause *tsc;
+	TsmRoutine *tsm;
+	double		spc_seq_page_cost,
+				spc_random_page_cost,
+				spc_page_cost;
+	QualCost	qpqual_cost;
+	Cost		cpu_per_tuple;
+
+	/* Should only be applied to base relations with tablesample clauses */
+	Assert(baserel->relid > 0);
+	rte = planner_rt_fetch(baserel->relid, root);
+	Assert(rte->rtekind == RTE_RELATION);
+	tsc = rte->tablesample;
+	Assert(tsc != NULL);
+	tsm = GetTsmRoutine(tsc->tsmhandler);
+
+	/* Mark the path with the correct row estimate */
+	if (param_info)
+		path->rows = param_info->ppi_rows;
+	else
+		path->rows = baserel->rows;
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  &spc_seq_page_cost);
+
+	/* if NextSampleBlock is used, assume random access, else sequential */
+	spc_page_cost = (tsm->NextSampleBlock != NULL) ?
+		spc_random_page_cost : spc_seq_page_cost;
+
+	/*
+	 * disk costs (recall that baserel->pages has already been set to the
+	 * number of pages the sampling method will visit)
+	 */
+	run_cost += spc_page_cost * baserel->pages;
+
+	/*
+	 * CPU costs (recall that baserel->tuples has already been set to the
+	 * number of tuples the sampling method will select).  Note that we ignore
+	 * execution cost of the TABLESAMPLE parameter expressions; they will be
+	 * evaluated only once per scan, and in most usages they'll likely be
+	 * simple constants anyway.  We also don't charge anything for the
+	 * calculations the sampling method might do internally.
+	 */
+	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+	startup_cost += qpqual_cost.startup;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+	run_cost += cpu_per_tuple * baserel->tuples;
+	/* tlist eval costs are paid per output row, not per tuple scanned */
+	startup_cost += path->pathtarget->cost.startup;
+	run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+	path->startup_cost = startup_cost;
+	path->total_cost = startup_cost + run_cost;
+}
diff --git a/src/backend/access/zheap/zheapam_handler.c b/src/backend/access/zheap/zheapam_handler.c
index e707baa1b5d..da95c881a77 100644
--- a/src/backend/access/zheap/zheapam_handler.c
+++ b/src/backend/access/zheap/zheapam_handler.c
@@ -1627,6 +1627,8 @@ static const TableAmRoutine zheapam_methods = {
 	.index_build_range_scan = IndexBuildZHeapRangeScan,
 	.index_validate_scan = validate_index_zheapscan,
 
+	.cost_scan = zheapam_cost_scan,
+	.cost_samplescan = zheapam_cost_samplescan
 };
 
 Datum
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7bf67a05295..abdddacf89a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -75,6 +75,7 @@
 
 #include "access/amapi.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/tsmapi.h"
 #include "executor/executor.h"
 #include "executor/nodeHash.h"
@@ -150,9 +151,6 @@ static MergeScanSelCache *cached_scansel(PlannerInfo *root,
 static void cost_rescan(PlannerInfo *root, Path *path,
 			Cost *rescan_startup_cost, Cost *rescan_total_cost);
 static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
-static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
-						  ParamPathInfo *param_info,
-						  QualCost *qpqual_cost);
 static bool has_indexed_join_quals(NestPath *joinpath);
 static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
 				   List *quals);
@@ -174,7 +172,6 @@ static Cost append_nonpartial_cost(List *subpaths, int numpaths,
 static void set_rel_width(PlannerInfo *root, RelOptInfo *rel);
 static double relation_byte_size(double tuples, int width);
 static double page_size(double tuples, int width);
-static double get_parallel_divisor(Path *path);
 
 
 /*
@@ -209,70 +206,9 @@ void
 cost_seqscan(Path *path, PlannerInfo *root,
 			 RelOptInfo *baserel, ParamPathInfo *param_info)
 {
-	Cost		startup_cost = 0;
-	Cost		cpu_run_cost;
-	Cost		disk_run_cost;
-	double		spc_seq_page_cost;
-	QualCost	qpqual_cost;
-	Cost		cpu_per_tuple;
-
-	/* Should only be applied to base relations */
-	Assert(baserel->relid > 0);
-	Assert(baserel->rtekind == RTE_RELATION);
-
-	/* Mark the path with the correct row estimate */
-	if (param_info)
-		path->rows = param_info->ppi_rows;
-	else
-		path->rows = baserel->rows;
-
-	if (!enable_seqscan)
-		startup_cost += disable_cost;
-
-	/* fetch estimated page cost for tablespace containing table */
-	get_tablespace_page_costs(baserel->reltablespace,
-							  NULL,
-							  &spc_seq_page_cost);
-
-	/*
-	 * disk costs
-	 */
-	disk_run_cost = spc_seq_page_cost * baserel->pages;
-
-	/* CPU costs */
-	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
-
-	startup_cost += qpqual_cost.startup;
-	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
-	cpu_run_cost = cpu_per_tuple * baserel->tuples;
-	/* tlist eval costs are paid per output row, not per tuple scanned */
-	startup_cost += path->pathtarget->cost.startup;
-	cpu_run_cost += path->pathtarget->cost.per_tuple * path->rows;
-
-	/* Adjust costing for parallelism, if used. */
-	if (path->parallel_workers > 0)
-	{
-		double		parallel_divisor = get_parallel_divisor(path);
-
-		/* The CPU cost is divided among all the workers. */
-		cpu_run_cost /= parallel_divisor;
-
-		/*
-		 * It may be possible to amortize some of the I/O cost, but probably
-		 * not very much, because most operating systems already do aggressive
-		 * prefetching.  For now, we assume that the disk run cost can't be
-		 * amortized at all.
-		 */
-
-		/*
-		 * In the case of a parallel plan, the row count needs to represent
-		 * the number of tuples processed per worker.
-		 */
-		path->rows = clamp_row_est(path->rows / parallel_divisor);
-	}
-
-	path->startup_cost = startup_cost;
-	path->total_cost = startup_cost + cpu_run_cost + disk_run_cost;
+	TableScanCost_function cost_func = (TableScanCost_function) baserel->cost_scan;
+	Assert(cost_func != NULL);
+	return cost_func(path, root, baserel, param_info);
 }
 
 /*
@@ -286,65 +222,9 @@ void
 cost_samplescan(Path *path, PlannerInfo *root,
 				RelOptInfo *baserel, ParamPathInfo *param_info)
 {
-	Cost		startup_cost = 0;
-	Cost		run_cost = 0;
-	RangeTblEntry *rte;
-	TableSampleClause *tsc;
-	TsmRoutine *tsm;
-	double		spc_seq_page_cost,
-				spc_random_page_cost,
-				spc_page_cost;
-	QualCost	qpqual_cost;
-	Cost		cpu_per_tuple;
-
-	/* Should only be applied to base relations with tablesample clauses */
-	Assert(baserel->relid > 0);
-	rte = planner_rt_fetch(baserel->relid, root);
-	Assert(rte->rtekind == RTE_RELATION);
-	tsc = rte->tablesample;
-	Assert(tsc != NULL);
-	tsm = GetTsmRoutine(tsc->tsmhandler);
-
-	/* Mark the path with the correct row estimate */
-	if (param_info)
-		path->rows = param_info->ppi_rows;
-	else
-		path->rows = baserel->rows;
-
-	/* fetch estimated page cost for tablespace containing table */
-	get_tablespace_page_costs(baserel->reltablespace,
-							  &spc_random_page_cost,
-							  &spc_seq_page_cost);
-
-	/* if NextSampleBlock is used, assume random access, else sequential */
-	spc_page_cost = (tsm->NextSampleBlock != NULL) ?
-		spc_random_page_cost : spc_seq_page_cost;
-
-	/*
-	 * disk costs (recall that baserel->pages has already been set to the
-	 * number of pages the sampling method will visit)
-	 */
-	run_cost += spc_page_cost * baserel->pages;
-
-	/*
-	 * CPU costs (recall that baserel->tuples has already been set to the
-	 * number of tuples the sampling method will select).  Note that we ignore
-	 * execution cost of the TABLESAMPLE parameter expressions; they will be
-	 * evaluated only once per scan, and in most usages they'll likely be
-	 * simple constants anyway.  We also don't charge anything for the
-	 * calculations the sampling method might do internally.
-	 */
-	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
-
-	startup_cost += qpqual_cost.startup;
-	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
-	run_cost += cpu_per_tuple * baserel->tuples;
-	/* tlist eval costs are paid per output row, not per tuple scanned */
-	startup_cost += path->pathtarget->cost.startup;
-	run_cost += path->pathtarget->cost.per_tuple * path->rows;
-
-	path->startup_cost = startup_cost;
-	path->total_cost = startup_cost + run_cost;
+	TableScanCost_function cost_func = (TableScanCost_function) baserel->cost_samplescan;
+	Assert(cost_func != NULL);
+	return cost_func(path, root, baserel, param_info);
 }
 
 /*
@@ -3988,7 +3868,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
  * some of the quals.  We assume baserestrictcost was previously set by
  * set_baserel_size_estimates().
  */
-static void
+void
 get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
 						  ParamPathInfo *param_info,
 						  QualCost *qpqual_cost)
@@ -5343,7 +5223,7 @@ page_size(double tuples, int width)
  * Estimate the fraction of the work that each worker will do given the
  * number of workers budgeted for the path.
  */
-static double
+double
 get_parallel_divisor(Path *path)
 {
 	double		parallel_divisor = path->parallel_workers;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 86029cd1327..45f3b0372b9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -93,6 +93,8 @@ static void set_baserel_partition_key_exprs(Relation relation,
  *	pages		number of pages
  *	tuples		number of tuples
  *	rel_parallel_workers user-defined number of parallel workers
+ *	cost_scan	costing function for sequential scan
+ *	cost_samplescan costing function for sample scan
  *
  * Also, add information about the relation's foreign keys to root->fkey_list.
  *
@@ -443,6 +445,18 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 		rel->fdwroutine = NULL;
 	}
 
+	/* Get costing functions */
+	if (relation->rd_tableamroutine != NULL)
+	{
+		rel->cost_scan = relation->rd_tableamroutine->cost_scan;
+		rel->cost_samplescan = relation->rd_tableamroutine->cost_samplescan;
+	}
+	else
+	{
+		rel->cost_scan = NULL;
+		rel->cost_samplescan = NULL;
+	}
+
 	/* Collect info about relation's foreign keys, if relevant */
 	get_relation_foreign_keys(root, rel, relation, inhparent);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index dba50178887..2f342c7fef1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -17,6 +17,7 @@
 #include "access/genham.h"
 #include "nodes/lockoptions.h"
 #include "nodes/primnodes.h"
+#include "optimizer/cost.h"
 #include "storage/bufpage.h"
 #include "storage/lockdefs.h"
 #include "utils/relcache.h"
@@ -59,6 +60,13 @@ extern Relation heap_openrv_extended(const RangeVar *relation,
 
 #define heap_close(r,l)  relation_close(r,l)
 
+/* in heap/heapam_cost.c */
+extern void heapam_cost_scan(Path *path, PlannerInfo *root,
+							 RelOptInfo *baserel, ParamPathInfo *param_info);
+extern void heapam_cost_samplescan(Path *path, PlannerInfo *root,
+							 RelOptInfo *baserel, ParamPathInfo *param_info);
+
+
 /* struct definitions appear in relscan.h */
 typedef struct TableScanDescData *TableScanDesc;
 typedef struct HeapScanDescData *HeapScanDesc;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index fb37a739918..beea954885a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
 #include "nodes/nodes.h"
+#include "optimizer/cost.h"
 #include "fmgr.h"
 #include "utils/guc.h"
 #include "utils/rel.h"
@@ -192,6 +193,8 @@ struct SampleScanState;
 typedef bool (*SampleScanNextBlock_function)(TableScanDesc scan, struct SampleScanState *scanstate);
 typedef bool (*SampleScanNextTuple_function)(TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
 
+typedef void (*TableScanCost_function)(Path *path, PlannerInfo *root, RelOptInfo *baserel, ParamPathInfo *param_info);
+
 /*
  * API struct for a table AM.  Note this must be allocated in a
  * server-lifetime manner, typically as a static const struct.
@@ -246,6 +249,10 @@ typedef struct TableAmRoutine
 
 	IndexBuildRangeScan_function index_build_range_scan;
 	IndexValidateScan_function index_validate_scan;
+
+	/* Costing functions */
+	TableScanCost_function cost_scan;
+	TableScanCost_function cost_samplescan;
 }			TableAmRoutine;
 
 static inline const TupleTableSlotOps*
diff --git a/src/include/access/zheap.h b/src/include/access/zheap.h
index c657c728ec3..583cf25f965 100644
--- a/src/include/access/zheap.h
+++ b/src/include/access/zheap.h
@@ -20,6 +20,7 @@
 #include "access/hio.h"
 #include "access/undoinsert.h"
 #include "access/zhtup.h"
+#include "optimizer/cost.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
 
@@ -211,4 +212,11 @@ typedef struct ZHeapFreeOffsetRanges
 	int nranges;
 } ZHeapFreeOffsetRanges;
 
+/* Zheap costing functions */
+extern void zheapam_cost_scan(Path *path, PlannerInfo *root,
+							  RelOptInfo *baserel, ParamPathInfo *param_info);
+extern void zheapam_cost_samplescan(Path *path, PlannerInfo *root,
+							  RelOptInfo *baserel, ParamPathInfo *param_info);
+
+
 #endif   /* ZHEAP_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index adb42650479..6c51fe27460 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -704,6 +704,10 @@ typedef struct RelOptInfo
 	List	  **partexprs;		/* Non-nullable partition key expressions. */
 	List	  **nullable_partexprs; /* Nullable partition key expressions. */
 	List	   *partitioned_child_rels; /* List of RT indexes. */
+
+	/* Rather than include tableam.h here, we declare costing functions like this */
+	void		(*cost_scan) ();	/* sequential scan cost estimator */
+	void		(*cost_samplescan) ();	/* sample scan cost estimator */
 } RelOptInfo;
 
 /*
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 77ca7ff8371..0af574c41fe 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -166,6 +166,9 @@ extern void cost_gather(GatherPath *path, PlannerInfo *root,
 extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
 extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
 extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
+extern void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
+						  ParamPathInfo *param_info,
+						  QualCost *qpqual_cost);
 extern void compute_semi_anti_join_factors(PlannerInfo *root,
 							   RelOptInfo *joinrel,
 							   RelOptInfo *outerrel,
@@ -198,6 +201,7 @@ extern void set_tablefunc_size_estimates(PlannerInfo *root, RelOptInfo *rel);
 extern void set_namedtuplestore_size_estimates(PlannerInfo *root, RelOptInfo *rel);
 extern void set_foreign_size_estimates(PlannerInfo *root, RelOptInfo *rel);
 extern PathTarget *set_pathtarget_cost_width(PlannerInfo *root, PathTarget *target);
+extern double get_parallel_divisor(Path *path);
 extern double compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel,
 					 Path *bitmapqual, int loop_count, Cost *cost, double *tuple);
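
As an illustration of how an AM plugs into this, a hypothetical AM could
register its costing callbacks roughly as below (a sketch only; the myam_*
names are invented, the I/O model is a placeholder, and pathtarget /
parallel-worker adjustments are omitted):

/* Hypothetical sketch of an AM's scan costing callback; the planner
 * reaches it through rel->cost_scan as wired up in get_relation_info(). */
static void
myam_cost_scan(Path *path, PlannerInfo *root,
			   RelOptInfo *baserel, ParamPathInfo *param_info)
{
	Cost		startup_cost = 0;
	Cost		run_cost = 0;
	QualCost	qpqual_cost;

	/* disk costs, using whatever I/O model suits this AM */
	run_cost += seq_page_cost * baserel->pages;

	/* per-tuple CPU costs, via the now-exported helper */
	get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
	startup_cost += qpqual_cost.startup;
	run_cost += (cpu_tuple_cost + qpqual_cost.per_tuple) * baserel->tuples;

	path->startup_cost = startup_cost;
	path->total_cost = startup_cost + run_cost;
}

static const TableAmRoutine myam_methods = {
	.type = T_TableAmRoutine,
	/* ... other callbacks ... */
	.cost_scan = myam_cost_scan,
	.cost_samplescan = myam_cost_scan,	/* reused only for this sketch */
};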
 
0001-remove-extra-snapshot-functions-2.patch
commit fea0ea9bceb698dbe88bee42d5fc5f3332a658dd
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date:   Wed Sep 26 15:29:43 2018 +0300

    Remove some snapshot functions from TableAmRoutine
    
    snapshot_satisfiesUpdate was unused.  snapshot_satisfiesVacuum was used only
    inside heap_copy_for_cluster, so it was replaced to direct
    heapam_satisfies_vacuum() call.

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91fd..28c475e7bdc 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -483,21 +483,6 @@ heapam_satisfies(TupleTableSlot *slot, Snapshot snapshot)
 	return res;
 }
 
-static HTSU_Result
-heapam_satisfies_update(TupleTableSlot *slot, CommandId curcid)
-{
-	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
-	HTSU_Result res;
-
-	LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
-	res = HeapTupleSatisfiesUpdate(bslot->base.tuple,
-								   curcid,
-								   bslot->buffer);
-	LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
-
-	return res;
-}
-
 static HTSV_Result
 heapam_satisfies_vacuum(TupleTableSlot *slot, TransactionId OldestXmin)
 {
@@ -2003,7 +1988,7 @@ heap_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
 				break;
 		}
 
-		switch (OldHeap->rd_tableamroutine->snapshot_satisfiesVacuum(slot, OldestXmin))
+		switch (heapam_satisfies_vacuum(slot, OldestXmin))
 		{
 			case HEAPTUPLE_DEAD:
 				/* Definitely dead */
@@ -2122,8 +2107,6 @@ static const TableAmRoutine heapam_methods = {
 	.slot_callbacks = heapam_slot_callbacks,
 
 	.snapshot_satisfies = heapam_satisfies,
-	.snapshot_satisfiesUpdate = heapam_satisfies_update,
-	.snapshot_satisfiesVacuum = heapam_satisfies_vacuum,
 
 	.scan_begin = heap_beginscan,
 	.scansetlimits = heap_setscanlimits,
diff --git a/src/backend/access/zheap/zheapam_handler.c b/src/backend/access/zheap/zheapam_handler.c
index bec9b16f7d6..e707baa1b5d 100644
--- a/src/backend/access/zheap/zheapam_handler.c
+++ b/src/backend/access/zheap/zheapam_handler.c
@@ -486,40 +486,6 @@ zheapam_satisfies(TupleTableSlot *slot, Snapshot snapshot)
 #endif
 }
 
-static HTSU_Result
-zheapam_satisfies_update(TupleTableSlot *slot, CommandId curcid)
-{
-	elog(ERROR, "would need to track buffer or refetch");
-#if ZBORKED
-	BufferHeapTupleTableSlot *zslot = (BufferHeapTupleTableSlot *) slot;
-	HTSU_Result res;
-
-
-	LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
-	res = HeapTupleSatisfiesUpdate(bslot->base.tuple,
-								   curcid,
-								   bslot->buffer);
-	LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
-
-	return res;
-#endif
-}
-
-static HTSV_Result
-zheapam_satisfies_vacuum(TupleTableSlot *slot, TransactionId OldestXmin)
-{
-	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
-	HTSV_Result res;
-
-	LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
-	res = HeapTupleSatisfiesVacuum(bslot->base.tuple,
-								   OldestXmin,
-								   bslot->buffer);
-	LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
-
-	return res;
-}
-
 static IndexFetchTableData*
 zheapam_begin_index_fetch(Relation rel)
 {
@@ -1621,8 +1587,6 @@ static const TableAmRoutine zheapam_methods = {
 	.slot_callbacks = zheapam_slot_callbacks,
 
 	.snapshot_satisfies = zheapam_satisfies,
-	.snapshot_satisfiesUpdate = zheapam_satisfies_update,
-	.snapshot_satisfiesVacuum = zheapam_satisfies_vacuum,
 
 	.scan_begin = zheap_beginscan,
 	.scansetlimits = zheap_setscanlimits,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7fe6ff6c221..fb37a739918 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -55,8 +55,6 @@ extern bool synchronize_seqscans;
  * Storage routine function hooks
  */
 typedef bool (*SnapshotSatisfies_function) (TupleTableSlot *slot, Snapshot snapshot);
-typedef HTSU_Result (*SnapshotSatisfiesUpdate_function) (TupleTableSlot *slot, CommandId curcid);
-typedef HTSV_Result (*SnapshotSatisfiesVacuum_function) (TupleTableSlot *slot, TransactionId OldestXmin);
 
 typedef Oid (*TupleInsert_function) (Relation rel, TupleTableSlot *slot, CommandId cid,
 									 int options, BulkInsertState bistate);
@@ -205,8 +203,6 @@ typedef struct TableAmRoutine
 	SlotCallbacks_function slot_callbacks;
 
 	SnapshotSatisfies_function snapshot_satisfies;
-	SnapshotSatisfiesUpdate_function snapshot_satisfiesUpdate;
-	SnapshotSatisfiesVacuum_function snapshot_satisfiesVacuum;
 
 	/* Operations on physical tuples */
 	TupleInsert_function tuple_insert;
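
With the Update/Vacuum variants gone, outside callers test visibility only
through the remaining slot-based callback; a call site looks roughly like
this (hypothetical sketch, not from the patch):

	/*
	 * The AM performs whatever buffer pinning/locking it needs inside the
	 * callback, so generic code just hands over the slot and snapshot.
	 */
	if (rel->rd_tableamroutine->snapshot_satisfies(slot, snapshot))
	{
		/* tuple is visible under this snapshot */
	}
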
#39Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#36)
3 attachment(s)
Re: Pluggable Storage - Andres's take

On Thu, Oct 18, 2018 at 1:04 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Tue, Oct 9, 2018 at 1:46 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

I also observed the failure of aggregates.sql, will look into it.

The random failure of aggregates.sql is as follows

SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
! avg_32
! ---------------------
! 32.6666666666666667
(1 row)

-- In 7.1, avg(float4) is computed using float8 arithmetic.
--- 8,16 ----
(1 row)

SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
! avg_32
! --------
!
(1 row)

Same NULL result for another aggregate query on column b.

The aggtest table is accessed by two tests that run in parallel,
aggregates.sql and transactions.sql. In transactions.sql, all the records
in the aggtest table are deleted inside a transaction that is then
aborted. I suspect that some visibility check has a race condition that
leaves no visible records in aggtest, hence the NULL result.

If I try the scenario manually by opening a transaction and deleting the
records, the issue does not occur.

I am yet to find the cause for this problem.

I am not yet able to generate a test case where the above issue occurs
reliably enough for debugging; it happens randomly. I will try to add some
logging to track down the problem.

While checking the above problem, I found some corrections:
1. Remove the tableam_common.c file, as it is not used.
2. Remove the extra heap tuple visibility check in the heapgettup_pagemode function.
3. New API for the init fork.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0003-init-fork-API.patch
From edc69f750fa13e252bff67f4aa615d4fdcec2b5e Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 17:58:27 +1100
Subject: [PATCH 3/3] init fork API

API to create INIT_FORKNUM file with wrapper
table_create_init_fork.
---
 src/backend/access/heap/heapam_handler.c | 27 +++++++++++++++++++++++-
 src/backend/catalog/heap.c               | 24 ++-------------------
 src/backend/commands/tablecmds.c         |  4 ++--
 src/include/access/tableam.h             | 10 +++++++++
 src/include/catalog/heap.h               |  2 --
 5 files changed, 40 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91f..c0cfbe74b1 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -30,6 +30,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/pg_am_d.h"
+#include "catalog/storage_xlog.h"
 #include "executor/executor.h"
 #include "pgstat.h"
 #include "storage/lmgr.h"
@@ -2116,6 +2117,28 @@ heap_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
 	pfree(isnull);
 }
 
+/*
+ * Set up an init fork for an unlogged table so that it can be correctly
+ * reinitialized on restart.  An immediate sync is required even if the
+ * page has been logged, because the write did not go through
+ * shared_buffers and therefore a concurrent checkpoint may have moved
+ * the redo pointer past our xlog record.  Recovery may as well remove it
+ * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
+ * record. Therefore, logging is necessary even if wal_level=minimal.
+ */
+static void
+heap_create_init_fork(Relation rel)
+{
+	Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
+		   rel->rd_rel->relkind == RELKIND_MATVIEW ||
+		   rel->rd_rel->relkind == RELKIND_TOASTVALUE);
+	RelationOpenSmgr(rel);
+	smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
+	log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
+	smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
+}
+
+
 static const TableAmRoutine heapam_methods = {
 	.type = T_TableAmRoutine,
 
@@ -2163,7 +2186,9 @@ static const TableAmRoutine heapam_methods = {
 
 	.index_build_range_scan = IndexBuildHeapRangeScan,
 
-	.index_validate_scan = validate_index_heapscan
+	.index_validate_scan = validate_index_heapscan,
+
+	.CreateInitFork = heap_create_init_fork
 };
 
 const TableAmRoutine *
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 38b368f916..8e7c8ce684 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -32,6 +32,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/sysattr.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
@@ -1425,7 +1426,7 @@ heap_create_with_catalog(const char *relname,
 	 */
 	if (relpersistence == RELPERSISTENCE_UNLOGGED &&
 		relkind != RELKIND_PARTITIONED_TABLE)
-		heap_create_init_fork(new_rel_desc);
+		table_create_init_fork(new_rel_desc);
 
 	/*
 	 * ok, the relation has been cataloged, so close our relations and return
@@ -1437,27 +1438,6 @@ heap_create_with_catalog(const char *relname,
 	return relid;
 }
 
-/*
- * Set up an init fork for an unlogged table so that it can be correctly
- * reinitialized on restart.  An immediate sync is required even if the
- * page has been logged, because the write did not go through
- * shared_buffers and therefore a concurrent checkpoint may have moved
- * the redo pointer past our xlog record.  Recovery may as well remove it
- * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
- * record. Therefore, logging is necessary even if wal_level=minimal.
- */
-void
-heap_create_init_fork(Relation rel)
-{
-	Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
-		   rel->rd_rel->relkind == RELKIND_MATVIEW ||
-		   rel->rd_rel->relkind == RELKIND_TOASTVALUE);
-	RelationOpenSmgr(rel);
-	smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
-	log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
-	smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
-}
-
 /*
  *		RelationRemoveInheritance
  *
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f3526b267d..3c46a48882 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1649,7 +1649,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 			RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
 									  RecentXmin, minmulti);
 			if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
-				heap_create_init_fork(rel);
+				table_create_init_fork(rel);
 
 			heap_relid = RelationGetRelid(rel);
 			toast_relid = rel->rd_rel->reltoastrelid;
@@ -1663,7 +1663,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 				RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
 										  RecentXmin, minmulti);
 				if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
-					heap_create_init_fork(rel);
+					table_create_init_fork(rel);
 				heap_close(rel, NoLock);
 			}
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7fe6ff6c22..79c71b06e5 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -194,6 +194,8 @@ struct SampleScanState;
 typedef bool (*SampleScanNextBlock_function)(TableScanDesc scan, struct SampleScanState *scanstate);
 typedef bool (*SampleScanNextTuple_function)(TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
 
+typedef void (*CreateInitFork_function)(Relation rel);
+
 /*
  * API struct for a table AM.  Note this must be allocated in a
  * server-lifetime manner, typically as a static const struct.
@@ -250,6 +252,8 @@ typedef struct TableAmRoutine
 
 	IndexBuildRangeScan_function index_build_range_scan;
 	IndexValidateScan_function index_validate_scan;
+
+	CreateInitFork_function CreateInitFork;
 }			TableAmRoutine;
 
 static inline const TupleTableSlotOps*
@@ -741,6 +745,12 @@ table_index_build_range_scan(Relation heapRelation,
 		scan);
 }
 
+static inline void
+table_create_init_fork(Relation relation)
+{
+	relation->rd_tableamroutine->CreateInitFork(relation);
+}
+
 extern BlockNumber table_parallelscan_nextpage(TableScanDesc scan);
 extern void table_parallelscan_startblock_init(TableScanDesc scan);
 extern Size table_parallelscan_estimate(Snapshot snapshot);
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 4584b3473c..c0e706ecc9 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -77,8 +77,6 @@ extern Oid heap_create_with_catalog(const char *relname,
 						 Oid relrewrite,
 						 ObjectAddress *typaddress);
 
-extern void heap_create_init_fork(Relation rel);
-
 extern void heap_drop_with_catalog(Oid relid);
 
 extern void heap_truncate(List *relids);
-- 
2.18.0.windows.1
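
For another AM, the new callback is the natural place for AM-specific
initialization of the init fork. A minimal sketch, under invented myam_*
names and mirroring the heap implementation above:

/* Hypothetical AM: same smgr dance as heap, with room to write and
 * WAL-log AM-specific initial pages (e.g. a metapage) before the sync. */
static void
myam_create_init_fork(Relation rel)
{
	RelationOpenSmgr(rel);
	smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
	log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
	/* ... AM-specific initial content would be written here ... */
	smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
}

static const TableAmRoutine myam_methods = {
	/* ... other callbacks ... */
	.CreateInitFork = myam_create_init_fork,
};

Generic code such as ExecuteTruncateGuts() then only needs the
table_create_init_fork() wrapper, as the hunks above show.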

0002-Remove-the-extra-Tuple-visibility-function.patch
From 3b4974dd4fa27f7a726b49cb8b16828818fe4093 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 16:07:07 +1100
Subject: [PATCH 2/3] Remove the extra Tuple visibility function

In heapgettup_pagemode a tuple visibility check was added
during the early development of pluggable storage, but the visibility
check is already carried out in the heapgetpage function itself.
---
 src/backend/access/heap/heapam.c | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ec99d0bcae..ef6b4c3e54 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -947,31 +947,25 @@ heapgettup_pagemode(HeapScanDesc scan,
 			/*
 			 * if current tuple qualifies, return it.
 			 */
-			if (HeapTupleSatisfies(tuple, scan->rs_scan.rs_snapshot, scan->rs_cbuf))
+			if (key != NULL)
 			{
-				/*
-				 * if current tuple qualifies, return it.
-				 */
-				if (key != NULL)
-				{
-					bool		valid;
+				bool		valid;
 
-					HeapKeyTest(tuple, RelationGetDescr(scan->rs_scan.rs_rd),
-								nkeys, key, valid);
-					if (valid)
-					{
-						scan->rs_cindex = lineindex;
-						LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-						return;
-					}
-				}
-				else
+				HeapKeyTest(tuple, RelationGetDescr(scan->rs_scan.rs_rd),
+							nkeys, key, valid);
+				if (valid)
 				{
 					scan->rs_cindex = lineindex;
 					LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 					return;
 				}
 			}
+			else
+			{
+				scan->rs_cindex = lineindex;
+				LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+				return;
+			}
 
 			/*
 			 * otherwise move to the next item on the page
-- 
2.18.0.windows.1

0001-Remove-the-old-slot-interface-file-and-also-update-t.patch
From ab6169b484b134522a4bacb22f226c3dd4f67e42 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 16:06:10 +1100
Subject: [PATCH 1/3] Remove the old slot interface file and also update the
 Makefile.

---
 src/backend/access/table/Makefile         | 2 +-
 src/backend/access/table/tableam_common.c | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 delete mode 100644 src/backend/access/table/tableam_common.c

diff --git a/src/backend/access/table/Makefile b/src/backend/access/table/Makefile
index fe22bf9208..006ba99182 100644
--- a/src/backend/access/table/Makefile
+++ b/src/backend/access/table/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/access/table
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = tableam.o tableamapi.o tableam_common.o
+OBJS = tableam.o tableamapi.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/table/tableam_common.c b/src/backend/access/table/tableam_common.c
deleted file mode 100644
index e69de29bb2..0000000000
-- 
2.18.0.windows.1

#40Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#39)
2 attachment(s)
Re: Pluggable Storage - Andres's take

On Mon, Oct 22, 2018 at 6:16 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Thu, Oct 18, 2018 at 1:04 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Tue, Oct 9, 2018 at 1:46 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

I also observed the failure of aggregates.sql, will look into it.

The random failure of aggregates.sql is as follows

SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
! avg_32
! ---------------------
! 32.6666666666666667
(1 row)

-- In 7.1, avg(float4) is computed using float8 arithmetic.
--- 8,16 ----
(1 row)

SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
! avg_32
! --------
!
(1 row)

Same NULL result for another aggregate query on column b.

The aggtest table is accessed by two tests that run in parallel,
aggregates.sql and transactions.sql. In transactions.sql, all the records
in the aggtest table are deleted inside a transaction that is then
aborted. I suspect that some visibility check has a race condition that
leaves no visible records in aggtest, hence the NULL result.

If I try the scenario manually by opening a transaction and deleting the
records, the issue does not occur.

I am yet to find the cause for this problem.

I am not yet able to generate a test case where the above issue occurs
reliably enough for debugging; it happens randomly. I will try to add some
logging to track down the problem.

I was able to generate a simple test and found the problem. The issue is
with the following SQL:

SELECT *
INTO TABLE xacttest
FROM aggtest;

During the processing of the above query, the tuple selected from aggtest
is sent to the intorel_receive() function, and the same tuple is used for
the insert. Because of this, the tuple's xmin is updated, and the data is
no longer visible to the other query. I fixed this issue by materializing
the slot.
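
To illustrate the aliasing (a much-simplified sketch, not the actual code
path; the insert arguments are abbreviated):

	/*
	 * Without materialization, the slot's tuple still points at the
	 * aggtest tuple in the shared buffer page, and the insert path stamps
	 * the new xmin into that header in place -- clobbering the source
	 * tuple that the concurrent aggregates.sql query still needs to see.
	 */
	HeapTuple	tup = bslot->base.tuple;	/* aliases the on-page tuple */
	heap_insert(myState->rel, tup, myState->output_cid, 0, NULL);

	/* The fix: copy the tuple into slot-local memory before inserting */
	ExecMaterializeSlot(slot);
	table_insert(myState->rel, slot, myState->output_cid, 0, NULL);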

During the above test run, I found another issue during analyze: it tries
to access an invalid offset. A fix patch is attached.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0001-scan-start-offset-fix-during-analyze.patch
From 82ae4f7a6ef78c9e04bc5abeeb0593b890ee454b Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Tue, 23 Oct 2018 15:42:57 +1100
Subject: [PATCH 1/2] scan start offset fix during analyze

---
 src/backend/access/heap/heapam_handler.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c0cfbe74b1..ae832e1f71 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1742,7 +1742,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
 	Page		targpage;
-	OffsetNumber targoffset = scan->rs_cindex;
+	OffsetNumber targoffset;
 	OffsetNumber maxoffset;
 	BufferHeapTupleTableSlot *hslot;
 
@@ -1752,7 +1752,9 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 	maxoffset = PageGetMaxOffsetNumber(targpage);
 
 	/* Inner loop over all tuples on the selected page */
-	for (targoffset = scan->rs_cindex; targoffset <= maxoffset; targoffset++)
+	for (targoffset = scan->rs_cindex ? scan->rs_cindex : FirstOffsetNumber;
+			targoffset <= maxoffset;
+			targoffset++)
 	{
 		ItemId		itemid;
 		HeapTuple	targtuple = &hslot->base.tupdata;
-- 
2.18.0.windows.1

0002-Materialize-all-the-slots-before-they-are-processed-.patch
From 52865d2ede0c6d0b2b8af26d67736124cd44450d Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Tue, 23 Oct 2018 17:15:18 +1100
Subject: [PATCH 2/2] Materialize all the slots before they are processed using
 into_rel

---
 src/backend/commands/createas.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 84de804175..27a28a896d 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -593,6 +593,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 	if (myState->rel->rd_rel->relhasoids)
 		slot->tts_tupleOid = InvalidOid;
 
+	ExecMaterializeSlot(slot);
 	table_insert(myState->rel,
 				 slot,
 				 myState->output_cid,
-- 
2.18.0.windows.1

#41Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#40)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Tue, Oct 23, 2018 at 5:49 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

I was able to generate a simple test and found the problem. The issue is
with the following SQL:

SELECT *
INTO TABLE xacttest
FROM aggtest;

During the processing of the above query, the tuple selected from aggtest
is sent to the intorel_receive() function, and the same tuple is used for
the insert. Because of this, the tuple's xmin is updated, and the data is
no longer visible to the other query. I fixed this issue by materializing
the slot.

The wrong patch was attached in the earlier mail, sorry for the
inconvenience. The proper fix patch is attached.

I will look into the isolation test failures.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0002-Materialize-the-slot-before-they-are-processed-using.patch
From f0d9dbf5c5608beb99b879e7317b68a285bbeab8 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Tue, 23 Oct 2018 17:15:18 +1100
Subject: [PATCH 2/2] Materialize the slot before they are processed using
 intorel_receive

---
 src/backend/commands/createas.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 84de804175..d3ffe417ff 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -593,6 +593,10 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 	if (myState->rel->rd_rel->relhasoids)
 		slot->tts_tupleOid = InvalidOid;
 
+	/* Materialize the slot */
+	if (!TTS_IS_VIRTUAL(slot))
+		ExecMaterializeSlot(slot);
+
 	table_insert(myState->rel,
 				 slot,
 				 myState->output_cid,
-- 
2.18.0.windows.1

#42Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#41)
2 attachment(s)
Re: Pluggable Storage - Andres's take

On Tue, Oct 23, 2018 at 6:11 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Tue, Oct 23, 2018 at 5:49 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

I was able to generate a simple test and found the problem. The issue is
with the following SQL:

SELECT *
INTO TABLE xacttest
FROM aggtest;

During the processing of the above query, the tuple selected from aggtest
is sent to the intorel_receive() function, and the same tuple is used for
the insert. Because of this, the tuple's xmin is updated, and the data is
no longer visible to the other query. I fixed this issue by materializing
the slot.

The wrong patch was attached in the earlier mail, sorry for the
inconvenience. The proper fix patch is attached.

I will look into the isolation test failures.

Here I have attached the cumulative patch with all the fixes I shared in
earlier mails. Except for the fast_default test, the rest of the test
failures are fixed.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0002-init-fork-API.patch
From 118907d991360848715c893f8a9cf892ecc2bd5b Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 17:58:27 +1100
Subject: [PATCH 2/2] init fork API

API to create INIT_FORKNUM file with wrapper
table_create_init_fork.
---
 src/backend/access/heap/heapam_handler.c | 27 +++++++++++++++++++++++-
 src/backend/catalog/heap.c               | 24 ++-------------------
 src/backend/commands/tablecmds.c         |  4 ++--
 src/include/access/tableam.h             | 10 +++++++++
 src/include/catalog/heap.h               |  2 --
 5 files changed, 40 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3254e30a45..ae832e1f71 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -30,6 +30,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/pg_am_d.h"
+#include "catalog/storage_xlog.h"
 #include "executor/executor.h"
 #include "pgstat.h"
 #include "storage/lmgr.h"
@@ -2118,6 +2119,28 @@ heap_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
 	pfree(isnull);
 }
 
+/*
+ * Set up an init fork for an unlogged table so that it can be correctly
+ * reinitialized on restart.  An immediate sync is required even if the
+ * page has been logged, because the write did not go through
+ * shared_buffers and therefore a concurrent checkpoint may have moved
+ * the redo pointer past our xlog record.  Recovery may as well remove it
+ * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
+ * record. Therefore, logging is necessary even if wal_level=minimal.
+ */
+static void
+heap_create_init_fork(Relation rel)
+{
+	Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
+		   rel->rd_rel->relkind == RELKIND_MATVIEW ||
+		   rel->rd_rel->relkind == RELKIND_TOASTVALUE);
+	RelationOpenSmgr(rel);
+	smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
+	log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
+	smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
+}
+
+
 static const TableAmRoutine heapam_methods = {
 	.type = T_TableAmRoutine,
 
@@ -2165,7 +2188,9 @@ static const TableAmRoutine heapam_methods = {
 
 	.index_build_range_scan = IndexBuildHeapRangeScan,
 
-	.index_validate_scan = validate_index_heapscan
+	.index_validate_scan = validate_index_heapscan,
+
+	.CreateInitFork = heap_create_init_fork
 };
 
 const TableAmRoutine *
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 38b368f916..8e7c8ce684 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -32,6 +32,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/sysattr.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
@@ -1425,7 +1426,7 @@ heap_create_with_catalog(const char *relname,
 	 */
 	if (relpersistence == RELPERSISTENCE_UNLOGGED &&
 		relkind != RELKIND_PARTITIONED_TABLE)
-		heap_create_init_fork(new_rel_desc);
+		table_create_init_fork(new_rel_desc);
 
 	/*
 	 * ok, the relation has been cataloged, so close our relations and return
@@ -1437,27 +1438,6 @@ heap_create_with_catalog(const char *relname,
 	return relid;
 }
 
-/*
- * Set up an init fork for an unlogged table so that it can be correctly
- * reinitialized on restart.  An immediate sync is required even if the
- * page has been logged, because the write did not go through
- * shared_buffers and therefore a concurrent checkpoint may have moved
- * the redo pointer past our xlog record.  Recovery may as well remove it
- * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
- * record. Therefore, logging is necessary even if wal_level=minimal.
- */
-void
-heap_create_init_fork(Relation rel)
-{
-	Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
-		   rel->rd_rel->relkind == RELKIND_MATVIEW ||
-		   rel->rd_rel->relkind == RELKIND_TOASTVALUE);
-	RelationOpenSmgr(rel);
-	smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
-	log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
-	smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
-}
-
 /*
  *		RelationRemoveInheritance
  *
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f3526b267d..3c46a48882 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1649,7 +1649,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 			RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
 									  RecentXmin, minmulti);
 			if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
-				heap_create_init_fork(rel);
+				table_create_init_fork(rel);
 
 			heap_relid = RelationGetRelid(rel);
 			toast_relid = rel->rd_rel->reltoastrelid;
@@ -1663,7 +1663,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 				RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
 										  RecentXmin, minmulti);
 				if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
-					heap_create_init_fork(rel);
+					table_create_init_fork(rel);
 				heap_close(rel, NoLock);
 			}
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7fe6ff6c22..79c71b06e5 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -194,6 +194,8 @@ struct SampleScanState;
 typedef bool (*SampleScanNextBlock_function)(TableScanDesc scan, struct SampleScanState *scanstate);
 typedef bool (*SampleScanNextTuple_function)(TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
 
+typedef void (*CreateInitFork_function)(Relation rel);
+
 /*
  * API struct for a table AM.  Note this must be allocated in a
  * server-lifetime manner, typically as a static const struct.
@@ -250,6 +252,8 @@ typedef struct TableAmRoutine
 
 	IndexBuildRangeScan_function index_build_range_scan;
 	IndexValidateScan_function index_validate_scan;
+
+	CreateInitFork_function CreateInitFork;
 }			TableAmRoutine;
 
 static inline const TupleTableSlotOps*
@@ -741,6 +745,12 @@ table_index_build_range_scan(Relation heapRelation,
 		scan);
 }
 
+static inline void
+table_create_init_fork(Relation relation)
+{
+	relation->rd_tableamroutine->CreateInitFork(relation);
+}
+
 extern BlockNumber table_parallelscan_nextpage(TableScanDesc scan);
 extern void table_parallelscan_startblock_init(TableScanDesc scan);
 extern Size table_parallelscan_estimate(Snapshot snapshot);
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 4584b3473c..c0e706ecc9 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -77,8 +77,6 @@ extern Oid heap_create_with_catalog(const char *relname,
 						 Oid relrewrite,
 						 ObjectAddress *typaddress);
 
-extern void heap_create_init_fork(Relation rel);
-
 extern void heap_drop_with_catalog(Oid relid);
 
 extern void heap_truncate(List *relids);
-- 
2.18.0.windows.1

0001-Further-fixes-and-cleanup.patch
From c2dc08ce7d41294cc52cc2820578e5a1668dddf7 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 16:06:10 +1100
Subject: [PATCH 1/2] Further fixes and cleanup

1. Remove the old slot interface file and also update the Makefile.
2. CREATE AS USING method grammar support

This change was missed during the earlier USING grammar support.

3. Remove the extra Tuple visibility check

In heapgettup_pagemode a tuple visibility check was added
during the early development of pluggable storage, but the visibility
check is already carried out in the heapgetpage function itself.

regression fixes

1. scan start offset fix during analyze
2. Materialize the slot before they are processed using intorel_receive
3. ROW_MARK_COPY support by force store of heap tuple
4. partition prune extra heap page fix
---
 contrib/pg_visibility/pg_visibility.c     |  5 ++--
 src/backend/access/heap/heapam.c          | 28 +++++++++--------------
 src/backend/access/heap/heapam_handler.c  |  6 +++--
 src/backend/access/table/Makefile         |  2 +-
 src/backend/access/table/tableam_common.c |  0
 src/backend/commands/createas.c           |  4 ++++
 src/backend/executor/execExprInterp.c     |  3 +++
 src/backend/executor/execMain.c           | 13 ++---------
 src/backend/executor/execTuples.c         | 21 +++++++++++++++++
 src/backend/executor/nodeBitmapHeapscan.c | 12 ++++++++++
 src/backend/executor/nodeModifyTable.c    | 12 ++++++++--
 src/backend/parser/gram.y                 | 11 +++++----
 src/include/executor/tuptable.h           |  1 +
 src/include/nodes/primnodes.h             |  1 +
 14 files changed, 79 insertions(+), 40 deletions(-)
 delete mode 100644 src/backend/access/table/tableam_common.c

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index dce5262e34..88ca4fd2af 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -563,12 +563,13 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 
 	rel = relation_open(relid, AccessShareLock);
 
+	/* Only some relkinds have a visibility map */
+	check_relation_relkind(rel);
+
 	if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
 		ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 						errmsg("only heap AM is supported")));
 
-	/* Only some relkinds have a visibility map */
-	check_relation_relkind(rel);
 
 	nblocks = RelationGetNumberOfBlocks(rel);
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ec99d0bcae..ef6b4c3e54 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -947,31 +947,25 @@ heapgettup_pagemode(HeapScanDesc scan,
 			/*
 			 * if current tuple qualifies, return it.
 			 */
-			if (HeapTupleSatisfies(tuple, scan->rs_scan.rs_snapshot, scan->rs_cbuf))
+			if (key != NULL)
 			{
-				/*
-				 * if current tuple qualifies, return it.
-				 */
-				if (key != NULL)
-				{
-					bool		valid;
+				bool		valid;
 
-					HeapKeyTest(tuple, RelationGetDescr(scan->rs_scan.rs_rd),
-								nkeys, key, valid);
-					if (valid)
-					{
-						scan->rs_cindex = lineindex;
-						LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-						return;
-					}
-				}
-				else
+				HeapKeyTest(tuple, RelationGetDescr(scan->rs_scan.rs_rd),
+							nkeys, key, valid);
+				if (valid)
 				{
 					scan->rs_cindex = lineindex;
 					LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 					return;
 				}
 			}
+			else
+			{
+				scan->rs_cindex = lineindex;
+				LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+				return;
+			}
 
 			/*
 			 * otherwise move to the next item on the page
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91f..3254e30a45 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1741,7 +1741,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
 	Page		targpage;
-	OffsetNumber targoffset = scan->rs_cindex;
+	OffsetNumber targoffset;
 	OffsetNumber maxoffset;
 	BufferHeapTupleTableSlot *hslot;
 
@@ -1751,7 +1751,9 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 	maxoffset = PageGetMaxOffsetNumber(targpage);
 
 	/* Inner loop over all tuples on the selected page */
-	for (targoffset = scan->rs_cindex; targoffset <= maxoffset; targoffset++)
+	for (targoffset = scan->rs_cindex ? scan->rs_cindex : FirstOffsetNumber;
+			targoffset <= maxoffset;
+			targoffset++)
 	{
 		ItemId		itemid;
 		HeapTuple	targtuple = &hslot->base.tupdata;
diff --git a/src/backend/access/table/Makefile b/src/backend/access/table/Makefile
index fe22bf9208..006ba99182 100644
--- a/src/backend/access/table/Makefile
+++ b/src/backend/access/table/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/access/table
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = tableam.o tableamapi.o tableam_common.o
+OBJS = tableam.o tableamapi.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/table/tableam_common.c b/src/backend/access/table/tableam_common.c
deleted file mode 100644
index e69de29bb2..0000000000
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 84de804175..d3ffe417ff 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -593,6 +593,10 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 	if (myState->rel->rd_rel->relhasoids)
 		slot->tts_tupleOid = InvalidOid;
 
+	/* Materialize the slot */
+	if (!TTS_IS_VIRTUAL(slot))
+		ExecMaterializeSlot(slot);
+
 	table_insert(myState->rel,
 				 slot,
 				 myState->output_cid,
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index ef94ac4aa0..12651f5ceb 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -570,6 +570,9 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 				Assert(TTS_IS_HEAPTUPLE(scanslot) ||
 					   TTS_IS_BUFFERTUPLE(scanslot));
 
+				if (hslot->tuple == NULL)
+					ExecMaterializeSlot(scanslot);
+
 				d = heap_getsysattr(hslot->tuple, attnum,
 									scanslot->tts_tupleDescriptor,
 									op->resnull);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e055c0a7c6..34ef86a5bd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2594,7 +2594,7 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
 	 * datums that may be present in copyTuple).  As with the next step, this
 	 * is to guard against early re-use of the EPQ query.
 	 */
-	if (!TupIsNull(slot))
+	if (!TupIsNull(slot) && !TTS_IS_VIRTUAL(slot))
 		ExecMaterializeSlot(slot);
 
 #if FIXME
@@ -2787,16 +2787,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 			if (isNull)
 				continue;
 
-			elog(ERROR, "frak, need to implement ROW_MARK_COPY");
-#ifdef FIXME
-			// FIXME: this should just deform the tuple and store it as a
-			// virtual one.
-			tuple = table_tuple_by_datum(erm->relation, datum, erm->relid);
-
-			/* store tuple */
-			EvalPlanQualSetTuple(epqstate, erm->rti, tuple);
-#endif
-
+			ExecForceStoreHeapTupleDatum(datum, slot);
 		}
 	}
 }
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 917bf80f71..74149cc3ad 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -1364,6 +1364,27 @@ ExecStoreAllNullTuple(TupleTableSlot *slot)
 	return ExecFinishStoreSlotValues(slot);
 }
 
+void
+ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot)
+{
+	HeapTuple	tuple;
+	HeapTupleHeader td;
+
+	td = DatumGetHeapTupleHeader(data);
+
+	tuple = (HeapTuple) palloc(HEAPTUPLESIZE + HeapTupleHeaderGetDatumLength(td));
+	tuple->t_len = HeapTupleHeaderGetDatumLength(td);
+	tuple->t_self = td->t_ctid;
+	tuple->t_data = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE);
+	memcpy((char *) tuple->t_data, (char *) td, tuple->t_len);
+
+	ExecClearTuple(slot);
+
+	heap_deform_tuple(tuple, slot->tts_tupleDescriptor,
+					  slot->tts_values, slot->tts_isnull);
+	ExecFinishStoreSlotValues(slot);
+}
+
 /* --------------------------------
  *		ExecFetchSlotTuple
  *			Fetch the slot's regular physical tuple.
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 56880e3d16..36ca07beb2 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -224,6 +224,18 @@ BitmapHeapNext(BitmapHeapScanState *node)
 
 			BitmapAdjustPrefetchIterator(node, tbmres);
 
+			/*
+			 * Ignore any claimed entries past what we think is the end of the
+			 * relation.  (This is probably not necessary given that we got at
+			 * least AccessShareLock on the table before performing any of the
+			 * indexscans, but let's be safe.)
+			 */
+			if (tbmres->blockno >= scan->rs_nblocks)
+			{
+				node->tbmres = tbmres = NULL;
+				continue;
+			}
+
 			/*
 			 * We can skip fetching the heap page if we don't need any fields
 			 * from the heap, and the bitmap entries don't need rechecking,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 3cc9092413..307c12ee69 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -607,7 +607,7 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool canSetTag,
 		   bool changingPart,
 		   bool *tupleDeleted,
-		   TupleTableSlot **epqslot)
+		   TupleTableSlot **epqreturnslot)
 {
 	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
@@ -632,7 +632,7 @@ ExecDelete(ModifyTableState *mtstate,
 		bool		dodelete;
 
 		dodelete = ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
-										tupleid, oldtuple, epqslot);
+										tupleid, oldtuple, epqreturnslot);
 
 		if (!dodelete)			/* "do nothing" */
 			return NULL;
@@ -724,6 +724,14 @@ ldelete:;
 					/* Tuple no more passing quals, exiting... */
 					return NULL;
 				}
+
+				/* pass the EPQ result back to the caller instead of retrying */
+				if (epqreturnslot)
+				{
+					*epqreturnslot = epqslot;
+					return NULL;
+				}
+
 				goto ldelete;
 			}
 		}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 54382aba88..f030ad25a2 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4037,7 +4037,6 @@ CreateStatsStmt:
  *
  *****************************************************************************/
 
-// PBORKED: storage option
 CreateAsStmt:
 		CREATE OptTemp TABLE create_as_target AS SelectStmt opt_with_data
 				{
@@ -4068,14 +4067,16 @@ CreateAsStmt:
 		;
 
 create_as_target:
-			qualified_name opt_column_list OptWith OnCommitOption OptTableSpace
+			qualified_name opt_column_list table_access_method_clause
+			OptWith OnCommitOption OptTableSpace
 				{
 					$$ = makeNode(IntoClause);
 					$$->rel = $1;
 					$$->colNames = $2;
-					$$->options = $3;
-					$$->onCommit = $4;
-					$$->tableSpaceName = $5;
+					$$->accessMethod = $3;
+					$$->options = $4;
+					$$->onCommit = $5;
+					$$->tableSpaceName = $6;
 					$$->viewQuery = NULL;
 					$$->skipData = false;		/* might get changed later */
 				}
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index 05f38cfd0d..20fc425a27 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -476,6 +476,7 @@ extern TupleTableSlot *ExecCopySlot(TupleTableSlot *dstslot,
 
 extern void ExecForceStoreHeapTuple(HeapTuple tuple,
 			   TupleTableSlot *slot);
+extern void ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot);
 
 extern void slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum);
 extern Datum slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull);
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 40f6eb03d2..4d194a8c2a 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -111,6 +111,7 @@ typedef struct IntoClause
 
 	RangeVar   *rel;			/* target relation name */
 	List	   *colNames;		/* column names to assign, or NIL */
+	char	   *accessMethod;	/* table access method */
 	List	   *options;		/* options from WITH clause */
 	OnCommitAction onCommit;	/* what do we do at COMMIT? */
 	char	   *tableSpaceName; /* table space to use, or NULL */
-- 
2.18.0.windows.1

#43Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Haribabu Kommi (#42)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Fri, 26 Oct 2018 at 13:25, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

Here I attached the cumulative patch with all fixes that are shared in earlier mails by me.
Except fast_default test, rest of test failures are fixed.

Hi,

If I understand correctly, these patches are for the branch "pluggable-storage"
in [1]https://github.com/anarazel/postgres-pluggable-storage (at least I couldn't apply them cleanly to "pluggable-zheap" branch),
right? I've tried to experiment a bit with the current status of the patch, and
accidentally stumbled upon what seems to be an issue - when I run pgbench
against it with some significant number of clients and script [2]https://gist.github.com/erthalion/c85ba0e12146596d24c572234501e756:

$ pgbench -T 60 -c 128 -j 64 -f zipfian.sql

For one of the clients I've got an error:

client 117 aborted in command 5 (SQL) of script 0; ERROR:
unrecognized heap_update status: 1

This problem couldn't be reproduced on the master branch, so I've tried to
investigate it. It comes from nodeModifyTable.c:1267, when we've got
HeapTupleInvisible as a result, and this value in turn comes from
table_lock_tuple. Everything points to the new way of handling the
HeapTupleUpdated result from heap_update, introduced along with the
table_lock_tuple call. Since I don't see anything similar in the master
branch, can anyone clarify why this lock is necessary here? Out of
curiosity I've rearranged the code that handles HeapTupleUpdated back to a
switch and removed table_lock_tuple (see the attached patch; it can be
applied on top of the latest two patches posted by Haribabu), and it seems
to solve the issue.
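
Schematically, the path in question is the following (a simplified sketch
of the ExecUpdate flow, not the exact code):

	/*
	 * The switch below was written for table_update() results, but after
	 * the rearrangement the value can also come from table_lock_tuple(),
	 * which may return HeapTupleInvisible -- a case the switch does not
	 * handle, hence the elog() that pgbench triggered.
	 */
	result = table_update(resultRelationDesc, tupleid, slot, ...);

	if (result == HeapTupleUpdated && !IsolationUsesXactSnapshot())
		result = table_lock_tuple(resultRelationDesc, tupleid, ...);

	switch (result)
	{
		case HeapTupleMayBeUpdated:
			break;
		/* ... HeapTupleSelfUpdated, HeapTupleUpdated, HeapTupleDeleted ... */
		default:
			elog(ERROR, "unrecognized heap_update status: %u", result);
	}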

[1]: https://github.com/anarazel/postgres-pluggable-storage
[2]: https://gist.github.com/erthalion/c85ba0e12146596d24c572234501e756

Attachments:

unrecognized_heap_status.patch
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 307c12ee69..074593b1cc 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1170,42 +1170,6 @@ lreplace:;
 							  true /* wait for commit */,
 							  &hufd, &lockmode, &update_indexes);
 
-		if (result == HeapTupleUpdated && !IsolationUsesXactSnapshot())
-		{
-			TupleTableSlot *inputslot;
-
-			EvalPlanQualBegin(epqstate, estate);
-
-			inputslot = EvalPlanQualSlot(epqstate, resultRelationDesc, resultRelInfo->ri_RangeTableIndex);
-			ExecCopySlot(inputslot, slot);
-
-			result = table_lock_tuple(resultRelationDesc, tupleid,
-									  estate->es_snapshot,
-									  inputslot, estate->es_output_cid,
-									  lockmode, LockWaitBlock,
-									  TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
-									  &hufd);
-			/* hari FIXME*/
-			/*Assert(result != HeapTupleUpdated && hufd.traversed);*/
-			if (result == HeapTupleMayBeUpdated)
-			{
-				TupleTableSlot *epqslot;
-
-				epqslot = EvalPlanQual(estate,
-									   epqstate,
-									   resultRelationDesc,
-									   resultRelInfo->ri_RangeTableIndex,
-									   inputslot);
-				if (TupIsNull(epqslot))
-				{
-					/* Tuple no more passing quals, exiting... */
-					return NULL;
-				}
-				slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
-				goto lreplace;
-			}
-		}
-
 		switch (result)
 		{
 			case HeapTupleSelfUpdated:
@@ -1250,10 +1214,37 @@ lreplace:;
 					ereport(ERROR,
 							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 							 errmsg("could not serialize access due to concurrent update")));
-				else
-					/* shouldn't get there */
-					elog(ERROR, "wrong heap_delete status: %u", result);
-				break;
+
+				if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
+					ereport(ERROR,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("tuple to be updated was already moved to another partition due to concurrent update")));
+
+				if (!ItemPointerEquals(tupleid, &hufd.ctid))
+				{
+					TupleTableSlot *epqslot;
+					TupleTableSlot *inputslot;
+
+					EvalPlanQualBegin(epqstate, estate);
+
+					inputslot = EvalPlanQualSlot(epqstate, resultRelationDesc, resultRelInfo->ri_RangeTableIndex);
+					ExecCopySlot(inputslot, slot);
+
+					epqslot = EvalPlanQual(estate,
+										   epqstate,
+										   resultRelationDesc,
+										   resultRelInfo->ri_RangeTableIndex,
+										   inputslot);
+					if (!TupIsNull(epqslot))
+					{
+						*tupleid = hufd.ctid;
+						slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+						slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+						goto lreplace;
+					}
+				}
+				/* tuple already deleted; nothing to do */
+				return NULL;
 
 			case HeapTupleDeleted:
 				if (IsolationUsesXactSnapshot())
#44Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Dmitry Dolgov (#43)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Mon, Oct 29, 2018 at 7:40 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Fri, 26 Oct 2018 at 13:25, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

Here I have attached the cumulative patch with all the fixes I shared in
earlier mails. Except for the fast_default test, the rest of the test
failures are fixed.

Hi,

If I understand correctly, these patches are for the branch
"pluggable-storage" in [1] (at least I couldn't apply them cleanly to
"pluggable-zheap" branch), right?

Yes, the attached patches are for the pluggable-storage branch.

I've tried to experiment a bit with the current status of the patch, and
accidentally stumbled upon what seems to be an issue - when I run pgbench
against it with some significant number of clients and script [2]:

$ pgbench -T 60 -c 128 -j 64 -f zipfian.sql

Thanks for testing the patches.

For one of the clients I've got an error:

client 117 aborted in command 5 (SQL) of script 0; ERROR:
unrecognized heap_update status: 1

This error corresponds to the tuple state HeapTupleInvisible. As per the
comments in heap_lock_tuple, this is possible with ON CONFLICT UPDATE, but
because table_lock_tuple was reorganized out of EvalPlanQual(), the
invisible result is now returned in other cases as well. That case is
missed in the new code.

This problem couldn't be reproduced on the master branch, so I've tried to
investigate it. It comes from nodeModifyTable.c:1267, when we've got
HeapTupleInvisible as a result, and this value in turn comes from
table_lock_tuple. Everything points to the new way of handling the
HeapTupleUpdated result from heap_update, introduced along with the
table_lock_tuple call. Since I don't see anything similar in the master
branch, can anyone clarify why this lock is necessary here?

In the master branch code there is also a tuple lock, taken inside the
EvalPlanQual() function; in the pluggable-storage code the lock is kept
outside, and the function calls are rearranged, to make it easier for
table access methods to provide their own MVCC implementation.

Out of curiosity I've rearranged the code that handles HeapTupleUpdated
back to a switch and removed table_lock_tuple (see the attached patch; it
can be applied on top of the latest two patches posted by Haribabu), and
it seems to solve the issue.

Thanks for the patch. I didn't reproduce the problem, but based on the
error from your mail, the attached draft patch handling invisible tuples
in the update and delete cases should also fix it.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0001-Handling-HeapTupleInvisible-case.patch
From c34faa5ad6af9ef4562feabd6bb4d361fe813945 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 29 Oct 2018 15:53:30 +1100
Subject: [PATCH] Handling HeapTupleInvisible case

In update/delete scenarios, when the tuple is
concurrently updated/deleted, sometimes locking
of a tuple may return HeapTupleInvisible. Handle
that case as nothing to do.
---
 src/backend/executor/nodeModifyTable.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 307c12ee69..b3851b180d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -734,6 +734,11 @@ ldelete:;
 
 				goto ldelete;
 			}
+			else if (result == HeapTupleInvisible)
+			{
+				/* tuple is not visible; nothing to do */
+				return NULL;
+			}
 		}
 
 		switch (result)
@@ -1204,6 +1209,11 @@ lreplace:;
 				slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
 				goto lreplace;
 			}
+			else if (result == HeapTupleInvisible)
+			{
+				/* tuple is not visible; nothing to do */
+				return NULL;
+			}
 		}
 
 		switch (result)
-- 
2.18.0.windows.1

#45Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Haribabu Kommi (#44)
Re: Pluggable Storage - Andres's take

On Mon, 29 Oct 2018 at 05:56, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

This problem couldn't be reproduced on the master branch, so I've tried to
investigate it. It comes from nodeModifyTable.c:1267, when we've got
HeapTupleInvisible as a result, and this value in turn comes from
table_lock_tuple. Everything points to the new way of handling HeapTupleUpdated
result from heap_update, when table_lock_tuple call was introduced. Since I
don't see anything similar in the master branch, can anyone clarify why this
lock is necessary here?

In the master branch code also, there is a tuple lock happening in the
EvalPlanQual() function, but in the pluggable-storage code the lock is kept
outside, and the function calls were rearranged, to make it easier for table
access methods to provide their own MVCC implementation.

Yes, now I see it, thanks. Also I can confirm that the attached patch solves
this issue.

FYI, alongside reviewing the code changes I've run a few performance tests
(that's why I hit this issue with pgbench in the first place). In case of high
concurrency so far I see a small performance degradation in comparison with the
master branch (about 2-5% of average latency, depending on the level of
concurrency), but can't really say why exactly (perf just shows barely
noticeable overhead here and there; maybe what I see is actually a cumulative
impact).

#46Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Dmitry Dolgov (#45)
3 attachment(s)
Re: Pluggable Storage - Andres's take

On Wed, Oct 31, 2018 at 9:34 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Mon, 29 Oct 2018 at 05:56, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

This problem couldn't be reproduced on the master branch, so I've tried to
investigate it. It comes from nodeModifyTable.c:1267, when we've got
HeapTupleInvisible as a result, and this value in turn comes from
table_lock_tuple. Everything points to the new way of handling HeapTupleUpdated
results from heap_update, when the table_lock_tuple call was introduced. Since I
don't see anything similar in the master branch, can anyone clarify why this
lock is necessary here?

In the master branch code also, there is a tuple lock happening in the
EvalPlanQual() function, but in the pluggable-storage code the lock is kept
outside, and the function calls were rearranged, to make it easier for table
access methods to provide their own MVCC implementation.

Yes, now I see it, thanks. Also I can confirm that the attached patch solves
this issue.

Thanks for the testing and confirmation.

FYI, alongside reviewing the code changes I've run a few performance tests
(that's why I hit this issue with pgbench in the first place). In case of high
concurrency so far I see a small performance degradation in comparison with the
master branch (about 2-5% of average latency, depending on the level of
concurrency), but can't really say why exactly (perf just shows barely
noticeable overhead here and there; maybe what I see is actually a cumulative
impact).

Thanks for sharing your observation, I will also analyze and try to find out
the performance bottlenecks that are causing the overhead.
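
For reference, the kind of side-by-side measurement being discussed can be
done along these lines (commands are illustrative; run once against master
and once against the patched branch):

$ pgbench -T 60 -c 128 -j 64 -f zipfian.sql &
$ perf record -a -g -- sleep 30
$ perf report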

Here I attached the cumulative fixes of the patches, the new API additions for
zheap, and a basic outline of the documentation.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0003-First-draft-of-pluggable-storage-documentation.patch (application/octet-stream)
From 2a530ea1306c291a40fff4042a0b1a5755dcefc9 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 1 Nov 2018 12:00:10 +1100
Subject: [PATCH 3/3] First draft of pluggable-storage documentation

---
 doc/src/sgml/{indexam.sgml => am.sgml}     | 590 ++++++++++++++++++++-
 doc/src/sgml/catalogs.sgml                 |   5 +-
 doc/src/sgml/config.sgml                   |  24 +
 doc/src/sgml/filelist.sgml                 |   2 +-
 doc/src/sgml/postgres.sgml                 |   2 +-
 doc/src/sgml/ref/create_access_method.sgml |  12 +-
 doc/src/sgml/ref/create_table.sgml         |  18 +-
 doc/src/sgml/ref/create_table_as.sgml      |  14 +
 doc/src/sgml/release-9.6.sgml              |   2 +-
 doc/src/sgml/xindex.sgml                   |   2 +-
 10 files changed, 640 insertions(+), 31 deletions(-)
 rename doc/src/sgml/{indexam.sgml => am.sgml} (78%)

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/am.sgml
similarity index 78%
rename from doc/src/sgml/indexam.sgml
rename to doc/src/sgml/am.sgml
index beb99d1831..dc13bc1073 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/am.sgml
@@ -1,16 +1,20 @@
-<!-- doc/src/sgml/indexam.sgml -->
+<!-- doc/src/sgml/am.sgml -->
 
-<chapter id="indexam">
- <title>Index Access Method Interface Definition</title>
+<chapter id="am">
+ <title>Access Method Interface Definition</title>
 
   <para>
    This chapter defines the interface between the core
-   <productname>PostgreSQL</productname> system and <firstterm>index access
-   methods</firstterm>, which manage individual index types.  The core system
-   knows nothing about indexes beyond what is specified here, so it is
-   possible to develop entirely new index types by writing add-on code.
+   <productname>PostgreSQL</productname> system and <firstterm>access
+   methods</firstterm>, which manage individual <literal>INDEX</literal> 
+   and <literal>TABLE</literal> types.  The core system knows nothing
+   about these access methods beyond what is specified here, so it is
+   possible to develop entirely new access method types by writing add-on code.
   </para>
-
+ 
+ <sect1 id="index-access-methods">
+  <title>Overview of Index Access Methods</title>
+  
   <para>
    All indexes in <productname>PostgreSQL</productname> are what are known
    technically as <firstterm>secondary indexes</firstterm>; that is, the index is
@@ -42,8 +46,8 @@
    dead tuples are reclaimed (by vacuuming) when the dead tuples themselves
    are reclaimed.
   </para>
-
- <sect1 id="index-api">
+  
+ <sect2 id="index-api">
   <title>Basic API Structure for Indexes</title>
 
   <para>
@@ -217,9 +221,9 @@ typedef struct IndexAmRoutine
    conditions.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-functions">
+ <sect2 id="index-functions">
   <title>Index Access Method Functions</title>
 
   <para>
@@ -709,9 +713,11 @@ amparallelrescan (IndexScanDesc scan);
    the beginning.
   </para>
 
- </sect1>
+ </sect2>
+ 
+ 
 
- <sect1 id="index-scanning">
+ <sect2 id="index-scanning">
   <title>Index Scanning</title>
 
   <para>
@@ -864,9 +870,9 @@ amparallelrescan (IndexScanDesc scan);
    if its internal implementation is unsuited to one API or the other.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-locking">
+ <sect2 id="index-locking">
   <title>Index Locking Considerations</title>
 
   <para>
@@ -978,9 +984,9 @@ amparallelrescan (IndexScanDesc scan);
    reduce the frequency of such transaction cancellations.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-unique-checks">
+ <sect2 id="index-unique-checks">
   <title>Index Uniqueness Checks</title>
 
   <para>
@@ -1127,9 +1133,9 @@ amparallelrescan (IndexScanDesc scan);
     </itemizedlist>
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-cost-estimation">
+ <sect2 id="index-cost-estimation">
   <title>Index Cost Estimation Functions</title>
 
   <para>
@@ -1376,5 +1382,549 @@ cost_qual_eval(&amp;index_qual_cost, path-&gt;indexquals, root);
    Examples of cost estimator functions can be found in
    <filename>src/backend/utils/adt/selfuncs.c</filename>.
   </para>
+ </sect2>
  </sect1>
+ 
+ <sect1 id="table-access-methods">
+  <title>Overview of Table Access Methods</title>
+  
+  <para>
+   Tables are the primary data store in <productname>PostgreSQL</productname>.
+   Each table is stored as its own physical <firstterm>relation</firstterm> and so
+   is described by an entry in the <structname>pg_class</structname> catalog.
+   The contents of a table are entirely under the control of its access method.
+   (All the access methods furthermore use the standard page layout described in
+   <xref linkend="storage-page-layout"/>.)
+  </para>
+  
+ <sect2 id="table-api">
+  <title>Table access method API</title>
+  
+  <para>
+   Each table access method is described by a row in the
+   <link linkend="catalog-pg-am"><structname>pg_am</structname></link>
+   system catalog.  The <structname>pg_am</structname> entry
+   specifies a name and a <firstterm>handler function</firstterm> for the access
+   method.  These entries can be created and deleted using the
+   <xref linkend="sql-create-access-method"/> and
+   <xref linkend="sql-drop-access-method"/> SQL commands.
+  </para>
+
+  <para>
+   A table access method handler function must be declared to accept a
+   single argument of type <type>internal</type> and to return the
+   pseudo-type <type>table_am_handler</type>.  The argument is a dummy value that
+   simply serves to prevent handler functions from being called directly from
+   SQL commands.  The result of the function must be a palloc'd struct of
+   type <structname>TableAmRoutine</structname>, which contains everything
+   that the core code needs to know to make use of the table access method.
+   The <structname>TableAmRoutine</structname> struct, also called the access
+   method's <firstterm>API struct</firstterm>, includes fields specifying assorted
+   fixed properties of the access method, such as whether it can support
+   bitmap scans.  More importantly, it contains pointers to support
+   functions for the access method, which do all of the real work to access
+   tables.  These support functions are plain C functions and are not
+   visible or callable at the SQL level.  The support functions are described
+   in <xref linkend="table-functions"/>.
+  </para>
+
+  <para>
+   The structure <structname>TableAmRoutine</structname> is defined thus:
+<programlisting>
+typedef struct TableAmRoutine
+{
+    NodeTag     type;
+
+    SlotCallbacks_function slot_callbacks;
+
+    SnapshotSatisfies_function snapshot_satisfies;
+    SnapshotSatisfiesUpdate_function snapshot_satisfiesUpdate;
+    SnapshotSatisfiesVacuum_function snapshot_satisfiesVacuum;
+
+    /* Operations on physical tuples */
+    TupleInsert_function tuple_insert;
+    TupleInsertSpeculative_function tuple_insert_speculative;
+    TupleCompleteSpeculative_function tuple_complete_speculative;
+    TupleUpdate_function tuple_update;
+    TupleDelete_function tuple_delete;
+    TupleFetchRowVersion_function tuple_fetch_row_version;
+    TupleLock_function tuple_lock;
+    MultiInsert_function multi_insert;
+    TupleGetLatestTid_function tuple_get_latest_tid;
+    TupleFetchFollow_function tuple_fetch_follow;
+
+    GetTupleData_function get_tuple_data;
+
+    RelationVacuum_function relation_vacuum;
+    RelationScanAnalyzeNextBlock_function scan_analyze_next_block;
+    RelationScanAnalyzeNextTuple_function scan_analyze_next_tuple;
+    RelationCopyForCluster_function relation_copy_for_cluster;
+    RelationSync_function relation_sync;
+
+    /* Operations on relation scans */
+    ScanBegin_function scan_begin;
+    ScanSetlimits_function scansetlimits;
+    ScanGetnextSlot_function scan_getnextslot;
+
+    BitmapPagescan_function scan_bitmap_pagescan;
+    BitmapPagescanNext_function scan_bitmap_pagescan_next;
+
+    SampleScanNextBlock_function scan_sample_next_block;
+    SampleScanNextTuple_function scan_sample_next_tuple;
+
+    ScanEnd_function scan_end;
+    ScanRescan_function scan_rescan;
+    ScanUpdateSnapshot_function scan_update_snapshot;
+
+    BeginIndexFetchTable_function begin_index_fetch;
+    EndIndexFetchTable_function reset_index_fetch;
+    EndIndexFetchTable_function end_index_fetch;
+
+
+    IndexBuildRangeScan_function index_build_range_scan;
+    IndexValidateScan_function index_validate_scan;
+
+    CreateInitFork_function CreateInitFork;
+}           TableAmRoutine;
+</programlisting>
+  </para>
+  
+  <para>
+   An individual table is defined by a
+   <link linkend="catalog-pg-class"><structname>pg_class</structname></link>
+   entry that describes it as a physical relation.
+  </para>
+   
+ </sect2>
+ 
+ <sect2 id="table-functions">
+  <title>Table Access Method Functions</title>
+
+  <para>
+   The table construction and maintenance functions that a table access
+   method must provide in <structname>TableAmRoutine</structname> are:
+  </para>
+
+  <para>
+<programlisting>
+TupleTableSlotOps *
+slot_callbacks (Relation relation);
+</programlisting>
+   API to access the slot-specific callbacks.
+   The following implementations are available:
+   <structname>TTSOpsVirtual</structname>,
+   <structname>TTSOpsHeapTuple</structname>,
+   <structname>TTSOpsMinimalTuple</structname>,
+   <structname>TTSOpsBufferTuple</structname>.
+  </para>
+  
+  <para>
+<programlisting>
+bool
+snapshot_satisfies (TupleTableSlot *slot, Snapshot snapshot);
+</programlisting>
+   API to check whether the provided slot is visible to the current
+   transaction according to the snapshot.
+  </para>
+ 
+  <para>
+<programlisting>
+Oid
+tuple_insert (Relation rel, TupleTableSlot *slot, CommandId cid,
+              int options, BulkInsertState bistate);
+</programlisting>
+   API to insert the tuple and provide the <literal>ItemPointerData</literal>
+   where the tuple is successfully inserted.
+  </para>
+  
+  <para>
+<programlisting>
+Oid
+tuple_insert_speculative (Relation rel,
+                         TupleTableSlot *slot,
+                         CommandId cid,
+                         int options,
+                         BulkInsertState bistate,
+                         uint32 specToken);
+</programlisting>
+   API to insert the tuple with a speculative token. This API is similar
+   to <literal>tuple_insert</literal>, but with additional speculative
+   information.
+  </para>
+  
+  <para>
+<programlisting>
+void
+tuple_complete_speculative (Relation rel,
+                          TupleTableSlot *slot,
+                          uint32 specToken,
+                          bool succeeded);
+</programlisting>
+   API to complete a speculative insertion started by <literal>tuple_insert_speculative</literal>,
+   once the corresponding index insertion has completed.
+  </para>
+  
+  
+  <para>
+<programlisting>
+HTSU_Result
+tuple_update (Relation relation,
+             ItemPointer otid,
+             TupleTableSlot *slot,
+             CommandId cid,
+             Snapshot crosscheck,
+             bool wait,
+             HeapUpdateFailureData *hufd,
+             LockTupleMode *lockmode,
+             bool *update_indexes);
+</programlisting>
+   API to update the existing tuple with new data.
+  </para>
+  
+  
+  <para>
+<programlisting>
+HTSU_Result
+tuple_delete (Relation relation,
+             ItemPointer tid,
+             CommandId cid,
+             Snapshot crosscheck,
+             bool wait,
+             HeapUpdateFailureData *hufd,
+             bool changingPart);
+</programlisting>
+   API to delete the existing tuple.
+  </para>
+  
+  
+  <para>
+<programlisting>
+bool
+tuple_fetch_row_version (Relation relation,
+                       ItemPointer tid,
+                       Snapshot snapshot,
+                       TupleTableSlot *slot,
+                       Relation stats_relation);
+</programlisting>
+   API to fetch and store the Buffered Heap tuple in the provided slot
+   based on the ItemPointer.
+  </para>
+  
+  
+  <para>
+<programlisting>
+HTSU_Result
+tuple_lock (Relation relation,
+                   ItemPointer tid,
+                   Snapshot snapshot,
+                   TupleTableSlot *slot,
+                   CommandId cid,
+                   LockTupleMode mode,
+                   LockWaitPolicy wait_policy,
+                   uint8 flags,
+                   HeapUpdateFailureData *hufd);
+</programlisting>
+   API to lock the tuple specified by the ItemPointer and fetch the newest
+   version of the tuple and its TID.
+  </para>
+  
+  
+  <para>
+<programlisting>
+void
+multi_insert (Relation relation, TupleTableSlot **slots, int nslots,
+              CommandId cid, int options, BulkInsertState bistate);
+</programlisting>
+   API to insert multiple tuples at a time into the relation.
+  </para>
+  
+  
+  <para>
+<programlisting>
+void
+tuple_get_latest_tid (Relation relation,
+                    Snapshot snapshot,
+                    ItemPointer tid);
+</programlisting>
+   API to get the latest TID of the tuple with the given item pointer.
+  </para>
+  
+  
+  <para>
+<programlisting>
+bool
+tuple_fetch_follow (struct IndexFetchTableData *scan,
+                  ItemPointer tid,
+                  Snapshot snapshot,
+                  TupleTableSlot *slot,
+                  bool *call_again, bool *all_dead);
+</programlisting>
+   API to fetch the tuple at the given item pointer, following tuple chains within the page as needed.
+  </para>
+  
+  
+  <para>
+<programlisting>
+tuple_data
+get_tuple_data (TupleTableSlot *slot, tuple_data_flags flags);
+</programlisting>
+   API to return the internal structure members of the HeapTuple.
+  </para>
+  
+  
+  <para>
+<programlisting>
+void
+relation_vacuum (Relation onerel, int options,
+                struct VacuumParams *params, BufferAccessStrategy bstrategy);
+</programlisting>
+   API to perform vacuum on a single relation.
+  </para>
+  
+  
+  <para>
+<programlisting>
+void
+scan_analyze_next_block (TableScanDesc scan, BlockNumber blockno,
+                      BufferAccessStrategy bstrategy);
+</programlisting>
+   API to fill the scan descriptor with the buffer of the specified block.
+  </para>
+  
+  
+  <para>
+<programlisting>
+bool
+scan_analyze_next_tuple (TableScanDesc scan, TransactionId OldestXmin,
+                      double *liverows, double *deadrows, TupleTableSlot *slot));
+</programlisting>
+   API to analyze the block, store the next tuple in the provided slot, and
+   also report the numbers of live and dead rows.
+  </para>
+  
+  
+  <para>
+<programlisting>
+void
+relation_copy_for_cluster (Relation NewHeap, Relation OldHeap, Relation OldIndex,
+                       bool use_sort,
+                       TransactionId OldestXmin, TransactionId FreezeXid, MultiXactId MultiXactCutoff,
+                       double *num_tuples, double *tups_vacuumed, double *tups_recently_dead);
+</programlisting>
+   API to copy one relation to another, using either an index scan or a table scan.
+  </para>
+  
+  
+  <para>
+<programlisting>
+void
+relation_sync (Relation relation);
+</programlisting>
+   API to sync the relation to disk, useful for cases where no WAL is written.
+  </para>
+  
+  
+  <para>
+<programlisting>
+TableScanDesc
+scan_begin (Relation relation,
+            Snapshot snapshot,
+            int nkeys, ScanKey key,
+            ParallelTableScanDesc parallel_scan,
+            bool allow_strat,
+            bool allow_sync,
+            bool allow_pagemode,
+            bool is_bitmapscan,
+            bool is_samplescan,
+            bool temp_snap);
+</programlisting>
+   API to start a scan of the provided relation and return the
+   <structname>TableScanDesc</structname> structure.
+  </para>
+  
+  
+    <para>
+<programlisting>
+void
+scansetlimits (TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks);
+</programlisting>
+   API to set the relation scan range limits.
+  </para>
+  
+  
+    <para>
+<programlisting>
+TupleTableSlot *
+scan_getnextslot (TableScanDesc scan,
+                 ScanDirection direction, TupleTableSlot *slot);
+</programlisting>
+   API to fill the next visible tuple from the relation scan in the provided slot
+   and return it.
+  </para>
+  
+  
+    <para>
+<programlisting>
+bool
+scan_bitmap_pagescan (TableScanDesc scan,
+                    TBMIterateResult *tbmres);
+</programlisting>
+   API to scan the relation and fill the scan description bitmap with valid item pointers
+   for the specified block.
+  </para>
+  
+  
+    <para>
+<programlisting>
+bool
+scan_bitmap_pagescan_next (TableScanDesc scan,
+                        TupleTableSlot *slot);
+</programlisting>
+   API to fill the buffered heap tuple data from the bitmap scanned item pointers and store
+   it in the provided slot.
+  </para>
+  
+  
+    <para>
+<programlisting>
+bool
+scan_sample_next_block (TableScanDesc scan, struct SampleScanState *scanstate);
+</programlisting>
+   API to scan the relation and fill the scan description bitmap with valid item pointers
+   for the specified block provided by the sample method.
+  </para>
+  
+  
+    <para>
+<programlisting>
+bool
+scan_sample_next_tuple (TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
+</programlisting>
+   API to fill the buffered heap tuple data from the bitmap scanned item pointers based on the sample
+   method and store it in the provided slot.
+  </para>
+  
+  
+    <para>
+<programlisting>
+void
+scan_end (TableScanDesc scan);
+</programlisting>
+   API to end the relation scan.
+  </para>
+  
+  
+    <para>
+<programlisting>
+void
+scan_rescan (TableScanDesc scan, ScanKey key, bool set_params,
+             bool allow_strat, bool allow_sync, bool allow_pagemode);
+</programlisting>
+   API to restart the relation scan with provided data.
+  </para>
+  
+  
+  <para>
+<programlisting>
+void
+scan_update_snapshot (TableScanDesc scan, Snapshot snapshot);
+</programlisting>
+   API to update the relation scan with the new snapshot.
+  </para>
+  
+  <para>
+<programlisting>
+IndexFetchTableData *
+begin_index_fetch (Relation relation);
+</programlisting>
+   API to prepare the <structname>IndexFetchTableData</structname> for the relation.
+  </para>
+  
+  <para>
+<programlisting>
+void
+reset_index_fetch (struct IndexFetchTableData* data);
+</programlisting>
+   API to reset the prepared internal members of the <structname>IndexFetchTableData</structname>.
+  </para>
+  
+  <para>
+<programlisting>
+void
+end_index_fetch (struct IndexFetchTableData* data);
+</programlisting>
+   API to clear and free the <structname>IndexFetchTableData</structname>.
+  </para>
+  
+    <para>
+<programlisting>
+double
+index_build_range_scan (Relation heapRelation,
+                       Relation indexRelation,
+                       IndexInfo *indexInfo,
+                       bool allow_sync,
+                       bool anyvisible,
+                       BlockNumber start_blockno,
+                       BlockNumber end_blockno,
+                       IndexBuildCallback callback,
+                       void *callback_state,
+                       TableScanDesc scan);
+</programlisting>
+   API to perform a table scan over the range specified by the caller
+   and insert the qualifying records into the index using the provided callback
+   function pointer.
+  </para>
+  
+    <para>
+<programlisting>
+void
+index_validate_scan (Relation heapRelation,
+                   Relation indexRelation,
+                   IndexInfo *indexInfo,
+                   Snapshot snapshot,
+                   struct ValidateIndexState *state);
+</programlisting>
+   API to perform the table scan and insert the qualifying records into the index.
+   This API is similar to <function>index_build_range_scan</function>; it
+   is used during concurrent index builds.
+  </para>
+  
+ </sect2>
+ 
+ <sect2>
+  <title>Table scanning</title>
+  
+  <para>
+  </para>
+ </sect2>
+ 
+ <sect2>
+  <title>Table insert/update/delete</title>
+
+  <para>
+  </para>
+  </sect2>
+ 
+ <sect2>
+  <title>Table locking</title>
+
+  <para>
+  </para>
+  </sect2>
+ 
+ <sect2>
+  <title>Table vacuum</title>
+
+  <para>
+  </para>
+ </sect2>
+ 
+ <sect2>
+  <title>Table fetch</title>
+
+  <para>
+  </para>
+ </sect2>
+ 
+ </sect1> 
 </chapter>
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 0179deea2e..f0c8037bbc 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -587,8 +587,9 @@
    The catalog <structname>pg_am</structname> stores information about
    relation access methods.  There is one row for each access method supported
    by the system.
-   Currently, only indexes have access methods.  The requirements for index
-   access methods are discussed in detail in <xref linkend="indexam"/>.
+   Currently, only <literal>INDEX</literal> and <literal>TABLE</literal> have
+   access methods.  The requirements for access methods are discussed in detail
+   in <xref linkend="am"/>.
   </para>
 
   <table>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f11b8f724c..8765d7c57c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6585,6 +6585,30 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-default-table-access-method" xreflabel="default_table_access_method">
+      <term><varname>default_table_access_method</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>default_table_access_method</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        This variable specifies the default table access method with which to create
+        objects (tables and materialized views) when a <command>CREATE</command> command does
+        not explicitly specify an access method.
+       </para>
+
+       <para>
+        The value is either the name of a table access method, or an empty string
+        to specify using the default table access method of the current database.
+        If the value does not match the name of any existing table access method,
+        <productname>PostgreSQL</productname> will automatically use the default
+        table access method of the current database.
+       </para>
+
+      </listitem>
+     </varlistentry>
+     
      <varlistentry id="guc-default-tablespace" xreflabel="default_tablespace">
       <term><varname>default_tablespace</varname> (<type>string</type>)
       <indexterm>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 48ac14a838..99a6496502 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -90,7 +90,7 @@
 <!ENTITY gin        SYSTEM "gin.sgml">
 <!ENTITY brin       SYSTEM "brin.sgml">
 <!ENTITY planstats    SYSTEM "planstats.sgml">
-<!ENTITY indexam    SYSTEM "indexam.sgml">
+<!ENTITY am         SYSTEM "am.sgml">
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 0070603fc3..3e66ae9c8a 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -251,7 +251,7 @@
   &tablesample-method;
   &custom-scan;
   &geqo;
-  &indexam;
+  &am;
   &generic-wal;
   &btree;
   &gist;
diff --git a/doc/src/sgml/ref/create_access_method.sgml b/doc/src/sgml/ref/create_access_method.sgml
index 851c5e63be..256914022a 100644
--- a/doc/src/sgml/ref/create_access_method.sgml
+++ b/doc/src/sgml/ref/create_access_method.sgml
@@ -61,7 +61,8 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
     <listitem>
      <para>
       This clause specifies the type of access method to define.
-      Only <literal>INDEX</literal> is supported at present.
+      Only <literal>INDEX</literal> and <literal>TABLE</literal>
+      are supported at present.
      </para>
     </listitem>
    </varlistentry>
@@ -76,9 +77,12 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
       declared to take a single argument of type <type>internal</type>,
       and its return type depends on the type of access method;
       for <literal>INDEX</literal> access methods, it must
-      be <type>index_am_handler</type>.  The C-level API that the handler
-      function must implement varies depending on the type of access method.
-      The index access method API is described in <xref linkend="indexam"/>.
+      be <type>index_am_handler</type> and for <literal>TABLE</literal>
+      access methods, it must be <type>table_am_handler</type>.
+      The C-level API that the handler function must implement varies
+      depending on the type of access method. The index access method API
+      is described in <xref linkend="index-access-methods"/> and the table access method
+      API is described in <xref linkend="table-access-methods"/>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 10428f8ff0..87e0f01ab2 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -29,6 +29,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 ] )
 [ INHERITS ( <replaceable>parent_table</replaceable> [, ... ] ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -40,6 +41,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -51,6 +53,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ] { FOR VALUES <replaceable class="parameter">partition_bound_spec</replaceable> | DEFAULT }
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -955,7 +958,7 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
 
      <para>
       The access method must support <literal>amgettuple</literal> (see <xref
-      linkend="indexam"/>); at present this means <acronym>GIN</acronym>
+      linkend="index-access-methods"/>); at present this means <acronym>GIN</acronym>
       cannot be used.  Although it's allowed, there is little point in using
       B-tree or hash indexes with an exclusion constraint, because this
       does nothing that an ordinary unique constraint doesn't do better.
@@ -1138,6 +1141,19 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies the optional access method for the new table;
+      see <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, the default table access method
+      is chosen for the new table; see <xref linkend="guc-default-table-access-method"/>
+      for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 527138e787..2acf52d2f5 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
+    [ USING <replaceable class="parameter">method</replaceable> ]
     [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
     [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -120,6 +121,19 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies the optional access method for the new table;
+      see <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, the default table access method
+      is chosen for the new table; see <xref linkend="guc-default-table-access-method"/>
+      for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
diff --git a/doc/src/sgml/release-9.6.sgml b/doc/src/sgml/release-9.6.sgml
index acb6a88b31..68c79db4b5 100644
--- a/doc/src/sgml/release-9.6.sgml
+++ b/doc/src/sgml/release-9.6.sgml
@@ -10081,7 +10081,7 @@ This commit is also listed under libpq and PL/pgSQL
 2016-08-13 [ed0097e4f] Add SQL-accessible functions for inspecting index AM pro
 -->
        <para>
-        Restructure <link linkend="indexam">index access
+        Restructure <link linkend="index-access-methods">index access
         method <acronym>API</acronym></link> to hide most of it at
         the <application>C</application> level (Alexander Korotkov, Andrew Gierth)
        </para>
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index 9446f8b836..4fa821160c 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -36,7 +36,7 @@
    described in <classname>pg_am</classname>.  It is possible to add a
    new index access method by writing the necessary code and
    then creating an entry in <classname>pg_am</classname> &mdash; but that is
-   beyond the scope of this chapter (see <xref linkend="indexam"/>).
+   beyond the scope of this chapter (see <xref linkend="am"/>).
   </para>
 
   <para>
-- 
2.18.0.windows.1
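
As a usage sketch of the interface this documentation draft describes (the
access method name and shared library here are hypothetical):

CREATE FUNCTION myam_handler(internal)
    RETURNS table_am_handler
    AS 'myam', 'myam_handler' LANGUAGE C STRICT;

CREATE ACCESS METHOD myam TYPE TABLE HANDLER myam_handler;

CREATE TABLE t1 (a int, b text) USING myam;

-- or make it the default for subsequently created tables:
SET default_table_access_method = 'myam';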

0002-New-API-s-are-added.patch (application/octet-stream)
From caf9356688a722c2246969e2f1421bf25c7c882e Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 17:58:27 +1100
Subject: [PATCH 2/3] New API's are added

1. Init fork API
2. Set new filenode
3. Estimate rel size

Set new filenode and estimate rel size are added as function hooks;
they are not compulsory for heap. When they exist, they take over
control.
---
 src/backend/access/heap/heapam_handler.c | 27 +++++++++++++++++++++++-
 src/backend/catalog/heap.c               | 24 ++-------------------
 src/backend/commands/tablecmds.c         |  4 ++--
 src/backend/optimizer/util/plancat.c     | 12 +++++++++++
 src/backend/utils/cache/relcache.c       | 12 +++++++++++
 src/include/access/tableam.h             | 16 ++++++++++++++
 src/include/catalog/heap.h               |  2 --
 7 files changed, 70 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3254e30a45..ae832e1f71 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -30,6 +30,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/pg_am_d.h"
+#include "catalog/storage_xlog.h"
 #include "executor/executor.h"
 #include "pgstat.h"
 #include "storage/lmgr.h"
@@ -2118,6 +2119,28 @@ heap_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
 	pfree(isnull);
 }
 
+/*
+ * Set up an init fork for an unlogged table so that it can be correctly
+ * reinitialized on restart.  An immediate sync is required even if the
+ * page has been logged, because the write did not go through
+ * shared_buffers and therefore a concurrent checkpoint may have moved
+ * the redo pointer past our xlog record.  Recovery may as well remove it
+ * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
+ * record. Therefore, logging is necessary even if wal_level=minimal.
+ */
+static void
+heap_create_init_fork(Relation rel)
+{
+	Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
+		   rel->rd_rel->relkind == RELKIND_MATVIEW ||
+		   rel->rd_rel->relkind == RELKIND_TOASTVALUE);
+	RelationOpenSmgr(rel);
+	smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
+	log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
+	smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
+}
+
+
 static const TableAmRoutine heapam_methods = {
 	.type = T_TableAmRoutine,
 
@@ -2165,7 +2188,9 @@ static const TableAmRoutine heapam_methods = {
 
 	.index_build_range_scan = IndexBuildHeapRangeScan,
 
-	.index_validate_scan = validate_index_heapscan
+	.index_validate_scan = validate_index_heapscan,
+
+	.CreateInitFork = heap_create_init_fork
 };
 
 const TableAmRoutine *
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 38b368f916..8e7c8ce684 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -32,6 +32,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/sysattr.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
@@ -1425,7 +1426,7 @@ heap_create_with_catalog(const char *relname,
 	 */
 	if (relpersistence == RELPERSISTENCE_UNLOGGED &&
 		relkind != RELKIND_PARTITIONED_TABLE)
-		heap_create_init_fork(new_rel_desc);
+		table_create_init_fork(new_rel_desc);
 
 	/*
 	 * ok, the relation has been cataloged, so close our relations and return
@@ -1437,27 +1438,6 @@ heap_create_with_catalog(const char *relname,
 	return relid;
 }
 
-/*
- * Set up an init fork for an unlogged table so that it can be correctly
- * reinitialized on restart.  An immediate sync is required even if the
- * page has been logged, because the write did not go through
- * shared_buffers and therefore a concurrent checkpoint may have moved
- * the redo pointer past our xlog record.  Recovery may as well remove it
- * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
- * record. Therefore, logging is necessary even if wal_level=minimal.
- */
-void
-heap_create_init_fork(Relation rel)
-{
-	Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
-		   rel->rd_rel->relkind == RELKIND_MATVIEW ||
-		   rel->rd_rel->relkind == RELKIND_TOASTVALUE);
-	RelationOpenSmgr(rel);
-	smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
-	log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
-	smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
-}
-
 /*
  *		RelationRemoveInheritance
  *
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f3526b267d..3c46a48882 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1649,7 +1649,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 			RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
 									  RecentXmin, minmulti);
 			if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
-				heap_create_init_fork(rel);
+				table_create_init_fork(rel);
 
 			heap_relid = RelationGetRelid(rel);
 			toast_relid = rel->rd_rel->reltoastrelid;
@@ -1663,7 +1663,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 				RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
 										  RecentXmin, minmulti);
 				if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
-					heap_create_init_fork(rel);
+					table_create_init_fork(rel);
 				heap_close(rel, NoLock);
 			}
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8da468a86f..3355f8bff4 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -947,6 +947,18 @@ estimate_rel_size(Relation rel, int32 *attr_widths,
 	BlockNumber relallvisible;
 	double		density;
 
+	/*
+	 * If the relation's access method provides a specific EstimateRelSize
+	 * function, use it instead of the regular default heap method.
+	 */
+	if (rel->rd_tableamroutine &&
+			rel->rd_tableamroutine->EstimateRelSize)
+	{
+		rel->rd_tableamroutine->EstimateRelSize(rel, attr_widths, pages,
+												tuples, allvisfrac);
+		return;
+	}
+
 	switch (rel->rd_rel->relkind)
 	{
 		case RELKIND_RELATION:
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 0d6e5a189f..9cc8e98e40 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3424,6 +3424,18 @@ RelationSetNewRelfilenode(Relation relation, char persistence,
 	HeapTuple	tuple;
 	Form_pg_class classform;
 
+	/*
+	 * If the relation's access method provides a specific SetNewFileNode
+	 * function, use it instead of the regular default heap method.
+	 */
+	if (relation->rd_tableamroutine &&
+			relation->rd_tableamroutine->SetNewFileNode)
+	{
+		relation->rd_tableamroutine->SetNewFileNode(relation, persistence,
+													freezeXid, minmulti);
+		return;
+	}
+
 	/* Indexes, sequences must have Invalid frozenxid; other rels must not */
 	Assert((relation->rd_rel->relkind == RELKIND_INDEX ||
 			relation->rd_rel->relkind == RELKIND_SEQUENCE) ?
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7fe6ff6c22..eb7c9b8007 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -194,6 +194,12 @@ struct SampleScanState;
 typedef bool (*SampleScanNextBlock_function)(TableScanDesc scan, struct SampleScanState *scanstate);
 typedef bool (*SampleScanNextTuple_function)(TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
 
+typedef void (*CreateInitFork_function)(Relation rel);
+typedef void (*EstimateRelSize_function)(Relation rel, int32 *attr_widths,
+				  BlockNumber *pages, double *tuples, double *allvisfrac);
+typedef void (*SetNewFileNode_function)(Relation relation, char persistence,
+										TransactionId freezeXid, MultiXactId minmulti);
+
 /*
  * API struct for a table AM.  Note this must be allocated in a
  * server-lifetime manner, typically as a static const struct.
@@ -250,6 +256,10 @@ typedef struct TableAmRoutine
 
 	IndexBuildRangeScan_function index_build_range_scan;
 	IndexValidateScan_function index_validate_scan;
+
+	CreateInitFork_function CreateInitFork;
+	EstimateRelSize_function EstimateRelSize;
+	SetNewFileNode_function	SetNewFileNode;
 }			TableAmRoutine;
 
 static inline const TupleTableSlotOps*
@@ -741,6 +751,12 @@ table_index_build_range_scan(Relation heapRelation,
 		scan);
 }
 
+static inline void
+table_create_init_fork(Relation relation)
+{
+	relation->rd_tableamroutine->CreateInitFork(relation);
+}
+
 extern BlockNumber table_parallelscan_nextpage(TableScanDesc scan);
 extern void table_parallelscan_startblock_init(TableScanDesc scan);
 extern Size table_parallelscan_estimate(Snapshot snapshot);
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 4584b3473c..c0e706ecc9 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -77,8 +77,6 @@ extern Oid heap_create_with_catalog(const char *relname,
 						 Oid relrewrite,
 						 ObjectAddress *typaddress);
 
-extern void heap_create_init_fork(Relation rel);
-
 extern void heap_drop_with_catalog(Oid relid);
 
 extern void heap_truncate(List *relids);
-- 
2.18.0.windows.1
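
A sketch of how an AM would opt into the optional hooks added by this patch
(the myam_* functions are hypothetical; leaving a hook NULL falls back to the
regular heap behaviour, as the plancat.c and relcache.c changes above show):

static const TableAmRoutine myam_methods = {
	.type = T_TableAmRoutine,
	/* ... mandatory scan/insert/update callbacks omitted ... */

	.CreateInitFork = myam_create_init_fork,

	/* optional hooks; NULL selects the built-in heap code paths */
	.EstimateRelSize = myam_estimate_rel_size,
	.SetNewFileNode = myam_set_new_filenode,
};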

0001-Further-fixes-and-cleanup.patch (application/octet-stream)
From 5b03e5e2aab35477e409b531cca09e9fef528e6f Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 16:06:10 +1100
Subject: [PATCH 1/3] Further fixes and cleanup

1. Remove the old slot interface file and also update the Makefile.
2. CREATE AS USING method grammar support

This change was missed during the earlier USING grammar support.

3. Remove the extra tuple visibility check

In heapgettup_pagemode a tuple visibility check was added
during the early development of pluggable storage, but the visibility
check is already carried out in the heapgetpage function itself.

4. Handling HeapTupleInvisible case

In update/delete scenarios, when the tuple is
concurrently updated/deleted, sometimes locking
of a tuple may return HeapTupleInvisible. Handle
that case by doing nothing.

regression fixes

1. scan start offset fix during analyze
2. Materialize the slot before it is processed by intorel_receive
3. ROW_MARK_COPY support by force store of heap tuple
4. partition prune extra heap page fix
---
 contrib/pg_visibility/pg_visibility.c     |  5 ++--
 src/backend/access/heap/heapam.c          | 28 +++++++++--------------
 src/backend/access/heap/heapam_handler.c  |  6 +++--
 src/backend/access/table/Makefile         |  2 +-
 src/backend/access/table/tableam_common.c |  0
 src/backend/commands/createas.c           |  4 ++++
 src/backend/executor/execExprInterp.c     | 16 +++++--------
 src/backend/executor/execMain.c           | 13 ++---------
 src/backend/executor/execTuples.c         | 21 +++++++++++++++++
 src/backend/executor/nodeBitmapHeapscan.c | 12 ++++++++++
 src/backend/executor/nodeModifyTable.c    | 22 ++++++++++++++++--
 src/backend/parser/gram.y                 | 11 +++++----
 src/include/executor/tuptable.h           |  1 +
 src/include/nodes/primnodes.h             |  1 +
 14 files changed, 92 insertions(+), 50 deletions(-)
 delete mode 100644 src/backend/access/table/tableam_common.c

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index dce5262e34..88ca4fd2af 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -563,12 +563,13 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 
 	rel = relation_open(relid, AccessShareLock);
 
+	/* Only some relkinds have a visibility map */
+	check_relation_relkind(rel);
+
 	if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
 		ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 						errmsg("only heap AM is supported")));
 
-	/* Only some relkinds have a visibility map */
-	check_relation_relkind(rel);
 
 	nblocks = RelationGetNumberOfBlocks(rel);
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ec99d0bcae..ef6b4c3e54 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -947,31 +947,25 @@ heapgettup_pagemode(HeapScanDesc scan,
 			/*
 			 * if current tuple qualifies, return it.
 			 */
-			if (HeapTupleSatisfies(tuple, scan->rs_scan.rs_snapshot, scan->rs_cbuf))
+			if (key != NULL)
 			{
-				/*
-				 * if current tuple qualifies, return it.
-				 */
-				if (key != NULL)
-				{
-					bool		valid;
+				bool		valid;
 
-					HeapKeyTest(tuple, RelationGetDescr(scan->rs_scan.rs_rd),
-								nkeys, key, valid);
-					if (valid)
-					{
-						scan->rs_cindex = lineindex;
-						LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-						return;
-					}
-				}
-				else
+				HeapKeyTest(tuple, RelationGetDescr(scan->rs_scan.rs_rd),
+							nkeys, key, valid);
+				if (valid)
 				{
 					scan->rs_cindex = lineindex;
 					LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 					return;
 				}
 			}
+			else
+			{
+				scan->rs_cindex = lineindex;
+				LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+				return;
+			}
 
 			/*
 			 * otherwise move to the next item on the page
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91f..3254e30a45 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1741,7 +1741,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
 	Page		targpage;
-	OffsetNumber targoffset = scan->rs_cindex;
+	OffsetNumber targoffset;
 	OffsetNumber maxoffset;
 	BufferHeapTupleTableSlot *hslot;
 
@@ -1751,7 +1751,9 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 	maxoffset = PageGetMaxOffsetNumber(targpage);
 
 	/* Inner loop over all tuples on the selected page */
-	for (targoffset = scan->rs_cindex; targoffset <= maxoffset; targoffset++)
+	for (targoffset = scan->rs_cindex ? scan->rs_cindex : FirstOffsetNumber;
+			targoffset <= maxoffset;
+			targoffset++)
 	{
 		ItemId		itemid;
 		HeapTuple	targtuple = &hslot->base.tupdata;
diff --git a/src/backend/access/table/Makefile b/src/backend/access/table/Makefile
index fe22bf9208..006ba99182 100644
--- a/src/backend/access/table/Makefile
+++ b/src/backend/access/table/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/access/table
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = tableam.o tableamapi.o tableam_common.o
+OBJS = tableam.o tableamapi.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/table/tableam_common.c b/src/backend/access/table/tableam_common.c
deleted file mode 100644
index e69de29bb2..0000000000
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 84de804175..d3ffe417ff 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -593,6 +593,10 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 	if (myState->rel->rd_rel->relhasoids)
 		slot->tts_tupleOid = InvalidOid;
 
+	/* Materialize the slot */
+	if (!TTS_IS_VIRTUAL(slot))
+		ExecMaterializeSlot(slot);
+
 	table_insert(myState->rel,
 				 slot,
 				 myState->output_cid,
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index ef94ac4aa0..8df85c2f48 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -529,20 +529,13 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 			Assert(TTS_IS_HEAPTUPLE(outerslot) ||
 				   TTS_IS_BUFFERTUPLE(outerslot));
 
-			/* The slot should have a valid heap tuple. */
-#if FIXME
-			/* The slot should have a valid heap tuple. */
-			Assert(hslot->tuple != NULL);
-#endif
-
-			/*
-			 * hari
-			 * Assert(outerslot->tts_storageslotam->slot_is_physical_tuple(outerslot));
-			 */
 			if (attnum == TableOidAttributeNumber)
 				d = ObjectIdGetDatum(outerslot->tts_tableOid);
 			else
 			{
+				/* The slot should have a valid heap tuple. */
+				Assert(hslot->tuple != NULL);
+
 				/* heap_getsysattr has sufficient defenses against bad attnums */
 				d = heap_getsysattr(hslot->tuple, attnum,
 									outerslot->tts_tupleDescriptor,
@@ -570,6 +563,9 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 				Assert(TTS_IS_HEAPTUPLE(scanslot) ||
 					   TTS_IS_BUFFERTUPLE(scanslot));
 
+				if (hslot->tuple == NULL)
+					ExecMaterializeSlot(scanslot);
+
 				d = heap_getsysattr(hslot->tuple, attnum,
 									scanslot->tts_tupleDescriptor,
 									op->resnull);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e055c0a7c6..34ef86a5bd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2594,7 +2594,7 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
 	 * datums that may be present in copyTuple).  As with the next step, this
 	 * is to guard against early re-use of the EPQ query.
 	 */
-	if (!TupIsNull(slot))
+	if (!TupIsNull(slot) && !TTS_IS_VIRTUAL(slot))
 		ExecMaterializeSlot(slot);
 
 #if FIXME
@@ -2787,16 +2787,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 			if (isNull)
 				continue;
 
-			elog(ERROR, "frak, need to implement ROW_MARK_COPY");
-#ifdef FIXME
-			// FIXME: this should just deform the tuple and store it as a
-			// virtual one.
-			tuple = table_tuple_by_datum(erm->relation, datum, erm->relid);
-
-			/* store tuple */
-			EvalPlanQualSetTuple(epqstate, erm->rti, tuple);
-#endif
-
+			ExecForceStoreHeapTupleDatum(datum, slot);
 		}
 	}
 }
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 917bf80f71..74149cc3ad 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -1364,6 +1364,27 @@ ExecStoreAllNullTuple(TupleTableSlot *slot)
 	return ExecFinishStoreSlotValues(slot);
 }
 
+void
+ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot)
+{
+	HeapTuple	tuple;
+	HeapTupleHeader td;
+
+	td = DatumGetHeapTupleHeader(data);
+
+	tuple = (HeapTuple) palloc(HEAPTUPLESIZE + HeapTupleHeaderGetDatumLength(td));
+	tuple->t_len = HeapTupleHeaderGetDatumLength(td);
+	tuple->t_self = td->t_ctid;
+	tuple->t_data = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE);
+	memcpy((char *) tuple->t_data, (char *) td, tuple->t_len);
+
+	ExecClearTuple(slot);
+
+	heap_deform_tuple(tuple, slot->tts_tupleDescriptor,
+					  slot->tts_values, slot->tts_isnull);
+	ExecFinishStoreSlotValues(slot);
+}
+
 /* --------------------------------
  *		ExecFetchSlotTuple
  *			Fetch the slot's regular physical tuple.
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 56880e3d16..36ca07beb2 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -224,6 +224,18 @@ BitmapHeapNext(BitmapHeapScanState *node)
 
 			BitmapAdjustPrefetchIterator(node, tbmres);
 
+			/*
+			 * Ignore any claimed entries past what we think is the end of the
+			 * relation.  (This is probably not necessary given that we got at
+			 * least AccessShareLock on the table before performing any of the
+			 * indexscans, but let's be safe.)
+			 */
+			if (tbmres->blockno >= scan->rs_nblocks)
+			{
+				node->tbmres = tbmres = NULL;
+				continue;
+			}
+
 			/*
 			 * We can skip fetching the heap page if we don't need any fields
 			 * from the heap, and the bitmap entries don't need rechecking,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 3cc9092413..b3851b180d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -607,7 +607,7 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool canSetTag,
 		   bool changingPart,
 		   bool *tupleDeleted,
-		   TupleTableSlot **epqslot)
+		   TupleTableSlot **epqreturnslot)
 {
 	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
@@ -632,7 +632,7 @@ ExecDelete(ModifyTableState *mtstate,
 		bool		dodelete;
 
 		dodelete = ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
-										tupleid, oldtuple, epqslot);
+										tupleid, oldtuple, epqreturnslot);
 
 		if (!dodelete)			/* "do nothing" */
 			return NULL;
@@ -724,8 +724,21 @@ ldelete:;
 					/* Tuple no more passing quals, exiting... */
 					return NULL;
 				}
+
+				/* pass back the EPQ slot to the caller, instead of retrying */
+				if (epqreturnslot)
+				{
+					*epqreturnslot = epqslot;
+					return NULL;
+				}
+
 				goto ldelete;
 			}
+			else if (result == HeapTupleInvisible)
+			{
+				/* tuple is not visible; nothing to do */
+				return NULL;
+			}
 		}
 
 		switch (result)
@@ -1196,6 +1209,11 @@ lreplace:;
 				slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
 				goto lreplace;
 			}
+			else if (result == HeapTupleInvisible)
+			{
+				/* tuple is not visible; nothing to do */
+				return NULL;
+			}
 		}
 
 		switch (result)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 54382aba88..f030ad25a2 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4037,7 +4037,6 @@ CreateStatsStmt:
  *
  *****************************************************************************/
 
-// PBORKED: storage option
 CreateAsStmt:
 		CREATE OptTemp TABLE create_as_target AS SelectStmt opt_with_data
 				{
@@ -4068,14 +4067,16 @@ CreateAsStmt:
 		;
 
 create_as_target:
-			qualified_name opt_column_list OptWith OnCommitOption OptTableSpace
+			qualified_name opt_column_list table_access_method_clause
+			OptWith OnCommitOption OptTableSpace
 				{
 					$$ = makeNode(IntoClause);
 					$$->rel = $1;
 					$$->colNames = $2;
-					$$->options = $3;
-					$$->onCommit = $4;
-					$$->tableSpaceName = $5;
+					$$->accessMethod = $3;
+					$$->options = $4;
+					$$->onCommit = $5;
+					$$->tableSpaceName = $6;
 					$$->viewQuery = NULL;
 					$$->skipData = false;		/* might get changed later */
 				}
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index 05f38cfd0d..20fc425a27 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -476,6 +476,7 @@ extern TupleTableSlot *ExecCopySlot(TupleTableSlot *dstslot,
 
 extern void ExecForceStoreHeapTuple(HeapTuple tuple,
 			   TupleTableSlot *slot);
+extern void ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot);
 
 extern void slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum);
 extern Datum slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull);
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 40f6eb03d2..4d194a8c2a 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -111,6 +111,7 @@ typedef struct IntoClause
 
 	RangeVar   *rel;			/* target relation name */
 	List	   *colNames;		/* column names to assign, or NIL */
+	char	   *accessMethod;	/* table access method */
 	List	   *options;		/* options from WITH clause */
 	OnCommitAction onCommit;	/* what do we do at COMMIT? */
 	char	   *tableSpaceName; /* table space to use, or NULL */
-- 
2.18.0.windows.1

#47Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#46)
3 attachment(s)
Re: Pluggable Storage - Andres's take

On Fri, Nov 2, 2018 at 11:17 AM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Wed, Oct 31, 2018 at 9:34 PM Dmitry Dolgov <9erthalion6@gmail.com>
wrote:

FYI, alongside reviewing the code changes I've run a few performance tests
(that's why I hit this issue with pgbench in the first place). In case of
high concurrency, so far I see a small performance degradation in comparison
with the master branch (about 2-5% of average latency, depending on the
level of concurrency), but I can't really say why exactly (perf just shows
barely noticeable overhead here and there; maybe what I see is actually a
cumulative impact).

Thanks for sharing your observation. I will also analyze it and try to find
the performance bottlenecks that are causing the overhead.

I tried running the pgbench performance tests with a minimal number of
clients on my laptop and didn't find any performance issues; maybe the issue
is visible only with more clients. Even with the perf tool, I am not able to
pinpoint a clear problem function. As you said, it may be the combination of
all the changes that leads to some overhead.

Attached are the cumulative patches with further fixes, plus basic syntax
regression tests.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0003-First-draft-of-pluggable-storage-documentation.patch (application/octet-stream)
From 2c1414f2e847577174ba3087868e4920342dfeb1 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 1 Nov 2018 12:00:10 +1100
Subject: [PATCH 3/3] First draft of pluggable-storage documentation

---
 doc/src/sgml/{indexam.sgml => am.sgml}     | 590 ++++++++++++++++++++-
 doc/src/sgml/catalogs.sgml                 |   5 +-
 doc/src/sgml/config.sgml                   |  24 +
 doc/src/sgml/filelist.sgml                 |   2 +-
 doc/src/sgml/postgres.sgml                 |   2 +-
 doc/src/sgml/ref/create_access_method.sgml |  12 +-
 doc/src/sgml/ref/create_table.sgml         |  18 +-
 doc/src/sgml/ref/create_table_as.sgml      |  14 +
 doc/src/sgml/release-9.6.sgml              |   2 +-
 doc/src/sgml/xindex.sgml                   |   2 +-
 10 files changed, 640 insertions(+), 31 deletions(-)
 rename doc/src/sgml/{indexam.sgml => am.sgml} (78%)

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/am.sgml
similarity index 78%
rename from doc/src/sgml/indexam.sgml
rename to doc/src/sgml/am.sgml
index beb99d1831..dc13bc1073 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/am.sgml
@@ -1,16 +1,20 @@
-<!-- doc/src/sgml/indexam.sgml -->
+<!-- doc/src/sgml/am.sgml -->
 
-<chapter id="indexam">
- <title>Index Access Method Interface Definition</title>
+<chapter id="am">
+ <title>Access Method Interface Definition</title>
 
   <para>
    This chapter defines the interface between the core
-   <productname>PostgreSQL</productname> system and <firstterm>index access
-   methods</firstterm>, which manage individual index types.  The core system
-   knows nothing about indexes beyond what is specified here, so it is
-   possible to develop entirely new index types by writing add-on code.
+   <productname>PostgreSQL</productname> system and <firstterm>access
+   methods</firstterm>, which manage individual <literal>INDEX</literal> 
+   and <literal>TABLE</literal> types.  The core system knows nothing
+   about these access methods beyond what is specified here, so it is
+   possible to develop entirely new access method types by writing add-on code.
   </para>
-
+ 
+ <sect1 id="index-access-methods">
+  <title>Overview of Index Access Methods</title>
+  
   <para>
    All indexes in <productname>PostgreSQL</productname> are what are known
    technically as <firstterm>secondary indexes</firstterm>; that is, the index is
@@ -42,8 +46,8 @@
    dead tuples are reclaimed (by vacuuming) when the dead tuples themselves
    are reclaimed.
   </para>
-
- <sect1 id="index-api">
+  
+ <sect2 id="index-api">
   <title>Basic API Structure for Indexes</title>
 
   <para>
@@ -217,9 +221,9 @@ typedef struct IndexAmRoutine
    conditions.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-functions">
+ <sect2 id="index-functions">
   <title>Index Access Method Functions</title>
 
   <para>
@@ -709,9 +713,11 @@ amparallelrescan (IndexScanDesc scan);
    the beginning.
   </para>
 
- </sect1>
+ </sect2>
+ 
+ 
 
- <sect1 id="index-scanning">
+ <sect2 id="index-scanning">
   <title>Index Scanning</title>
 
   <para>
@@ -864,9 +870,9 @@ amparallelrescan (IndexScanDesc scan);
    if its internal implementation is unsuited to one API or the other.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-locking">
+ <sect2 id="index-locking">
   <title>Index Locking Considerations</title>
 
   <para>
@@ -978,9 +984,9 @@ amparallelrescan (IndexScanDesc scan);
    reduce the frequency of such transaction cancellations.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-unique-checks">
+ <sect2 id="index-unique-checks">
   <title>Index Uniqueness Checks</title>
 
   <para>
@@ -1127,9 +1133,9 @@ amparallelrescan (IndexScanDesc scan);
     </itemizedlist>
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-cost-estimation">
+ <sect2 id="index-cost-estimation">
   <title>Index Cost Estimation Functions</title>
 
   <para>
@@ -1376,5 +1382,549 @@ cost_qual_eval(&amp;index_qual_cost, path-&gt;indexquals, root);
    Examples of cost estimator functions can be found in
    <filename>src/backend/utils/adt/selfuncs.c</filename>.
   </para>
+ </sect2>
  </sect1>
+ 
+ <sect1 id="table-access-methods">
+  <title>Overview of Table Access Methods</title>
+  
+  <para>
+   Tables are the primary data store in <productname>PostgreSQL</productname>.
+   Each table is stored as its own physical <firstterm>relation</firstterm> and so
+   is described by an entry in the <structname>pg_class</structname> catalog.
+   The contents of a table are entirely under the control of its access method.
+   (All the access methods furthermore use the standard page layout described in
+   <xref linkend="storage-page-layout"/>.)
+  </para>
+  
+ <sect2 id="table-api">
+  <title>Table access method API</title>
+  
+  <para>
+   Each table access method is described by a row in the
+   <link linkend="catalog-pg-am"><structname>pg_am</structname></link>
+   system catalog.  The <structname>pg_am</structname> entry
+   specifies a name and a <firstterm>handler function</firstterm> for the access
+   method.  These entries can be created and deleted using the
+   <xref linkend="sql-create-access-method"/> and
+   <xref linkend="sql-drop-access-method"/> SQL commands.
+  </para>
+
+  <para>
+   A table access method handler function must be declared to accept a
+   single argument of type <type>internal</type> and to return the
+   pseudo-type <type>table_am_handler</type>.  The argument is a dummy value that
+   simply serves to prevent handler functions from being called directly from
+   SQL commands.  The result of the function must be a pointer to a struct
+   of type <structname>TableAmRoutine</structname>, allocated with server
+   lifetime (typically a static const struct), which contains everything
+   that the core code needs to know to make use of the table access method.
+   The <structname>TableAmRoutine</structname> struct, also called the access
+   method's <firstterm>API struct</firstterm>, includes fields specifying assorted
+   fixed properties of the access method, such as whether it can support
+   bitmap scans.  More importantly, it contains pointers to support
+   functions for the access method, which do all of the real work to access
+   tables.  These support functions are plain C functions and are not
+   visible or callable at the SQL level.  The support functions are described
+   in <xref linkend="table-functions"/>.
+  </para>
+
+  <para>
+   The structure <structname>TableAmRoutine</structname> is defined thus:
+<programlisting>
+typedef struct TableAmRoutine
+{
+    NodeTag     type;
+
+    SlotCallbacks_function slot_callbacks;
+
+    SnapshotSatisfies_function snapshot_satisfies;
+    SnapshotSatisfiesUpdate_function snapshot_satisfiesUpdate;
+    SnapshotSatisfiesVacuum_function snapshot_satisfiesVacuum;
+
+    /* Operations on physical tuples */
+    TupleInsert_function tuple_insert;
+    TupleInsertSpeculative_function tuple_insert_speculative;
+    TupleCompleteSpeculative_function tuple_complete_speculative;
+    TupleUpdate_function tuple_update;
+    TupleDelete_function tuple_delete;
+    TupleFetchRowVersion_function tuple_fetch_row_version;
+    TupleLock_function tuple_lock;
+    MultiInsert_function multi_insert;
+    TupleGetLatestTid_function tuple_get_latest_tid;
+    TupleFetchFollow_function tuple_fetch_follow;
+
+    GetTupleData_function get_tuple_data;
+
+    RelationVacuum_function relation_vacuum;
+    RelationScanAnalyzeNextBlock_function scan_analyze_next_block;
+    RelationScanAnalyzeNextTuple_function scan_analyze_next_tuple;
+    RelationCopyForCluster_function relation_copy_for_cluster;
+    RelationSync_function relation_sync;
+
+    /* Operations on relation scans */
+    ScanBegin_function scan_begin;
+    ScanSetlimits_function scansetlimits;
+    ScanGetnextSlot_function scan_getnextslot;
+
+    BitmapPagescan_function scan_bitmap_pagescan;
+    BitmapPagescanNext_function scan_bitmap_pagescan_next;
+
+    SampleScanNextBlock_function scan_sample_next_block;
+    SampleScanNextTuple_function scan_sample_next_tuple;
+
+    ScanEnd_function scan_end;
+    ScanRescan_function scan_rescan;
+    ScanUpdateSnapshot_function scan_update_snapshot;
+
+    BeginIndexFetchTable_function begin_index_fetch;
+    EndIndexFetchTable_function reset_index_fetch;
+    EndIndexFetchTable_function end_index_fetch;
+
+
+    IndexBuildRangeScan_function index_build_range_scan;
+    IndexValidateScan_function index_validate_scan;
+
+    CreateInitFork_function CreateInitFork;
+}           TableAmRoutine;
+</programlisting>
+  </para>
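
To make the handler shape concrete, here is a minimal sketch; it is
illustrative only, "mytam" is a hypothetical AM, and all the actual
callbacks are elided:

    PG_FUNCTION_INFO_V1(mytam_handler);

    /* must have server lifetime, hence static const */
    static const TableAmRoutine mytam_methods = {
        .type = T_TableAmRoutine,
        /* .slot_callbacks = ..., .tuple_insert = ..., and so on */
    };

    Datum
    mytam_handler(PG_FUNCTION_ARGS)
    {
        PG_RETURN_POINTER(&mytam_methods);
    }

The matching catalog entry would then be created with
CREATE ACCESS METHOD mytam TYPE TABLE HANDLER mytam_handler.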
+  
+  <para>
+   An individual table is defined by a
+   <link linkend="catalog-pg-class"><structname>pg_class</structname></link>
+   entry that describes it as a physical relation.
+  </para>
+   
+ </sect2>
+ 
+ <sect2 id="table-functions">
+  <title>Table Access Method Functions</title>
+
+  <para>
+   The table construction and maintenance functions that a table access
+   method must provide in <structname>TableAmRoutine</structname> are:
+  </para>
+
+  <para>
+<programlisting>
+TupleTableSlotOps *
+slot_callbacks (Relation relation);
+</programlisting>
+   API to access the slot-specific callbacks.
+   The following callback sets are available:
+   <structname>TTSOpsVirtual</structname>,
+   <structname>TTSOpsHeapTuple</structname>,
+   <structname>TTSOpsMinimalTuple</structname>, and
+   <structname>TTSOpsBufferTuple</structname>.
+  </para>
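
For example, a buffer-based AM such as heap would presumably return the
buffered-tuple callbacks; a sketch (the function name is illustrative):

    static const TupleTableSlotOps *
    mytam_slot_callbacks(Relation relation)
    {
        /* an AM producing purely computed rows might return &TTSOpsVirtual */
        return &TTSOpsBufferTuple;
    }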
+  
+  <para>
+<programlisting>
+bool
+snapshot_satisfies (TupleTableSlot *slot, Snapshot snapshot);
+</programlisting>
+   API to check whether the provided slot is visible to the current
+   transaction according to the snapshot.
+  </para>
+ 
+  <para>
+<programlisting>
+Oid
+tuple_insert (Relation rel, TupleTableSlot *slot, CommandId cid,
+              int options, BulkInsertState bistate);
+</programlisting>
+   API to insert the tuple and report the <literal>ItemPointerData</literal>
+   of the location where it was inserted.
+  </para>
+  
+  <para>
+<programlisting>
+Oid
+tuple_insert_speculative (Relation rel,
+                         TupleTableSlot *slot,
+                         CommandId cid,
+                         int options,
+                         BulkInsertState bistate,
+                         uint32 specToken);
+</programlisting>
+   API to insert the tuple with a speculative token. This API is similar
+   to <literal>tuple_insert</literal>, but carries additional speculative
+   insertion information.
+  </para>
+  
+  <para>
+<programlisting>
+void
+tuple_complete_speculative (Relation rel,
+                          TupleTableSlot *slot,
+                          uint32 specToken,
+                          bool succeeded);
+</programlisting>
+   API to finish a speculative insertion begun by <literal>tuple_insert_speculative</literal>,
+   once the corresponding index insertion has completed, successfully or not.
+  </para>
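
Taken together, the two speculative callbacks implement INSERT ... ON
CONFLICT. The executor would drive them roughly as below; this is a sketch,
and the table_* wrapper names and the arbiter-index helper are assumptions:

    uint32		specToken;
    bool		conflicted;

    /* mark the insertion with a token other backends can wait on */
    specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
    table_insert_speculative(rel, slot, cid, 0, NULL, specToken);

    /* inserting into the arbiter indexes decides whether we "won" */
    conflicted = !insert_into_arbiter_indexes(slot);	/* hypothetical helper */

    /* confirm the tuple, or kill it so it never becomes visible */
    table_complete_speculative(rel, slot, specToken, !conflicted);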
+  
+  
+  <para>
+<programlisting>
+HTSU_Result
+tuple_update (Relation relation,
+             ItemPointer otid,
+             TupleTableSlot *slot,
+             CommandId cid,
+             Snapshot crosscheck,
+             bool wait,
+             HeapUpdateFailureData *hufd,
+             LockTupleMode *lockmode,
+             bool *update_indexes);
+</programlisting>
+   API to update the existing tuple with new data.
+  </para>
+  
+  
+  <para>
+<programlisting>
+HTSU_Result
+tuple_delete (Relation relation,
+             ItemPointer tid,
+             CommandId cid,
+             Snapshot crosscheck,
+             bool wait,
+             HeapUpdateFailureData *hufd,
+             bool changingPart);
+</programlisting>
+   API to delete the existing tuple.
+  </para>
+  
+  
+  <para>
+<programlisting>
+bool
+tuple_fetch_row_version (Relation relation,
+                       ItemPointer tid,
+                       Snapshot snapshot,
+                       TupleTableSlot *slot,
+                       Relation stats_relation);
+</programlisting>
+   API to fetch the row version identified by the ItemPointer and store it
+   in the provided slot.
+  </para>
+  
+  
+  <para>
+<programlisting>
+HTSU_Result
+TupleLock_function (Relation relation,
+                   ItemPointer tid,
+                   Snapshot snapshot,
+                   TupleTableSlot *slot,
+                   CommandId cid,
+                   LockTupleMode mode,
+                   LockWaitPolicy wait_policy,
+                   uint8 flags,
+                   HeapUpdateFailureData *hufd);
+</programlisting>
+   API to lock the tuple identified by the ItemPointer, fetching the newest
+   version of the tuple and its TID.
+  </para>
+  
+  
+  <para>
+<programlisting>
+void
+multi_insert (Relation relation, TupleTableSlot **slots, int nslots,
+              CommandId cid, int options, BulkInsertState bistate);
+</programlisting>
+   API to insert multiple tuples at a time into the relation.
+  </para>
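
This is what COPY-style bulk loading would use to amortize per-tuple
overhead. A sketch, assuming the slots were created up front with the AM's
slot callbacks and that a table_multi_insert wrapper exists; BATCH_SIZE and
the producer function are made up:

    #define BATCH_SIZE 1000

    TupleTableSlot *slots[BATCH_SIZE];	/* pre-created, one per batch entry */
    int			nheld = 0;

    while (produce_next_tuple(slots[nheld]))	/* hypothetical producer */
    {
        if (++nheld == BATCH_SIZE)
        {
            table_multi_insert(rel, slots, nheld, cid, 0, bistate);
            nheld = 0;
        }
    }
    if (nheld > 0)		/* flush the remainder */
        table_multi_insert(rel, slots, nheld, cid, 0, bistate);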
+  
+  
+  <para>
+<programlisting>
+void
+tuple_get_latest_tid (Relation relation,
+                    Snapshot snapshot,
+                    ItemPointer tid);
+</programlisting>
+   API to get the latest TID of the tuple with the given item pointer.
+  </para>
+  
+  
+  <para>
+<programlisting>
+bool
+tuple_fetch_follow (struct IndexFetchTableData *scan,
+                  ItemPointer tid,
+                  Snapshot snapshot,
+                  TupleTableSlot *slot,
+                  bool *call_again, bool *all_dead);
+</programlisting>
+   API to fetch the tuples of the page that satisfy the given item pointer.
+  </para>
+  
+  
+  <para>
+<programlisting>
+tuple_data
+get_tuple_data (TupleTableSlot *slot, tuple_data_flags flags);
+</programlisting>
+   API to return the internal structure members of the HeapTuple.
+  </para>
+  
+  
+  <para>
+<programlisting>
+void
+relation_vacuum (Relation onerel, int options,
+                struct VacuumParams *params, BufferAccessStrategy bstrategy);
+</programlisting>
+   API to perform vacuum on one relation.
+  </para>
+  
+  
+  <para>
+<programlisting>
+void
+scan_analyze_next_block (TableScanDesc scan, BlockNumber blockno,
+                      BufferAccessStrategy bstrategy);
+</programlisting>
+   API to fill the scan descriptor with the buffer of the specified block.
+  </para>
+  
+  
+  <para>
+<programlisting>
+bool
+scan_analyze_next_tuple (TableScanDesc scan, TransactionId OldestXmin,
+                      double *liverows, double *deadrows, TupleTableSlot *slot));
+</programlisting>
+   API to analyze the block, store the next tuple to examine in the provided
+   slot, and also report the numbers of live and dead rows seen.
+  </para>
+  
+  
+  <para>
+<programlisting>
+void
+relation_copy_for_cluster (Relation NewHeap, Relation OldHeap, Relation OldIndex,
+                       bool use_sort,
+                       TransactionId OldestXmin, TransactionId FreezeXid, MultiXactId MultiXactCutoff,
+                       double *num_tuples, double *tups_vacuumed, double *tups_recently_dead);
+</programlisting>
+   API to copy one relation to another, using either an index scan or a table scan.
+  </para>
+  
+  
+  <para>
+<programlisting>
+void
+relation_sync (Relation relation);
+</programlisting>
+   API to sync the relation to disk, useful for the cases where no WAL is written.
+  </para>
+  
+  
+  <para>
+<programlisting>
+TableScanDesc
+scan_begin (Relation relation,
+            Snapshot snapshot,
+            int nkeys, ScanKey key,
+            ParallelTableScanDesc parallel_scan,
+            bool allow_strat,
+            bool allow_sync,
+            bool allow_pagemode,
+            bool is_bitmapscan,
+            bool is_samplescan,
+            bool temp_snap);
+</programlisting>
+   API to start a scan of the provided relation and return the
+   <structname>TableScanDesc</structname> structure.
+  </para>
+  
+  
+    <para>
+<programlisting>
+void
+scansetlimits (TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks);
+</programlisting>
+   API to set the relation scan range limits.
+  </para>
+  
+  
+    <para>
+<programlisting>
+TupleTableSlot *
+scan_getnextslot (TableScanDesc scan,
+                 ScanDirection direction, TupleTableSlot *slot);
+</programlisting>
+   API to fill the next visible tuple from the relation scan in the provided slot
+   and return it.
+  </para>
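
A typical caller loop, for illustration, assuming thin table_* wrappers over
the callbacks above and that the function returns an empty slot at end of
scan:

    TableScanDesc scan;
    TupleTableSlot *slot;	/* created beforehand with the AM's slot callbacks */

    scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
    while (!TupIsNull(table_scan_getnextslot(scan, ForwardScanDirection, slot)))
    {
        /* one visible tuple per iteration, stored in the slot */
    }
    table_endscan(scan);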
+  
+  
+    <para>
+<programlisting>
+bool
+scan_bitmap_pagescan (TableScanDesc scan,
+                    TBMIterateResult *tbmres);
+</programlisting>
+   API to fetch the block specified by the bitmap iterate result and collect
+   its valid item pointers into the scan descriptor.
+  </para>
+  
+  
+    <para>
+<programlisting>
+bool
+scan_bitmap_pagescan_next (TableScanDesc scan,
+                        TupleTableSlot *slot);
+</programlisting>
+   API to return the next tuple among the item pointers collected by
+   <literal>scan_bitmap_pagescan</literal>, storing it in the provided slot.
+  </para>
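
The two bitmap callbacks form a two-phase loop: collect one page's worth of
item pointers, then drain them tuple by tuple. A sketch (scan, tbmres and
slot as set up by the bitmap heap scan node):

    const TableAmRoutine *am = rel->rd_tableamroutine;

    if (am->scan_bitmap_pagescan(scan, tbmres))
    {
        while (am->scan_bitmap_pagescan_next(scan, slot))
        {
            /* process one tuple from the current bitmap page */
        }
    }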
+  
+  
+    <para>
+<programlisting>
+bool
+scan_sample_next_block (TableScanDesc scan, struct SampleScanState *scanstate);
+</programlisting>
+   API to advance the scan to the next block to sample, as chosen by the
+   sample method, and prepare the scan descriptor for it.
+  </para>
+  
+  
+    <para>
+<programlisting>
+bool
+scan_sample_next_tuple (TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
+</programlisting>
+   API to return the next sampled tuple from the current block, as chosen by
+   the sample method, storing it in the provided slot.
+  </para>
+  
+  
+    <para>
+<programlisting>
+void
+scan_end (TableScanDesc scan);
+</programlisting>
+   API to end the relation scan.
+  </para>
+  
+  
+    <para>
+<programlisting>
+void
+scan_rescan (TableScanDesc scan, ScanKey key, bool set_params,
+             bool allow_strat, bool allow_sync, bool allow_pagemode);
+</programlisting>
+   API to restart the relation scan with the provided parameters.
+  </para>
+  
+  
+  <para>
+<programlisting>
+void
+scan_update_snapshot (TableScanDesc scan, Snapshot snapshot);
+</programlisting>
+   API to update the relation scan with the new snapshot.
+  </para>
+  
+  <para>
+<programlisting>
+IndexFetchTableData *
+begin_index_fetch (Relation relation);
+</programlisting>
+   API to prepare the <structname>IndexFetchTableData</structname> for the relation.
+  </para>
+  
+  <para>
+<programlisting>
+void
+reset_index_fetch (struct IndexFetchTableData* data);
+</programlisting>
+   API to reset the prepared internal members of the <structname>IndexFetchTableData</structname>.
+  </para>
+  
+  <para>
+<programlisting>
+void
+end_index_fetch (struct IndexFetchTableData* data);
+</programlisting>
+   API to clear and free the <structname>IndexFetchTableData</structname>.
+  </para>
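
These three bracket the per-tuple fetches performed on behalf of an index
scan. A sketch of the lifecycle (tid, snapshot and slot come from the index
scan machinery):

    IndexFetchTableData *fetch;
    bool		call_again = false;
    bool		all_dead = false;

    fetch = rel->rd_tableamroutine->begin_index_fetch(rel);

    /* for each TID the index returns: */
    if (rel->rd_tableamroutine->tuple_fetch_follow(fetch, &tid, snapshot, slot,
                                                   &call_again, &all_dead))
    {
        /* a visible tuple is in the slot; while call_again stays set, the
         * same TID can be passed in again to walk the rest of the chain */
    }

    rel->rd_tableamroutine->end_index_fetch(fetch);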
+  
+    <para>
+<programlisting>
+double
+index_build_range_scan (Relation heapRelation,
+                       Relation indexRelation,
+                       IndexInfo *indexInfo,
+                       bool allow_sync,
+                       bool anyvisible,
+                       BlockNumber start_blockno,
+                       BlockNumber end_blockno,
+                       IndexBuildCallback callback,
+                       void *callback_state,
+                       TableScanDesc scan);
+</programlisting>
+   API to perform a table scan over the block range specified by the caller
+   and insert the qualifying records into the index using the provided
+   callback function.
+  </para>
+  
+    <para>
+<programlisting>
+void
+index_validate_scan (Relation heapRelation,
+                   Relation indexRelation,
+                   IndexInfo *indexInfo,
+                   Snapshot snapshot,
+                   struct ValidateIndexState *state);
+</programlisting>
+   API to perform the table scan and insert the qualifying records into the index.
+   This API is similar to <function>index_build_range_scan</function>; it is
+   used during concurrent index builds.
+  </para>
+  
+ </sect2>
+ 
+ <sect2>
+  <title>Table scanning</title>
+  
+  <para>
+  </para>
+ </sect2>
+ 
+ <sect2>
+  <title>Table insert/update/delete</title>
+
+  <para>
+  </para>
+  </sect2>
+ 
+ <sect2>
+  <title>Table locking</title>
+
+  <para>
+  </para>
+  </sect2>
+ 
+ <sect2>
+  <title>Table vacuum</title>
+
+  <para>
+  </para>
+ </sect2>
+ 
+ <sect2>
+  <title>Table fetch</title>
+
+  <para>
+  </para>
+ </sect2>
+ 
+ </sect1> 
 </chapter>
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 0179deea2e..f0c8037bbc 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -587,8 +587,9 @@
    The catalog <structname>pg_am</structname> stores information about
    relation access methods.  There is one row for each access method supported
    by the system.
-   Currently, only indexes have access methods.  The requirements for index
-   access methods are discussed in detail in <xref linkend="indexam"/>.
+   Currently, only <literal>INDEX</literal> and <literal>TABLE</literal> have
+   access methods.  The requirements for access methods are discussed in detail
+   in <xref linkend="am"/>.
   </para>
 
   <table>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f11b8f724c..8765d7c57c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6585,6 +6585,30 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-default-table-access-method" xreflabel="default_table_access_method">
+      <term><varname>default_table_access_method</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>default_table_access_method</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        This variable specifies the default table access method to use when
+        creating objects (tables and materialized views), if a <command>CREATE</command>
+        command does not explicitly specify an access method.
+       </para>
+
+       <para>
+        The value is either the name of a table access method, or an empty string
+        to specify using the default table access method of the current database.
+        If the value does not match the name of any existing table access method,
+        <productname>PostgreSQL</productname> will automatically use the default
+        table access method of the current database.
+       </para>
+
+      </listitem>
+     </varlistentry>
+     
      <varlistentry id="guc-default-tablespace" xreflabel="default_tablespace">
       <term><varname>default_tablespace</varname> (<type>string</type>)
       <indexterm>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 48ac14a838..99a6496502 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -90,7 +90,7 @@
 <!ENTITY gin        SYSTEM "gin.sgml">
 <!ENTITY brin       SYSTEM "brin.sgml">
 <!ENTITY planstats    SYSTEM "planstats.sgml">
-<!ENTITY indexam    SYSTEM "indexam.sgml">
+<!ENTITY am         SYSTEM "am.sgml">
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 0070603fc3..3e66ae9c8a 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -251,7 +251,7 @@
   &tablesample-method;
   &custom-scan;
   &geqo;
-  &indexam;
+  &am;
   &generic-wal;
   &btree;
   &gist;
diff --git a/doc/src/sgml/ref/create_access_method.sgml b/doc/src/sgml/ref/create_access_method.sgml
index 851c5e63be..256914022a 100644
--- a/doc/src/sgml/ref/create_access_method.sgml
+++ b/doc/src/sgml/ref/create_access_method.sgml
@@ -61,7 +61,8 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
     <listitem>
      <para>
       This clause specifies the type of access method to define.
-      Only <literal>INDEX</literal> is supported at present.
+      Only <literal>INDEX</literal> and <literal>TABLE</literal>
+      are supported at present.
      </para>
     </listitem>
    </varlistentry>
@@ -76,9 +77,12 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
       declared to take a single argument of type <type>internal</type>,
       and its return type depends on the type of access method;
       for <literal>INDEX</literal> access methods, it must
-      be <type>index_am_handler</type>.  The C-level API that the handler
-      function must implement varies depending on the type of access method.
-      The index access method API is described in <xref linkend="indexam"/>.
+      be <type>index_am_handler</type> and for <literal>TABLE</literal>
+      access methods, it must be <type>table_am_handler</type>.
+      The C-level API that the handler function must implement varies
+      depending on the type of access method. The index access method API
+      is described in <xref linkend="index-access-methods"/> and the table access method
+      API is described in <xref linkend="table-access-methods"/>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 10428f8ff0..87e0f01ab2 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -29,6 +29,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 ] )
 [ INHERITS ( <replaceable>parent_table</replaceable> [, ... ] ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -40,6 +41,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -51,6 +53,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ] { FOR VALUES <replaceable class="parameter">partition_bound_spec</replaceable> | DEFAULT }
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -955,7 +958,7 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
 
      <para>
       The access method must support <literal>amgettuple</literal> (see <xref
-      linkend="indexam"/>); at present this means <acronym>GIN</acronym>
+      linkend="index-access-methods"/>); at present this means <acronym>GIN</acronym>
       cannot be used.  Although it's allowed, there is little point in using
       B-tree or hash indexes with an exclusion constraint, because this
       does nothing that an ordinary unique constraint doesn't do better.
@@ -1138,6 +1141,19 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies an optional access method for the new table;
+      see <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, the default table access method is
+      chosen for the new table; see <xref linkend="guc-default-table-access-method"/>
+      for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 527138e787..2acf52d2f5 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
+    [ USING <replaceable class="parameter">method</replaceable> ]
     [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
     [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -120,6 +121,19 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies an optional access method for the new table;
+      see <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, the default table access method is
+      chosen for the new table; see <xref linkend="guc-default-table-access-method"/>
+      for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
diff --git a/doc/src/sgml/release-9.6.sgml b/doc/src/sgml/release-9.6.sgml
index acb6a88b31..68c79db4b5 100644
--- a/doc/src/sgml/release-9.6.sgml
+++ b/doc/src/sgml/release-9.6.sgml
@@ -10081,7 +10081,7 @@ This commit is also listed under libpq and PL/pgSQL
 2016-08-13 [ed0097e4f] Add SQL-accessible functions for inspecting index AM pro
 -->
        <para>
-        Restructure <link linkend="indexam">index access
+        Restructure <link linkend="index-access-methods">index access
         method <acronym>API</acronym></link> to hide most of it at
         the <application>C</application> level (Alexander Korotkov, Andrew Gierth)
        </para>
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index 9446f8b836..4fa821160c 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -36,7 +36,7 @@
    described in <classname>pg_am</classname>.  It is possible to add a
    new index access method by writing the necessary code and
    then creating an entry in <classname>pg_am</classname> &mdash; but that is
-   beyond the scope of this chapter (see <xref linkend="indexam"/>).
+   beyond the scope of this chapter (see <xref linkend="am"/>).
   </para>
 
   <para>
-- 
2.18.0.windows.1

0002-New-API-s-are-added.patch (application/octet-stream)
From 826223860a977394ed2ebc1f07c1533b0f240e9c Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 17:58:27 +1100
Subject: [PATCH 2/3] New APIs are added

1. init fork API
2. set new filenode
3. estimate rel size

Set new filenode and estimate rel size are added as function hooks;
they are not compulsory for heap. When they exist, they take over
control.
---
 src/backend/access/heap/heapam_handler.c | 27 +++++++++++++++++++++++-
 src/backend/catalog/heap.c               | 24 ++-------------------
 src/backend/commands/tablecmds.c         |  4 ++--
 src/backend/optimizer/util/plancat.c     | 12 +++++++++++
 src/backend/utils/cache/relcache.c       | 12 +++++++++++
 src/include/access/tableam.h             | 16 ++++++++++++++
 src/include/catalog/heap.h               |  2 --
 7 files changed, 70 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3254e30a45..ae832e1f71 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -30,6 +30,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/pg_am_d.h"
+#include "catalog/storage_xlog.h"
 #include "executor/executor.h"
 #include "pgstat.h"
 #include "storage/lmgr.h"
@@ -2118,6 +2119,28 @@ heap_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
 	pfree(isnull);
 }
 
+/*
+ * Set up an init fork for an unlogged table so that it can be correctly
+ * reinitialized on restart.  An immediate sync is required even if the
+ * page has been logged, because the write did not go through
+ * shared_buffers and therefore a concurrent checkpoint may have moved
+ * the redo pointer past our xlog record.  Recovery may as well remove it
+ * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
+ * record. Therefore, logging is necessary even if wal_level=minimal.
+ */
+static void
+heap_create_init_fork(Relation rel)
+{
+	Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
+		   rel->rd_rel->relkind == RELKIND_MATVIEW ||
+		   rel->rd_rel->relkind == RELKIND_TOASTVALUE);
+	RelationOpenSmgr(rel);
+	smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
+	log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
+	smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
+}
+
+
 static const TableAmRoutine heapam_methods = {
 	.type = T_TableAmRoutine,
 
@@ -2165,7 +2188,9 @@ static const TableAmRoutine heapam_methods = {
 
 	.index_build_range_scan = IndexBuildHeapRangeScan,
 
-	.index_validate_scan = validate_index_heapscan
+	.index_validate_scan = validate_index_heapscan,
+
+	.CreateInitFork = heap_create_init_fork
 };
 
 const TableAmRoutine *
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 398f90775f..840319668a 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -32,6 +32,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/sysattr.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
@@ -1442,7 +1443,7 @@ heap_create_with_catalog(const char *relname,
 	 */
 	if (relpersistence == RELPERSISTENCE_UNLOGGED &&
 		relkind != RELKIND_PARTITIONED_TABLE)
-		heap_create_init_fork(new_rel_desc);
+		table_create_init_fork(new_rel_desc);
 
 	/*
 	 * ok, the relation has been cataloged, so close our relations and return
@@ -1454,27 +1455,6 @@ heap_create_with_catalog(const char *relname,
 	return relid;
 }
 
-/*
- * Set up an init fork for an unlogged table so that it can be correctly
- * reinitialized on restart.  An immediate sync is required even if the
- * page has been logged, because the write did not go through
- * shared_buffers and therefore a concurrent checkpoint may have moved
- * the redo pointer past our xlog record.  Recovery may as well remove it
- * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
- * record. Therefore, logging is necessary even if wal_level=minimal.
- */
-void
-heap_create_init_fork(Relation rel)
-{
-	Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
-		   rel->rd_rel->relkind == RELKIND_MATVIEW ||
-		   rel->rd_rel->relkind == RELKIND_TOASTVALUE);
-	RelationOpenSmgr(rel);
-	smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
-	log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
-	smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
-}
-
 /*
  *		RelationRemoveInheritance
  *
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f3526b267d..3c46a48882 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1649,7 +1649,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 			RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
 									  RecentXmin, minmulti);
 			if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
-				heap_create_init_fork(rel);
+				table_create_init_fork(rel);
 
 			heap_relid = RelationGetRelid(rel);
 			toast_relid = rel->rd_rel->reltoastrelid;
@@ -1663,7 +1663,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 				RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
 										  RecentXmin, minmulti);
 				if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
-					heap_create_init_fork(rel);
+					table_create_init_fork(rel);
 				heap_close(rel, NoLock);
 			}
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8da468a86f..3355f8bff4 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -947,6 +947,18 @@ estimate_rel_size(Relation rel, int32 *attr_widths,
 	BlockNumber relallvisible;
 	double		density;
 
+	/*
+	 * If the relation's access method provides a specific EstimateRelSize
+	 * function, use it instead of the regular default heap method.
+	 */
+	if (rel->rd_tableamroutine &&
+			rel->rd_tableamroutine->EstimateRelSize)
+	{
+		rel->rd_tableamroutine->EstimateRelSize(rel, attr_widths, pages,
+												tuples, allvisfrac);
+		return;
+	}
+
 	switch (rel->rd_rel->relkind)
 	{
 		case RELKIND_RELATION:
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 0d6e5a189f..9cc8e98e40 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3424,6 +3424,18 @@ RelationSetNewRelfilenode(Relation relation, char persistence,
 	HeapTuple	tuple;
 	Form_pg_class classform;
 
+	/*
+	 * If the relation's access method provides a specific SetNewFileNode
+	 * function, use it instead of the regular default heap method.
+	 */
+	if (relation->rd_tableamroutine &&
+			relation->rd_tableamroutine->SetNewFileNode)
+	{
+		relation->rd_tableamroutine->SetNewFileNode(relation, persistence,
+													freezeXid, minmulti);
+		return;
+	}
+
 	/* Indexes, sequences must have Invalid frozenxid; other rels must not */
 	Assert((relation->rd_rel->relkind == RELKIND_INDEX ||
 			relation->rd_rel->relkind == RELKIND_SEQUENCE) ?
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7fe6ff6c22..eb7c9b8007 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -194,6 +194,12 @@ struct SampleScanState;
 typedef bool (*SampleScanNextBlock_function)(TableScanDesc scan, struct SampleScanState *scanstate);
 typedef bool (*SampleScanNextTuple_function)(TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
 
+typedef void (*CreateInitFork_function)(Relation rel);
+typedef void (*EstimateRelSize_function)(Relation rel, int32 *attr_widths,
+				  BlockNumber *pages, double *tuples, double *allvisfrac);
+typedef void (*SetNewFileNode_function)(Relation relation, char persistence,
+										TransactionId freezeXid, MultiXactId minmulti);
+
 /*
  * API struct for a table AM.  Note this must be allocated in a
  * server-lifetime manner, typically as a static const struct.
@@ -250,6 +256,10 @@ typedef struct TableAmRoutine
 
 	IndexBuildRangeScan_function index_build_range_scan;
 	IndexValidateScan_function index_validate_scan;
+
+	CreateInitFork_function CreateInitFork;
+	EstimateRelSize_function EstimateRelSize;
+	SetNewFileNode_function	SetNewFileNode;
 }			TableAmRoutine;
 
 static inline const TupleTableSlotOps*
@@ -741,6 +751,12 @@ table_index_build_range_scan(Relation heapRelation,
 		scan);
 }
 
+static inline void
+table_create_init_fork(Relation relation)
+{
+	relation->rd_tableamroutine->CreateInitFork(relation);
+}
+
 extern BlockNumber table_parallelscan_nextpage(TableScanDesc scan);
 extern void table_parallelscan_startblock_init(TableScanDesc scan);
 extern Size table_parallelscan_estimate(Snapshot snapshot);
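
Since the two new hooks are optional, the callers above open-code the NULL
test. A tableam.h-style wrapper could hide that; a sketch (the wrapper name
is an assumption, as the patch itself keeps the test at the call sites):

    /* returns true if the AM took over the estimation */
    static inline bool
    table_estimate_rel_size(Relation rel, int32 *attr_widths, BlockNumber *pages,
                            double *tuples, double *allvisfrac)
    {
        if (rel->rd_tableamroutine && rel->rd_tableamroutine->EstimateRelSize)
        {
            rel->rd_tableamroutine->EstimateRelSize(rel, attr_widths, pages,
                                                    tuples, allvisfrac);
            return true;
        }
        return false;		/* caller falls back to the generic heap logic */
    }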
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 4584b3473c..c0e706ecc9 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -77,8 +77,6 @@ extern Oid heap_create_with_catalog(const char *relname,
 						 Oid relrewrite,
 						 ObjectAddress *typaddress);
 
-extern void heap_create_init_fork(Relation rel);
-
 extern void heap_drop_with_catalog(Oid relid);
 
 extern void heap_truncate(List *relids);
-- 
2.18.0.windows.1

0001-Further-fixes-and-cleanup.patch (application/octet-stream)
From 2982d89e825c334d07aa14e8e5038ea02034e581 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 16:06:10 +1100
Subject: [PATCH 1/3] Further fixes and cleanup

1. Remove the old slot interface file and also update the Makefile.
2. CREATE AS ... USING method grammar support
3. Materialized view grammar support

This change was missed during the earlier USING grammar support.

4. Remove the extra tuple visibility check

In heapgettup_pagemode a tuple visibility check was added during the
early development of pluggable storage, but the visibility check is
already carried out in heapgetpage itself.

5. Handle the HeapTupleInvisible case

In update/delete scenarios, when the tuple is concurrently
updated/deleted, locking the tuple may sometimes return
HeapTupleInvisible. Handle that case as nothing to do.

Regression fixes:

1. scan start offset fix during analyze
2. Materialize the slot before it is processed in intorel_receive
3. ROW_MARK_COPY support by force-storing the heap tuple
4. partition prune extra heap page fix
5. Basic syntax usage tests addition
---
 contrib/pg_visibility/pg_visibility.c       |  5 +-
 src/backend/access/heap/heapam.c            | 28 +++-----
 src/backend/access/heap/heapam_handler.c    |  6 +-
 src/backend/access/heap/heapam_visibility.c |  4 +-
 src/backend/access/index/genam.c            |  3 +-
 src/backend/access/table/Makefile           |  2 +-
 src/backend/access/table/tableam_common.c   |  0
 src/backend/catalog/heap.c                  | 17 +++++
 src/backend/commands/cluster.c              |  6 +-
 src/backend/commands/createas.c             |  5 ++
 src/backend/executor/execExprInterp.c       | 16 ++---
 src/backend/executor/execMain.c             | 13 +---
 src/backend/executor/execReplication.c      |  3 -
 src/backend/executor/execTuples.c           | 21 ++++++
 src/backend/executor/nodeBitmapHeapscan.c   | 12 ++++
 src/backend/executor/nodeModifyTable.c      | 22 +++++-
 src/backend/parser/gram.y                   | 18 ++---
 src/include/executor/tuptable.h             |  1 +
 src/include/nodes/primnodes.h               |  1 +
 src/test/regress/expected/create_am.out     | 78 +++++++++++++++++++++
 src/test/regress/sql/create_am.sql          | 46 ++++++++++++
 21 files changed, 243 insertions(+), 64 deletions(-)
 delete mode 100644 src/backend/access/table/tableam_common.c

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index dce5262e34..88ca4fd2af 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -563,12 +563,13 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 
 	rel = relation_open(relid, AccessShareLock);
 
+	/* Only some relkinds have a visibility map */
+	check_relation_relkind(rel);
+
 	if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
 		ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 						errmsg("only heap AM is supported")));
 
-	/* Only some relkinds have a visibility map */
-	check_relation_relkind(rel);
 
 	nblocks = RelationGetNumberOfBlocks(rel);
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ec99d0bcae..ef6b4c3e54 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -947,31 +947,25 @@ heapgettup_pagemode(HeapScanDesc scan,
 			/*
 			 * if current tuple qualifies, return it.
 			 */
-			if (HeapTupleSatisfies(tuple, scan->rs_scan.rs_snapshot, scan->rs_cbuf))
+			if (key != NULL)
 			{
-				/*
-				 * if current tuple qualifies, return it.
-				 */
-				if (key != NULL)
-				{
-					bool		valid;
+				bool		valid;
 
-					HeapKeyTest(tuple, RelationGetDescr(scan->rs_scan.rs_rd),
-								nkeys, key, valid);
-					if (valid)
-					{
-						scan->rs_cindex = lineindex;
-						LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-						return;
-					}
-				}
-				else
+				HeapKeyTest(tuple, RelationGetDescr(scan->rs_scan.rs_rd),
+							nkeys, key, valid);
+				if (valid)
 				{
 					scan->rs_cindex = lineindex;
 					LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 					return;
 				}
 			}
+			else
+			{
+				scan->rs_cindex = lineindex;
+				LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+				return;
+			}
 
 			/*
 			 * otherwise move to the next item on the page
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91f..3254e30a45 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1741,7 +1741,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
 	Page		targpage;
-	OffsetNumber targoffset = scan->rs_cindex;
+	OffsetNumber targoffset;
 	OffsetNumber maxoffset;
 	BufferHeapTupleTableSlot *hslot;
 
@@ -1751,7 +1751,9 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 	maxoffset = PageGetMaxOffsetNumber(targpage);
 
 	/* Inner loop over all tuples on the selected page */
-	for (targoffset = scan->rs_cindex; targoffset <= maxoffset; targoffset++)
+	for (targoffset = scan->rs_cindex ? scan->rs_cindex : FirstOffsetNumber;
+			targoffset <= maxoffset;
+			targoffset++)
 	{
 		ItemId		itemid;
 		HeapTuple	targtuple = &hslot->base.tupdata;
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 8233475aa0..7bad246f55 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1838,8 +1838,10 @@ HeapTupleSatisfies(HeapTuple stup, Snapshot snapshot, Buffer buffer)
 		case NON_VACUUMABLE_VISIBILTY:
 			return HeapTupleSatisfiesNonVacuumable(stup, snapshot, buffer);
 			break;
-		default:
+		case END_OF_VISIBILITY:
 			Assert(0);
 			break;
 	}
+
+	return false; /* keep compiler quiet */
 }
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index e06bd0479f..94c9702dc1 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,10 +455,9 @@ systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
 
 	if (sysscan->irel)
 	{
-		IndexScanDesc scan = sysscan->iscan;
 		IndexFetchHeapData *hscan = (IndexFetchHeapData *) sysscan->iscan->xs_heapfetch;
 
-		Assert(IsMVCCSnapshot(scan->xs_snapshot));
+		Assert(IsMVCCSnapshot(sysscan->iscan->xs_snapshot));
 		//Assert(tup == &hscan->xs_ctup); replace by peeking into slot?
 		Assert(BufferIsValid(hscan->xs_cbuf));
 		/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
diff --git a/src/backend/access/table/Makefile b/src/backend/access/table/Makefile
index fe22bf9208..006ba99182 100644
--- a/src/backend/access/table/Makefile
+++ b/src/backend/access/table/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/access/table
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = tableam.o tableamapi.o tableam_common.o
+OBJS = tableam.o tableamapi.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/table/tableam_common.c b/src/backend/access/table/tableam_common.c
deleted file mode 100644
index e69de29bb2..0000000000
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 38b368f916..398f90775f 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -42,6 +42,7 @@
 #include "catalog/index.h"
 #include "catalog/objectaccess.h"
 #include "catalog/partition.h"
+#include "catalog/pg_am.h"
 #include "catalog/pg_attrdef.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_constraint.h"
@@ -1388,6 +1389,22 @@ heap_create_with_catalog(const char *relname,
 			recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
 		}
 
+		/*
+		 * Make a dependency link to force the relation to be deleted if its
+		 * access method is. Do this only for relations and materialized views.
+		 *
+		 * No need to add an explicit dependency with toast, as the original
+		 * table depends on it.
+		 */
+		if ((relkind == RELKIND_RELATION) ||
+				(relkind == RELKIND_MATVIEW))
+		{
+			referenced.classId = AccessMethodRelationId;
+			referenced.objectId = accessmtd;
+			referenced.objectSubId = 0;
+			recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+		}
+
 		if (relacl != NULL)
 		{
 			int			nnewmembers;
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 63974979da..3ce8862a01 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -755,8 +755,6 @@ copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	Relation	relRelation;
 	HeapTuple	reltup;
 	Form_pg_class relform;
-	TupleDesc	oldTupDesc;
-	TupleDesc	newTupDesc;
 	TransactionId OldestXmin;
 	TransactionId FreezeXid;
 	MultiXactId MultiXactCutoff;
@@ -784,9 +782,7 @@ copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	 * Their tuple descriptors should be exactly alike, but here we only need
 	 * assume that they have the same number of columns.
 	 */
-	oldTupDesc = RelationGetDescr(OldHeap);
-	newTupDesc = RelationGetDescr(NewHeap);
-	Assert(newTupDesc->natts == oldTupDesc->natts);
+	Assert(RelationGetDescr(NewHeap)->natts == RelationGetDescr(OldHeap)->natts);
 
 	/*
 	 * If the OldHeap has a toast table, get lock on the toast table to keep
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 84de804175..82c0eb2824 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -108,6 +108,7 @@ create_ctas_internal(List *attrList, IntoClause *into)
 	create->oncommit = into->onCommit;
 	create->tablespacename = into->tableSpaceName;
 	create->if_not_exists = false;
+	create->accessMethod = into->accessMethod;
 
 	// PBORKED: toast options
 
@@ -593,6 +594,10 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 	if (myState->rel->rd_rel->relhasoids)
 		slot->tts_tupleOid = InvalidOid;
 
+	/* Materialize the slot */
+	if (!TTS_IS_VIRTUAL(slot))
+		ExecMaterializeSlot(slot);
+
 	table_insert(myState->rel,
 				 slot,
 				 myState->output_cid,
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index ef94ac4aa0..8df85c2f48 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -529,20 +529,13 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 			Assert(TTS_IS_HEAPTUPLE(outerslot) ||
 				   TTS_IS_BUFFERTUPLE(outerslot));
 
-			/* The slot should have a valid heap tuple. */
-#if FIXME
-			/* The slot should have a valid heap tuple. */
-			Assert(hslot->tuple != NULL);
-#endif
-
-			/*
-			 * hari
-			 * Assert(outerslot->tts_storageslotam->slot_is_physical_tuple(outerslot));
-			 */
 			if (attnum == TableOidAttributeNumber)
 				d = ObjectIdGetDatum(outerslot->tts_tableOid);
 			else
 			{
+				/* The slot should have a valid heap tuple. */
+				Assert(hslot->tuple != NULL);
+
 				/* heap_getsysattr has sufficient defenses against bad attnums */
 				d = heap_getsysattr(hslot->tuple, attnum,
 									outerslot->tts_tupleDescriptor,
@@ -570,6 +563,9 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 				Assert(TTS_IS_HEAPTUPLE(scanslot) ||
 					   TTS_IS_BUFFERTUPLE(scanslot));
 
+				if (hslot->tuple == NULL)
+					ExecMaterializeSlot(scanslot);
+
 				d = heap_getsysattr(hslot->tuple, attnum,
 									scanslot->tts_tupleDescriptor,
 									op->resnull);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e055c0a7c6..34ef86a5bd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2594,7 +2594,7 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
 	 * datums that may be present in copyTuple).  As with the next step, this
 	 * is to guard against early re-use of the EPQ query.
 	 */
-	if (!TupIsNull(slot))
+	if (!TupIsNull(slot) && !TTS_IS_VIRTUAL(slot))
 		ExecMaterializeSlot(slot);
 
 #if FIXME
@@ -2787,16 +2787,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 			if (isNull)
 				continue;
 
-			elog(ERROR, "frak, need to implement ROW_MARK_COPY");
-#ifdef FIXME
-			// FIXME: this should just deform the tuple and store it as a
-			// virtual one.
-			tuple = table_tuple_by_datum(erm->relation, datum, erm->relid);
-
-			/* store tuple */
-			EvalPlanQualSetTuple(epqstate, erm->rti, tuple);
-#endif
-
+			ExecForceStoreHeapTupleDatum(datum, slot);
 		}
 	}
 }
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 452973e4ca..489e7d42a2 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -239,9 +239,6 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 	SnapshotData snap;
 	TransactionId xwait;
 	bool		found;
-	TupleDesc	desc = RelationGetDescr(rel);
-
-	Assert(equalTupleDescs(desc, outslot->tts_tupleDescriptor));
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 917bf80f71..74149cc3ad 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -1364,6 +1364,27 @@ ExecStoreAllNullTuple(TupleTableSlot *slot)
 	return ExecFinishStoreSlotValues(slot);
 }
 
+void
+ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot)
+{
+	HeapTuple	tuple;
+	HeapTupleHeader td;
+
+	td = DatumGetHeapTupleHeader(data);
+
+	tuple = (HeapTuple) palloc(HEAPTUPLESIZE + HeapTupleHeaderGetDatumLength(td));
+	tuple->t_len = HeapTupleHeaderGetDatumLength(td);
+	tuple->t_self = td->t_ctid;
+	tuple->t_data = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE);
+	memcpy((char *) tuple->t_data, (char *) td, tuple->t_len);
+
+	ExecClearTuple(slot);
+
+	heap_deform_tuple(tuple, slot->tts_tupleDescriptor,
+					  slot->tts_values, slot->tts_isnull);
+	ExecFinishStoreSlotValues(slot);
+}
+
 /* --------------------------------
  *		ExecFetchSlotTuple
  *			Fetch the slot's regular physical tuple.
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 56880e3d16..36ca07beb2 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -224,6 +224,18 @@ BitmapHeapNext(BitmapHeapScanState *node)
 
 			BitmapAdjustPrefetchIterator(node, tbmres);
 
+			/*
+			 * Ignore any claimed entries past what we think is the end of the
+			 * relation.  (This is probably not necessary given that we got at
+			 * least AccessShareLock on the table before performing any of the
+			 * indexscans, but let's be safe.)
+			 */
+			if (tbmres->blockno >= scan->rs_nblocks)
+			{
+				node->tbmres = tbmres = NULL;
+				continue;
+			}
+
 			/*
 			 * We can skip fetching the heap page if we don't need any fields
 			 * from the heap, and the bitmap entries don't need rechecking,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 3cc9092413..b3851b180d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -607,7 +607,7 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool canSetTag,
 		   bool changingPart,
 		   bool *tupleDeleted,
-		   TupleTableSlot **epqslot)
+		   TupleTableSlot **epqreturnslot)
 {
 	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
@@ -632,7 +632,7 @@ ExecDelete(ModifyTableState *mtstate,
 		bool		dodelete;
 
 		dodelete = ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
-										tupleid, oldtuple, epqslot);
+										tupleid, oldtuple, epqreturnslot);
 
 		if (!dodelete)			/* "do nothing" */
 			return NULL;
@@ -724,8 +724,21 @@ ldelete:;
 					/* Tuple no more passing quals, exiting... */
 					return NULL;
 				}
+
+				/* If requested, skip the delete and pass back the updated row. */
+				if (epqreturnslot)
+				{
+					*epqreturnslot = epqslot;
+					return NULL;
+				}
+
 				goto ldelete;
 			}
+			else if (result == HeapTupleInvisible)
+			{
+				/* tuple is not visible; nothing to do */
+				return NULL;
+			}
 		}
 
 		switch (result)
@@ -1196,6 +1209,11 @@ lreplace:;
 				slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
 				goto lreplace;
 			}
+			else if (result == HeapTupleInvisible)
+			{
+				/* tuple is not visible; nothing to do */
+				return NULL;
+			}
 		}
 
 		switch (result)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 54382aba88..ea48e1d6e8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4037,7 +4037,6 @@ CreateStatsStmt:
  *
  *****************************************************************************/
 
-// PBORKED: storage option
 CreateAsStmt:
 		CREATE OptTemp TABLE create_as_target AS SelectStmt opt_with_data
 				{
@@ -4068,14 +4067,16 @@ CreateAsStmt:
 		;
 
 create_as_target:
-			qualified_name opt_column_list OptWith OnCommitOption OptTableSpace
+			qualified_name opt_column_list table_access_method_clause
+			OptWith OnCommitOption OptTableSpace
 				{
 					$$ = makeNode(IntoClause);
 					$$->rel = $1;
 					$$->colNames = $2;
-					$$->options = $3;
-					$$->onCommit = $4;
-					$$->tableSpaceName = $5;
+					$$->accessMethod = $3;
+					$$->options = $4;
+					$$->onCommit = $5;
+					$$->tableSpaceName = $6;
 					$$->viewQuery = NULL;
 					$$->skipData = false;		/* might get changed later */
 				}
@@ -4125,14 +4126,15 @@ CreateMatViewStmt:
 		;
 
 create_mv_target:
-			qualified_name opt_column_list opt_reloptions OptTableSpace
+			qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
 				{
 					$$ = makeNode(IntoClause);
 					$$->rel = $1;
 					$$->colNames = $2;
-					$$->options = $3;
+					$$->accessMethod = $3;
+					$$->options = $4;
 					$$->onCommit = ONCOMMIT_NOOP;
-					$$->tableSpaceName = $4;
+					$$->tableSpaceName = $5;
 					$$->viewQuery = NULL;		/* filled at analysis time */
 					$$->skipData = false;		/* might get changed later */
 				}
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index 05f38cfd0d..20fc425a27 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -476,6 +476,7 @@ extern TupleTableSlot *ExecCopySlot(TupleTableSlot *dstslot,
 
 extern void ExecForceStoreHeapTuple(HeapTuple tuple,
 			   TupleTableSlot *slot);
+extern void ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot);
 
 extern void slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum);
 extern Datum slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull);
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 40f6eb03d2..4d194a8c2a 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -111,6 +111,7 @@ typedef struct IntoClause
 
 	RangeVar   *rel;			/* target relation name */
 	List	   *colNames;		/* column names to assign, or NIL */
+	char	   *accessMethod;	/* table access method */
 	List	   *options;		/* options from WITH clause */
 	OnCommitAction onCommit;	/* what do we do at COMMIT? */
 	char	   *tableSpaceName; /* table space to use, or NULL */
diff --git a/src/test/regress/expected/create_am.out b/src/test/regress/expected/create_am.out
index 47dd885c4e..a4094ca3f1 100644
--- a/src/test/regress/expected/create_am.out
+++ b/src/test/regress/expected/create_am.out
@@ -99,3 +99,81 @@ HINT:  Use DROP ... CASCADE to drop the dependent objects too.
 -- Drop access method cascade
 DROP ACCESS METHOD gist2 CASCADE;
 NOTICE:  drop cascades to index grect2ind2
+-- Create a heap2 table AM using the heapam handler
+CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;
+SELECT * FROM pg_am where amtype = 't';
+ amname |      amhandler       | amtype 
+--------+----------------------+--------
+ heap   | heap_tableam_handler | t
+ heap2  | heap_tableam_handler | t
+(2 rows)
+
+CREATE TABLE tbl_heap2(f1 int, f2 char(100)) using heap2;
+INSERT INTO tbl_heap2 VALUES(generate_series(1,10), 'Test series');
+SELECT count(*) FROM tbl_heap2;
+ count 
+-------
+    10
+(1 row)
+
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+		where a.oid = r.relam AND r.relname = 'tbl_heap2';
+  relname  | relkind | amname 
+-----------+---------+--------
+ tbl_heap2 | r       | heap2
+(1 row)
+
+-- create table as using heap2
+CREATE TABLE tblas_heap2 using heap2 AS select * from tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+		where a.oid = r.relam AND r.relname = 'tblas_heap2';
+   relname   | relkind | amname 
+-------------+---------+--------
+ tblas_heap2 | r       | heap2
+(1 row)
+
+--
+-- select into doesn't support the new syntax, so it should use the
+-- default access method.
+--
+SELECT INTO tblselectinto_heap from tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+		where a.oid = r.relam AND r.relname = 'tblselectinto_heap';
+      relname       | relkind | amname 
+--------------------+---------+--------
+ tblselectinto_heap | r       | heap
+(1 row)
+
+DROP TABLE tblselectinto_heap;
+-- create materialized view using heap2
+CREATE MATERIALIZED VIEW mv_heap2 USING heap2 AS
+		SELECT * FROM tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+		where a.oid = r.relam AND r.relname = 'mv_heap2';
+ relname  | relkind | amname 
+----------+---------+--------
+ mv_heap2 | m       | heap2
+(1 row)
+
+-- Try creating the unsupported relation kinds with using syntax
+CREATE VIEW test_view USING heap2 AS SELECT * FROM tbl_heap2;
+ERROR:  syntax error at or near "USING"
+LINE 1: CREATE VIEW test_view USING heap2 AS SELECT * FROM tbl_heap2...
+                              ^
+CREATE SEQUENCE test_seq USING heap2;
+ERROR:  syntax error at or near "USING"
+LINE 1: CREATE SEQUENCE test_seq USING heap2;
+                                 ^
+-- Drop table access method, but it fails as objects depend on it
+DROP ACCESS METHOD heap2;
+ERROR:  cannot drop access method heap2 because other objects depend on it
+DETAIL:  table tbl_heap2 depends on access method heap2
+table tblas_heap2 depends on access method heap2
+materialized view mv_heap2 depends on access method heap2
+HINT:  Use DROP ... CASCADE to drop the dependent objects too.
+-- Drop table access method with cascade
+DROP ACCESS METHOD heap2 CASCADE;
+NOTICE:  drop cascades to 3 other objects
+DETAIL:  drop cascades to table tbl_heap2
+drop cascades to table tblas_heap2
+drop cascades to materialized view mv_heap2
diff --git a/src/test/regress/sql/create_am.sql b/src/test/regress/sql/create_am.sql
index 3e0ac104f3..0472a60f20 100644
--- a/src/test/regress/sql/create_am.sql
+++ b/src/test/regress/sql/create_am.sql
@@ -66,3 +66,49 @@ DROP ACCESS METHOD gist2;
 
 -- Drop access method cascade
 DROP ACCESS METHOD gist2 CASCADE;
+
+-- Create a heap2 table AM using the heapam handler
+CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;
+
+SELECT * FROM pg_am where amtype = 't';
+
+CREATE TABLE tbl_heap2(f1 int, f2 char(100)) using heap2;
+INSERT INTO tbl_heap2 VALUES(generate_series(1,10), 'Test series');
+SELECT count(*) FROM tbl_heap2;
+
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+		where a.oid = r.relam AND r.relname = 'tbl_heap2';
+
+-- create table as using heap2
+CREATE TABLE tblas_heap2 using heap2 AS select * from tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+		where a.oid = r.relam AND r.relname = 'tblas_heap2';
+
+--
+-- select into doesn't support the new syntax, so it should use the
+-- default access method.
+--
+SELECT INTO tblselectinto_heap from tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+		where a.oid = r.relam AND r.relname = 'tblselectinto_heap';
+
+DROP TABLE tblselectinto_heap;
+
+-- create materialized view using heap2
+CREATE MATERIALIZED VIEW mv_heap2 USING heap2 AS
+		SELECT * FROM tbl_heap2;
+
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+		where a.oid = r.relam AND r.relname = 'mv_heap2';
+
+-- Try creating the unsupported relation kinds with using syntax
+CREATE VIEW test_view USING heap2 AS SELECT * FROM tbl_heap2;
+
+CREATE SEQUENCE test_seq USING heap2;
+
+
+-- Drop table access method, but it fails as objects depend on it
+DROP ACCESS METHOD heap2;
+
+-- Drop table access method with cascade
+DROP ACCESS METHOD heap2 CASCADE;
-- 
2.18.0.windows.1

#48Asim R P
apraveen@pivotal.io
In reply to: Haribabu Kommi (#47)
Re: Pluggable Storage - Andres's take

Ashwin (copied) and I got a chance to go through the latest code from
Andres' github repository. We would like to share some
comments/questions:

The TupleTableSlot argument is well suited for row-oriented storage.
For a column-oriented storage engine, a projection list indicating the
columns to be scanned may be necessary. Is it possible to share this
information with the current interface?
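
For illustration, something along these lines might be enough (a purely
hypothetical sketch; the callback name and the Bitmapset argument are
ours, not part of the patch):

TableScanDesc
scan_begin_with_projection(Relation relation,
						   Snapshot snapshot,
						   Bitmapset *proj_attnums);	/* 1-based attnos to
														 * fetch; NULL means
														 * all columns */

A row-oriented AM could simply ignore the projection list, while a
columnar AM would use it to open only the needed column files.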

We realized that DDLs such as heap_create_with_catalog() are not
generalized. Haribabu's latest patch that adds
SetNewFileNode_function() and CreateInitFork_function() is a step
towards this end. However, the current API assumes that the storage
engine uses relation forks. Isn't that too restrictive?

TupleDelete_function() accepts changingPart as a parameter to indicate
if this deletion is part of a movement from one partition to another.
Partitioning is a higher level abstraction as compared to storage.
Ideally, the storage layer should have no knowledge of partitioning. The
tuple delete API should not accept any parameter related to
partitioning.

The API needs to be more accommodating towards block sizes used in
storage engines. Currently, the same block size as heap seems to be
assumed, as is evident from the types of some members of the generic scan
object:

typedef struct TableScanDescData
{
	/* state set up at initscan time */
	BlockNumber rs_nblocks;		/* total number of blocks in rel */
	BlockNumber rs_startblock;	/* block # to start at */
	BlockNumber rs_numblocks;	/* max number of blocks to scan */
	/* rs_numblocks is usually InvalidBlockNumber, meaning "scan whole rel" */
	bool		rs_syncscan;	/* report location to syncscan logic? */
} TableScanDescData;

Using bytes to represent this information would be more generic. E.g.
rs_startlocation as bytes/offset instead of rs_startblock and so on.
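
For example, a byte-addressed variant could look like this (a purely
hypothetical sketch; the field names are ours, not from the patch):

typedef struct TableScanDescData
{
	/* state set up at initscan time */
	uint64		rs_totalsize;		/* total size of rel, in bytes */
	uint64		rs_startlocation;	/* byte offset to start at */
	uint64		rs_numbytes;		/* max number of bytes to scan */
	bool		rs_syncscan;		/* report location to syncscan logic? */
} TableScanDescData;

A block-based AM would then just interpret these in multiples of its own
block size.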

Asim

#49Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Asim R P (#48)
Re: Pluggable Storage - Andres's take

On Thu, Nov 22, 2018 at 1:12 PM Asim R P <apraveen@pivotal.io> wrote:

Ashwin (copied) and I got a chance to go through the latest code from
Andres' github repository. We would like to share some
comments/questions:

Thanks for the review.

The TupleTableSlot argument is well suited for row-oriented storage.
For a column-oriented storage engine, a projection list indicating the
columns to be scanned may be necessary. Is it possible to share this
information with the current interface?

Currently all the interfaces are designed for row-oriented storage; as you
said, we need a new API for the projection list. The current patch set
itself is big and needs to be stabilized first, and then the new APIs that
are useful for columnar storage will be added in the next set of patches.

We realized that DDLs such as heap_create_with_catalog() are not
generalized. Haribabu's latest patch that adds
SetNewFileNode_function() and CreateInitFork_function() is a step
towards this end. However, the current API assumes that the storage
engine uses relation forks. Isn't that too restrictive?

The current set of APIs has many assumptions and uses the existing
framework. Thanks for your point; I will check how to enhance it.

TupleDelete_function() accepts changingPart as a parameter to indicate
if this deletion is part of a movement from one partition to another.
Partitioning is a higher level abstraction as compared to storage.
Ideally, the storage layer should have no knowledge of partitioning. The
tuple delete API should not accept any parameter related to
partitioning.

Thanks for your point; I will look into how to extract it.

The API needs to be more accommodating towards block sizes used in
storage engines. Currently, the same block size as heap seems to be
assumed, as is evident from the types of some members of the generic scan
object:

typedef struct TableScanDescData
{
	/* state set up at initscan time */
	BlockNumber rs_nblocks;		/* total number of blocks in rel */
	BlockNumber rs_startblock;	/* block # to start at */
	BlockNumber rs_numblocks;	/* max number of blocks to scan */
	/* rs_numblocks is usually InvalidBlockNumber, meaning "scan whole rel" */
	bool		rs_syncscan;	/* report location to syncscan logic? */
} TableScanDescData;

Using bytes to represent this information would be more generic. E.g.
rs_startlocation as bytes/offset instead of rs_startblock and so on.

I suspect this is not the only thing that needs a change to support
different block sizes for different storage interfaces. Thanks for your
point, but this can definitely be taken care of in the next set of
patches.

Andres, now that the tupletableslot changes are committed, do you want me
to share the rebased pluggable storage patch, or are you already working
on it?

Regards,
Haribabu Kommi
Fujitsu Australia

#50Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#49)
Re: Pluggable Storage - Andres's take

Hi,

FWIW, now that oids are removed, and the tuple table slot abstraction
got in, I'm working on rebasing the pluggable storage patchset ontop of
that.

On 2018-11-27 12:48:36 +1100, Haribabu Kommi wrote:

On Thu, Nov 22, 2018 at 1:12 PM Asim R P <apraveen@pivotal.io> wrote:

Ashwin (copied) and I got a chance to go through the latest code from
Andres' github repository. We would like to share some
comments/questions:

Thanks for the review.

The TupleTableSlot argument is well suited for row-oriented storage.
For a column-oriented storage engine, a projection list indicating the
columns to be scanned may be necessary. Is it possible to share this
information with the current interface?

Currently all the interfaces are designed for row-oriented storage; as you
said, we need a new API for the projection list. The current patch set
itself is big and needs to be stabilized first, and then the new APIs that
are useful for columnar storage will be added in the next set of patches.

Precisely.

TupleDelete_function() accepts changingPart as a parameter to indicate
if this deletion is part of a movement from one partition to another.
Partitioning is a higher level abstraction as compared to storage.
Ideally, the storage layer should have no knowledge of partitioning. The
tuple delete API should not accept any parameter related to
partitioning.

Thanks for your point; I will look into how to extract it.

I don't think that's actually a problem. The changingPart parameter is
just a marker that the deletion is part of moving a tuple across
partitions. For heap and everything compatible with it, that's used to
mark the tuple so that concurrent modifications error out when they reach
such a tuple via EPQ.
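
Roughly, as a sketch of the heap behavior (not the exact patch code):
heap_delete() stamps the old tuple's ctid with a special marker instead of
a successor TID, and code following the update chain errors out when it
sees that marker:

	/* in heap_delete(), when the tuple is moved to another partition */
	if (changingPart)
		ItemPointerSetMovedPartitions(&tp.t_data->t_ctid);

	/* in followers of the update chain, e.g. via EPQ */
	if (HeapTupleHeaderIndicatesMovedPartitions(tuple->t_data))
		ereport(ERROR,
				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
				 errmsg("tuple to be locked was already moved to another partition due to concurrent update")));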

Andres, as the tupletableslot changes are committed, do you want me to
share the rebased pluggable storage patch? you already working on it?

Working on it.

Greetings,

Andres Freund

#51Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Haribabu Kommi (#46)
Re: Pluggable Storage - Andres's take

Hi,

On 2018/11/02 9:17, Haribabu Kommi wrote:

Here I attached the cumulative fixes of the patches, new API additions for
zheap and
basic outline of the documentation.

I've read the documentation patch while also looking at the code and here
are some comments.

+   Each table is stored as its own physical
<firstterm>relation</firstterm> and so
+   is described by an entry in the <structname>pg_class</structname> catalog.

I think the "so" in "and so is described by an entry in..." is not necessary.

+ The contents of an table are entirely under the control of its access
method.

"a" table

+   (All the access methods furthermore use the standard page layout
described in
+   <xref linkend="storage-page-layout"/>.)

Maybe write the two sentences above as:

A table's content is entirely controlled by its access method, although
all access methods use the same standard page layout described in <xref
linkend="storage-page-layout"/>.

+    SlotCallbacks_function slot_callbacks;
+
+    SnapshotSatisfies_function snapshot_satisfies;
+    SnapshotSatisfiesUpdate_function snapshot_satisfiesUpdate;
+    SnapshotSatisfiesVacuum_function snapshot_satisfiesVacuum;

Like other functions, how about a one sentence comment for these, like:

/*
* Function to get an AM-specific set of functions for manipulating
* TupleTableSlots
*/
SlotCallbacks_function slot_callbacks;

/* AM-specific snapshot visibility determination functions */
SnapshotSatisfies_function snapshot_satisfies;
SnapshotSatisfiesUpdate_function snapshot_satisfiesUpdate;
SnapshotSatisfiesVacuum_function snapshot_satisfiesVacuum;

+    TupleFetchFollow_function tuple_fetch_follow;
+
+    GetTupleData_function get_tuple_data;

How about removing the empty line so that get_tuple_data can be seen as
part of the group /* Operations on physical tuples */

+    RelationVacuum_function relation_vacuum;
+    RelationScanAnalyzeNextBlock_function scan_analyze_next_block;
+    RelationScanAnalyzeNextTuple_function scan_analyze_next_tuple;
+    RelationCopyForCluster_function relation_copy_for_cluster;
+    RelationSync_function relation_sync;

Add /* Operations to support VACUUM/ANALYZE */ as a description for this
group?

+    BitmapPagescan_function scan_bitmap_pagescan;
+    BitmapPagescanNext_function scan_bitmap_pagescan_next;

Add /* Operations to support bitmap scans */ as a description for this group?

+    SampleScanNextBlock_function scan_sample_next_block;
+    SampleScanNextTuple_function scan_sample_next_tuple;

Add /* Operations to support sampling scans */ as a description for this
group?

+    ScanEnd_function scan_end;
+    ScanRescan_function scan_rescan;
+    ScanUpdateSnapshot_function scan_update_snapshot;

Move these to be in the /* Operations on relation scans */ group?

+    BeginIndexFetchTable_function begin_index_fetch;
+    EndIndexFetchTable_function reset_index_fetch;
+    EndIndexFetchTable_function end_index_fetch;

Add /* Operations to support index scans */ as a description for this group?

+    IndexBuildRangeScan_function index_build_range_scan;
+    IndexValidateScan_function index_validate_scan;

Add /* Operations to support index build */ as a description for this group?

+ CreateInitFork_function CreateInitFork;

Add /* Function to create an init fork for unlogged tables */?

By the way, I can see the following two in the source code, but not in the
documentation.

EstimateRelSize_function EstimateRelSize;
SetNewFileNode_function SetNewFileNode;

+   The table construction and maintenance functions that an table access
+   method must provide in <structname>TableAmRoutine</structname> are:

"a" table access method

+  <para>
+<programlisting>
+TupleTableSlotOps *
+slot_callbacks (Relation relation);
+</programlisting>
+   API to access the slot specific methods;
+   Following methods are available;
+   <structname>TTSOpsVirtual</structname>,
+   <structname>TTSOpsHeapTuple</structname>,
+   <structname>TTSOpsMinimalTuple</structname>,
+   <structname>TTSOpsBufferTuple</structname>,
+  </para>

Unless I'm misunderstanding what the TupleTableSlotOps abstraction is or
its relations to the TableAmRoutine abstraction, I think the text
description could better be written as:

"API to get the slot operations struct for a given table access method"

It's not clear to me why various TTSOps* structs are listed here? Is the
point that different AMs may choose one of the listed alternatives? For
example, I see that heap AM implementation returns TTOpsBufferTuple, so it
manipulates slots containing buffered tuples, right? Other AMs are free
to return any one of these? For example, some AMs may never use buffer
manager and hence not use TTOpsBufferTuple. Is that understanding correct?
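
(For reference, if I'm reading heapam_handler.c correctly, the heap AM's
implementation is essentially just:

static const TupleTableSlotOps *
heapam_slot_callbacks(Relation relation)
{
	return &TTSOpsBufferTuple;
}

so presumably an AM that never goes through the buffer manager would
return, say, &TTSOpsVirtual or &TTSOpsHeapTuple instead.)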

+  <para>
+<programlisting>
+bool
+snapshot_satisfies (TupleTableSlot *slot, Snapshot snapshot);
+</programlisting>
+   API to check whether the provided slot is visible to the current
+   transaction according the snapshot.
+  </para>

Do you mean:

"API to check whether the tuple contained in the provided slot is
visible...."?

+  <para>
+<programlisting>
+Oid
+tuple_insert (Relation rel, TupleTableSlot *slot, CommandId cid,
+              int options, BulkInsertState bistate);
+</programlisting>
+   API to insert the tuple and provide the <literal>ItemPointerData</literal>
+   where the tuple is successfully inserted.
+  </para>

It's not clear from the signature where you get the ItemPointerData.
Looking at heapam_tuple_insert which puts it in slot->tts_tid, I think
this should mention it a bit differently, like:

API to insert the tuple contained in the provided slot and return its TID,
that is, the location where the tuple is successfully inserted

+   API to insert the tuple with a speculative token. This API is similar
+   like <literal>tuple_insert</literal>, with additional speculative
+   information.

How about:

This API is similar to <literal>tuple_insert</literal>, although with
additional information necessary for speculative insertion

+  <para>
+<programlisting>
+void
+tuple_complete_speculative (Relation rel,
+                          TupleTableSlot *slot,
+                          uint32 specToken,
+                          bool succeeded);
+</programlisting>
+   API to complete the state of the tuple inserted by the API
<literal>tuple_insert_speculative</literal>
+   with the successful completion of the index insert.
+  </para>

How about:

API to complete the speculative insertion of a tuple started by
<literal>tuple_insert_speculative</literal>, invoked after finishing the
index insert

+  <para>
+<programlisting>
+bool
+tuple_fetch_row_version (Relation relation,
+                       ItemPointer tid,
+                       Snapshot snapshot,
+                       TupleTableSlot *slot,
+                       Relation stats_relation);
+</programlisting>
+   API to fetch and store the Buffered Heap tuple in the provided slot
+   based on the ItemPointer.
+  </para>

It seems that this description is based on what heapam_fetch_row_version()
does, but it should be more generic, maybe like:

API to fetch a buffered tuple given its TID and store it in the provided slot

+  <para>
+<programlisting>
+HTSU_Result
+TupleLock_function (Relation relation,
+                   ItemPointer tid,
+                   Snapshot snapshot,
+                   TupleTableSlot *slot,
+                   CommandId cid,
+                   LockTupleMode mode,
+                   LockWaitPolicy wait_policy,
+                   uint8 flags,
+                   HeapUpdateFailureData *hufd);

I guess you meant to write "tuple_lock" here, not "TupleLock_function".

+</programlisting>
+   API to lock the specified the ItemPointer tuple and fetches the newest
version of
+   its tuple and TID.
+  </para>

How about:

API to lock the specified tuple and return the TID of its newest version

+  <para>
+<programlisting>
+void
+tuple_get_latest_tid (Relation relation,
+                    Snapshot snapshot,
+                    ItemPointer tid);
+</programlisting>
+   API to get the the latest TID of the tuple with the given itempointer.
+  </para>

How about:

API to get TID of the latest version of the specified tuple

+  <para>
+<programlisting>
+bool
+tuple_fetch_follow (struct IndexFetchTableData *scan,
+                  ItemPointer tid,
+                  Snapshot snapshot,
+                  TupleTableSlot *slot,
+                  bool *call_again, bool *all_dead);
+</programlisting>
+   API to get the all the tuples of the page that satisfies itempointer.
+  </para>

IIUC, "all the tuples of of the page" in the above sentence means all the
tuples in the HOT chain of a given heap tuple, making this description of
the API slightly specific to the heap AM. Can we make the description
more generic, or is the API itself so specific that it cannot be
expressed in generic terms? Ignoring that for a moment, I think the
sentence contains more "the"s than there need to be, so maybe write as:

API to get all tuples on a given page that are linked to the tuple of the
given TID

+  <para>
+<programlisting>
+tuple_data
+get_tuple_data (TupleTableSlot *slot, tuple_data_flags flags);
+</programlisting>
+   API to return the internal structure members of the HeapTuple.
+  </para>

I think this description doesn't mention enough details of both the
information that needs to be specified when calling the function (what's
in flags) and the information that's returned.

+  <para>
+<programlisting>
+bool
+scan_analyze_next_tuple (TableScanDesc scan, TransactionId OldestXmin,
+                      double *liverows, double *deadrows, TupleTableSlot
*slot));
+</programlisting>
+   API to analyze the block and fill the buffered heap tuple in the slot
and also
+   provide the live and dead rows.
+  </para>

How about:

API to get the next tuple from the block being scanned, which also updates
the number of live and dead rows encountered

+  <para>
+<programlisting>
+void
+relation_copy_for_cluster (Relation NewHeap, Relation OldHeap, Relation
OldIndex,
+                       bool use_sort,
+                       TransactionId OldestXmin, TransactionId FreezeXid,
MultiXactId MultiXactCutoff,
+                       double *num_tuples, double *tups_vacuumed, double
*tups_recently_dead);
+</programlisting>
+   API to copy one relation to another relation eith using the Index or
table scan.
+  </para>

Typo: eith -> either

But maybe, rewrite this as:

API to make a copy of the content of a relation, optionally sorted using
either the specified index or by sorting explicitly

+  <para>
+<programlisting>
+TableScanDesc
+scan_begin (Relation relation,
+            Snapshot snapshot,
+            int nkeys, ScanKey key,
+            ParallelTableScanDesc parallel_scan,
+            bool allow_strat,
+            bool allow_sync,
+            bool allow_pagemode,
+            bool is_bitmapscan,
+            bool is_samplescan,
+            bool temp_snap);
+</programlisting>
+   API to start the relation scan for the provided relation and returns the
+   <structname>TableScanDesc</structname> structure.
+  </para>

How about:

API to start a scan of a relation using specified options, which returns
the <structname>TableScanDesc</structname> structure to be used for
subsequent scan operations

+    <para>
+<programlisting>
+void
+scansetlimits (TableScanDesc sscan, BlockNumber startBlk, BlockNumber
numBlks);
+</programlisting>
+   API to fix the relation scan range limits.
+  </para>

How about:

API to set scan range endpoints

+    <para>
+<programlisting>
+bool
+scan_bitmap_pagescan (TableScanDesc scan,
+                    TBMIterateResult *tbmres);
+</programlisting>
+   API to scan the relation and fill the scan description bitmap with
valid item pointers
+   for the specified block.
+  </para>

This says "to scan the relation", but seems to be concerned with only a
page worth of data as the name also says. Also, it's not clear what "scan
description bitmap" means. Maybe write as:

API to scan the relation block specified in the scan descriptor to collect
and return the tuples requested by the given bitmap

+    <para>
+<programlisting>
+bool
+scan_bitmap_pagescan_next (TableScanDesc scan,
+                        TupleTableSlot *slot);
+</programlisting>
+   API to fill the buffered heap tuple data from the bitmap scanned item
pointers and store
+   it in the provided slot.
+  </para>

How about:

API to select the next tuple from the set of tuples of a given page
specified in the scan descriptor and return in the provided slot; returns
false if no more tuples to return on the given page

+    <para>
+<programlisting>
+bool
+scan_sample_next_block (TableScanDesc scan, struct SampleScanState
*scanstate);
+</programlisting>
+   API to scan the relation and fill the scan description bitmap with
valid item pointers
+   for the specified block provided by the sample method.
+  </para>

Looking at the code, this API selects the next block using the sampling
method and nothing more, although I see that the heap AM implementation
also does heapgetpage thus collecting live tuples in the array known only
to heap AM. So, how about:

API to select the next block of the relation using the given sampling
method and set its information in the scan descriptor

+    <para>
+<programlisting>
+bool
+scan_sample_next_tuple (TableScanDesc scan, struct SampleScanState
*scanstate, TupleTableSlot *slot);
+</programlisting>
+   API to fill the buffered heap tuple data from the bitmap scanned item
pointers based on the sample
+   method and store it in the provided slot.
+  </para>

How about:

API to select the next tuple using the given sampling method from the set
of tuples collected from the block previously selected by the sampling method

+    <para>
+<programlisting>
+void
+scan_rescan (TableScanDesc scan, ScanKey key, bool set_params,
+             bool allow_strat, bool allow_sync, bool allow_pagemode);
+</programlisting>
+   API to restart the relation scan with provided data.
+  </para>

How about:

API to restart the given scan using provided options, releasing any
resources (such as buffer pins) already held by the scan

+  <para>
+<programlisting>
+void
+scan_update_snapshot (TableScanDesc scan, Snapshot snapshot);
+</programlisting>
+   API to update the relation scan with the new snapshot.
+  </para>

How about:

API to set the visibility snapshot to be used by a given scan

+  <para>
+<programlisting>
+IndexFetchTableData *
+begin_index_fetch (Relation relation);
+</programlisting>
+   API to prepare the <structname>IndexFetchTableData</structname> for
the relation.
+  </para>

This API is a bit vague. As in, it's not clear from the name when it's to
be called and what's to be done with the returned struct. How about at
least adding more details about what the returned struct is for, like:

API to get the <structname>IndexFetchTableData</structname> to be assigned
to an index scan on the specified relation

+  <para>
+<programlisting>
+void
+reset_index_fetch (struct IndexFetchTableData* data);
+</programlisting>
+   API to reset the prepared internal members of the
<structname>IndexFetchTableData</structname>.
+  </para>

This description seems wrong if I look at the code. Its purpose seems to
be to reset the AM-specific members, such as releasing the buffer pin
held in xs_cbuf in the heap AM's case.

How about:

API to release AM-specific resources held by the
<structname>IndexFetchTableData</structname> of a given index scan

+  <para>
+<programlisting>
+void
+end_index_fetch (struct IndexFetchTableData* data);
+</programlisting>
+   API to clear and free the <structname>IndexFetchTableData</structname>.
+  </para>

Given above, how about:

API to release AM-specific resources held by the
<structname>IndexFetchTableData</structname> of a given index scan and
free the memory of <structname>IndexFetchTableData</structname> itself

+    <para>
+<programlisting>
+double
+index_build_range_scan (Relation heapRelation,
+                       Relation indexRelation,
+                       IndexInfo *indexInfo,
+                       bool allow_sync,
+                       bool anyvisible,
+                       BlockNumber start_blockno,
+                       BlockNumber end_blockno,
+                       IndexBuildCallback callback,
+                       void *callback_state,
+                       TableScanDesc scan);
+</programlisting>
+   API to perform the table scan with bounded range specified by the caller
+   and insert the satisfied records into the index using the provided
callback
+   function pointer.
+  </para>

This is a bit of a heavy API, and the above description lacks some details.
Also, isn't it a bit misleading to use the name end_blockno if it is
interpreted as num_blocks by the internal APIs?

How about:

API to scan the specified blocks of the given table and insert them into
the specified index using the provided callback function

+    <para>
+<programlisting>
+void
+index_validate_scan (Relation heapRelation,
+                   Relation indexRelation,
+                   IndexInfo *indexInfo,
+                   Snapshot snapshot,
+                   struct ValidateIndexState *state);
+</programlisting>
+   API to perform the table scan and insert the satisfied records into
the index.
+   This API is similar like <function>index_build_range_scan</function>.
This
+   is used in the scenario of concurrent index build.
+  </para>

This one's a complicated API too. How about:

API to scan the table according to the given snapshot and insert tuples
satisfying the snapshot into the specified index, provided their TIDs are
also present in the <structname>ValidateIndexState</structname> struct;
this API is used as the last phase of a concurrent index build

+ <sect2>
+  <title>Table scanning</title>
+
+  <para>
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Table insert/update/delete</title>
+
+  <para>
+  </para>
+  </sect2>
+
+ <sect2>
+  <title>Table locking</title>
+
+  <para>
+  </para>
+  </sect2>
+
+ <sect2>
+  <title>Table vacuum</title>
+
+  <para>
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Table fetch</title>
+
+  <para>
+  </para>
+ </sect2>

Seems like you forgot to put the individual API descriptions under these
sub-headers. Actually, I think it'd be better to try to format this page
to look more like the following:

https://www.postgresql.org/docs/devel/fdw-callbacks.html

-   Currently, only indexes have access methods.  The requirements for index
-   access methods are discussed in detail in <xref linkend="indexam"/>.
+   Currently, only <literal>INDEX</literal> and <literal>TABLE</literal> have
+   access methods.  The requirements for access methods are discussed in
detail
+   in <xref linkend="am"/>.

Hmm, I don't see why you decided to add literal tags to INDEX and TABLE.
Couldn't this have been written as:

Currently, only tables and indexes have access methods. The requirements
for access methods are discussed in detail in <xref linkend="am"/>.

+        This variable specifies the default table access method using
which to create
+        objects (tables and materialized views) when a
<command>CREATE</command> command does
+        not explicitly specify a access method.

"variable" is not wrong, but "parameter" is used more often for GUCs. "a
access method" should be "an access method".

Maybe you could write this as:

This variable specifies the default table access method to use when
creating tables or materialized views if the <command>CREATE</command>
does not explicitly specify an access method.

+        If the value does not match the name of any existing table access
methods,
+        <productname>PostgreSQL</productname> will automatically use the
default
+        table access method of the current database.

any existing table access methods -> any existing table access method

Although, shouldn't that cause an error, instead of silently falling back
to the database default access method?

Thank you for working on this. Really looking forward to how this shapes
up. :)

Thanks,
Amit

#52Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Amit Langote (#51)
Re: Pluggable Storage - Andres's take

On Fri, Nov 16, 2018 at 2:05 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

I tried running the pgbench performance tests with a minimal number of
clients on my laptop and I didn't find any performance issues; maybe the
issue is visible only with more clients. Even with the perf tool, I am not
able to identify a clear problem function. As you said, combining all the
changes leads to some overhead.

Just out of curiosity I've also tried tpc-c from oltpbench (in the very same
simple environment); it doesn't show any significant difference from master
either.

Here I've attached the cumulative patches with further fixes, and basic syntax regression tests as well.

While testing the latest version I've noticed that you didn't include the fix
for HeapTupleInvisible (so I see the error again); was that intentional?

On Tue, Nov 27, 2018 at 2:55 AM Andres Freund <andres@anarazel.de> wrote:

FWIW, now that oids are removed, and the tuple table slot abstraction
got in, I'm working on rebasing the pluggable storage patchset ontop of
that.

Yes, please. I've tried it myself for reviewing purposes, but the rebasing
speed was not impressive. Also, I want to suggest moving it from GitHub and
making it a regular patchset, since it's already a bit confusing as to what
goes where and which patch to apply on top of which branch.

#53Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#47)
Re: Pluggable Storage - Andres's take

Hi,

Thanks for these changes. I've merged a good chunk of them.

On 2018-11-16 12:05:26 +1100, Haribabu Kommi wrote:

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91f..3254e30a45 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1741,7 +1741,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
 	Page		targpage;
-	OffsetNumber targoffset = scan->rs_cindex;
+	OffsetNumber targoffset;
 	OffsetNumber maxoffset;
 	BufferHeapTupleTableSlot *hslot;
 
@@ -1751,7 +1751,9 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 	maxoffset = PageGetMaxOffsetNumber(targpage);
 
 	/* Inner loop over all tuples on the selected page */
-	for (targoffset = scan->rs_cindex; targoffset <= maxoffset; targoffset++)
+	for (targoffset = scan->rs_cindex ? scan->rs_cindex : FirstOffsetNumber;
+			targoffset <= maxoffset;
+			targoffset++)
 	{
 		ItemId		itemid;
 		HeapTuple	targtuple = &hslot->base.tupdata;

I thought it was better to fix the initialization for rs_cindex - any
reason you didn't go for that?
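
I.e. something like this (a sketch): set rs_cindex once when a new block
is selected,

	/* in heapam_scan_analyze_next_block(), after reading the page */
	scan->rs_cindex = FirstOffsetNumber;

which lets the loop stay as plain

	for (targoffset = scan->rs_cindex; targoffset <= maxoffset; targoffset++)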

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 8233475aa0..7bad246f55 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1838,8 +1838,10 @@ HeapTupleSatisfies(HeapTuple stup, Snapshot snapshot, Buffer buffer)
 		case NON_VACUUMABLE_VISIBILTY:
 			return HeapTupleSatisfiesNonVacuumable(stup, snapshot, buffer);
 			break;
-		default:
+		case END_OF_VISIBILITY:
 			Assert(0);
 			break;
 	}
+
+	return false; /* keep compiler quiet */

I don't understand why END_OF_VISIBILITY is a good idea. I've now removed
END_OF_VISIBILITY and the default case.
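
The point of dropping the default arm: with every enum value spelled out,
the compiler can warn when a new visibility type is added but not handled,
and the trailing return only silences the fall-off-the-end warning. As a
minimal illustration, with a hypothetical enum:

typedef enum { VIS_MVCC, VIS_DIRTY } VisibilityType;

static bool
satisfies(VisibilityType t)
{
	switch (t)
	{
		case VIS_MVCC:
			return true;
		case VIS_DIRTY:
			return false;
			/* no default: compiler warns about unhandled enum values */
	}

	return false;				/* keep compiler quiet */
}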

@@ -593,6 +594,10 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 	if (myState->rel->rd_rel->relhasoids)
 		slot->tts_tupleOid = InvalidOid;
 
+	/* Materialize the slot */
+	if (!TTS_IS_VIRTUAL(slot))
+		ExecMaterializeSlot(slot);
+
 	table_insert(myState->rel,
 				 slot,
 				 myState->output_cid,

What's the point of adding materialization here?

@@ -570,6 +563,9 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 				Assert(TTS_IS_HEAPTUPLE(scanslot) ||
 					   TTS_IS_BUFFERTUPLE(scanslot));
 
+				if (hslot->tuple == NULL)
+					ExecMaterializeSlot(scanslot);
+
 				d = heap_getsysattr(hslot->tuple, attnum,
 									scanslot->tts_tupleDescriptor,
 									op->resnull);

Same?

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e055c0a7c6..34ef86a5bd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2594,7 +2594,7 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
 	 * datums that may be present in copyTuple).  As with the next step, this
 	 * is to guard against early re-use of the EPQ query.
 	 */
-	if (!TupIsNull(slot))
+	if (!TupIsNull(slot) && !TTS_IS_VIRTUAL(slot))
 		ExecMaterializeSlot(slot);

Same?

 #if FIXME
@@ -2787,16 +2787,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 			if (isNull)
 				continue;
 
-			elog(ERROR, "frak, need to implement ROW_MARK_COPY");
-#ifdef FIXME
-			// FIXME: this should just deform the tuple and store it as a
-			// virtual one.
-			tuple = table_tuple_by_datum(erm->relation, datum, erm->relid);
-
-			/* store tuple */
-			EvalPlanQualSetTuple(epqstate, erm->rti, tuple);
-#endif
-
+			ExecForceStoreHeapTupleDatum(datum, slot);
 		}
 	}
 }

Cool.

diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 56880e3d16..36ca07beb2 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -224,6 +224,18 @@ BitmapHeapNext(BitmapHeapScanState *node)

 			BitmapAdjustPrefetchIterator(node, tbmres);

+			/*
+			 * Ignore any claimed entries past what we think is the end of the
+			 * relation.  (This is probably not necessary given that we got at
+			 * least AccessShareLock on the table before performing any of the
+			 * indexscans, but let's be safe.)
+			 */
+			if (tbmres->blockno >= scan->rs_nblocks)
+			{
+				node->tbmres = tbmres = NULL;
+				continue;
+			}
+

I moved this into the storage engine; there was just a minor bug
preventing the already existing check from taking effect. I don't think
we should expose this kind of thing to the outside of the storage
engine.
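
I.e. the equivalent check now lives in the AM's bitmap scan callback,
roughly like this (a sketch, not the exact committed code):

	/* in heapam's scan_bitmap_pagescan, before fetching the page */
	if (tbmres->blockno >= scan->rs_nblocks)
		return false;			/* claimed block is past the end of the rel */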

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 54382aba88..ea48e1d6e8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4037,7 +4037,6 @@ CreateStatsStmt:
 *
 *****************************************************************************/
 
-// PBORKED: storage option
 CreateAsStmt:
 		CREATE OptTemp TABLE create_as_target AS SelectStmt opt_with_data
 				{
@@ -4068,14 +4067,16 @@ CreateAsStmt:
 		;
 
 create_as_target:
-			qualified_name opt_column_list OptWith OnCommitOption OptTableSpace
+			qualified_name opt_column_list table_access_method_clause
+			OptWith OnCommitOption OptTableSpace
 				{
 					$$ = makeNode(IntoClause);
 					$$->rel = $1;
 					$$->colNames = $2;
-					$$->options = $3;
-					$$->onCommit = $4;
-					$$->tableSpaceName = $5;
+					$$->accessMethod = $3;
+					$$->options = $4;
+					$$->onCommit = $5;
+					$$->tableSpaceName = $6;
 					$$->viewQuery = NULL;
 					$$->skipData = false;		/* might get changed later */
 				}
@@ -4125,14 +4126,15 @@ CreateMatViewStmt:
 		;
 
 create_mv_target:
-			qualified_name opt_column_list opt_reloptions OptTableSpace
+			qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
 				{
 					$$ = makeNode(IntoClause);
 					$$->rel = $1;
 					$$->colNames = $2;
-					$$->options = $3;
+					$$->accessMethod = $3;
+					$$->options = $4;
 					$$->onCommit = ONCOMMIT_NOOP;
-					$$->tableSpaceName = $4;
+					$$->tableSpaceName = $5;
 					$$->viewQuery = NULL;		/* filled at analysis time */
 					$$->skipData = false;		/* might get changed later */
 				}

Cool. I wonder if we should also somehow support SELECT INTO w/ USING?
You've apparently started on that?

diff --git a/src/test/regress/expected/create_am.out b/src/test/regress/expected/create_am.out
index 47dd885c4e..a4094ca3f1 100644
--- a/src/test/regress/expected/create_am.out
+++ b/src/test/regress/expected/create_am.out
@@ -99,3 +99,81 @@ HINT:  Use DROP ... CASCADE to drop the dependent objects too.
-- Drop access method cascade
DROP ACCESS METHOD gist2 CASCADE;
NOTICE:  drop cascades to index grect2ind2
+-- Create a heap2 table AM using the heapam handler
+CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;
+SELECT * FROM pg_am where amtype = 't';
+ amname |      amhandler       | amtype 
+--------+----------------------+--------
+ heap   | heap_tableam_handler | t
+ heap2  | heap_tableam_handler | t
+(2 rows)
+
+CREATE TABLE tbl_heap2(f1 int, f2 char(100)) using heap2;
+INSERT INTO tbl_heap2 VALUES(generate_series(1,10), 'Test series');
+SELECT count(*) FROM tbl_heap2;
+ count 
+-------
+    10
+(1 row)
+
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+		where a.oid = r.relam AND r.relname = 'tbl_heap2';
+  relname  | relkind | amname 
+-----------+---------+--------
+ tbl_heap2 | r       | heap2
+(1 row)
+
+-- create table as using heap2
+CREATE TABLE tblas_heap2 using heap2 AS select * from tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+		where a.oid = r.relam AND r.relname = 'tblas_heap2';
+   relname   | relkind | amname 
+-------------+---------+--------
+ tblas_heap2 | r       | heap2
+(1 row)
+
+--
+-- select into doesn't support the new syntax, so it should use the
+-- default access method.
+--
+SELECT INTO tblselectinto_heap from tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+		where a.oid = r.relam AND r.relname = 'tblselectinto_heap';
+      relname       | relkind | amname 
+--------------------+---------+--------
+ tblselectinto_heap | r       | heap
+(1 row)
+
+DROP TABLE tblselectinto_heap;
+-- create materialized view using heap2
+CREATE MATERIALIZED VIEW mv_heap2 USING heap2 AS
+		SELECT * FROM tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+		where a.oid = r.relam AND r.relname = 'mv_heap2';
+ relname  | relkind | amname 
+----------+---------+--------
+ mv_heap2 | m       | heap2
+(1 row)
+
+-- Try creating the unsupported relation kinds with using syntax
+CREATE VIEW test_view USING heap2 AS SELECT * FROM tbl_heap2;
+ERROR:  syntax error at or near "USING"
+LINE 1: CREATE VIEW test_view USING heap2 AS SELECT * FROM tbl_heap2...
+                              ^
+CREATE SEQUENCE test_seq USING heap2;
+ERROR:  syntax error at or near "USING"
+LINE 1: CREATE SEQUENCE test_seq USING heap2;
+                                 ^
+-- Drop table access method, but it fails as objects depend on it
+DROP ACCESS METHOD heap2;
+ERROR:  cannot drop access method heap2 because other objects depend on it
+DETAIL:  table tbl_heap2 depends on access method heap2
+table tblas_heap2 depends on access method heap2
+materialized view mv_heap2 depends on access method heap2
+HINT:  Use DROP ... CASCADE to drop the dependent objects too.
+-- Drop table access method with cascade
+DROP ACCESS METHOD heap2 CASCADE;
+NOTICE:  drop cascades to 3 other objects
+DETAIL:  drop cascades to table tbl_heap2
+drop cascades to table tblas_heap2
+drop cascades to materialized view mv_heap2
diff --git a/src/test/regress/sql/create_am.sql b/src/test/regress/sql/create_am.sql
index 3e0ac104f3..0472a60f20 100644
--- a/src/test/regress/sql/create_am.sql
+++ b/src/test/regress/sql/create_am.sql
@@ -66,3 +66,49 @@ DROP ACCESS METHOD gist2;
-- Drop access method cascade
DROP ACCESS METHOD gist2 CASCADE;
+
+-- Create a heap2 table AM using the heapam handler
+CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;
+
+SELECT * FROM pg_am where amtype = 't';
+
+CREATE TABLE tbl_heap2(f1 int, f2 char(100)) using heap2;
+INSERT INTO tbl_heap2 VALUES(generate_series(1,10), 'Test series');
+SELECT count(*) FROM tbl_heap2;
+
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+		where a.oid = r.relam AND r.relname = 'tbl_heap2';
+
+-- create table as using heap2
+CREATE TABLE tblas_heap2 using heap2 AS select * from tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+		where a.oid = r.relam AND r.relname = 'tblas_heap2';
+
+--
+-- select into doesn't support the new syntax, so it should use the
+-- default access method.
+--
+SELECT INTO tblselectinto_heap from tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+		where a.oid = r.relam AND r.relname = 'tblselectinto_heap';
+
+DROP TABLE tblselectinto_heap;
+
+-- create materialized view using heap2
+CREATE MATERIALIZED VIEW mv_heap2 USING heap2 AS
+		SELECT * FROM tbl_heap2;
+
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+		where a.oid = r.relam AND r.relname = 'mv_heap2';
+
+-- Try creating the unsupported relation kinds with using syntax
+CREATE VIEW test_view USING heap2 AS SELECT * FROM tbl_heap2;
+
+CREATE SEQUENCE test_seq USING heap2;
+
+
+-- Drop table access method, but it fails as objects depend on it
+DROP ACCESS METHOD heap2;
+
+-- Drop table access method with cascade
+DROP ACCESS METHOD heap2 CASCADE;
-- 
2.18.0.windows.1

Nice!

Greetings,

Andres Freund

#54Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#50)
Re: Pluggable Storage - Andres's take

Hi,

On 2018-11-26 17:55:57 -0800, Andres Freund wrote:

FWIW, now that oids are removed, and the tuple table slot abstraction
got in, I'm working on rebasing the pluggable storage patchset ontop of
that.

I've pushed a version of that to the git tree, including a rebased
version of zheap:
https://github.com/anarazel/postgres-pluggable-storage
https://github.com/anarazel/postgres-pluggable-zheap

I'm still working on moving some of the out-of-access/zheap
modifications into pluggable storage (see e.g. the first commit of the
pluggable-zheap series). But this should allow others to start on a more
recent codebase.

My next steps are:
- make relation creation properly pluggable
- remove the typedefs from tableam.h, instead move them into the
TableAmRoutine struct.
- Move rs_{nblocks, startblock, numblocks} out of TableScanDescData
- Move HeapScanDesc and IndexFetchHeapData out of relscan.h
- See if the slot in SysScanDescData can be avoided, it's not exactly
free of overhead.
- remove ExecSlotCompare(), it's entirely unrelated to these changes imo
(and in the wrong place)
- rename HeapUpdateFailureData et al to not reference Heap
- split pluggable storage patchset, to commit earlier:
- EvalPlanQual slotification
- trigger slotification
- split of IndexBuildHeapScan out of index.c

I'm wondering whether we should add
table_beginscan/table_getnextslot/index_getnext_slot using the old API
in an earlier commit that then could be committed separately, allowing
the tablecmd.c changes to be committed soon.

I'm wondering whether we should change the table_beginscan* API so it
provides a slot - pretty much every caller has to do so, and it seems
just as easy to create/dispose via table_beginscan/endscan.
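
Something like this, as a hypothetical signature sketch:

	/* the scan creates and owns the slot; callers just consume it */
	TableScanDesc table_beginscan(Relation rel, Snapshot snapshot,
								  int nkeys, ScanKey key,
								  TupleTableSlot **slot);

with table_endscan() then also releasing the slot.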

Further tasks I'm not yet planning to tackle, that I'd welcome help on:
- pg_dump support
- pg_upgrade testing
- I think we should consider removing HeapTuple->t_tableOid, it should
imo live entirely in the slot

Greetings,

Andres Freund

#55Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andres Freund (#54)
Re: Pluggable Storage - Andres's take

On Tue, Dec 11, 2018 at 3:13 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-11-26 17:55:57 -0800, Andres Freund wrote:

FWIW, now that oids are removed, and the tuple table slot abstraction
got in, I'm working on rebasing the pluggable storage patchset ontop of
that.

I've pushed a version to that to the git tree, including a rebased
version of zheap:
https://github.com/anarazel/postgres-pluggable-storage
https://github.com/anarazel/postgres-pluggable-zheap

Great, thanks!

As a side note, I assume the last reference should be this, right?

https://github.com/anarazel/postgres-pluggable-storage/tree/pluggable-zheap

Further tasks I'm not yet planning to tackle, that I'd welcome help on:
- pg_dump support
- pg_upgrade testing
- I think we should consider removing HeapTuple->t_tableOid, it should
imo live entirely in the slot

I would love to try to help with pg_dump support.

#56Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Amit Langote (#51)
Re: Pluggable Storage - Andres's take

Hello.

At Tue, 27 Nov 2018 14:58:35 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <080ce65e-7b96-adbf-1c8c-7c88d87eaeda@lab.ntt.co.jp>

+  <para>
+<programlisting>
+TupleTableSlotOps *
+slot_callbacks (Relation relation);
+</programlisting>
+   API to access the slot specific methods;
+   Following methods are available;
+   <structname>TTSOpsVirtual</structname>,
+   <structname>TTSOpsHeapTuple</structname>,
+   <structname>TTSOpsMinimalTuple</structname>,
+   <structname>TTSOpsBufferTuple</structname>,
+  </para>

Unless I'm misunderstanding what the TupleTableSlotOps abstraction is or
its relations to the TableAmRoutine abstraction, I think the text
description could better be written as:

"API to get the slot operations struct for a given table access method"

It's not clear to me why the various TTSOps* structs are listed here? Is the
point that different AMs may choose one of the listed alternatives? For
example, I see that the heap AM implementation returns TTSOpsBufferTuple, so it
manipulates slots containing buffered tuples, right? Other AMs are free
to return any one of these? For example, some AMs may never use the buffer
manager and hence not use TTSOpsBufferTuple. Is that understanding correct?

Yeah, I'm not sure why it should be a function rather than a pointer to the
struct itself. And the four structs don't seem relevant
to table AMs. Perhaps clear, getsomeattrs and so on should be
listed instead.
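
For context, in the heap implementation this callback amounts to roughly the
following one-liner (a sketch using the struct names from the quoted docs;
heap tuples live in shared buffers, hence the buffered-tuple ops):

#include "postgres.h"
#include "executor/tuptable.h"
#include "utils/rel.h"

/* Sketch: hand back the slot operations matching the AM's tuple format. */
static TupleTableSlotOps *
heapam_slot_callbacks(Relation relation)
{
	return &TTSOpsBufferTuple;
}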

+  <para>
+<programlisting>
+Oid
+tuple_insert (Relation rel, TupleTableSlot *slot, CommandId cid,
+              int options, BulkInsertState bistate);
+</programlisting>
+   API to insert the tuple and provide the <literal>ItemPointerData</literal>
+   where the tuple is successfully inserted.
+  </para>

It's not clear from the signature where you get the ItemPointerData.
Looking at heapam_tuple_insert which puts it in slot->tts_tid, I think
this should mention it a bit differently, like:

API to insert the tuple contained in the provided slot and return its TID,
that is, the location where the tuple is successfully inserted

It is actually an OID, not a TID, in the current code. The TID is
internally handled.
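
For what it's worth, a sketch of the slot-based calling convention being
described (names follow the patchset; as noted above, whether the location
comes back as a return value or only in slot->tts_tid was still in flux):

#include "postgres.h"
#include "access/tableam.h"
#include "storage/itemptr.h"

/* Sketch: insert the tuple held in "slot" and report where it went. */
static void
insert_one(Relation rel, TupleTableSlot *slot, CommandId cid)
{
	table_insert(rel, slot, cid, 0 /* options */, NULL /* bistate */);

	/* heapam_tuple_insert stores the new tuple's location here */
	elog(DEBUG1, "inserted at (%u,%u)",
		 ItemPointerGetBlockNumber(&slot->tts_tid),
		 ItemPointerGetOffsetNumber(&slot->tts_tid));
}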

+  <para>
+<programlisting>
+bool
+tuple_fetch_follow (struct IndexFetchTableData *scan,
+                  ItemPointer tid,
+                  Snapshot snapshot,
+                  TupleTableSlot *slot,
+                  bool *call_again, bool *all_dead);
+</programlisting>
+   API to get the all the tuples of the page that satisfies itempointer.
+  </para>

IIUC, "all the tuples of of the page" in the above sentence means all the
tuples in the HOT chain of a given heap tuple, making this description of
the API slightly specific to the heap AM. Can we make the description
more generic or is the API itself very specific that it cannot be
expressed in generic terms? Ignoring that for a moment, I think the
sentence contains more "the"s than there need to be, so maybe write as:

API to get all tuples on a given page that are linked to the tuple of the
given TID

Mmm. This is exposing MVCC matters to the index AM. I suppose we
should refactor this API.
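
To make the call_again/all_dead contract concrete, here is a sketch of the
calling loop (the table_fetch_follow wrapper name is an assumption based on
the callback name; in the heap AM, call_again drives the walk along a HOT
chain, and all_dead lets the caller tell the index the entry can be killed):

#include "postgres.h"
#include "access/tableam.h"

/* Sketch: return true as soon as a visible chain member is found. */
static bool
fetch_visible(struct IndexFetchTableData *scan, ItemPointer tid,
			  Snapshot snapshot, TupleTableSlot *slot, bool *all_dead)
{
	bool		call_again = false;

	do
	{
		if (table_fetch_follow(scan, tid, snapshot, slot,
							   &call_again, all_dead))
			return true;	/* slot now holds a visible tuple */
	} while (call_again);	/* the AM says the chain may continue */

	return false;
}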

+  <para>
+<programlisting>
+tuple_data
+get_tuple_data (TupleTableSlot *slot, tuple_data_flags flags);
+</programlisting>
+   API to return the internal structure members of the HeapTuple.
+  </para>

I think this description doesn't mention enough details of both the
information that needs to be specified when calling the function (what's
in flags) and the information that's returned.

(I suppose it will be described in later sections.)

+  <para>
+<programlisting>
+bool
+scan_analyze_next_tuple (TableScanDesc scan, TransactionId OldestXmin,
+                      double *liverows, double *deadrows, TupleTableSlot
*slot));
+</programlisting>
+   API to analyze the block and fill the buffered heap tuple in the slot
and also
+   provide the live and dead rows.
+  </para>

How about:

API to get the next tuple from the block being scanned, which also updates
the number of live and dead rows encountered

"live" and "dead" are MVCC terms. I suppose that we should stash
out the deadrows somwhere else. (But analyze code would need to
be modified if we do so.)
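
For reference, the ANALYZE-side loop this callback serves looks roughly like
the following sketch (the table_scan_analyze_next_tuple wrapper name follows
the callback; selecting the block beforehand is assumed to happen in a
companion callback):

#include "postgres.h"
#include "access/tableam.h"

/* Sketch: drain one previously selected block, counting live/dead rows. */
static void
analyze_one_block(TableScanDesc scan, TransactionId OldestXmin,
				  TupleTableSlot *slot,
				  double *liverows, double *deadrows)
{
	while (table_scan_analyze_next_tuple(scan, OldestXmin,
										 liverows, deadrows, slot))
	{
		/* ... feed the sampled tuple held in "slot" to the stats code ... */
	}
}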

+void
+scansetlimits (TableScanDesc sscan, BlockNumber startBlk, BlockNumber
numBlks);
+</programlisting>
+   API to fix the relation scan range limits.
+  </para>

How about:

API to set scan range endpoints

This sets the start point and the number of blocks. Just "API to set the
scan range" would be sufficient, referring to the parameter list.

+    <para>
+<programlisting>
+bool
+scan_bitmap_pagescan (TableScanDesc scan,
+                    TBMIterateResult *tbmres);
+</programlisting>
+   API to scan the relation and fill the scan description bitmap with
valid item pointers
+   for the specified block.
+  </para>

This says "to scan the relation", but seems to be concerned with only a
page worth of data as the name also says. Also, it's not clear what "scan
description bitmap" means. Maybe write as:

API to scan the relation block specified in the scan descriptor to collect
and return the tuples requested by the given bitmap

"API to collect the tuples in a page requested by the given
bitmpap scan result." something? I think detailed explanation
would be required apart from the one-line description. Anyway the
name TBMIterateResult doesn't seem proper to expose.
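
To show how the executor side might drive this, a sketch (tbm_iterate is the
existing bitmap iterator; the table_scan_bitmap_pagescan wrapper name is an
assumption based on the callback name):

#include "postgres.h"
#include "access/tableam.h"
#include "nodes/tidbitmap.h"

/* Sketch: let the AM collect each bitmap page's valid item pointers. */
static void
drive_bitmap_scan(TableScanDesc scan, TBMIterator *iterator)
{
	TBMIterateResult *tbmres;

	while ((tbmres = tbm_iterate(iterator)) != NULL)
	{
		/*
		 * A false return means the block yielded nothing, e.g. it lies
		 * past the end of the relation.
		 */
		if (!table_scan_bitmap_pagescan(scan, tbmres))
			continue;

		/* ... fetch the collected tuples into slots here ... */
	}
}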

+    <para>
+<programlisting>
+bool
+scan_sample_next_block (TableScanDesc scan, struct SampleScanState
*scanstate);
+</programlisting>
+   API to scan the relation and fill the scan description bitmap with
valid item pointers
+   for the specified block provided by the sample method.
+  </para>

Looking at the code, this API selects the next block using the sampling
method and nothing more, although I see that the heap AM implementation
also does heapgetpage thus collecting live tuples in the array known only
to heap AM. So, how about:

API to select the next block of the relation using the given sampling
method and set its information in the scan descriptor

"block" and "page" seems randomly choosed here and there. I don't
mind that seen in the core but..

+    <para>
+<programlisting>
+bool
+scan_sample_next_tuple (TableScanDesc scan, struct SampleScanState
*scanstate, TupleTableSlot *slot);
+</programlisting>
+   API to fill the buffered heap tuple data from the bitmap scanned item
pointers based on the sample
+   method and store it in the provided slot.
+  </para>

How about:

API to select the next tuple using the given sampling method from the set
of tuples collected from the block previously selected by the sampling method

I'm not sure "from the set of tuples collected" is true. Just
"the state of sample scan" or something wouldn't be fine?

+    <para>
+<programlisting>
+void
+scan_rescan (TableScanDesc scan, ScanKey key, bool set_params,
+             bool allow_strat, bool allow_sync, bool allow_pagemode);
+</programlisting>
+   API to restart the relation scan with provided data.
+  </para>

How about:

API to restart the given scan using provided options, releasing any
resources (such as buffer pins) already held by the scan

It looks too detailed to me, but "with provided data" looks too
coarse...

+  <para>
+<programlisting>
+void
+scan_update_snapshot (TableScanDesc scan, Snapshot snapshot);
+</programlisting>
+   API to update the relation scan with the new snapshot.
+  </para>

How about:

API to set the visibility snapshot to be used by a given scan

If so, the function name should be "scan_set_snapshot". Anyway, the
current name reads like "the function to update a snapshot (itself)".

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#57Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Andres Freund (#31)
Re: Pluggable Storage - Andres's take

Hello.

(in the next branch:)
At Tue, 27 Nov 2018 14:58:35 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <080ce65e-7b96-adbf-1c8c-7c88d87eaeda@lab.ntt.co.jp>

Thank you for working on this. Really looking forward to how this shapes
up. :)

+1.

I looked through the documentation part, since that is where I can help.

am.html:

61.1. Overview of Index access methods
61.1.1. Basic API Structure for Indexes
61.1.2. Index Access Method Functions
61.1.3. Index Scanning
61.2. Overview of Table access methods
61.2.1. Table access method API
61.2.2. Table Access Method Functions
61.2.3. Table scanning

Aren't 61.1 and 61.2 better in the reverse order?

Is there a reason for the difference between the titles of 61.1.1
and 61.2.1? The contents are quite similar.

+ <sect2 id="table-api">
+  <title>Table access method API</title>

The member names of the index AM struct begin with "am", but the table AM
members don't have a unified prefix. It seems a bit
inconsistent. Perhaps we should rename some of the long and
internal names.

+ <sect2 id="table-functions">
+  <title>Table Access Method Functions</title>

Table AM functions are far finer-grained than the index AM's. I think
AM developers need a more concrete description of what every API
function does, and an explanation of the various previously-internal
structs.

I suppose that how the functions are used in core code paths will
be written in the following sections.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#58Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#53)
Re: Pluggable Storage - Andres's take

On Tue, Dec 11, 2018 at 12:47 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

Thanks for these changes. I've merged a good chunk of them.

Thanks.

On 2018-11-16 12:05:26 +1100, Haribabu Kommi wrote:

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91f..3254e30a45 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1741,7 +1741,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
 	Page		targpage;
-	OffsetNumber targoffset = scan->rs_cindex;
+	OffsetNumber targoffset;
 	OffsetNumber maxoffset;
 	BufferHeapTupleTableSlot *hslot;
@@ -1751,7 +1751,9 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 	maxoffset = PageGetMaxOffsetNumber(targpage);
 
 	/* Inner loop over all tuples on the selected page */
-	for (targoffset = scan->rs_cindex; targoffset <= maxoffset; targoffset++)
+	for (targoffset = scan->rs_cindex ? scan->rs_cindex : FirstOffsetNumber;
+		 targoffset <= maxoffset;
+		 targoffset++)
 	{
 		ItemId		itemid;
 		HeapTuple	targtuple = &hslot->base.tupdata;

I thought it was better to fix the initialization for rs_cindex - any
reason you didn't go for that?

No specific reason. Thanks for the correction.

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 8233475aa0..7bad246f55 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1838,8 +1838,10 @@ HeapTupleSatisfies(HeapTuple stup, Snapshot snapshot, Buffer buffer)
 		case NON_VACUUMABLE_VISIBILTY:
 			return HeapTupleSatisfiesNonVacuumable(stup, snapshot, buffer);
 			break;
-		default:
+		case END_OF_VISIBILITY:
 			Assert(0);
 			break;
 	}
+
+	return false; /* keep compiler quiet */

I don't understand why END_OF_VISIBILITY is a good idea? I have now
removed END_OF_VISIBILITY, and the default case.

OK.

@@ -593,6 +594,10 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 	if (myState->rel->rd_rel->relhasoids)
 		slot->tts_tupleOid = InvalidOid;
 
+	/* Materialize the slot */
+	if (!TTS_IS_VIRTUAL(slot))
+		ExecMaterializeSlot(slot);
+
 	table_insert(myState->rel,
 				 slot,
 				 myState->output_cid,

What's the point of adding materialization here?

In earlier testing I observed that the received slot is a buffered slot
pointing at the original tuple; when it is inserted into the new table,
the transaction id changes and the tuple becomes invisible. That is why I
added the materialization here.

@@ -570,6 +563,9 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 			Assert(TTS_IS_HEAPTUPLE(scanslot) ||
 				   TTS_IS_BUFFERTUPLE(scanslot));
 
+			if (hslot->tuple == NULL)
+				ExecMaterializeSlot(scanslot);
+
 			d = heap_getsysattr(hslot->tuple, attnum,
 								scanslot->tts_tupleDescriptor,
 								op->resnull);

Same?

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e055c0a7c6..34ef86a5bd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2594,7 +2594,7 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
 	 * datums that may be present in copyTuple).  As with the next step, this
 	 * is to guard against early re-use of the EPQ query.
 	 */
-	if (!TupIsNull(slot))
+	if (!TupIsNull(slot) && !TTS_IS_VIRTUAL(slot))
 		ExecMaterializeSlot(slot);

Same?

Earlier, materializing a virtual tuple was throwing an error; that is
why I added that check.

index 56880e3d16..36ca07beb2 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -224,6 +224,18 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			BitmapAdjustPrefetchIterator(node, tbmres);
 
+			/*
+			 * Ignore any claimed entries past what we think is the end of the
+			 * relation. (This is probably not necessary given that we got at
+			 * least AccessShareLock on the table before performing any of the
+			 * indexscans, but let's be safe.)
+			 */
+			if (tbmres->blockno >= scan->rs_nblocks)
+			{
+				node->tbmres = tbmres = NULL;
+				continue;
+			}
+

I moved this into the storage engine; there was just a minor bug
preventing the already existing check from taking effect. I don't think
we should expose this kind of thing outside of the storage
engine.

OK.

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 54382aba88..ea48e1d6e8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4037,7 +4037,6 @@ CreateStatsStmt:
  *
  *****************************************************************************/
 
-// PBORKED: storage option
 CreateAsStmt:
 		CREATE OptTemp TABLE create_as_target AS SelectStmt opt_with_data
 			{
@@ -4068,14 +4067,16 @@ CreateAsStmt:
 		;
 
 create_as_target:
-		qualified_name opt_column_list OptWith OnCommitOption OptTableSpace
+		qualified_name opt_column_list table_access_method_clause
+		OptWith OnCommitOption OptTableSpace
 			{
 				$$ = makeNode(IntoClause);
 				$$->rel = $1;
 				$$->colNames = $2;
-				$$->options = $3;
-				$$->onCommit = $4;
-				$$->tableSpaceName = $5;
+				$$->accessMethod = $3;
+				$$->options = $4;
+				$$->onCommit = $5;
+				$$->tableSpaceName = $6;
 				$$->viewQuery = NULL;
 				$$->skipData = false;	/* might get changed later */
 			}
@@ -4125,14 +4126,15 @@ CreateMatViewStmt:
 		;
 
 create_mv_target:
-		qualified_name opt_column_list opt_reloptions OptTableSpace
+		qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
 			{
 				$$ = makeNode(IntoClause);
 				$$->rel = $1;
 				$$->colNames = $2;
-				$$->options = $3;
+				$$->accessMethod = $3;
+				$$->options = $4;
 				$$->onCommit = ONCOMMIT_NOOP;
-				$$->tableSpaceName = $4;
+				$$->tableSpaceName = $5;
 				$$->viewQuery = NULL;	/* filled at analysis time */
 				$$->skipData = false;	/* might get changed later */
 			}

Cool. I wonder if we should also somehow support SELECT INTO w/ USING?
You've apparently started to do so with?

I thought the same, but SELECT INTO is deprecated syntax; is it fine to
add the new syntax to it?

Regards,
Haribabu Kommi
Fujitsu Australia

#59Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Haribabu Kommi (#58)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Tue, Dec 11, 2018 at 3:13 AM Andres Freund <andres@anarazel.de> wrote:

Further tasks I'm not yet planning to tackle, that I'd welcome help on:
- pg_dump support
- pg_upgrade testing
- I think we should consider removing HeapTuple->t_tableOid, it should
imo live entirely in the slot

I'm a bit confused, but what kind of pg_dump support are you talking about?
After a quick glance I don't see any table access specific logic there so far.
To check it I've created a test access method (which is a copy of heap, but
with some small differences) and pg_dump worked as expected.

As a side note, in a table description I haven't found any mention of which
access method is used for the table; it's probably useful to show that with \d+
(see the attached patch).

Attachments:

describe_am.patchapplication/octet-stream; name=describe_am.patchDownload
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 0a181b01d9..a292c531b5 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1484,6 +1484,7 @@ describeOneTableDetails(const char *schemaname,
 		char	   *reloftype;
 		char		relpersistence;
 		char		relreplident;
+		char	   *relam;
 	}			tableinfo;
 	bool		show_column_details = false;
 
@@ -1503,9 +1504,10 @@ describeOneTableDetails(const char *schemaname,
 						  "c.relhastriggers, c.relrowsecurity, c.relforcerowsecurity, "
 						  "false AS relhasoids, %s, c.reltablespace, "
 						  "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, "
-						  "c.relpersistence, c.relreplident\n"
+						  "c.relpersistence, c.relreplident, am.amname\n"
 						  "FROM pg_catalog.pg_class c\n "
 						  "LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
+						  "LEFT JOIN pg_catalog.pg_am am ON (c.relam = am.oid)\n"
 						  "WHERE c.oid = '%s';",
 						  (verbose ?
 						   "pg_catalog.array_to_string(c.reloptions || "
@@ -1656,6 +1658,8 @@ describeOneTableDetails(const char *schemaname,
 		*(PQgetvalue(res, 0, 11)) : 0;
 	tableinfo.relreplident = (pset.sversion >= 90400) ?
 		*(PQgetvalue(res, 0, 12)) : 'd';
+	tableinfo.relam = (pset.sversion >= 120000) ?
+		pg_strdup(PQgetvalue(res, 0, 13)) : NULL;
 	PQclear(res);
 	res = NULL;
 
@@ -3141,6 +3145,15 @@ describeOneTableDetails(const char *schemaname,
 		/* Tablespace info */
 		add_tablespace_footer(&cont, tableinfo.relkind, tableinfo.tablespace,
 							  true);
+
+		/* Access method info */
+		if (pset.sversion >= 120000 && verbose)
+		{
+			printfPQExpBuffer(&buf, _("Access method: %s"), tableinfo.relam);
+			printTableAddFooter(&cont, buf.data);
+		}
+
+
 	}
 
 	/* reloptions, if verbose */
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 19bb538411..84d182303e 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -432,6 +432,7 @@ alter table check_con_tbl add check (check_con_function(check_con_tbl.*));
  f1     | integer |           |          |         | plain   |              | 
 Check constraints:
     "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+Access method: heap
 
 copy check_con_tbl from stdin;
 NOTICE:  input = {"f1":1}
diff --git a/src/test/regress/expected/create_table.out b/src/test/regress/expected/create_table.out
index 7e52c27e3f..dbaa713e6b 100644
--- a/src/test/regress/expected/create_table.out
+++ b/src/test/regress/expected/create_table.out
@@ -438,6 +438,7 @@ Number of partitions: 0
  b      | text    |           |          |         | extended |              | 
 Partition key: RANGE (((a + 1)), substr(b, 1, 5))
 Number of partitions: 0
+Access method: heap
 
 INSERT INTO partitioned2 VALUES (1, 'hello');
 ERROR:  no partition of relation "partitioned2" found for row
@@ -451,6 +452,7 @@ CREATE TABLE part2_1 PARTITION OF partitioned2 FOR VALUES FROM (-1, 'aaaaa') TO
  b      | text    |           |          |         | extended |              | 
 Partition of: partitioned2 FOR VALUES FROM ('-1', 'aaaaa') TO (100, 'ccccc')
 Partition constraint: (((a + 1) IS NOT NULL) AND (substr(b, 1, 5) IS NOT NULL) AND (((a + 1) > '-1'::integer) OR (((a + 1) = '-1'::integer) AND (substr(b, 1, 5) >= 'aaaaa'::text))) AND (((a + 1) < 100) OR (((a + 1) = 100) AND (substr(b, 1, 5) < 'ccccc'::text))))
+Access method: heap
 
 DROP TABLE partitioned, partitioned2;
 --
@@ -783,6 +785,7 @@ drop table parted_collate_must_match;
  b      | integer |           | not null | 1       | plain    |              | 
 Partition of: parted FOR VALUES IN ('b')
 Partition constraint: ((a IS NOT NULL) AND (a = 'b'::text))
+Access method: heap
 
 -- Both partition bound and partition key in describe output
 \d+ part_c
@@ -795,6 +798,7 @@ Partition of: parted FOR VALUES IN ('c')
 Partition constraint: ((a IS NOT NULL) AND (a = 'c'::text))
 Partition key: RANGE (b)
 Partitions: part_c_1_10 FOR VALUES FROM (1) TO (10)
+Access method: heap
 
 -- a level-2 partition's constraint will include the parent's expressions
 \d+ part_c_1_10
@@ -805,6 +809,7 @@ Partitions: part_c_1_10 FOR VALUES FROM (1) TO (10)
  b      | integer |           | not null | 0       | plain    |              | 
 Partition of: part_c FOR VALUES FROM (1) TO (10)
 Partition constraint: ((a IS NOT NULL) AND (a = 'c'::text) AND (b IS NOT NULL) AND (b >= 1) AND (b < 10))
+Access method: heap
 
 -- Show partition count in the parent's describe output
 -- Tempted to include \d+ output listing partitions with bound info but
@@ -839,6 +844,7 @@ CREATE TABLE unbounded_range_part PARTITION OF range_parted4 FOR VALUES FROM (MI
  c      | integer |           |          |         | plain   |              | 
 Partition of: range_parted4 FOR VALUES FROM (MINVALUE, MINVALUE, MINVALUE) TO (MAXVALUE, MAXVALUE, MAXVALUE)
 Partition constraint: ((abs(a) IS NOT NULL) AND (abs(b) IS NOT NULL) AND (c IS NOT NULL))
+Access method: heap
 
 DROP TABLE unbounded_range_part;
 CREATE TABLE range_parted4_1 PARTITION OF range_parted4 FOR VALUES FROM (MINVALUE, MINVALUE, MINVALUE) TO (1, MAXVALUE, MAXVALUE);
@@ -851,6 +857,7 @@ CREATE TABLE range_parted4_1 PARTITION OF range_parted4 FOR VALUES FROM (MINVALU
  c      | integer |           |          |         | plain   |              | 
 Partition of: range_parted4 FOR VALUES FROM (MINVALUE, MINVALUE, MINVALUE) TO (1, MAXVALUE, MAXVALUE)
 Partition constraint: ((abs(a) IS NOT NULL) AND (abs(b) IS NOT NULL) AND (c IS NOT NULL) AND (abs(a) <= 1))
+Access method: heap
 
 CREATE TABLE range_parted4_2 PARTITION OF range_parted4 FOR VALUES FROM (3, 4, 5) TO (6, 7, MAXVALUE);
 \d+ range_parted4_2
@@ -862,6 +869,7 @@ CREATE TABLE range_parted4_2 PARTITION OF range_parted4 FOR VALUES FROM (3, 4, 5
  c      | integer |           |          |         | plain   |              | 
 Partition of: range_parted4 FOR VALUES FROM (3, 4, 5) TO (6, 7, MAXVALUE)
 Partition constraint: ((abs(a) IS NOT NULL) AND (abs(b) IS NOT NULL) AND (c IS NOT NULL) AND ((abs(a) > 3) OR ((abs(a) = 3) AND (abs(b) > 4)) OR ((abs(a) = 3) AND (abs(b) = 4) AND (c >= 5))) AND ((abs(a) < 6) OR ((abs(a) = 6) AND (abs(b) <= 7))))
+Access method: heap
 
 CREATE TABLE range_parted4_3 PARTITION OF range_parted4 FOR VALUES FROM (6, 8, MINVALUE) TO (9, MAXVALUE, MAXVALUE);
 \d+ range_parted4_3
@@ -873,6 +881,7 @@ CREATE TABLE range_parted4_3 PARTITION OF range_parted4 FOR VALUES FROM (6, 8, M
  c      | integer |           |          |         | plain   |              | 
 Partition of: range_parted4 FOR VALUES FROM (6, 8, MINVALUE) TO (9, MAXVALUE, MAXVALUE)
 Partition constraint: ((abs(a) IS NOT NULL) AND (abs(b) IS NOT NULL) AND (c IS NOT NULL) AND ((abs(a) > 6) OR ((abs(a) = 6) AND (abs(b) >= 8))) AND (abs(a) <= 9))
+Access method: heap
 
 DROP TABLE range_parted4;
 -- user-defined operator class in partition key
@@ -909,6 +918,7 @@ SELECT obj_description('parted_col_comment'::regclass);
  b      | text    |           |          |         | extended |              | 
 Partition key: LIST (a)
 Number of partitions: 0
+Access method: heap
 
 DROP TABLE parted_col_comment;
 -- list partitioning on array type column
@@ -921,6 +931,7 @@ CREATE TABLE arrlp12 PARTITION OF arrlp FOR VALUES IN ('{1}', '{2}');
  a      | integer[] |           |          |         | extended |              | 
 Partition of: arrlp FOR VALUES IN ('{1}', '{2}')
 Partition constraint: ((a IS NOT NULL) AND ((a = '{1}'::integer[]) OR (a = '{2}'::integer[])))
+Access method: heap
 
 DROP TABLE arrlp;
 -- partition on boolean column
@@ -935,6 +946,7 @@ create table boolspart_f partition of boolspart for values in (false);
 Partition key: LIST (a)
 Partitions: boolspart_f FOR VALUES IN (false),
             boolspart_t FOR VALUES IN (true)
+Access method: heap
 
 drop table boolspart;
 -- partitions mixing temporary and permanent relations
diff --git a/src/test/regress/expected/create_table_like.out b/src/test/regress/expected/create_table_like.out
index b582211270..951d876216 100644
--- a/src/test/regress/expected/create_table_like.out
+++ b/src/test/regress/expected/create_table_like.out
@@ -164,6 +164,7 @@ CREATE TABLE ctlt12_storage (LIKE ctlt1 INCLUDING STORAGE, LIKE ctlt2 INCLUDING
  a      | text |           | not null |         | main     |              | 
  b      | text |           |          |         | extended |              | 
  c      | text |           |          |         | external |              | 
+Access method: heap
 
 CREATE TABLE ctlt12_comments (LIKE ctlt1 INCLUDING COMMENTS, LIKE ctlt2 INCLUDING COMMENTS);
 \d+ ctlt12_comments
@@ -173,6 +174,7 @@ CREATE TABLE ctlt12_comments (LIKE ctlt1 INCLUDING COMMENTS, LIKE ctlt2 INCLUDIN
  a      | text |           | not null |         | extended |              | A
  b      | text |           |          |         | extended |              | B
  c      | text |           |          |         | extended |              | C
+Access method: heap
 
 CREATE TABLE ctlt1_inh (LIKE ctlt1 INCLUDING CONSTRAINTS INCLUDING COMMENTS) INHERITS (ctlt1);
 NOTICE:  merging column "a" with inherited definition
@@ -187,6 +189,7 @@ NOTICE:  merging constraint "ctlt1_a_check" with inherited definition
 Check constraints:
     "ctlt1_a_check" CHECK (length(a) > 2)
 Inherits: ctlt1
+Access method: heap
 
 SELECT description FROM pg_description, pg_constraint c WHERE classoid = 'pg_constraint'::regclass AND objoid = c.oid AND c.conrelid = 'ctlt1_inh'::regclass;
  description 
@@ -208,6 +211,7 @@ Check constraints:
     "ctlt3_a_check" CHECK (length(a) < 5)
 Inherits: ctlt1,
           ctlt3
+Access method: heap
 
 CREATE TABLE ctlt13_like (LIKE ctlt3 INCLUDING CONSTRAINTS INCLUDING COMMENTS INCLUDING STORAGE) INHERITS (ctlt1);
 NOTICE:  merging column "a" with inherited definition
@@ -222,6 +226,7 @@ Check constraints:
     "ctlt1_a_check" CHECK (length(a) > 2)
     "ctlt3_a_check" CHECK (length(a) < 5)
 Inherits: ctlt1
+Access method: heap
 
 SELECT description FROM pg_description, pg_constraint c WHERE classoid = 'pg_constraint'::regclass AND objoid = c.oid AND c.conrelid = 'ctlt13_like'::regclass;
  description 
@@ -244,6 +249,7 @@ Check constraints:
     "ctlt1_a_check" CHECK (length(a) > 2)
 Statistics objects:
     "public"."ctlt_all_a_b_stat" (ndistinct, dependencies) ON a, b FROM ctlt_all
+Access method: heap
 
 SELECT c.relname, objsubid, description FROM pg_description, pg_index i, pg_class c WHERE classoid = 'pg_class'::regclass AND objoid = i.indexrelid AND c.oid = i.indexrelid AND i.indrelid = 'ctlt_all'::regclass ORDER BY c.relname, objsubid;
     relname     | objsubid | description 
diff --git a/src/test/regress/expected/domain.out b/src/test/regress/expected/domain.out
index 0b5a9041b0..976fd7446f 100644
--- a/src/test/regress/expected/domain.out
+++ b/src/test/regress/expected/domain.out
@@ -282,6 +282,7 @@ Rules:
     silly AS
     ON DELETE TO dcomptable DO INSTEAD  UPDATE dcomptable SET d1.r = (dcomptable.d1).r - 1::double precision, d1.i = (dcomptable.d1).i + 1::double precision
   WHERE (dcomptable.d1).i > 0::double precision
+Access method: heap
 
 drop table dcomptable;
 drop type comptype cascade;
@@ -419,6 +420,7 @@ Rules:
     silly AS
     ON DELETE TO dcomptable DO INSTEAD  UPDATE dcomptable SET d1[1].r = dcomptable.d1[1].r - 1::double precision, d1[1].i = dcomptable.d1[1].i + 1::double precision
   WHERE dcomptable.d1[1].i > 0::double precision
+Access method: heap
 
 drop table dcomptable;
 drop type comptype cascade;
diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out
index 4d82d3a7e8..94ab874d75 100644
--- a/src/test/regress/expected/foreign_data.out
+++ b/src/test/regress/expected/foreign_data.out
@@ -731,6 +731,7 @@ Check constraints:
     "ft1_c3_check" CHECK (c3 >= '01-01-1994'::date AND c3 <= '01-31-1994'::date)
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method: 
 
 \det+
                                  List of foreign tables
@@ -800,6 +801,7 @@ Check constraints:
     "ft1_c3_check" CHECK (c3 >= '01-01-1994'::date AND c3 <= '01-31-1994'::date)
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method: 
 
 -- can't change the column type if it's used elsewhere
 CREATE TABLE use_ft1_column_type (x ft1);
@@ -1339,6 +1341,7 @@ CREATE FOREIGN TABLE ft2 () INHERITS (fd_pt1)
  c2     | text    |           |          |         | extended |              | 
  c3     | date    |           |          |         | plain    |              | 
 Child tables: ft2
+Access method: heap
 
 \d+ ft2
                                        Foreign table "public.ft2"
@@ -1350,6 +1353,7 @@ Child tables: ft2
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
 Inherits: fd_pt1
+Access method: 
 
 DROP FOREIGN TABLE ft2;
 \d+ fd_pt1
@@ -1359,6 +1363,7 @@ DROP FOREIGN TABLE ft2;
  c1     | integer |           | not null |         | plain    |              | 
  c2     | text    |           |          |         | extended |              | 
  c3     | date    |           |          |         | plain    |              | 
+Access method: heap
 
 CREATE FOREIGN TABLE ft2 (
 	c1 integer NOT NULL,
@@ -1374,6 +1379,7 @@ CREATE FOREIGN TABLE ft2 (
  c3     | date    |           |          |         |             | plain    |              | 
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method: 
 
 ALTER FOREIGN TABLE ft2 INHERIT fd_pt1;
 \d+ fd_pt1
@@ -1384,6 +1390,7 @@ ALTER FOREIGN TABLE ft2 INHERIT fd_pt1;
  c2     | text    |           |          |         | extended |              | 
  c3     | date    |           |          |         | plain    |              | 
 Child tables: ft2
+Access method: heap
 
 \d+ ft2
                                        Foreign table "public.ft2"
@@ -1395,6 +1402,7 @@ Child tables: ft2
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
 Inherits: fd_pt1
+Access method: 
 
 CREATE TABLE ct3() INHERITS(ft2);
 CREATE FOREIGN TABLE ft3 (
@@ -1418,6 +1426,7 @@ FDW options: (delimiter ',', quote '"', "be quoted" 'value')
 Inherits: fd_pt1
 Child tables: ct3,
               ft3
+Access method: 
 
 \d+ ct3
                                     Table "public.ct3"
@@ -1427,6 +1436,7 @@ Child tables: ct3,
  c2     | text    |           |          |         | extended |              | 
  c3     | date    |           |          |         | plain    |              | 
 Inherits: ft2
+Access method: heap
 
 \d+ ft3
                                        Foreign table "public.ft3"
@@ -1437,6 +1447,7 @@ Inherits: ft2
  c3     | date    |           |          |         |             | plain    |              | 
 Server: s0
 Inherits: ft2
+Access method: 
 
 -- add attributes recursively
 ALTER TABLE fd_pt1 ADD COLUMN c4 integer;
@@ -1457,6 +1468,7 @@ ALTER TABLE fd_pt1 ADD COLUMN c8 integer;
  c7     | integer |           | not null |         | plain    |              | 
  c8     | integer |           |          |         | plain    |              | 
 Child tables: ft2
+Access method: heap
 
 \d+ ft2
                                        Foreign table "public.ft2"
@@ -1475,6 +1487,7 @@ FDW options: (delimiter ',', quote '"', "be quoted" 'value')
 Inherits: fd_pt1
 Child tables: ct3,
               ft3
+Access method: 
 
 \d+ ct3
                                     Table "public.ct3"
@@ -1489,6 +1502,7 @@ Child tables: ct3,
  c7     | integer |           | not null |         | plain    |              | 
  c8     | integer |           |          |         | plain    |              | 
 Inherits: ft2
+Access method: heap
 
 \d+ ft3
                                        Foreign table "public.ft3"
@@ -1504,6 +1518,7 @@ Inherits: ft2
  c8     | integer |           |          |         |             | plain    |              | 
 Server: s0
 Inherits: ft2
+Access method: 
 
 -- alter attributes recursively
 ALTER TABLE fd_pt1 ALTER COLUMN c4 SET DEFAULT 0;
@@ -1531,6 +1546,7 @@ ALTER TABLE fd_pt1 ALTER COLUMN c8 SET STORAGE EXTERNAL;
  c7     | integer |           |          |         | plain    |              | 
  c8     | text    |           |          |         | external |              | 
 Child tables: ft2
+Access method: heap
 
 \d+ ft2
                                        Foreign table "public.ft2"
@@ -1549,6 +1565,7 @@ FDW options: (delimiter ',', quote '"', "be quoted" 'value')
 Inherits: fd_pt1
 Child tables: ct3,
               ft3
+Access method: 
 
 -- drop attributes recursively
 ALTER TABLE fd_pt1 DROP COLUMN c4;
@@ -1564,6 +1581,7 @@ ALTER TABLE fd_pt1 DROP COLUMN c8;
  c2     | text    |           |          |         | extended |              | 
  c3     | date    |           |          |         | plain    |              | 
 Child tables: ft2
+Access method: heap
 
 \d+ ft2
                                        Foreign table "public.ft2"
@@ -1577,6 +1595,7 @@ FDW options: (delimiter ',', quote '"', "be quoted" 'value')
 Inherits: fd_pt1
 Child tables: ct3,
               ft3
+Access method: 
 
 -- add constraints recursively
 ALTER TABLE fd_pt1 ADD CONSTRAINT fd_pt1chk1 CHECK (c1 > 0) NO INHERIT;
@@ -1604,6 +1623,7 @@ Check constraints:
     "fd_pt1chk1" CHECK (c1 > 0) NO INHERIT
     "fd_pt1chk2" CHECK (c2 <> ''::text)
 Child tables: ft2
+Access method: heap
 
 \d+ ft2
                                        Foreign table "public.ft2"
@@ -1619,6 +1639,7 @@ FDW options: (delimiter ',', quote '"', "be quoted" 'value')
 Inherits: fd_pt1
 Child tables: ct3,
               ft3
+Access method: 
 
 \set VERBOSITY terse
 DROP FOREIGN TABLE ft2; -- ERROR
@@ -1648,6 +1669,7 @@ Check constraints:
     "fd_pt1chk1" CHECK (c1 > 0) NO INHERIT
     "fd_pt1chk2" CHECK (c2 <> ''::text)
 Child tables: ft2
+Access method: heap
 
 \d+ ft2
                                        Foreign table "public.ft2"
@@ -1661,6 +1683,7 @@ Check constraints:
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
 Inherits: fd_pt1
+Access method: 
 
 -- drop constraints recursively
 ALTER TABLE fd_pt1 DROP CONSTRAINT fd_pt1chk1 CASCADE;
@@ -1678,6 +1701,7 @@ ALTER TABLE fd_pt1 ADD CONSTRAINT fd_pt1chk3 CHECK (c2 <> '') NOT VALID;
 Check constraints:
     "fd_pt1chk3" CHECK (c2 <> ''::text) NOT VALID
 Child tables: ft2
+Access method: heap
 
 \d+ ft2
                                        Foreign table "public.ft2"
@@ -1692,6 +1716,7 @@ Check constraints:
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
 Inherits: fd_pt1
+Access method: 
 
 -- VALIDATE CONSTRAINT need do nothing on foreign tables
 ALTER TABLE fd_pt1 VALIDATE CONSTRAINT fd_pt1chk3;
@@ -1705,6 +1730,7 @@ ALTER TABLE fd_pt1 VALIDATE CONSTRAINT fd_pt1chk3;
 Check constraints:
     "fd_pt1chk3" CHECK (c2 <> ''::text)
 Child tables: ft2
+Access method: heap
 
 \d+ ft2
                                        Foreign table "public.ft2"
@@ -1719,6 +1745,7 @@ Check constraints:
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
 Inherits: fd_pt1
+Access method: 
 
 -- changes name of an attribute recursively
 ALTER TABLE fd_pt1 RENAME COLUMN c1 TO f1;
@@ -1736,6 +1763,7 @@ ALTER TABLE fd_pt1 RENAME CONSTRAINT fd_pt1chk3 TO f2_check;
 Check constraints:
     "f2_check" CHECK (f2 <> ''::text)
 Child tables: ft2
+Access method: heap
 
 \d+ ft2
                                        Foreign table "public.ft2"
@@ -1750,6 +1778,7 @@ Check constraints:
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
 Inherits: fd_pt1
+Access method: 
 
 -- TRUNCATE doesn't work on foreign tables, either directly or recursively
 TRUNCATE ft2;  -- ERROR
@@ -1799,6 +1828,7 @@ CREATE FOREIGN TABLE fd_pt2_1 PARTITION OF fd_pt2 FOR VALUES IN (1)
  c3     | date    |           |          |         | plain    |              | 
 Partition key: LIST (c1)
 Partitions: fd_pt2_1 FOR VALUES IN (1)
+Access method: heap
 
 \d+ fd_pt2_1
                                      Foreign table "public.fd_pt2_1"
@@ -1811,6 +1841,7 @@ Partition of: fd_pt2 FOR VALUES IN (1)
 Partition constraint: ((c1 IS NOT NULL) AND (c1 = 1))
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method: 
 
 -- partition cannot have additional columns
 DROP FOREIGN TABLE fd_pt2_1;
@@ -1830,6 +1861,7 @@ CREATE FOREIGN TABLE fd_pt2_1 (
  c4     | character(1) |           |          |         |             | extended |              | 
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method: 
 
 ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1);       -- ERROR
 ERROR:  table "fd_pt2_1" contains column "c4" not found in parent "fd_pt2"
@@ -1844,6 +1876,7 @@ DROP FOREIGN TABLE fd_pt2_1;
  c3     | date    |           |          |         | plain    |              | 
 Partition key: LIST (c1)
 Number of partitions: 0
+Access method: heap
 
 CREATE FOREIGN TABLE fd_pt2_1 (
 	c1 integer NOT NULL,
@@ -1859,6 +1892,7 @@ CREATE FOREIGN TABLE fd_pt2_1 (
  c3     | date    |           |          |         |             | plain    |              | 
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method: 
 
 -- no attach partition validation occurs for foreign tables
 ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1);
@@ -1871,6 +1905,7 @@ ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1);
  c3     | date    |           |          |         | plain    |              | 
 Partition key: LIST (c1)
 Partitions: fd_pt2_1 FOR VALUES IN (1)
+Access method: heap
 
 \d+ fd_pt2_1
                                      Foreign table "public.fd_pt2_1"
@@ -1883,6 +1918,7 @@ Partition of: fd_pt2 FOR VALUES IN (1)
 Partition constraint: ((c1 IS NOT NULL) AND (c1 = 1))
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method: 
 
 -- cannot add column to a partition
 ALTER TABLE fd_pt2_1 ADD c4 char;
@@ -1899,6 +1935,7 @@ ALTER TABLE fd_pt2_1 ADD CONSTRAINT p21chk CHECK (c2 <> '');
  c3     | date    |           |          |         | plain    |              | 
 Partition key: LIST (c1)
 Partitions: fd_pt2_1 FOR VALUES IN (1)
+Access method: heap
 
 \d+ fd_pt2_1
                                      Foreign table "public.fd_pt2_1"
@@ -1913,6 +1950,7 @@ Check constraints:
     "p21chk" CHECK (c2 <> ''::text)
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method: 
 
 -- cannot drop inherited NOT NULL constraint from a partition
 ALTER TABLE fd_pt2_1 ALTER c1 DROP NOT NULL;
@@ -1929,6 +1967,7 @@ ALTER TABLE fd_pt2 ALTER c2 SET NOT NULL;
  c3     | date    |           |          |         | plain    |              | 
 Partition key: LIST (c1)
 Number of partitions: 0
+Access method: heap
 
 \d+ fd_pt2_1
                                      Foreign table "public.fd_pt2_1"
@@ -1941,6 +1980,7 @@ Check constraints:
     "p21chk" CHECK (c2 <> ''::text)
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method: 
 
 ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1);       -- ERROR
 ERROR:  column "c2" in child table must be marked NOT NULL
@@ -1959,6 +1999,7 @@ Partition key: LIST (c1)
 Check constraints:
     "fd_pt2chk1" CHECK (c1 > 0)
 Number of partitions: 0
+Access method: heap
 
 \d+ fd_pt2_1
                                      Foreign table "public.fd_pt2_1"
@@ -1971,6 +2012,7 @@ Check constraints:
     "p21chk" CHECK (c2 <> ''::text)
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method: 
 
 ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1);       -- ERROR
 ERROR:  child table is missing constraint "fd_pt2chk1"
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index f259d07535..7bfc11c770 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1001,6 +1001,7 @@ ALTER TABLE inhts RENAME d TO dd;
  dd     | integer |           |          |         | plain   |              | 
 Inherits: inht1,
           inhs1
+Access method: heap
 
 DROP TABLE inhts;
 -- Test for renaming in diamond inheritance
@@ -1021,6 +1022,7 @@ ALTER TABLE inht1 RENAME aa TO aaa;
  z      | integer |           |          |         | plain   |              | 
 Inherits: inht2,
           inht3
+Access method: heap
 
 CREATE TABLE inhts (d int) INHERITS (inht2, inhs1);
 NOTICE:  merging multiple inherited definitions of column "b"
@@ -1038,6 +1040,7 @@ ERROR:  cannot rename inherited column "b"
  d      | integer |           |          |         | plain   |              | 
 Inherits: inht2,
           inhs1
+Access method: heap
 
 WITH RECURSIVE r AS (
   SELECT 'inht1'::regclass AS inhrelid
@@ -1084,6 +1087,7 @@ CREATE TABLE test_constraints_inh () INHERITS (test_constraints);
 Indexes:
     "test_constraints_val1_val2_key" UNIQUE CONSTRAINT, btree (val1, val2)
 Child tables: test_constraints_inh
+Access method: heap
 
 ALTER TABLE ONLY test_constraints DROP CONSTRAINT test_constraints_val1_val2_key;
 \d+ test_constraints
@@ -1094,6 +1098,7 @@ ALTER TABLE ONLY test_constraints DROP CONSTRAINT test_constraints_val1_val2_key
  val1   | character varying |           |          |         | extended |              | 
  val2   | integer           |           |          |         | plain    |              | 
 Child tables: test_constraints_inh
+Access method: heap
 
 \d+ test_constraints_inh
                                  Table "public.test_constraints_inh"
@@ -1103,6 +1108,7 @@ Child tables: test_constraints_inh
  val1   | character varying |           |          |         | extended |              | 
  val2   | integer           |           |          |         | plain    |              | 
 Inherits: test_constraints
+Access method: heap
 
 DROP TABLE test_constraints_inh;
 DROP TABLE test_constraints;
@@ -1119,6 +1125,7 @@ CREATE TABLE test_ex_constraints_inh () INHERITS (test_ex_constraints);
 Indexes:
     "test_ex_constraints_c_excl" EXCLUDE USING gist (c WITH &&)
 Child tables: test_ex_constraints_inh
+Access method: heap
 
 ALTER TABLE test_ex_constraints DROP CONSTRAINT test_ex_constraints_c_excl;
 \d+ test_ex_constraints
@@ -1127,6 +1134,7 @@ ALTER TABLE test_ex_constraints DROP CONSTRAINT test_ex_constraints_c_excl;
 --------+--------+-----------+----------+---------+---------+--------------+-------------
  c      | circle |           |          |         | plain   |              | 
 Child tables: test_ex_constraints_inh
+Access method: heap
 
 \d+ test_ex_constraints_inh
                          Table "public.test_ex_constraints_inh"
@@ -1134,6 +1142,7 @@ Child tables: test_ex_constraints_inh
 --------+--------+-----------+----------+---------+---------+--------------+-------------
  c      | circle |           |          |         | plain   |              | 
 Inherits: test_ex_constraints
+Access method: heap
 
 DROP TABLE test_ex_constraints_inh;
 DROP TABLE test_ex_constraints;
@@ -1150,6 +1159,7 @@ Indexes:
     "test_primary_constraints_pkey" PRIMARY KEY, btree (id)
 Referenced by:
     TABLE "test_foreign_constraints" CONSTRAINT "test_foreign_constraints_id1_fkey" FOREIGN KEY (id1) REFERENCES test_primary_constraints(id)
+Access method: heap
 
 \d+ test_foreign_constraints
                          Table "public.test_foreign_constraints"
@@ -1159,6 +1169,7 @@ Referenced by:
 Foreign-key constraints:
     "test_foreign_constraints_id1_fkey" FOREIGN KEY (id1) REFERENCES test_primary_constraints(id)
 Child tables: test_foreign_constraints_inh
+Access method: heap
 
 ALTER TABLE test_foreign_constraints DROP CONSTRAINT test_foreign_constraints_id1_fkey;
 \d+ test_foreign_constraints
@@ -1167,6 +1178,7 @@ ALTER TABLE test_foreign_constraints DROP CONSTRAINT test_foreign_constraints_id
 --------+---------+-----------+----------+---------+---------+--------------+-------------
  id1    | integer |           |          |         | plain   |              | 
 Child tables: test_foreign_constraints_inh
+Access method: heap
 
 \d+ test_foreign_constraints_inh
                        Table "public.test_foreign_constraints_inh"
@@ -1174,6 +1186,7 @@ Child tables: test_foreign_constraints_inh
 --------+---------+-----------+----------+---------+---------+--------------+-------------
  id1    | integer |           |          |         | plain   |              | 
 Inherits: test_foreign_constraints
+Access method: heap
 
 DROP TABLE test_foreign_constraints_inh;
 DROP TABLE test_foreign_constraints;
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index 1cf6531c01..48ad462e3d 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -156,6 +156,7 @@ Rules:
     irule3 AS
     ON INSERT TO inserttest2 DO  INSERT INTO inserttest (f4[1].if1, f4[1].if2[2])  SELECT new.f1,
             new.f2
+Access method: heap
 
 drop table inserttest2;
 drop table inserttest;
@@ -461,6 +462,7 @@ Partitions: part_aa_bb FOR VALUES IN ('aa', 'bb'),
             part_null FOR VALUES IN (NULL),
             part_xx_yy FOR VALUES IN ('xx', 'yy'), PARTITIONED,
             part_default DEFAULT, PARTITIONED
+Access method: heap
 
 -- cleanup
 drop table range_parted, list_parted;
@@ -476,6 +478,7 @@ create table part_default partition of list_parted default;
  a      | integer |           |          |         | plain   |              | 
 Partition of: list_parted DEFAULT
 No partition constraint
+Access method: heap
 
 insert into part_default values (null);
 insert into part_default values (1);
@@ -813,6 +816,7 @@ Partitions: mcrparted1_lt_b FOR VALUES FROM (MINVALUE, MINVALUE) TO ('b', MINVAL
             mcrparted6_common_ge_10 FOR VALUES FROM ('common', 10) TO ('common', MAXVALUE),
             mcrparted7_gt_common_lt_d FOR VALUES FROM ('common', MAXVALUE) TO ('d', MINVALUE),
             mcrparted8_ge_d FOR VALUES FROM ('d', MINVALUE) TO (MAXVALUE, MAXVALUE)
+Access method: heap
 
 \d+ mcrparted1_lt_b
                               Table "public.mcrparted1_lt_b"
@@ -822,6 +826,7 @@ Partitions: mcrparted1_lt_b FOR VALUES FROM (MINVALUE, MINVALUE) TO ('b', MINVAL
  b      | integer |           |          |         | plain    |              | 
 Partition of: mcrparted FOR VALUES FROM (MINVALUE, MINVALUE) TO ('b', MINVALUE)
 Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a < 'b'::text))
+Access method: heap
 
 \d+ mcrparted2_b
                                 Table "public.mcrparted2_b"
@@ -831,6 +836,7 @@ Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a < 'b'::text))
  b      | integer |           |          |         | plain    |              | 
 Partition of: mcrparted FOR VALUES FROM ('b', MINVALUE) TO ('c', MINVALUE)
 Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a >= 'b'::text) AND (a < 'c'::text))
+Access method: heap
 
 \d+ mcrparted3_c_to_common
                            Table "public.mcrparted3_c_to_common"
@@ -840,6 +846,7 @@ Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a >= 'b'::text)
  b      | integer |           |          |         | plain    |              | 
 Partition of: mcrparted FOR VALUES FROM ('c', MINVALUE) TO ('common', MINVALUE)
 Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a >= 'c'::text) AND (a < 'common'::text))
+Access method: heap
 
 \d+ mcrparted4_common_lt_0
                            Table "public.mcrparted4_common_lt_0"
@@ -849,6 +856,7 @@ Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a >= 'c'::text)
  b      | integer |           |          |         | plain    |              | 
 Partition of: mcrparted FOR VALUES FROM ('common', MINVALUE) TO ('common', 0)
 Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a = 'common'::text) AND (b < 0))
+Access method: heap
 
 \d+ mcrparted5_common_0_to_10
                          Table "public.mcrparted5_common_0_to_10"
@@ -858,6 +866,7 @@ Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a = 'common'::te
  b      | integer |           |          |         | plain    |              | 
 Partition of: mcrparted FOR VALUES FROM ('common', 0) TO ('common', 10)
 Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a = 'common'::text) AND (b >= 0) AND (b < 10))
+Access method: heap
 
 \d+ mcrparted6_common_ge_10
                           Table "public.mcrparted6_common_ge_10"
@@ -867,6 +876,7 @@ Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a = 'common'::te
  b      | integer |           |          |         | plain    |              | 
 Partition of: mcrparted FOR VALUES FROM ('common', 10) TO ('common', MAXVALUE)
 Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a = 'common'::text) AND (b >= 10))
+Access method: heap
 
 \d+ mcrparted7_gt_common_lt_d
                          Table "public.mcrparted7_gt_common_lt_d"
@@ -876,6 +886,7 @@ Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a = 'common'::te
  b      | integer |           |          |         | plain    |              | 
 Partition of: mcrparted FOR VALUES FROM ('common', MAXVALUE) TO ('d', MINVALUE)
 Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a > 'common'::text) AND (a < 'd'::text))
+Access method: heap
 
 \d+ mcrparted8_ge_d
                               Table "public.mcrparted8_ge_d"
@@ -885,6 +896,7 @@ Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a > 'common'::te
  b      | integer |           |          |         | plain    |              | 
 Partition of: mcrparted FOR VALUES FROM ('d', MINVALUE) TO (MAXVALUE, MAXVALUE)
 Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a >= 'd'::text))
+Access method: heap
 
 insert into mcrparted values ('aaa', 0), ('b', 0), ('bz', 10), ('c', -10),
     ('comm', -10), ('common', -10), ('common', 0), ('common', 10),
diff --git a/src/test/regress/expected/matview.out b/src/test/regress/expected/matview.out
index 08cd4bea48..af943ea430 100644
--- a/src/test/regress/expected/matview.out
+++ b/src/test/regress/expected/matview.out
@@ -104,6 +104,7 @@ View definition:
     mvtest_tv.totamt
    FROM mvtest_tv
   ORDER BY mvtest_tv.type;
+Access method: heap
 
 \d+ mvtest_tvm
                            Materialized view "public.mvtest_tvm"
@@ -116,6 +117,7 @@ View definition:
     mvtest_tv.totamt
    FROM mvtest_tv
   ORDER BY mvtest_tv.type;
+Access method: heap
 
 \d+ mvtest_tvvm
                            Materialized view "public.mvtest_tvvm"
@@ -125,6 +127,7 @@ View definition:
 View definition:
  SELECT mvtest_tvv.grandtot
    FROM mvtest_tvv;
+Access method: heap
 
 \d+ mvtest_bb
                             Materialized view "public.mvtest_bb"
@@ -136,6 +139,7 @@ Indexes:
 View definition:
  SELECT mvtest_tvvmv.grandtot
    FROM mvtest_tvvmv;
+Access method: heap
 
 -- test schema behavior
 CREATE SCHEMA mvtest_mvschema;
@@ -152,6 +156,7 @@ Indexes:
 View definition:
  SELECT sum(mvtest_tvm.totamt) AS grandtot
    FROM mvtest_mvschema.mvtest_tvm;
+Access method: heap
 
 SET search_path = mvtest_mvschema, public;
 \d+ mvtest_tvm
@@ -165,6 +170,7 @@ View definition:
     mvtest_tv.totamt
    FROM mvtest_tv
   ORDER BY mvtest_tv.type;
+Access method: heap
 
 -- modify the underlying table data
 INSERT INTO mvtest_t VALUES (6, 'z', 13);
@@ -369,6 +375,7 @@ UNION ALL
  SELECT mvtest_vt2.moo,
     3 * mvtest_vt2.moo
    FROM mvtest_vt2;
+Access method: heap
 
 CREATE MATERIALIZED VIEW mv_test3 AS SELECT * FROM mv_test2 WHERE moo = 12345;
 SELECT relispopulated FROM pg_class WHERE oid = 'mv_test3'::regclass;
@@ -507,6 +514,7 @@ View definition:
     'foo'::text AS u,
     'foo'::text AS u2,
     NULL::text AS n;
+Access method: heap
 
 SELECT * FROM mv_unspecified_types;
  i  | num  |  u  | u2  | n 
diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
index afbbdd543d..439a592778 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -74,6 +74,7 @@ Indexes:
     "testpub_tbl2_pkey" PRIMARY KEY, btree (id)
 Publications:
     "testpub_foralltables"
+Access method: heap
 
 \dRp+ testpub_foralltables
                         Publication testpub_foralltables
@@ -150,6 +151,7 @@ Publications:
     "testpib_ins_trunct"
     "testpub_default"
     "testpub_fortbl"
+Access method: heap
 
 \d+ testpub_tbl1
                                                 Table "public.testpub_tbl1"
@@ -163,6 +165,7 @@ Publications:
     "testpib_ins_trunct"
     "testpub_default"
     "testpub_fortbl"
+Access method: heap
 
 \dRp+ testpub_default
                            Publication testpub_default
@@ -188,6 +191,7 @@ Indexes:
 Publications:
     "testpib_ins_trunct"
     "testpub_fortbl"
+Access method: heap
 
 -- permissions
 SET ROLE regress_publication_user2;
diff --git a/src/test/regress/expected/replica_identity.out b/src/test/regress/expected/replica_identity.out
index 175ecd2879..9ae7a090b4 100644
--- a/src/test/regress/expected/replica_identity.out
+++ b/src/test/regress/expected/replica_identity.out
@@ -171,6 +171,7 @@ Indexes:
     "test_replica_identity_hash" hash (nonkey)
     "test_replica_identity_keyab" btree (keya, keyb)
 Replica Identity: FULL
+Access method: heap
 
 ALTER TABLE test_replica_identity REPLICA IDENTITY NOTHING;
 SELECT relreplident FROM pg_class WHERE oid = 'test_replica_identity'::regclass;
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 1d12b01068..b01ff58c41 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -958,6 +958,7 @@ Policies:
 Partitions: part_document_fiction FOR VALUES FROM (11) TO (12),
             part_document_nonfiction FOR VALUES FROM (99) TO (100),
             part_document_satire FOR VALUES FROM (55) TO (56)
+Access method: heap
 
 SELECT * FROM pg_policies WHERE schemaname = 'regress_rls_schema' AND tablename like '%part_document%' ORDER BY policyname;
      schemaname     |   tablename   | policyname | permissive  |       roles        | cmd |                    qual                    | with_check 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b68b8d273f..22c38ae2e8 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2817,6 +2817,7 @@ Rules:
     r3 AS
     ON DELETE TO rules_src DO
  NOTIFY rules_src_deletion
+Access method: heap
 
 --
 -- Ensure an aliased target relation for insert is correctly deparsed.
@@ -2845,6 +2846,7 @@ Rules:
     r5 AS
     ON UPDATE TO rules_src DO INSTEAD  UPDATE rules_log trgt SET tag = 'updated'::text
   WHERE trgt.f1 = new.f1
+Access method: heap
 
 --
 -- check alter rename rule
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index d09326c182..6b857bbc14 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -669,6 +669,7 @@ create table part_def partition of range_parted default;
  e      | character varying |           |          |         | extended |              | 
 Partition of: range_parted DEFAULT
 Partition constraint: (NOT ((a IS NOT NULL) AND (b IS NOT NULL) AND (((a = 'a'::text) AND (b >= '1'::bigint) AND (b < '10'::bigint)) OR ((a = 'a'::text) AND (b >= '10'::bigint) AND (b < '20'::bigint)) OR ((a = 'b'::text) AND (b >= '1'::bigint) AND (b < '10'::bigint)) OR ((a = 'b'::text) AND (b >= '10'::bigint) AND (b < '20'::bigint)) OR ((a = 'b'::text) AND (b >= '20'::bigint) AND (b < '30'::bigint)))))
+Access method: heap
 
 insert into range_parted values ('c', 9);
 -- ok
#60Andres Freund
andres@anarazel.de
In reply to: Dmitry Dolgov (#59)
Re: Pluggable Storage - Andres's take

Hi,

On 2018-12-15 20:15:12 +0100, Dmitry Dolgov wrote:

On Tue, Dec 11, 2018 at 3:13 AM Andres Freund <andres@anarazel.de> wrote:

Further tasks I'm not yet planning to tackle, that I'd welcome help on:
- pg_dump support
- pg_upgrade testing
- I think we should consider removing HeapTuple->t_tableOid, it should
imo live entirely in the slot

I'm a bit confused, but what kind of pg_dump support are you talking about?
After a quick glance I don't see any table access specific logic there so far.
To check it I've created a test access method (which is a copy of heap, but
with some small differences) and pg_dump worked as expected.
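
Roughly like this, modulo the actual handler function name, which is an
assumption on my side:

CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;
-- tables can then opt into the new AM explicitly
CREATE TABLE am_test (a int) USING heap2;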

We need to dump the table access method at dump time, otherwise we lose
that information.
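
I.e. for a table created under a non-default AM, the dump has to reproduce
something along these lines (a sketch; table and AM names are made up):

CREATE TABLE public.am_test (
    a integer
)
USING heap2;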

As a side note, in a table description I haven't found any mention of which
access method is used for the table; it's probably useful to show that with \d+
(see the attached patch).

I'm not convinced that's really worth the cost of including it in \d
(rather than \d+ or such). When developing an alternative access method
it's extremely useful to be able to just change the default access
method, and run the existing tests, which this makes harder. It's also a
lot of churn.
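
The workflow I mean is essentially just (AM name made up):

SET default_table_access_method = heap2;
-- every plain CREATE TABLE in the existing tests now runs on heap2, and
-- the expected \d output only stays stable if it doesn't print the AM
CREATE TABLE t (a int);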

Greetings,

Andres Freund

#61Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andres Freund (#60)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Sat, Dec 15, 2018 at 8:37 PM Andres Freund <andres@anarazel.de> wrote:

We need to dump the table access method at dump time, otherwise we lose
that information.

Oh, right. So, something like in the attached patch?

As a side note, in a table description I haven't found any mention of which
access method is used for the table; it's probably useful to show that with \d+
(see the attached patch).

I'm not convinced that's really worth the cost of including it in \d
(rather than \d+ or such).

Maybe I'm missing the point, but I meant exactly the same thing, and the patch
suggested in the previous email adds this info to \d+.

Attachments:

pg_dump_access_method.patch (application/octet-stream)
commit 37cfd7cf84fcdaeff7ba5ed6e56c6692377e9b37
Author: erthalion <9erthalion6@gmail.com>
Date:   Sun Dec 16 20:31:33 2018 +0100

    pg_dump support

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 637c79af48..fca00d7b5c 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -5829,6 +5829,7 @@ getTables(Archive *fout, int *numTables)
 	int			i_partkeydef;
 	int			i_ispartition;
 	int			i_partbound;
+	int			i_amname;
 
 	/*
 	 * Find all the tables and table-like objects.
@@ -5914,7 +5915,7 @@ getTables(Archive *fout, int *numTables)
 						  "tc.relfrozenxid AS tfrozenxid, "
 						  "tc.relminmxid AS tminmxid, "
 						  "c.relpersistence, c.relispopulated, "
-						  "c.relreplident, c.relpages, "
+						  "c.relreplident, c.relpages, am.amname AS amname, "
 						  "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
 						  "d.refobjid AS owning_tab, "
 						  "d.refobjsubid AS owning_col, "
@@ -5945,6 +5946,7 @@ getTables(Archive *fout, int *numTables)
 						  "d.objsubid = 0 AND "
 						  "d.refclassid = c.tableoid AND d.deptype IN ('a', 'i')) "
 						  "LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid) "
+						  "LEFT JOIN pg_am am ON (c.relam = am.oid) "
 						  "LEFT JOIN pg_init_privs pip ON "
 						  "(c.oid = pip.objoid "
 						  "AND pip.classoid = 'pg_class'::regclass "
@@ -6412,6 +6414,7 @@ getTables(Archive *fout, int *numTables)
 	i_partkeydef = PQfnumber(res, "partkeydef");
 	i_ispartition = PQfnumber(res, "ispartition");
 	i_partbound = PQfnumber(res, "partbound");
+	i_amname = PQfnumber(res, "amname");
 
 	if (dopt->lockWaitTimeout)
 	{
@@ -6481,6 +6484,10 @@ getTables(Archive *fout, int *numTables)
 		else
 			tblinfo[i].checkoption = pg_strdup(PQgetvalue(res, i, i_checkoption));
 		tblinfo[i].toast_reloptions = pg_strdup(PQgetvalue(res, i, i_toastreloptions));
+		if (PQgetisnull(res, i, i_amname))
+			tblinfo[i].amname = NULL;
+		else
+			tblinfo[i].amname = pg_strdup(PQgetvalue(res, i, i_amname));
 
 		/* other fields were zeroed above */
 
@@ -12546,6 +12553,9 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
 		case AMTYPE_INDEX:
 			appendPQExpBuffer(q, "TYPE INDEX ");
 			break;
+		case AMTYPE_TABLE:
+			appendPQExpBuffer(q, "TYPE TABLE ");
+			break;
 		default:
 			write_msg(NULL, "WARNING: invalid type \"%c\" of access method \"%s\"\n",
 					  aminfo->amtype, qamname);
@@ -15601,6 +15611,9 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 			if (tbinfo->relkind == RELKIND_PARTITIONED_TABLE)
 				appendPQExpBuffer(q, "\nPARTITION BY %s", tbinfo->partkeydef);
 
+			if (tbinfo->amname != NULL && strcmp(tbinfo->amname, "heap") != 0)
+				appendPQExpBuffer(q, "\nUSING %s", tbinfo->amname);
+
 			if (tbinfo->relkind == RELKIND_FOREIGN_TABLE)
 				appendPQExpBuffer(q, "\nSERVER %s", fmtId(srvname));
 		}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 789d6a24e2..4ca6a802f3 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -324,6 +324,7 @@ typedef struct _tableInfo
 	char	   *partkeydef;		/* partition key definition */
 	char	   *partbound;		/* partition bound definition */
 	bool		needs_override; /* has GENERATED ALWAYS AS IDENTITY */
+	char	   *amname;			/* table access method */
 
 	/*
 	 * Stuff computed only for dumpable tables.
#62Peter Geoghegan
pg@bowt.ie
In reply to: Dmitry Dolgov (#52)
Re: Pluggable Storage - Andres's take

On Mon, Dec 10, 2018 at 8:13 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Just out of curiosity I've also tried tpc-c from oltpbench (in the very same
simple environment); it doesn't show any significant difference from master
either.

FWIW, I have found BenchmarkSQL to be significantly better than
oltpbench, having used both quite a bit now:

https://bitbucket.org/openscg/benchmarksql

For example, oltpbench requires a max_connections setting that far
exceeds the number of terminals/clients used by the benchmark, because
the number of connections used during bulk loading far exceeds what is
truly required. BenchmarkSQL also makes it easy to generate useful
html reports, complete with graphs.

--
Peter Geoghegan

#63Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Peter Geoghegan (#62)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Sat, Dec 15, 2018 at 8:37 PM Andres Freund <andres@anarazel.de> wrote:

We need to dump the table access method at dump time, otherwise we lose
that information.

As a result of the discussion in [1] (btw, thanks for starting it), here is a
proposed solution that tracks the current default_table_access_method. Next
I'll tackle a similar issue for psql and probably add some tests for both
patches.

[1]: http://archives.postgresql.org/message-id/20190107235616.6lur25ph22u5u5av@alap3.anarazel.de
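
With that, the emitted script interleaves GUC changes with the table
definitions, roughly like this (a sketch; table and AM names are made up):

SET default_table_access_method = heap;

CREATE TABLE public.t1 (
    a integer
);

SET default_table_access_method = heap2;

CREATE TABLE public.t2 (
    b integer
);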

Attachments:

pg_dump_access_method.patch (application/octet-stream)
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 58bd3805f4..f9bae43132 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -85,6 +85,7 @@ static void _becomeUser(ArchiveHandle *AH, const char *user);
 static void _becomeOwner(ArchiveHandle *AH, TocEntry *te);
 static void _selectOutputSchema(ArchiveHandle *AH, const char *schemaName);
 static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
+static void _selectTableAccessMethod(ArchiveHandle *AH, const char *tableam);
 static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
 static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
 static void processSearchPathEntry(ArchiveHandle *AH, TocEntry *te);
@@ -1072,6 +1073,7 @@ ArchiveEntry(Archive *AHX,
 			 const char *namespace,
 			 const char *tablespace,
 			 const char *owner,
+			 const char *tableam,
 			 const char *desc, teSection section,
 			 const char *defn,
 			 const char *dropStmt, const char *copyStmt,
@@ -1099,6 +1101,7 @@ ArchiveEntry(Archive *AHX,
 	newToc->tag = pg_strdup(tag);
 	newToc->namespace = namespace ? pg_strdup(namespace) : NULL;
 	newToc->tablespace = tablespace ? pg_strdup(tablespace) : NULL;
+	newToc->tableam = tableam ? pg_strdup(tableam) : NULL;
 	newToc->owner = pg_strdup(owner);
 	newToc->desc = pg_strdup(desc);
 	newToc->defn = pg_strdup(defn);
@@ -2367,6 +2370,7 @@ _allocAH(const char *FileSpec, const ArchiveFormat fmt,
 	AH->currUser = NULL;		/* unknown */
 	AH->currSchema = NULL;		/* ditto */
 	AH->currTablespace = NULL;	/* ditto */
+	AH->currTableAm = NULL;	/* ditto */
 
 	AH->toc = (TocEntry *) pg_malloc0(sizeof(TocEntry));
 
@@ -2594,6 +2598,7 @@ WriteToc(ArchiveHandle *AH)
 		WriteStr(AH, te->namespace);
 		WriteStr(AH, te->tablespace);
 		WriteStr(AH, te->owner);
+		WriteStr(AH, te->tableam);
 		WriteStr(AH, "false");
 
 		/* Dump list of dependencies */
@@ -2696,6 +2701,7 @@ ReadToc(ArchiveHandle *AH)
 			te->tablespace = ReadStr(AH);
 
 		te->owner = ReadStr(AH);
+		te->tableam = ReadStr(AH);
 		if (AH->version < K_VERS_1_9 || strcmp(ReadStr(AH), "true") == 0)
 			write_msg(modulename,
 					  "WARNING: restoring tables WITH OIDS is not supported anymore");
@@ -3288,6 +3294,9 @@ _reconnectToDB(ArchiveHandle *AH, const char *dbname)
 	if (AH->currTablespace)
 		free(AH->currTablespace);
 	AH->currTablespace = NULL;
+	if (AH->currTableAm)
+		free(AH->currTableAm);
+	AH->currTableAm = NULL;
 
 	/* re-establish fixed state */
 	_doSetFixedOutputState(AH);
@@ -3448,6 +3457,48 @@ _selectTablespace(ArchiveHandle *AH, const char *tablespace)
 	destroyPQExpBuffer(qry);
 }
 
+/*
+ * Set the proper default_table_access_method value for the table.
+ */
+static void
+_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
+{
+	PQExpBuffer cmd;
+	const char *want, *have;
+
+	have = AH->currTableAm;
+	want = tableam;
+
+	if (!want)
+		return;
+
+	if (have && strcmp(want, have) == 0)
+		return;
+
+	cmd = createPQExpBuffer();
+	appendPQExpBuffer(cmd, "SET default_table_access_method = %s;", tableam);
+
+	if (RestoringToDB(AH))
+	{
+		PGresult   *res;
+
+		res = PQexec(AH->connection, cmd->data);
+
+		if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
+			warn_or_exit_horribly(AH, modulename,
+								  "could not set default_table_access_method: %s",
+								  PQerrorMessage(AH->connection));
+
+		PQclear(res);
+	}
+	else
+		ahprintf(AH, "%s\n\n", cmd->data);
+
+	destroyPQExpBuffer(cmd);
+
+	AH->currTableAm = pg_strdup(want);
+}
+
 /*
  * Extract an object description for a TOC entry, and append it to buf.
  *
@@ -3547,6 +3598,7 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
 	_becomeOwner(AH, te);
 	_selectOutputSchema(AH, te->namespace);
 	_selectTablespace(AH, te->tablespace);
+	_selectTableAccessMethod(AH, te->tableam);
 
 	/* Emit header comment for item */
 	if (!AH->noTocComments)
@@ -4021,6 +4073,9 @@ restore_toc_entries_prefork(ArchiveHandle *AH, TocEntry *pending_list)
 	if (AH->currTablespace)
 		free(AH->currTablespace);
 	AH->currTablespace = NULL;
+	if (AH->currTableAm)
+		free(AH->currTableAm);
+	AH->currTableAm = NULL;
 }
 
 /*
@@ -4816,6 +4871,7 @@ CloneArchive(ArchiveHandle *AH)
 	clone->currUser = NULL;
 	clone->currSchema = NULL;
 	clone->currTablespace = NULL;
+	clone->currTableAm = NULL;
 
 	/* savedPassword must be local in case we change it while connecting */
 	if (clone->savedPassword)
@@ -4906,6 +4962,8 @@ DeCloneArchive(ArchiveHandle *AH)
 		free(AH->currSchema);
 	if (AH->currTablespace)
 		free(AH->currTablespace);
+	if (AH->currTableAm)
+		free(AH->currTableAm);
 	if (AH->savedPassword)
 		free(AH->savedPassword);
 
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 306d2ceba9..719065565b 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -347,6 +347,7 @@ struct _archiveHandle
 	char	   *currUser;		/* current username, or NULL if unknown */
 	char	   *currSchema;		/* current schema, or NULL */
 	char	   *currTablespace; /* current tablespace, or NULL */
+	char	   *currTableAm; 	/* current table access method, or NULL */
 
 	void	   *lo_buf;
 	size_t		lo_buf_used;
@@ -373,6 +374,8 @@ struct _tocEntry
 	char	   *namespace;		/* null or empty string if not in a schema */
 	char	   *tablespace;		/* null if not in a tablespace; empty string
 								 * means use database default */
+	char	   *tableam;			/* table access method, only for TABLE tags */
+
 	char	   *owner;
 	char	   *desc;
 	char	   *defn;
@@ -410,7 +413,7 @@ extern TocEntry *ArchiveEntry(Archive *AHX,
 			 CatalogId catalogId, DumpId dumpId,
 			 const char *tag,
 			 const char *namespace, const char *tablespace,
-			 const char *owner,
+			 const char *owner, const char *amname,
 			 const char *desc, teSection section,
 			 const char *defn,
 			 const char *dropStmt, const char *copyStmt,
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 637c79af48..a3878ed9a2 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2136,7 +2136,7 @@ dumpTableData(Archive *fout, TableDataInfo *tdinfo)
 
 		te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
 						  tbinfo->dobj.name, tbinfo->dobj.namespace->dobj.name,
-						  NULL, tbinfo->rolname,
+						  NULL, tbinfo->rolname, NULL,
 						  "TABLE DATA", SECTION_DATA,
 						  "", "", copyStmt,
 						  &(tbinfo->dobj.dumpId), 1,
@@ -2188,6 +2188,7 @@ refreshMatViewData(Archive *fout, TableDataInfo *tdinfo)
 					 tbinfo->dobj.namespace->dobj.name, /* Namespace */
 					 NULL,		/* Tablespace */
 					 tbinfo->rolname,	/* Owner */
+					 NULL,				/* Table access method */
 					 "MATERIALIZED VIEW DATA",	/* Desc */
 					 SECTION_POST_DATA, /* Section */
 					 q->data,	/* Create */
@@ -2726,6 +2727,7 @@ dumpDatabase(Archive *fout)
 				 NULL,			/* Namespace */
 				 NULL,			/* Tablespace */
 				 dba,			/* Owner */
+				 NULL,			/* Table access method */
 				 "DATABASE",	/* Desc */
 				 SECTION_PRE_DATA,	/* Section */
 				 creaQry->data, /* Create */
@@ -2762,7 +2764,7 @@ dumpDatabase(Archive *fout)
 			appendPQExpBufferStr(dbQry, ";\n");
 
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
-						 labelq->data, NULL, NULL, dba,
+						 labelq->data, NULL, NULL, dba, NULL,
 						 "COMMENT", SECTION_NONE,
 						 dbQry->data, "", NULL,
 						 &(dbDumpId), 1,
@@ -2789,7 +2791,7 @@ dumpDatabase(Archive *fout)
 		emitShSecLabels(conn, shres, seclabelQry, "DATABASE", datname);
 		if (seclabelQry->len > 0)
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
-						 labelq->data, NULL, NULL, dba,
+						 labelq->data, NULL, NULL, dba, NULL,
 						 "SECURITY LABEL", SECTION_NONE,
 						 seclabelQry->data, "", NULL,
 						 &(dbDumpId), 1,
@@ -2859,7 +2861,7 @@ dumpDatabase(Archive *fout)
 
 	if (creaQry->len > 0)
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 datname, NULL, NULL, dba,
+					 datname, NULL, NULL, dba, NULL,
 					 "DATABASE PROPERTIES", SECTION_PRE_DATA,
 					 creaQry->data, delQry->data, NULL,
 					 &(dbDumpId), 1,
@@ -2904,7 +2906,7 @@ dumpDatabase(Archive *fout)
 						  atooid(PQgetvalue(lo_res, 0, i_relminmxid)),
 						  LargeObjectRelationId);
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 "pg_largeobject", NULL, NULL, "",
+					 "pg_largeobject", NULL, NULL, "", NULL,
 					 "pg_largeobject", SECTION_PRE_DATA,
 					 loOutQry->data, "", NULL,
 					 NULL, 0,
@@ -3014,7 +3016,7 @@ dumpEncoding(Archive *AH)
 	appendPQExpBufferStr(qry, ";\n");
 
 	ArchiveEntry(AH, nilCatalogId, createDumpId(),
-				 "ENCODING", NULL, NULL, "",
+				 "ENCODING", NULL, NULL, "", NULL,
 				 "ENCODING", SECTION_PRE_DATA,
 				 qry->data, "", NULL,
 				 NULL, 0,
@@ -3041,7 +3043,7 @@ dumpStdStrings(Archive *AH)
 					  stdstrings);
 
 	ArchiveEntry(AH, nilCatalogId, createDumpId(),
-				 "STDSTRINGS", NULL, NULL, "",
+				 "STDSTRINGS", NULL, NULL, "", NULL,
 				 "STDSTRINGS", SECTION_PRE_DATA,
 				 qry->data, "", NULL,
 				 NULL, 0,
@@ -3097,7 +3099,7 @@ dumpSearchPath(Archive *AH)
 		write_msg(NULL, "saving search_path = %s\n", path->data);
 
 	ArchiveEntry(AH, nilCatalogId, createDumpId(),
-				 "SEARCHPATH", NULL, NULL, "",
+				 "SEARCHPATH", NULL, NULL, "", NULL,
 				 "SEARCHPATH", SECTION_PRE_DATA,
 				 qry->data, "", NULL,
 				 NULL, 0,
@@ -3275,7 +3277,7 @@ dumpBlob(Archive *fout, BlobInfo *binfo)
 		ArchiveEntry(fout, binfo->dobj.catId, binfo->dobj.dumpId,
 					 binfo->dobj.name,
 					 NULL, NULL,
-					 binfo->rolname,
+					 binfo->rolname, NULL,
 					 "BLOB", SECTION_PRE_DATA,
 					 cquery->data, dquery->data, NULL,
 					 NULL, 0,
@@ -3581,6 +3583,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
 						 polinfo->dobj.namespace->dobj.name,
 						 NULL,
 						 tbinfo->rolname,
+						 NULL,
 						 "ROW SECURITY", SECTION_POST_DATA,
 						 query->data, "", NULL,
 						 &(tbinfo->dobj.dumpId), 1,
@@ -3637,6 +3640,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
 					 polinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "POLICY", SECTION_POST_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -3811,6 +3815,7 @@ dumpPublication(Archive *fout, PublicationInfo *pubinfo)
 				 NULL,
 				 NULL,
 				 pubinfo->rolname,
+				 NULL,
 				 "PUBLICATION", SECTION_POST_DATA,
 				 query->data, delq->data, NULL,
 				 NULL, 0,
@@ -3954,6 +3959,7 @@ dumpPublicationTable(Archive *fout, PublicationRelInfo *pubrinfo)
 				 tbinfo->dobj.namespace->dobj.name,
 				 NULL,
 				 "",
+				 NULL,
 				 "PUBLICATION TABLE", SECTION_POST_DATA,
 				 query->data, "", NULL,
 				 NULL, 0,
@@ -4147,6 +4153,7 @@ dumpSubscription(Archive *fout, SubscriptionInfo *subinfo)
 				 NULL,
 				 NULL,
 				 subinfo->rolname,
+				 NULL,
 				 "SUBSCRIPTION", SECTION_POST_DATA,
 				 query->data, delq->data, NULL,
 				 NULL, 0,
@@ -5829,6 +5836,7 @@ getTables(Archive *fout, int *numTables)
 	int			i_partkeydef;
 	int			i_ispartition;
 	int			i_partbound;
+	int			i_amname;
 
 	/*
 	 * Find all the tables and table-like objects.
@@ -5914,7 +5922,7 @@ getTables(Archive *fout, int *numTables)
 						  "tc.relfrozenxid AS tfrozenxid, "
 						  "tc.relminmxid AS tminmxid, "
 						  "c.relpersistence, c.relispopulated, "
-						  "c.relreplident, c.relpages, "
+						  "c.relreplident, c.relpages, am.amname AS amname, "
 						  "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
 						  "d.refobjid AS owning_tab, "
 						  "d.refobjsubid AS owning_col, "
@@ -5945,6 +5953,7 @@ getTables(Archive *fout, int *numTables)
 						  "d.objsubid = 0 AND "
 						  "d.refclassid = c.tableoid AND d.deptype IN ('a', 'i')) "
 						  "LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid) "
+						  "LEFT JOIN pg_am am ON (c.relam = am.oid) "
 						  "LEFT JOIN pg_init_privs pip ON "
 						  "(c.oid = pip.objoid "
 						  "AND pip.classoid = 'pg_class'::regclass "
@@ -6412,6 +6421,7 @@ getTables(Archive *fout, int *numTables)
 	i_partkeydef = PQfnumber(res, "partkeydef");
 	i_ispartition = PQfnumber(res, "ispartition");
 	i_partbound = PQfnumber(res, "partbound");
+	i_amname = PQfnumber(res, "amname");
 
 	if (dopt->lockWaitTimeout)
 	{
@@ -6481,6 +6491,11 @@ getTables(Archive *fout, int *numTables)
 		else
 			tblinfo[i].checkoption = pg_strdup(PQgetvalue(res, i, i_checkoption));
 		tblinfo[i].toast_reloptions = pg_strdup(PQgetvalue(res, i, i_toastreloptions));
+		if (PQgetisnull(res, i, i_amname))
+			tblinfo[i].amname = NULL;
+		else
+			tblinfo[i].amname = pg_strdup(PQgetvalue(res, i, i_amname));
+
 
 		/* other fields were zeroed above */
 
@@ -9355,7 +9370,7 @@ dumpComment(Archive *fout, const char *type, const char *name,
 		 * post-data.
 		 */
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 tag->data, namespace, NULL, owner,
+					 tag->data, namespace, NULL, owner, NULL,
 					 "COMMENT", SECTION_NONE,
 					 query->data, "", NULL,
 					 &(dumpId), 1,
@@ -9423,7 +9438,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
 						 tag->data,
 						 tbinfo->dobj.namespace->dobj.name,
-						 NULL, tbinfo->rolname,
+						 NULL, tbinfo->rolname, NULL,
 						 "COMMENT", SECTION_NONE,
 						 query->data, "", NULL,
 						 &(tbinfo->dobj.dumpId), 1,
@@ -9447,7 +9462,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
 						 tag->data,
 						 tbinfo->dobj.namespace->dobj.name,
-						 NULL, tbinfo->rolname,
+						 NULL, tbinfo->rolname, NULL,
 						 "COMMENT", SECTION_NONE,
 						 query->data, "", NULL,
 						 &(tbinfo->dobj.dumpId), 1,
@@ -9728,7 +9743,7 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 				TocEntry   *te;
 
 				te = ArchiveEntry(fout, dobj->catId, dobj->dumpId,
-								  dobj->name, NULL, NULL, "",
+								  dobj->name, NULL, NULL, "", NULL,
 								  "BLOBS", SECTION_DATA,
 								  "", "", NULL,
 								  NULL, 0,
@@ -9802,7 +9817,7 @@ dumpNamespace(Archive *fout, NamespaceInfo *nspinfo)
 		ArchiveEntry(fout, nspinfo->dobj.catId, nspinfo->dobj.dumpId,
 					 nspinfo->dobj.name,
 					 NULL, NULL,
-					 nspinfo->rolname,
+					 nspinfo->rolname, NULL,
 					 "SCHEMA", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -9938,7 +9953,7 @@ dumpExtension(Archive *fout, ExtensionInfo *extinfo)
 		ArchiveEntry(fout, extinfo->dobj.catId, extinfo->dobj.dumpId,
 					 extinfo->dobj.name,
 					 NULL, NULL,
-					 "",
+					 "", NULL,
 					 "EXTENSION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10090,6 +10105,7 @@ dumpEnumType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10217,6 +10233,7 @@ dumpRangeType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10290,6 +10307,7 @@ dumpUndefinedType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10572,6 +10590,7 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10729,6 +10748,7 @@ dumpDomain(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "DOMAIN", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10951,6 +10971,7 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -11085,7 +11106,7 @@ dumpCompositeTypeColComments(Archive *fout, TypeInfo *tyinfo)
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
 						 target->data,
 						 tyinfo->dobj.namespace->dobj.name,
-						 NULL, tyinfo->rolname,
+						 NULL, tyinfo->rolname, NULL,
 						 "COMMENT", SECTION_NONE,
 						 query->data, "", NULL,
 						 &(tyinfo->dobj.dumpId), 1,
@@ -11142,6 +11163,7 @@ dumpShellType(Archive *fout, ShellTypeInfo *stinfo)
 					 stinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 stinfo->baseType->rolname,
+					 NULL,
 					 "SHELL TYPE", SECTION_PRE_DATA,
 					 q->data, "", NULL,
 					 NULL, 0,
@@ -11251,7 +11273,7 @@ dumpProcLang(Archive *fout, ProcLangInfo *plang)
 	if (plang->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, plang->dobj.catId, plang->dobj.dumpId,
 					 plang->dobj.name,
-					 NULL, NULL, plang->lanowner,
+					 NULL, NULL, plang->lanowner, NULL,
 					 "PROCEDURAL LANGUAGE", SECTION_PRE_DATA,
 					 defqry->data, delqry->data, NULL,
 					 NULL, 0,
@@ -11924,6 +11946,7 @@ dumpFunc(Archive *fout, FuncInfo *finfo)
 					 finfo->dobj.namespace->dobj.name,
 					 NULL,
 					 finfo->rolname,
+					 NULL,
 					 keyword, SECTION_PRE_DATA,
 					 q->data, delqry->data, NULL,
 					 NULL, 0,
@@ -12056,7 +12079,7 @@ dumpCast(Archive *fout, CastInfo *cast)
 	if (cast->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, cast->dobj.catId, cast->dobj.dumpId,
 					 labelq->data,
-					 NULL, NULL, "",
+					 NULL, NULL, "", NULL,
 					 "CAST", SECTION_PRE_DATA,
 					 defqry->data, delqry->data, NULL,
 					 NULL, 0,
@@ -12184,7 +12207,7 @@ dumpTransform(Archive *fout, TransformInfo *transform)
 	if (transform->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, transform->dobj.catId, transform->dobj.dumpId,
 					 labelq->data,
-					 NULL, NULL, "",
+					 NULL, NULL, "", NULL,
 					 "TRANSFORM", SECTION_PRE_DATA,
 					 defqry->data, delqry->data, NULL,
 					 transform->dobj.dependencies, transform->dobj.nDeps,
@@ -12400,6 +12423,7 @@ dumpOpr(Archive *fout, OprInfo *oprinfo)
 					 oprinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 oprinfo->rolname,
+					 NULL,
 					 "OPERATOR", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -12546,6 +12570,9 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
 		case AMTYPE_INDEX:
 			appendPQExpBuffer(q, "TYPE INDEX ");
 			break;
+		case AMTYPE_TABLE:
+			appendPQExpBuffer(q, "TYPE TABLE ");
+			break;
 		default:
 			write_msg(NULL, "WARNING: invalid type \"%c\" of access method \"%s\"\n",
 					  aminfo->amtype, qamname);
@@ -12570,6 +12597,7 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
 					 NULL,
 					 NULL,
 					 "",
+					 NULL,
 					 "ACCESS METHOD", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -12936,6 +12964,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
 					 opcinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 opcinfo->rolname,
+					 NULL,
 					 "OPERATOR CLASS", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13203,6 +13232,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
 					 opfinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 opfinfo->rolname,
+					 NULL,
 					 "OPERATOR FAMILY", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13346,6 +13376,7 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
 					 collinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 collinfo->rolname,
+					 NULL,
 					 "COLLATION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13441,6 +13472,7 @@ dumpConversion(Archive *fout, ConvInfo *convinfo)
 					 convinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 convinfo->rolname,
+					 NULL,
 					 "CONVERSION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13930,6 +13962,7 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
 					 agginfo->aggfn.dobj.namespace->dobj.name,
 					 NULL,
 					 agginfo->aggfn.rolname,
+					 NULL,
 					 "AGGREGATE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14028,6 +14061,7 @@ dumpTSParser(Archive *fout, TSParserInfo *prsinfo)
 					 prsinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 "",
+					 NULL,
 					 "TEXT SEARCH PARSER", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14108,6 +14142,7 @@ dumpTSDictionary(Archive *fout, TSDictInfo *dictinfo)
 					 dictinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 dictinfo->rolname,
+					 NULL,
 					 "TEXT SEARCH DICTIONARY", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14169,6 +14204,7 @@ dumpTSTemplate(Archive *fout, TSTemplateInfo *tmplinfo)
 					 tmplinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 "",
+					 NULL,
 					 "TEXT SEARCH TEMPLATE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14289,6 +14325,7 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
 					 cfginfo->dobj.namespace->dobj.name,
 					 NULL,
 					 cfginfo->rolname,
+					 NULL,
 					 "TEXT SEARCH CONFIGURATION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14355,6 +14392,7 @@ dumpForeignDataWrapper(Archive *fout, FdwInfo *fdwinfo)
 					 NULL,
 					 NULL,
 					 fdwinfo->rolname,
+					 NULL,
 					 "FOREIGN DATA WRAPPER", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14446,6 +14484,7 @@ dumpForeignServer(Archive *fout, ForeignServerInfo *srvinfo)
 					 NULL,
 					 NULL,
 					 srvinfo->rolname,
+					 NULL,
 					 "SERVER", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14564,6 +14603,7 @@ dumpUserMappings(Archive *fout,
 					 namespace,
 					 NULL,
 					 owner,
+					 NULL,
 					 "USER MAPPING", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 &dumpId, 1,
@@ -14643,6 +14683,7 @@ dumpDefaultACL(Archive *fout, DefaultACLInfo *daclinfo)
 					 daclinfo->dobj.namespace ? daclinfo->dobj.namespace->dobj.name : NULL,
 					 NULL,
 					 daclinfo->defaclrole,
+					 NULL,
 					 "DEFAULT ACL", SECTION_POST_DATA,
 					 q->data, "", NULL,
 					 NULL, 0,
@@ -14741,6 +14782,7 @@ dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
 					 tag->data, nspname,
 					 NULL,
 					 owner ? owner : "",
+					 NULL,
 					 "ACL", SECTION_NONE,
 					 sql->data, "", NULL,
 					 &(objDumpId), 1,
@@ -14826,7 +14868,7 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
 
 		appendPQExpBuffer(tag, "%s %s", type, name);
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 tag->data, namespace, NULL, owner,
+					 tag->data, namespace, NULL, owner, NULL,
 					 "SECURITY LABEL", SECTION_NONE,
 					 query->data, "", NULL,
 					 &(dumpId), 1,
@@ -14908,7 +14950,7 @@ dumpTableSecLabel(Archive *fout, TableInfo *tbinfo, const char *reltypename)
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
 					 target->data,
 					 tbinfo->dobj.namespace->dobj.name,
-					 NULL, tbinfo->rolname,
+					 NULL, tbinfo->rolname, NULL,
 					 "SECURITY LABEL", SECTION_NONE,
 					 query->data, "", NULL,
 					 &(tbinfo->dobj.dumpId), 1,
@@ -15994,6 +16036,8 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 (tbinfo->relkind == RELKIND_VIEW) ? NULL : tbinfo->reltablespace,
 					 tbinfo->rolname,
+					 (tbinfo->relkind == RELKIND_RELATION) ?
+					 tbinfo->amname : NULL,
 					 reltypename,
 					 tbinfo->postponed_def ?
 					 SECTION_POST_DATA : SECTION_PRE_DATA,
@@ -16074,6 +16118,7 @@ dumpAttrDef(Archive *fout, AttrDefInfo *adinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "DEFAULT", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -16190,6 +16235,7 @@ dumpIndex(Archive *fout, IndxInfo *indxinfo)
 						 tbinfo->dobj.namespace->dobj.name,
 						 indxinfo->tablespace,
 						 tbinfo->rolname,
+						 NULL,
 						 "INDEX", SECTION_POST_DATA,
 						 q->data, delq->data, NULL,
 						 NULL, 0,
@@ -16234,6 +16280,7 @@ dumpIndexAttach(Archive *fout, IndexAttachInfo *attachinfo)
 					 attachinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 "",
+					 NULL,
 					 "INDEX ATTACH", SECTION_POST_DATA,
 					 q->data, "", NULL,
 					 NULL, 0,
@@ -16289,6 +16336,7 @@ dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo)
 					 statsextinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 statsextinfo->rolname,
+					 NULL,
 					 "STATISTICS", SECTION_POST_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -16450,6 +16498,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 						 tbinfo->dobj.namespace->dobj.name,
 						 indxinfo->tablespace,
 						 tbinfo->rolname,
+						 NULL,
 						 "CONSTRAINT", SECTION_POST_DATA,
 						 q->data, delq->data, NULL,
 						 NULL, 0,
@@ -16490,6 +16539,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 						 tbinfo->dobj.namespace->dobj.name,
 						 NULL,
 						 tbinfo->rolname,
+						 NULL,
 						 "FK CONSTRAINT", SECTION_POST_DATA,
 						 q->data, delq->data, NULL,
 						 NULL, 0,
@@ -16522,6 +16572,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 							 tbinfo->dobj.namespace->dobj.name,
 							 NULL,
 							 tbinfo->rolname,
+							 NULL,
 							 "CHECK CONSTRAINT", SECTION_POST_DATA,
 							 q->data, delq->data, NULL,
 							 NULL, 0,
@@ -16555,6 +16606,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 							 tyinfo->dobj.namespace->dobj.name,
 							 NULL,
 							 tyinfo->rolname,
+							 NULL,
 							 "CHECK CONSTRAINT", SECTION_POST_DATA,
 							 q->data, delq->data, NULL,
 							 NULL, 0,
@@ -16829,6 +16881,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "SEQUENCE", SECTION_PRE_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -16870,6 +16923,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
 							 tbinfo->dobj.namespace->dobj.name,
 							 NULL,
 							 tbinfo->rolname,
+							 NULL,
 							 "SEQUENCE OWNED BY", SECTION_PRE_DATA,
 							 query->data, "", NULL,
 							 &(tbinfo->dobj.dumpId), 1,
@@ -16938,6 +16992,7 @@ dumpSequenceData(Archive *fout, TableDataInfo *tdinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "SEQUENCE SET", SECTION_DATA,
 					 query->data, "", NULL,
 					 &(tbinfo->dobj.dumpId), 1,
@@ -17137,6 +17192,7 @@ dumpTrigger(Archive *fout, TriggerInfo *tginfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "TRIGGER", SECTION_POST_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -17223,7 +17279,7 @@ dumpEventTrigger(Archive *fout, EventTriggerInfo *evtinfo)
 	if (evtinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, evtinfo->dobj.catId, evtinfo->dobj.dumpId,
 					 evtinfo->dobj.name, NULL, NULL,
-					 evtinfo->evtowner,
+					 evtinfo->evtowner, NULL,
 					 "EVENT TRIGGER", SECTION_POST_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -17384,6 +17440,7 @@ dumpRule(Archive *fout, RuleInfo *rinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "RULE", SECTION_POST_DATA,
 					 cmd->data, delcmd->data, NULL,
 					 NULL, 0,
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 789d6a24e2..4024d0c1e3 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -324,6 +324,7 @@ typedef struct _tableInfo
 	char	   *partkeydef;		/* partition key definition */
 	char	   *partbound;		/* partition bound definition */
 	bool		needs_override; /* has GENERATED ALWAYS AS IDENTITY */
+	char	   *amname; 		/* table access method */
 
 	/*
 	 * Stuff computed only for dumpable tables.
#64Andres Freund
andres@anarazel.de
In reply to: Dmitry Dolgov (#63)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-01-12 01:35:06 +0100, Dmitry Dolgov wrote:

On Sat, Dec 15, 2018 at 8:37 PM Andres Freund <andres@anarazel.de> wrote:

We need to dump the table access method at dump time, otherwise we lose
that information.

As a result of the discussion in [1] (btw, thanks for starting it), here is a
proposed solution that tracks the current default_table_access_method. Next
I'll tackle a similar issue for psql and probably add some tests for both
patches.

Thanks!

+/*
+ * Set the proper default_table_access_method value for the table.
+ */
+static void
+_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
+{
+	PQExpBuffer cmd;
+	const char *want, *have;
+
+	have = AH->currTableAm;
+	want = tableam;
+
+	if (!want)
+		return;
+
+	if (have && strcmp(want, have) == 0)
+		return;
+
+	cmd = createPQExpBuffer();
+	appendPQExpBuffer(cmd, "SET default_table_access_method = %s;", tableam);

This needs escaping, at the very least with "", but better with proper
routines for dealing with identifiers.
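
E.g. with an access method named "my-am" (a made-up name) the unescaped
version emits

SET default_table_access_method = my-am;

which doesn't even parse, whereas going through fmtId() yields

SET default_table_access_method = "my-am";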

@@ -5914,7 +5922,7 @@ getTables(Archive *fout, int *numTables)
"tc.relfrozenxid AS tfrozenxid, "
"tc.relminmxid AS tminmxid, "
"c.relpersistence, c.relispopulated, "
-						  "c.relreplident, c.relpages, "
+						  "c.relreplident, c.relpages, am.amname AS amname, "

That AS doesn't do anything, does it?

/* other fields were zeroed above */

@@ -9355,7 +9370,7 @@ dumpComment(Archive *fout, const char *type, const char *name,
* post-data.
*/
ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 tag->data, namespace, NULL, owner,
+					 tag->data, namespace, NULL, owner, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(dumpId), 1,

We really ought to move the arguments to a struct, so we don't generate
quite as many useless diffs whenever we do a change around one of
these...

Greetings,

Andres Freund

#65Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andres Freund (#64)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Sat, Jan 12, 2019 at 1:44 AM Andres Freund <andres@anarazel.de> wrote:

+ appendPQExpBuffer(cmd, "SET default_table_access_method = %s;", tableam);

This needs escaping, at the very least with "", but better with proper
routines for dealing with identifiers.

Thanks for noticing, fixed.

@@ -5914,7 +5922,7 @@ getTables(Archive *fout, int *numTables)
"tc.relfrozenxid AS tfrozenxid, "
"tc.relminmxid AS tminmxid, "
"c.relpersistence, c.relispopulated, "
-                                               "c.relreplident, c.relpages, "
+                                               "c.relreplident, c.relpages, am.amname AS amname, "

That AS doesn't do anything, does it?

Right, I've renamed it a few times and forgot to get rid of it. Removed.

/* other fields were zeroed above */

@@ -9355,7 +9370,7 @@ dumpComment(Archive *fout, const char *type, const char *name,
* post-data.
*/
ArchiveEntry(fout, nilCatalogId, createDumpId(),
-                                      tag->data, namespace, NULL, owner,
+                                      tag->data, namespace, NULL, owner, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(dumpId), 1,

We really ought to move the arguments to a struct, so we don't generate
quite as many useless diffs whenever we do a change around one of
these...

That's what I thought too. Maybe then I'll suggest a mini-patch to master to
refactor these arguments out into a separate struct, so we can leverage it here.

Attachments:

pg_dump_access_method_v2.patch (application/octet-stream)
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 58bd3805f4..6f1b717e06 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -85,6 +85,7 @@ static void _becomeUser(ArchiveHandle *AH, const char *user);
 static void _becomeOwner(ArchiveHandle *AH, TocEntry *te);
 static void _selectOutputSchema(ArchiveHandle *AH, const char *schemaName);
 static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
+static void _selectTableAccessMethod(ArchiveHandle *AH, const char *tableam);
 static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
 static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
 static void processSearchPathEntry(ArchiveHandle *AH, TocEntry *te);
@@ -1072,6 +1073,7 @@ ArchiveEntry(Archive *AHX,
 			 const char *namespace,
 			 const char *tablespace,
 			 const char *owner,
+			 const char *tableam,
 			 const char *desc, teSection section,
 			 const char *defn,
 			 const char *dropStmt, const char *copyStmt,
@@ -1099,6 +1101,7 @@ ArchiveEntry(Archive *AHX,
 	newToc->tag = pg_strdup(tag);
 	newToc->namespace = namespace ? pg_strdup(namespace) : NULL;
 	newToc->tablespace = tablespace ? pg_strdup(tablespace) : NULL;
+	newToc->tableam = tableam ? pg_strdup(tableam) : NULL;
 	newToc->owner = pg_strdup(owner);
 	newToc->desc = pg_strdup(desc);
 	newToc->defn = pg_strdup(defn);
@@ -2367,6 +2370,7 @@ _allocAH(const char *FileSpec, const ArchiveFormat fmt,
 	AH->currUser = NULL;		/* unknown */
 	AH->currSchema = NULL;		/* ditto */
 	AH->currTablespace = NULL;	/* ditto */
+	AH->currTableAm = NULL;	/* ditto */
 
 	AH->toc = (TocEntry *) pg_malloc0(sizeof(TocEntry));
 
@@ -2594,6 +2598,7 @@ WriteToc(ArchiveHandle *AH)
 		WriteStr(AH, te->namespace);
 		WriteStr(AH, te->tablespace);
 		WriteStr(AH, te->owner);
+		WriteStr(AH, te->tableam);
 		WriteStr(AH, "false");
 
 		/* Dump list of dependencies */
@@ -2696,6 +2701,7 @@ ReadToc(ArchiveHandle *AH)
 			te->tablespace = ReadStr(AH);
 
 		te->owner = ReadStr(AH);
+		te->tableam = ReadStr(AH);
 		if (AH->version < K_VERS_1_9 || strcmp(ReadStr(AH), "true") == 0)
 			write_msg(modulename,
 					  "WARNING: restoring tables WITH OIDS is not supported anymore");
@@ -3288,6 +3294,9 @@ _reconnectToDB(ArchiveHandle *AH, const char *dbname)
 	if (AH->currTablespace)
 		free(AH->currTablespace);
 	AH->currTablespace = NULL;
+	if (AH->currTableAm)
+		free(AH->currTableAm);
+	AH->currTableAm = NULL;
 
 	/* re-establish fixed state */
 	_doSetFixedOutputState(AH);
@@ -3448,6 +3457,48 @@ _selectTablespace(ArchiveHandle *AH, const char *tablespace)
 	destroyPQExpBuffer(qry);
 }
 
+/*
+ * Set the proper default_table_access_method value for the table.
+ */
+static void
+_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
+{
+	PQExpBuffer cmd;
+	const char *want, *have;
+
+	have = AH->currTableAm;
+	want = tableam;
+
+	if (!want)
+		return;
+
+	if (have && strcmp(want, have) == 0)
+		return;
+
+	cmd = createPQExpBuffer();
+	appendPQExpBuffer(cmd, "SET default_table_access_method = %s;", fmtId(want));
+
+	if (RestoringToDB(AH))
+	{
+		PGresult   *res;
+
+		res = PQexec(AH->connection, cmd->data);
+
+		if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
+			warn_or_exit_horribly(AH, modulename,
+								  "could not set default_table_access_method: %s",
+								  PQerrorMessage(AH->connection));
+
+		PQclear(res);
+	}
+	else
+		ahprintf(AH, "%s\n\n", cmd->data);
+
+	destroyPQExpBuffer(cmd);
+
+	AH->currTableAm = pg_strdup(want);
+}
+
 /*
  * Extract an object description for a TOC entry, and append it to buf.
  *
@@ -3547,6 +3598,7 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
 	_becomeOwner(AH, te);
 	_selectOutputSchema(AH, te->namespace);
 	_selectTablespace(AH, te->tablespace);
+	_selectTableAccessMethod(AH, te->tableam);
 
 	/* Emit header comment for item */
 	if (!AH->noTocComments)
@@ -4021,6 +4073,9 @@ restore_toc_entries_prefork(ArchiveHandle *AH, TocEntry *pending_list)
 	if (AH->currTablespace)
 		free(AH->currTablespace);
 	AH->currTablespace = NULL;
+	if (AH->currTableAm)
+		free(AH->currTableAm);
+	AH->currTableAm = NULL;
 }
 
 /*
@@ -4816,6 +4871,7 @@ CloneArchive(ArchiveHandle *AH)
 	clone->currUser = NULL;
 	clone->currSchema = NULL;
 	clone->currTablespace = NULL;
+	clone->currTableAm = NULL;
 
 	/* savedPassword must be local in case we change it while connecting */
 	if (clone->savedPassword)
@@ -4906,6 +4962,8 @@ DeCloneArchive(ArchiveHandle *AH)
 		free(AH->currSchema);
 	if (AH->currTablespace)
 		free(AH->currTablespace);
+	if (AH->currTableAm)
+		free(AH->currTableAm);
 	if (AH->savedPassword)
 		free(AH->savedPassword);
 
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 306d2ceba9..719065565b 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -347,6 +347,7 @@ struct _archiveHandle
 	char	   *currUser;		/* current username, or NULL if unknown */
 	char	   *currSchema;		/* current schema, or NULL */
 	char	   *currTablespace; /* current tablespace, or NULL */
+	char	   *currTableAm; 	/* current table access method, or NULL */
 
 	void	   *lo_buf;
 	size_t		lo_buf_used;
@@ -373,6 +374,8 @@ struct _tocEntry
 	char	   *namespace;		/* null or empty string if not in a schema */
 	char	   *tablespace;		/* null if not in a tablespace; empty string
 								 * means use database default */
+	char	   *tableam;			/* table access method, only for TABLE tags */
+
 	char	   *owner;
 	char	   *desc;
 	char	   *defn;
@@ -410,7 +413,7 @@ extern TocEntry *ArchiveEntry(Archive *AHX,
 			 CatalogId catalogId, DumpId dumpId,
 			 const char *tag,
 			 const char *namespace, const char *tablespace,
-			 const char *owner,
+			 const char *owner, const char *amname,
 			 const char *desc, teSection section,
 			 const char *defn,
 			 const char *dropStmt, const char *copyStmt,
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 637c79af48..512c486546 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2136,7 +2136,7 @@ dumpTableData(Archive *fout, TableDataInfo *tdinfo)
 
 		te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
 						  tbinfo->dobj.name, tbinfo->dobj.namespace->dobj.name,
-						  NULL, tbinfo->rolname,
+						  NULL, tbinfo->rolname, NULL,
 						  "TABLE DATA", SECTION_DATA,
 						  "", "", copyStmt,
 						  &(tbinfo->dobj.dumpId), 1,
@@ -2188,6 +2188,7 @@ refreshMatViewData(Archive *fout, TableDataInfo *tdinfo)
 					 tbinfo->dobj.namespace->dobj.name, /* Namespace */
 					 NULL,		/* Tablespace */
 					 tbinfo->rolname,	/* Owner */
+					 NULL,				/* Table access method */
 					 "MATERIALIZED VIEW DATA",	/* Desc */
 					 SECTION_POST_DATA, /* Section */
 					 q->data,	/* Create */
@@ -2726,6 +2727,7 @@ dumpDatabase(Archive *fout)
 				 NULL,			/* Namespace */
 				 NULL,			/* Tablespace */
 				 dba,			/* Owner */
+				 NULL,			/* Table access method */
 				 "DATABASE",	/* Desc */
 				 SECTION_PRE_DATA,	/* Section */
 				 creaQry->data, /* Create */
@@ -2762,7 +2764,7 @@ dumpDatabase(Archive *fout)
 			appendPQExpBufferStr(dbQry, ";\n");
 
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
-						 labelq->data, NULL, NULL, dba,
+						 labelq->data, NULL, NULL, dba, NULL,
 						 "COMMENT", SECTION_NONE,
 						 dbQry->data, "", NULL,
 						 &(dbDumpId), 1,
@@ -2789,7 +2791,7 @@ dumpDatabase(Archive *fout)
 		emitShSecLabels(conn, shres, seclabelQry, "DATABASE", datname);
 		if (seclabelQry->len > 0)
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
-						 labelq->data, NULL, NULL, dba,
+						 labelq->data, NULL, NULL, dba, NULL,
 						 "SECURITY LABEL", SECTION_NONE,
 						 seclabelQry->data, "", NULL,
 						 &(dbDumpId), 1,
@@ -2859,7 +2861,7 @@ dumpDatabase(Archive *fout)
 
 	if (creaQry->len > 0)
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 datname, NULL, NULL, dba,
+					 datname, NULL, NULL, dba, NULL,
 					 "DATABASE PROPERTIES", SECTION_PRE_DATA,
 					 creaQry->data, delQry->data, NULL,
 					 &(dbDumpId), 1,
@@ -2904,7 +2906,7 @@ dumpDatabase(Archive *fout)
 						  atooid(PQgetvalue(lo_res, 0, i_relminmxid)),
 						  LargeObjectRelationId);
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 "pg_largeobject", NULL, NULL, "",
+					 "pg_largeobject", NULL, NULL, "", NULL,
 					 "pg_largeobject", SECTION_PRE_DATA,
 					 loOutQry->data, "", NULL,
 					 NULL, 0,
@@ -3014,7 +3016,7 @@ dumpEncoding(Archive *AH)
 	appendPQExpBufferStr(qry, ";\n");
 
 	ArchiveEntry(AH, nilCatalogId, createDumpId(),
-				 "ENCODING", NULL, NULL, "",
+				 "ENCODING", NULL, NULL, "", NULL,
 				 "ENCODING", SECTION_PRE_DATA,
 				 qry->data, "", NULL,
 				 NULL, 0,
@@ -3041,7 +3043,7 @@ dumpStdStrings(Archive *AH)
 					  stdstrings);
 
 	ArchiveEntry(AH, nilCatalogId, createDumpId(),
-				 "STDSTRINGS", NULL, NULL, "",
+				 "STDSTRINGS", NULL, NULL, "", NULL,
 				 "STDSTRINGS", SECTION_PRE_DATA,
 				 qry->data, "", NULL,
 				 NULL, 0,
@@ -3097,7 +3099,7 @@ dumpSearchPath(Archive *AH)
 		write_msg(NULL, "saving search_path = %s\n", path->data);
 
 	ArchiveEntry(AH, nilCatalogId, createDumpId(),
-				 "SEARCHPATH", NULL, NULL, "",
+				 "SEARCHPATH", NULL, NULL, "", NULL,
 				 "SEARCHPATH", SECTION_PRE_DATA,
 				 qry->data, "", NULL,
 				 NULL, 0,
@@ -3275,7 +3277,7 @@ dumpBlob(Archive *fout, BlobInfo *binfo)
 		ArchiveEntry(fout, binfo->dobj.catId, binfo->dobj.dumpId,
 					 binfo->dobj.name,
 					 NULL, NULL,
-					 binfo->rolname,
+					 binfo->rolname, NULL,
 					 "BLOB", SECTION_PRE_DATA,
 					 cquery->data, dquery->data, NULL,
 					 NULL, 0,
@@ -3581,6 +3583,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
 						 polinfo->dobj.namespace->dobj.name,
 						 NULL,
 						 tbinfo->rolname,
+						 NULL,
 						 "ROW SECURITY", SECTION_POST_DATA,
 						 query->data, "", NULL,
 						 &(tbinfo->dobj.dumpId), 1,
@@ -3637,6 +3640,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
 					 polinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "POLICY", SECTION_POST_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -3811,6 +3815,7 @@ dumpPublication(Archive *fout, PublicationInfo *pubinfo)
 				 NULL,
 				 NULL,
 				 pubinfo->rolname,
+				 NULL,
 				 "PUBLICATION", SECTION_POST_DATA,
 				 query->data, delq->data, NULL,
 				 NULL, 0,
@@ -3954,6 +3959,7 @@ dumpPublicationTable(Archive *fout, PublicationRelInfo *pubrinfo)
 				 tbinfo->dobj.namespace->dobj.name,
 				 NULL,
 				 "",
+				 NULL,
 				 "PUBLICATION TABLE", SECTION_POST_DATA,
 				 query->data, "", NULL,
 				 NULL, 0,
@@ -4147,6 +4153,7 @@ dumpSubscription(Archive *fout, SubscriptionInfo *subinfo)
 				 NULL,
 				 NULL,
 				 subinfo->rolname,
+				 NULL,
 				 "SUBSCRIPTION", SECTION_POST_DATA,
 				 query->data, delq->data, NULL,
 				 NULL, 0,
@@ -5829,6 +5836,7 @@ getTables(Archive *fout, int *numTables)
 	int			i_partkeydef;
 	int			i_ispartition;
 	int			i_partbound;
+	int			i_amname;
 
 	/*
 	 * Find all the tables and table-like objects.
@@ -5914,7 +5922,7 @@ getTables(Archive *fout, int *numTables)
 						  "tc.relfrozenxid AS tfrozenxid, "
 						  "tc.relminmxid AS tminmxid, "
 						  "c.relpersistence, c.relispopulated, "
-						  "c.relreplident, c.relpages, "
+						  "c.relreplident, c.relpages, am.amname, "
 						  "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
 						  "d.refobjid AS owning_tab, "
 						  "d.refobjsubid AS owning_col, "
@@ -5945,6 +5953,7 @@ getTables(Archive *fout, int *numTables)
 						  "d.objsubid = 0 AND "
 						  "d.refclassid = c.tableoid AND d.deptype IN ('a', 'i')) "
 						  "LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid) "
+						  "LEFT JOIN pg_am am ON (c.relam = am.oid) "
 						  "LEFT JOIN pg_init_privs pip ON "
 						  "(c.oid = pip.objoid "
 						  "AND pip.classoid = 'pg_class'::regclass "
@@ -6412,6 +6421,7 @@ getTables(Archive *fout, int *numTables)
 	i_partkeydef = PQfnumber(res, "partkeydef");
 	i_ispartition = PQfnumber(res, "ispartition");
 	i_partbound = PQfnumber(res, "partbound");
+	i_amname = PQfnumber(res, "amname");
 
 	if (dopt->lockWaitTimeout)
 	{
@@ -6481,6 +6491,11 @@ getTables(Archive *fout, int *numTables)
 		else
 			tblinfo[i].checkoption = pg_strdup(PQgetvalue(res, i, i_checkoption));
 		tblinfo[i].toast_reloptions = pg_strdup(PQgetvalue(res, i, i_toastreloptions));
+		if (PQgetisnull(res, i, i_amname))
+			tblinfo[i].amname = NULL;
+		else
+			tblinfo[i].amname = pg_strdup(PQgetvalue(res, i, i_amname));
+
 
 		/* other fields were zeroed above */
 
@@ -9355,7 +9370,7 @@ dumpComment(Archive *fout, const char *type, const char *name,
 		 * post-data.
 		 */
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 tag->data, namespace, NULL, owner,
+					 tag->data, namespace, NULL, owner, NULL,
 					 "COMMENT", SECTION_NONE,
 					 query->data, "", NULL,
 					 &(dumpId), 1,
@@ -9423,7 +9438,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
 						 tag->data,
 						 tbinfo->dobj.namespace->dobj.name,
-						 NULL, tbinfo->rolname,
+						 NULL, tbinfo->rolname, NULL,
 						 "COMMENT", SECTION_NONE,
 						 query->data, "", NULL,
 						 &(tbinfo->dobj.dumpId), 1,
@@ -9447,7 +9462,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
 						 tag->data,
 						 tbinfo->dobj.namespace->dobj.name,
-						 NULL, tbinfo->rolname,
+						 NULL, tbinfo->rolname, NULL,
 						 "COMMENT", SECTION_NONE,
 						 query->data, "", NULL,
 						 &(tbinfo->dobj.dumpId), 1,
@@ -9728,7 +9743,7 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 				TocEntry   *te;
 
 				te = ArchiveEntry(fout, dobj->catId, dobj->dumpId,
-								  dobj->name, NULL, NULL, "",
+								  dobj->name, NULL, NULL, "", NULL,
 								  "BLOBS", SECTION_DATA,
 								  "", "", NULL,
 								  NULL, 0,
@@ -9802,7 +9817,7 @@ dumpNamespace(Archive *fout, NamespaceInfo *nspinfo)
 		ArchiveEntry(fout, nspinfo->dobj.catId, nspinfo->dobj.dumpId,
 					 nspinfo->dobj.name,
 					 NULL, NULL,
-					 nspinfo->rolname,
+					 nspinfo->rolname, NULL,
 					 "SCHEMA", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -9938,7 +9953,7 @@ dumpExtension(Archive *fout, ExtensionInfo *extinfo)
 		ArchiveEntry(fout, extinfo->dobj.catId, extinfo->dobj.dumpId,
 					 extinfo->dobj.name,
 					 NULL, NULL,
-					 "",
+					 "", NULL,
 					 "EXTENSION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10090,6 +10105,7 @@ dumpEnumType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10217,6 +10233,7 @@ dumpRangeType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10290,6 +10307,7 @@ dumpUndefinedType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10572,6 +10590,7 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10729,6 +10748,7 @@ dumpDomain(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "DOMAIN", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10951,6 +10971,7 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -11085,7 +11106,7 @@ dumpCompositeTypeColComments(Archive *fout, TypeInfo *tyinfo)
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
 						 target->data,
 						 tyinfo->dobj.namespace->dobj.name,
-						 NULL, tyinfo->rolname,
+						 NULL, tyinfo->rolname, NULL,
 						 "COMMENT", SECTION_NONE,
 						 query->data, "", NULL,
 						 &(tyinfo->dobj.dumpId), 1,
@@ -11142,6 +11163,7 @@ dumpShellType(Archive *fout, ShellTypeInfo *stinfo)
 					 stinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 stinfo->baseType->rolname,
+					 NULL,
 					 "SHELL TYPE", SECTION_PRE_DATA,
 					 q->data, "", NULL,
 					 NULL, 0,
@@ -11251,7 +11273,7 @@ dumpProcLang(Archive *fout, ProcLangInfo *plang)
 	if (plang->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, plang->dobj.catId, plang->dobj.dumpId,
 					 plang->dobj.name,
-					 NULL, NULL, plang->lanowner,
+					 NULL, NULL, plang->lanowner, NULL,
 					 "PROCEDURAL LANGUAGE", SECTION_PRE_DATA,
 					 defqry->data, delqry->data, NULL,
 					 NULL, 0,
@@ -11924,6 +11946,7 @@ dumpFunc(Archive *fout, FuncInfo *finfo)
 					 finfo->dobj.namespace->dobj.name,
 					 NULL,
 					 finfo->rolname,
+					 NULL,
 					 keyword, SECTION_PRE_DATA,
 					 q->data, delqry->data, NULL,
 					 NULL, 0,
@@ -12056,7 +12079,7 @@ dumpCast(Archive *fout, CastInfo *cast)
 	if (cast->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, cast->dobj.catId, cast->dobj.dumpId,
 					 labelq->data,
-					 NULL, NULL, "",
+					 NULL, NULL, "", NULL,
 					 "CAST", SECTION_PRE_DATA,
 					 defqry->data, delqry->data, NULL,
 					 NULL, 0,
@@ -12184,7 +12207,7 @@ dumpTransform(Archive *fout, TransformInfo *transform)
 	if (transform->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, transform->dobj.catId, transform->dobj.dumpId,
 					 labelq->data,
-					 NULL, NULL, "",
+					 NULL, NULL, "", NULL,
 					 "TRANSFORM", SECTION_PRE_DATA,
 					 defqry->data, delqry->data, NULL,
 					 transform->dobj.dependencies, transform->dobj.nDeps,
@@ -12400,6 +12423,7 @@ dumpOpr(Archive *fout, OprInfo *oprinfo)
 					 oprinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 oprinfo->rolname,
+					 NULL,
 					 "OPERATOR", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -12546,6 +12570,9 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
 		case AMTYPE_INDEX:
 			appendPQExpBuffer(q, "TYPE INDEX ");
 			break;
+		case AMTYPE_TABLE:
+			appendPQExpBuffer(q, "TYPE TABLE ");
+			break;
 		default:
 			write_msg(NULL, "WARNING: invalid type \"%c\" of access method \"%s\"\n",
 					  aminfo->amtype, qamname);
@@ -12570,6 +12597,7 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
 					 NULL,
 					 NULL,
 					 "",
+					 NULL,
 					 "ACCESS METHOD", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -12936,6 +12964,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
 					 opcinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 opcinfo->rolname,
+					 NULL,
 					 "OPERATOR CLASS", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13203,6 +13232,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
 					 opfinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 opfinfo->rolname,
+					 NULL,
 					 "OPERATOR FAMILY", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13346,6 +13376,7 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
 					 collinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 collinfo->rolname,
+					 NULL,
 					 "COLLATION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13441,6 +13472,7 @@ dumpConversion(Archive *fout, ConvInfo *convinfo)
 					 convinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 convinfo->rolname,
+					 NULL,
 					 "CONVERSION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13930,6 +13962,7 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
 					 agginfo->aggfn.dobj.namespace->dobj.name,
 					 NULL,
 					 agginfo->aggfn.rolname,
+					 NULL,
 					 "AGGREGATE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14028,6 +14061,7 @@ dumpTSParser(Archive *fout, TSParserInfo *prsinfo)
 					 prsinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 "",
+					 NULL,
 					 "TEXT SEARCH PARSER", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14108,6 +14142,7 @@ dumpTSDictionary(Archive *fout, TSDictInfo *dictinfo)
 					 dictinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 dictinfo->rolname,
+					 NULL,
 					 "TEXT SEARCH DICTIONARY", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14169,6 +14204,7 @@ dumpTSTemplate(Archive *fout, TSTemplateInfo *tmplinfo)
 					 tmplinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 "",
+					 NULL,
 					 "TEXT SEARCH TEMPLATE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14289,6 +14325,7 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
 					 cfginfo->dobj.namespace->dobj.name,
 					 NULL,
 					 cfginfo->rolname,
+					 NULL,
 					 "TEXT SEARCH CONFIGURATION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14355,6 +14392,7 @@ dumpForeignDataWrapper(Archive *fout, FdwInfo *fdwinfo)
 					 NULL,
 					 NULL,
 					 fdwinfo->rolname,
+					 NULL,
 					 "FOREIGN DATA WRAPPER", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14446,6 +14484,7 @@ dumpForeignServer(Archive *fout, ForeignServerInfo *srvinfo)
 					 NULL,
 					 NULL,
 					 srvinfo->rolname,
+					 NULL,
 					 "SERVER", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14564,6 +14603,7 @@ dumpUserMappings(Archive *fout,
 					 namespace,
 					 NULL,
 					 owner,
+					 NULL,
 					 "USER MAPPING", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 &dumpId, 1,
@@ -14643,6 +14683,7 @@ dumpDefaultACL(Archive *fout, DefaultACLInfo *daclinfo)
 					 daclinfo->dobj.namespace ? daclinfo->dobj.namespace->dobj.name : NULL,
 					 NULL,
 					 daclinfo->defaclrole,
+					 NULL,
 					 "DEFAULT ACL", SECTION_POST_DATA,
 					 q->data, "", NULL,
 					 NULL, 0,
@@ -14741,6 +14782,7 @@ dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
 					 tag->data, nspname,
 					 NULL,
 					 owner ? owner : "",
+					 NULL,
 					 "ACL", SECTION_NONE,
 					 sql->data, "", NULL,
 					 &(objDumpId), 1,
@@ -14826,7 +14868,7 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
 
 		appendPQExpBuffer(tag, "%s %s", type, name);
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 tag->data, namespace, NULL, owner,
+					 tag->data, namespace, NULL, owner, NULL,
 					 "SECURITY LABEL", SECTION_NONE,
 					 query->data, "", NULL,
 					 &(dumpId), 1,
@@ -14908,7 +14950,7 @@ dumpTableSecLabel(Archive *fout, TableInfo *tbinfo, const char *reltypename)
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
 					 target->data,
 					 tbinfo->dobj.namespace->dobj.name,
-					 NULL, tbinfo->rolname,
+					 NULL, tbinfo->rolname, NULL,
 					 "SECURITY LABEL", SECTION_NONE,
 					 query->data, "", NULL,
 					 &(tbinfo->dobj.dumpId), 1,
@@ -15994,6 +16036,8 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 (tbinfo->relkind == RELKIND_VIEW) ? NULL : tbinfo->reltablespace,
 					 tbinfo->rolname,
+					 (tbinfo->relkind == RELKIND_RELATION) ?
+					 tbinfo->amname : NULL,
 					 reltypename,
 					 tbinfo->postponed_def ?
 					 SECTION_POST_DATA : SECTION_PRE_DATA,
@@ -16074,6 +16118,7 @@ dumpAttrDef(Archive *fout, AttrDefInfo *adinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "DEFAULT", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -16190,6 +16235,7 @@ dumpIndex(Archive *fout, IndxInfo *indxinfo)
 						 tbinfo->dobj.namespace->dobj.name,
 						 indxinfo->tablespace,
 						 tbinfo->rolname,
+						 NULL,
 						 "INDEX", SECTION_POST_DATA,
 						 q->data, delq->data, NULL,
 						 NULL, 0,
@@ -16234,6 +16280,7 @@ dumpIndexAttach(Archive *fout, IndexAttachInfo *attachinfo)
 					 attachinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 "",
+					 NULL,
 					 "INDEX ATTACH", SECTION_POST_DATA,
 					 q->data, "", NULL,
 					 NULL, 0,
@@ -16289,6 +16336,7 @@ dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo)
 					 statsextinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 statsextinfo->rolname,
+					 NULL,
 					 "STATISTICS", SECTION_POST_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -16450,6 +16498,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 						 tbinfo->dobj.namespace->dobj.name,
 						 indxinfo->tablespace,
 						 tbinfo->rolname,
+						 NULL,
 						 "CONSTRAINT", SECTION_POST_DATA,
 						 q->data, delq->data, NULL,
 						 NULL, 0,
@@ -16490,6 +16539,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 						 tbinfo->dobj.namespace->dobj.name,
 						 NULL,
 						 tbinfo->rolname,
+						 NULL,
 						 "FK CONSTRAINT", SECTION_POST_DATA,
 						 q->data, delq->data, NULL,
 						 NULL, 0,
@@ -16522,6 +16572,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 							 tbinfo->dobj.namespace->dobj.name,
 							 NULL,
 							 tbinfo->rolname,
+							 NULL,
 							 "CHECK CONSTRAINT", SECTION_POST_DATA,
 							 q->data, delq->data, NULL,
 							 NULL, 0,
@@ -16555,6 +16606,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 							 tyinfo->dobj.namespace->dobj.name,
 							 NULL,
 							 tyinfo->rolname,
+							 NULL,
 							 "CHECK CONSTRAINT", SECTION_POST_DATA,
 							 q->data, delq->data, NULL,
 							 NULL, 0,
@@ -16829,6 +16881,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "SEQUENCE", SECTION_PRE_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -16870,6 +16923,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
 							 tbinfo->dobj.namespace->dobj.name,
 							 NULL,
 							 tbinfo->rolname,
+							 NULL,
 							 "SEQUENCE OWNED BY", SECTION_PRE_DATA,
 							 query->data, "", NULL,
 							 &(tbinfo->dobj.dumpId), 1,
@@ -16938,6 +16992,7 @@ dumpSequenceData(Archive *fout, TableDataInfo *tdinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "SEQUENCE SET", SECTION_DATA,
 					 query->data, "", NULL,
 					 &(tbinfo->dobj.dumpId), 1,
@@ -17137,6 +17192,7 @@ dumpTrigger(Archive *fout, TriggerInfo *tginfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "TRIGGER", SECTION_POST_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -17223,7 +17279,7 @@ dumpEventTrigger(Archive *fout, EventTriggerInfo *evtinfo)
 	if (evtinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, evtinfo->dobj.catId, evtinfo->dobj.dumpId,
 					 evtinfo->dobj.name, NULL, NULL,
-					 evtinfo->evtowner,
+					 evtinfo->evtowner, NULL,
 					 "EVENT TRIGGER", SECTION_POST_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -17384,6 +17440,7 @@ dumpRule(Archive *fout, RuleInfo *rinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "RULE", SECTION_POST_DATA,
 					 cmd->data, delcmd->data, NULL,
 					 NULL, 0,
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 789d6a24e2..4024d0c1e3 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -324,6 +324,7 @@ typedef struct _tableInfo
 	char	   *partkeydef;		/* partition key definition */
 	char	   *partbound;		/* partition bound definition */
 	bool		needs_override; /* has GENERATED ALWAYS AS IDENTITY */
+	char	   *amname; 		/* table access method */
 
 	/*
 	 * Stuff computed only for dumpable tables.
#66Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Dmitry Dolgov (#65)
Re: Pluggable Storage - Andres's take

Thanks for the patch updates.

A few comments so far from me:

+static void _selectTableAccessMethod(ArchiveHandle *AH, const char *tablespace);
tablespace => tableam

+_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
+{
+       PQExpBuffer cmd = createPQExpBuffer();
createPQExpBuffer() should be moved after the below statement, so that it does not leak memory:
if (have && strcmp(want, have) == 0)
return;
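
For clarity, a sketch of the reordering (this is what the v3 patch
attached to the next message ends up doing, minus its RestoringToDB()
branch):

static void
_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
{
	PQExpBuffer cmd;
	const char *want = tableam;
	const char *have = AH->currTableAm;

	/* Early exits first: nothing requested, or this AM is already selected. */
	if (!want)
		return;
	if (have && strcmp(want, have) == 0)
		return;

	/* Allocate the buffer only on the path that actually emits a command. */
	cmd = createPQExpBuffer();
	appendPQExpBuffer(cmd, "SET default_table_access_method = %s;",
					  fmtId(want));
	ahprintf(AH, "%s\n\n", cmd->data);
	destroyPQExpBuffer(cmd);

	AH->currTableAm = pg_strdup(want);
}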

char *tableam; /* table access method, onlyt for TABLE tags */
Indentation is a bit misaligned. onlyt => only

@@ -2696,6 +2701,7 @@ ReadToc(ArchiveHandle *AH)
te->tablespace = ReadStr(AH);

te->owner = ReadStr(AH);
+ te->tableam = ReadStr(AH);

Above, I am not sure about this, but we may need an archive-version
check like the one done for tablespace:
if (AH->version >= K_VERS_1_10)
te->tablespace = ReadStr(AH);

So how about bumping up the archive version and doing these checks?
Otherwise, if we run pg_restore on an archive created by an older
version, we may read junk into te->tableam, or possibly crash. As I
said, I am not sure about this, due to my limited understanding of
archive versioning, but let me know if you find that this issue is
real.
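
For illustration, the write and read sides have to stay in step; a
sketch of the suggested guard, with K_VERS_1_14 standing in for
whatever the bumped archive version ends up being:

	/* WriteToc(): a new pg_dump always writes the field ... */
	WriteStr(AH, te->owner);
	WriteStr(AH, te->tableam);

	/* ReadToc(): ... but it must only be read from archives new enough
	 * to contain it; for older archives te->tableam stays NULL. */
	te->owner = ReadStr(AH);
	if (AH->version >= K_VERS_1_14)
		te->tableam = ReadStr(AH);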

#67Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Amit Khandekar (#66)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Mon, Jan 14, 2019 at 2:07 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

createPQExpBuffer() should be moved after the below statement, so that it does not leak memory

Thanks for noticing, fixed.

So how about bumping up the archive version and doing these checks?

Yeah, you're right, I've added this check.

Attachments:

pg_dump_access_method_v3.patch
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 58bd3805f4..aadeacf95d 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -85,6 +85,7 @@ static void _becomeUser(ArchiveHandle *AH, const char *user);
 static void _becomeOwner(ArchiveHandle *AH, TocEntry *te);
 static void _selectOutputSchema(ArchiveHandle *AH, const char *schemaName);
 static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
+static void _selectTableAccessMethod(ArchiveHandle *AH, const char *tablespace);
 static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
 static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
 static void processSearchPathEntry(ArchiveHandle *AH, TocEntry *te);
@@ -1072,6 +1073,7 @@ ArchiveEntry(Archive *AHX,
 			 const char *namespace,
 			 const char *tablespace,
 			 const char *owner,
+			 const char *tableam,
 			 const char *desc, teSection section,
 			 const char *defn,
 			 const char *dropStmt, const char *copyStmt,
@@ -1099,6 +1101,7 @@ ArchiveEntry(Archive *AHX,
 	newToc->tag = pg_strdup(tag);
 	newToc->namespace = namespace ? pg_strdup(namespace) : NULL;
 	newToc->tablespace = tablespace ? pg_strdup(tablespace) : NULL;
+	newToc->tableam = tableam ? pg_strdup(tableam) : NULL;
 	newToc->owner = pg_strdup(owner);
 	newToc->desc = pg_strdup(desc);
 	newToc->defn = pg_strdup(defn);
@@ -2367,6 +2370,7 @@ _allocAH(const char *FileSpec, const ArchiveFormat fmt,
 	AH->currUser = NULL;		/* unknown */
 	AH->currSchema = NULL;		/* ditto */
 	AH->currTablespace = NULL;	/* ditto */
+	AH->currTableAm = NULL;	/* ditto */
 
 	AH->toc = (TocEntry *) pg_malloc0(sizeof(TocEntry));
 
@@ -2594,6 +2598,7 @@ WriteToc(ArchiveHandle *AH)
 		WriteStr(AH, te->namespace);
 		WriteStr(AH, te->tablespace);
 		WriteStr(AH, te->owner);
+		WriteStr(AH, te->tableam);
 		WriteStr(AH, "false");
 
 		/* Dump list of dependencies */
@@ -2696,6 +2701,9 @@ ReadToc(ArchiveHandle *AH)
 			te->tablespace = ReadStr(AH);
 
 		te->owner = ReadStr(AH);
+		if (AH->version >= K_VERS_1_14)
+			te->tableam = ReadStr(AH);
+
 		if (AH->version < K_VERS_1_9 || strcmp(ReadStr(AH), "true") == 0)
 			write_msg(modulename,
 					  "WARNING: restoring tables WITH OIDS is not supported anymore");
@@ -3288,6 +3296,9 @@ _reconnectToDB(ArchiveHandle *AH, const char *dbname)
 	if (AH->currTablespace)
 		free(AH->currTablespace);
 	AH->currTablespace = NULL;
+	if (AH->currTableAm)
+		free(AH->currTableAm);
+	AH->currTableAm = NULL;
 
 	/* re-establish fixed state */
 	_doSetFixedOutputState(AH);
@@ -3448,6 +3459,48 @@ _selectTablespace(ArchiveHandle *AH, const char *tablespace)
 	destroyPQExpBuffer(qry);
 }
 
+/*
+ * Set the proper default_table_access_method value for the table.
+ */
+static void
+_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
+{
+	PQExpBuffer cmd;
+	const char *want, *have;
+
+	have = AH->currTableAm;
+	want = tableam;
+
+	if (!want)
+		return;
+
+	if (have && strcmp(want, have) == 0)
+		return;
+
+	cmd = createPQExpBuffer();
+	appendPQExpBuffer(cmd, "SET default_table_access_method = %s;", fmtId(want));
+
+	if (RestoringToDB(AH))
+	{
+		PGresult   *res;
+
+		res = PQexec(AH->connection, cmd->data);
+
+		if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
+			warn_or_exit_horribly(AH, modulename,
+								  "could not set default_table_access_method: %s",
+								  PQerrorMessage(AH->connection));
+
+		PQclear(res);
+	}
+	else
+		ahprintf(AH, "%s\n\n", cmd->data);
+
+	destroyPQExpBuffer(cmd);
+
+	AH->currTableAm = pg_strdup(want);
+}
+
 /*
  * Extract an object description for a TOC entry, and append it to buf.
  *
@@ -3547,6 +3600,7 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
 	_becomeOwner(AH, te);
 	_selectOutputSchema(AH, te->namespace);
 	_selectTablespace(AH, te->tablespace);
+	_selectTableAccessMethod(AH, te->tableam);
 
 	/* Emit header comment for item */
 	if (!AH->noTocComments)
@@ -4021,6 +4075,9 @@ restore_toc_entries_prefork(ArchiveHandle *AH, TocEntry *pending_list)
 	if (AH->currTablespace)
 		free(AH->currTablespace);
 	AH->currTablespace = NULL;
+	if (AH->currTableAm)
+		free(AH->currTableAm);
+	AH->currTableAm = NULL;
 }
 
 /*
@@ -4816,6 +4873,7 @@ CloneArchive(ArchiveHandle *AH)
 	clone->currUser = NULL;
 	clone->currSchema = NULL;
 	clone->currTablespace = NULL;
+	clone->currTableAm = NULL;
 
 	/* savedPassword must be local in case we change it while connecting */
 	if (clone->savedPassword)
@@ -4906,6 +4964,8 @@ DeCloneArchive(ArchiveHandle *AH)
 		free(AH->currSchema);
 	if (AH->currTablespace)
 		free(AH->currTablespace);
+	if (AH->currTableAm)
+		free(AH->currTableAm);
 	if (AH->savedPassword)
 		free(AH->savedPassword);
 
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 306d2ceba9..8131ff0a3d 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -94,6 +94,7 @@ typedef z_stream *z_streamp;
 													 * entries */
 #define K_VERS_1_13 MAKE_ARCHIVE_VERSION(1, 13, 0)	/* change search_path
 													 * behavior */
+#define K_VERS_1_14 MAKE_ARCHIVE_VERSION(1, 14, 0)	/* add tableam */
 
 /* Current archive version number (the format we can output) */
 #define K_VERS_MAJOR 1
@@ -347,6 +348,7 @@ struct _archiveHandle
 	char	   *currUser;		/* current username, or NULL if unknown */
 	char	   *currSchema;		/* current schema, or NULL */
 	char	   *currTablespace; /* current tablespace, or NULL */
+	char	   *currTableAm; 	/* current table access method, or NULL */
 
 	void	   *lo_buf;
 	size_t		lo_buf_used;
@@ -373,6 +375,8 @@ struct _tocEntry
 	char	   *namespace;		/* null or empty string if not in a schema */
 	char	   *tablespace;		/* null if not in a tablespace; empty string
 								 * means use database default */
+	char	   *tableam;		/* table access method, only for TABLE tags */
+
 	char	   *owner;
 	char	   *desc;
 	char	   *defn;
@@ -410,7 +414,7 @@ extern TocEntry *ArchiveEntry(Archive *AHX,
 			 CatalogId catalogId, DumpId dumpId,
 			 const char *tag,
 			 const char *namespace, const char *tablespace,
-			 const char *owner,
+			 const char *owner, const char *amname,
 			 const char *desc, teSection section,
 			 const char *defn,
 			 const char *dropStmt, const char *copyStmt,
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 637c79af48..512c486546 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2136,7 +2136,7 @@ dumpTableData(Archive *fout, TableDataInfo *tdinfo)
 
 		te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
 						  tbinfo->dobj.name, tbinfo->dobj.namespace->dobj.name,
-						  NULL, tbinfo->rolname,
+						  NULL, tbinfo->rolname, NULL,
 						  "TABLE DATA", SECTION_DATA,
 						  "", "", copyStmt,
 						  &(tbinfo->dobj.dumpId), 1,
@@ -2188,6 +2188,7 @@ refreshMatViewData(Archive *fout, TableDataInfo *tdinfo)
 					 tbinfo->dobj.namespace->dobj.name, /* Namespace */
 					 NULL,		/* Tablespace */
 					 tbinfo->rolname,	/* Owner */
+					 NULL,				/* Table access method */
 					 "MATERIALIZED VIEW DATA",	/* Desc */
 					 SECTION_POST_DATA, /* Section */
 					 q->data,	/* Create */
@@ -2726,6 +2727,7 @@ dumpDatabase(Archive *fout)
 				 NULL,			/* Namespace */
 				 NULL,			/* Tablespace */
 				 dba,			/* Owner */
+				 NULL,			/* Table access method */
 				 "DATABASE",	/* Desc */
 				 SECTION_PRE_DATA,	/* Section */
 				 creaQry->data, /* Create */
@@ -2762,7 +2764,7 @@ dumpDatabase(Archive *fout)
 			appendPQExpBufferStr(dbQry, ";\n");
 
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
-						 labelq->data, NULL, NULL, dba,
+						 labelq->data, NULL, NULL, dba, NULL,
 						 "COMMENT", SECTION_NONE,
 						 dbQry->data, "", NULL,
 						 &(dbDumpId), 1,
@@ -2789,7 +2791,7 @@ dumpDatabase(Archive *fout)
 		emitShSecLabels(conn, shres, seclabelQry, "DATABASE", datname);
 		if (seclabelQry->len > 0)
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
-						 labelq->data, NULL, NULL, dba,
+						 labelq->data, NULL, NULL, dba, NULL,
 						 "SECURITY LABEL", SECTION_NONE,
 						 seclabelQry->data, "", NULL,
 						 &(dbDumpId), 1,
@@ -2859,7 +2861,7 @@ dumpDatabase(Archive *fout)
 
 	if (creaQry->len > 0)
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 datname, NULL, NULL, dba,
+					 datname, NULL, NULL, dba, NULL,
 					 "DATABASE PROPERTIES", SECTION_PRE_DATA,
 					 creaQry->data, delQry->data, NULL,
 					 &(dbDumpId), 1,
@@ -2904,7 +2906,7 @@ dumpDatabase(Archive *fout)
 						  atooid(PQgetvalue(lo_res, 0, i_relminmxid)),
 						  LargeObjectRelationId);
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 "pg_largeobject", NULL, NULL, "",
+					 "pg_largeobject", NULL, NULL, "", NULL,
 					 "pg_largeobject", SECTION_PRE_DATA,
 					 loOutQry->data, "", NULL,
 					 NULL, 0,
@@ -3014,7 +3016,7 @@ dumpEncoding(Archive *AH)
 	appendPQExpBufferStr(qry, ";\n");
 
 	ArchiveEntry(AH, nilCatalogId, createDumpId(),
-				 "ENCODING", NULL, NULL, "",
+				 "ENCODING", NULL, NULL, "", NULL,
 				 "ENCODING", SECTION_PRE_DATA,
 				 qry->data, "", NULL,
 				 NULL, 0,
@@ -3041,7 +3043,7 @@ dumpStdStrings(Archive *AH)
 					  stdstrings);
 
 	ArchiveEntry(AH, nilCatalogId, createDumpId(),
-				 "STDSTRINGS", NULL, NULL, "",
+				 "STDSTRINGS", NULL, NULL, "", NULL,
 				 "STDSTRINGS", SECTION_PRE_DATA,
 				 qry->data, "", NULL,
 				 NULL, 0,
@@ -3097,7 +3099,7 @@ dumpSearchPath(Archive *AH)
 		write_msg(NULL, "saving search_path = %s\n", path->data);
 
 	ArchiveEntry(AH, nilCatalogId, createDumpId(),
-				 "SEARCHPATH", NULL, NULL, "",
+				 "SEARCHPATH", NULL, NULL, "", NULL,
 				 "SEARCHPATH", SECTION_PRE_DATA,
 				 qry->data, "", NULL,
 				 NULL, 0,
@@ -3275,7 +3277,7 @@ dumpBlob(Archive *fout, BlobInfo *binfo)
 		ArchiveEntry(fout, binfo->dobj.catId, binfo->dobj.dumpId,
 					 binfo->dobj.name,
 					 NULL, NULL,
-					 binfo->rolname,
+					 binfo->rolname, NULL,
 					 "BLOB", SECTION_PRE_DATA,
 					 cquery->data, dquery->data, NULL,
 					 NULL, 0,
@@ -3581,6 +3583,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
 						 polinfo->dobj.namespace->dobj.name,
 						 NULL,
 						 tbinfo->rolname,
+						 NULL,
 						 "ROW SECURITY", SECTION_POST_DATA,
 						 query->data, "", NULL,
 						 &(tbinfo->dobj.dumpId), 1,
@@ -3637,6 +3640,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
 					 polinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "POLICY", SECTION_POST_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -3811,6 +3815,7 @@ dumpPublication(Archive *fout, PublicationInfo *pubinfo)
 				 NULL,
 				 NULL,
 				 pubinfo->rolname,
+				 NULL,
 				 "PUBLICATION", SECTION_POST_DATA,
 				 query->data, delq->data, NULL,
 				 NULL, 0,
@@ -3954,6 +3959,7 @@ dumpPublicationTable(Archive *fout, PublicationRelInfo *pubrinfo)
 				 tbinfo->dobj.namespace->dobj.name,
 				 NULL,
 				 "",
+				 NULL,
 				 "PUBLICATION TABLE", SECTION_POST_DATA,
 				 query->data, "", NULL,
 				 NULL, 0,
@@ -4147,6 +4153,7 @@ dumpSubscription(Archive *fout, SubscriptionInfo *subinfo)
 				 NULL,
 				 NULL,
 				 subinfo->rolname,
+				 NULL,
 				 "SUBSCRIPTION", SECTION_POST_DATA,
 				 query->data, delq->data, NULL,
 				 NULL, 0,
@@ -5829,6 +5836,7 @@ getTables(Archive *fout, int *numTables)
 	int			i_partkeydef;
 	int			i_ispartition;
 	int			i_partbound;
+	int			i_amname;
 
 	/*
 	 * Find all the tables and table-like objects.
@@ -5914,7 +5922,7 @@ getTables(Archive *fout, int *numTables)
 						  "tc.relfrozenxid AS tfrozenxid, "
 						  "tc.relminmxid AS tminmxid, "
 						  "c.relpersistence, c.relispopulated, "
-						  "c.relreplident, c.relpages, "
+						  "c.relreplident, c.relpages, am.amname, "
 						  "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
 						  "d.refobjid AS owning_tab, "
 						  "d.refobjsubid AS owning_col, "
@@ -5945,6 +5953,7 @@ getTables(Archive *fout, int *numTables)
 						  "d.objsubid = 0 AND "
 						  "d.refclassid = c.tableoid AND d.deptype IN ('a', 'i')) "
 						  "LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid) "
+						  "LEFT JOIN pg_am am ON (c.relam = am.oid) "
 						  "LEFT JOIN pg_init_privs pip ON "
 						  "(c.oid = pip.objoid "
 						  "AND pip.classoid = 'pg_class'::regclass "
@@ -6412,6 +6421,7 @@ getTables(Archive *fout, int *numTables)
 	i_partkeydef = PQfnumber(res, "partkeydef");
 	i_ispartition = PQfnumber(res, "ispartition");
 	i_partbound = PQfnumber(res, "partbound");
+	i_amname = PQfnumber(res, "amname");
 
 	if (dopt->lockWaitTimeout)
 	{
@@ -6481,6 +6491,11 @@ getTables(Archive *fout, int *numTables)
 		else
 			tblinfo[i].checkoption = pg_strdup(PQgetvalue(res, i, i_checkoption));
 		tblinfo[i].toast_reloptions = pg_strdup(PQgetvalue(res, i, i_toastreloptions));
+		if (PQgetisnull(res, i, i_amname))
+			tblinfo[i].amname = NULL;
+		else
+			tblinfo[i].amname = pg_strdup(PQgetvalue(res, i, i_amname));
+
 
 		/* other fields were zeroed above */
 
@@ -9355,7 +9370,7 @@ dumpComment(Archive *fout, const char *type, const char *name,
 		 * post-data.
 		 */
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 tag->data, namespace, NULL, owner,
+					 tag->data, namespace, NULL, owner, NULL,
 					 "COMMENT", SECTION_NONE,
 					 query->data, "", NULL,
 					 &(dumpId), 1,
@@ -9423,7 +9438,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
 						 tag->data,
 						 tbinfo->dobj.namespace->dobj.name,
-						 NULL, tbinfo->rolname,
+						 NULL, tbinfo->rolname, NULL,
 						 "COMMENT", SECTION_NONE,
 						 query->data, "", NULL,
 						 &(tbinfo->dobj.dumpId), 1,
@@ -9447,7 +9462,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
 						 tag->data,
 						 tbinfo->dobj.namespace->dobj.name,
-						 NULL, tbinfo->rolname,
+						 NULL, tbinfo->rolname, NULL,
 						 "COMMENT", SECTION_NONE,
 						 query->data, "", NULL,
 						 &(tbinfo->dobj.dumpId), 1,
@@ -9728,7 +9743,7 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 				TocEntry   *te;
 
 				te = ArchiveEntry(fout, dobj->catId, dobj->dumpId,
-								  dobj->name, NULL, NULL, "",
+								  dobj->name, NULL, NULL, "", NULL,
 								  "BLOBS", SECTION_DATA,
 								  "", "", NULL,
 								  NULL, 0,
@@ -9802,7 +9817,7 @@ dumpNamespace(Archive *fout, NamespaceInfo *nspinfo)
 		ArchiveEntry(fout, nspinfo->dobj.catId, nspinfo->dobj.dumpId,
 					 nspinfo->dobj.name,
 					 NULL, NULL,
-					 nspinfo->rolname,
+					 nspinfo->rolname, NULL,
 					 "SCHEMA", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -9938,7 +9953,7 @@ dumpExtension(Archive *fout, ExtensionInfo *extinfo)
 		ArchiveEntry(fout, extinfo->dobj.catId, extinfo->dobj.dumpId,
 					 extinfo->dobj.name,
 					 NULL, NULL,
-					 "",
+					 "", NULL,
 					 "EXTENSION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10090,6 +10105,7 @@ dumpEnumType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10217,6 +10233,7 @@ dumpRangeType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10290,6 +10307,7 @@ dumpUndefinedType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10572,6 +10590,7 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10729,6 +10748,7 @@ dumpDomain(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "DOMAIN", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10951,6 +10971,7 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -11085,7 +11106,7 @@ dumpCompositeTypeColComments(Archive *fout, TypeInfo *tyinfo)
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
 						 target->data,
 						 tyinfo->dobj.namespace->dobj.name,
-						 NULL, tyinfo->rolname,
+						 NULL, tyinfo->rolname, NULL,
 						 "COMMENT", SECTION_NONE,
 						 query->data, "", NULL,
 						 &(tyinfo->dobj.dumpId), 1,
@@ -11142,6 +11163,7 @@ dumpShellType(Archive *fout, ShellTypeInfo *stinfo)
 					 stinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 stinfo->baseType->rolname,
+					 NULL,
 					 "SHELL TYPE", SECTION_PRE_DATA,
 					 q->data, "", NULL,
 					 NULL, 0,
@@ -11251,7 +11273,7 @@ dumpProcLang(Archive *fout, ProcLangInfo *plang)
 	if (plang->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, plang->dobj.catId, plang->dobj.dumpId,
 					 plang->dobj.name,
-					 NULL, NULL, plang->lanowner,
+					 NULL, NULL, plang->lanowner, NULL,
 					 "PROCEDURAL LANGUAGE", SECTION_PRE_DATA,
 					 defqry->data, delqry->data, NULL,
 					 NULL, 0,
@@ -11924,6 +11946,7 @@ dumpFunc(Archive *fout, FuncInfo *finfo)
 					 finfo->dobj.namespace->dobj.name,
 					 NULL,
 					 finfo->rolname,
+					 NULL,
 					 keyword, SECTION_PRE_DATA,
 					 q->data, delqry->data, NULL,
 					 NULL, 0,
@@ -12056,7 +12079,7 @@ dumpCast(Archive *fout, CastInfo *cast)
 	if (cast->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, cast->dobj.catId, cast->dobj.dumpId,
 					 labelq->data,
-					 NULL, NULL, "",
+					 NULL, NULL, "", NULL,
 					 "CAST", SECTION_PRE_DATA,
 					 defqry->data, delqry->data, NULL,
 					 NULL, 0,
@@ -12184,7 +12207,7 @@ dumpTransform(Archive *fout, TransformInfo *transform)
 	if (transform->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, transform->dobj.catId, transform->dobj.dumpId,
 					 labelq->data,
-					 NULL, NULL, "",
+					 NULL, NULL, "", NULL,
 					 "TRANSFORM", SECTION_PRE_DATA,
 					 defqry->data, delqry->data, NULL,
 					 transform->dobj.dependencies, transform->dobj.nDeps,
@@ -12400,6 +12423,7 @@ dumpOpr(Archive *fout, OprInfo *oprinfo)
 					 oprinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 oprinfo->rolname,
+					 NULL,
 					 "OPERATOR", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -12546,6 +12570,9 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
 		case AMTYPE_INDEX:
 			appendPQExpBuffer(q, "TYPE INDEX ");
 			break;
+		case AMTYPE_TABLE:
+			appendPQExpBuffer(q, "TYPE TABLE ");
+			break;
 		default:
 			write_msg(NULL, "WARNING: invalid type \"%c\" of access method \"%s\"\n",
 					  aminfo->amtype, qamname);
@@ -12570,6 +12597,7 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
 					 NULL,
 					 NULL,
 					 "",
+					 NULL,
 					 "ACCESS METHOD", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -12936,6 +12964,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
 					 opcinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 opcinfo->rolname,
+					 NULL,
 					 "OPERATOR CLASS", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13203,6 +13232,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
 					 opfinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 opfinfo->rolname,
+					 NULL,
 					 "OPERATOR FAMILY", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13346,6 +13376,7 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
 					 collinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 collinfo->rolname,
+					 NULL,
 					 "COLLATION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13441,6 +13472,7 @@ dumpConversion(Archive *fout, ConvInfo *convinfo)
 					 convinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 convinfo->rolname,
+					 NULL,
 					 "CONVERSION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13930,6 +13962,7 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
 					 agginfo->aggfn.dobj.namespace->dobj.name,
 					 NULL,
 					 agginfo->aggfn.rolname,
+					 NULL,
 					 "AGGREGATE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14028,6 +14061,7 @@ dumpTSParser(Archive *fout, TSParserInfo *prsinfo)
 					 prsinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 "",
+					 NULL,
 					 "TEXT SEARCH PARSER", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14108,6 +14142,7 @@ dumpTSDictionary(Archive *fout, TSDictInfo *dictinfo)
 					 dictinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 dictinfo->rolname,
+					 NULL,
 					 "TEXT SEARCH DICTIONARY", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14169,6 +14204,7 @@ dumpTSTemplate(Archive *fout, TSTemplateInfo *tmplinfo)
 					 tmplinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 "",
+					 NULL,
 					 "TEXT SEARCH TEMPLATE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14289,6 +14325,7 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
 					 cfginfo->dobj.namespace->dobj.name,
 					 NULL,
 					 cfginfo->rolname,
+					 NULL,
 					 "TEXT SEARCH CONFIGURATION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14355,6 +14392,7 @@ dumpForeignDataWrapper(Archive *fout, FdwInfo *fdwinfo)
 					 NULL,
 					 NULL,
 					 fdwinfo->rolname,
+					 NULL,
 					 "FOREIGN DATA WRAPPER", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14446,6 +14484,7 @@ dumpForeignServer(Archive *fout, ForeignServerInfo *srvinfo)
 					 NULL,
 					 NULL,
 					 srvinfo->rolname,
+					 NULL,
 					 "SERVER", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14564,6 +14603,7 @@ dumpUserMappings(Archive *fout,
 					 namespace,
 					 NULL,
 					 owner,
+					 NULL,
 					 "USER MAPPING", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 &dumpId, 1,
@@ -14643,6 +14683,7 @@ dumpDefaultACL(Archive *fout, DefaultACLInfo *daclinfo)
 					 daclinfo->dobj.namespace ? daclinfo->dobj.namespace->dobj.name : NULL,
 					 NULL,
 					 daclinfo->defaclrole,
+					 NULL,
 					 "DEFAULT ACL", SECTION_POST_DATA,
 					 q->data, "", NULL,
 					 NULL, 0,
@@ -14741,6 +14782,7 @@ dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
 					 tag->data, nspname,
 					 NULL,
 					 owner ? owner : "",
+					 NULL,
 					 "ACL", SECTION_NONE,
 					 sql->data, "", NULL,
 					 &(objDumpId), 1,
@@ -14826,7 +14868,7 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
 
 		appendPQExpBuffer(tag, "%s %s", type, name);
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 tag->data, namespace, NULL, owner,
+					 tag->data, namespace, NULL, owner, NULL,
 					 "SECURITY LABEL", SECTION_NONE,
 					 query->data, "", NULL,
 					 &(dumpId), 1,
@@ -14908,7 +14950,7 @@ dumpTableSecLabel(Archive *fout, TableInfo *tbinfo, const char *reltypename)
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
 					 target->data,
 					 tbinfo->dobj.namespace->dobj.name,
-					 NULL, tbinfo->rolname,
+					 NULL, tbinfo->rolname, NULL,
 					 "SECURITY LABEL", SECTION_NONE,
 					 query->data, "", NULL,
 					 &(tbinfo->dobj.dumpId), 1,
@@ -15994,6 +16036,8 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 (tbinfo->relkind == RELKIND_VIEW) ? NULL : tbinfo->reltablespace,
 					 tbinfo->rolname,
+					 (tbinfo->relkind == RELKIND_RELATION) ?
+					 tbinfo->amname : NULL,
 					 reltypename,
 					 tbinfo->postponed_def ?
 					 SECTION_POST_DATA : SECTION_PRE_DATA,
@@ -16074,6 +16118,7 @@ dumpAttrDef(Archive *fout, AttrDefInfo *adinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "DEFAULT", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -16190,6 +16235,7 @@ dumpIndex(Archive *fout, IndxInfo *indxinfo)
 						 tbinfo->dobj.namespace->dobj.name,
 						 indxinfo->tablespace,
 						 tbinfo->rolname,
+						 NULL,
 						 "INDEX", SECTION_POST_DATA,
 						 q->data, delq->data, NULL,
 						 NULL, 0,
@@ -16234,6 +16280,7 @@ dumpIndexAttach(Archive *fout, IndexAttachInfo *attachinfo)
 					 attachinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 "",
+					 NULL,
 					 "INDEX ATTACH", SECTION_POST_DATA,
 					 q->data, "", NULL,
 					 NULL, 0,
@@ -16289,6 +16336,7 @@ dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo)
 					 statsextinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 statsextinfo->rolname,
+					 NULL,
 					 "STATISTICS", SECTION_POST_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -16450,6 +16498,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 						 tbinfo->dobj.namespace->dobj.name,
 						 indxinfo->tablespace,
 						 tbinfo->rolname,
+						 NULL,
 						 "CONSTRAINT", SECTION_POST_DATA,
 						 q->data, delq->data, NULL,
 						 NULL, 0,
@@ -16490,6 +16539,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 						 tbinfo->dobj.namespace->dobj.name,
 						 NULL,
 						 tbinfo->rolname,
+						 NULL,
 						 "FK CONSTRAINT", SECTION_POST_DATA,
 						 q->data, delq->data, NULL,
 						 NULL, 0,
@@ -16522,6 +16572,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 							 tbinfo->dobj.namespace->dobj.name,
 							 NULL,
 							 tbinfo->rolname,
+							 NULL,
 							 "CHECK CONSTRAINT", SECTION_POST_DATA,
 							 q->data, delq->data, NULL,
 							 NULL, 0,
@@ -16555,6 +16606,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 							 tyinfo->dobj.namespace->dobj.name,
 							 NULL,
 							 tyinfo->rolname,
+							 NULL,
 							 "CHECK CONSTRAINT", SECTION_POST_DATA,
 							 q->data, delq->data, NULL,
 							 NULL, 0,
@@ -16829,6 +16881,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "SEQUENCE", SECTION_PRE_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -16870,6 +16923,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
 							 tbinfo->dobj.namespace->dobj.name,
 							 NULL,
 							 tbinfo->rolname,
+							 NULL,
 							 "SEQUENCE OWNED BY", SECTION_PRE_DATA,
 							 query->data, "", NULL,
 							 &(tbinfo->dobj.dumpId), 1,
@@ -16938,6 +16992,7 @@ dumpSequenceData(Archive *fout, TableDataInfo *tdinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "SEQUENCE SET", SECTION_DATA,
 					 query->data, "", NULL,
 					 &(tbinfo->dobj.dumpId), 1,
@@ -17137,6 +17192,7 @@ dumpTrigger(Archive *fout, TriggerInfo *tginfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "TRIGGER", SECTION_POST_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -17223,7 +17279,7 @@ dumpEventTrigger(Archive *fout, EventTriggerInfo *evtinfo)
 	if (evtinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, evtinfo->dobj.catId, evtinfo->dobj.dumpId,
 					 evtinfo->dobj.name, NULL, NULL,
-					 evtinfo->evtowner,
+					 evtinfo->evtowner, NULL,
 					 "EVENT TRIGGER", SECTION_POST_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -17384,6 +17440,7 @@ dumpRule(Archive *fout, RuleInfo *rinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "RULE", SECTION_POST_DATA,
 					 cmd->data, delcmd->data, NULL,
 					 NULL, 0,
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 789d6a24e2..4024d0c1e3 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -324,6 +324,7 @@ typedef struct _tableInfo
 	char	   *partkeydef;		/* partition key definition */
 	char	   *partbound;		/* partition bound definition */
 	bool		needs_override; /* has GENERATED ALWAYS AS IDENTITY */
+	char	   *amname; 		/* table access method */
 
 	/*
 	 * Stuff computed only for dumpable tables.
#68Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#54)
Re: Pluggable Storage - Andres's take

On Tue, Dec 11, 2018 at 1:13 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-11-26 17:55:57 -0800, Andres Freund wrote:
Further tasks I'm not yet planning to tackle, that I'd welcome help on:
- pg_upgrade testing

I did pg_upgrade testing from an older version with some tables and
views present, and all of them were properly carried over to the new
server with heap as the default access method.

I will apply Dmitry's pg_dump patch and test pg_upgrade to confirm that
the proper access method is retained in the upgraded database.

- I think we should consider removing HeapTuple->t_tableOid, it should
imo live entirely in the slot

I removed t_tableOid from HeapTuple, and during testing I found some
problems with triggers; I will post the patch once they are fixed.

Regards,
Haribabu Kommi
Fujitsu Australia

#69Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#68)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-01-15 18:02:38 +1100, Haribabu Kommi wrote:

On Tue, Dec 11, 2018 at 1:13 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-11-26 17:55:57 -0800, Andres Freund wrote:
Further tasks I'm not yet planning to tackle, that I'd welcome help on:
- pg_upgrade testing

I did pg_upgrade testing from an older version with some tables and
views present, and all of them were properly carried over to the new
server with heap as the default access method.

I will apply Dmitry's pg_dump patch and test pg_upgrade to confirm that
the proper access method is retained in the upgraded database.

- I think we should consider removing HeapTuple->t_tableOid, it should
imo live entirely in the slot

I removed t_tableOid from HeapTuple, and during testing I found some
problems with triggers; I will post the patch once they are fixed.

Please note that I'm working on a heavily revised version of the patch
right now, trying to clean up a lot of things (you might have seen some
of the threads I started). I hope to post it ~Thursday. Local-ish
patches shouldn't be a problem though.

Greetings,

Andres Freund

#70Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Dmitry Dolgov (#65)
Re: Pluggable Storage - Andres's take

On Sat, 12 Jan 2019 at 18:11, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Sat, Jan 12, 2019 at 1:44 AM Andres Freund <andres@anarazel.de> wrote:

/* other fields were zeroed above */

@@ -9355,7 +9370,7 @@ dumpComment(Archive *fout, const char *type, const char *name,
* post-data.
*/
ArchiveEntry(fout, nilCatalogId, createDumpId(),
-                                      tag->data, namespace, NULL, owner,
+                                      tag->data, namespace, NULL, owner, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(dumpId), 1,

We really ought to move the arguments to a struct, so we don't generate
quite as many useless diffs whenever we do a change around one of
these...

That's what I thought too. Maybe then I'll suggest a mini-patch to the master to
refactor these arguments out into a separate struct, so we can leverage it here.

Then for each of the calls, we would need to declare that structure
variable (with = {0}) and assign the required fields before passing it
to ArchiveEntry(). But a major point of ArchiveEntry() is to avoid
exactly that, and instead conveniently pass those fields as parameters;
the struct would only add lines of code. I think a better way is to
keep an ArchiveEntry() with a limited number of parameters, and add an
ArchiveEntryEx() with the extra parameters that are not needed in the
usual cases. E.g. we could make tablespace, tableam, dumpFn and dumpArg
the extra arguments of ArchiveEntryEx(), because in most places these
are passed as NULL. All future arguments would also go into
ArchiveEntryEx().
Comments?
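
To make this concrete, a rough sketch of such a split (the names and
the exact parameter grouping are illustrative, not from any posted
patch):

/* Common variant, covering the vast majority of call sites. */
extern TocEntry *ArchiveEntry(Archive *AHX,
			 CatalogId catalogId, DumpId dumpId,
			 const char *tag, const char *namespace,
			 const char *owner,
			 const char *desc, teSection section,
			 const char *defn, const char *dropStmt,
			 const char *copyStmt,
			 const DumpId *deps, int nDeps);

/* Extended variant carrying the rarely-used fields; future additions
 * such as tableam would go here. */
extern TocEntry *ArchiveEntryEx(Archive *AHX,
			 CatalogId catalogId, DumpId dumpId,
			 const char *tag, const char *namespace,
			 const char *owner,
			 const char *desc, teSection section,
			 const char *defn, const char *dropStmt,
			 const char *copyStmt,
			 const DumpId *deps, int nDeps,
			 const char *tablespace, const char *tableam,
			 DataDumperPtr dumpFn, void *dumpArg);

ArchiveEntry() would then simply forward to ArchiveEntryEx() with NULLs
for the extra fields.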

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#71Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Dmitry Dolgov (#67)
Re: Pluggable Storage - Andres's take

On Tue, 15 Jan 2019 at 12:27, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Mon, Jan 14, 2019 at 2:07 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

createPQExpBuffer() should be moved after the below statement, so that
it does not leak memory

Thanks for noticing, fixed.

Looks good.

So how about bumping up the archive version and doing these checks ?

Yeah, you're right, I've added this check.

Need to bump K_VERS_MINOR as well.
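
That is, in pg_backup_archiver.h the current-format version constants
must advance together with the new macro (as the v4 patch below now
does):

#define K_VERS_1_14 MAKE_ARCHIVE_VERSION(1, 14, 0)	/* add tableam */

#define K_VERS_MAJOR 1
#define K_VERS_MINOR 14
#define K_VERS_REV 0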

On Mon, 14 Jan 2019 at 18:36, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

+static void _selectTableAccessMethod(ArchiveHandle *AH, const char *tablespace);
tablespace => tableam

This is yet to be addressed.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#72Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Amit Khandekar (#71)
2 attachment(s)
Re: Pluggable Storage - Andres's take

On Tue, Jan 15, 2019 at 10:52 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

Need to bump K_VERS_MINOR as well.

I've bumped it up; somehow this change escaped the previous version.
Now it should be there, thanks!

On Mon, 14 Jan 2019 at 18:36, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

+static void _selectTableAccessMethod(ArchiveHandle *AH, const char *tablespace);
tablespace => tableam

This is yet to be addressed.

Fixed.

Also, I guess the other attached patch should address the psql part,
namely displaying a table's access method with \d+ and the possibility
of hiding it with a psql variable (HIDE_TABLEAM, but I'm open to
suggestions about the name).

Attachments:

pg_dump_access_method_v4.patch
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 58bd3805f4..38f24ba1bf 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -85,6 +85,7 @@ static void _becomeUser(ArchiveHandle *AH, const char *user);
 static void _becomeOwner(ArchiveHandle *AH, TocEntry *te);
 static void _selectOutputSchema(ArchiveHandle *AH, const char *schemaName);
 static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
+static void _selectTableAccessMethod(ArchiveHandle *AH, const char *tableam);
 static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
 static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
 static void processSearchPathEntry(ArchiveHandle *AH, TocEntry *te);
@@ -1072,6 +1073,7 @@ ArchiveEntry(Archive *AHX,
 			 const char *namespace,
 			 const char *tablespace,
 			 const char *owner,
+			 const char *tableam,
 			 const char *desc, teSection section,
 			 const char *defn,
 			 const char *dropStmt, const char *copyStmt,
@@ -1099,6 +1101,7 @@ ArchiveEntry(Archive *AHX,
 	newToc->tag = pg_strdup(tag);
 	newToc->namespace = namespace ? pg_strdup(namespace) : NULL;
 	newToc->tablespace = tablespace ? pg_strdup(tablespace) : NULL;
+	newToc->tableam = tableam ? pg_strdup(tableam) : NULL;
 	newToc->owner = pg_strdup(owner);
 	newToc->desc = pg_strdup(desc);
 	newToc->defn = pg_strdup(defn);
@@ -2367,6 +2370,7 @@ _allocAH(const char *FileSpec, const ArchiveFormat fmt,
 	AH->currUser = NULL;		/* unknown */
 	AH->currSchema = NULL;		/* ditto */
 	AH->currTablespace = NULL;	/* ditto */
+	AH->currTableAm = NULL;	/* ditto */
 
 	AH->toc = (TocEntry *) pg_malloc0(sizeof(TocEntry));
 
@@ -2594,6 +2598,7 @@ WriteToc(ArchiveHandle *AH)
 		WriteStr(AH, te->namespace);
 		WriteStr(AH, te->tablespace);
 		WriteStr(AH, te->owner);
+		WriteStr(AH, te->tableam);
 		WriteStr(AH, "false");
 
 		/* Dump list of dependencies */
@@ -2696,6 +2701,9 @@ ReadToc(ArchiveHandle *AH)
 			te->tablespace = ReadStr(AH);
 
 		te->owner = ReadStr(AH);
+		if (AH->version >= K_VERS_1_14)
+			te->tableam = ReadStr(AH);
+
 		if (AH->version < K_VERS_1_9 || strcmp(ReadStr(AH), "true") == 0)
 			write_msg(modulename,
 					  "WARNING: restoring tables WITH OIDS is not supported anymore");
@@ -3288,6 +3296,9 @@ _reconnectToDB(ArchiveHandle *AH, const char *dbname)
 	if (AH->currTablespace)
 		free(AH->currTablespace);
 	AH->currTablespace = NULL;
+	if (AH->currTableAm)
+		free(AH->currTableAm);
+	AH->currTableAm = NULL;
 
 	/* re-establish fixed state */
 	_doSetFixedOutputState(AH);
@@ -3448,6 +3459,48 @@ _selectTablespace(ArchiveHandle *AH, const char *tablespace)
 	destroyPQExpBuffer(qry);
 }
 
+/*
+ * Set the proper default_table_access_method value for the table.
+ */
+static void
+_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
+{
+	PQExpBuffer cmd;
+	const char *want, *have;
+
+	have = AH->currTableAm;
+	want = tableam;
+
+	if (!want)
+		return;
+
+	if (have && strcmp(want, have) == 0)
+		return;
+
+	cmd = createPQExpBuffer();
+	appendPQExpBuffer(cmd, "SET default_table_access_method = %s;", fmtId(want));
+
+	if (RestoringToDB(AH))
+	{
+		PGresult   *res;
+
+		res = PQexec(AH->connection, cmd->data);
+
+		if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
+			warn_or_exit_horribly(AH, modulename,
+								  "could not set default_table_access_method: %s",
+								  PQerrorMessage(AH->connection));
+
+		PQclear(res);
+	}
+	else
+		ahprintf(AH, "%s\n\n", cmd->data);
+
+	destroyPQExpBuffer(cmd);
+
+	AH->currTableAm = pg_strdup(want);
+}
+
 /*
  * Extract an object description for a TOC entry, and append it to buf.
  *
@@ -3547,6 +3600,7 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
 	_becomeOwner(AH, te);
 	_selectOutputSchema(AH, te->namespace);
 	_selectTablespace(AH, te->tablespace);
+	_selectTableAccessMethod(AH, te->tableam);
 
 	/* Emit header comment for item */
 	if (!AH->noTocComments)
@@ -4021,6 +4075,9 @@ restore_toc_entries_prefork(ArchiveHandle *AH, TocEntry *pending_list)
 	if (AH->currTablespace)
 		free(AH->currTablespace);
 	AH->currTablespace = NULL;
+	if (AH->currTableAm)
+		free(AH->currTableAm);
+	AH->currTableAm = NULL;
 }
 
 /*
@@ -4816,6 +4873,7 @@ CloneArchive(ArchiveHandle *AH)
 	clone->currUser = NULL;
 	clone->currSchema = NULL;
 	clone->currTablespace = NULL;
+	clone->currTableAm = NULL;
 
 	/* savedPassword must be local in case we change it while connecting */
 	if (clone->savedPassword)
@@ -4906,6 +4964,8 @@ DeCloneArchive(ArchiveHandle *AH)
 		free(AH->currSchema);
 	if (AH->currTablespace)
 		free(AH->currTablespace);
+	if (AH->currTableAm)
+		free(AH->currTableAm);
 	if (AH->savedPassword)
 		free(AH->savedPassword);
 
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 306d2ceba9..c719fca0ad 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -94,10 +94,11 @@ typedef z_stream *z_streamp;
 													 * entries */
 #define K_VERS_1_13 MAKE_ARCHIVE_VERSION(1, 13, 0)	/* change search_path
 													 * behavior */
+#define K_VERS_1_14 MAKE_ARCHIVE_VERSION(1, 14, 0)	/* add tableam */
 
 /* Current archive version number (the format we can output) */
 #define K_VERS_MAJOR 1
-#define K_VERS_MINOR 13
+#define K_VERS_MINOR 14
 #define K_VERS_REV 0
 #define K_VERS_SELF MAKE_ARCHIVE_VERSION(K_VERS_MAJOR, K_VERS_MINOR, K_VERS_REV);
 
@@ -347,6 +348,7 @@ struct _archiveHandle
 	char	   *currUser;		/* current username, or NULL if unknown */
 	char	   *currSchema;		/* current schema, or NULL */
 	char	   *currTablespace; /* current tablespace, or NULL */
+	char	   *currTableAm; 	/* current table access method, or NULL */
 
 	void	   *lo_buf;
 	size_t		lo_buf_used;
@@ -373,6 +375,8 @@ struct _tocEntry
 	char	   *namespace;		/* null or empty string if not in a schema */
 	char	   *tablespace;		/* null if not in a tablespace; empty string
 								 * means use database default */
+	char	   *tableam;		/* table access method, only for TABLE tags */
+
 	char	   *owner;
 	char	   *desc;
 	char	   *defn;
@@ -410,7 +414,7 @@ extern TocEntry *ArchiveEntry(Archive *AHX,
 			 CatalogId catalogId, DumpId dumpId,
 			 const char *tag,
 			 const char *namespace, const char *tablespace,
-			 const char *owner,
+			 const char *owner, const char *amname,
 			 const char *desc, teSection section,
 			 const char *defn,
 			 const char *dropStmt, const char *copyStmt,
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 637c79af48..512c486546 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2136,7 +2136,7 @@ dumpTableData(Archive *fout, TableDataInfo *tdinfo)
 
 		te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
 						  tbinfo->dobj.name, tbinfo->dobj.namespace->dobj.name,
-						  NULL, tbinfo->rolname,
+						  NULL, tbinfo->rolname, NULL,
 						  "TABLE DATA", SECTION_DATA,
 						  "", "", copyStmt,
 						  &(tbinfo->dobj.dumpId), 1,
@@ -2188,6 +2188,7 @@ refreshMatViewData(Archive *fout, TableDataInfo *tdinfo)
 					 tbinfo->dobj.namespace->dobj.name, /* Namespace */
 					 NULL,		/* Tablespace */
 					 tbinfo->rolname,	/* Owner */
+					 NULL,				/* Table access method */
 					 "MATERIALIZED VIEW DATA",	/* Desc */
 					 SECTION_POST_DATA, /* Section */
 					 q->data,	/* Create */
@@ -2726,6 +2727,7 @@ dumpDatabase(Archive *fout)
 				 NULL,			/* Namespace */
 				 NULL,			/* Tablespace */
 				 dba,			/* Owner */
+				 NULL,			/* Table access method */
 				 "DATABASE",	/* Desc */
 				 SECTION_PRE_DATA,	/* Section */
 				 creaQry->data, /* Create */
@@ -2762,7 +2764,7 @@ dumpDatabase(Archive *fout)
 			appendPQExpBufferStr(dbQry, ";\n");
 
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
-						 labelq->data, NULL, NULL, dba,
+						 labelq->data, NULL, NULL, dba, NULL,
 						 "COMMENT", SECTION_NONE,
 						 dbQry->data, "", NULL,
 						 &(dbDumpId), 1,
@@ -2789,7 +2791,7 @@ dumpDatabase(Archive *fout)
 		emitShSecLabels(conn, shres, seclabelQry, "DATABASE", datname);
 		if (seclabelQry->len > 0)
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
-						 labelq->data, NULL, NULL, dba,
+						 labelq->data, NULL, NULL, dba, NULL,
 						 "SECURITY LABEL", SECTION_NONE,
 						 seclabelQry->data, "", NULL,
 						 &(dbDumpId), 1,
@@ -2859,7 +2861,7 @@ dumpDatabase(Archive *fout)
 
 	if (creaQry->len > 0)
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 datname, NULL, NULL, dba,
+					 datname, NULL, NULL, dba, NULL,
 					 "DATABASE PROPERTIES", SECTION_PRE_DATA,
 					 creaQry->data, delQry->data, NULL,
 					 &(dbDumpId), 1,
@@ -2904,7 +2906,7 @@ dumpDatabase(Archive *fout)
 						  atooid(PQgetvalue(lo_res, 0, i_relminmxid)),
 						  LargeObjectRelationId);
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 "pg_largeobject", NULL, NULL, "",
+					 "pg_largeobject", NULL, NULL, "", NULL,
 					 "pg_largeobject", SECTION_PRE_DATA,
 					 loOutQry->data, "", NULL,
 					 NULL, 0,
@@ -3014,7 +3016,7 @@ dumpEncoding(Archive *AH)
 	appendPQExpBufferStr(qry, ";\n");
 
 	ArchiveEntry(AH, nilCatalogId, createDumpId(),
-				 "ENCODING", NULL, NULL, "",
+				 "ENCODING", NULL, NULL, "", NULL,
 				 "ENCODING", SECTION_PRE_DATA,
 				 qry->data, "", NULL,
 				 NULL, 0,
@@ -3041,7 +3043,7 @@ dumpStdStrings(Archive *AH)
 					  stdstrings);
 
 	ArchiveEntry(AH, nilCatalogId, createDumpId(),
-				 "STDSTRINGS", NULL, NULL, "",
+				 "STDSTRINGS", NULL, NULL, "", NULL,
 				 "STDSTRINGS", SECTION_PRE_DATA,
 				 qry->data, "", NULL,
 				 NULL, 0,
@@ -3097,7 +3099,7 @@ dumpSearchPath(Archive *AH)
 		write_msg(NULL, "saving search_path = %s\n", path->data);
 
 	ArchiveEntry(AH, nilCatalogId, createDumpId(),
-				 "SEARCHPATH", NULL, NULL, "",
+				 "SEARCHPATH", NULL, NULL, "", NULL,
 				 "SEARCHPATH", SECTION_PRE_DATA,
 				 qry->data, "", NULL,
 				 NULL, 0,
@@ -3275,7 +3277,7 @@ dumpBlob(Archive *fout, BlobInfo *binfo)
 		ArchiveEntry(fout, binfo->dobj.catId, binfo->dobj.dumpId,
 					 binfo->dobj.name,
 					 NULL, NULL,
-					 binfo->rolname,
+					 binfo->rolname, NULL,
 					 "BLOB", SECTION_PRE_DATA,
 					 cquery->data, dquery->data, NULL,
 					 NULL, 0,
@@ -3581,6 +3583,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
 						 polinfo->dobj.namespace->dobj.name,
 						 NULL,
 						 tbinfo->rolname,
+						 NULL,
 						 "ROW SECURITY", SECTION_POST_DATA,
 						 query->data, "", NULL,
 						 &(tbinfo->dobj.dumpId), 1,
@@ -3637,6 +3640,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
 					 polinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "POLICY", SECTION_POST_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -3811,6 +3815,7 @@ dumpPublication(Archive *fout, PublicationInfo *pubinfo)
 				 NULL,
 				 NULL,
 				 pubinfo->rolname,
+				 NULL,
 				 "PUBLICATION", SECTION_POST_DATA,
 				 query->data, delq->data, NULL,
 				 NULL, 0,
@@ -3954,6 +3959,7 @@ dumpPublicationTable(Archive *fout, PublicationRelInfo *pubrinfo)
 				 tbinfo->dobj.namespace->dobj.name,
 				 NULL,
 				 "",
+				 NULL,
 				 "PUBLICATION TABLE", SECTION_POST_DATA,
 				 query->data, "", NULL,
 				 NULL, 0,
@@ -4147,6 +4153,7 @@ dumpSubscription(Archive *fout, SubscriptionInfo *subinfo)
 				 NULL,
 				 NULL,
 				 subinfo->rolname,
+				 NULL,
 				 "SUBSCRIPTION", SECTION_POST_DATA,
 				 query->data, delq->data, NULL,
 				 NULL, 0,
@@ -5829,6 +5836,7 @@ getTables(Archive *fout, int *numTables)
 	int			i_partkeydef;
 	int			i_ispartition;
 	int			i_partbound;
+	int			i_amname;
 
 	/*
 	 * Find all the tables and table-like objects.
@@ -5914,7 +5922,7 @@ getTables(Archive *fout, int *numTables)
 						  "tc.relfrozenxid AS tfrozenxid, "
 						  "tc.relminmxid AS tminmxid, "
 						  "c.relpersistence, c.relispopulated, "
-						  "c.relreplident, c.relpages, "
+						  "c.relreplident, c.relpages, am.amname, "
 						  "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
 						  "d.refobjid AS owning_tab, "
 						  "d.refobjsubid AS owning_col, "
@@ -5945,6 +5953,7 @@ getTables(Archive *fout, int *numTables)
 						  "d.objsubid = 0 AND "
 						  "d.refclassid = c.tableoid AND d.deptype IN ('a', 'i')) "
 						  "LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid) "
+						  "LEFT JOIN pg_am am ON (c.relam = am.oid) "
 						  "LEFT JOIN pg_init_privs pip ON "
 						  "(c.oid = pip.objoid "
 						  "AND pip.classoid = 'pg_class'::regclass "
@@ -6412,6 +6421,7 @@ getTables(Archive *fout, int *numTables)
 	i_partkeydef = PQfnumber(res, "partkeydef");
 	i_ispartition = PQfnumber(res, "ispartition");
 	i_partbound = PQfnumber(res, "partbound");
+	i_amname = PQfnumber(res, "amname");
 
 	if (dopt->lockWaitTimeout)
 	{
@@ -6481,6 +6491,11 @@ getTables(Archive *fout, int *numTables)
 		else
 			tblinfo[i].checkoption = pg_strdup(PQgetvalue(res, i, i_checkoption));
 		tblinfo[i].toast_reloptions = pg_strdup(PQgetvalue(res, i, i_toastreloptions));
+		if (PQgetisnull(res, i, i_amname))
+			tblinfo[i].amname = NULL;
+		else
+			tblinfo[i].amname = pg_strdup(PQgetvalue(res, i, i_amname));
+
 
 		/* other fields were zeroed above */
 
@@ -9355,7 +9370,7 @@ dumpComment(Archive *fout, const char *type, const char *name,
 		 * post-data.
 		 */
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 tag->data, namespace, NULL, owner,
+					 tag->data, namespace, NULL, owner, NULL,
 					 "COMMENT", SECTION_NONE,
 					 query->data, "", NULL,
 					 &(dumpId), 1,
@@ -9423,7 +9438,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
 						 tag->data,
 						 tbinfo->dobj.namespace->dobj.name,
-						 NULL, tbinfo->rolname,
+						 NULL, tbinfo->rolname, NULL,
 						 "COMMENT", SECTION_NONE,
 						 query->data, "", NULL,
 						 &(tbinfo->dobj.dumpId), 1,
@@ -9447,7 +9462,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
 						 tag->data,
 						 tbinfo->dobj.namespace->dobj.name,
-						 NULL, tbinfo->rolname,
+						 NULL, tbinfo->rolname, NULL,
 						 "COMMENT", SECTION_NONE,
 						 query->data, "", NULL,
 						 &(tbinfo->dobj.dumpId), 1,
@@ -9728,7 +9743,7 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 				TocEntry   *te;
 
 				te = ArchiveEntry(fout, dobj->catId, dobj->dumpId,
-								  dobj->name, NULL, NULL, "",
+								  dobj->name, NULL, NULL, "", NULL,
 								  "BLOBS", SECTION_DATA,
 								  "", "", NULL,
 								  NULL, 0,
@@ -9802,7 +9817,7 @@ dumpNamespace(Archive *fout, NamespaceInfo *nspinfo)
 		ArchiveEntry(fout, nspinfo->dobj.catId, nspinfo->dobj.dumpId,
 					 nspinfo->dobj.name,
 					 NULL, NULL,
-					 nspinfo->rolname,
+					 nspinfo->rolname, NULL,
 					 "SCHEMA", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -9938,7 +9953,7 @@ dumpExtension(Archive *fout, ExtensionInfo *extinfo)
 		ArchiveEntry(fout, extinfo->dobj.catId, extinfo->dobj.dumpId,
 					 extinfo->dobj.name,
 					 NULL, NULL,
-					 "",
+					 "", NULL,
 					 "EXTENSION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10090,6 +10105,7 @@ dumpEnumType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10217,6 +10233,7 @@ dumpRangeType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10290,6 +10307,7 @@ dumpUndefinedType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10572,6 +10590,7 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10729,6 +10748,7 @@ dumpDomain(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "DOMAIN", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10951,6 +10971,7 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -11085,7 +11106,7 @@ dumpCompositeTypeColComments(Archive *fout, TypeInfo *tyinfo)
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
 						 target->data,
 						 tyinfo->dobj.namespace->dobj.name,
-						 NULL, tyinfo->rolname,
+						 NULL, tyinfo->rolname, NULL,
 						 "COMMENT", SECTION_NONE,
 						 query->data, "", NULL,
 						 &(tyinfo->dobj.dumpId), 1,
@@ -11142,6 +11163,7 @@ dumpShellType(Archive *fout, ShellTypeInfo *stinfo)
 					 stinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 stinfo->baseType->rolname,
+					 NULL,
 					 "SHELL TYPE", SECTION_PRE_DATA,
 					 q->data, "", NULL,
 					 NULL, 0,
@@ -11251,7 +11273,7 @@ dumpProcLang(Archive *fout, ProcLangInfo *plang)
 	if (plang->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, plang->dobj.catId, plang->dobj.dumpId,
 					 plang->dobj.name,
-					 NULL, NULL, plang->lanowner,
+					 NULL, NULL, plang->lanowner, NULL,
 					 "PROCEDURAL LANGUAGE", SECTION_PRE_DATA,
 					 defqry->data, delqry->data, NULL,
 					 NULL, 0,
@@ -11924,6 +11946,7 @@ dumpFunc(Archive *fout, FuncInfo *finfo)
 					 finfo->dobj.namespace->dobj.name,
 					 NULL,
 					 finfo->rolname,
+					 NULL,
 					 keyword, SECTION_PRE_DATA,
 					 q->data, delqry->data, NULL,
 					 NULL, 0,
@@ -12056,7 +12079,7 @@ dumpCast(Archive *fout, CastInfo *cast)
 	if (cast->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, cast->dobj.catId, cast->dobj.dumpId,
 					 labelq->data,
-					 NULL, NULL, "",
+					 NULL, NULL, "", NULL,
 					 "CAST", SECTION_PRE_DATA,
 					 defqry->data, delqry->data, NULL,
 					 NULL, 0,
@@ -12184,7 +12207,7 @@ dumpTransform(Archive *fout, TransformInfo *transform)
 	if (transform->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, transform->dobj.catId, transform->dobj.dumpId,
 					 labelq->data,
-					 NULL, NULL, "",
+					 NULL, NULL, "", NULL,
 					 "TRANSFORM", SECTION_PRE_DATA,
 					 defqry->data, delqry->data, NULL,
 					 transform->dobj.dependencies, transform->dobj.nDeps,
@@ -12400,6 +12423,7 @@ dumpOpr(Archive *fout, OprInfo *oprinfo)
 					 oprinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 oprinfo->rolname,
+					 NULL,
 					 "OPERATOR", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -12546,6 +12570,9 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
 		case AMTYPE_INDEX:
 			appendPQExpBuffer(q, "TYPE INDEX ");
 			break;
+		case AMTYPE_TABLE:
+			appendPQExpBuffer(q, "TYPE TABLE ");
+			break;
 		default:
 			write_msg(NULL, "WARNING: invalid type \"%c\" of access method \"%s\"\n",
 					  aminfo->amtype, qamname);
@@ -12570,6 +12597,7 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
 					 NULL,
 					 NULL,
 					 "",
+					 NULL,
 					 "ACCESS METHOD", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -12936,6 +12964,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
 					 opcinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 opcinfo->rolname,
+					 NULL,
 					 "OPERATOR CLASS", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13203,6 +13232,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
 					 opfinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 opfinfo->rolname,
+					 NULL,
 					 "OPERATOR FAMILY", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13346,6 +13376,7 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
 					 collinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 collinfo->rolname,
+					 NULL,
 					 "COLLATION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13441,6 +13472,7 @@ dumpConversion(Archive *fout, ConvInfo *convinfo)
 					 convinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 convinfo->rolname,
+					 NULL,
 					 "CONVERSION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13930,6 +13962,7 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
 					 agginfo->aggfn.dobj.namespace->dobj.name,
 					 NULL,
 					 agginfo->aggfn.rolname,
+					 NULL,
 					 "AGGREGATE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14028,6 +14061,7 @@ dumpTSParser(Archive *fout, TSParserInfo *prsinfo)
 					 prsinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 "",
+					 NULL,
 					 "TEXT SEARCH PARSER", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14108,6 +14142,7 @@ dumpTSDictionary(Archive *fout, TSDictInfo *dictinfo)
 					 dictinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 dictinfo->rolname,
+					 NULL,
 					 "TEXT SEARCH DICTIONARY", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14169,6 +14204,7 @@ dumpTSTemplate(Archive *fout, TSTemplateInfo *tmplinfo)
 					 tmplinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 "",
+					 NULL,
 					 "TEXT SEARCH TEMPLATE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14289,6 +14325,7 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
 					 cfginfo->dobj.namespace->dobj.name,
 					 NULL,
 					 cfginfo->rolname,
+					 NULL,
 					 "TEXT SEARCH CONFIGURATION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14355,6 +14392,7 @@ dumpForeignDataWrapper(Archive *fout, FdwInfo *fdwinfo)
 					 NULL,
 					 NULL,
 					 fdwinfo->rolname,
+					 NULL,
 					 "FOREIGN DATA WRAPPER", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14446,6 +14484,7 @@ dumpForeignServer(Archive *fout, ForeignServerInfo *srvinfo)
 					 NULL,
 					 NULL,
 					 srvinfo->rolname,
+					 NULL,
 					 "SERVER", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14564,6 +14603,7 @@ dumpUserMappings(Archive *fout,
 					 namespace,
 					 NULL,
 					 owner,
+					 NULL,
 					 "USER MAPPING", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 &dumpId, 1,
@@ -14643,6 +14683,7 @@ dumpDefaultACL(Archive *fout, DefaultACLInfo *daclinfo)
 					 daclinfo->dobj.namespace ? daclinfo->dobj.namespace->dobj.name : NULL,
 					 NULL,
 					 daclinfo->defaclrole,
+					 NULL,
 					 "DEFAULT ACL", SECTION_POST_DATA,
 					 q->data, "", NULL,
 					 NULL, 0,
@@ -14741,6 +14782,7 @@ dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
 					 tag->data, nspname,
 					 NULL,
 					 owner ? owner : "",
+					 NULL,
 					 "ACL", SECTION_NONE,
 					 sql->data, "", NULL,
 					 &(objDumpId), 1,
@@ -14826,7 +14868,7 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
 
 		appendPQExpBuffer(tag, "%s %s", type, name);
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 tag->data, namespace, NULL, owner,
+					 tag->data, namespace, NULL, owner, NULL,
 					 "SECURITY LABEL", SECTION_NONE,
 					 query->data, "", NULL,
 					 &(dumpId), 1,
@@ -14908,7 +14950,7 @@ dumpTableSecLabel(Archive *fout, TableInfo *tbinfo, const char *reltypename)
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
 					 target->data,
 					 tbinfo->dobj.namespace->dobj.name,
-					 NULL, tbinfo->rolname,
+					 NULL, tbinfo->rolname, NULL,
 					 "SECURITY LABEL", SECTION_NONE,
 					 query->data, "", NULL,
 					 &(tbinfo->dobj.dumpId), 1,
@@ -15994,6 +16036,8 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 (tbinfo->relkind == RELKIND_VIEW) ? NULL : tbinfo->reltablespace,
 					 tbinfo->rolname,
+					 (tbinfo->relkind == RELKIND_RELATION) ?
+					 tbinfo->amname : NULL,
 					 reltypename,
 					 tbinfo->postponed_def ?
 					 SECTION_POST_DATA : SECTION_PRE_DATA,
@@ -16074,6 +16118,7 @@ dumpAttrDef(Archive *fout, AttrDefInfo *adinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "DEFAULT", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -16190,6 +16235,7 @@ dumpIndex(Archive *fout, IndxInfo *indxinfo)
 						 tbinfo->dobj.namespace->dobj.name,
 						 indxinfo->tablespace,
 						 tbinfo->rolname,
+						 NULL,
 						 "INDEX", SECTION_POST_DATA,
 						 q->data, delq->data, NULL,
 						 NULL, 0,
@@ -16234,6 +16280,7 @@ dumpIndexAttach(Archive *fout, IndexAttachInfo *attachinfo)
 					 attachinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 "",
+					 NULL,
 					 "INDEX ATTACH", SECTION_POST_DATA,
 					 q->data, "", NULL,
 					 NULL, 0,
@@ -16289,6 +16336,7 @@ dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo)
 					 statsextinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 statsextinfo->rolname,
+					 NULL,
 					 "STATISTICS", SECTION_POST_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -16450,6 +16498,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 						 tbinfo->dobj.namespace->dobj.name,
 						 indxinfo->tablespace,
 						 tbinfo->rolname,
+						 NULL,
 						 "CONSTRAINT", SECTION_POST_DATA,
 						 q->data, delq->data, NULL,
 						 NULL, 0,
@@ -16490,6 +16539,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 						 tbinfo->dobj.namespace->dobj.name,
 						 NULL,
 						 tbinfo->rolname,
+						 NULL,
 						 "FK CONSTRAINT", SECTION_POST_DATA,
 						 q->data, delq->data, NULL,
 						 NULL, 0,
@@ -16522,6 +16572,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 							 tbinfo->dobj.namespace->dobj.name,
 							 NULL,
 							 tbinfo->rolname,
+							 NULL,
 							 "CHECK CONSTRAINT", SECTION_POST_DATA,
 							 q->data, delq->data, NULL,
 							 NULL, 0,
@@ -16555,6 +16606,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 							 tyinfo->dobj.namespace->dobj.name,
 							 NULL,
 							 tyinfo->rolname,
+							 NULL,
 							 "CHECK CONSTRAINT", SECTION_POST_DATA,
 							 q->data, delq->data, NULL,
 							 NULL, 0,
@@ -16829,6 +16881,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "SEQUENCE", SECTION_PRE_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -16870,6 +16923,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
 							 tbinfo->dobj.namespace->dobj.name,
 							 NULL,
 							 tbinfo->rolname,
+							 NULL,
 							 "SEQUENCE OWNED BY", SECTION_PRE_DATA,
 							 query->data, "", NULL,
 							 &(tbinfo->dobj.dumpId), 1,
@@ -16938,6 +16992,7 @@ dumpSequenceData(Archive *fout, TableDataInfo *tdinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "SEQUENCE SET", SECTION_DATA,
 					 query->data, "", NULL,
 					 &(tbinfo->dobj.dumpId), 1,
@@ -17137,6 +17192,7 @@ dumpTrigger(Archive *fout, TriggerInfo *tginfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "TRIGGER", SECTION_POST_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -17223,7 +17279,7 @@ dumpEventTrigger(Archive *fout, EventTriggerInfo *evtinfo)
 	if (evtinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, evtinfo->dobj.catId, evtinfo->dobj.dumpId,
 					 evtinfo->dobj.name, NULL, NULL,
-					 evtinfo->evtowner,
+					 evtinfo->evtowner, NULL,
 					 "EVENT TRIGGER", SECTION_POST_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -17384,6 +17440,7 @@ dumpRule(Archive *fout, RuleInfo *rinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "RULE", SECTION_POST_DATA,
 					 cmd->data, delcmd->data, NULL,
 					 NULL, 0,
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 789d6a24e2..4024d0c1e3 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -324,6 +324,7 @@ typedef struct _tableInfo
 	char	   *partkeydef;		/* partition key definition */
 	char	   *partbound;		/* partition bound definition */
 	bool		needs_override; /* has GENERATED ALWAYS AS IDENTITY */
+	char	   *amname; 		/* table access method */
 
 	/*
 	 * Stuff computed only for dumpable tables.
Attachment: psql_describe_am.patch (application/octet-stream)
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 0a181b01d9..3eef06ab7d 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1484,6 +1484,8 @@ describeOneTableDetails(const char *schemaname,
 		char	   *reloftype;
 		char		relpersistence;
 		char		relreplident;
+		char	   *relam;
+		bool	    relam_is_default;
 	}			tableinfo;
 	bool		show_column_details = false;
 
@@ -1503,9 +1505,11 @@ describeOneTableDetails(const char *schemaname,
 						  "c.relhastriggers, c.relrowsecurity, c.relforcerowsecurity, "
 						  "false AS relhasoids, %s, c.reltablespace, "
 						  "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, "
-						  "c.relpersistence, c.relreplident\n"
+						  "c.relpersistence, c.relreplident, am.amname,"
+						  "am.amname = current_setting('default_table_access_method') \n"
 						  "FROM pg_catalog.pg_class c\n "
 						  "LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
+						  "LEFT JOIN pg_catalog.pg_am am ON (c.relam = am.oid)\n"
 						  "WHERE c.oid = '%s';",
 						  (verbose ?
 						   "pg_catalog.array_to_string(c.reloptions || "
@@ -1656,6 +1660,17 @@ describeOneTableDetails(const char *schemaname,
 		*(PQgetvalue(res, 0, 11)) : 0;
 	tableinfo.relreplident = (pset.sversion >= 90400) ?
 		*(PQgetvalue(res, 0, 12)) : 'd';
+	if (pset.sversion >= 120000)
+	{
+		tableinfo.relam = PQgetisnull(res, 0, 13) ?
+			(char *) NULL : pg_strdup(PQgetvalue(res, 0, 13));
+		tableinfo.relam_is_default = strcmp(PQgetvalue(res, 0, 14), "t") == 0;
+	}
+	else
+	{
+		tableinfo.relam = NULL;
+		tableinfo.relam_is_default = false;
+	}
 	PQclear(res);
 	res = NULL;
 
@@ -3141,6 +3156,16 @@ describeOneTableDetails(const char *schemaname,
 		/* Tablespace info */
 		add_tablespace_footer(&cont, tableinfo.relkind, tableinfo.tablespace,
 							  true);
+
+		/* Access method info */
+		if (pset.sversion >= 120000 && verbose && tableinfo.relam != NULL &&
+		   !(pset.hide_tableam && tableinfo.relam_is_default))
+		{
+			printfPQExpBuffer(&buf, _("Access method: %s"), fmtId(tableinfo.relam));
+			printTableAddFooter(&cont, buf.data);
+		}
+
+
 	}
 
 	/* reloptions, if verbose */
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 176c85afd0..0c62dfac30 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -140,6 +140,7 @@ typedef struct _psqlSettings
 	const char *prompt3;
 	PGVerbosity verbosity;		/* current error verbosity level */
 	PGContextVisibility show_context;	/* current context display level */
+	bool		hide_tableam;
 } PsqlSettings;
 
 extern PsqlSettings pset;
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index e7536a8a06..b757febcc5 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -1128,6 +1128,11 @@ show_context_hook(const char *newval)
 	return true;
 }
 
+static bool
+hide_tableam_hook(const char *newval)
+{
+	return ParseVariableBool(newval, "HIDE_TABLEAM", &pset.hide_tableam);
+}
 
 static void
 EstablishVariableSpace(void)
@@ -1191,4 +1196,7 @@ EstablishVariableSpace(void)
 	SetVariableHooks(pset.vars, "SHOW_CONTEXT",
 					 show_context_substitute_hook,
 					 show_context_hook);
+	SetVariableHooks(pset.vars, "HIDE_TABLEAM",
+					 bool_substitute_hook,
+					 hide_tableam_hook);
 }
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 19bb538411..031c09422c 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 CREATE TEMP TABLE x (
 	a serial,
 	b int,
diff --git a/src/test/regress/expected/create_table.out b/src/test/regress/expected/create_table.out
index 7e52c27e3f..15c4474235 100644
--- a/src/test/regress/expected/create_table.out
+++ b/src/test/regress/expected/create_table.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- CREATE_TABLE
 --
diff --git a/src/test/regress/expected/create_table_like.out b/src/test/regress/expected/create_table_like.out
index b582211270..08cbeadf0c 100644
--- a/src/test/regress/expected/create_table_like.out
+++ b/src/test/regress/expected/create_table_like.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 /* Test inheritance of structure (LIKE) */
 CREATE TABLE inhx (xx text DEFAULT 'text');
 /*
diff --git a/src/test/regress/expected/domain.out b/src/test/regress/expected/domain.out
index 0b5a9041b0..e2568008d9 100644
--- a/src/test/regress/expected/domain.out
+++ b/src/test/regress/expected/domain.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- Test domains.
 --
diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out
index 4d82d3a7e8..60a28c09fb 100644
--- a/src/test/regress/expected/foreign_data.out
+++ b/src/test/regress/expected/foreign_data.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- Test foreign-data wrapper and server management.
 --
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index f259d07535..5236af2744 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- Test inheritance features
 --
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index 1cf6531c01..ab90db4d66 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- insert with DEFAULT in the target_list
 --
diff --git a/src/test/regress/expected/matview.out b/src/test/regress/expected/matview.out
index 08cd4bea48..ff9ade106d 100644
--- a/src/test/regress/expected/matview.out
+++ b/src/test/regress/expected/matview.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 -- create a table to use as a basis for views and materialized views in various combinations
 CREATE TABLE mvtest_t (id int NOT NULL PRIMARY KEY, type text NOT NULL, amt numeric NOT NULL);
 INSERT INTO mvtest_t VALUES
diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
index afbbdd543d..cc5d42dbf2 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- PUBLICATION
 --
diff --git a/src/test/regress/expected/replica_identity.out b/src/test/regress/expected/replica_identity.out
index 175ecd2879..f3331ee833 100644
--- a/src/test/regress/expected/replica_identity.out
+++ b/src/test/regress/expected/replica_identity.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 CREATE TABLE test_replica_identity (
        id serial primary key,
        keya text not null,
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 1d12b01068..a3c3befb04 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- Test of Row-level security feature
 --
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b68b8d273f..feb7cd95b1 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- RULES
 -- From Jan's original setup_ruletest.sql and run_ruletest.sql
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index d09326c182..bf209c14b0 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- UPDATE syntax tests
 --
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index e36df8858e..bec93af438 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 CREATE TEMP TABLE x (
 	a serial,
 	b int,
diff --git a/src/test/regress/sql/create_table.sql b/src/test/regress/sql/create_table.sql
index a2cae9663c..fb1e9083a4 100644
--- a/src/test/regress/sql/create_table.sql
+++ b/src/test/regress/sql/create_table.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- CREATE_TABLE
 --
diff --git a/src/test/regress/sql/create_table_like.sql b/src/test/regress/sql/create_table_like.sql
index 65c3880792..46889a5b1f 100644
--- a/src/test/regress/sql/create_table_like.sql
+++ b/src/test/regress/sql/create_table_like.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 /* Test inheritance of structure (LIKE) */
 CREATE TABLE inhx (xx text DEFAULT 'text');
 
diff --git a/src/test/regress/sql/domain.sql b/src/test/regress/sql/domain.sql
index 68da27de22..c12f0b3093 100644
--- a/src/test/regress/sql/domain.sql
+++ b/src/test/regress/sql/domain.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- Test domains.
 --
diff --git a/src/test/regress/sql/foreign_data.sql b/src/test/regress/sql/foreign_data.sql
index d6fb3fae4e..3aba5a9236 100644
--- a/src/test/regress/sql/foreign_data.sql
+++ b/src/test/regress/sql/foreign_data.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- Test foreign-data wrapper and server management.
 --
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index 425052c1f4..40dffccc7d 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- Test inheritance features
 --
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index a7f659bc2b..85b5ca909c 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- insert with DEFAULT in the target_list
 --
diff --git a/src/test/regress/sql/matview.sql b/src/test/regress/sql/matview.sql
index d96175aa26..2ce509581d 100644
--- a/src/test/regress/sql/matview.sql
+++ b/src/test/regress/sql/matview.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 -- create a table to use as a basis for views and materialized views in various combinations
 CREATE TABLE mvtest_t (id int NOT NULL PRIMARY KEY, type text NOT NULL, amt numeric NOT NULL);
 INSERT INTO mvtest_t VALUES
diff --git a/src/test/regress/sql/publication.sql b/src/test/regress/sql/publication.sql
index 815410b3c5..d5a1370c74 100644
--- a/src/test/regress/sql/publication.sql
+++ b/src/test/regress/sql/publication.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- PUBLICATION
 --
diff --git a/src/test/regress/sql/replica_identity.sql b/src/test/regress/sql/replica_identity.sql
index b08a3623b8..90db4a7c40 100644
--- a/src/test/regress/sql/replica_identity.sql
+++ b/src/test/regress/sql/replica_identity.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 CREATE TABLE test_replica_identity (
        id serial primary key,
        keya text not null,
diff --git a/src/test/regress/sql/rowsecurity.sql b/src/test/regress/sql/rowsecurity.sql
index 38e9b38bc4..4b1c2ee619 100644
--- a/src/test/regress/sql/rowsecurity.sql
+++ b/src/test/regress/sql/rowsecurity.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- Test of Row-level security feature
 --
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index f4ee30ec8f..a54038f3a2 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- RULES
 -- From Jan's original setup_ruletest.sql and run_ruletest.sql
diff --git a/src/test/regress/sql/update.sql b/src/test/regress/sql/update.sql
index c9bb3b53d3..837b6d1871 100644
--- a/src/test/regress/sql/update.sql
+++ b/src/test/regress/sql/update.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 --
 -- UPDATE syntax tests
 --
#73Andres Freund
andres@anarazel.de
In reply to: Amit Khandekar (#70)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-01-15 14:37:36 +0530, Amit Khandekar wrote:

Then for each of the calls, we would need to declare that structure
variable (with = {0}) and assign the required fields in that structure
before passing it to ArchiveEntry(). But a major point of ArchiveEntry()
is to avoid doing this and instead conveniently pass those fields as
parameters. This would add unnecessary extra lines of code. I think a
better way is to have an ArchiveEntry() function with a limited number
of parameters, and an ArchiveEntryEx() with those extra parameters
which are not needed in usual cases.

I don't think that'll really solve the problem. I think it might be more
reasonable to rely on structs. Now that we can rely on designated
initializers for structs, we can do something like:

ArchiveEntry((ArchiveArgs){.tablespace = 3,
                           .dumpFn = somefunc,
                           ...});

and unused arguments will automatically be initialized to zero. Or we
could pass the struct as a pointer, which might be more efficient
(although I doubt it matters here):

ArchiveEntry(&(ArchiveArgs){.tablespace = 3,
                            .dumpFn = somefunc,
                            ...});
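
To make the zero-initialization behavior concrete, here's a minimal
standalone sketch of the compound-literal pattern (the ArchiveArgs
fields here are invented for the example, not the real pg_dump ones):

#include <stdio.h>

typedef struct ArchiveArgs
{
	const char *tag;
	const char *tablespace;
	void		(*dumpFn) (void *arg);
} ArchiveArgs;

static void
archive_entry(const ArchiveArgs *args)
{
	/* fields not named in the designated initializer arrive as NULL/0 */
	printf("tag=%s tablespace=%s dumpFn is %s\n",
		   args->tag ? args->tag : "(null)",
		   args->tablespace ? args->tablespace : "(null)",
		   args->dumpFn ? "set" : "NULL");
}

int
main(void)
{
	/* only .tag is given; .tablespace and .dumpFn are implicitly zeroed */
	archive_entry(&(ArchiveArgs) {.tag = "mytable"});
	return 0;
}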

What do others think? It'd probably be a good idea to start a new
thread about this.

Greetings,

Andres Freund

#74Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Dmitry Dolgov (#72)
Re: Pluggable Storage - Andres's take

On Tue, 15 Jan 2019 at 17:58, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Tue, Jan 15, 2019 at 10:52 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

Need to bump K_VERS_MINOR as well.

I've bumped it up, but somehow this change escaped the previous version.
It should be there now, thanks!

On Mon, 14 Jan 2019 at 18:36, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

+static void _selectTableAccessMethod(ArchiveHandle *AH, const char *tablespace);
tablespace => tableam

This is yet to be addressed.

Fixed.

Thanks, the patch looks good to me. Of course there's the other thread
about ArchiveEntry arguments which may alter this patch, but
otherwise, I have no more comments on this patch.

Also, I guess another attached patch should address the psql part, namely
displaying a table access method with \d+ and the possibility of hiding it
with a psql variable (HIDE_TABLEAM, but I'm open to suggestions about the name).

Will have a look at this one.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#75Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Amit Khandekar (#74)
Re: Pluggable Storage - Andres's take

On Fri, 18 Jan 2019 at 10:13, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

On Tue, 15 Jan 2019 at 17:58, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Also, I guess another attached patch should address the psql part, namely
displaying a table access method with \d+ and the possibility of hiding it
with a psql variable (HIDE_TABLEAM, but I'm open to suggestions about the name).

I am ok with the name.

Will have a look at this one.

--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
 CREATE TEMP TABLE x (

I thought we wanted to avoid having to add this setting in individual
regression tests. Can't we do this in pg_regress as a common setting?

+ /* Access method info */
+ if (pset.sversion >= 120000 && verbose && tableinfo.relam != NULL &&
+    !(pset.hide_tableam && tableinfo.relam_is_default))
+ {
+         printfPQExpBuffer(&buf, _("Access method: %s"), fmtId(tableinfo.relam));

So this will make psql hide the access method if it's the same as the
default. I understand that this was kind of concluded in the other
thread "Displaying and dumping of table access methods". But IMHO, if
hide_tableam is false, we should *always* show the access method,
regardless of the default value. I mean, we can make it simple: on
means never show the table access method, off means always show it,
regardless of the default access method. And this will also work with
regression tests. If some regression test specifically wants to output
the access method, it can have a "\set HIDE_TABLEAM off" command.

If we hide the method when it's the default, then a regression test
that wants to forcibly show the table access method of all tables
won't see it for tables that have the default access method.
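
Under that simpler rule a session would behave roughly like this
(hypothetical table, \d+ output abridged):

\set HIDE_TABLEAM off
\d+ mytab
...
Access method: heap

\set HIDE_TABLEAM on
\d+ mytab
...
(no "Access method" footer here, even for a table with a non-default AM)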

------------

+ if (pset.sversion >= 120000 && verbose && tableinfo.relam != NULL &&

If the server does not support relam, tableinfo.relam will be NULL
anyway. So I think the sversion check is not needed.

------------

+ printfPQExpBuffer(&buf, _("Access method: %s"), fmtId(tableinfo.relam));
fmtId is not required. In fact, we should display the access method
name as-is. fmtId is required only for identifiers present in SQL
queries.
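
For illustration, with a hypothetical AM name that fmtId() would
actually have to quote:

printfPQExpBuffer(&buf, _("Access method: %s"), fmtId("my am"));
/* footer reads:  Access method: "my am"  -- quoted form, meant for SQL */
printfPQExpBuffer(&buf, _("Access method: %s"), "my am");
/* footer reads:  Access method: my am    -- the name as stored in pg_am */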

-----------

+      printfPQExpBuffer(&buf, _("Access method: %s"), fmtId(tableinfo.relam));
+      printTableAddFooter(&cont, buf.data);
+   }
+
+
 }

Last two blank lines are not needed.

-----------

+ bool hide_tableam;
} PsqlSettings;

These variables, it seems, are supposed to be grouped together by type.

-----------

I believe you are going to add a new regression test case for the change?

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#76Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Amit Khandekar (#75)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Fri, Jan 18, 2019 at 11:22 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on

I thought we wanted to avoid having to add this setting in individual
regression tests. Can't we do this in pg_regress as a common setting?

Yeah, you're probably right. Actually, I couldn't find anything that looks
like "common settings", and so far I've placed it into psql_start_test as a
psql argument. But I'm not sure, maybe there is a better place.

+ /* Access method info */
+ if (pset.sversion >= 120000 && verbose && tableinfo.relam != NULL &&
+    !(pset.hide_tableam && tableinfo.relam_is_default))
+ {
+         printfPQExpBuffer(&buf, _("Access method: %s"), fmtId(tableinfo.relam));

So this will make psql hide the access method if it's the same as the
default. I understand that this was kind of concluded in the other
thread "Displaying and dumping of table access methods". But IMHO, if
hide_tableam is false, we should *always* show the access method,
regardless of the default value. I mean, we can make it simple: on
means never show the table access method, off means always show it,
regardless of the default access method. And this will also work with
regression tests. If some regression test specifically wants to output
the access method, it can have a "\set HIDE_TABLEAM off" command.

If we hide the method when it's the default, then a regression test
that wants to forcibly show the table access method of all tables
won't see it for tables that have the default access method.

I can't imagine what kind of test would need to forcibly show the table
access method of all the tables. Even if you need to verify the tableam
for something, maybe it's even easier just to select it from pg_am?
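
E.g. something like this (hypothetical table name):

SELECT a.amname
  FROM pg_class c JOIN pg_am a ON a.oid = c.relam
 WHERE c.relname = 'mytab';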

+ if (pset.sversion >= 120000 && verbose && tableinfo.relam != NULL &&

If the server does not support relam, tableinfo.relam will be NULL
anyway. So I think the sversion check is not needed.
------------

+ printfPQExpBuffer(&buf, _("Access method: %s"), fmtId(tableinfo.relam));
fmtId is not required.
-----------
+      printfPQExpBuffer(&buf, _("Access method: %s"), fmtId(tableinfo.relam));
+      printTableAddFooter(&cont, buf.data);
+   }
+
+
}

Last two blank lines are not needed.

Right, fixed.

+ bool hide_tableam;
} PsqlSettings;

These variables, it seems, are supposed to be grouped together by type.

Well, this grouping looks strange to me. But since I don't have a strong
opinion, I moved the variable.

I believe you are going to add a new regression test case for the change?

Yep.

Attachments:

psql_describe_am_v2.patch (application/octet-stream)
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 0a181b01d9..f76c734a28 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1484,6 +1484,8 @@ describeOneTableDetails(const char *schemaname,
 		char	   *reloftype;
 		char		relpersistence;
 		char		relreplident;
+		char	   *relam;
+		bool	    relam_is_default;
 	}			tableinfo;
 	bool		show_column_details = false;
 
@@ -1503,9 +1505,11 @@ describeOneTableDetails(const char *schemaname,
 						  "c.relhastriggers, c.relrowsecurity, c.relforcerowsecurity, "
 						  "false AS relhasoids, %s, c.reltablespace, "
 						  "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, "
-						  "c.relpersistence, c.relreplident\n"
+						  "c.relpersistence, c.relreplident, am.amname,"
+						  "am.amname = current_setting('default_table_access_method') \n"
 						  "FROM pg_catalog.pg_class c\n "
 						  "LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
+						  "LEFT JOIN pg_catalog.pg_am am ON (c.relam = am.oid)\n"
 						  "WHERE c.oid = '%s';",
 						  (verbose ?
 						   "pg_catalog.array_to_string(c.reloptions || "
@@ -1656,6 +1660,17 @@ describeOneTableDetails(const char *schemaname,
 		*(PQgetvalue(res, 0, 11)) : 0;
 	tableinfo.relreplident = (pset.sversion >= 90400) ?
 		*(PQgetvalue(res, 0, 12)) : 'd';
+	if (pset.sversion >= 120000)
+	{
+		tableinfo.relam = PQgetisnull(res, 0, 13) ?
+			(char *) NULL : pg_strdup(PQgetvalue(res, 0, 13));
+		tableinfo.relam_is_default = strcmp(PQgetvalue(res, 0, 14), "t") == 0;
+	}
+	else
+	{
+		tableinfo.relam = NULL;
+		tableinfo.relam_is_default = false;
+	}
 	PQclear(res);
 	res = NULL;
 
@@ -3141,6 +3156,14 @@ describeOneTableDetails(const char *schemaname,
 		/* Tablespace info */
 		add_tablespace_footer(&cont, tableinfo.relkind, tableinfo.tablespace,
 							  true);
+
+		/* Access method info */
+		if (verbose && tableinfo.relam != NULL &&
+		   !(pset.hide_tableam && tableinfo.relam_is_default))
+		{
+			printfPQExpBuffer(&buf, _("Access method: %s"), tableinfo.relam);
+			printTableAddFooter(&cont, buf.data);
+		}
 	}
 
 	/* reloptions, if verbose */
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 176c85afd0..058233b348 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -127,6 +127,7 @@ typedef struct _psqlSettings
 	bool		quiet;
 	bool		singleline;
 	bool		singlestep;
+	bool		hide_tableam;
 	int			fetch_count;
 	int			histsize;
 	int			ignoreeof;
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index e7536a8a06..b757febcc5 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -1128,6 +1128,11 @@ show_context_hook(const char *newval)
 	return true;
 }
 
+static bool
+hide_tableam_hook(const char *newval)
+{
+	return ParseVariableBool(newval, "HIDE_TABLEAM", &pset.hide_tableam);
+}
 
 static void
 EstablishVariableSpace(void)
@@ -1191,4 +1196,7 @@ EstablishVariableSpace(void)
 	SetVariableHooks(pset.vars, "SHOW_CONTEXT",
 					 show_context_substitute_hook,
 					 show_context_hook);
+	SetVariableHooks(pset.vars, "HIDE_TABLEAM",
+					 bool_substitute_hook,
+					 hide_tableam_hook);
 }
diff --git a/src/test/regress/pg_regress_main.c b/src/test/regress/pg_regress_main.c
index bd613e4fda..1b4bca704b 100644
--- a/src/test/regress/pg_regress_main.c
+++ b/src/test/regress/pg_regress_main.c
@@ -74,10 +74,11 @@ psql_start_test(const char *testname,
 	}
 
 	offset += snprintf(psql_cmd + offset, sizeof(psql_cmd) - offset,
-					   "\"%s%spsql\" -X -a -q -d \"%s\" < \"%s\" > \"%s\" 2>&1",
+					   "\"%s%spsql\" -X -a -q -d \"%s\" -v %s < \"%s\" > \"%s\" 2>&1",
 					   bindir ? bindir : "",
 					   bindir ? "/" : "",
 					   dblist->str,
+					   "HIDE_TABLEAM=\"on\"",
 					   infile,
 					   outfile);
 	if (offset >= sizeof(psql_cmd))
#77Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#69)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Tue, Jan 15, 2019 at 6:05 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-01-15 18:02:38 +1100, Haribabu Kommi wrote:

On Tue, Dec 11, 2018 at 1:13 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-11-26 17:55:57 -0800, Andres Freund wrote:
Further tasks I'm not yet planning to tackle, that I'd welcome help on:
- pg_upgrade testing

I did the pg_upgrade testing from an older version with some tables and
views existing, and all of them were properly transformed into the new
server with heap as the default access method.

I will add Dmitry's pg_dump patch and test pg_upgrade to confirm that
the proper access method is retained on the upgraded database.

- I think we should consider removing HeapTuple->t_tableOid, it should
imo live entirely in the slot

I removed the t_tableOid from HeapTuple, and during testing I found some
problems with triggers; I will post the patch once it is fixed.

Please note that I'm working on a heavily revised version of the patch
right now, trying to clean up a lot of things (you might have seen some
of the threads I started). I hope to post it ~Thursday. Local-ish
patches shouldn't be a problem though.

Yes, I am checking your other threads on refactoring and cleanups.
I will rebase this patch once the revised code is available.

I am not able to remove t_tableOid from HeapTuple completely because of
its use in triggers: the slot is not available there, so I need to
store the tableOid as part of the tuple as well.

Currently, t_tableOid is set only when the tuple is formed from the
slot, and its other uses are replaced with the slot member.

comments?
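
Schematically, the pattern the patch applies looks like this (a sketch
assembled from the functions the patch touches, not literal code from
it):

	/* before: each code path stamped the tuple itself */
	tuple->t_tableOid = RelationGetRelid(relation);

	/* after: the slot carries the table OID instead ... */
	slot->tts_tableOid = RelationGetRelid(relation);
	ExecStoreBufferHeapTuple(tuple, slot, buffer);

	/* ... and it is copied back onto a plain HeapTuple only where one
	 * must be materialized, e.g. for the trigger code */
	trigtuple = ExecFetchSlotHeapTuple(slot, true, NULL);
	trigtuple->t_tableOid = slot->tts_tableOid;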

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0001-Reduce-the-use-of-HeapTuple-t_tableOid.patch (application/octet-stream)
From 58ee84b870221a70f8995fd27f1de0e83ec5602a Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 16 Jan 2019 18:43:47 +1100
Subject: [PATCH] Reduce the use of HeapTuple t_tableOid

t_tableOid is still used in triggers and where the HeapTuple is
generated from and passed out of slots. These remaining uses need
to be replaced once the tableOid is stored as a separate
variable/parameter.
---
 contrib/hstore/hstore_io.c                    |  2 -
 contrib/pg_visibility/pg_visibility.c         |  1 -
 contrib/pgstattuple/pgstatapprox.c            |  1 -
 contrib/pgstattuple/pgstattuple.c             |  3 +-
 contrib/postgres_fdw/postgres_fdw.c           | 14 ++++++-
 src/backend/access/common/heaptuple.c         |  7 ----
 src/backend/access/heap/heapam.c              | 41 ++++++-------------
 src/backend/access/heap/heapam_handler.c      | 29 ++++++-------
 src/backend/access/heap/heapam_visibility.c   | 20 ++++-----
 src/backend/access/heap/pruneheap.c           |  2 -
 src/backend/access/heap/tuptoaster.c          |  3 --
 src/backend/access/heap/vacuumlazy.c          |  2 -
 src/backend/access/index/genam.c              |  4 +-
 src/backend/catalog/indexing.c                |  2 +-
 src/backend/commands/analyze.c                |  2 +-
 src/backend/commands/functioncmds.c           |  3 +-
 src/backend/commands/schemacmds.c             |  1 -
 src/backend/commands/trigger.c                | 21 +++++-----
 src/backend/executor/execExprInterp.c         |  1 -
 src/backend/executor/execTuples.c             | 25 +++++++----
 src/backend/executor/execUtils.c              |  2 -
 src/backend/executor/nodeAgg.c                |  3 +-
 src/backend/executor/nodeGather.c             |  1 +
 src/backend/executor/nodeGatherMerge.c        |  1 +
 src/backend/executor/nodeIndexonlyscan.c      |  4 +-
 src/backend/executor/nodeIndexscan.c          |  3 +-
 src/backend/executor/nodeModifyTable.c        |  6 +--
 src/backend/executor/nodeSetOp.c              |  1 +
 src/backend/executor/spi.c                    |  1 -
 src/backend/executor/tqueue.c                 |  1 -
 src/backend/replication/logical/decode.c      |  9 ----
 .../replication/logical/reorderbuffer.c       |  4 +-
 src/backend/utils/adt/expandedrecord.c        |  1 -
 src/backend/utils/adt/jsonfuncs.c             |  2 -
 src/backend/utils/adt/rowtypes.c              | 10 -----
 src/backend/utils/cache/catcache.c            |  1 -
 src/backend/utils/sort/tuplesort.c            |  7 ++--
 src/include/access/heapam.h                   |  2 +-
 src/include/executor/tuptable.h               |  5 +--
 src/include/utils/tqual.h                     |  1 +
 src/pl/plpgsql/src/pl_exec.c                  |  2 -
 src/test/regress/regress.c                    |  1 -
 42 files changed, 98 insertions(+), 154 deletions(-)

diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 745497c76f..05244e77ef 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -845,7 +845,6 @@ hstore_from_record(PG_FUNCTION_ARGS)
 		/* Build a temporary HeapTuple control structure */
 		tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 		ItemPointerSetInvalid(&(tuple.t_self));
-		tuple.t_tableOid = InvalidOid;
 		tuple.t_data = rec;
 
 		values = (Datum *) palloc(ncolumns * sizeof(Datum));
@@ -998,7 +997,6 @@ hstore_populate_record(PG_FUNCTION_ARGS)
 		/* Build a temporary HeapTuple control structure */
 		tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 		ItemPointerSetInvalid(&(tuple.t_self));
-		tuple.t_tableOid = InvalidOid;
 		tuple.t_data = rec;
 	}
 
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index ce9ca704f6..1b1e00d724 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -655,7 +655,6 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 			ItemPointerSet(&(tuple.t_self), blkno, offnum);
 			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 			tuple.t_len = ItemIdGetLength(itemid);
-			tuple.t_tableOid = relid;
 
 			/*
 			 * If we're checking whether the page is all-visible, we expect
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index c59fd10dc1..cef8606550 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -152,7 +152,6 @@ statapprox_heap(Relation rel, output_type *stat)
 
 			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 			tuple.t_len = ItemIdGetLength(itemid);
-			tuple.t_tableOid = RelationGetRelid(rel);
 
 			/*
 			 * We follow VACUUM's lead in counting INSERT_IN_PROGRESS tuples
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 520438d779..a39b03bde5 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -344,7 +344,8 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 		/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
 		LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-		if (HeapTupleSatisfies(tuple, &SnapshotDirty, hscan->rs_cbuf))
+		if (HeapTupleSatisfies(tuple, RelationGetRelid(hscan->rs_scan.rs_rd),
+								&SnapshotDirty, hscan->rs_cbuf))
 		{
 			stat.tuple_len += tuple->t_len;
 			stat.tuple_count++;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index cc5b928950..5355e0e00e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1445,6 +1445,8 @@ postgresIterateForeignScan(ForeignScanState *node)
 	 */
 	ExecStoreHeapTuple(fsstate->tuples[fsstate->next_tuple++],
 					   slot,
+					   fsstate->rel ?
+							   RelationGetRelid(fsstate->rel) : InvalidOid,
 					   false);
 
 	return slot;
@@ -3538,7 +3540,11 @@ store_returning_result(PgFdwModifyState *fmstate,
 											NULL,
 											fmstate->temp_cxt);
 		/* tuple will be deleted when it is cleared from the slot */
-		ExecStoreHeapTuple(newtup, slot, true);
+		ExecStoreHeapTuple(newtup,
+							slot,
+							fmstate->rel ?
+								RelationGetRelid(fmstate->rel) : InvalidOid,
+							true);
 	}
 	PG_CATCH();
 	{
@@ -3810,7 +3816,11 @@ get_returning_data(ForeignScanState *node)
 												dmstate->retrieved_attrs,
 												node,
 												dmstate->temp_cxt);
-			ExecStoreHeapTuple(newtup, slot, false);
+			ExecStoreHeapTuple(newtup,
+								slot,
+								dmstate->rel ?
+								RelationGetRelid(dmstate->rel) : InvalidOid,
+								false);
 		}
 		PG_CATCH();
 		{
diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index 06dd628a5b..5beef33291 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -689,7 +689,6 @@ heap_copytuple(HeapTuple tuple)
 	newTuple = (HeapTuple) palloc(HEAPTUPLESIZE + tuple->t_len);
 	newTuple->t_len = tuple->t_len;
 	newTuple->t_self = tuple->t_self;
-	newTuple->t_tableOid = tuple->t_tableOid;
 	newTuple->t_data = (HeapTupleHeader) ((char *) newTuple + HEAPTUPLESIZE);
 	memcpy((char *) newTuple->t_data, (char *) tuple->t_data, tuple->t_len);
 	return newTuple;
@@ -715,7 +714,6 @@ heap_copytuple_with_tuple(HeapTuple src, HeapTuple dest)
 
 	dest->t_len = src->t_len;
 	dest->t_self = src->t_self;
-	dest->t_tableOid = src->t_tableOid;
 	dest->t_data = (HeapTupleHeader) palloc(src->t_len);
 	memcpy((char *) dest->t_data, (char *) src->t_data, src->t_len);
 }
@@ -850,7 +848,6 @@ expand_tuple(HeapTuple *targetHeapTuple,
 			= targetTHeader
 			= (HeapTupleHeader) ((char *) *targetHeapTuple + HEAPTUPLESIZE);
 		(*targetHeapTuple)->t_len = len;
-		(*targetHeapTuple)->t_tableOid = sourceTuple->t_tableOid;
 		(*targetHeapTuple)->t_self = sourceTuple->t_self;
 
 		targetTHeader->t_infomask = sourceTHeader->t_infomask;
@@ -1078,7 +1075,6 @@ heap_form_tuple(TupleDesc tupleDescriptor,
 	 */
 	tuple->t_len = len;
 	ItemPointerSetInvalid(&(tuple->t_self));
-	tuple->t_tableOid = InvalidOid;
 
 	HeapTupleHeaderSetDatumLength(td, len);
 	HeapTupleHeaderSetTypeId(td, tupleDescriptor->tdtypeid);
@@ -1162,7 +1158,6 @@ heap_modify_tuple(HeapTuple tuple,
 	 */
 	newTuple->t_data->t_ctid = tuple->t_data->t_ctid;
 	newTuple->t_self = tuple->t_self;
-	newTuple->t_tableOid = tuple->t_tableOid;
 
 	return newTuple;
 }
@@ -1225,7 +1220,6 @@ heap_modify_tuple_by_cols(HeapTuple tuple,
 	 */
 	newTuple->t_data->t_ctid = tuple->t_data->t_ctid;
 	newTuple->t_self = tuple->t_self;
-	newTuple->t_tableOid = tuple->t_tableOid;
 
 	return newTuple;
 }
@@ -1465,7 +1459,6 @@ heap_tuple_from_minimal_tuple(MinimalTuple mtup)
 	result = (HeapTuple) palloc(HEAPTUPLESIZE + len);
 	result->t_len = len;
 	ItemPointerSetInvalid(&(result->t_self));
-	result->t_tableOid = InvalidOid;
 	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
 	memcpy((char *) result->t_data + MINIMAL_TUPLE_OFFSET, mtup, mtup->t_len);
 	memset(result->t_data, 0, offsetof(HeapTupleHeaderData, t_infomask2));
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f769d828ff..22080fc5bc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -423,7 +423,6 @@ heapgetpage(TableScanDesc sscan, BlockNumber page)
 			HeapTupleData loctup;
 			bool		valid;
 
-			loctup.t_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
 			loctup.t_len = ItemIdGetLength(lpp);
 			ItemPointerSet(&(loctup.t_self), page, lineoff);
@@ -431,7 +430,8 @@ heapgetpage(TableScanDesc sscan, BlockNumber page)
 			if (all_visible)
 				valid = true;
 			else
-				valid = HeapTupleSatisfies(&loctup, snapshot, buffer);
+				valid = HeapTupleSatisfies(&loctup, RelationGetRelid(scan->rs_scan.rs_rd),
+											snapshot, buffer);
 
 			CheckForSerializableConflictOut(valid, scan->rs_scan.rs_rd, &loctup,
 											buffer, snapshot);
@@ -646,7 +646,8 @@ heapgettup(HeapScanDesc scan,
 				/*
 				 * if current tuple qualifies, return it.
 				 */
-				valid = HeapTupleSatisfies(tuple, snapshot, scan->rs_cbuf);
+				valid = HeapTupleSatisfies(tuple, RelationGetRelid(scan->rs_scan.rs_rd),
+											snapshot, scan->rs_cbuf);
 
 				CheckForSerializableConflictOut(valid, scan->rs_scan.rs_rd, tuple,
 												scan->rs_cbuf, snapshot);
@@ -1442,9 +1443,6 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	if (!is_bitmapscan && snapshot)
 		PredicateLockRelation(relation, snapshot);
 
-	/* we only need to set this up once */
-	scan->rs_ctup.t_tableOid = RelationGetRelid(relation);
-
 	/*
 	 * we do this here instead of in initscan() because heap_rescan also calls
 	 * initscan() and we don't want to allocate memory again
@@ -1657,6 +1655,7 @@ heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *s
 
 	pgstat_count_heap_getnext(scan->rs_scan.rs_rd);
 
+	slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 	return ExecStoreBufferHeapTuple(&scan->rs_ctup, slot,
 									scan->rs_cbuf);
 }
@@ -1760,12 +1759,11 @@ heap_fetch(Relation relation,
 	ItemPointerCopy(tid, &(tuple->t_self));
 	tuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tuple->t_len = ItemIdGetLength(lp);
-	tuple->t_tableOid = RelationGetRelid(relation);
 
 	/*
 	 * check time qualification of tuple, then release lock
 	 */
-	valid = HeapTupleSatisfies(tuple, snapshot, buffer);
+	valid = HeapTupleSatisfies(tuple, RelationGetRelid(relation), snapshot, buffer);
 
 	if (valid)
 		PredicateLockTuple(relation, tuple, snapshot);
@@ -1870,7 +1868,6 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 
 		heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
 		heapTuple->t_len = ItemIdGetLength(lp);
-		heapTuple->t_tableOid = RelationGetRelid(relation);
 		ItemPointerSetOffsetNumber(&heapTuple->t_self, offnum);
 
 		/*
@@ -1907,7 +1904,8 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 			ItemPointerSet(&(heapTuple->t_self), BufferGetBlockNumber(buffer), offnum);
 
 			/* If it's visible per the snapshot, we must return it */
-			valid = HeapTupleSatisfies(heapTuple, snapshot, buffer);
+			valid = HeapTupleSatisfies(heapTuple, RelationGetRelid(relation),
+										snapshot, buffer);
 			CheckForSerializableConflictOut(valid, relation, heapTuple,
 											buffer, snapshot);
 			/* reset to original, non-redirected, tid */
@@ -2064,7 +2062,6 @@ heap_get_latest_tid(Relation relation,
 		tp.t_self = ctid;
 		tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 		tp.t_len = ItemIdGetLength(lp);
-		tp.t_tableOid = RelationGetRelid(relation);
 
 		/*
 		 * After following a t_ctid link, we might arrive at an unrelated
@@ -2081,7 +2078,7 @@ heap_get_latest_tid(Relation relation,
 		 * Check time qualification of tuple; if visible, set it as the new
 		 * result candidate.
 		 */
-		valid = HeapTupleSatisfies(&tp, snapshot, buffer);
+		valid = HeapTupleSatisfies(&tp, RelationGetRelid(relation), snapshot, buffer);
 		CheckForSerializableConflictOut(valid, relation, &tp, buffer, snapshot);
 		if (valid)
 			*tid = ctid;
@@ -2433,7 +2430,6 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 
 	HeapTupleHeaderSetCmin(tup->t_data, cid);
 	HeapTupleHeaderSetXmax(tup->t_data, 0); /* for cleanliness */
-	tup->t_tableOid = RelationGetRelid(relation);
 
 	/*
 	 * If the new tuple is too big for storage or contains already toasted
@@ -2491,9 +2487,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	{
 		heaptuples[i] = heap_prepare_insert(relation, ExecFetchSlotHeapTuple(slots[i], true, NULL),
 											xid, cid, options);
-
-		if (slots[i]->tts_tableOid != InvalidOid)
-			heaptuples[i]->t_tableOid = slots[i]->tts_tableOid;
 	}
 
 	/*
@@ -2883,7 +2876,6 @@ heap_delete(Relation relation, ItemPointer tid,
 	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
 	Assert(ItemIdIsNormal(lp));
 
-	tp.t_tableOid = RelationGetRelid(relation);
 	tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tp.t_len = ItemIdGetLength(lp);
 	tp.t_self = *tid;
@@ -3000,7 +2992,7 @@ l1:
 	if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated)
 	{
 		/* Perform additional check for transaction-snapshot mode RI updates */
-		if (!HeapTupleSatisfies(&tp, crosscheck, buffer))
+		if (!HeapTupleSatisfies(&tp, RelationGetRelid(relation), crosscheck, buffer))
 			result = HeapTupleUpdated;
 	}
 
@@ -3404,14 +3396,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	 * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
 	 * properly.
 	 */
-	oldtup.t_tableOid = RelationGetRelid(relation);
 	oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	oldtup.t_len = ItemIdGetLength(lp);
 	oldtup.t_self = *otid;
 
-	/* the new tuple is ready, except for this: */
-	newtup->t_tableOid = RelationGetRelid(relation);
-
 	/* Determine columns modified by the update. */
 	modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
 												  &oldtup, newtup);
@@ -3642,7 +3630,7 @@ l2:
 	if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated)
 	{
 		/* Perform additional check for transaction-snapshot mode RI updates */
-		if (!HeapTupleSatisfies(&oldtup, crosscheck, buffer))
+		if (!HeapTupleSatisfies(&oldtup, RelationGetRelid(relation), crosscheck, buffer))
 			result = HeapTupleUpdated;
 	}
 
@@ -4267,14 +4255,14 @@ ProjIndexIsUnchanged(Relation relation, HeapTuple oldtup, HeapTuple newtup)
 			int			i;
 
 			ResetExprContext(econtext);
-			ExecStoreHeapTuple(oldtup, slot, false);
+			ExecStoreHeapTuple(oldtup, slot, RelationGetRelid(relation), false);
 			FormIndexDatum(indexInfo,
 						   slot,
 						   estate,
 						   old_values,
 						   old_isnull);
 
-			ExecStoreHeapTuple(newtup, slot, false);
+			ExecStoreHeapTuple(newtup, slot, RelationGetRelid(relation), false);
 			FormIndexDatum(indexInfo,
 						   slot,
 						   estate,
@@ -4486,7 +4474,6 @@ heap_lock_tuple(Relation relation, ItemPointer tid,
 
 	tuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tuple->t_len = ItemIdGetLength(lp);
-	tuple->t_tableOid = RelationGetRelid(relation);
 	tuple->t_self = *tid;
 
 l3:
@@ -6044,7 +6031,6 @@ heap_abort_speculative(Relation relation, HeapTuple tuple)
 	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
 	Assert(ItemIdIsNormal(lp));
 
-	tp.t_tableOid = RelationGetRelid(relation);
 	tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tp.t_len = ItemIdGetLength(lp);
 	tp.t_self = *tid;
@@ -7747,7 +7733,6 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
 	HeapTupleHeader hdr = tup->t_data;
 
 	Assert(ItemPointerIsValid(&tup->t_self));
-	Assert(tup->t_tableOid != InvalidOid);
 
 	xlrec.top_xid = GetTopTransactionId();
 	xlrec.target_node = relation->rd_node;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 95513dfec8..9119bdf162 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -90,8 +90,6 @@ heapam_heap_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
 
 	/* Update the tuple with table oid */
 	slot->tts_tableOid = RelationGetRelid(relation);
-	if (slot->tts_tableOid != InvalidOid)
-		tuple->t_tableOid = slot->tts_tableOid;
 
 	/* Perform the insertion, and copy the resulting ItemPointer */
 	heap_insert(relation, tuple, cid, options, bistate);
@@ -110,8 +108,6 @@ heapam_heap_insert_speculative(Relation relation, TupleTableSlot *slot, CommandI
 
 	/* Update the tuple with table oid */
 	slot->tts_tableOid = RelationGetRelid(relation);
-	if (slot->tts_tableOid != InvalidOid)
-		tuple->t_tableOid = slot->tts_tableOid;
 
 	HeapTupleHeaderSetSpeculativeToken(tuple->t_data, specToken);
 
@@ -386,10 +382,6 @@ heapam_heap_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
 	HTSU_Result result;
 
-	/* Update the tuple with table oid */
-	if (slot->tts_tableOid != InvalidOid)
-		tuple->t_tableOid = slot->tts_tableOid;
-
 	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
 						 hufd, lockmode);
 	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
@@ -450,7 +442,7 @@ heapam_satisfies(Relation rel, TupleTableSlot *slot, Snapshot snapshot)
 	 * Caller should be holding pin, but not lock.
 	 */
 	LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
-	res = HeapTupleSatisfies(bslot->base.tuple, snapshot, bslot->buffer);
+	res = HeapTupleSatisfies(bslot->base.tuple, RelationGetRelid(rel), snapshot, bslot->buffer);
 	LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
 
 	return res;
@@ -984,7 +976,7 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 		MemoryContextReset(econtext->ecxt_per_tuple_memory);
 
 		/* Set up for predicate or expression evaluation */
-		ExecStoreHeapTuple(heapTuple, slot, false);
+		ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(sscan->rs_rd), false);
 
 		/*
 		 * In a partial index, discard tuples that don't satisfy the
@@ -1240,7 +1232,7 @@ validate_index_heapscan(Relation heapRelation,
 			MemoryContextReset(econtext->ecxt_per_tuple_memory);
 
 			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
+			ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(sscan->rs_rd), false);
 
 			/*
 			 * In a partial index, discard tuples that don't satisfy the
@@ -1393,9 +1385,11 @@ heapam_scan_bitmap_pagescan(TableScanDesc sscan,
 				continue;
 			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
 			loctup.t_len = ItemIdGetLength(lp);
-			loctup.t_tableOid = scan->rs_scan.rs_rd->rd_id;
 			ItemPointerSet(&loctup.t_self, page, offnum);
-			valid = HeapTupleSatisfies(&loctup, snapshot, buffer);
+			valid = HeapTupleSatisfies(&loctup,
+							RelationGetRelid(scan->rs_scan.rs_rd),
+							snapshot,
+							buffer);
 			if (valid)
 			{
 				scan->rs_vistuples[ntup++] = offnum;
@@ -1432,7 +1426,6 @@ heapam_scan_bitmap_pagescan_next(TableScanDesc sscan, TupleTableSlot *slot)
 
 	scan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
 	scan->rs_ctup.t_len = ItemIdGetLength(lp);
-	scan->rs_ctup.t_tableOid = scan->rs_scan.rs_rd->rd_id;
 	ItemPointerSet(&scan->rs_ctup.t_self, scan->rs_cblock, targoffset);
 
 	pgstat_count_heap_fetch(scan->rs_scan.rs_rd);
@@ -1444,6 +1437,7 @@ heapam_scan_bitmap_pagescan_next(TableScanDesc sscan, TupleTableSlot *slot)
 	ExecStoreBufferHeapTuple(&scan->rs_ctup,
 							 slot,
 							 scan->rs_cbuf);
+	slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 
 	scan->rs_cindex++;
 
@@ -1490,7 +1484,9 @@ SampleHeapTupleVisible(HeapScanDesc scan, Buffer buffer,
 	else
 	{
 		/* Otherwise, we have to check the tuple individually. */
-		return HeapTupleSatisfies(tuple, scan->rs_scan.rs_snapshot, buffer);
+		return HeapTupleSatisfies(tuple,
+									RelationGetRelid(scan->rs_scan.rs_rd),
+									scan->rs_scan.rs_snapshot, buffer);
 	}
 }
 
@@ -1635,6 +1631,7 @@ heapam_scan_sample_next_tuple(TableScanDesc sscan, struct SampleScanState *scans
 				continue;
 
 			ExecStoreBufferHeapTuple(tuple, slot, scan->rs_cbuf);
+			slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 
 			/* Found visible tuple, return it. */
 			if (!pagemode)
@@ -1720,7 +1717,6 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 
 		ItemPointerSet(&targtuple->t_self, scan->rs_cblock, scan->rs_cindex);
 
-		targtuple->t_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 		targtuple->t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
 		targtuple->t_len = ItemIdGetLength(itemid);
 
@@ -1792,6 +1788,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 		if (sample_it)
 		{
 			ExecStoreBufferHeapTuple(targtuple, slot, scan->rs_cbuf);
+			slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 			scan->rs_cindex++;
 
 			/* note that we leave the buffer locked here! */
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 1ac1a20c1d..cd4d3af3c3 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -179,7 +179,6 @@ HeapTupleSatisfiesSelf(HeapTuple htup, Snapshot snapshot, Buffer buffer)
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -370,7 +369,6 @@ HeapTupleSatisfiesToast(HeapTuple htup, Snapshot snapshot,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -464,7 +462,6 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -757,7 +754,6 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	snapshot->xmin = snapshot->xmax = InvalidTransactionId;
 	snapshot->speculativeToken = 0;
@@ -981,7 +977,6 @@ HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -1183,7 +1178,6 @@ HeapTupleSatisfiesVacuum(HeapTuple stup, TransactionId OldestXmin,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	/*
 	 * Has inserting transaction committed?
@@ -1611,7 +1605,6 @@ HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin)
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	/*
 	 * If the inserting transaction is marked invalid, then it aborted, and
@@ -1675,7 +1668,7 @@ TransactionIdInArray(TransactionId xid, TransactionId *xip, Size num)
  * complicated than when dealing "only" with the present.
  */
 static bool
-HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
+HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Oid relid, Snapshot snapshot,
 							   Buffer buffer)
 {
 	HeapTupleHeader tuple = htup->t_data;
@@ -1683,7 +1676,6 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 	TransactionId xmax = HeapTupleHeaderGetRawXmax(tuple);
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	/* inserting transaction aborted */
 	if (HeapTupleHeaderXminInvalid(tuple))
@@ -1704,7 +1696,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 		 * values externally.
 		 */
 		resolved = ResolveCminCmaxDuringDecoding(HistoricSnapshotGetTupleCids(), snapshot,
-												 htup, buffer,
+												 htup, relid, buffer,
 												 &cmin, &cmax);
 
 		if (!resolved)
@@ -1775,7 +1767,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 
 		/* Lookup actual cmin/cmax values */
 		resolved = ResolveCminCmaxDuringDecoding(HistoricSnapshotGetTupleCids(), snapshot,
-												 htup, buffer,
+												 htup, relid, buffer,
 												 &cmin, &cmax);
 
 		if (!resolved)
@@ -1813,8 +1805,10 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 }
 
 bool
-HeapTupleSatisfies(HeapTuple stup, Snapshot snapshot, Buffer buffer)
+HeapTupleSatisfies(HeapTuple stup, Oid relid, Snapshot snapshot, Buffer buffer)
 {
+	Assert(relid != InvalidOid);
+
 	switch (snapshot->visibility_type)
 	{
 		case MVCC_VISIBILITY:
@@ -1833,7 +1827,7 @@ HeapTupleSatisfies(HeapTuple stup, Snapshot snapshot, Buffer buffer)
 			return HeapTupleSatisfiesDirty(stup, snapshot, buffer);
 			break;
 		case HISTORIC_MVCC_VISIBILITY:
-			return HeapTupleSatisfiesHistoricMVCC(stup, snapshot, buffer);
+			return HeapTupleSatisfiesHistoricMVCC(stup, relid, snapshot, buffer);
 			break;
 		case NON_VACUUMABLE_VISIBILTY:
 			return HeapTupleSatisfiesNonVacuumable(stup, snapshot, buffer);
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c2f5343dac..79964e157a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -367,8 +367,6 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 				i;
 	HeapTupleData tup;
 
-	tup.t_tableOid = RelationGetRelid(relation);
-
 	rootlp = PageGetItemId(dp, rootoffnum);
 
 	/*
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 486cde4aff..5349dbf805 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1023,7 +1023,6 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 		result_tuple = (HeapTuple) palloc0(HEAPTUPLESIZE + new_tuple_len);
 		result_tuple->t_len = new_tuple_len;
 		result_tuple->t_self = newtup->t_self;
-		result_tuple->t_tableOid = newtup->t_tableOid;
 		new_data = (HeapTupleHeader) ((char *) result_tuple + HEAPTUPLESIZE);
 		result_tuple->t_data = new_data;
 
@@ -1124,7 +1123,6 @@ toast_flatten_tuple(HeapTuple tup, TupleDesc tupleDesc)
 	 * a syscache entry.
 	 */
 	new_tuple->t_self = tup->t_self;
-	new_tuple->t_tableOid = tup->t_tableOid;
 
 	new_tuple->t_data->t_choice = tup->t_data->t_choice;
 	new_tuple->t_data->t_ctid = tup->t_data->t_ctid;
@@ -1195,7 +1193,6 @@ toast_flatten_tuple_to_datum(HeapTupleHeader tup,
 	/* Build a temporary HeapTuple control structure */
 	tmptup.t_len = tup_len;
 	ItemPointerSetInvalid(&(tmptup.t_self));
-	tmptup.t_tableOid = InvalidOid;
 	tmptup.t_data = tup;
 
 	/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 429f9ad52a..87589a927e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1004,7 +1004,6 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 			tuple.t_len = ItemIdGetLength(itemid);
-			tuple.t_tableOid = RelationGetRelid(onerel);
 
 			tupgone = false;
 
@@ -2238,7 +2237,6 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(rel);
 
 		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
 		{
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5f033c5ee4..02afc191b6 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -462,7 +462,7 @@ systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
 		Assert(BufferIsValid(hscan->xs_cbuf));
 		/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
 		LockBuffer(hscan->xs_cbuf, BUFFER_LOCK_SHARE);
-		result = HeapTupleSatisfies(tup, freshsnap, hscan->xs_cbuf);
+		result = HeapTupleSatisfies(tup, RelationGetRelid(sysscan->heap_rel), freshsnap, hscan->xs_cbuf);
 		LockBuffer(hscan->xs_cbuf, BUFFER_LOCK_UNLOCK);
 	}
 	else
@@ -474,7 +474,7 @@ systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
 		Assert(BufferIsValid(scan->rs_cbuf));
 		/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
-		result = HeapTupleSatisfies(tup, freshsnap, scan->rs_cbuf);
+		result = HeapTupleSatisfies(tup, RelationGetRelid(sysscan->heap_rel), freshsnap, scan->rs_cbuf);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 	}
 	return result;
diff --git a/src/backend/catalog/indexing.c b/src/backend/catalog/indexing.c
index 52a2ccb40f..88b8df0b7a 100644
--- a/src/backend/catalog/indexing.c
+++ b/src/backend/catalog/indexing.c
@@ -97,7 +97,7 @@ CatalogIndexInsert(CatalogIndexState indstate, HeapTuple heapTuple)
 	/* Need a slot to hold the tuple being examined */
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
 									&TTSOpsHeapTuple);
-	ExecStoreHeapTuple(heapTuple, slot, false);
+	ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(heapRelation), false);
 
 	/*
 	 * for each index, form and insert the index tuple
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 29e2377b52..b2697dae44 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -758,7 +758,7 @@ compute_index_stats(Relation onerel, double totalrows,
 			ResetExprContext(econtext);
 
 			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
+			ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(onerel), false);
 
 			/* If index is partial, check predicate */
 			if (predicate != NULL)
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index ebece4d1d7..a9f7d55940 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -2360,10 +2360,9 @@ ExecuteCallStmt(CallStmt *stmt, ParamListInfo params, bool atomic, DestReceiver
 
 		rettupdata.t_len = HeapTupleHeaderGetDatumLength(td);
 		ItemPointerSetInvalid(&(rettupdata.t_self));
-		rettupdata.t_tableOid = InvalidOid;
 		rettupdata.t_data = td;
 
-		slot = ExecStoreHeapTuple(&rettupdata, tstate->slot, false);
+		slot = ExecStoreHeapTuple(&rettupdata, tstate->slot, InvalidOid, false);
 		tstate->dest->receiveSlot(slot, tstate->dest);
 
 		end_tup_output(tstate);
diff --git a/src/backend/commands/schemacmds.c b/src/backend/commands/schemacmds.c
index f0ebe2d1c3..ba31f46019 100644
--- a/src/backend/commands/schemacmds.c
+++ b/src/backend/commands/schemacmds.c
@@ -355,7 +355,6 @@ AlterSchemaOwner_internal(HeapTuple tup, Relation rel, Oid newOwnerId)
 {
 	Form_pg_namespace nspForm;
 
-	Assert(tup->t_tableOid == NamespaceRelationId);
 	Assert(RelationGetRelid(rel) == NamespaceRelationId);
 
 	nspForm = (Form_pg_namespace) GETSTRUCT(tup);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6a00a96f59..313222008d 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2573,7 +2573,7 @@ ExecBRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 		}
 		if (newtuple != oldtuple)
 		{
-			ExecForceStoreHeapTuple(newtuple, slot);
+			ExecForceStoreHeapTuple(newtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
 			newtuple = ExecFetchSlotHeapTuple(slot, true, NULL);
 		}
 	}
@@ -2653,7 +2653,8 @@ ExecIRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 		}
 		if (oldtuple != newtuple)
 		{
-			ExecForceStoreHeapTuple(newtuple, LocTriggerData.tg_trigslot);
+			ExecForceStoreHeapTuple(newtuple, LocTriggerData.tg_trigslot,
+									RelationGetRelid(relinfo->ri_RelationDesc));
 			newtuple = ExecFetchSlotHeapTuple(slot, true, NULL);
 		}
 	}
@@ -2777,7 +2778,7 @@ ExecBRDeleteTriggers(EState *estate, EPQState *epqstate,
 	else
 	{
 		trigtuple = fdw_trigtuple;
-		ExecForceStoreHeapTuple(trigtuple, slot);
+		ExecForceStoreHeapTuple(trigtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
 	}
 
 	LocTriggerData.type = T_TriggerData;
@@ -2854,7 +2855,7 @@ ExecARDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
 		}
 		else
 		{
-			ExecForceStoreHeapTuple(fdw_trigtuple, slot);
+			ExecForceStoreHeapTuple(fdw_trigtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
 		}
 
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_DELETE,
@@ -2884,7 +2885,7 @@ ExecIRDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
 	LocTriggerData.tg_oldtable = NULL;
 	LocTriggerData.tg_newtable = NULL;
 
-	ExecForceStoreHeapTuple(trigtuple, slot);
+	ExecForceStoreHeapTuple(trigtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 	for (i = 0; i < trigdesc->numtriggers; i++)
 	{
@@ -3044,7 +3045,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 	}
 	else
 	{
-		ExecForceStoreHeapTuple(fdw_trigtuple, oldslot);
+		ExecForceStoreHeapTuple(fdw_trigtuple, oldslot, RelationGetRelid(relinfo->ri_RelationDesc));
 		trigtuple = fdw_trigtuple;
 	}
 
@@ -3090,7 +3091,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 		}
 
 		if (newtuple != oldtuple)
-			ExecForceStoreHeapTuple(newtuple, newslot);
+			ExecForceStoreHeapTuple(newtuple, newslot, RelationGetRelid(relinfo->ri_RelationDesc));
 	}
 	if (false && trigtuple != fdw_trigtuple && trigtuple != newtuple)
 		heap_freetuple(trigtuple);
@@ -3132,7 +3133,7 @@ ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 							   NULL,
 							   NULL);
 		else if (fdw_trigtuple != NULL)
-			ExecForceStoreHeapTuple(fdw_trigtuple, oldslot);
+			ExecForceStoreHeapTuple(fdw_trigtuple, oldslot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_UPDATE,
 							  true, oldslot, newslot, recheckIndexes,
@@ -3161,7 +3162,7 @@ ExecIRUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 	LocTriggerData.tg_oldtable = NULL;
 	LocTriggerData.tg_newtable = NULL;
 
-	ExecForceStoreHeapTuple(trigtuple, oldslot);
+	ExecForceStoreHeapTuple(trigtuple, oldslot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 	for (i = 0; i < trigdesc->numtriggers; i++)
 	{
@@ -3193,7 +3194,7 @@ ExecIRUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 			return false;		/* "do nothing" */
 
 		if (oldtuple != newtuple)
-			ExecForceStoreHeapTuple(newtuple, newslot);
+			ExecForceStoreHeapTuple(newtuple, newslot, RelationGetRelid(relinfo->ri_RelationDesc));
 	}
 
 	return true;
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 6cac1cf99c..b2a70bc07d 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3004,7 +3004,6 @@ ExecEvalFieldStoreDeForm(ExprState *state, ExprEvalStep *op, ExprContext *econte
 		tuphdr = DatumGetHeapTupleHeader(tupDatum);
 		tmptup.t_len = HeapTupleHeaderGetDatumLength(tuphdr);
 		ItemPointerSetInvalid(&(tmptup.t_self));
-		tmptup.t_tableOid = InvalidOid;
 		tmptup.t_data = tuphdr;
 
 		heap_deform_tuple(&tmptup, tupDesc,
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index d91a71a7c1..ac8e8dc8cd 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -383,7 +383,7 @@ tts_heap_copyslot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
 	tuple = ExecCopySlotHeapTuple(srcslot);
 	MemoryContextSwitchTo(oldcontext);
 
-	ExecStoreHeapTuple(tuple, dstslot, true);
+	ExecStoreHeapTuple(tuple, dstslot, srcslot->tts_tableOid, true);
 }
 
 static HeapTuple
@@ -1126,6 +1126,7 @@ MakeTupleTableSlot(TupleDesc tupleDesc,
 	slot->tts_tupleDescriptor = tupleDesc;
 	slot->tts_mcxt = CurrentMemoryContext;
 	slot->tts_nvalid = 0;
+	slot->tts_tableOid = InvalidOid;
 
 	if (tupleDesc != NULL)
 	{
@@ -1388,6 +1389,7 @@ ExecSetSlotDescriptor(TupleTableSlot *slot, /* slot to change */
 TupleTableSlot *
 ExecStoreHeapTuple(HeapTuple tuple,
 				   TupleTableSlot *slot,
+				   Oid relid,
 				   bool shouldFree)
 {
 	/*
@@ -1405,7 +1407,7 @@ ExecStoreHeapTuple(HeapTuple tuple,
 	else
 		elog(ERROR, "trying to store a heap tuple into wrong type of slot");
 
-	slot->tts_tableOid = tuple->t_tableOid;
+	slot->tts_tableOid = relid;
 
 	return slot;
 }
@@ -1446,8 +1448,6 @@ ExecStoreBufferHeapTuple(HeapTuple tuple,
 		elog(ERROR, "trying to store an on-disk heap tuple into wrong type of slot");
 	tts_buffer_heap_store_tuple(slot, tuple, buffer);
 
-	slot->tts_tableOid = tuple->t_tableOid;
-
 	return slot;
 }
 
@@ -1482,11 +1482,12 @@ ExecStoreMinimalTuple(MinimalTuple mtup,
  */
 void
 ExecForceStoreHeapTuple(HeapTuple tuple,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						Oid relid)
 {
 	if (TTS_IS_HEAPTUPLE(slot))
 	{
-		ExecStoreHeapTuple(tuple, slot, false);
+		ExecStoreHeapTuple(tuple, slot, relid, false);
 	}
 	else if (TTS_IS_BUFFERTUPLE(slot))
 	{
@@ -1499,6 +1500,7 @@ ExecForceStoreHeapTuple(HeapTuple tuple,
 		oldContext = MemoryContextSwitchTo(slot->tts_mcxt);
 		bslot->base.tuple = heap_copytuple(tuple);
 		MemoryContextSwitchTo(oldContext);
+		slot->tts_tableOid = relid;
 	}
 	else
 	{
@@ -1506,6 +1508,7 @@ ExecForceStoreHeapTuple(HeapTuple tuple,
 		heap_deform_tuple(tuple, slot->tts_tupleDescriptor,
 						  slot->tts_values, slot->tts_isnull);
 		ExecStoreVirtualTuple(slot);
+		slot->tts_tableOid = relid;
 	}
 }
 
@@ -1639,6 +1642,8 @@ ExecStoreAllNullTuple(TupleTableSlot *slot)
 HeapTuple
 ExecFetchSlotHeapTuple(TupleTableSlot *slot, bool materialize, bool *shouldFree)
 {
+	HeapTuple htup;
+
 	/*
 	 * sanity checks
 	 */
@@ -1653,14 +1658,18 @@ ExecFetchSlotHeapTuple(TupleTableSlot *slot, bool materialize, bool *shouldFree)
 	{
 		if (shouldFree)
 			*shouldFree = true;
-		return slot->tts_ops->copy_heap_tuple(slot);
+		htup = slot->tts_ops->copy_heap_tuple(slot);
 	}
 	else
 	{
 		if (shouldFree)
 			*shouldFree = false;
-		return slot->tts_ops->get_heap_tuple(slot);
+		htup = slot->tts_ops->get_heap_tuple(slot);
 	}
+
+	htup->t_tableOid = slot->tts_tableOid;
+
+	return htup;
 }
 
 /* --------------------------------
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 4031642b80..db2020bd0d 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1070,7 +1070,6 @@ GetAttributeByName(HeapTupleHeader tuple, const char *attname, bool *isNull)
 	 */
 	tmptup.t_len = HeapTupleHeaderGetDatumLength(tuple);
 	ItemPointerSetInvalid(&(tmptup.t_self));
-	tmptup.t_tableOid = InvalidOid;
 	tmptup.t_data = tuple;
 
 	result = heap_getattr(&tmptup,
@@ -1118,7 +1117,6 @@ GetAttributeByNum(HeapTupleHeader tuple,
 	 */
 	tmptup.t_len = HeapTupleHeaderGetDatumLength(tuple);
 	ItemPointerSetInvalid(&(tmptup.t_self));
-	tmptup.t_tableOid = InvalidOid;
 	tmptup.t_data = tuple;
 
 	result = heap_getattr(&tmptup,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index daf56cd3d1..ced48bb791 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -1801,7 +1801,8 @@ agg_retrieve_direct(AggState *aggstate)
 				 * cleared from the slot.
 				 */
 				ExecForceStoreHeapTuple(aggstate->grp_firstTuple,
-								   firstSlot);
+								   firstSlot,
+								   InvalidOid);
 				aggstate->grp_firstTuple = NULL;	/* don't keep two pointers */
 
 				/* set up for first advance_aggregates call */
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 1dd8bb3f3a..1d4d79a3ab 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -280,6 +280,7 @@ gather_getnext(GatherState *gatherstate)
 			{
 				ExecStoreHeapTuple(tup, /* tuple to store */
 								   fslot,	/* slot to store the tuple */
+								   InvalidOid,
 								   true);	/* pfree tuple when done with it */
 				return fslot;
 			}
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 54ef0ca7b7..73625965b2 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -703,6 +703,7 @@ gather_merge_readnext(GatherMergeState *gm_state, int reader, bool nowait)
 	ExecStoreHeapTuple(tup,			/* tuple to store */
 					   gm_state->gm_slots[reader],	/* slot in which to store
 													 * the tuple */
+					   InvalidOid,
 					   true);		/* pfree tuple when done with it */
 
 	return true;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c39c4f453d..3c51c4f635 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -203,8 +203,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 			 */
 			Assert(slot->tts_tupleDescriptor->natts ==
 				   scandesc->xs_hitupdesc->natts);
-			ExecForceStoreHeapTuple(scandesc->xs_hitup, slot);
-			slot->tts_tableOid = RelationGetRelid(scandesc->heapRelation);
+			ExecForceStoreHeapTuple(scandesc->xs_hitup, slot,
+					RelationGetRelid(scandesc->heapRelation));
 		}
 		else if (scandesc->xs_itup)
 			StoreIndexTuple(slot, scandesc->xs_itup, scandesc->xs_itupdesc);
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index b38dadaa9a..28b3bdb4d4 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -247,8 +247,7 @@ IndexNextWithReorder(IndexScanState *node)
 				tuple = reorderqueue_pop(node);
 
 				/* Pass 'true', as the tuple in the queue is a palloc'd copy */
-				slot->tts_tableOid = RelationGetRelid(scandesc->heapRelation);
-				ExecStoreHeapTuple(tuple, slot, true);
+				ExecStoreHeapTuple(tuple, slot, RelationGetRelid(scandesc->heapRelation), true);
 				return slot;
 			}
 		}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d1ac9fc2e9..8aa7501830 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -842,7 +842,7 @@ ldelete:;
 			slot = ExecTriggerGetReturnSlot(estate, resultRelationDesc);
 			if (oldtuple != NULL)
 			{
-				ExecForceStoreHeapTuple(oldtuple, slot);
+				ExecForceStoreHeapTuple(oldtuple, slot, RelationGetRelid(resultRelationDesc));
 			}
 			else
 			{
@@ -2035,10 +2035,6 @@ ExecModifyTable(PlanState *pstate)
 					oldtupdata.t_len =
 						HeapTupleHeaderGetDatumLength(oldtupdata.t_data);
 					ItemPointerSetInvalid(&(oldtupdata.t_self));
-					/* Historically, view triggers see invalid t_tableOid. */
-					oldtupdata.t_tableOid =
-						(relkind == RELKIND_VIEW) ? InvalidOid :
-						RelationGetRelid(resultRelInfo->ri_RelationDesc);
 
 					oldtuple = &oldtupdata;
 				}
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 48b7aa9b8b..4f7da00d82 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -269,6 +269,7 @@ setop_retrieve_direct(SetOpState *setopstate)
 		 */
 		ExecStoreHeapTuple(setopstate->grp_firstTuple,
 						   resultTupleSlot,
+						   InvalidOid,
 						   true);
 		setopstate->grp_firstTuple = NULL;	/* don't keep two pointers */
 
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 34664e76d1..d9398ed527 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -870,7 +870,6 @@ SPI_modifytuple(Relation rel, HeapTuple tuple, int natts, int *attnum,
 		 */
 		mtuple->t_data->t_ctid = tuple->t_data->t_ctid;
 		mtuple->t_self = tuple->t_self;
-		mtuple->t_tableOid = tuple->t_tableOid;
 	}
 	else
 	{
diff --git a/src/backend/executor/tqueue.c b/src/backend/executor/tqueue.c
index e2b596cf74..d3ef18e264 100644
--- a/src/backend/executor/tqueue.c
+++ b/src/backend/executor/tqueue.c
@@ -206,7 +206,6 @@ TupleQueueReaderNext(TupleQueueReader *reader, bool nowait, bool *done)
 	 * (which had better be sufficiently aligned).
 	 */
 	ItemPointerSetInvalid(&htup.t_self);
-	htup.t_tableOid = InvalidOid;
 	htup.t_len = nbytes;
 	htup.t_data = data;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index e3b05657f8..a3430e1336 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -940,12 +940,6 @@ DecodeMultiInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			/* not a disk based tuple */
 			ItemPointerSetInvalid(&tuple->tuple.t_self);
 
-			/*
-			 * We can only figure this out after reassembling the
-			 * transactions.
-			 */
-			tuple->tuple.t_tableOid = InvalidOid;
-
 			tuple->tuple.t_len = datalen + SizeofHeapTupleHeader;
 
 			memset(header, 0, SizeofHeapTupleHeader);
@@ -1033,9 +1027,6 @@ DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tuple)
 	/* not a disk based tuple */
 	ItemPointerSetInvalid(&tuple->tuple.t_self);
 
-	/* we can only figure this out after reassembling the transactions */
-	tuple->tuple.t_tableOid = InvalidOid;
-
 	/* data is not stored aligned, copy to aligned storage */
 	memcpy((char *) &xlhdr,
 		   data,
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 23466bade2..60ee12b91a 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3482,7 +3482,7 @@ UpdateLogicalMappings(HTAB *tuplecid_data, Oid relid, Snapshot snapshot)
 bool
 ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data,
 							  Snapshot snapshot,
-							  HeapTuple htup, Buffer buffer,
+							  HeapTuple htup, Oid relid, Buffer buffer,
 							  CommandId *cmin, CommandId *cmax)
 {
 	ReorderBufferTupleCidKey key;
@@ -3524,7 +3524,7 @@ restart:
 	 */
 	if (ent == NULL && !updated_mapping)
 	{
-		UpdateLogicalMappings(tuplecid_data, htup->t_tableOid, snapshot);
+		UpdateLogicalMappings(tuplecid_data, relid, snapshot);
 		/* now check but don't update for a mapping again */
 		updated_mapping = true;
 		goto restart;
diff --git a/src/backend/utils/adt/expandedrecord.c b/src/backend/utils/adt/expandedrecord.c
index 5561b741e9..fbf26b0891 100644
--- a/src/backend/utils/adt/expandedrecord.c
+++ b/src/backend/utils/adt/expandedrecord.c
@@ -610,7 +610,6 @@ make_expanded_record_from_datum(Datum recorddatum, MemoryContext parentcontext)
 
 	tmptup.t_len = HeapTupleHeaderGetDatumLength(tuphdr);
 	ItemPointerSetInvalid(&(tmptup.t_self));
-	tmptup.t_tableOid = InvalidOid;
 	tmptup.t_data = tuphdr;
 
 	oldcxt = MemoryContextSwitchTo(objcxt);
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index fc1581c92b..5fe0659dcf 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -3147,7 +3147,6 @@ populate_record(TupleDesc tupdesc,
 		/* Build a temporary HeapTuple control structure */
 		tuple.t_len = HeapTupleHeaderGetDatumLength(defaultval);
 		ItemPointerSetInvalid(&(tuple.t_self));
-		tuple.t_tableOid = InvalidOid;
 		tuple.t_data = defaultval;
 
 		/* Break down the tuple into fields */
@@ -3546,7 +3545,6 @@ populate_recordset_record(PopulateRecordsetState *state, JsObject *obj)
 	/* ok, save into tuplestore */
 	tuple.t_len = HeapTupleHeaderGetDatumLength(tuphead);
 	ItemPointerSetInvalid(&(tuple.t_self));
-	tuple.t_tableOid = InvalidOid;
 	tuple.t_data = tuphead;
 
 	tuplestore_puttuple(state->tuple_store, &tuple);
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index 5f729342f8..060ee6c6ca 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -324,7 +324,6 @@ record_out(PG_FUNCTION_ARGS)
 	/* Build a temporary HeapTuple control structure */
 	tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 	ItemPointerSetInvalid(&(tuple.t_self));
-	tuple.t_tableOid = InvalidOid;
 	tuple.t_data = rec;
 
 	/*
@@ -671,7 +670,6 @@ record_send(PG_FUNCTION_ARGS)
 	/* Build a temporary HeapTuple control structure */
 	tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 	ItemPointerSetInvalid(&(tuple.t_self));
-	tuple.t_tableOid = InvalidOid;
 	tuple.t_data = rec;
 
 	/*
@@ -821,11 +819,9 @@ record_cmp(FunctionCallInfo fcinfo)
 	/* Build temporary HeapTuple control structures */
 	tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
 	ItemPointerSetInvalid(&(tuple1.t_self));
-	tuple1.t_tableOid = InvalidOid;
 	tuple1.t_data = record1;
 	tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
 	ItemPointerSetInvalid(&(tuple2.t_self));
-	tuple2.t_tableOid = InvalidOid;
 	tuple2.t_data = record2;
 
 	/*
@@ -1063,11 +1059,9 @@ record_eq(PG_FUNCTION_ARGS)
 	/* Build temporary HeapTuple control structures */
 	tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
 	ItemPointerSetInvalid(&(tuple1.t_self));
-	tuple1.t_tableOid = InvalidOid;
 	tuple1.t_data = record1;
 	tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
 	ItemPointerSetInvalid(&(tuple2.t_self));
-	tuple2.t_tableOid = InvalidOid;
 	tuple2.t_data = record2;
 
 	/*
@@ -1326,11 +1320,9 @@ record_image_cmp(FunctionCallInfo fcinfo)
 	/* Build temporary HeapTuple control structures */
 	tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
 	ItemPointerSetInvalid(&(tuple1.t_self));
-	tuple1.t_tableOid = InvalidOid;
 	tuple1.t_data = record1;
 	tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
 	ItemPointerSetInvalid(&(tuple2.t_self));
-	tuple2.t_tableOid = InvalidOid;
 	tuple2.t_data = record2;
 
 	/*
@@ -1570,11 +1562,9 @@ record_image_eq(PG_FUNCTION_ARGS)
 	/* Build temporary HeapTuple control structures */
 	tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
 	ItemPointerSetInvalid(&(tuple1.t_self));
-	tuple1.t_tableOid = InvalidOid;
 	tuple1.t_data = record1;
 	tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
 	ItemPointerSetInvalid(&(tuple2.t_self));
-	tuple2.t_tableOid = InvalidOid;
 	tuple2.t_data = record2;
 
 	/*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index b31fd5acea..7bf2f4617f 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1846,7 +1846,6 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 								MAXIMUM_ALIGNOF + dtp->t_len);
 		ct->tuple.t_len = dtp->t_len;
 		ct->tuple.t_self = dtp->t_self;
-		ct->tuple.t_tableOid = dtp->t_tableOid;
 		ct->tuple.t_data = (HeapTupleHeader)
 			MAXALIGN(((char *) ct) + sizeof(CatCTup));
 		/* copy tuple contents */
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 7d2b6facf2..3bd8cde14b 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -3792,11 +3792,11 @@ comparetup_cluster(const SortTuple *a, const SortTuple *b,
 
 		ecxt_scantuple = GetPerTupleExprContext(state->estate)->ecxt_scantuple;
 
-		ExecStoreHeapTuple(ltup, ecxt_scantuple, false);
+		ExecStoreHeapTuple(ltup, ecxt_scantuple, InvalidOid, false);
 		FormIndexDatum(state->indexInfo, ecxt_scantuple, state->estate,
 					   l_index_values, l_index_isnull);
 
-		ExecStoreHeapTuple(rtup, ecxt_scantuple, false);
+		ExecStoreHeapTuple(rtup, ecxt_scantuple, InvalidOid, false);
 		FormIndexDatum(state->indexInfo, ecxt_scantuple, state->estate,
 					   r_index_values, r_index_isnull);
 
@@ -3926,8 +3926,7 @@ readtup_cluster(Tuplesortstate *state, SortTuple *stup,
 	tuple->t_len = t_len;
 	LogicalTapeReadExact(state->tapeset, tapenum,
 						 &tuple->t_self, sizeof(ItemPointerData));
-	/* We don't currently bother to reconstruct t_tableOid */
-	tuple->t_tableOid = InvalidOid;
+
 	/* Read in the tuple body */
 	LogicalTapeReadExact(state->tapeset, tapenum,
 						 tuple->t_data, tuple->t_len);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a309db1a1c..8dc1880925 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -218,7 +218,7 @@ extern void heap_vacuum_rel(Relation onerel, int options,
 				struct VacuumParams *params, BufferAccessStrategy bstrategy);
 
 /* in heap/heapam_visibility.c */
-extern bool HeapTupleSatisfies(HeapTuple stup, Snapshot snapshot, Buffer buffer);
+extern bool HeapTupleSatisfies(HeapTuple stup, Oid relid, Snapshot snapshot, Buffer buffer);
 extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTuple stup, CommandId curcid,
 						 Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple stup, TransactionId OldestXmin,
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index c87689b3dd..b1d69ab5ea 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -306,9 +306,8 @@ extern TupleTableSlot *MakeSingleTupleTableSlot(TupleDesc tupdesc,
 extern void ExecDropSingleTupleTableSlot(TupleTableSlot *slot);
 extern void ExecSetSlotDescriptor(TupleTableSlot *slot, TupleDesc tupdesc);
 extern TupleTableSlot *ExecStoreHeapTuple(HeapTuple tuple,
-				   TupleTableSlot *slot,
-				   bool shouldFree);
-extern void ExecForceStoreHeapTuple(HeapTuple tuple, TupleTableSlot *slot);
+				   TupleTableSlot *slot, Oid relid, bool shouldFree);
+extern void ExecForceStoreHeapTuple(HeapTuple tuple, TupleTableSlot *slot, Oid relid);
 /* FIXME: Remove */
 extern void ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot);
 extern TupleTableSlot *ExecStoreBufferHeapTuple(HeapTuple tuple,
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 1fe9cc6402..ccd81dff39 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -39,6 +39,7 @@ struct HTAB;
 extern bool ResolveCminCmaxDuringDecoding(struct HTAB *tuplecid_data,
 							  Snapshot snapshot,
 							  HeapTuple htup,
+							  Oid relid,
 							  Buffer buffer,
 							  CommandId *cmin, CommandId *cmax);
 
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 1e0617322b..efce063ecd 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -7234,7 +7234,6 @@ deconstruct_composite_datum(Datum value, HeapTupleData *tmptup)
 	/* Build a temporary HeapTuple control structure */
 	tmptup->t_len = HeapTupleHeaderGetDatumLength(td);
 	ItemPointerSetInvalid(&(tmptup->t_self));
-	tmptup->t_tableOid = InvalidOid;
 	tmptup->t_data = td;
 
 	/* Extract rowtype info and find a tupdesc */
@@ -7403,7 +7402,6 @@ exec_move_row_from_datum(PLpgSQL_execstate *estate,
 		/* Build a temporary HeapTuple control structure */
 		tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
 		ItemPointerSetInvalid(&(tmptup.t_self));
-		tmptup.t_tableOid = InvalidOid;
 		tmptup.t_data = td;
 
 		/* Extract rowtype info */
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index a2e57768d4..76dff2a51d 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -524,7 +524,6 @@ make_tuple_indirect(PG_FUNCTION_ARGS)
 	/* Build a temporary HeapTuple control structure */
 	tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 	ItemPointerSetInvalid(&(tuple.t_self));
-	tuple.t_tableOid = InvalidOid;
 	tuple.t_data = rec;
 
 	values = (Datum *) palloc(ncolumns * sizeof(Datum));
-- 
2.18.0.windows.1
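
To summarize the API shape in the patch above: the table OID no longer
rides along in HeapTupleData.t_tableOid; callers instead pass the
relation's OID explicitly, and slots carry it in tts_tableOid. A minimal
caller-side sketch of the resulting convention, assuming rel, snapshot,
buffer, page, lp, tid and slot are set up as in heap_fetch() above:

	HeapTupleData tuple;
	bool		valid;

	ItemPointerCopy(tid, &tuple.t_self);
	tuple.t_data = (HeapTupleHeader) PageGetItem(page, lp);
	tuple.t_len = ItemIdGetLength(lp);
	/* no t_tableOid assignment anymore; the OID travels as an argument */
	valid = HeapTupleSatisfies(&tuple, RelationGetRelid(rel),
							   snapshot, buffer);
	if (valid)
	{
		ExecStoreBufferHeapTuple(&tuple, slot, buffer);
		/* the slot, not the tuple, now carries the table OID */
		slot->tts_tableOid = RelationGetRelid(rel);
	}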

#78Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#54)
42 attachment(s)
Re: Pluggable Storage - Andres's take

Hi,

(resending with compressed attachments, perhaps that'll go through)

On 2018-12-10 18:13:40 -0800, Andres Freund wrote:

On 2018-11-26 17:55:57 -0800, Andres Freund wrote:

FWIW, now that oids are removed, and the tuple table slot abstraction
got in, I'm working on rebasing the pluggable storage patchset on top of
that.

I've pushed a version of that to the git tree, including a rebased
version of zheap:
https://github.com/anarazel/postgres-pluggable-storage
https://github.com/anarazel/postgres-pluggable-zheap

I've pushed the newest, substantially revised, version to the same
repository. Note that while the newest pluggable-zheap version is newer
than my last email, it's not based on the latest version, and the
pluggable-zheap development is now happening in the main zheap
repository.

My next steps are:
- make relation creation properly pluggable
- remove the typedefs from tableam.h, instead move them into the
TableAmRoutine struct (see the sketch below)
- Move rs_{nblocks, startblock, numblocks} out of TableScanDescData
- Move HeapScanDesc and IndexFetchHeapData out of relscan.h
- remove ExecSlotCompare(), it's entirely unrelated to these changes imo
(and in the wrong place)

These are done.
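
To make the typedef item concrete: instead of declaring each callback
via a separate typedef and then a struct of those typedef'd pointers,
the signatures now live directly in the struct. A trimmed, illustrative
sketch; the member set and exact signatures here are mine, not the
final API:

	typedef struct TableAmRoutine
	{
		NodeTag		type;

		/* callbacks declared in place, rather than via typedefs */
		TableScanDesc (*scan_begin) (Relation rel, Snapshot snapshot,
									 int nkeys, struct ScanKeyData *key);
		void		(*scan_end) (TableScanDesc scan);
		bool		(*scan_getnextslot) (TableScanDesc scan,
										 ScanDirection direction,
										 TupleTableSlot *slot);
	} TableAmRoutine;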

- split pluggable storage patchset, to commit earlier:
- EvalPlanQual slotification
- trigger slotification
- split of IndexBuildHeapScan out of index.c

The patchset is now pretty granularly split into individual pieces.
There's two commits that might be worthwhile to split up further:

1) The commit introducing table_beginscan et al, currently also
introduces indexscans through tableam.
2) The commit introducing table_(insert|delete|update) also includes
table_lock_tuple(), which in turn changes a bunch of EPQ related
code. It's probably worthwhile to break that out.

I tried to make each individual commit make some sense, and pass all
tests on its own. That requires some changes that are then obsoleted
in a later commit, but it's not as much as I feared.

- rename HeapUpdateFailureData et al to not reference Heap

I've not done that, I decided it's best to do that after all the work
has gone in.

- See if the slot in SysScanDescData can be avoided, it's not exactly
free of overhead.

After reconsidering, I don't think it's worth doing so.

There's pretty substantial changes in this series, besides the things
mentioned above:

- I re-introduced parallel scan into pluggable storage, but added a set
of helper functions to avoid having to duplicate the current
block-based logic from heap. That way it can be shared between
most/all block-based AMs (see the sketch after this list)
- latestRemovedXid handling is moved into the table-AM, that's required
for correct replay on Hot-Standby, where we do not know the AM of the
current relation
- the whole truncation and relation creation code has been overhauled
- the order of functions in tableam.h, heapam_handler.c etc has been
made more sensible
- a number of callbacks have been obsoleted (relation_sync,
relation_create_init_fork, scansetlimits)
- A bunch of prerequisite work has been merged
- (heap|relation)_(open|openrv|close) have been split into their own
files
- To avoid having to care about the bulk-insert flags, code that uses a
bulk-insert now unconditionally calls table_finish_bulk_insert() (see
the sketch after this list). The AM can then internally decide what it
needs to do in case of e.g. HEAP_INSERT_SKIP_WAL. Zheap for example
currently doesn't implement that (because UNDO handling is
complicated), and this way it can just ignore the option, without
needing call-site code for that.
- A *lot* of cleanups
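
For the parallel scan item above, the idea is that any block-addressed
AM can reuse a single shared implementation of the block handout
instead of copying heap's logic. A hypothetical shape for those helpers
(the names and signatures here are illustrative):

	/* size/initialize the shared-memory state for a parallel scan */
	extern Size table_block_parallelscan_estimate(Relation rel);
	extern Size table_block_parallelscan_initialize(Relation rel,
									ParallelTableScanDesc pscan);

	/* hand the next block to a worker, InvalidBlockNumber when done */
	extern BlockNumber table_block_parallelscan_nextpage(Relation rel,
									ParallelBlockTableScanDesc pbscan);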
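
And for the bulk-insert item, the call-site pattern becomes
unconditional. A sketch, where next_tuple_into_slot() stands in for
whatever input routine the caller (e.g. COPY) uses, and rel, slot,
mycid, ti_options are assumed to be set up by the caller:

	bistate = GetBulkInsertState();
	while (next_tuple_into_slot(slot))	/* hypothetical input routine */
		table_insert(rel, slot, mycid, ti_options, bistate);
	FreeBulkInsertState(bistate);

	/*
	 * Always called; the AM internally decides whether e.g.
	 * HEAP_INSERT_SKIP_WAL requires any work, or can be ignored.
	 */
	table_finish_bulk_insert(rel, ti_options);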

Todo:
- merge psql / pg_dump support by Dmitry
- consider removing scan_update_snapshot
- consider removing table_gimmegimmeslot()
- add substantial docs for every callback
- consider revising the current table_lock_tuple() API, I'm not quite
convinced that's right
- reconsider heap_fetch() API changes, causes unnecessary pain
- polish the split out trigger and EPQ changes, so they can be merged
soon-ish

I plan to merge the first few commits pretty soon (as largely announced
in related threads).

While I saw an initial attempt at writing sgml docs for the table AM
API, I'm not convinced that's the best approach. I think it might make
more sense to have high-level docs in sgml, but then do all the
per-callback docs in tableam.h.
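
As an example of the kind of per-callback documentation that could live
in tableam.h (the callback and wording here are illustrative):

	/*
	 * Insert the tuple contained in `slot` into the relation, using
	 * `bistate` if provided. Implementations must update
	 * slot->tts_tid with the tuple's new location.
	 */
	void		(*tuple_insert) (Relation rel, TupleTableSlot *slot,
								 CommandId cid, int options,
								 BulkInsertState bistate);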

Greetings,

Andres Freund

Attachments:

v12-0001-WIP-Introduce-access-table.h-access-relation.h.patch.gz
v12-0002-Replace-heapam.h-includes-with-relation.h-table..patch.gz
v12-0003-Replace-uses-of-heap_open-et-al-with-table_open-.patch.gz
Rh���:l�� ����PG�n.��X�h�v^��7z�����V��NgZ�v���9�.�_~�]�%Z��MrH�U|%4�D�8��B�s�> ��8���0Hx>	1;1�<Ub0�Q)_�B{�����8:����UN���@f>rq��GE� �'������������9�x�
?��$������=<��a�����PB�b�	��3d�6����KN�b|(5x������������m�<q������U�� ���40G�I�	�C���9<6u)���D?��](Q6UT9:J���w*���\o��a�;:�/�Y��KSA���y�����f�^���2V��Ux���������S
7.�qWTb��D��?�':o��M�@E�os�$L.�l������q�����������;$sH�G���s@f�p��������e3���%�Q4O��x4���<�y�SE�"~r�~2(T��6����f����z���K�Q1kb���[�8E���ZA�47��������wF?0�;M�p�a�����^N����E�-H�*��/��y�0�1HA�|�h3w�������<�@�By���:�3���9�M2�6����Sz�)���S$J�JQ�\n��d���pK���7Z��1�(��iZ��REI��[�q��R.q�V{J�E�:%w�u��T|A�
���/)�@*�>�����Wx$�J1TQ�|vj2�E�0
�`��[�� ���	Z��iH]�`'�v��J��;��Lv��5|=;�c�������V�9�$�U����2S�X��;U��
;��(���"��"^��4J�t����X�=��J�����Le����81s$h�T�G"���S���Vj��x��u.s�Y����R�,�Y0AL�9�,o�]
��~]�I�����l�!� �mV�{*���1h��q�O��9���V������W�N� ]�W�DSQ0}`�Y�������gXF�����1���[P!�x��`:_[��B��A,m`2A�d���c��rH����f%�@.�w����{�f���q�������B���d8�.�>���[�g-�G(_*�6�q'����%��"���V��|�WZHM���������S?�`r��*]��0�}�1������;C#�����_��W�cD'�z�}��lp��N~��z2O�u�^_oh^$>�����1f ���y��y
�SaSaS�k�j�(_M�~�i�����O�1Q�x���:�ve
��evdp��U|�bJ���|/��-b%��C�#���nc�}gO�b��G��KM���tK�,����SGi�vS����%��uR���
��T���Y&B0���l�n�	3���@��f�������-��[�:	u��C�������/|�|L�[�X�:�?��/���f��ty��}�\��$,�_��z��cXm�����A4�u�����4<��
��gj�Z
E�[/+9���J��[���L���%H�%l�_�I{���,�v)��4��J��bg���������>e���  E�&w@H?����5J>��>��:M���|},q�qeW&��8����F[�7�"�_�"��]F��:A������=[����o+���6��zQ�Bh�do��mR"���n����	��R{����!�����h<(�xz��6���w	L���`h�<fMA�A�J�o��`�ns���S�
u7ZH��?i4&����V;
��O:�����n���n���k��[b��������R�s0��K�7��`��>X��F�D�;_(��F�+��L'V�B�G�a���*�g�����d�8��U��."���s.y������� �6���o��Po�N�u�;���P��$����4!	|y����)���Z:���N|�n�i�s�"'�:?�i7OynSS"�m���v(�2��$)zH�Q�=�k�b'�}o��_�K5�r���K���o�#�lx!�T��*E��E�$
�"
�kj43
� �$lt#]��������n��[�������[������F�N�~�9��]�]�N&�^����^�
���`���w��#�xO��g�i3���kB>_}p�
[����*i�!���R&[�x���w�6����:aU�;B\Hw�z!#q~
��?>�\��4��"#:����)Y%�E:(��.�o�Q�%���5���xV�n.4Z��L����-�����W��.Pj�?��i��\j�L�'�����vw�w�n���T����v��~�Q��@���c6De��j?]�g���{��H�	��d�:�dy�Y/���B�����Y9�:3{Z���p.W�f��MV9��!��8X�x�O�i0y.�&D�J5.'�s���sf�A�aE������u1T4�_��Op���`�!1K�@ua����d��P�G�1~����1i�����������78���DlFD�	aPZ���t��|�B����z�1?d���
_��T����y�n�k����zH��^>p%$E�+:�������d��'%KJe�|��Z��iL@xl����������~M\�S�IN�R Nz����Un�Q�w�Jn3����;]q:�Dm�����?-���m�g��@z*/(��r�1q���O�&��{�@��Y�!:_�7���#@��"dM���\K;����2��ar)��|xBpg1���f=���h�U�NS�����	Q�`_�-)��1]��c�3fB(��m=���g�b0�`
2.E�B���G�N��rMN�jY���$��2�T��H�fn�{�;�d�J*�U��[��?�T�w��u	������'9�[��jf�������y���w�"&j�F���p����l�y"=�����Pte��"��������'�7�4����F�}\#Nlb7�#�z1��uf�8�9X��PInV�����D��������K�v9�.��(^,#q��x|��'�8����z���[B	E�F�����AqP��a�t1M9xC�\\$2/VE9_��C�$�������*�>;�z��(�t�j
�.w�����v�{t)ET�,	�m�1 ��k���.XD>G��i�n��'�rq�Yk!y�@��I��}O�J�(�R����+TO� ����L��Y1�=B�_Kri9���(MZ�c���2��*_���f���6I+�m�V��K��#~��B�plD�5;�u����0N������k�b�	;�)Iiy��I8����
9k�/����5�&��U0s|9�:��,U�%e*2	�N�� ��zX�A�C8��fYn}�@��������t��\%c����%�i���������������=is�W�_�z���5T���tz��h�4���V~����!so����
��g�n
�-��+����PC'�x��#��g�e�L��q'��C�K�������e,����t�����S�E0#��Zn�����tS'������5%v�f}'��4��~n���M��mf9�m�~���|���7�&eL*6�!�����P�n���U���q����W�����A�����}-S���2�Y��@��QQ�^CG*,�D��	��iU@(kF4�t��jd�tT�rr� =%���}|_��&	����WM
��j�'����}BjFG��cV���-�����p�F��,���}"�2�a�ibaf��"�x	I���4�_Akq
s?��c��3��k�9��/2 #bxu���/��u��4�K���+�e�q1n-�>�,}5�W�<@��A�O�������K�_�����~�S������`5i�SH�|��z�\��%����
���z�E(�t=4m7�4z��o�fy�@�f#����5�\��M���=��TdH$���^���E{�<�����-�����V�cU��J	E2:�����&�1��Ag��D�q��Z2��#T����[q$M�������u����i��n_\�F���6E�����������?g�NZ��(^Sv��)�%����fd��e���:An���}IT��5�]��v�\?�T-g;\�9i �W6-���#�da�F�k�u�
H�����`����$2�O3dP%��yc�nM�Lh���1�V���������<l|a��o<�Z(��ZJ��L5e)�O'�v���z�j��4[���&�������ts+�4��a���8�73vc?`%z��?�Q�moVsq�����r'����g�	>���x������p>��Cx��ka��3�d3���*	�-_
w#v���p���������|�W�*�:U���C�@�0�����(b���TW����>e�L|�0��<
��G��AA�b2���:�2����@��]�(��Gg�d5\et��$����tT���2����v2]!���[������������xm���7���+a��*t�y���u�E�&a��]�V���t�Sv������Z��0\���D�0<����r���A�C��<�����i�m�������y��vE�_(���v��b�A�?�����XaX��R��]J�>-�CT�1G�rG�uA����^��ohl�0�V���?�Wz��8v��!�G�4s�0j�[8���%}���$�,�"��D3
WRi"�9����;���y��Xn�sq��e�������6�����tw1��.{�+��f�R)j<$*v���qY��M�_�p�����&\�l^LA
�����
��K|�P�#����b;��N~��nH\w�*�w��Q�+a�V��m��uU>���T�GV&�>+=��Wn��9����e������d���d��o�/��1k|���r.��{��8�����K<���������G��y����n7�P����1���f����nx��������P���dG����>���+Y�����nF�7���s�4E*�o�qAF�@TZ��]�B#$��aE�(����&�q�MX�rT��p�o�9�@O;eHw�����~�JQ<�V3���gk��MT��R�F_j~S�e�M�|���~gCS�;����q�;�Vk���;�Is\���=���
RCJd��@������4�����qpT���;C���/�p,H��N��0Z�A�!�&�<?��J,�Y,	��x��7,Y�N�eN�hsuq�!���ocg���W��B�u���%Uh�N�����8��<�r8`�-�h���P��9��o����E�o�D�����aS>�J�9;�
[�9l��l������rx�7����7�u1(���\�/����~��5���l]��k�@"�*;��B9Z����F����<,��O��M5Jdh��1xL�������z��G�SA,�>~<��rm�;�$����F^��=[�bQn����dk�7��|/-�t"l���U�7�1��9{n�|m����[��I�������"E�i��d�>L�Kd����W�����N�
��o|��Q�l�Q���(��(O%�i��/���Y��[�o-�]t�_�Zw<�����V����7�n���b�V���(��2oh� 8C�(����d��F�R#�������:�Syz�*(6j7��M���*s���B�`�8�^������l�����=S�>=d ,�J���,���!o���Z&l��>!�8�0��y�����1�8��VAN���h�U����Z�5�/�����+!����M��@��gRk�z��>��;qG:�TW60���)�Q"�K2T$��g��\�T�.�j�~��ExAg��/�}��8�r��x}��gl���4K�������k�����d�lJS&�)%%^	��l��1��6�z�9����e��}���������c+�SP�����j��;�T^T���>��P��q@��Vw�	�F0 ��3��B�������8Fu��|/���:�U��T���xN;��T*<E8]�[$�18��S�O�7+!�n�4�c�d����F���������d"9���N���o���t�J����p�!��,e�;3��!���XfH�,#��k	nSk�Jo�JF�����+i���4)��/��A�W��� xR�.)o�`��V����������t
Yzw+�t1nO^���J�l>[0M�;2����f�;�����	�^������?�V��fs�����[�L`wXx,����q�O��7�;�+�1o����#J;��Kw%���t4�W.=����|�����<�Kk�/��o0�����>cil���S��D#0��0�B�e�"���o\��Z���Wx����W
T�qR�9���~yU�fw��S�]�D���*����b���>���_���Iq�RL��������
7<&SZA�d��K��A��4��������a��\c/�|gP�(E�����R*�y2�0�SU"�����!���{��/+��F���i�M��v��j�W��{�zs\�����
wE�!���%Ms���2����R��)�$�
��?_��R�e�B�a��W�Z*�y�B}���eo73��]��Bj8�2<�r�d��������
�W�n�gE-�k�RAe���?�����I����V�n#
�c�����]���M�n�1n�'�R����p��(V�A��,�����;6���5���d�!�J)W�&��Qj��g���${1J�n�u��0�`�&Ja�2Uy7� WR���X��
Yc��i�}�m����o1kopc����,q�$2�B�|���R�>�2�����C�E�U5�	V_���a���	~���&���b�|/�f�	��;�j<�Yb��N�y��(
T��$�f	K���II�T.��.?`��8y�@�X�I�5�$;������u]��\�p0b��x�����������q���Gtp�� ��%E��Y���a��x��5�1���D#�*g�.�����9�8�{�HTFBF�V�>�����sfE�����i����6�����_�@�@3�A�R2nW&��Kle������E�AGz����6z?�n��)XtR�M���I4�^�o�z�^�����Z���;�4	��P��6���@�����Koe��K���AlTR5����O���R��c����
T�{H�C|�X>�`����c^��z��+#��){��^����o"p�s��X�7��������*��N�I1��([,�����S �3�Og�L��
	"^�'?
��%����)8f��v����,6!CJ-��xLH��)��<�6�����S�Z�w��F�]�6��d��W���P
(m��NT����<G!�1���������O��qt�a�G7���CE�;,�m��
`�U�������s�f��(�L��	��`�}��F@�C���M>`B;��l:����/�����������w�A��k�!������O�L�
�_v�e�EL�S4y��zQ���,��;)3]g%^-*O�
���d��|�npy>�<���������X��No�DL�EK�L�Cb8c�B!���GF�x�V�� k�.����"����4��#��^&x�5Y�s'��)X�|��q�8�.�B���L�r=����{&@��I�E_!��_^N������@��e,��Kt�D=������!3
�h��1���l=��`���p{�@�3���>�p���h��W���2�X�W����Y�;��	��#[S��l$��_j�����zj�������@�'�30�"�d���"6�y�fcf��]��<V���<�������	����F��J������2�h�Ej(7�V�I��mt��7�Vk����OZ�V)TvU�z�FpC�6 ��rl����77��Pe"�7�Fm���"�}R�/f3.|S��!o5����0�s��
�T1G����><6���:��I��T������o�l���O)g��oj�bQ���*�L�*���viX_f�7K)�|-n�T�f�3�e3��8���G��f�0��������Z}<��Ccw:��j�NFo�r�hG��TyS�RP&�����aqF������geo��NC#%^���0qc���w�$��'h���a�Tt?�P�!��!�2G){�����2��Q+��U\d�����$����I�\-t�$����W��<�O3v������S���1e���xz=8������k
 ��MXhg�0�����b���~������5�HqL��O���e�#0N���'O��df��"���1�*'�%X��p�����������Eyo|�A{��A��
����c�=�ou��R�}��@�:�cf%��1'p����������|K���5������Oy.*�F|���s����`�[J��	IZ<V�Xh*7�HH���4�<�N�"N���-�S��h+n��H(��4f
�<wZ��iW^
S3���X��4h���s7��,k
p~��Y[b�2\C��=���m���pW���!p����pnIe#����6_���=�9��w&�����i�b���-��\�]�����E ��T{?�2&���^.e�|j����Z�����L����	�c��S�)�w.Bu��+��[��q�q
v�r�;[�
��e]N�/
���O�o.�?���k�0���h�I��������7��oAWF��z��zj�z����V�/����}_����].�##5q���<>A����X5�P��������/� �S��:�K���n!�]����Q�^W\��o���'I��B�����$�
�Q2F��w'w��3,|:����Ay�k�8�G���2�w���45=�6����s�����d�vgi��1��Joi�p����N�t�������xH�q��KV�|�c�=�7��qAWI�k���E&�x����k;l���6���Z�`�)SE�D����z�<�a��p�Q1��;�iY�����/��!��;20&���>���4,��Q�
�B/���Qa�w��`���=���Jv/I#cYhFq�/�AW��6���>(�D��C�4]w��\e�
w�,��������p.�[��v�7�a��o��n�[�[%-�v���k�-	-�o��z�:yc�d5r���$��7;�>dk��K��MR�����\�.�\��H��{T�V���.�M_�U"8��mnV��S���F��8M<����f��s��]��
g{�	/gaF���&�����R��������Q�y�������8��<
���M��rv*��w�eN���}*���\sBJ��<*�4��i�A�,�;�������Q��l��	�b8{�w��y&~��P@�� �l���[0������?�9����/
x�M�D���Q��2�6Q�=4���2bu� ������*�����+�/F=����������}H�.Eu�����43��L#'%�`���s�+���9oJ�q*�c������j�}i���~�)L\��Z�N��:����'�ZG
�ZT����cJ#J7�Q����
tXd(�_�4K��F����������sGq�x��:�-H��B?st�Z���/��/��5���5�z��
��m�w �W2�e���Stf����>�������z�Zm��z�V�z^��Y�{*<�st����-������h8���62L�9	���P	�w:�7	Y��h�mpp�����LX�$�>����1�'�p��0��HX���\2dR
�\�dqs��]�%e[
-tarQ����%w.���?Q@(����5%>8�0���n���LQ�������>`�����OK��[��+����	@����xq���W�v������
������,wF�,*WN���I&^E�t��������'������uB�����F��*
�r%��B�����7Xb�KAA7������&�yLw��9b�g�Dy���<#��iB�����H&�R	2�%���cP:�O��L�N�@=����AX}t�43��p�"A�q����N�(F��;!�g�|�1S8!�����8j>��$��n�;��]�pO_F��r�O��C�#�L���U0�����L.�){�B?��I8��SR���,]�A8ffC�^	���
��S�$�������% #p��<���T�^;���(3��`�8t����6p~�>9B|_Lt�*�������:�*t����T��U�]�>�5<�S�v�c���w�n�:FW�j����:U�Z<]x�I
�1��I|f�(����\ U�d����dtM�`$���g2ZI�p|��nD�/S #�t@��e�@G���k�z����w�4�R��L�Ui �,�r]�~��������T|S���X��"��U�_=�y�X���v����kO�EM���z	5��o�rqy>W�4p�>}��0������|�1E�Qo��B*���"����������1�(����V��)	�?1F\��^�8�B�h��%;8����Y�U�@����c��������9���4�H>����� b���U�^�+���A�����`?7;x�JY'x�Y����_<���-+X����������{�<�*T����,<)H1�[/>��m�b��*'������X��K��a���g�ef7�c�&oks������7�w�oO����B�=��J�����U�"�ja���:�o.�oc(I#�l	F^���Q��;Qx��2�$�1�=���}C{��m�wp�W��>��fZtD�~���$����9����t+��/lk����Fg�$be�W��O�?h}�0W D2_$�����VZ�K���_�7iL��v�?�V{�����z��h��2t�d��p����j������f5_U�G�Y,�=6����M�D�\!�e�oJ\�*z*�,+R����s��re�v1s4�t�\b��]��N��@��?�������yAty�����Km����$�FSV��j��"2c����i���P���?����
gV� -c������xO��)���aX2X�2�R�;���2�7�I�
��Z�v�n�����
G��
�7B{��P������i��@�<�O
�B����N�,_�4��x�&I��e��+��b��TK�E�R�m�m\Z�x	��{�M~X/��O��+j��8~������7dRwf��J�m�.uE�0W+���g��K���8�ff��]����O�:��[J�G����++���#���R���b&|��.8��E+����m���
&���Y"�<�����n�/���������QK5����6j^��Mk�ju�k�k�q��5�h`��"L�!�y�8�5j�lS��^����\�S0����d����#������o�a�S�?g34.;�K���'�:;�%�cY�<�Br<[Gp�|Z��X���+R�����Q��L	�����e������\���*~M�ep}}vu.���7�_��8�\�����������sD�y|[�fn�)!���G:s|f����o�U�=8E��;�d�d������X"U��M�!f���' 7�0��n��oZ�[�!��'#X%�@��I�6����5o��F����n��\S]	6����n`��� ��t6Y��F�;S5���XB������#���0Z�^���J6�ac���*�@@~���mH�2l����j����-�Kd0��a45w�t[���)	F�,|��S�] x��4c��Wi�������h8x��"�|�����E�PV���3`�JHyRPo������G�[4�)\�;	��nA����#6�3*V�#��c�C!$�I	�l��6$��&6QH����Ukb57��O���O�F����w���`���hx����_U�4��VX�*�	v�av�V�a6?��];?BU������� ���E|��$�/c@k�������2Du;f[c9����!�J�4���g��e�f�m$��V��zOx�4#{_���A����V�w��#J-�����;9]�"����g�>�B�_Ex�1o������	]J���Zg�8.R�U���Z+�����F5o4�jK��� ��.B����N#�?�8�z�C��t=<���X�3�H�-����b7�U��o�k�Zo��V�}���x��t\B1�](&�y�)�����k�����S���o`����,����f�)�g�l�(��1������Q���3��s-���f���p������!����T@� �#d�;;�t�%��=���T�U�H���������e��gKf��P��LE�Qb�w8�T�#~i�
���V`�X�z��8\o[�>�����Bp�,`���	�_�%��S� Z=Q*�����g�}?��8TGX�����y�A�cP��7�a���$t�V+x�M���
a}����h���!��a�Qf'3D��m �	�������c�G�����1�(�g��Qv�t���}���X|e�5�-�]�B�d�;�H���\L
EU[b8E���r\gv|�|����Mm^����_���	nW�Rl�12���B�#H)��#m���o�����f)U��Ji �((�X"w�*r�K�LeJ�iN<B�w���>gxL1�wF��h?`<��F�>z��'�K�W��0�G�~���O7��@��d�H��}�\tkM�x�PwB��p�+��������^��V�peh�%V�8��V0�nl,����l���d����Z���0�������uq��$3�,����]���'���n�'\���<��P-IYD+l�������RH�&%4[�e�,C[�L�m���F/���k�������8��/9(���$�}A���NfbG��bA�������&���c'7����M�<{Ymx`�����L6(G�.����6;�g&p^�hC�8O�n�?�r$�m��y4��&���(n�������J���{�~�vDBG�	���fpa�L���e���EW[$RtQ��,�A��7������t\�������zX����cw�G�%��[�!
[K����J�r���Y��������m��M���Fu��o0=�&�(�U���u����|9���K�LB����������_)������o���d�V���t�2������S#�avp^��"��8��b/��['��R�e��k�T����V��O���B�}.p)a<(/��:�0S�1�>��O�@k4 ��t(�?�K���+���T�Y���D����#�����
����)=Uc%Ef��v��w����PK>|�R�@J�g��p���d��|]����}�d��b�;Q&����ijSMs�<������V��a�h�6��t@��4�o�.3Fwg��$��-<zK�LLj�/���������U|���A�U(m@���<��B�R6�����tU��XY��\v"%�s����pp-�
�5e��6�H��������d��p��n�6�W�a��g�@��������E�,��{B��'�	��6I�������;�YEB��>P����j�������EY��� sc�C��dI�����z����V}�9�O�
��-�ItX�I4%��&���!��\����IC�j<
~�G��@�r����:.�x�,Q�V���%E��bR4� \���wn���[���1�����,��0D*�2k&��E-���+��p�d�_�8/q?���=�	Qf�Q �?7�����[n{R����~o�7���^�������e#rj���k�q�g��4�PzvPU���lH�23��;?� D..�-"��:S�#����u�W���������P�>�S������ ���m�@�&����Rp.
i���$>�	�l��	"g�1�A���tlX��m�C���=gd���U�m��6��!��<B�P����e���R;��$4{yNJ����~���I�����8�I���z�P�S�R���a��V6�j��V��S��-�����:9���,8R�:���i�AHk�������j
`�}cSk�5��)����8����p9�p���.���0��T��7Grml��'Y��Q����(�H����ul�V2e����8���&���$�[����������/�6'mhlx�,}DX>��W�����i�3�����U��5��y��O��[`B`������Yy�F������%����?'�x9
�t��e�GFh����F�owf@�*����e.��H
�c���=JwdOd��� M�xP6�����[ hO��5%w ����/���4�8.�o����KK7Z��/$�s�d�9gW��^}"C��0�����c��$�{6.�q�'��r;�02�:��r���
,�����b4�A,C��T��v:g�KV�x�h��ct��Z�9��%o�&*Uo���Z�zo�[�~��u��;����F�wK������nE��-��dUQX1	����H�����aX=�>
��#^B�|��#X��b�A����
Q���b�z�hq��U(W�y���DFBn^�FF���u���
�3/.o�CG`�G���'���H&�P����|��}�^y��"��:~1~XL*�\w(��^�"��#�r����r�n9��#�-?`ul^�X~�_f^�F�7����tZ�v<_����m�1[�����W�5)Hm\$��m�D����@��,��l�K�����r���B��/�0���V��,u��e��Q7�J����d�^�'�PG����-�n;�QW w`/������+��r���U���B���U���0GXH��IFIJ���MJy�b_�^�������k�K�U��C����T*�ov�Z!��	T��w#��@�?�B���5�IH���N�L��,|_�TKI�{�����]b�����<3�`����l�!�A��)���?����"&2��pH�������4�5��F>HN�[b��'��7�i�
	{�XR�p���A�N�1�<����A�R��}�h���,!<!�^P��%zcr.a|�]�G�i����:�������^�a$��2#�jH����R7���G[&z�=d��y7��3��y��EI�Lh�����5���)�~�%!�&���;�W�,q}�	RU�����Q"�H��j��B�����
�93������P�P��W�����`ED���I�R�q��`��o��H��W����.[��*/2��9Z[��D����<��}tgb�%��0`�.�0�r�X�	gQ&���7�,�?a$���6r����od�W����t�d�r1��n!�B��__�SbFygqqPQ�H���(+,	Q�Q�6��7������44.J]S(��G?g�Z���Y���)d��7�v������Q��V���>U����6�b&|*a#"uhw�b��^@���OZ��0�P��JDL�\iY�I��_��C�����J�9LB+�:7�C�0���U��N4:���������hpy.���z0B|��������w�@�������
��#D	�iy:]�b���w�O�uu��`$%��T{����oA;\x�+�����qPG}�r�3O9��<�1�b�����p��CZ�%7�2���9C|0�Ds=cPc<c4#�����g$�4�����r�a6(ob*(oO�7�����0s_Q%�+�|��xo;���9�����c���m�X�mX-7wK��!���<��N���B�o��A�t��>�P�K��m���$
����A6�@	#���,������������6G��S�q��M	Z���`�	�BU@��"G�Z2�^O���@�,M"G�"��N
O%k�g�2��-�Z���JsZ���)��D:��[�����UY�7YI����������{�c�����PJ�:�P��1����Vi�����9=���C��
���tv�k�KNy��\�.C��1��K�Qk�Z�W)���y���C����Y"�(�l8����@��u�/���0��,��x�����~���������4���%+���4j�.�F���l��(p�"'p/��,e�D����� a���a�y���C�\]�X���F��8cI��)�no���^\�N�1mu�}��Z�U�����Bq8uU*�����h��������P�6�i��?"<��bU/ !�8�y�8#��M�~�\v�&��BE���I�h�S��C?���l�Kq�[�F�(��-B�K4�Y��x���n�^����v�1���I�YMvX0����L�PM	
%v�g���#Jj��7���Q�������	c���{Jq��*��Kq�����;�VCj���'o7������,���B�t7�0��w$�^��Kq,�����V�5N�� D8���	���~4�'��������G��������2%zc�t~;c��Q�}����(adCA���W����_�t=���8�e��~\���_�sJ�@���)t��O�oC�'%Q�0�h����w��U+�7v���|����+�~`p���8B�!��{��T�|�|A�&o�VT'���rC���MB<l�v ��#��L���4�G��v��9�I����jJh���"�z+����b
�
\>����V2�N��c�����48�	��-�w�G�
A������FQ���W8��7��Q�pmO�}����@�wXW���{���
T��e&�p��)�|
��>�. �4��Q��(���x��R3�3kJ��_���!��%O:�:�gV]4���V`����������������������qb�(T��R����a�����B�`�#U���h	\
C<���E�r*.�T��)Oqtd�������Cn&�V�����������X��a��*�t(M���������������6���Ha+�^�^��<�.���o4H�BQ��)��D�]:����r��f�^c�n�Y�����-O^n9��|�q�}��h���Qo2��{�_�j�4<D���SE���9[��-���j)�I�e��3�n�&�N��t�f�����f�s�����������7F3�$Z[-mjUmR�VYm5�������������(<��[�DJ���4\��������n{�)����9��������e���	%G��|���mD�V�0H����$���(K��;�9n�F�@zzY/M������������e��������`��aW�Rp?QaX�����K�b4P�T#�tn�T�d�<���2�|�U����]���S������2�
~��&�|����&�	YJ���*��������d�e���y��SOPE���Xw���������e���w�l�X���L
a��l����FUY�.�BjX�T�����>�-�����������*2�eV;������Cuj
$�����0��F��]|e
^
	B6�����/3�w���G���r�		������n>�L~�K���#_�$N����]�WZSJ�+1�
���=��������.�+�Z%h�������@(@K�%���l�XU8�"�a�������w�����������N���T�&���KJ��i!��6�4�PY�;�����
��&	U������]/����S	br'��>�&���=�oZ�&��jQ��U��(]��:�I�^��?������E���N|:�:Qad�9�Z<�.��Xq`�0�>V����X#����hc� a]����B^R�I|���E_I�<��7����f�a�;��$
Q��),[�����2�cW.�L�T|�E��������J/�@0G!lR�UE����������"��&r�X���G��OlB��??��j�#.�$�G��%�?��d-��1�����b5�}KK�]98����7)��#�N;�&�F~;��w[
���D�'N���*)�{��p�fa��������r9{P��5�z��Ce���x�xw�i�?b�ze�a!W)/d@h�w��~����+���f�J��u�n�l�>����O�l`S��X@d!���J�xg*�;/������vC���D�l���i��nM��V�9�����S!�����c7$N�q���_Q30�����6�N2T�����{������l8��no�������WW�����{�����w�B��T��Yk��_:�IE�������x���wq� ��gd�E!� ��f�~R~���V�p��������m�D^dJ������A/�3g���S��*')Z-��ne������B�E��-+X6=��q�����+c�l���68����.{�����6�8���A� iYV����I��N���h;��+���*_���A�Y�t��/�%|W�sI��a;��'���g�;�fA]��<K*)/�-|������O�����8BD�:����td>I��Hii���=��h�sWa�f'���4+���
�e[:e���*9�G������C�-�_1��|�ZfV�Vi����'\�%�$#-��6zQ�/��{����U2�+�C���(=f���W�A��+�Ex���n98R`�O0$����[u�����%qq���!��8*�a����
����3������!.���|0:������p0�z��gC&�BAXs���NgB�R^���J��B�X�����~
���?#0(�H��D�	�m�L��w��X=���^c��Z-W�����N�L��hd~�	s��4P�4��<'��d|�W��T��;Q7�h�9��I �&�$za���h�����.�z���Z�<�{���9&����/���uQ�WR���x�~|t��?��Q��f���cN��-/{��gn?��o�q������ND\_�������)�D�2����|z�ftq������4_w�&��?A+p?U����S�����������ST��t���4������6���W)�M�$�(R�
�����d��f��%����j59�i�	����t���3Nj�^�^��o�����u�N�C�e���[�}Q>�.)3�[=B�k����}��W&�/���4���o��J��B�amI�t��
a��V���������p�3-�)3���O����8�I�7�1����2't���������T�z���
.��g]^	�}�f'�g	��o�Ee��z����
�V%��y�^�_�O�Uo��;���u�e�
���������������M�iJ|q%��Q�^W���	h=1R�����DC�[G� �3�w���������,����DE�qI���!�5,�;P�1^�h����[@T�P��Y�1�i��we�d~SST\`y3��ta*�������h��,#�1��0��g����UB3���������������@��fr��|���lV(�Fe��x�W�%�an�TA��\�
�an���P��:`ru�z����7��p�-lM���f���w�F��=+�����Ii+������@��z	c���6f�=���Uv������`��f�HI�=I�$����(T���I��������9��*3����02�C���5��I&�+��~K����8C���$�,���(U��<��=p�!{�coA�v��\�=&A��c�YF��U�p�1����{��Bq[����[��~������������
�0���(����e��b/(�E�<d�r�hS����!MW�w�[L-ST���b��Y���z����gp}��8���#>s�UEL�E���2+�<��x(qk��6�%=1a���T�U���!�j����u�Q �5{��J���w����]������(c<���c��R2_�W0o�������{b5����%f�D9����8?�n�����k��)_�p~�R@��e�/�o�����a�.b���$�N��[~�Q;P�M#w����v��.{?�6h8���Fex�j��w�7�����%
�)$[����Q�_��ZbU���v�������l��4s�%�9'��bW�3q�( ������X��
���[%�&l����}����_��n{����nc\�$ivV`�4�a�5F�$^��?
>!���X�m���;�P�����`�&��Ms�,�V
|.&�(��F��I�gQ����2(_d$��2�de�rZ�@��������w���B���TR����sHz������&�BA�}�2�9+Y%%O��(h��N�����z4WS��OW�(�W~,~��;v����K6����K.1�|$~f�C��U�����H���2Jg��/�h���ZO���1vX���/�*��������+r��\������p
�W����$�7���+��B��hr��$������Ip�}���9w8������cx#HP�]���J���=��b1~�(Wc�K�E1M*Fw�IE���
$���R�W����x1��+��9��B-��%��*M��g0���}��+�p�]Huy��8'\G����:������k������`�!��H0��t�h�IcT�i�q����qaK�����n�)�R�R:�G:�����������2^���F|�c��`x3�~6d������� Fr%�T�9? �}�����l3~��t_yN��[;���J�?Jh��W�|���@=N)}����9�n o[����j�f+���m���:����������f���Tu,�&��f*d���cYB��sC�����������}���w��V�-Z'��,�?��Z8�9�YW���a�������iX���.0��f0A���)���9��
��O:qT_Uo�R�A'Z�c�0�1H�F�������%0�������_��c��'�vI�Pa�a5*Ib�V�$~PG�LLMSa�S7�zw�;�2>���������j��M*���������e�^�Q����0�a�A�'p���-�`�F�v���f=_d�q���
��OlT���a��}��{�o�WN���	c��yt_��1u�����TG:��]:J��HFA�H<z��*����G�c6{��@�C��+4BP��>i����8����R�c��c�H��������#���w6���w����P:R�����.T�}&x{���+X6r��B��i��@���`c��E,�H)���"#4�?8�c;��W|�P���b����1j*�2�)>����5�c��~����O�?�����
<?r�3��z�� ��G�Ps�G0������A]Y�n����|�>K��X�����#-5��q:���X�O�?�h��_��R�����	�7G��c:��G����x-��|=8'�#�����&S��w���.�{�=tpi$�ubbD�e�8w�w�w���o�2�1�r����b��n!M_��s{���7��i�s���_'���/���������I���Z��+^hM�/�f�gf�s�E^��@��p�v��Ax!�G���!����_�~��C��7��_$A����O6v_nc�;v#�H��6�{�_T�#3��x�j���G�cB�!�2���#���
���F5X�'����?�**��-"n��a��Y�U���u��7k�j�mu�~k�l�);��*�`��/(�d����z��`���]���!NJb��HN� )4`}���:��,����x��a�����X��0]���jSW�@�u��JS�6�'�R�J�����	q:��e�H;����t��
�3�Q
�5:>�Z��LkH$��Od{��e�������K5��i�65��5��(���H�LBz�����|�����x��q�wL�����n��(�(����^��On$'!�<��@C5�G���r��/��31X�W��U�RD���������cVkcd6�@@�x�nZ��jmY?�d��g[,(�J0����������/�����'�H,v`����CU9��<�z���ln6����M�|�!h�b������ga�r���jp�7�s.~�Hg���-�F5�?������l��qT����l$F��X9�!�b���N��B<�`7h<��Y ���{=2;t��(��{�������BJ1�nj�b�&��n�ER/I�j���{-�{�Q��1
��'v��#&:��!@ 8#!2��`���691���iS�n����E��,�c������?���VM�7����.���E�X|IkJf��L���fY-tx������zM�����V'8{�2���-�t�dAk�?_�?@���_�k�p�^�Eq����q�"X��;U�)���z��P6�h��:��/��0�j$��b~����QF�����W?�8Q�������l�&G��jN�jg�3�����!?�8Y�4ox�=�
���*����Gz{�"��^_�c��=�9���0�H��g�`7h(o�y; Gy�RI���-��/@��$�%�1!��>���Ng�:8t�n� ����6u��V|d�������n�o�;�j���M��I��)sj��+8��
�x��Em+3�j��#%��/R8��rj�H��j���l�4^"� ��M��f�'�&\�~s�������u(�lPaJ'=\���������4��G#p1~A�4p��7A��L�����U���#M����W�B��|�����R������[���6�Z����\�4�J2�GY�I��������h���vC�/_3��UM2�^m�<�,Auw��B�Z["�����H>j{���L��s'w�����X[�����+?�!��Jw�M"��U���[2��[��|���#ys�ns�T#5&�luub��(�s6�,U�EIW�V,�E�,��C������)���A�����z�������g����,��n�'������!�����{3�G�a��
�E�����w�J�{��g������i�����9�M&g�<9����l�n�oz:�Z@�"��O����pGtq8����%�V�8�F�����0KXur�����WA!��o�v��B <�.y�d�A�����$�� �:�	�i�'wv��/�e��!9��	���=$���0}-a.R�n�F\���omC����Jy������c���H�f��L�c��V��~�o���Z�u��Bk��2��$wjhl~�����!���/��j:jy�����S���Q���S�]dy��������dM�OnY���e+�_�4�?�%���&���	J�HB�,	T��<?�q�;q��yf)@s(b���`�B���l7����>q��
g��#f���E�[G������ �Z�����WX��>4���$��������,8��,�<��L�;�^�����_��j�����8�w���R�T &U�wS�y�����K���taY��!I�l�x��w����,����(�D���Je����������?`[H��+g�e-�w���$gE2f/�2��K8�:-YC�a����sk�v�&���w}��g2p��3Y9+��G�����~D�K��4;�|N%���J�3h�D�����	P��lIu1=/��6f�3�	QSU�;!��)M�IF�f�[v��[F�<:C��)j����=�:41OZ�$���$��
�82��y������N���~p>��|3��!�!���oc�_��'nV*����U��^IM�����8�$�Oq{[3��>�n��(���
���X��tq�B�N�`DcP7A���8�m�D�������G���JJ��xln��Dez������\x����Tr�t�p������?l7���^/��$�q�H�^��#,�<���W`�����hT.��~<�������<���:�.�8�!;�	�F��z��V���WH��}�a��(ss���Q��h�a��1 ���	7a������(L��M�9��*A�B�*5?��XO���z�]1V���{��s
�]���e�z��C=4�1~�zj0��<�����h#����%�=��yD��YB`�8�MD�$\LE/+�df?<�z�k�i�w<���d�dY.$!+6���O~z��YY�5C���Y0�a�oq;
?��<n)�������=e�����T�S��K�4;�g*��K���������
���R��py����������:�s
A��'h"9v�r���"�<�k�(��!�������nLp���.�����Q���A\����P��V9���r�n�m{�����j����]�Ss�e�r��"��n�&��i��B"�XhfK�k����W�g��X{������w������dF�M2Td�I�� ����|B��4���V�@�Z��iV(����#��
�E�.��c��������:��E|�#9����\�kA��`�*�����;�$#�[i�[S�Br&�8Z�+ny��S�����R��=0��|�z��!�$:T��|�8V������o'�F��}|)!��� ��	��~,9��'w����&Wr��-I3\����VT���\�2;�T�(�P�g��u�����N���k�I�[���2�a���=1�8+���4�m%�M*>Uu������Wg�������Oo�2}�����s��<V�{C����vN��q�~�o���q^��������7K��b)`�|C��G{�x�0�Z��Y�+T]Xn�Q2T?��m�����uv0
S�jR:~����:��D%�SSX9��]����f����f���O�0�$�+���Fy���L�w�M
��m��Cg������`��($�
s����>��������O��`��`:�T���.�����c�eW�C2KU���^t�����]��3k=V�B2c�]�u��o=�9����5����N	�9�[	{Y>gp��;�2ju�9/)^H�gn�3���w�WD�~��sz�i����V���>Mk?oZ�(�f�M�m��>��R�������,�wj9�������~�r?6�3!;:���������_FB3�P([��
od����G+��
�����|�x��C�����Z����M�4���`6�
�5"M�}�����9?!4;o��y-tz����P\�bM��
���N[
�
��y���t vg�Q������W��������BV�nj��v��(=��M�jei�zD����[Xs����2���{l;J(�DTE�0 ��MA������M�T�M$��
<lrL��&w�+@-��F���?SE��j���*�UIe��HyA]��3���U��Ke��*`�@��'-"�R�uB��
L{���
W��<ir�egr}�I/���%�n���l#����n�S���^���L�-�Yo��T��z+0�X��"@���Rt���2Dwe�!�C�1 ���@h�0gFF����F�C7�N�&����j�|�k�W��������3,�7$���b���
v;��P�S*�)�g/�w�����SN"IM{�>�t���C���DC$v�n��V���a_}�*�M*����f"�-����`_�U�!#��D��������s�:3�#��{���Ey)��p�������z�?��cV�����o��I}�����j�������N}i8��_s�������sOf����<63|"
+
?������oC����h���)�H�I�w������_�d��`7�cql�Qq����w��Yjgl7O��Y��0v�Y����?1��p����%��#b�Axv���W����+f3��XP3�6���X���\e%�����L����p=b�3���k�k�yy1����*-���)��}f�*�@�9��I~m�B�W&�|C{�����+��.Y�e���<�e��m'�:�DkQ
��Q��33�v�T��ZtHi5l$dg-<Y��M�px��������jX(;-2�����f#u'��O������Q�;�xku)����hPv����:� TX�X�w.�\��`$J�oi������D�b�C4�+s��e
	E�k�B�B��-��kz=�G�m�[��+UR����|�Z�VI�
��y|��*� ��9<�9��'�p0��L��P�#� T�=L��s�+1��6�&mH�t�����B,�{�!�<f�K�}���t�!�_�!l�������b�H~I���	����a��V{��i�S(�A4����g�no�+yg������m<�$����V_*���5Z�=�UT���m
�Gs^�Z���L/|�@e�dt#����?�����B
�"�_T,�f�����\���%���|{�|��5�-]��m��H�$���9(%�iNK��)`�]�5y�P��n;��_����U�3h��8�A���nIR�X��q�����K���/��
�M�em������O��i�
�]w�h<FgE*������,���,
X�P	a"+�����,��B�8O�:���������N�>�
qH����N���1ZOV#7��jt������SB����|;���<� �:��>�6����J��`�

>����JF����!hj��b�P���K"��-�M�g��O+�#�O���!�'A�d�4\�"��$��w��/p��`V�`�9�p�R_P$b`]CW��������u4�3w�+<S��U�~��S�(���d�Bo��`d����tUa\���2S�e�d�I��X���rs*^�%���W��&^����U�S�z������t�E"*��|�����Z��h\Ie&?a�u���t�����!�9y�������{sVF"������/o��/�����A���V������N���7�o��k�|�Z�:��W���#Pf�u���%�������P��4&{��r9{x��������<]�a}�
������@:�I��7�>l�"�]��(s���(���������CU\��
y1/�-��",�X���nO�t���9ae���:�0j��GS�6��T��9������	��X�_lXj�3,��$y����I�������'f��������@�X�������\�'^q)��O55�R�Z�B�]yZ�f~T1��)��N�t�NP���{��N�=����S�w�>{����c,Y���,\���}u��q�61�����j�4��/.�^�(�U�
&����Vi��h�:y�����_�N�����m6S��5���^��6�<��z��5����Q2q�
��[\Q*�^�c_w'������9��� g4���n��t�=���V��q�����~���Y���lF�	�B�E9$(�����#��ii���������Q�����a�
�s?
5bA�����w�*J���\����0y���H����_g���������;�$v�r:�qh%��|�U�ByN��'������<��e"2����O�#a���� �<���
�&������S�K,��.z�j����<ON����6_�Kt`�E�$���<l�8Dx1+k�+�������3�q)�~���C��g��8�,'C��R��c&���IqBC�i�������2�z�
f���U�1mH-�#+W�Z��t9	�PJ��H�[�%��Twt�E2��^�+�,�#R�H(���<�0�������P�S�O@|��w�"��U����S_�>�9�f����~eK�[��	��e�|��t^�Z���]c5�v*I�Y�������w�n�g/:��r�vm��������O�d����8��(���_��U��7�������
��.�y�kA3�O�b��`�-�s�[���B8��������j���3�~yv:��������M�x���g�����x�e��u�����%�k�A��s�S�2gr#�v��	����%�����_e�f�8iI5�4ux�i�eY�����j���z4�}�1&����6���YN��9�dv�k����i�c2�2@�f��lTF�]�u��O����J���?-�Q���Q0����F���������v0#8\�r(�(}~�`����O���O�v�������S>��{&��%QF~��o�-��o�1bR��G~�Hb@u�������<8�����'b��m9-���'��ju��#�*^���X��c�c
�w��������l0z�������N����FBPc�����OI�crlF�H)jj��v	��'�Y��yS�*�z�G���jc169 �6����Xk(�"K��V�Z��]���R'b/���-$�-���d{����u�<S���J,Op��N85wu���}mK=cAt�{8��7�&�������s_��q�1������O������?
#:Y��%gV�W��c��X�*��d���hL=_!�N�t��2jI���8���O-�L$���BCqR
����� 2pc����s�>2���\��	�A�"���Kc"#�&����D�1h7Ui��	����*����%o�0|�ZJ��:f���Q������M��g�u���~�]�'��!���������$�~t���O���g�����bnsl���u�BL\��p� =�'O��q�8��G�T�����b������:�9Jv�}H�_��f|5��!:%k�*6����38~�[�j��=��n(ms~-����^�E��������������%��Wd�1��Yx����s��e�����G����;�cF��]4�\j����v���Ht��������z��Ij,����J�=�6�c�5a���lj�Q��*3�7U������ ����RAb��Z���T�AAiPV���4Z��n����D��-Cq��*��]�fj;�
]�����H0�K�DTd����IO������>X��
F��8���@�a���$()�|$�k�����\��NetLN��~��L)�j��>E+�U���e�"IV�o��SA��|�$Q0T�����M��W����~��.Ai�����6S�m��f��,-�|�j���|�r�:>>>R�-�vm�f��$`�F��fO��uR����83'��>��n�n����0&�d;jlf��Xs'�)4���eo������^B+�D���$�]%$�/�g���}�\�\������@����)����wb�h�H�>S".�%<m�����j����d�9��JZT�"$���ef��[�!�a���3�}_����l�����@�b[�`��#��,�~5�`��#�n������I�49�A	�x��0p<�O1S����h������^����E��t}�Y�4�5��}g|����'�=r�>yV:Is�����J�H���8�*�:�$���F���8�.��i�p'��;�=��f�V���j���1�+����R�cH����7����6����&�9-B`�,)/1��|�{��'�
����B���C�]�o��6v��H�,�de�a��\.hcB�C'j��G�}FF����3��2����t��{��:#��F�����W(ez�!�S-���d.s��~������:��>��G��=��b1
y���B�R�r&��Mn�l+�q��I^{����+-��D��v�������:9%��N�:M�*U��o�O�QH�a~�,�+^���\5B0��BbX�3�{4�(��d�5&����P56������7��/F�h0�F#k���C_�uF��0�l���
���"�078��#��"
�F��{��}�dzz�8�O���h���a_�.���K�S����;H�\G7�x7���w	z����1��8�C��/������6WHl�%�tG$|�j�2G�b�U�/Z�b�" �8�Q`@����
����66�e4��P��Z�N�gT��B��G��;l�B�����*�+�.m�_'V�9>�<�e�����S���r�X�������
]i��]�U�������G����<����r���/��'T�'Y��?�4���$,��7�k�~If�3#��7�w%$>t�p��&c�`��8d4�	@ �.2��Da���EB�^�W�U LO=�,����JB���]j%k�*������������78��H�26e0� �t=O��k���Q��O���]M����w�3+CA�BE�����o�TtA��8��8��������P7_�\Z��C]I~8�t�����bR���,���������(�����:�vCW������M�?��2t�6&�r��'�i6������-�w�����R��P��(����j�!X���
cR�FE:Fsk0�<�o�X����2
�((�7i����	�$Rf���i����a�|��
���P��EM�����-Y�LU#;��>�������U�o(�TL�����
�-O���7D�{�'NV}��9�r)����L�V�(�jr+���h���{��P�s����i��Xi���G���{�8�p>���9����p
���� �\4�I�mv�����@;U��C����B}�H����/�[*L�h�1R�����m/����=�b���aD������+�b	���������"O��IU��zP����VY���%���B.����)3ZH}�N�_�7"�e.#�U��'��h������lV��`i����g����b������%Yn;
u�b�m4��`a+n\�����1���C']Jh�FZ�� V�e�!��L?c�d9��E?d��u�Y+L��:�e�����q(;:b��xR� ��{��\�sijk��<��OF�������7�#�p�������P�G;�lg�/�h��Xb/���S ���Na�8�!��JG�f�LV6Q�4X��>U:��A)�TP<�ij�Nv��'Om����.��� ��S��
�**	y-�TZ���"K�pm&\{O�������eEVj���XrI��D� �H�7�������
%S�a��a^�
�?��iP�=��>��F��������h1��� �kr��;�M�3))��������p���1'����:�|}��d;���W���qc��-|��m���wEJS%����K�x���_�w��6�<�G��:o���Tg:h��$G�����-������X���`�x%����k!mp-\
���4#T����6#_�Fb��-��Y��������0�{
�}����-*S�WS4��%h,^���5��KQ���r%���f%0~U�$ /��x"#�;-B���#������������&i�(\�����- �`���X��F�^�W��{���xa7�r�}=����m{�lwi{���H�$����V����^va������l������zS{A��1M��������JX�D�i�*���P���m]e�1��Kh�|�T�lM�cu2F��1O?\�l�:%�A�������X&�;���<D�������7�r��a�Uu���&�}mrG*+�I'
x�H��S���%�Y�{5o�&�[?e��k�$����c��:a{o�v��-��;84K��iZ� �{�>���@�/:�����]��}�bHG�K����N�n�
9�iN_�)#��_�C{���:�����Kv"�@�6#W���$���F	��=��>��V����
��$RfF�n�r�
���X&�����Z�e�2 ��p�����$���C�Ko�R���[�e0���qJE��E�r�[������YH^4=��Ne�GhK�PW���
V�
���9GG��t:�*?<�����5"����	��i�X|�_f�z�������8���Rq6�o�*L�N�T<�8���������o��.�!)�C'�=�A���A�?�x�:$!�H�
f�jId����#��,qO������cp�SB�t?�u�+�c�S�Z=%��R?����Ym2�����n�A2)�b��8<�G�	�i��,[I���3YKP�����De���V�^S���^�P�9��p���aV�J�X�V#����'���)�ZC��Q���������-A������C9����c�dU:@�����o?�6")���}Hj%�%~-�u�Pl�SC������������i�/J��Vf��{��0�(���fi��Y=dF�m��b�}?�;w*�0��*�%}L�d#�>�e����R
���~��n���D��\�]D�NY1z�A�8�_�\��<$�&����U���Z������TH��$X�&���v�{��V#���6B�	�Cy6����^���L-���:�d���
 ��t���'?�-�
���D2v��-,��^�����A��5J�'�V����=TiI(��5���;�oYZZn��CB�*�����8��u�����$�����1���G��4�[Lr�����z������T��c�`'��L���8�{b{�3"��0A=�C�|"~}��F-��x����{�#����xQ�"Uj��!$�Tu�`��>�B`�����Du��X �3����1���V�����G0��\��P�&4��6If��|���4.w/�x����-���S��V���#b�;D�vg�<H!�LJ^�rZr���C.s��oh�$s������d�c���Q���C3�*&�(���R5��I�Ar��	WY	UA������O��jL?1�r�%�����"��Ot�9C��SNn8�oQ$5�G~8$�{����J��)`K��E�q�G���,������;��w"����*�C1c!����W��.qI�S�����H�L:E90K�|�,�~
���?������FO�`jA��fb?����-�Y'�
�:q���J��FZ�R�C�4�pX����z�����pBI�8nx�P�d��m��;i�4X���n�/�K!�^0�2ug&A�&��0�I���]	���� ��t���cHL�"L���W���������+6ox��U
����C��-���"�N9b"��pH)d����WW�������W����-�_o�����}�iA��oAL�]�����Q��A7��G����������_��%7o���)���M��
�?
ig����&�u/#�c�����S���Q�P9�m0''t[���#?�����r��8*-+���Fv�AI��7I�M�$��a�w|`��b	CET���N-?�I�'i�M��F~[2�l�N����!y�2@��M�D��+"\;&I.���T��I�,��M�o���-��|��P>�	�k~/1�A��	T�Y��@�$�RB�H�h���Z����E�"�m��Y'�\������/������/�IdWe����+P�%�������@
a��a
�+�v��(���r�9��3Da��Ds_�$ny��F[BD44��}Zy��5��0^���B;	#��vP�m��^��8�ypi�Z�+g���I���p?%%;0�)/��?�F�?��8�����#���	�7A����'��MS�V�8�H�0�L1>�-n
�>a�7�:�����V�_��uL2s)�l��������._��g�;����zp%�Bd!i�@�r9��*�E����\�YX�nc���=/��^�c�;��7h�A�.E�K�%�+n�V�>�01�}�
f�x�7�Ue�B"�5���5���z��
���~� �"�P3q�wx�u2�j��0:s������k�=l��#�����the���`ES"��t�l	��iP\�P~P�:���J������/D_W\���	�1���b�3+�*iER\%C�HL�It�U���*I)����3��3T�ZG��Xz��@�����z�>M}x�i�
��~T�����i����-�ms����Uy��kt��
���4����i�I���dg�4�y��6���
r���Y�<PmKP����'�,U�bg�j���T��>�W��Q~����/kwv#N+�����N�W#�����e��><����������J�,lm��������$�?Z�#�:�C	sa'�8��/N�����
� H= ~/n��]��>�a(�T��p���3;}������;y<q�O�d5{8��0������5�m��LS����e����)��n{8#9��#�����\�q�>�M�?|/�7���3w��k���{����{�9%|�E���,2�fh\<%E��������^�Peq�ph�k_��t�f+J7����d1n��4��_�eX��d�����hVc��I���~���n�����t`�2�s���������b�)����-	]o�F�L2����W�?M���[���ks+1KYoV��rK(O�7;���~K|�j��P���z�����M��}'!},�9�IZ�b���Eo���n5z4h��=��0� 5����&n�m-���R{�n��"<&�����J19�V��[A����f��C�8jx~�O%<��M����#6JM$�`�1�I���b�,���Q?��u���c�6X�#���]?��j� p�"�o'���i��1�0�w#T�st����(K*�S�@����� �����}J���*�f��<�o-���N��u�<�n7Y#��<�z����
��?��Y"(���;u���������V���A�KGZ0q0f�_gt�����n}u��+�\�k����yG(����#�`'X4����l�����Vjt��Mw�[���%���80H8k
P��=���{7�0PA��OQO*��N
w�)�����@S�X�6L��M!WU��H�������m~>_V���4A��pO��P���M ;�h�j6���N���=����y)�/fx��ct���k��F��)�h1��e�+����o���:1j�^�dM���e�	y	��6a&��b*���V��.���9��^�7Y�Kp��f��>�����������!����@@�#H	��k�`NC�T,c-q�}�i��E~_&���b���6t��8%�;�5�Gf�3�j�Qr���s�����$+.k0�8���\h<Q�������q��4�uJc�JX�9T�x^�2�'�:��^7�`����(�����L?��D�6���~v����0����1B�e@	q��h'��*��h&!������8_�f��p��z��!��d��$X��n�������.���Hq��
�mT1��1����euaBM�Y��2�B��f��Ke/A������8_�B���[��Y��f
1[;������?��#�*���Ki0i�1Ef9-���p���WM�6��%�������#[�c�o�X{��Yl���+1E��7�������{r�:��8 �N���S;�t����W���=�-������"5���LD4Q#P�bn�e^u��GR��IR�@*U��X��1���3o0~ �P|U��X����9j+f�Tu)mZ��a����?x�Q.����W��|��D�g���b��d�����s��|^��&.�V^�����]^��I+��-��9Ob��3~c�w8�o��dH3"z5b?���d�t!k>|�v�7Z}I���H��y�V=�M����i��ff�����,���E~6�E�3|������Qw����I��r����>����	�F3O�����3O��<
Dd��||�^��[|
���������$�\h���ArZt9�NR�]��i��z)�DK���R��|):�`� ������0L�;���N�u�N���3A��J�\-*Oq�B`�8O����|�npy>�<��������}BH�m��	��/a��b7ogF/;x=)(uW�g��Sw������(g��7���S!v|� �'�[sB"\z�������}��T35�t�G�vzY}��BJ��+���f��:�@���'e�s
�3�~��8�P���K�T(7c�������b������K��J�b��`����X�/#�D�����tEj��Q���h�7b��#�B���g�����R�A��/��n��m�s�5�����
a|o?Z��4��O\�q����%u�����}������.O������AQp���$W[��OTIA��\���G�q*�����t���<��]d#�s=X
�����x� -)b���]C��]�]��y��.Z,�{k%QK�P!�)�����Qb����{�,�	�Ti1<�
�C��&�]l;R2��cG)}8��H3�l�o:4L��7MH�~����kV
�t��@
Br��2��(�������2�QC�����*]�8v^�7���gC���4�c���Q���|�����_5�����u�� �D����J�;������R��-��o���@������v'����v�R��r
 d/y����)�E������J�uA��|��$�Ov�d�f�F+�?(2��sZM�s�@w�?��Y�s��B���@[����]`Jr|����'������%/��Y�
�+�]Q��4�u2o���;��R��;?�������\�o��[���>�BrFD��#FO���b�7`�FE���#
�N�X���S��A/���g���)����m�b�7�#�VZ8�R����JW����!v�P�MW�Mw��)�dH5�}�;`cs#t�5P�w2���CzTz�n1��k|g���3�2^��,�,w�~������!M�S�J����	�)q�P3	�=���y�z/��=  �8�]�>��$`��/�\��ppLBB�U4�'�f���Y��:K���zt��_)4����A��j���?��������A�������wK�o5{�����_�V����%0#��r+��T�$��F�F�
5.�vB���9����z[�����`���������A���Hc:��.���W&�*"�d��l%�%�?��%��H�XiL����Q$�D"X������8�z��r���'+�@�'�����b�%QR*���	���.��}2�)�@������Rm���'��E�xa,�B��<.7o���7�k*P����`�����z�~�B��V�m��qAP�X�T��''���I��;	 X0Xf��|�!l����J��1�'Io�6�p��M2�25�}���:��c���7�&>plM��g�O87���y����c�`�T��V�6<	�U����y���$�&���V�����������S����Y������!����?������!�%*����j�j�xs3����i*B�����9)�L�cI����=�n�|��@`�(\������#n!/a��N�p�=R����u��[�"o�NZ0�8����U�M�l#�)-�U�t�)�I'yRQ���uQ�^�n����jQ�\��1D�3W��.C�:�r�)'���8S[�=U�����-G��0-���7b�|�R�.E�����S���D9PbS!Tt�'��-��0��U��F�V8�#.�SF���S5,�� sw������Y��V�X�*����OR������1 5���Z����'���mqn�o�2z��k�\,1�c���	� U���,QN��h�����,tl��U�KH��$��z1��%�d�G�r/&bB5����(�������|�O�QB�%�]�Rp��G��5�� Vk���'`D�3��Ka��}Jld�cw"n����md��p�`����+�U`��"2[��u����i���m�����Z6�����v�v�I��!6�u�������J�������O����d���(;��
�^9q4�.G7��CdF8Y7������Q�r4A�c���v�'��}���"�� C)_GQ��&��I>��-�\����������(=6��e�c��x���J���n����S�����������;�P�-��00.���Y��f��6D�>�R�����&	#d���� o�D������?d�y��Cz�>�g��M&���)�n�=�F�������e���Ba��t����Y+�Ul����7�6���;���;�
5���w�����#7��(���1�F�10�!�`�.���Lk���b����
DX��G�����H���>��o1n~K�Q)���p�A|��Dfx�L�l�w/^_P�X��i�����l{�����G���Ata��:'m$	~8I�{yP@��q�@[�������S���[�{�����������*e�G
�X�8��m����ou�����v�����z�iw�O&���;���2)��'�}�]���D5��tQ�c\m��QVZF��fKb%�Apo��������>������Z4�#��C�B��=�! :�Kl8�8$m��e7�������o�p�X�f��������������M��j��;�y	WAj|%}.�������u{i�����l��VM�T���A���+���������}sq����������3c������r�F�:��_�,���t����(^N��DTi�5�.�Q�������%8��D�_|����Q�"Db�V�$hy
]]0,�����Q����,�ED�l�P�m$]�#����Z�M���m)�|+e�zp����,��������`ty5�����z88?��N���L����7�f6
gY2�/�n�i�1 D6�m�)��]q�������\��+{D�xy�Q�l'1�2UI�L��#��Ed,r����J�kd��G�kmYL��/��@g�B���SA�U]X5[�zK�1�}������! ���m{;a�{N�uG��c����
e&����������l�@�n����@��g�V#g����	JWbzgq�t��d�;����
C��TH�m��R��eo���Zu:1�� S#FCD������)�x{����p���C��?$^4�R�,g��$�F4&AG�N�C������K�����PG���TF�rd������b,���e����.8��|
ox�t������V[�Z���6�~�S��������'}A��s!�����>c�#�����F�����&tbe]����
�������RlK���;�d���M��W(�O�>V``�i�|��s���46��
�X�m�H��a��t��_h\HUw�S�����.�O����<�:djUC�,;H:
���_����r�0���C���2��4�y"dD����Rg�b��H�7�h�����R���W�&�n���9}(���]��d�6v�.���D����	
�
m
h�
�����b��v
^�wn�A�e�&���������`uG�r�Y�-��MF,���B��o���.�S�4�_��N��	)R��o�����Q1t�	��H��g����~�����A&�-��)Vw#q����cm��2��j���0�.|�]�����P����~sA}���-xg�9�V"�+oD�S!�@��,9^C��x4�J�dfD&Gi�2�:J{QmM����L �W����V�"�4����0Qg��Z�\B������p*Th�Z���#N�����H(�)��E����v&�r1��a����e������d�$�����P�6
���G����YU
��Gq^U��A�1�f�	v���1�/�3��0�PA:�Ihm�G������
J����EI����[��`4�"��4b��aI�qF��{���
�b��9G��zFt����$P��������ymE��`q�������b���+:�zT��_u�B��
_N�I�p8��W!��*�89rCV��8d�V��e�'$= ��C�� ��(��Cza{���U���z��$��P��y��������M90��^e���3U9�8g]�JdE^���6+$�0J�U���t��FCn�E�$�"D	q+y����,Nd$B�c3:�	��[�0O�lPg���;
�1����u����&�v���y�Ph "LT�����KH���tBlw����/%�_qf�x2
���(�m	r�����M6��fj��v���(G��//^!�W*���j8�*(�}��
F�>4�f�;��R_��Gr-��{�����QL�&'4	�K��3�?�����(�`����W���Y0��}]C��(�"j�������2��zEq���xD�2�re� #�I�J�nE)�W��'�^��j'"�g�����]���= ���_����qY��c1��bzg�1��r���LB5��3�
7��y2��3�Jz����i ��8c���?���?���&�/��GH#$�GG��u�c���<��J���DAOd;x��x��,�S�_Q��!�S�4m[ue�����U��������hR��?m�u2���=5�����L�l��Hl�nG����l��b���H��QG6E*���a�%������n��
�W���B��{��nV�mTl����4,am ���Q4��������3�p�^[��������5�u�*D�t��C0i;����
N��M>�&Bd�[�
�����hcR�����Q����2���rFner2��%8]0(�r�T�c'����P��~��B*�� �	wi4U��t�hD������Y��u��fU���r�5���z|�m2�(���	,�%�5}�Uz������5��x������ ����=F��d��.x�����n�;mt[�I�����V�����e��VwE�x�!��dM��1���x���b��W�t,Q<�XU_�%%������Wg�������Oo��P1���y[���)�.��:1������]w�2@�o��D����7�h7p�u���Ww��K{!��h�Q�������i���,���(a�	�T�-j���k��!��[�U��%����8[�Dm6����[+� �uJt~�'L��D��>����������1�����:tpz}3��{�1��	27��<�<�O��1�Em�B��0���3�q��wp�G�����e�A�)`1�����"��v#O���qf[�`��/i��Q|�����Ew`�J��r�a�!F`�N	"�P kK���@����S�;�Z,�v���u���Pk����������l������a�����������LK+C�/%����s��=l2h���M�"�V�W0Z����Y!�d��n9�_x�;M�t��l�5��~K�B96�v�w��u��!�� ��(�Q�f�j%���+4��?���|��vWu��]%�����j/���w��]�5&�x��n�/�}Ga�@����d���4/�G�����
Z0O�0gz�����NK���C�����a�������woN��l��CU����7]J��+������3��)��Z��K�S��]1�K
���g�p��H��?���8�X���W���.%�:�:X���u��>�N��i�B����^�K���lR$�bl�V>�����8���z2�E��R#��?��������� ���~�����b1 ��G�i�1_'2@�}��A���9����4�$2}�9Z�
y�~DA�	���.��<�-����}
��O�-"Q��_�?��3�UAw6��m��{w[�y�����������Y���P��H�Jm�=C��������
p�$��r��9+vgQ�n�Yl��.�\����l��Y��l�,�����5L��q����'��\Y_&���d���#�e�Q���������`������k��)�L�����u���\���gDA'�P����@G�<H���p�X��5\��h�"�
."���C�'�����X�q��@����<,�
��B	M�fm\k���j��v;�^�^�������'����h���^������^��QE���!e�������Lxd���%���w)g�[�R_���Ab.�u�z���*���;�"�A�c�#gC&w2,��b��
�S������`*s<�p�����f�q�B(��Iv������"U/}�-e��u���}^u���f�����	*���!��y�(���=L����$��@W��#��b�J����v���C!��%W��+e��/X5�rK��$Pd��H���(1R�z��*����G���2�z�R����x����e��\�f65��~yt�������)���;]�"6�'+�`�G�Qb0��9��5A�!&�<,E�nQc����UA��9/�Tpa�)XE{6Z5�
]w�����E��B���f�.��pU����DhB��1���p� �!iLGSc�*Wx3����2(6R-t�/.��
�v���eI���y�y,;���6Ob;=����C����a����3���m���0�;��?n�����f8Aj�����K��oRiT\<U��*M�
\���������?�b���#m3��P��qieg%��m�y�}�'�6�'x7��F�����q��f�e~C����A�h�����`��R�*wm����M1�<�J)�3�|��t��^���G��;�����v���j��������A��L�(�����m�@[�!y�z�9��ya���8{�z���~�}�D��~�=N��5��x�z 2��3�����v�����}���7���]��+6�ja|���I��8�L��'��g�?��W����aV��dA����4TY�)j�!�lj��*Y�WaD�>�8�Xr�g�+�����Jz\=�\������@�C>���x�*�p��Rz=�,�������J��W����I�����.��f�P�~=�j���X-��V��f������
� �����H��F�[�o�|�������X�n��V
�|�_J�n�e�i[f���OK���
����Bc���o�Nf��0
�Z�5������F��-���
�a��y(���s#�$I�����@���N�r��7��;���(�K�2E��|l6	8���"�N�����U����h�C�]�t@����<@x�
�N��#�a��	-�_����l����#���c����{�+��G�R�V=�
��C2�P�l�Iz�5E	�VD0�J��07��_	�v[��RH�^g���f8;��zb����b�6�V���
�/m���LQ�#k#q��}��?�L^��	���m���J�,m0��+u�
���v&�?��5e1���u�����q��gbtP�����:�qq���1c7~�*X��^�qV�����~��>E*�q{8	� 7��D�p��`���^�oW���:{-�<�����z��27�x��.r����f��5k�&]��{5�Zm�&�����Z��f�OA�� ����	�y
�bB0���gK<��?�p���ALhX�Z��N�s:�����b$����Z�^s���$t^%�{%%���������+�T���OD��~<�*�#G����E����*��d�W����R��g��%S/�t��"Mn;�J�*��wN+���xC�~N��<����T������Jqb= 3��}n�������?����������5}�R~���`������G~H��
�2�����0�4<v���B���<����d�8����������7I���U8�A|����ku��jP~^7���HJg���6W�V}�"�4�j���,��aIv�� ����X��B�(:�L���h���$�*s��v���Y��^9}?�����@l~����'LU������nm/�H8�H� L��5��'t_����
��>�+����Q��v	���#�^�[�lB�Vg���H7��������}'�'P����}�Ls��3�i������-a�EJ#�+���>
�C�Pu2A�?r��#`.�����\W:��9�:�WK��z\�'�=^\���Z@Z�Y���G��"rA�,�E\4!g��*u��qf�*?Eo���^]�6G�0p�8�g~)���#'�j���-�&�������������l��PY/�b��l[��"�+�����.�t���6�;�+�������d���[O��l?��������z�j�w}��O��i���D�T`AQm0���qt�����F;T |U"��k*�M`���20�6U��d���>�����['#�s��hBe.*���S;�z�*�KN���!� �����J0?rSOACC�//������I<4�b��
b��+w��G|b��Do�xV��e����umo7[5��������D�=b/��C*6l��(C��,�4�wp���wm�J����`U��%%�l��B_	*X`�,��A�T��77t�T�s��4�
��������i����d�iRl�5����Q"f0�
�`��s�o��q�*�)��E�����QL���?[:���fVA�toY�`���d��WSi��YM���C�DZrdn6��X���6B?�O�E����,��_�D����>��6'{�`�qx+�N���`��N��_x/or��f����Nc��v=[sa%JC�8��
���~+0�LQ�Y�h��R����n��V����-c!������$�����O_����7�>����r���0M�	�i^�%���m+�;�LA7���)��=V�]��GzsH�.������5�P���q��:��V����h��rx���P�bwB��P��
��U�C,�q���^	
���}Tc�<�l\9�gqD'��[0��!�dH'T�����'d�I����%e��7�Pz��JT��&������Q�r�
�{4��5Q������4�}��jX� ^O�>���G���P.��E�Y��8c���r����|���3�;�i��s'-�����jV_�AV+��X$�*DIa���!�Q�Oe�N=��zW�6+�G_(����������pX"�k�X\�GCg5T��h��VbN�p�(��Q<�{�)������w�����(��<����&&�`�%��O�X�=�e6&T�����dA�+����/eEA�I��3��
o����b>�!
F�����sS��ttE����WT���}xR%w�]���������X�������9�a'�>t������C�P���?��n���?��Y�9�
��7�a����*Z�a�(�b�>`Z���w��I��0��H�5n���n�Zm����Vow�e%����a6�3BZ�@��P���*o��>b���W�q�* )`��``��Sn@'Q�A��������P�zee�~��f�/�%%��KWht�H����O6���M�R�~#�mEGfC�a��\���@^���u�	��Q��{��y9��2�(��������C���3o�	��Yuzo��fr7+t����|�I&���.���6:^c=)y���Q��c\�����b-���[)�"n��L4�����QM;���&��zk�X����I��u E\j~��������?
��JoK+�z��?��#�-�,�Y<�N
���7b83G�-������nu���i�������P�bP�IV����^YN��_�4�[V��Vg�v�Z���V���[�&=�=����
_6C�i���`|�
�5Tx����H������%>��M<�-tYy,��:a|�g$	U�9��$t����\����u��B%4��	�-��X�-N���j����k�C���*�S����  c8�[�S��N�@*��"1�A�I�5b�8��2+g5Cf ��N��~${�\�F�*�@5)E$LZ[����-�k���������t�������!u^r���_>����d_��1�� cR�nWF�����fxD	U�R�M���c��X�N�.|q~\�U($�����%D�$LD��9���`�q��H����d��E���d��+(�<w��p��h�<"��������?�����f��5���F�����j��iOk����2�����m���\����|��#*�5�����(���@a���_��W��
$�W�l� � #���b,����r�5&��,����+�[���L�XH���%��|���i�h�����i�m����;%~��b���n����?��g3&���3kpB�/���(b�DCf���o��n��[�k�������6�e!�&�6���>���W,#:kBK<oc"�!*�R��F#
+,*������&�����j�:����cReD�+��]��H����D�e3��H�1�[-n��� ����=�{�����f�fp6<�P�M�knORVBol�s��$���
r�H���eW�P�gBC����L��Bd�Z�I��
�n����=��
HtX����dgb3[��OK�\n�{�`�K���C*�x6 <�(����8�U4_n�����r�%�+����
��F���\���f���:$�R"��5#�pz��-����B��n �?���i7��MU*��	�^e�@B��#�j���x����}�KgrM���g������]���}�m�d����J�0�����y����hx��5���6�hoLB�
$VI8%�4���Y�sCV`>(�+s>E����F7������T���N9��i�6��U(�f����H%����O*L�46���1x����2�~{��/�Vyt��)�*	,�'��H�W�ax��&}p��jl�m���F�����k���xq"v��!)��p>"J�u,�QFX@�.wPBM7��'f�*�
#(j���B�Z���:tB�P	���B����O~�f����S���d�4��2WH�Id�2��u����g	�/w�b���b�vj����iv���;n�k����/���v�Q��M9�#�,�)$*�u����H�"��!$M%�Q,�#��`�+-�!��G]�Q���m�~Y���
a���]�g�-���U��z!N*��!�!�C��E#jxkH�wr�Q���?Hqq)�����7�����w\,�E!pTd�+�v(��������C�$�<�'�g<������rqy>zwz=�bl�������n���b�g��HoE�5���v���u(�N���,�G��l}���0
��X���b9CB���!s���iP
n��J���Fn4v&��g������j��e8~t��x�dc���T!�f+�1>U>��a}"(Y��L.(�������A��S�� �&��s��*
�l\&7&K)�c���Q����X����
��[H��
�#��!��FF6!��_,{���:��x��)���,�I����J��4s���Kc �V�CD���(���s����������A����q��x���&��md&k�4�CRTA\��>�Vw�5�W�����~��*��H>Bw,$��<�*�s�����E::'{����l��^�����9��	:��tt~�S����n���&��)���s����s5aAtbw��$�?��L�!�������@?Inh�d�����R�s:�]Ns��DF���CL��(�Lxb� P@���Od^�$���
��:���8�(�|�.����]��fS�m�9����~�\Cj���Tm*Zm")��\p�"��_8����wd��;��1�q��~6�3]��%�J�br>0��l�����������kS<]�Wb����~�!bY,�O�'���C�D���Q��w8:C���m�fK����S����%K�r1�9���w�
xw_:��9�h��Q�yLH�X�~�0���O�
������vcR��z��jzn����-}�nG�:b��U|%C�<��<��?a��V��8�@����xA��aw�;b�2@�Sx��g�Qr�n��(�R�H"�)�������Z8�t|'�Eo��I�H[��.���z���O��<�����?��[�G|��'
_N�����;�BF2�[.�F��8>���'�@�lI��E���l���J1X�u8G��=W�SIF��SnQ-Bo�����j�N����	f�N�^���������6,'nE�FjD�,I��JR��[�F����v��cC�����8K	�^�}�h%�+�C�J���W���f��Vk��x����iY�h��a.����7i'���jr
:����^��|Og�W�H�T|�ZF0�^P�"R���;�F*|��&�L�=14{�L� 9!�(�C~�K���Y��}����N}���?M����� 9��?��loz�/���m�F�{���`�`ec�N�06���'�%]?��\Ul"�/
�&�"��oc��4|�\`�=m��;>���	kS�]����<Y�dljp&����3A:��e9c7�KQ���#(��X�x���*O�
'NR
>�A�	7D�q�Z	��=CW�aq����R�Fib��~$O���������=D'���X�w=O��
�	�&;�,q(�5%�����/���r�-���������C����s�q&�C��P�l��x�6?�������eg�G��\3����QN0_�$��$������_���D�*����G��e����m`���W��B�D�@���1�^��1���9;�P�#��'��0���9�G��o}��8�y��@z*���	*a�'��1��#�xP�FxQ,_AD]A~��9tbZ�qPB���5��j��
�*�w&(&f!���K��m����]�$���1gWp�j���	�J�������D'���,Q���z�`��
�gh+ZPd,��1��������W��$%�7_I�`I.EYU(��$����?���w#�� +Rl�~�T6�48�#( ���'�����Z�(P�CV���G�����.���
����$)�X���x����!9��4���N2����1�d��@���������'>X,���+b&m]����s��5�=�	�np}sq3\�
0LH�h���tO���'s�������i��i{=��p�Z���O�^����n��@7$����
i��3=�����&������B(�����r�#?�N�D;�Md�������������(�4p@AGzOQi�X��7��5g@RW�]�Lz��x��%���}b���c,C��"�(_M��fd� 3<q��C@bJ'�)Y�\r��G�6,[/Vj���k���.��:�==qV c�!���A�5�0q�P����2�L�������/�����4%l��������%E�}4)=����L�a���~���\���r����T������%�j��X�b�/�x��P�l�C�k���q;s����#^}I��6�/�d�0�'$l��$g��������y+N�(��9l$.�("N�`OR�������{V
�2���W�2�kB�������s,|��)�
�Xp|��>�{��)�p�:3��EF?m�w��\L�C\�z�7P��U�|d,+#��������y� �^�����/���7#2t
_��I����W�QR��xk�cK4�]�����N��oN�n�mW��F}�7n�W#��L�2�-i�u�%�+���Py����o�n�H��[������K���7��e:�F��y��98 �<&	J������G7oP��������hTWWWWW�ja�V����{��V�^�����������A��v
�����%b��R)�Ak�Bh�$ke@�<��t
�!S\�������$6H"�O��z�v�����L�GM�|���5$�4��������lg�������:���{2��N�}�K�q19H��
�I�c\�\�������2�r�3;�����O���M���b�T�S����I���n��������d%������SJ�<j=�U@O��'����6�C��X�lH_���(@n.(��C�Y��awn[���������)Y
��Y
J��L��f4/C	�i����z�D�E�6p�R)�T	���X9:?��Q���p�I$B��Au�h�e�����.�6��~�`.�[/�|0�U(c�c{U��		]h��c}/8C�.D��C)���-����oT�!��Bh��';�Q!���$�y�^;HDW�.Ow���Oo�OwN1$��4qO���y������7��5�,��Bc`83��+��	����a��x���$�|�e������:��V���pS�����J���7���[��Q�6�XH�'���#"eRy@{�C�Z��8�L������=����h�������q�e7��R��J��=f�����\ �"���f�����Z�%��F��f��DJk�>��>X������������������v��������--/f5�{����~��������(^�E4�0�\BHK}�HN��wY5��:6?n���&H�A~�sv>$�$���P�m�jI�!���{�je�6b�f�w�KW�y���Nw.t�~>6�}O���yU�lv�UT �jr����6��1/|m]f���`}�~�_]�������j�
~���AZ.+*{4����k������C������z�@G�R��pP�
0SC&����H;0�w��B����w�&���Dg�����y��F 8��z4�� BD>�W��G�����X�Q$�/9St���[�R
�w�?(�e�0�w�r�<��@�H�\d�ed��}.�"���w���z��h;u%8^X|����Co�gR�R���F��<���B�RD��D�
���w���uCb'Bvt�_��k1��/�WRw}�B�Xs�M�e]	m�a�d�������SzQ�$'��}=��`	0������A(�`68�h���|���o?��*��y2Fn�j	���D;([����V�t����8���bQk��`�*���\��\L��`Od�s�'�GpLo�G�;�8�:�����$��+M����Wo�crvk]^�Z�K��fr3����l
�]���*9`��{bh���F�m
����(Qh�����Y<���[J����,r'B���\�W���cX���������w[\�:��X$����F���C+�~t��G��1��7���t������a��4�Q)I+��%���x�����7�sb4�fr����'Q�@�M����{�6B�3�x�A�g!3:���!��h���Jp!fyM�OF.�tK�<%�G�g-�U�zy"�L��SKpj����m�M��b������������*��z�S���HC����D!	�-���'7�sA�a�Bp���!L����K���8t�)��yJ��E/��'n����*t���U��N8�B;�Z�E<g��}�EF(P�L���(����6��F�>bNJi[�-f�zp&^�CB� �����1&��Xl�0@�2�^d/W��`�� �;���6|	�Jx�F�<�1�Gb�8�(N�E�4/�{6Xq��w�E�"�9m��K��i)?���d��O����G�T�-%�|�~��<�;�=����|���^���/�#N��=�R)x^$���V.��g���*-8mBT��
��K���i��ut|�k�EYfSm$�:��:��=Y&��&b�CB%>����C.��I��:�2�}x�r��>�2@2�.�;�{��G�P���Q8���1����L.�S�F"�:5��N(��N���eMNW��~J���2xx.A
s����e��Y���Y�i������7�p��BtI�G�U�b+t�@���vU���t�L��B��M�(��mf����d���8��E����O����pk�r� F�D��L�����D���$q���D�y�T���&�>�|;�������$���\���,��3��I~t��.�<J*������I���~FVU$��drC��M�rLN�DV��MC��kX*���8�H$�x���y����,����Xt3/j��=��J�b��8��e�n�2���`����wWI����d��d
�[���$�b[�w�H�6 N]�2^��6w�x����G������������^�^L47'#���A8kN����,'=��d����A
FO�~z=]��������p_�fIM1&L����-��t��0.�#0�����&D�l*��{�;Y^�I�����<����v,=mo�X�H��s�����d��=�6-�pY�B�Z�T�Y�~ax�^WS3���4W���BN���
&�G��7U�cN�Q��
�������z����+>�w1^Ub��M1<����\��I1�424���y�
�������}5V�'������zf�3����9�`�v%�u�
V������ ���B�BE3�=C����b�J����H�}4��G��3�\-*4i|�H��25&-F���d-�CCM��M��b��fk��a�%�r����+�/dbX���Mz�
9�J�����^Jt6�����|��w��0���8���b!�j���$�`b[#�$c�'
�������r�������\9J�E�E�_�%D,��+e�x�3��!V���>	���L������D�OK�1��f[�yu�{�9;��Z�o��9�,=L������0�m��l���u#����3S���k%g����]������^����>j�KV1c8��xH�pt}$������v����3SNU:y�=S�a��q�(z�k��`�y��g"���i�=�|�����s>��$'���0�9����=�0m��U������hH�
�>����i��A��4#�M�Iru���[u��������ztp�!�*_;<����s��9�9����.e��+���[�b������Jd���o��O�SN���*���U7���0P2��R�u��������K�@����1�sx?9��x3A������+���������VL���qui�]]�?�����	�
8����0�q�3?^�;�����2���a��%���h�j�V�o�5�Vh������{��kwF�V������u����"�����UIt��3��Pr<�6o�S���w����E�&[����%oj���
u��'�(c������)h�����<�6���	i�g}G^-Bd	���z���K'���Z�
ig�?Nng�7�n�������wO��5����p�P��������$Ix ��PS��\�(���O�HG����b�&�:5&�0�ytB�01���f�������zkO�C�Od7���o]�COl��a+E���0y�z�'��������*&��5{����������q�v�%x��
VJ��(�P.�8����hJ%��gU��`�P*<.S��XG��<z#�e�I@f)9���h���;x���3���p���l��!����c����)�s����az�V�Q�X�B'}II�ow/�����������<�&�� ��b��I��3W=���.�|z��5����]��Vk���{m��vK���[+�v�r�AX��N�������W�^���S����)#3�@`Hl
�*&+��i�h|suaD��a�S1?��#��U��R3��u�S���b����5M��?���{J����d�Lz���^KL�KF���y<#���<��x�^C��������=M��S�9��0����Dn��������Q))���o�c�o���j
f��u{�hX<�������������lZ�/&_���~�I��;�����8�9�����_��&��������b�8�����r�/�r!4�`��T3L�q�7��t8���Z��K��}�s"'[����7(M6���#�a�(�RB�:
�����0 �Zg��5VN1B�C�����~�a��~�,������y�����e�x�z�������}{�7%�o�q�V\����"��D�w��`)���X�j;�G�������]�����<E�m�u�yohw!�w�3����S�\V4�7KK+�R�V�������� )�����)����m�l7�������c��q��8h���]mR�C�Is����&�YY�
�?��$������/������K��+��;�8��6,� 9��)^��d����LO�?L�%��E�Q����
�q����x����!<�8$���vz�����z��o~��R,��T���oo'p�2�!���k�4�[B����W���m�K�����E�v!�h��3$��z�L�W���B��!Xi��!��_����e������g�%rHR�7WF��M��FB*���o��v�W�&(����
��hn���V��kw����v�i�����A'����t��p Jo��w����@�
7\%�>�����}H��qSxy��U�����A5����0������At�c�������r��C����l��������e6�?���=��{��G*g�$��p,tqw5��I����yVVR�e��fmQ�e}������Z�}��^�{�!��`F���n��w�}�������]}�����aU�I��3O�))&Z�E{�1�M�K�d����!%@��+L����~:/3�E4Tc�Y/�l�a�1/4c���C��x�V��v��h�u��B�
�-4\��N�9j9^x���`�h!HF�_����/����:��q�UP�?��OQ�r~0Xv�y��hk��6��F
�{��*�X*��nz��di��.��VP�fr�����|R���$X���%,�(��"�b����3j����x������)���&Y-imr�h�s����'oo!�X��
SL*~�I#[�_�?M�j�>��L���1!�'�S.>K!��������v�9#��%�C���G���lUK���j
��:�f�AT|$�q�:�����t���;-�a���P��GtHi,vn�	���)N���^�N`0��7�v
����k�����5gy��r1�l����~~&a��f
�3��q����#4x$v��Z�7f���bb���`���w�e#iq|��@#�� ��?����X�����n�
�*����[/��h??�`��6�9���Pht�a�<�M����Ep&y�wg�����������HkjO���@���4G�5��m������&��$kj���!��lg�Q�HW��'YJRH:�I�`���!�����i�����:21�����L���H����Z,7��S!����������C�x5.���/sLET�C�x�T���{����.����}�uN���|^���o�j�2�V6���f���e`��R�E���x�X�]��P��uv^/���e�������Z����^�����rW4$����)|,�����U�SD�)o]�.@[P������m����e+.J&��������������	��� M���;7���Ej)%HB�I�%�l��u!6k��9*������II���i������0{�Q{*f�3�o:�)��X&�K�_����g
�,e���m������d�m�_���w���6�C�����j0$�q(��N�q#���������c�$�"��K���3c}�gv(���5����g]N���/��;������zC
v12-0004-Remove-superfluous-tqual.h-includes.patch.gz (application/x-patch-gzip)
v12-0005-WIP-rename-tqual.c-to-heapam_visibility.c.patch.gz (application/x-patch-gzip)
v12-0006-WIP-change-snapshot-type-to-enum-rather-than-cal.patch.gz (application/x-patch-gzip)
v12-0007-Move-generic-snapshot-related-code-from-tqual.h-.patch.gz (application/x-patch-gzip)
v12-0008-Move-heap-visibility-routines-to-heapam.h.patch.gz (application/x-patch-gzip)
v12-0009-Rephrase-references-to-time-qualification.patch.gz (application/x-patch-gzip)
v12-0010-WIP-Extend-tuples-when-getting-them-from-a-slot.patch.gz (application/x-patch-gzip)
v12-0011-WIP-Move-page-initialization-from-RelationAddExt.patch.gz (application/x-patch-gzip)
v12-0012-WIP-ForceStore-HeapTupleDatum.patch.gz (application/x-patch-gzip)
v12-0013-slot-type-fixes.patch.gz (application/x-patch-gzip)
v12-0014-Rename-RelationData.rd_amroutine-to-rd_indam.patch.gz (application/x-patch-gzip)
v12-0015-Add-ExecStorePinnedBufferHeapTuple.patch.gz (application/x-patch-gzip)
v12-0016-Store-HeapTupleData-in-Buffer-HeapTupleTableSlot.patch.gz (application/x-patch-gzip)
v12-0017-Buffer-tuples-may-be-virtualized.patch.gz (application/x-patch-gzip)
v12-0018-slot-tableoid-tid-support.patch.gz (application/x-patch-gzip)
v12-0019-Set-tableoid-in-a-bunch-of-places.patch.gz (application/x-patch-gzip)
v12-0020-WIP-Slotified-triggers.patch.gz (application/x-patch-gzip)
v12-0021-WIP-Slotify-EPQ.patch.gz (application/x-patch-gzip)
v12-0022-tableam-introduce-minimal-infrastructure.patch.gz (application/x-patch-gzip)
v12-0023-tableam-Introduce-and-use-begin-endscan-and-do-i.patch.gz (application/x-patch-gzip)
v12-0024-tableam-Inquire-slot-type-from-AM-rather-than-ha.patch.gz (application/x-patch-gzip)
v12-0025-tableam-introduce-slot-based-table-getnext-and-u.patch.gz (application/x-patch-gzip)
v12-0026-tableam-Add-insert-delete-update-lock_tuple.patch.gz (application/x-patch-gzip)
v12-0027-tableam-Add-fetch_row_version.patch.gz (application/x-patch-gzip)
v12-0028-tableam-Add-use-tableam_fetch_follow_check.patch.gz (application/x-patch-gzip)
v12-0029-tableam-Add-table_get_latest_tid.patch.gz (application/x-patch-gzip)
v12-0030-tableam-multi_insert-and-slotify-COPY.patch.gz (application/x-patch-gzip)
v12-0031-tableam-finish_bulk_insert.patch.gz (application/x-patch-gzip)
v12-0032-tableam-slotify-CREATE-TABLE-AS-and-CREATE-MATER.patch.gz (application/x-patch-gzip)
v12-0033-tableam-index-builds.patch.gz (application/x-patch-gzip)
v12-0034-tableam-relation-creation-VACUUM-FULL-CLUSTER-SE.patch.gz (application/x-patch-gzip)
v12-0035-tableam-VACUUM-and-ANALYZE.patch.gz (application/x-patch-gzip)
v12-0036-tableam-planner-size-estimation.patch.gz (application/x-patch-gzip)
v12-0037-tableam-Sample-Scan-Support.patch.gz (application/x-patch-gzip)
v12-0038-tableam-bitmap-heap-scan.patch.gz (application/x-patch-gzip)
v12-0039-tableam-remaining-stuff.patch.gz (application/x-patch-gzip)
v12-0040-WIP-Move-xid-horizon-computation-for-page-level-.patch.gz (application/x-patch-gzip)
v12-0041-tableam-Add-function-to-determine-newest-xid-amo.patch.gz (application/x-patch-gzip)
v12-0042-tableam-Fetch-tuples-for-triggers-EPQ-using-a-pr.patch.gz (application/x-patch-gzip)
#79Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Dmitry Dolgov (#76)
Re: Pluggable Storage - Andres's take

On Sun, 20 Jan 2019 at 22:46, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Fri, Jan 18, 2019 at 11:22 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on

I thought we wanted to avoid having to add this setting in individual
regression tests. Can't we do this in pg_regress as a common setting?

Yeah, you're probably right. Actually, I couldn't find anything that looks like
"common settings", and so far I've placed it into psql_start_test as a psql
argument. But I'm not sure; maybe there is a better place.

Yeah, psql_start_test() looks good to me. pg_regress does not seem to
have its own psqlrc file where we could have put this variable. Maybe
later on, if we want more such variables, we could devise this
infrastructure.
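For example, pg_regress would then invoke psql with the variable
preset, along these lines (a sketch; the actual command line that
psql_start_test builds carries more options):

    psql -X -a -q -v HIDE_TABLEAM=on -d regression -f sql/copy2.sql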

+ /* Access method info */
+ if (pset.sversion >= 120000 && verbose && tableinfo.relam != NULL &&
+     !(pset.hide_tableam && tableinfo.relam_is_default))
+ {
+         printfPQExpBuffer(&buf, _("Access method: %s"),
+                           fmtId(tableinfo.relam));

So this will make psql hide the access method if it's the same as the
default. I understand that this was kind of the conclusion in the other
thread "Displaying and dumping of table access methods". But IMHO, if
hide_tableam is false, we should *always* show the access method,
regardless of the default value. I mean, we can make it simple: off
means never show the access method, on means always show it,
regardless of the default access method. And this will also work with
regression tests. If some regression test specifically wants to output
the access method, it can use a "\set HIDE_TABLEAM off" command.

If we hide the method when it's the default, then a regression test
that wants to forcibly show the access method of all tables won't see
it for tables that use the default access method.

I can't imagine what kind of test would need to forcibly show the table access
method of all the tables. Even if you need to verify the tableam for something,
maybe it's even easier to just select it from pg_am?
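For example, a test could check it with a plain catalog query, along
these lines (the table name is hypothetical):

    -- 'copytest' is a hypothetical test table
    SELECT c.relname, am.amname
    FROM pg_class c
         JOIN pg_am am ON am.oid = c.relam
    WHERE c.relname = 'copytest';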

Actually my statement was wrong, sorry. A regression test that
wants to forcibly show the access method for all tables just needs to
set HIDE_TABLEAM to off. With your patch, if we set HIDE_TABLEAM to
off, it will *always* show the access method, regardless of the
default access method.

It is with HIDE_TABLEAM=on that your patch hides the access method
conditionally (i.e. it shows it only when it does not match the
default). It's in this case that I feel we should *unconditionally*
hide the access method. Regression tests that use \d+ to show table
details might not be specifically interested in the access method,
but they will fail if run with a modified default access method.

Besides, my general inclination is: keep the variable's behaviour
simple; and it also looks like we can keep the regression test output
consistent without this conditional behaviour.
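In other words, the proposed semantics are simply this, sketched in
psql terms:

    \set HIDE_TABLEAM on     -- \d+ never prints "Access method: ..."
    \set HIDE_TABLEAM off    -- \d+ always prints it, default AM or not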

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#80Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#42)
2 attachment(s)
Re: Pluggable Storage - Andres's take

On Mon, Jan 21, 2019 at 1:01 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-12-10 18:13:40 -0800, Andres Freund wrote:

On 2018-11-26 17:55:57 -0800, Andres Freund wrote:

FWIW, now that oids are removed, and the tuple table slot abstraction
got in, I'm working on rebasing the pluggable storage patchset ontop of
that.

I've pushed a version to that to the git tree, including a rebased
version of zheap:
https://github.com/anarazel/postgres-pluggable-storage
https://github.com/anarazel/postgres-pluggable-zheap

I've pushed the newest, substantially revised, version to the same
repository. Note that while the newest pluggable-zheap version is newer
than my last email, it's not based on the latest version, and the
pluggable-zheap development is now happening in the main zheap
repository.

Thanks for the new version of patches and changes.

Todo:
- consider removing scan_update_snapshot

Attached is the patch that removes scan_update_snapshot,
along with a rebased version of the t_tableOid reduction patch.

- consider removing table_gimmegimmeslot()
- add substantial docs for every callback

Will work on the above two.

While I saw an initial attempt at writing sgml docs for the table AM
API, I'm not convinced that's the best approach. I think it might make
more sense to have high-level docs in sgml, but then do all the
per-callback docs in tableam.h.

OK, I will update the sgml docs accordingly.
The index AM has per-callback docs in the sgml; should those be refactored as well?

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0002-Removal-of-scan_update_snapshot.patch (application/octet-stream)
From d3471dfa797ae83b97acb5404b31cafeac88492e Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Tue, 22 Jan 2019 11:29:20 +1100
Subject: [PATCH 2/2] Removal of scan_update_snapshot

The snapshot is available in the TableScanDesc structure
itself, so callers can access it directly;
there is no need for a callback.
---
 src/backend/access/heap/heapam.c          | 18 ------------------
 src/backend/access/heap/heapam_handler.c  |  1 -
 src/backend/executor/nodeBitmapHeapscan.c |  6 +++++-
 src/include/access/heapam.h               |  1 -
 src/include/access/tableam.h              | 10 ----------
 5 files changed, 5 insertions(+), 31 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7f594b3e4f..6655a95433 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1252,24 +1252,6 @@ heap_endscan(TableScanDesc sscan)
 	pfree(scan);
 }
 
-/* ----------------
- *		heap_update_snapshot
- *
- *		Update snapshot info in heap scan descriptor.
- * ----------------
- */
-void
-heap_update_snapshot(TableScanDesc sscan, Snapshot snapshot)
-{
-	HeapScanDesc scan = (HeapScanDesc) sscan;
-
-	Assert(IsMVCCSnapshot(snapshot));
-
-	RegisterSnapshot(snapshot);
-	scan->rs_scan.rs_snapshot = snapshot;
-	scan->rs_scan.rs_temp_snap = true;
-}
-
 /* ----------------
  *		heap_getnext	- retrieve next tuple in scan
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 62c5f9fa9f..3dc1444739 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2308,7 +2308,6 @@ static const TableAmRoutine heapam_methods = {
 	.scan_begin = heap_beginscan,
 	.scan_end = heap_endscan,
 	.scan_rescan = heap_rescan,
-	.scan_update_snapshot = heap_update_snapshot,
 	.scan_getnextslot = heap_getnextslot,
 
 	.parallelscan_estimate = table_block_parallelscan_estimate,
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 59061c746b..b48ab5036c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -954,5 +954,9 @@ ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node,
 	node->pstate = pstate;
 
 	snapshot = RestoreSnapshot(pstate->phs_snapshot_data);
-	table_scan_update_snapshot(node->ss.ss_currentScanDesc, snapshot);
+	Assert(IsMVCCSnapshot(snapshot));
+
+	RegisterSnapshot(snapshot);
+	node->ss.ss_currentScanDesc->rs_snapshot = snapshot;
+	node->ss.ss_currentScanDesc->rs_temp_snap = true;
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 7c9c4f5e98..dd67e7e270 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -182,7 +182,6 @@ extern void simple_heap_update(Relation relation, ItemPointer otid,
 				   HeapTuple tup);
 
 extern void heap_sync(Relation relation);
-extern void heap_update_snapshot(TableScanDesc scan, Snapshot snapshot);
 
 extern TransactionId heap_compute_xid_horizon_for_tuples(Relation rel,
 														 ItemPointerData *items,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 428ff90cad..092980e205 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -89,7 +89,6 @@ typedef struct TableAmRoutine
 	void		(*scan_end) (TableScanDesc scan);
 	void		(*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key, bool set_params,
 								bool allow_strat, bool allow_sync, bool allow_pagemode);
-	void		(*scan_update_snapshot) (TableScanDesc scan, Snapshot snapshot);
 	TupleTableSlot *(*scan_getnextslot) (TableScanDesc scan,
 										 ScanDirection direction, TupleTableSlot *slot);
 
@@ -389,15 +388,6 @@ table_rescan_set_params(TableScanDesc scan, struct ScanKeyData *key,
 										 allow_pagemode);
 }
 
-/*
- * Update snapshot info in heap scan descriptor.
- */
-static inline void
-table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot)
-{
-	scan->rs_rd->rd_tableam->scan_update_snapshot(scan, snapshot);
-}
-
 static inline TupleTableSlot *
 table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot)
 {
-- 
2.18.0.windows.1

0001-Reduce-the-use-of-HeapTuple-t_tableOid.patch (application/octet-stream)
From feec41723edd21987838080487349e4852d7daa7 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 16 Jan 2019 18:43:47 +1100
Subject: [PATCH 1/2] Reduce the use of HeapTuple t_tableOid

t_tableOid is still used in trigger code and where a HeapTuple is
generated from, and passed around via, slots. Those remaining uses
need to be replaced once the table OID is stored as a separate
variable/parameter.
---
 contrib/hstore/hstore_io.c                    |  2 --
 contrib/pg_visibility/pg_visibility.c         |  1 -
 contrib/pgstattuple/pgstatapprox.c            |  1 -
 contrib/pgstattuple/pgstattuple.c             |  3 +-
 contrib/postgres_fdw/postgres_fdw.c           | 11 ++++--
 src/backend/access/common/heaptuple.c         |  7 ----
 src/backend/access/heap/heapam.c              | 35 +++++--------------
 src/backend/access/heap/heapam_handler.c      | 27 +++++++-------
 src/backend/access/heap/heapam_visibility.c   | 20 ++++-------
 src/backend/access/heap/pruneheap.c           |  2 --
 src/backend/access/heap/tuptoaster.c          |  3 --
 src/backend/access/heap/vacuumlazy.c          |  2 --
 src/backend/access/index/genam.c              |  1 -
 src/backend/catalog/indexing.c                |  2 +-
 src/backend/commands/analyze.c                |  2 +-
 src/backend/commands/functioncmds.c           |  3 +-
 src/backend/commands/schemacmds.c             |  1 -
 src/backend/commands/trigger.c                | 21 +++++------
 src/backend/executor/execExprInterp.c         |  1 -
 src/backend/executor/execTuples.c             | 29 +++++++++------
 src/backend/executor/execUtils.c              |  2 --
 src/backend/executor/nodeAgg.c                |  3 +-
 src/backend/executor/nodeGather.c             |  1 +
 src/backend/executor/nodeGatherMerge.c        |  1 +
 src/backend/executor/nodeIndexonlyscan.c      |  4 +--
 src/backend/executor/nodeIndexscan.c          |  3 +-
 src/backend/executor/nodeModifyTable.c        |  6 +---
 src/backend/executor/nodeSetOp.c              |  1 +
 src/backend/executor/spi.c                    |  1 -
 src/backend/executor/tqueue.c                 |  1 -
 src/backend/replication/logical/decode.c      |  9 -----
 .../replication/logical/reorderbuffer.c       |  4 +--
 src/backend/utils/adt/expandedrecord.c        |  1 -
 src/backend/utils/adt/jsonfuncs.c             |  2 --
 src/backend/utils/adt/rowtypes.c              | 10 ------
 src/backend/utils/cache/catcache.c            |  1 -
 src/backend/utils/sort/tuplesort.c            |  7 ++--
 src/include/access/heapam.h                   |  3 +-
 src/include/executor/tuptable.h               |  5 ++-
 src/pl/plpgsql/src/pl_exec.c                  |  2 --
 src/test/regress/regress.c                    |  1 -
 41 files changed, 92 insertions(+), 150 deletions(-)

diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 745497c76f..05244e77ef 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -845,7 +845,6 @@ hstore_from_record(PG_FUNCTION_ARGS)
 		/* Build a temporary HeapTuple control structure */
 		tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 		ItemPointerSetInvalid(&(tuple.t_self));
-		tuple.t_tableOid = InvalidOid;
 		tuple.t_data = rec;
 
 		values = (Datum *) palloc(ncolumns * sizeof(Datum));
@@ -998,7 +997,6 @@ hstore_populate_record(PG_FUNCTION_ARGS)
 		/* Build a temporary HeapTuple control structure */
 		tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 		ItemPointerSetInvalid(&(tuple.t_self));
-		tuple.t_tableOid = InvalidOid;
 		tuple.t_data = rec;
 	}
 
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index c9166730fe..503f00408c 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -657,7 +657,6 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 			ItemPointerSet(&(tuple.t_self), blkno, offnum);
 			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 			tuple.t_len = ItemIdGetLength(itemid);
-			tuple.t_tableOid = relid;
 
 			/*
 			 * If we're checking whether the page is all-visible, we expect
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index 636c8d40ac..6abe22ed61 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -152,7 +152,6 @@ statapprox_heap(Relation rel, output_type *stat)
 
 			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 			tuple.t_len = ItemIdGetLength(itemid);
-			tuple.t_tableOid = RelationGetRelid(rel);
 
 			/*
 			 * We follow VACUUM's lead in counting INSERT_IN_PROGRESS tuples
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 9bcb640884..a0e7abe748 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -344,7 +344,8 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 		/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
 		LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-		if (HeapTupleSatisfiesVisibility(tuple, &SnapshotDirty, hscan->rs_cbuf))
+		if (HeapTupleSatisfiesVisibility(tuple, RelationGetRelid(hscan->rs_scan.rs_rd),
+								&SnapshotDirty, hscan->rs_cbuf))
 		{
 			stat.tuple_len += tuple->t_len;
 			stat.tuple_count++;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 525629a6cc..1f806a588b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1468,6 +1468,8 @@ postgresIterateForeignScan(ForeignScanState *node)
 	 */
 	ExecStoreHeapTuple(fsstate->tuples[fsstate->next_tuple++],
 					   slot,
+					   fsstate->rel ?
+							   RelationGetRelid(fsstate->rel) : InvalidOid,
 					   false);
 
 	return slot;
@@ -3482,7 +3484,8 @@ store_returning_result(PgFdwModifyState *fmstate,
 		 * The returning slot will currently not necessarily be suitable to
 		 * store heaptuples directly, so allow for conversion.
 		 */
-		ExecForceStoreHeapTuple(newtup, slot);
+		ExecForceStoreHeapTuple(newtup, slot,
+					fmstate->rel ? RelationGetRelid(fmstate->rel) : InvalidOid);
 		ExecMaterializeSlot(slot);
 		pfree(newtup);
 	}
@@ -3756,7 +3759,11 @@ get_returning_data(ForeignScanState *node)
 												dmstate->retrieved_attrs,
 												node,
 												dmstate->temp_cxt);
-			ExecStoreHeapTuple(newtup, slot, false);
+			ExecStoreHeapTuple(newtup,
+								slot,
+								dmstate->rel ?
+								RelationGetRelid(dmstate->rel) : InvalidOid,
+								false);
 		}
 		PG_CATCH();
 		{
diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index ed4549ca57..d4ede6d47a 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -689,7 +689,6 @@ heap_copytuple(HeapTuple tuple)
 	newTuple = (HeapTuple) palloc(HEAPTUPLESIZE + tuple->t_len);
 	newTuple->t_len = tuple->t_len;
 	newTuple->t_self = tuple->t_self;
-	newTuple->t_tableOid = tuple->t_tableOid;
 	newTuple->t_data = (HeapTupleHeader) ((char *) newTuple + HEAPTUPLESIZE);
 	memcpy((char *) newTuple->t_data, (char *) tuple->t_data, tuple->t_len);
 	return newTuple;
@@ -715,7 +714,6 @@ heap_copytuple_with_tuple(HeapTuple src, HeapTuple dest)
 
 	dest->t_len = src->t_len;
 	dest->t_self = src->t_self;
-	dest->t_tableOid = src->t_tableOid;
 	dest->t_data = (HeapTupleHeader) palloc(src->t_len);
 	memcpy((char *) dest->t_data, (char *) src->t_data, src->t_len);
 }
@@ -850,7 +848,6 @@ expand_tuple(HeapTuple *targetHeapTuple,
 			= targetTHeader
 			= (HeapTupleHeader) ((char *) *targetHeapTuple + HEAPTUPLESIZE);
 		(*targetHeapTuple)->t_len = len;
-		(*targetHeapTuple)->t_tableOid = sourceTuple->t_tableOid;
 		(*targetHeapTuple)->t_self = sourceTuple->t_self;
 
 		targetTHeader->t_infomask = sourceTHeader->t_infomask;
@@ -1078,7 +1075,6 @@ heap_form_tuple(TupleDesc tupleDescriptor,
 	 */
 	tuple->t_len = len;
 	ItemPointerSetInvalid(&(tuple->t_self));
-	tuple->t_tableOid = InvalidOid;
 
 	HeapTupleHeaderSetDatumLength(td, len);
 	HeapTupleHeaderSetTypeId(td, tupleDescriptor->tdtypeid);
@@ -1162,7 +1158,6 @@ heap_modify_tuple(HeapTuple tuple,
 	 */
 	newTuple->t_data->t_ctid = tuple->t_data->t_ctid;
 	newTuple->t_self = tuple->t_self;
-	newTuple->t_tableOid = tuple->t_tableOid;
 
 	return newTuple;
 }
@@ -1225,7 +1220,6 @@ heap_modify_tuple_by_cols(HeapTuple tuple,
 	 */
 	newTuple->t_data->t_ctid = tuple->t_data->t_ctid;
 	newTuple->t_self = tuple->t_self;
-	newTuple->t_tableOid = tuple->t_tableOid;
 
 	return newTuple;
 }
@@ -1465,7 +1459,6 @@ heap_tuple_from_minimal_tuple(MinimalTuple mtup)
 	result = (HeapTuple) palloc(HEAPTUPLESIZE + len);
 	result->t_len = len;
 	ItemPointerSetInvalid(&(result->t_self));
-	result->t_tableOid = InvalidOid;
 	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
 	memcpy((char *) result->t_data + MINIMAL_TUPLE_OFFSET, mtup, mtup->t_len);
 	memset(result->t_data, 0, offsetof(HeapTupleHeaderData, t_infomask2));
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8837f83292..7f594b3e4f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -414,7 +414,6 @@ heapgetpage(TableScanDesc sscan, BlockNumber page)
 			HeapTupleData loctup;
 			bool		valid;
 
-			loctup.t_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
 			loctup.t_len = ItemIdGetLength(lpp);
 			ItemPointerSet(&(loctup.t_self), page, lineoff);
@@ -422,7 +421,8 @@ heapgetpage(TableScanDesc sscan, BlockNumber page)
 			if (all_visible)
 				valid = true;
 			else
-				valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
+				valid = HeapTupleSatisfiesVisibility(&loctup, RelationGetRelid(scan->rs_scan.rs_rd),
+									snapshot, buffer);
 
 			CheckForSerializableConflictOut(valid, scan->rs_scan.rs_rd, &loctup,
 											buffer, snapshot);
@@ -640,7 +640,7 @@ heapgettup(HeapScanDesc scan,
 				/*
 				 * if current tuple qualifies, return it.
 				 */
-				valid = HeapTupleSatisfiesVisibility(tuple,
+				valid = HeapTupleSatisfiesVisibility(tuple, RelationGetRelid(scan->rs_scan.rs_rd),
 													 snapshot,
 													 scan->rs_cbuf);
 
@@ -1156,9 +1156,6 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	if (!is_bitmapscan && snapshot)
 		PredicateLockRelation(relation, snapshot);
 
-	/* we only need to set this up once */
-	scan->rs_ctup.t_tableOid = RelationGetRelid(relation);
-
 	/*
 	 * we do this here instead of in initscan() because heap_rescan also calls
 	 * initscan() and we don't want to allocate memory again
@@ -1383,6 +1380,7 @@ heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *s
 
 	pgstat_count_heap_getnext(scan->rs_scan.rs_rd);
 
+	slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 	return ExecStoreBufferHeapTuple(&scan->rs_ctup, slot,
 									scan->rs_cbuf);
 }
@@ -1486,12 +1484,11 @@ heap_fetch(Relation relation,
 	ItemPointerCopy(tid, &(tuple->t_self));
 	tuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tuple->t_len = ItemIdGetLength(lp);
-	tuple->t_tableOid = RelationGetRelid(relation);
 
 	/*
 	 * check tuple visibility, then release lock
 	 */
-	valid = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
+	valid = HeapTupleSatisfiesVisibility(tuple, RelationGetRelid(relation), snapshot, buffer);
 
 	if (valid)
 		PredicateLockTuple(relation, tuple, snapshot);
@@ -1596,7 +1593,6 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 
 		heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
 		heapTuple->t_len = ItemIdGetLength(lp);
-		heapTuple->t_tableOid = RelationGetRelid(relation);
 		ItemPointerSetOffsetNumber(&heapTuple->t_self, offnum);
 
 		/*
@@ -1633,7 +1629,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 			ItemPointerSet(&(heapTuple->t_self), BufferGetBlockNumber(buffer), offnum);
 
 			/* If it's visible per the snapshot, we must return it */
-			valid = HeapTupleSatisfiesVisibility(heapTuple, snapshot, buffer);
+			valid = HeapTupleSatisfiesVisibility(heapTuple, RelationGetRelid(relation), snapshot, buffer);
 			CheckForSerializableConflictOut(valid, relation, heapTuple,
 											buffer, snapshot);
 			/* reset to original, non-redirected, tid */
@@ -1790,7 +1786,6 @@ heap_get_latest_tid(Relation relation,
 		tp.t_self = ctid;
 		tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 		tp.t_len = ItemIdGetLength(lp);
-		tp.t_tableOid = RelationGetRelid(relation);
 
 		/*
 		 * After following a t_ctid link, we might arrive at an unrelated
@@ -1807,7 +1802,7 @@ heap_get_latest_tid(Relation relation,
 		 * Check tuple visibility; if visible, set it as the new result
 		 * candidate.
 		 */
-		valid = HeapTupleSatisfiesVisibility(&tp, snapshot, buffer);
+		valid = HeapTupleSatisfiesVisibility(&tp, RelationGetRelid(relation), snapshot, buffer);
 		CheckForSerializableConflictOut(valid, relation, &tp, buffer, snapshot);
 		if (valid)
 			*tid = ctid;
@@ -2160,7 +2155,6 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 
 	HeapTupleHeaderSetCmin(tup->t_data, cid);
 	HeapTupleHeaderSetXmax(tup->t_data, 0); /* for cleanliness */
-	tup->t_tableOid = RelationGetRelid(relation);
 
 	/*
 	 * If the new tuple is too big for storage or contains already toasted
@@ -2218,9 +2212,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	{
 		heaptuples[i] = heap_prepare_insert(relation, ExecFetchSlotHeapTuple(slots[i], true, NULL),
 											xid, cid, options);
-
-		if (slots[i]->tts_tableOid != InvalidOid)
-			heaptuples[i]->t_tableOid = slots[i]->tts_tableOid;
 	}
 
 	/*
@@ -2610,7 +2601,6 @@ heap_delete(Relation relation, ItemPointer tid,
 	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
 	Assert(ItemIdIsNormal(lp));
 
-	tp.t_tableOid = RelationGetRelid(relation);
 	tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tp.t_len = ItemIdGetLength(lp);
 	tp.t_self = *tid;
@@ -2727,7 +2717,7 @@ l1:
 	if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated)
 	{
 		/* Perform additional check for transaction-snapshot mode RI updates */
-		if (!HeapTupleSatisfiesVisibility(&tp, crosscheck, buffer))
+		if (!HeapTupleSatisfiesVisibility(&tp, RelationGetRelid(relation), crosscheck, buffer))
 			result = HeapTupleUpdated;
 	}
 
@@ -3130,14 +3120,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	 * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
 	 * properly.
 	 */
-	oldtup.t_tableOid = RelationGetRelid(relation);
 	oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	oldtup.t_len = ItemIdGetLength(lp);
 	oldtup.t_self = *otid;
 
-	/* the new tuple is ready, except for this: */
-	newtup->t_tableOid = RelationGetRelid(relation);
-
 	/* Determine columns modified by the update. */
 	modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
 												  &oldtup, newtup);
@@ -3368,7 +3354,7 @@ l2:
 	if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated)
 	{
 		/* Perform additional check for transaction-snapshot mode RI updates */
-		if (!HeapTupleSatisfiesVisibility(&oldtup, crosscheck, buffer))
+		if (!HeapTupleSatisfiesVisibility(&oldtup, RelationGetRelid(relation), crosscheck, buffer))
 			result = HeapTupleUpdated;
 	}
 
@@ -4122,7 +4108,6 @@ heap_lock_tuple(Relation relation, ItemPointer tid,
 
 	tuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tuple->t_len = ItemIdGetLength(lp);
-	tuple->t_tableOid = RelationGetRelid(relation);
 	tuple->t_self = *tid;
 
 l3:
@@ -5676,7 +5661,6 @@ heap_abort_speculative(Relation relation, HeapTuple tuple)
 	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
 	Assert(ItemIdIsNormal(lp));
 
-	tp.t_tableOid = RelationGetRelid(relation);
 	tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tp.t_len = ItemIdGetLength(lp);
 	tp.t_self = *tid;
@@ -7508,7 +7492,6 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
 	HeapTupleHeader hdr = tup->t_data;
 
 	Assert(ItemPointerIsValid(&tup->t_self));
-	Assert(tup->t_tableOid != InvalidOid);
 
 	xlrec.top_xid = GetTopTransactionId();
 	xlrec.target_node = relation->rd_node;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 1f6bcf1c51..62c5f9fa9f 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -119,8 +119,6 @@ heapam_heap_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
 
 	/* Update the tuple with table oid */
 	slot->tts_tableOid = RelationGetRelid(relation);
-	if (slot->tts_tableOid != InvalidOid)
-		tuple->t_tableOid = slot->tts_tableOid;
 
 	/* Perform the insertion, and copy the resulting ItemPointer */
 	heap_insert(relation, tuple, cid, options, bistate);
@@ -139,8 +137,6 @@ heapam_heap_insert_speculative(Relation relation, TupleTableSlot *slot, CommandI
 
 	/* Update the tuple with table oid */
 	slot->tts_tableOid = RelationGetRelid(relation);
-	if (slot->tts_tableOid != InvalidOid)
-		tuple->t_tableOid = slot->tts_tableOid;
 
 	HeapTupleHeaderSetSpeculativeToken(tuple->t_data, specToken);
 
@@ -567,7 +563,9 @@ heapam_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot, Snapshot sna
 	 * Caller should be holding pin, but not lock.
 	 */
 	LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
-	res = HeapTupleSatisfiesVisibility(bslot->base.tuple, snapshot,
+	res = HeapTupleSatisfiesVisibility(bslot->base.tuple,
+									   RelationGetRelid(rel),
+									   snapshot,
 									   bslot->buffer);
 	LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
 
@@ -733,7 +731,6 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 
 		ItemPointerSet(&targtuple->t_self, scan->rs_cblock, scan->rs_cindex);
 
-		targtuple->t_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 		targtuple->t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
 		targtuple->t_len = ItemIdGetLength(itemid);
 
@@ -818,6 +815,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 		if (sample_it)
 		{
 			ExecStoreBufferHeapTuple(targtuple, slot, scan->rs_cbuf);
+			slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 			scan->rs_cindex++;
 
 			/* note that we leave the buffer locked here! */
@@ -1478,7 +1476,7 @@ heapam_index_build_range_scan(Relation heapRelation,
 		MemoryContextReset(econtext->ecxt_per_tuple_memory);
 
 		/* Set up for predicate or expression evaluation */
-		ExecStoreHeapTuple(heapTuple, slot, false);
+		ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(sscan->rs_rd), false);
 
 		/*
 		 * In a partial index, discard tuples that don't satisfy the
@@ -1732,7 +1730,7 @@ heapam_index_validate_scan(Relation heapRelation,
 			MemoryContextReset(econtext->ecxt_per_tuple_memory);
 
 			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
+			ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(sscan->rs_rd), false);
 
 			/*
 			 * In a partial index, discard tuples that don't satisfy the
@@ -1885,9 +1883,9 @@ heapam_scan_bitmap_pagescan(TableScanDesc sscan,
 				continue;
 			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
 			loctup.t_len = ItemIdGetLength(lp);
-			loctup.t_tableOid = scan->rs_scan.rs_rd->rd_id;
 			ItemPointerSet(&loctup.t_self, page, offnum);
-			valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
+			valid = HeapTupleSatisfiesVisibility(&loctup, 
+					RelationGetRelid(scan->rs_scan.rs_rd), snapshot, buffer);
 			if (valid)
 			{
 				scan->rs_vistuples[ntup++] = offnum;
@@ -1924,7 +1922,6 @@ heapam_scan_bitmap_pagescan_next(TableScanDesc sscan, TupleTableSlot *slot)
 
 	scan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
 	scan->rs_ctup.t_len = ItemIdGetLength(lp);
-	scan->rs_ctup.t_tableOid = scan->rs_scan.rs_rd->rd_id;
 	ItemPointerSet(&scan->rs_ctup.t_self, scan->rs_cblock, targoffset);
 
 	pgstat_count_heap_fetch(scan->rs_scan.rs_rd);
@@ -1936,6 +1933,7 @@ heapam_scan_bitmap_pagescan_next(TableScanDesc sscan, TupleTableSlot *slot)
 	ExecStoreBufferHeapTuple(&scan->rs_ctup,
 							 slot,
 							 scan->rs_cbuf);
+	slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 
 	scan->rs_cindex++;
 
@@ -1982,8 +1980,10 @@ SampleHeapTupleVisible(HeapScanDesc scan, Buffer buffer,
 	else
 	{
 		/* Otherwise, we have to check the tuple individually. */
-		return HeapTupleSatisfiesVisibility(tuple, scan->rs_scan.rs_snapshot,
-											buffer);
+		return HeapTupleSatisfiesVisibility(tuple, 
+				RelationGetRelid(scan->rs_scan.rs_rd), 
+				scan->rs_scan.rs_snapshot,
+				buffer);
 	}
 }
 
@@ -2132,6 +2132,7 @@ heapam_scan_sample_next_tuple(TableScanDesc sscan, struct SampleScanState *scans
 				LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 
 			ExecStoreBufferHeapTuple(tuple, slot, scan->rs_cbuf);
+			slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 
 			/* Count successfully-fetched tuples as heap fetches */
 			pgstat_count_heap_getnext(scan->rs_scan.rs_rd);
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index cf9aa5c126..8c298f08e4 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -176,7 +176,6 @@ HeapTupleSatisfiesSelf(HeapTuple htup, Snapshot snapshot, Buffer buffer)
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -367,7 +366,6 @@ HeapTupleSatisfiesToast(HeapTuple htup, Snapshot snapshot,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -461,7 +459,6 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -754,7 +751,6 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	snapshot->xmin = snapshot->xmax = InvalidTransactionId;
 	snapshot->speculativeToken = 0;
@@ -978,7 +974,6 @@ HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -1179,7 +1174,6 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	/*
 	 * Has inserting transaction committed?
@@ -1434,7 +1428,6 @@ HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin)
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	/*
 	 * If the inserting transaction is marked invalid, then it aborted, and
@@ -1550,7 +1543,7 @@ TransactionIdInArray(TransactionId xid, TransactionId *xip, Size num)
  * complicated than when dealing "only" with the present.
  */
 static bool
-HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
+HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Oid relid, Snapshot snapshot,
 							   Buffer buffer)
 {
 	HeapTupleHeader tuple = htup->t_data;
@@ -1558,7 +1551,6 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 	TransactionId xmax = HeapTupleHeaderGetRawXmax(tuple);
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	/* inserting transaction aborted */
 	if (HeapTupleHeaderXminInvalid(tuple))
@@ -1579,7 +1571,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 		 * values externally.
 		 */
 		resolved = ResolveCminCmaxDuringDecoding(HistoricSnapshotGetTupleCids(), snapshot,
-												 htup, buffer,
+												 htup, relid, buffer,
 												 &cmin, &cmax);
 
 		if (!resolved)
@@ -1650,7 +1642,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 
 		/* Lookup actual cmin/cmax values */
 		resolved = ResolveCminCmaxDuringDecoding(HistoricSnapshotGetTupleCids(), snapshot,
-												 htup, buffer,
+												 htup, relid, buffer,
 												 &cmin, &cmax);
 
 		if (!resolved)
@@ -1698,8 +1690,10 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
  *	if so, the indicated buffer is marked dirty.
  */
 bool
-HeapTupleSatisfiesVisibility(HeapTuple tup, Snapshot snapshot, Buffer buffer)
+HeapTupleSatisfiesVisibility(HeapTuple tup, Oid relid, Snapshot snapshot, Buffer buffer)
 {
+	Assert(relid != InvalidOid);
+	
 	switch (snapshot->snapshot_type)
 	{
 		case SNAPSHOT_MVCC:
@@ -1718,7 +1712,7 @@ HeapTupleSatisfiesVisibility(HeapTuple tup, Snapshot snapshot, Buffer buffer)
 			return HeapTupleSatisfiesDirty(tup, snapshot, buffer);
 			break;
 		case SNAPSHOT_HISTORIC_MVCC:
-			return HeapTupleSatisfiesHistoricMVCC(tup, snapshot, buffer);
+			return HeapTupleSatisfiesHistoricMVCC(tup, relid, snapshot, buffer);
 			break;
 		case SNAPSHOT_NON_VACUUMABLE:
 			return HeapTupleSatisfiesNonVacuumable(tup, snapshot, buffer);
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a3e51922d8..e09a9d7340 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -366,8 +366,6 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 				i;
 	HeapTupleData tup;
 
-	tup.t_tableOid = RelationGetRelid(relation);
-
 	rootlp = PageGetItemId(dp, rootoffnum);
 
 	/*
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 7ea964c493..257ae9761b 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1022,7 +1022,6 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 		result_tuple = (HeapTuple) palloc0(HEAPTUPLESIZE + new_tuple_len);
 		result_tuple->t_len = new_tuple_len;
 		result_tuple->t_self = newtup->t_self;
-		result_tuple->t_tableOid = newtup->t_tableOid;
 		new_data = (HeapTupleHeader) ((char *) result_tuple + HEAPTUPLESIZE);
 		result_tuple->t_data = new_data;
 
@@ -1123,7 +1122,6 @@ toast_flatten_tuple(HeapTuple tup, TupleDesc tupleDesc)
 	 * a syscache entry.
 	 */
 	new_tuple->t_self = tup->t_self;
-	new_tuple->t_tableOid = tup->t_tableOid;
 
 	new_tuple->t_data->t_choice = tup->t_data->t_choice;
 	new_tuple->t_data->t_ctid = tup->t_data->t_ctid;
@@ -1194,7 +1192,6 @@ toast_flatten_tuple_to_datum(HeapTupleHeader tup,
 	/* Build a temporary HeapTuple control structure */
 	tmptup.t_len = tup_len;
 	ItemPointerSetInvalid(&(tmptup.t_self));
-	tmptup.t_tableOid = InvalidOid;
 	tmptup.t_data = tup;
 
 	/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 85680ab6db..860ea42cba 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1003,7 +1003,6 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 			tuple.t_len = ItemIdGetLength(itemid);
-			tuple.t_tableOid = RelationGetRelid(onerel);
 
 			tupgone = false;
 
@@ -2236,7 +2235,6 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(rel);
 
 		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
 		{
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index f4a527b126..250c746971 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -510,7 +510,6 @@ systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
 	result = table_tuple_satisfies_snapshot(sysscan->heap_rel,
 											sysscan->slot,
 											freshsnap);
-
 	return result;
 }
 
diff --git a/src/backend/catalog/indexing.c b/src/backend/catalog/indexing.c
index 0c994122d8..7d443f8fe7 100644
--- a/src/backend/catalog/indexing.c
+++ b/src/backend/catalog/indexing.c
@@ -99,7 +99,7 @@ CatalogIndexInsert(CatalogIndexState indstate, HeapTuple heapTuple)
 	/* Need a slot to hold the tuple being examined */
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
 									&TTSOpsHeapTuple);
-	ExecStoreHeapTuple(heapTuple, slot, false);
+	ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(heapRelation), false);
 
 	/*
 	 * for each index, form and insert the index tuple
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index fb4384d556..a71d7b658e 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -760,7 +760,7 @@ compute_index_stats(Relation onerel, double totalrows,
 			ResetExprContext(econtext);
 
 			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
+			ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(onerel), false);
 
 			/* If index is partial, check predicate */
 			if (predicate != NULL)
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index eae2b09830..734b5165dd 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -2359,10 +2359,9 @@ ExecuteCallStmt(CallStmt *stmt, ParamListInfo params, bool atomic, DestReceiver
 
 		rettupdata.t_len = HeapTupleHeaderGetDatumLength(td);
 		ItemPointerSetInvalid(&(rettupdata.t_self));
-		rettupdata.t_tableOid = InvalidOid;
 		rettupdata.t_data = td;
 
-		slot = ExecStoreHeapTuple(&rettupdata, tstate->slot, false);
+		slot = ExecStoreHeapTuple(&rettupdata, tstate->slot, InvalidOid, false);
 		tstate->dest->receiveSlot(slot, tstate->dest);
 
 		end_tup_output(tstate);
diff --git a/src/backend/commands/schemacmds.c b/src/backend/commands/schemacmds.c
index 6cf94a3140..4492ae2b0e 100644
--- a/src/backend/commands/schemacmds.c
+++ b/src/backend/commands/schemacmds.c
@@ -355,7 +355,6 @@ AlterSchemaOwner_internal(HeapTuple tup, Relation rel, Oid newOwnerId)
 {
 	Form_pg_namespace nspForm;
 
-	Assert(tup->t_tableOid == NamespaceRelationId);
 	Assert(RelationGetRelid(rel) == NamespaceRelationId);
 
 	nspForm = (Form_pg_namespace) GETSTRUCT(tup);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index ced31bf552..ddc804af29 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2570,7 +2570,7 @@ ExecBRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 		}
 		if (newtuple != oldtuple)
 		{
-			ExecForceStoreHeapTuple(newtuple, slot);
+			ExecForceStoreHeapTuple(newtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
 			newtuple = ExecFetchSlotHeapTuple(slot, true, NULL);
 		}
 	}
@@ -2647,7 +2647,8 @@ ExecIRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 		}
 		if (oldtuple != newtuple)
 		{
-			ExecForceStoreHeapTuple(newtuple, LocTriggerData.tg_trigslot);
+			ExecForceStoreHeapTuple(newtuple, LocTriggerData.tg_trigslot,
+									RelationGetRelid(relinfo->ri_RelationDesc));
 			newtuple = ExecFetchSlotHeapTuple(slot, true, NULL);
 		}
 	}
@@ -2770,7 +2771,7 @@ ExecBRDeleteTriggers(EState *estate, EPQState *epqstate,
 	else
 	{
 		trigtuple = fdw_trigtuple;
-		ExecForceStoreHeapTuple(trigtuple, slot);
+		ExecForceStoreHeapTuple(trigtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
 	}
 
 	LocTriggerData.type = T_TriggerData;
@@ -2846,7 +2847,7 @@ ExecARDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
 		}
 		else
 		{
-			ExecForceStoreHeapTuple(fdw_trigtuple, slot);
+			ExecForceStoreHeapTuple(fdw_trigtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
 		}
 
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_DELETE,
@@ -2876,7 +2877,7 @@ ExecIRDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
 	LocTriggerData.tg_oldtable = NULL;
 	LocTriggerData.tg_newtable = NULL;
 
-	ExecForceStoreHeapTuple(trigtuple, slot);
+	ExecForceStoreHeapTuple(trigtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 	for (i = 0; i < trigdesc->numtriggers; i++)
 	{
@@ -3036,7 +3037,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 	}
 	else
 	{
-		ExecForceStoreHeapTuple(fdw_trigtuple, oldslot);
+		ExecForceStoreHeapTuple(fdw_trigtuple, oldslot, RelationGetRelid(relinfo->ri_RelationDesc));
 		trigtuple = fdw_trigtuple;
 	}
 
@@ -3082,7 +3083,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 		}
 
 		if (newtuple != oldtuple)
-			ExecForceStoreHeapTuple(newtuple, newslot);
+			ExecForceStoreHeapTuple(newtuple, newslot, RelationGetRelid(relinfo->ri_RelationDesc));
 	}
 	if (false && trigtuple != fdw_trigtuple && trigtuple != newtuple)
 		heap_freetuple(trigtuple);
@@ -3124,7 +3125,7 @@ ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 							   NULL,
 							   NULL);
 		else if (fdw_trigtuple != NULL)
-			ExecForceStoreHeapTuple(fdw_trigtuple, oldslot);
+			ExecForceStoreHeapTuple(fdw_trigtuple, oldslot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_UPDATE,
 							  true, oldslot, newslot, recheckIndexes,
@@ -3153,7 +3154,7 @@ ExecIRUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 	LocTriggerData.tg_oldtable = NULL;
 	LocTriggerData.tg_newtable = NULL;
 
-	ExecForceStoreHeapTuple(trigtuple, oldslot);
+	ExecForceStoreHeapTuple(trigtuple, oldslot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 	for (i = 0; i < trigdesc->numtriggers; i++)
 	{
@@ -3185,7 +3186,7 @@ ExecIRUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 			return false;		/* "do nothing" */
 
 		if (oldtuple != newtuple)
-			ExecForceStoreHeapTuple(newtuple, newslot);
+			ExecForceStoreHeapTuple(newtuple, newslot, RelationGetRelid(relinfo->ri_RelationDesc));
 	}
 
 	return true;
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index d793efdd9c..928052b1ae 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3003,7 +3003,6 @@ ExecEvalFieldStoreDeForm(ExprState *state, ExprEvalStep *op, ExprContext *econte
 		tuphdr = DatumGetHeapTupleHeader(tupDatum);
 		tmptup.t_len = HeapTupleHeaderGetDatumLength(tuphdr);
 		ItemPointerSetInvalid(&(tmptup.t_self));
-		tmptup.t_tableOid = InvalidOid;
 		tmptup.t_data = tuphdr;
 
 		heap_deform_tuple(&tmptup, tupDesc,
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 9afea31e6a..7c48d50784 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -387,7 +387,7 @@ tts_heap_copyslot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
 	tuple = ExecCopySlotHeapTuple(srcslot);
 	MemoryContextSwitchTo(oldcontext);
 
-	ExecStoreHeapTuple(tuple, dstslot, true);
+	ExecStoreHeapTuple(tuple, dstslot, srcslot->tts_tableOid, true);
 }
 
 static HeapTuple
@@ -1117,6 +1117,7 @@ MakeTupleTableSlot(TupleDesc tupleDesc,
 	slot->tts_tupleDescriptor = tupleDesc;
 	slot->tts_mcxt = CurrentMemoryContext;
 	slot->tts_nvalid = 0;
+	slot->tts_tableOid = InvalidOid;
 
 	if (tupleDesc != NULL)
 	{
@@ -1329,6 +1330,7 @@ ExecSetSlotDescriptor(TupleTableSlot *slot, /* slot to change */
 TupleTableSlot *
 ExecStoreHeapTuple(HeapTuple tuple,
 				   TupleTableSlot *slot,
+				   Oid relid,
 				   bool shouldFree)
 {
 	/*
@@ -1342,7 +1344,7 @@ ExecStoreHeapTuple(HeapTuple tuple,
 		elog(ERROR, "trying to store a heap tuple into wrong type of slot");
 	tts_heap_store_tuple(slot, tuple, shouldFree);
 
-	slot->tts_tableOid = tuple->t_tableOid;
+	slot->tts_tableOid = relid;
 
 	return slot;
 }
@@ -1364,6 +1366,8 @@ ExecStoreHeapTuple(HeapTuple tuple,
  *
  * If the target slot is not guaranteed to be TTSOpsBufferHeapTuple type slot,
  * use the, more expensive, ExecForceStoreHeapTuple().
+ *
+ * NOTE: Don't set the tts_tableOid from tuple t_tableOid.
  * --------------------------------
  */
 TupleTableSlot *
@@ -1383,8 +1387,6 @@ ExecStoreBufferHeapTuple(HeapTuple tuple,
 		elog(ERROR, "trying to store an on-disk heap tuple into wrong type of slot");
 	tts_buffer_heap_store_tuple(slot, tuple, buffer, false);
 
-	slot->tts_tableOid = tuple->t_tableOid;
-
 	return slot;
 }
 
@@ -1409,8 +1411,6 @@ ExecStorePinnedBufferHeapTuple(HeapTuple tuple,
 		elog(ERROR, "trying to store an on-disk heap tuple into wrong type of slot");
 	tts_buffer_heap_store_tuple(slot, tuple, buffer, true);
 
-	slot->tts_tableOid = tuple->t_tableOid;
-
 	return slot;
 }
 
@@ -1445,11 +1445,12 @@ ExecStoreMinimalTuple(MinimalTuple mtup,
  */
 void
 ExecForceStoreHeapTuple(HeapTuple tuple,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						Oid relid)
 {
 	if (TTS_IS_HEAPTUPLE(slot))
 	{
-		ExecStoreHeapTuple(tuple, slot, false);
+		ExecStoreHeapTuple(tuple, slot, relid, false);
 	}
 	else if (TTS_IS_BUFFERTUPLE(slot))
 	{
@@ -1462,6 +1463,7 @@ ExecForceStoreHeapTuple(HeapTuple tuple,
 		oldContext = MemoryContextSwitchTo(slot->tts_mcxt);
 		bslot->base.tuple = heap_copytuple(tuple);
 		MemoryContextSwitchTo(oldContext);
+		slot->tts_tableOid = relid;
 	}
 	else
 	{
@@ -1469,6 +1471,7 @@ ExecForceStoreHeapTuple(HeapTuple tuple,
 		heap_deform_tuple(tuple, slot->tts_tupleDescriptor,
 						  slot->tts_values, slot->tts_isnull);
 		ExecStoreVirtualTuple(slot);
+		slot->tts_tableOid = relid;
 	}
 }
 
@@ -1602,6 +1605,8 @@ ExecStoreAllNullTuple(TupleTableSlot *slot)
 HeapTuple
 ExecFetchSlotHeapTuple(TupleTableSlot *slot, bool materialize, bool *shouldFree)
 {
+	HeapTuple htup;
+
 	/*
 	 * sanity checks
 	 */
@@ -1616,14 +1621,18 @@ ExecFetchSlotHeapTuple(TupleTableSlot *slot, bool materialize, bool *shouldFree)
 	{
 		if (shouldFree)
 			*shouldFree = true;
-		return slot->tts_ops->copy_heap_tuple(slot);
+		htup = slot->tts_ops->copy_heap_tuple(slot);
 	}
 	else
 	{
 		if (shouldFree)
 			*shouldFree = false;
-		return slot->tts_ops->get_heap_tuple(slot);
+		htup = slot->tts_ops->get_heap_tuple(slot);
 	}
+
+	htup->t_tableOid = slot->tts_tableOid;
+
+	return htup;
 }
 
 /* --------------------------------
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 60928d3f80..b0bccb78ae 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1011,7 +1011,6 @@ GetAttributeByName(HeapTupleHeader tuple, const char *attname, bool *isNull)
 	 */
 	tmptup.t_len = HeapTupleHeaderGetDatumLength(tuple);
 	ItemPointerSetInvalid(&(tmptup.t_self));
-	tmptup.t_tableOid = InvalidOid;
 	tmptup.t_data = tuple;
 
 	result = heap_getattr(&tmptup,
@@ -1059,7 +1058,6 @@ GetAttributeByNum(HeapTupleHeader tuple,
 	 */
 	tmptup.t_len = HeapTupleHeaderGetDatumLength(tuple);
 	ItemPointerSetInvalid(&(tmptup.t_self));
-	tmptup.t_tableOid = InvalidOid;
 	tmptup.t_data = tuple;
 
 	result = heap_getattr(&tmptup,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 508c919574..36d86bfdae 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -1801,7 +1801,8 @@ agg_retrieve_direct(AggState *aggstate)
 				 * cleared from the slot.
 				 */
 				ExecForceStoreHeapTuple(aggstate->grp_firstTuple,
-								   firstSlot);
+								   firstSlot,
+								   InvalidOid);
 				aggstate->grp_firstTuple = NULL;	/* don't keep two pointers */
 
 				/* set up for first advance_aggregates call */
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 70a4e90a05..37bce93cc3 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -280,6 +280,7 @@ gather_getnext(GatherState *gatherstate)
 			{
 				ExecStoreHeapTuple(tup, /* tuple to store */
 								   fslot,	/* slot to store the tuple */
+								   InvalidOid,
 								   true);	/* pfree tuple when done with it */
 				return fslot;
 			}
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 8635865d1e..827578ca8d 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -703,6 +703,7 @@ gather_merge_readnext(GatherMergeState *gm_state, int reader, bool nowait)
 	ExecStoreHeapTuple(tup,			/* tuple to store */
 					   gm_state->gm_slots[reader],	/* slot in which to store
 													 * the tuple */
+					   InvalidOid,
 					   true);		/* pfree tuple when done with it */
 
 	return true;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 08527d527c..ab3e3a6c60 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -206,8 +206,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 			 */
 			Assert(slot->tts_tupleDescriptor->natts ==
 				   scandesc->xs_hitupdesc->natts);
-			ExecForceStoreHeapTuple(scandesc->xs_hitup, slot);
-			slot->tts_tableOid = RelationGetRelid(scandesc->heapRelation);
+			ExecForceStoreHeapTuple(scandesc->xs_hitup, slot,
+					RelationGetRelid(scandesc->heapRelation));
 		}
 		else if (scandesc->xs_itup)
 			StoreIndexTuple(slot, scandesc->xs_itup, scandesc->xs_itupdesc);
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index b8580a896a..6e26d16f35 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -247,8 +247,7 @@ IndexNextWithReorder(IndexScanState *node)
 				tuple = reorderqueue_pop(node);
 
 				/* Pass 'true', as the tuple in the queue is a palloc'd copy */
-				slot->tts_tableOid = RelationGetRelid(scandesc->heapRelation);
-				ExecStoreHeapTuple(tuple, slot, true);
+				ExecStoreHeapTuple(tuple, slot, RelationGetRelid(scandesc->heapRelation), true);
 				return slot;
 			}
 		}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 20e77db4a8..6440c4ed69 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -847,7 +847,7 @@ ldelete:;
 			slot = ExecTriggerGetReturnSlot(estate, resultRelInfo);
 			if (oldtuple != NULL)
 			{
-				ExecForceStoreHeapTuple(oldtuple, slot);
+				ExecForceStoreHeapTuple(oldtuple, slot, RelationGetRelid(resultRelationDesc));
 			}
 			else
 			{
@@ -2036,10 +2036,6 @@ ExecModifyTable(PlanState *pstate)
 					oldtupdata.t_len =
 						HeapTupleHeaderGetDatumLength(oldtupdata.t_data);
 					ItemPointerSetInvalid(&(oldtupdata.t_self));
-					/* Historically, view triggers see invalid t_tableOid. */
-					oldtupdata.t_tableOid =
-						(relkind == RELKIND_VIEW) ? InvalidOid :
-						RelationGetRelid(resultRelInfo->ri_RelationDesc);
 
 					oldtuple = &oldtupdata;
 				}
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 5d8c8b8b02..1c063f6a7b 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -269,6 +269,7 @@ setop_retrieve_direct(SetOpState *setopstate)
 		 */
 		ExecStoreHeapTuple(setopstate->grp_firstTuple,
 						   resultTupleSlot,
+						   InvalidOid,
 						   true);
 		setopstate->grp_firstTuple = NULL;	/* don't keep two pointers */
 
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 94a53e0e3f..0fb406c64f 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -870,7 +870,6 @@ SPI_modifytuple(Relation rel, HeapTuple tuple, int natts, int *attnum,
 		 */
 		mtuple->t_data->t_ctid = tuple->t_data->t_ctid;
 		mtuple->t_self = tuple->t_self;
-		mtuple->t_tableOid = tuple->t_tableOid;
 	}
 	else
 	{
diff --git a/src/backend/executor/tqueue.c b/src/backend/executor/tqueue.c
index 6e2eaa5dcf..3ebf1e347e 100644
--- a/src/backend/executor/tqueue.c
+++ b/src/backend/executor/tqueue.c
@@ -208,7 +208,6 @@ TupleQueueReaderNext(TupleQueueReader *reader, bool nowait, bool *done)
 	 * (which had better be sufficiently aligned).
 	 */
 	ItemPointerSetInvalid(&htup.t_self);
-	htup.t_tableOid = InvalidOid;
 	htup.t_len = nbytes;
 	htup.t_data = data;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index eec3a22842..f26dac0f70 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -940,12 +940,6 @@ DecodeMultiInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			/* not a disk based tuple */
 			ItemPointerSetInvalid(&tuple->tuple.t_self);
 
-			/*
-			 * We can only figure this out after reassembling the
-			 * transactions.
-			 */
-			tuple->tuple.t_tableOid = InvalidOid;
-
 			tuple->tuple.t_len = datalen + SizeofHeapTupleHeader;
 
 			memset(header, 0, SizeofHeapTupleHeader);
@@ -1033,9 +1027,6 @@ DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tuple)
 	/* not a disk based tuple */
 	ItemPointerSetInvalid(&tuple->tuple.t_self);
 
-	/* we can only figure this out after reassembling the transactions */
-	tuple->tuple.t_tableOid = InvalidOid;
-
 	/* data is not stored aligned, copy to aligned storage */
 	memcpy((char *) &xlhdr,
 		   data,
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index a49e226967..a2a3c4760e 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3482,7 +3482,7 @@ UpdateLogicalMappings(HTAB *tuplecid_data, Oid relid, Snapshot snapshot)
 bool
 ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data,
 							  Snapshot snapshot,
-							  HeapTuple htup, Buffer buffer,
+							  HeapTuple htup, Oid relid, Buffer buffer,
 							  CommandId *cmin, CommandId *cmax)
 {
 	ReorderBufferTupleCidKey key;
@@ -3524,7 +3524,7 @@ restart:
 	 */
 	if (ent == NULL && !updated_mapping)
 	{
-		UpdateLogicalMappings(tuplecid_data, htup->t_tableOid, snapshot);
+		UpdateLogicalMappings(tuplecid_data, relid, snapshot);
 		/* now check but don't update for a mapping again */
 		updated_mapping = true;
 		goto restart;
diff --git a/src/backend/utils/adt/expandedrecord.c b/src/backend/utils/adt/expandedrecord.c
index 9971abd71f..a49cf9b467 100644
--- a/src/backend/utils/adt/expandedrecord.c
+++ b/src/backend/utils/adt/expandedrecord.c
@@ -610,7 +610,6 @@ make_expanded_record_from_datum(Datum recorddatum, MemoryContext parentcontext)
 
 	tmptup.t_len = HeapTupleHeaderGetDatumLength(tuphdr);
 	ItemPointerSetInvalid(&(tmptup.t_self));
-	tmptup.t_tableOid = InvalidOid;
 	tmptup.t_data = tuphdr;
 
 	oldcxt = MemoryContextSwitchTo(objcxt);
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index dd88c09e6d..314777909d 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -3147,7 +3147,6 @@ populate_record(TupleDesc tupdesc,
 		/* Build a temporary HeapTuple control structure */
 		tuple.t_len = HeapTupleHeaderGetDatumLength(defaultval);
 		ItemPointerSetInvalid(&(tuple.t_self));
-		tuple.t_tableOid = InvalidOid;
 		tuple.t_data = defaultval;
 
 		/* Break down the tuple into fields */
@@ -3546,7 +3545,6 @@ populate_recordset_record(PopulateRecordsetState *state, JsObject *obj)
 	/* ok, save into tuplestore */
 	tuple.t_len = HeapTupleHeaderGetDatumLength(tuphead);
 	ItemPointerSetInvalid(&(tuple.t_self));
-	tuple.t_tableOid = InvalidOid;
 	tuple.t_data = tuphead;
 
 	tuplestore_puttuple(state->tuple_store, &tuple);
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index 0467f97dc3..2fd3f49e7e 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -324,7 +324,6 @@ record_out(PG_FUNCTION_ARGS)
 	/* Build a temporary HeapTuple control structure */
 	tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 	ItemPointerSetInvalid(&(tuple.t_self));
-	tuple.t_tableOid = InvalidOid;
 	tuple.t_data = rec;
 
 	/*
@@ -671,7 +670,6 @@ record_send(PG_FUNCTION_ARGS)
 	/* Build a temporary HeapTuple control structure */
 	tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 	ItemPointerSetInvalid(&(tuple.t_self));
-	tuple.t_tableOid = InvalidOid;
 	tuple.t_data = rec;
 
 	/*
@@ -821,11 +819,9 @@ record_cmp(FunctionCallInfo fcinfo)
 	/* Build temporary HeapTuple control structures */
 	tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
 	ItemPointerSetInvalid(&(tuple1.t_self));
-	tuple1.t_tableOid = InvalidOid;
 	tuple1.t_data = record1;
 	tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
 	ItemPointerSetInvalid(&(tuple2.t_self));
-	tuple2.t_tableOid = InvalidOid;
 	tuple2.t_data = record2;
 
 	/*
@@ -1063,11 +1059,9 @@ record_eq(PG_FUNCTION_ARGS)
 	/* Build temporary HeapTuple control structures */
 	tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
 	ItemPointerSetInvalid(&(tuple1.t_self));
-	tuple1.t_tableOid = InvalidOid;
 	tuple1.t_data = record1;
 	tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
 	ItemPointerSetInvalid(&(tuple2.t_self));
-	tuple2.t_tableOid = InvalidOid;
 	tuple2.t_data = record2;
 
 	/*
@@ -1326,11 +1320,9 @@ record_image_cmp(FunctionCallInfo fcinfo)
 	/* Build temporary HeapTuple control structures */
 	tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
 	ItemPointerSetInvalid(&(tuple1.t_self));
-	tuple1.t_tableOid = InvalidOid;
 	tuple1.t_data = record1;
 	tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
 	ItemPointerSetInvalid(&(tuple2.t_self));
-	tuple2.t_tableOid = InvalidOid;
 	tuple2.t_data = record2;
 
 	/*
@@ -1570,11 +1562,9 @@ record_image_eq(PG_FUNCTION_ARGS)
 	/* Build temporary HeapTuple control structures */
 	tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
 	ItemPointerSetInvalid(&(tuple1.t_self));
-	tuple1.t_tableOid = InvalidOid;
 	tuple1.t_data = record1;
 	tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
 	ItemPointerSetInvalid(&(tuple2.t_self));
-	tuple2.t_tableOid = InvalidOid;
 	tuple2.t_data = record2;
 
 	/*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 258a1d64cc..3e165af707 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1846,7 +1846,6 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 								MAXIMUM_ALIGNOF + dtp->t_len);
 		ct->tuple.t_len = dtp->t_len;
 		ct->tuple.t_self = dtp->t_self;
-		ct->tuple.t_tableOid = dtp->t_tableOid;
 		ct->tuple.t_data = (HeapTupleHeader)
 			MAXALIGN(((char *) ct) + sizeof(CatCTup));
 		/* copy tuple contents */
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 60b96df8f9..a3ed15214a 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -3792,11 +3792,11 @@ comparetup_cluster(const SortTuple *a, const SortTuple *b,
 
 		ecxt_scantuple = GetPerTupleExprContext(state->estate)->ecxt_scantuple;
 
-		ExecStoreHeapTuple(ltup, ecxt_scantuple, false);
+		ExecStoreHeapTuple(ltup, ecxt_scantuple, InvalidOid, false);
 		FormIndexDatum(state->indexInfo, ecxt_scantuple, state->estate,
 					   l_index_values, l_index_isnull);
 
-		ExecStoreHeapTuple(rtup, ecxt_scantuple, false);
+		ExecStoreHeapTuple(rtup, ecxt_scantuple, InvalidOid, false);
 		FormIndexDatum(state->indexInfo, ecxt_scantuple, state->estate,
 					   r_index_values, r_index_isnull);
 
@@ -3926,8 +3926,7 @@ readtup_cluster(Tuplesortstate *state, SortTuple *stup,
 	tuple->t_len = t_len;
 	LogicalTapeReadExact(state->tapeset, tapenum,
 						 &tuple->t_self, sizeof(ItemPointerData));
-	/* We don't currently bother to reconstruct t_tableOid */
-	tuple->t_tableOid = InvalidOid;
+
 	/* Read in the tuple body */
 	LogicalTapeReadExact(state->tapeset, tapenum,
 						 tuple->t_data, tuple->t_len);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2bb170e708..7c9c4f5e98 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -211,7 +211,7 @@ extern void heap_vacuum_rel(Relation onerel, int options,
 				struct VacuumParams *params, BufferAccessStrategy bstrategy);
 
 /* in heap/heapam_visibility.c */
-extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
+extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Oid relid, Snapshot snapshot,
 										 Buffer buffer);
 extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTuple stup, CommandId curcid,
 						 Buffer buffer);
@@ -231,6 +231,7 @@ struct HTAB;
 extern bool ResolveCminCmaxDuringDecoding(struct HTAB *tuplecid_data,
 							  Snapshot snapshot,
 							  HeapTuple htup,
+							  Oid relid,
 							  Buffer buffer,
 							  CommandId *cmin, CommandId *cmax);
 
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index fe8630e67a..996153e03f 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -306,9 +306,8 @@ extern TupleTableSlot *MakeSingleTupleTableSlot(TupleDesc tupdesc,
 extern void ExecDropSingleTupleTableSlot(TupleTableSlot *slot);
 extern void ExecSetSlotDescriptor(TupleTableSlot *slot, TupleDesc tupdesc);
 extern TupleTableSlot *ExecStoreHeapTuple(HeapTuple tuple,
-				   TupleTableSlot *slot,
-				   bool shouldFree);
-extern void ExecForceStoreHeapTuple(HeapTuple tuple, TupleTableSlot *slot);
+				   TupleTableSlot *slot, Oid relid, bool shouldFree);
+extern void ExecForceStoreHeapTuple(HeapTuple tuple, TupleTableSlot *slot, Oid relid);
 /* FIXME: Remove */
 extern void ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot);
 extern TupleTableSlot *ExecStoreBufferHeapTuple(HeapTuple tuple,
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 5c6dbe4c5f..1a61eb85b1 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -7236,7 +7236,6 @@ deconstruct_composite_datum(Datum value, HeapTupleData *tmptup)
 	/* Build a temporary HeapTuple control structure */
 	tmptup->t_len = HeapTupleHeaderGetDatumLength(td);
 	ItemPointerSetInvalid(&(tmptup->t_self));
-	tmptup->t_tableOid = InvalidOid;
 	tmptup->t_data = td;
 
 	/* Extract rowtype info and find a tupdesc */
@@ -7405,7 +7404,6 @@ exec_move_row_from_datum(PLpgSQL_execstate *estate,
 		/* Build a temporary HeapTuple control structure */
 		tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
 		ItemPointerSetInvalid(&(tmptup.t_self));
-		tmptup.t_tableOid = InvalidOid;
 		tmptup.t_data = td;
 
 		/* Extract rowtype info */
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index 70727286ca..924bd85ce1 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -524,7 +524,6 @@ make_tuple_indirect(PG_FUNCTION_ARGS)
 	/* Build a temporary HeapTuple control structure */
 	tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 	ItemPointerSetInvalid(&(tuple.t_self));
-	tuple.t_tableOid = InvalidOid;
 	tuple.t_data = rec;
 
 	values = (Datum *) palloc(ncolumns * sizeof(Datum));
-- 
2.18.0.windows.1

#81Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#80)
Re: Pluggable Storage - Andres's take

Hi,

Thanks!

On 2019-01-22 11:51:57 +1100, Haribabu Kommi wrote:

Attached is the patch removing scan_update_snapshot,
along with the rebased patch reducing the use of t_tableOid.

I'll soon look at the latter.

- consider removing table_gimmegimmeslot()
- add substantial docs for every callback

Will work on the above two.

I think it's easier if I do the first, because I can just do it while
rebasing, reducing unnecessary conflicts.

While I saw an initial attempt at writing sgml docs for the table AM
API, I'm not convinced that's the best approach. I think it might make
more sense to have high-level docs in sgml, but then do all the
per-callback docs in tableam.h.

OK, I will update the sgml docs accordingly.
The index AM has per-callback docs in the sgml; should those be refactored as well?

I don't think it's a good idea to tackle the index docs at the same time
- this patchset is already humongously large...

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 62c5f9fa9f..3dc1444739 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2308,7 +2308,6 @@ static const TableAmRoutine heapam_methods = {
.scan_begin = heap_beginscan,
.scan_end = heap_endscan,
.scan_rescan = heap_rescan,
-	.scan_update_snapshot = heap_update_snapshot,
.scan_getnextslot = heap_getnextslot,
.parallelscan_estimate = table_block_parallelscan_estimate,
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 59061c746b..b48ab5036c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -954,5 +954,9 @@ ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node,
node->pstate = pstate;
snapshot = RestoreSnapshot(pstate->phs_snapshot_data);
-	table_scan_update_snapshot(node->ss.ss_currentScanDesc, snapshot);
+	Assert(IsMVCCSnapshot(snapshot));
+
+	RegisterSnapshot(snapshot);
+	node->ss.ss_currentScanDesc->rs_snapshot = snapshot;
+	node->ss.ss_currentScanDesc->rs_temp_snap = true;
}

I was rather thinking that we'd just move this logic into
table_scan_update_snapshot(), without it invoking a callback.
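
Concretely, a sketch of that shape (the follow-up patch below ends up
implementing essentially this; it is shown here only to illustrate the
suggestion):

    /* tableam.c: a plain helper, no per-AM callback involved */
    void
    table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot)
    {
        Assert(IsMVCCSnapshot(snapshot));

        RegisterSnapshot(snapshot);
        scan->rs_snapshot = snapshot;
        scan->rs_temp_snap = true;
    }

    /* nodeBitmapHeapscan.c: the worker init path then stays a single call */
    snapshot = RestoreSnapshot(pstate->phs_snapshot_data);
    table_scan_update_snapshot(node->ss.ss_currentScanDesc, snapshot);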

Greetings,

Andres Freund

#82Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#81)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Tue, Jan 22, 2019 at 12:15 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

Thanks!

On 2019-01-22 11:51:57 +1100, Haribabu Kommi wrote:

Attached is the patch removing scan_update_snapshot,
along with the rebased patch reducing the use of t_tableOid.

I'll soon look at the latter.

Thanks.

- consider removing table_gimmegimmeslot()
- add substantial docs for every callback

Will work on the above two.

I think it's easier if I do the first, because I can just do it while
rebasing, reducing unnecessary conflicts.

OK. I will work on the doc changes.

While I saw an initial attempt at writing sgml docs for the table AM
API, I'm not convinced that's the best approach. I think it might make
more sense to have high-level docs in sgml, but then do all the
per-callback docs in tableam.h.

OK, I will update the sgml docs accordingly.
The index AM has per-callback docs in the sgml; should those be refactored as well?

I don't think it's a good idea to tackle the index docs at the same time
- this patchset is already humongously large...

OK.

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 62c5f9fa9f..3dc1444739 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2308,7 +2308,6 @@ static const TableAmRoutine heapam_methods = {
.scan_begin = heap_beginscan,
.scan_end = heap_endscan,
.scan_rescan = heap_rescan,
-     .scan_update_snapshot = heap_update_snapshot,
.scan_getnextslot = heap_getnextslot,

.parallelscan_estimate = table_block_parallelscan_estimate,
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 59061c746b..b48ab5036c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -954,5 +954,9 @@ ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node,
node->pstate = pstate;

snapshot = RestoreSnapshot(pstate->phs_snapshot_data);
-     table_scan_update_snapshot(node->ss.ss_currentScanDesc, snapshot);
+     Assert(IsMVCCSnapshot(snapshot));
+
+     RegisterSnapshot(snapshot);
+     node->ss.ss_currentScanDesc->rs_snapshot = snapshot;
+     node->ss.ss_currentScanDesc->rs_temp_snap = true;
}

I was rather thinking that we'd just move this logic into
table_scan_update_snapshot(), without it invoking a callback.

OK. Changed accordingly.
I moved the table_scan_update_snapshot() function into tableam.c, though,
to avoid having to include the extra header snapmgr.h in tableam.h.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0002-Removal-of-scan_update_snapshot-callback.patch (application/octet-stream)
From 71ecc03b10a2877172d7a46dd4ef756bbe5ab879 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Tue, 22 Jan 2019 11:29:20 +1100
Subject: [PATCH] Removal of scan_update_snapshot callback

The snapshot is available in the TableScanDesc structure
itself, so it can be accessed directly; no callback is
needed.
---
 src/backend/access/heap/heapam.c         | 18 ------------------
 src/backend/access/heap/heapam_handler.c |  1 -
 src/backend/access/table/tableam.c       | 13 +++++++++++++
 src/include/access/heapam.h              |  1 -
 src/include/access/tableam.h             | 16 ++++++----------
 5 files changed, 19 insertions(+), 30 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7f594b3e4f..6655a95433 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1252,24 +1252,6 @@ heap_endscan(TableScanDesc sscan)
 	pfree(scan);
 }
 
-/* ----------------
- *		heap_update_snapshot
- *
- *		Update snapshot info in heap scan descriptor.
- * ----------------
- */
-void
-heap_update_snapshot(TableScanDesc sscan, Snapshot snapshot)
-{
-	HeapScanDesc scan = (HeapScanDesc) sscan;
-
-	Assert(IsMVCCSnapshot(snapshot));
-
-	RegisterSnapshot(snapshot);
-	scan->rs_scan.rs_snapshot = snapshot;
-	scan->rs_scan.rs_temp_snap = true;
-}
-
 /* ----------------
  *		heap_getnext	- retrieve next tuple in scan
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 62c5f9fa9f..3dc1444739 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2308,7 +2308,6 @@ static const TableAmRoutine heapam_methods = {
 	.scan_begin = heap_beginscan,
 	.scan_end = heap_endscan,
 	.scan_rescan = heap_rescan,
-	.scan_update_snapshot = heap_update_snapshot,
 	.scan_getnextslot = heap_getnextslot,
 
 	.parallelscan_estimate = table_block_parallelscan_estimate,
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index f579fb0d71..a2da7b7809 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -394,3 +394,16 @@ table_block_parallelscan_nextpage(Relation rel, ParallelBlockTableScanDesc pbsca
 
 	return page;
 }
+
+/*
+ * Update snapshot info in table scan descriptor.
+ */
+void
+table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot)
+{
+	Assert(IsMVCCSnapshot(snapshot));
+
+	RegisterSnapshot(snapshot);
+	scan->rs_snapshot = snapshot;
+	scan->rs_temp_snap = true;
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 7c9c4f5e98..dd67e7e270 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -182,7 +182,6 @@ extern void simple_heap_update(Relation relation, ItemPointer otid,
 				   HeapTuple tup);
 
 extern void heap_sync(Relation relation);
-extern void heap_update_snapshot(TableScanDesc scan, Snapshot snapshot);
 
 extern TransactionId heap_compute_xid_horizon_for_tuples(Relation rel,
 														 ItemPointerData *items,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 428ff90cad..4aa4369366 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -89,7 +89,6 @@ typedef struct TableAmRoutine
 	void		(*scan_end) (TableScanDesc scan);
 	void		(*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key, bool set_params,
 								bool allow_strat, bool allow_sync, bool allow_pagemode);
-	void		(*scan_update_snapshot) (TableScanDesc scan, Snapshot snapshot);
 	TupleTableSlot *(*scan_getnextslot) (TableScanDesc scan,
 										 ScanDirection direction, TupleTableSlot *slot);
 
@@ -389,15 +388,6 @@ table_rescan_set_params(TableScanDesc scan, struct ScanKeyData *key,
 										 allow_pagemode);
 }
 
-/*
- * Update snapshot info in heap scan descriptor.
- */
-static inline void
-table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot)
-{
-	scan->rs_rd->rd_tableam->scan_update_snapshot(scan, snapshot);
-}
-
 static inline TupleTableSlot *
 table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot)
 {
@@ -799,6 +789,12 @@ extern BlockNumber table_block_parallelscan_nextpage(Relation rel, ParallelBlock
 extern void table_block_parallelscan_startblock_init(Relation rel, ParallelBlockTableScanDesc pbscan);
 
 
+/* ----------------------------------------------------------------------------
+ * Helper function to update the snapshot of the scan descriptor
+ * ----------------------------------------------------------------------------
+ */
+extern void table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot);
+
 /* ----------------------------------------------------------------------------
  * Functions in tableamapi.c
  * ----------------------------------------------------------------------------
-- 
2.18.0.windows.1

#83Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Haribabu Kommi (#82)
Re: Pluggable Storage - Andres's take

On Mon, Jan 21, 2019 at 3:01 AM Andres Freund <andres@anarazel.de> wrote:

The patchset is now pretty granularly split into individual pieces.

Wow, thanks!

On Mon, Jan 21, 2019 at 9:33 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

Regression tests that use \d+ to show the table details might
not be interested specifically in table access method. But these will
fail if run with a modified default access method.

I see your point, but if a test is not interested specifically in a table am,
then I guess it wouldn't use a custom table am in the first place, right?
Anyway, I don't have a strong opinion here, so if everyone agrees that
HIDE_TABLEAM should show/hide the access method unconditionally, I'm fine
with that.

#84Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Dmitry Dolgov (#83)
Re: Pluggable Storage - Andres's take

On Tue, 22 Jan 2019 at 15:29, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Mon, Jan 21, 2019 at 9:33 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

Regression tests that use \d+ to show the table details might
not be interested specifically in table access method. But these will
fail if run with a modified default access method.

I see your point, but if a test is not interested specifically in a table am,
then I guess it wouldn't use a custom table am in the first place, right?

Right, it wouldn't use a custom table am. My point is that, even without
a custom table am, the test would fail when the regression suite runs
with a changed default access method, because the expected output file
records only one particular am value.
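
To make that concrete, a hypothetical run (the AM name is invented for
illustration): the expected file records one specific footer, so \d+ diffs
as soon as the suite runs under another default:

    SET default_table_access_method = 'heap2';  -- hypothetical custom AM
    CREATE TABLE t (a int);                     -- no explicit USING clause
    \d+ t
    ...
    Access method: heap2    <-- expected output was recorded under "heap"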

Anyway, I don't have a strong opinion here, so if everyone agrees that
HIDE_TABLEAM should show/hide the access method unconditionally, I'm fine with that.

Yeah, I agree it's subjective.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#85Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Amit Khandekar (#84)
2 attachment(s)
Re: Pluggable Storage - Andres's take

On Sun, Jan 20, 2019 at 6:17 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Fri, Jan 18, 2019 at 11:22 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

I believe you are going to add a new regression testcase for the change?

Yep.

So, here are the two patches for pg_dump/psql, with a few regression tests.
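
For instance, with the pg_dump patch a table created under a non-default AM
should be restored behind a statement of this form (illustrative object
names, mirroring what the archiver emits):

    SET default_table_access_method = heap2;

    CREATE TABLE public.tbl_heap2 (
        f1 integer,
        f2 character(100)
    );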

Attachments:

psql_describe_am_v3.patch (application/octet-stream)
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 0a181b01d9..f76c734a28 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1484,6 +1484,8 @@ describeOneTableDetails(const char *schemaname,
 		char	   *reloftype;
 		char		relpersistence;
 		char		relreplident;
+		char	   *relam;
+		bool	    relam_is_default;
 	}			tableinfo;
 	bool		show_column_details = false;
 
@@ -1503,9 +1505,11 @@ describeOneTableDetails(const char *schemaname,
 						  "c.relhastriggers, c.relrowsecurity, c.relforcerowsecurity, "
 						  "false AS relhasoids, %s, c.reltablespace, "
 						  "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, "
-						  "c.relpersistence, c.relreplident\n"
+						  "c.relpersistence, c.relreplident, am.amname,"
+						  "am.amname = current_setting('default_table_access_method') \n"
 						  "FROM pg_catalog.pg_class c\n "
 						  "LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
+						  "LEFT JOIN pg_catalog.pg_am am ON (c.relam = am.oid)\n"
 						  "WHERE c.oid = '%s';",
 						  (verbose ?
 						   "pg_catalog.array_to_string(c.reloptions || "
@@ -1656,6 +1660,17 @@ describeOneTableDetails(const char *schemaname,
 		*(PQgetvalue(res, 0, 11)) : 0;
 	tableinfo.relreplident = (pset.sversion >= 90400) ?
 		*(PQgetvalue(res, 0, 12)) : 'd';
+	if (pset.sversion >= 120000)
+	{
+		tableinfo.relam = PQgetisnull(res, 0, 13) ?
+			(char *) NULL : pg_strdup(PQgetvalue(res, 0, 13));
+		tableinfo.relam_is_default = strcmp(PQgetvalue(res, 0, 14), "t") == 0;
+	}
+	else
+	{
+		tableinfo.relam = NULL;
+		tableinfo.relam_is_default = false;
+	}
 	PQclear(res);
 	res = NULL;
 
@@ -3141,6 +3156,14 @@ describeOneTableDetails(const char *schemaname,
 		/* Tablespace info */
 		add_tablespace_footer(&cont, tableinfo.relkind, tableinfo.tablespace,
 							  true);
+
+		/* Access method info */
+		if (verbose && tableinfo.relam != NULL &&
+		   !(pset.hide_tableam && tableinfo.relam_is_default))
+		{
+			printfPQExpBuffer(&buf, _("Access method: %s"), tableinfo.relam);
+			printTableAddFooter(&cont, buf.data);
+		}
 	}
 
 	/* reloptions, if verbose */
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 176c85afd0..058233b348 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -127,6 +127,7 @@ typedef struct _psqlSettings
 	bool		quiet;
 	bool		singleline;
 	bool		singlestep;
+	bool		hide_tableam;
 	int			fetch_count;
 	int			histsize;
 	int			ignoreeof;
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index e7536a8a06..b757febcc5 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -1128,6 +1128,11 @@ show_context_hook(const char *newval)
 	return true;
 }
 
+static bool
+hide_tableam_hook(const char *newval)
+{
+	return ParseVariableBool(newval, "HIDE_TABLEAM", &pset.hide_tableam);
+}
 
 static void
 EstablishVariableSpace(void)
@@ -1191,4 +1196,7 @@ EstablishVariableSpace(void)
 	SetVariableHooks(pset.vars, "SHOW_CONTEXT",
 					 show_context_substitute_hook,
 					 show_context_hook);
+	SetVariableHooks(pset.vars, "HIDE_TABLEAM",
+					 bool_substitute_hook,
+					 hide_tableam_hook);
 }
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index 775b127121..4c9a55a08a 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -2773,6 +2773,44 @@ Argument data types | numeric
 Type                | func
 
 \pset tuples_only false
+-- check conditional tableam display
+-- Create a heap2 table am handler with heapam handler
+CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;
+CREATE TABLE tbl_heap2(f1 int, f2 char(100)) using heap2;
+CREATE TABLE tbl_heap(f1 int, f2 char(100)) using heap;
+\d+ tbl_heap2
+                                     Table "public.tbl_heap2"
+ Column |      Type      | Collation | Nullable | Default | Storage  | Stats target | Description 
+--------+----------------+-----------+----------+---------+----------+--------------+-------------
+ f1     | integer        |           |          |         | plain    |              | 
+ f2     | character(100) |           |          |         | extended |              | 
+Access method: heap2
+
+\d+ tbl_heap
+                                     Table "public.tbl_heap"
+ Column |      Type      | Collation | Nullable | Default | Storage  | Stats target | Description 
+--------+----------------+-----------+----------+---------+----------+--------------+-------------
+ f1     | integer        |           |          |         | plain    |              | 
+ f2     | character(100) |           |          |         | extended |              | 
+
+\set HIDE_TABLEAM off
+\d+ tbl_heap2
+                                     Table "public.tbl_heap2"
+ Column |      Type      | Collation | Nullable | Default | Storage  | Stats target | Description 
+--------+----------------+-----------+----------+---------+----------+--------------+-------------
+ f1     | integer        |           |          |         | plain    |              | 
+ f2     | character(100) |           |          |         | extended |              | 
+Access method: heap2
+
+\d+ tbl_heap
+                                     Table "public.tbl_heap"
+ Column |      Type      | Collation | Nullable | Default | Storage  | Stats target | Description 
+--------+----------------+-----------+----------+---------+----------+--------------+-------------
+ f1     | integer        |           |          |         | plain    |              | 
+ f2     | character(100) |           |          |         | extended |              | 
+Access method: heap
+
+\set HIDE_TABLEAM on
 -- test numericlocale (as best we can without control of psql's locale)
 \pset format aligned
 \pset expanded off
diff --git a/src/test/regress/pg_regress_main.c b/src/test/regress/pg_regress_main.c
index bd613e4fda..1b4bca704b 100644
--- a/src/test/regress/pg_regress_main.c
+++ b/src/test/regress/pg_regress_main.c
@@ -74,10 +74,11 @@ psql_start_test(const char *testname,
 	}
 
 	offset += snprintf(psql_cmd + offset, sizeof(psql_cmd) - offset,
-					   "\"%s%spsql\" -X -a -q -d \"%s\" < \"%s\" > \"%s\" 2>&1",
+					   "\"%s%spsql\" -X -a -q -d \"%s\" -v %s < \"%s\" > \"%s\" 2>&1",
 					   bindir ? bindir : "",
 					   bindir ? "/" : "",
 					   dblist->str,
+					   "HIDE_TABLEAM=\"on\"",
 					   infile,
 					   outfile);
 	if (offset >= sizeof(psql_cmd))
diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql
index 1bb2a6e16d..09ba7b6c14 100644
--- a/src/test/regress/sql/psql.sql
+++ b/src/test/regress/sql/psql.sql
@@ -448,6 +448,19 @@ select 1 where false;
 \df exp
 \pset tuples_only false
 
+-- check conditional tableam display
+
+-- Create a heap2 table am handler with heapam handler
+CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;
+CREATE TABLE tbl_heap2(f1 int, f2 char(100)) using heap2;
+CREATE TABLE tbl_heap(f1 int, f2 char(100)) using heap;
+\d+ tbl_heap2
+\d+ tbl_heap
+\set HIDE_TABLEAM off
+\d+ tbl_heap2
+\d+ tbl_heap
+\set HIDE_TABLEAM on
+
 -- test numericlocale (as best we can without control of psql's locale)
 
 \pset format aligned
pg_dump_access_method_v5.patch (application/octet-stream)
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 58bd3805f4..38f24ba1bf 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -85,6 +85,7 @@ static void _becomeUser(ArchiveHandle *AH, const char *user);
 static void _becomeOwner(ArchiveHandle *AH, TocEntry *te);
 static void _selectOutputSchema(ArchiveHandle *AH, const char *schemaName);
 static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
+static void _selectTableAccessMethod(ArchiveHandle *AH, const char *tableam);
 static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
 static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
 static void processSearchPathEntry(ArchiveHandle *AH, TocEntry *te);
@@ -1072,6 +1073,7 @@ ArchiveEntry(Archive *AHX,
 			 const char *namespace,
 			 const char *tablespace,
 			 const char *owner,
+			 const char *tableam,
 			 const char *desc, teSection section,
 			 const char *defn,
 			 const char *dropStmt, const char *copyStmt,
@@ -1099,6 +1101,7 @@ ArchiveEntry(Archive *AHX,
 	newToc->tag = pg_strdup(tag);
 	newToc->namespace = namespace ? pg_strdup(namespace) : NULL;
 	newToc->tablespace = tablespace ? pg_strdup(tablespace) : NULL;
+	newToc->tableam = tableam ? pg_strdup(tableam) : NULL;
 	newToc->owner = pg_strdup(owner);
 	newToc->desc = pg_strdup(desc);
 	newToc->defn = pg_strdup(defn);
@@ -2367,6 +2370,7 @@ _allocAH(const char *FileSpec, const ArchiveFormat fmt,
 	AH->currUser = NULL;		/* unknown */
 	AH->currSchema = NULL;		/* ditto */
 	AH->currTablespace = NULL;	/* ditto */
+	AH->currTableAm = NULL;	/* ditto */
 
 	AH->toc = (TocEntry *) pg_malloc0(sizeof(TocEntry));
 
@@ -2594,6 +2598,7 @@ WriteToc(ArchiveHandle *AH)
 		WriteStr(AH, te->namespace);
 		WriteStr(AH, te->tablespace);
 		WriteStr(AH, te->owner);
+		WriteStr(AH, te->tableam);
 		WriteStr(AH, "false");
 
 		/* Dump list of dependencies */
@@ -2696,6 +2701,9 @@ ReadToc(ArchiveHandle *AH)
 			te->tablespace = ReadStr(AH);
 
 		te->owner = ReadStr(AH);
+		if (AH->version >= K_VERS_1_14)
+			te->tableam = ReadStr(AH);
+
 		if (AH->version < K_VERS_1_9 || strcmp(ReadStr(AH), "true") == 0)
 			write_msg(modulename,
 					  "WARNING: restoring tables WITH OIDS is not supported anymore");
@@ -3288,6 +3296,9 @@ _reconnectToDB(ArchiveHandle *AH, const char *dbname)
 	if (AH->currTablespace)
 		free(AH->currTablespace);
 	AH->currTablespace = NULL;
+	if (AH->currTableAm)
+		free(AH->currTableAm);
+	AH->currTableAm = NULL;
 
 	/* re-establish fixed state */
 	_doSetFixedOutputState(AH);
@@ -3448,6 +3459,48 @@ _selectTablespace(ArchiveHandle *AH, const char *tablespace)
 	destroyPQExpBuffer(qry);
 }
 
+/*
+ * Set the proper default_table_access_method value for the table.
+ */
+static void
+_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
+{
+	PQExpBuffer cmd;
+	const char *want, *have;
+
+	have = AH->currTableAm;
+	want = tableam;
+
+	if (!want)
+		return;
+
+	if (have && strcmp(want, have) == 0)
+		return;
+
+	cmd = createPQExpBuffer();
+	appendPQExpBuffer(cmd, "SET default_table_access_method = %s;", fmtId(want));
+
+	if (RestoringToDB(AH))
+	{
+		PGresult   *res;
+
+		res = PQexec(AH->connection, cmd->data);
+
+		if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
+			warn_or_exit_horribly(AH, modulename,
+								  "could not set default_table_access_method: %s",
+								  PQerrorMessage(AH->connection));
+
+		PQclear(res);
+	}
+	else
+		ahprintf(AH, "%s\n\n", cmd->data);
+
+	destroyPQExpBuffer(cmd);
+
+	AH->currTableAm = pg_strdup(want);
+}
+
 /*
  * Extract an object description for a TOC entry, and append it to buf.
  *
@@ -3547,6 +3600,7 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
 	_becomeOwner(AH, te);
 	_selectOutputSchema(AH, te->namespace);
 	_selectTablespace(AH, te->tablespace);
+	_selectTableAccessMethod(AH, te->tableam);
 
 	/* Emit header comment for item */
 	if (!AH->noTocComments)
@@ -4021,6 +4075,9 @@ restore_toc_entries_prefork(ArchiveHandle *AH, TocEntry *pending_list)
 	if (AH->currTablespace)
 		free(AH->currTablespace);
 	AH->currTablespace = NULL;
+	if (AH->currTableAm)
+		free(AH->currTableAm);
+	AH->currTableAm = NULL;
 }
 
 /*
@@ -4816,6 +4873,7 @@ CloneArchive(ArchiveHandle *AH)
 	clone->currUser = NULL;
 	clone->currSchema = NULL;
 	clone->currTablespace = NULL;
+	clone->currTableAm = NULL;
 
 	/* savedPassword must be local in case we change it while connecting */
 	if (clone->savedPassword)
@@ -4906,6 +4964,8 @@ DeCloneArchive(ArchiveHandle *AH)
 		free(AH->currSchema);
 	if (AH->currTablespace)
 		free(AH->currTablespace);
+	if (AH->currTableAm)
+		free(AH->currTableAm);
 	if (AH->savedPassword)
 		free(AH->savedPassword);
 
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 306d2ceba9..c719fca0ad 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -94,10 +94,11 @@ typedef z_stream *z_streamp;
 													 * entries */
 #define K_VERS_1_13 MAKE_ARCHIVE_VERSION(1, 13, 0)	/* change search_path
 													 * behavior */
+#define K_VERS_1_14 MAKE_ARCHIVE_VERSION(1, 14, 0)	/* add tableam */
 
 /* Current archive version number (the format we can output) */
 #define K_VERS_MAJOR 1
-#define K_VERS_MINOR 13
+#define K_VERS_MINOR 14
 #define K_VERS_REV 0
 #define K_VERS_SELF MAKE_ARCHIVE_VERSION(K_VERS_MAJOR, K_VERS_MINOR, K_VERS_REV);
 
@@ -347,6 +348,7 @@ struct _archiveHandle
 	char	   *currUser;		/* current username, or NULL if unknown */
 	char	   *currSchema;		/* current schema, or NULL */
 	char	   *currTablespace; /* current tablespace, or NULL */
+	char	   *currTableAm; 	/* current table access method, or NULL */
 
 	void	   *lo_buf;
 	size_t		lo_buf_used;
@@ -373,6 +375,8 @@ struct _tocEntry
 	char	   *namespace;		/* null or empty string if not in a schema */
 	char	   *tablespace;		/* null if not in a tablespace; empty string
 								 * means use database default */
+	char	   *tableam;		/* table access method, only for TABLE tags */
+
 	char	   *owner;
 	char	   *desc;
 	char	   *defn;
@@ -410,7 +414,7 @@ extern TocEntry *ArchiveEntry(Archive *AHX,
 			 CatalogId catalogId, DumpId dumpId,
 			 const char *tag,
 			 const char *namespace, const char *tablespace,
-			 const char *owner,
+			 const char *owner, const char *amname,
 			 const char *desc, teSection section,
 			 const char *defn,
 			 const char *dropStmt, const char *copyStmt,
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 637c79af48..512c486546 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2136,7 +2136,7 @@ dumpTableData(Archive *fout, TableDataInfo *tdinfo)
 
 		te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
 						  tbinfo->dobj.name, tbinfo->dobj.namespace->dobj.name,
-						  NULL, tbinfo->rolname,
+						  NULL, tbinfo->rolname, NULL,
 						  "TABLE DATA", SECTION_DATA,
 						  "", "", copyStmt,
 						  &(tbinfo->dobj.dumpId), 1,
@@ -2188,6 +2188,7 @@ refreshMatViewData(Archive *fout, TableDataInfo *tdinfo)
 					 tbinfo->dobj.namespace->dobj.name, /* Namespace */
 					 NULL,		/* Tablespace */
 					 tbinfo->rolname,	/* Owner */
+					 NULL,				/* Table access method */
 					 "MATERIALIZED VIEW DATA",	/* Desc */
 					 SECTION_POST_DATA, /* Section */
 					 q->data,	/* Create */
@@ -2726,6 +2727,7 @@ dumpDatabase(Archive *fout)
 				 NULL,			/* Namespace */
 				 NULL,			/* Tablespace */
 				 dba,			/* Owner */
+				 NULL,			/* Table access method */
 				 "DATABASE",	/* Desc */
 				 SECTION_PRE_DATA,	/* Section */
 				 creaQry->data, /* Create */
@@ -2762,7 +2764,7 @@ dumpDatabase(Archive *fout)
 			appendPQExpBufferStr(dbQry, ";\n");
 
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
-						 labelq->data, NULL, NULL, dba,
+						 labelq->data, NULL, NULL, dba, NULL,
 						 "COMMENT", SECTION_NONE,
 						 dbQry->data, "", NULL,
 						 &(dbDumpId), 1,
@@ -2789,7 +2791,7 @@ dumpDatabase(Archive *fout)
 		emitShSecLabels(conn, shres, seclabelQry, "DATABASE", datname);
 		if (seclabelQry->len > 0)
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
-						 labelq->data, NULL, NULL, dba,
+						 labelq->data, NULL, NULL, dba, NULL,
 						 "SECURITY LABEL", SECTION_NONE,
 						 seclabelQry->data, "", NULL,
 						 &(dbDumpId), 1,
@@ -2859,7 +2861,7 @@ dumpDatabase(Archive *fout)
 
 	if (creaQry->len > 0)
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 datname, NULL, NULL, dba,
+					 datname, NULL, NULL, dba, NULL,
 					 "DATABASE PROPERTIES", SECTION_PRE_DATA,
 					 creaQry->data, delQry->data, NULL,
 					 &(dbDumpId), 1,
@@ -2904,7 +2906,7 @@ dumpDatabase(Archive *fout)
 						  atooid(PQgetvalue(lo_res, 0, i_relminmxid)),
 						  LargeObjectRelationId);
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 "pg_largeobject", NULL, NULL, "",
+					 "pg_largeobject", NULL, NULL, "", NULL,
 					 "pg_largeobject", SECTION_PRE_DATA,
 					 loOutQry->data, "", NULL,
 					 NULL, 0,
@@ -3014,7 +3016,7 @@ dumpEncoding(Archive *AH)
 	appendPQExpBufferStr(qry, ";\n");
 
 	ArchiveEntry(AH, nilCatalogId, createDumpId(),
-				 "ENCODING", NULL, NULL, "",
+				 "ENCODING", NULL, NULL, "", NULL,
 				 "ENCODING", SECTION_PRE_DATA,
 				 qry->data, "", NULL,
 				 NULL, 0,
@@ -3041,7 +3043,7 @@ dumpStdStrings(Archive *AH)
 					  stdstrings);
 
 	ArchiveEntry(AH, nilCatalogId, createDumpId(),
-				 "STDSTRINGS", NULL, NULL, "",
+				 "STDSTRINGS", NULL, NULL, "", NULL,
 				 "STDSTRINGS", SECTION_PRE_DATA,
 				 qry->data, "", NULL,
 				 NULL, 0,
@@ -3097,7 +3099,7 @@ dumpSearchPath(Archive *AH)
 		write_msg(NULL, "saving search_path = %s\n", path->data);
 
 	ArchiveEntry(AH, nilCatalogId, createDumpId(),
-				 "SEARCHPATH", NULL, NULL, "",
+				 "SEARCHPATH", NULL, NULL, "", NULL,
 				 "SEARCHPATH", SECTION_PRE_DATA,
 				 qry->data, "", NULL,
 				 NULL, 0,
@@ -3275,7 +3277,7 @@ dumpBlob(Archive *fout, BlobInfo *binfo)
 		ArchiveEntry(fout, binfo->dobj.catId, binfo->dobj.dumpId,
 					 binfo->dobj.name,
 					 NULL, NULL,
-					 binfo->rolname,
+					 binfo->rolname, NULL,
 					 "BLOB", SECTION_PRE_DATA,
 					 cquery->data, dquery->data, NULL,
 					 NULL, 0,
@@ -3581,6 +3583,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
 						 polinfo->dobj.namespace->dobj.name,
 						 NULL,
 						 tbinfo->rolname,
+						 NULL,
 						 "ROW SECURITY", SECTION_POST_DATA,
 						 query->data, "", NULL,
 						 &(tbinfo->dobj.dumpId), 1,
@@ -3637,6 +3640,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
 					 polinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "POLICY", SECTION_POST_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -3811,6 +3815,7 @@ dumpPublication(Archive *fout, PublicationInfo *pubinfo)
 				 NULL,
 				 NULL,
 				 pubinfo->rolname,
+				 NULL,
 				 "PUBLICATION", SECTION_POST_DATA,
 				 query->data, delq->data, NULL,
 				 NULL, 0,
@@ -3954,6 +3959,7 @@ dumpPublicationTable(Archive *fout, PublicationRelInfo *pubrinfo)
 				 tbinfo->dobj.namespace->dobj.name,
 				 NULL,
 				 "",
+				 NULL,
 				 "PUBLICATION TABLE", SECTION_POST_DATA,
 				 query->data, "", NULL,
 				 NULL, 0,
@@ -4147,6 +4153,7 @@ dumpSubscription(Archive *fout, SubscriptionInfo *subinfo)
 				 NULL,
 				 NULL,
 				 subinfo->rolname,
+				 NULL,
 				 "SUBSCRIPTION", SECTION_POST_DATA,
 				 query->data, delq->data, NULL,
 				 NULL, 0,
@@ -5829,6 +5836,7 @@ getTables(Archive *fout, int *numTables)
 	int			i_partkeydef;
 	int			i_ispartition;
 	int			i_partbound;
+	int			i_amname;
 
 	/*
 	 * Find all the tables and table-like objects.
@@ -5914,7 +5922,7 @@ getTables(Archive *fout, int *numTables)
 						  "tc.relfrozenxid AS tfrozenxid, "
 						  "tc.relminmxid AS tminmxid, "
 						  "c.relpersistence, c.relispopulated, "
-						  "c.relreplident, c.relpages, "
+						  "c.relreplident, c.relpages, am.amname, "
 						  "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
 						  "d.refobjid AS owning_tab, "
 						  "d.refobjsubid AS owning_col, "
@@ -5945,6 +5953,7 @@ getTables(Archive *fout, int *numTables)
 						  "d.objsubid = 0 AND "
 						  "d.refclassid = c.tableoid AND d.deptype IN ('a', 'i')) "
 						  "LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid) "
+						  "LEFT JOIN pg_am am ON (c.relam = am.oid) "
 						  "LEFT JOIN pg_init_privs pip ON "
 						  "(c.oid = pip.objoid "
 						  "AND pip.classoid = 'pg_class'::regclass "
@@ -6412,6 +6421,7 @@ getTables(Archive *fout, int *numTables)
 	i_partkeydef = PQfnumber(res, "partkeydef");
 	i_ispartition = PQfnumber(res, "ispartition");
 	i_partbound = PQfnumber(res, "partbound");
+	i_amname = PQfnumber(res, "amname");
 
 	if (dopt->lockWaitTimeout)
 	{
@@ -6481,6 +6491,11 @@ getTables(Archive *fout, int *numTables)
 		else
 			tblinfo[i].checkoption = pg_strdup(PQgetvalue(res, i, i_checkoption));
 		tblinfo[i].toast_reloptions = pg_strdup(PQgetvalue(res, i, i_toastreloptions));
+		if (PQgetisnull(res, i, i_amname))
+			tblinfo[i].amname = NULL;
+		else
+			tblinfo[i].amname = pg_strdup(PQgetvalue(res, i, i_amname));
+
 
 		/* other fields were zeroed above */
 
@@ -9355,7 +9370,7 @@ dumpComment(Archive *fout, const char *type, const char *name,
 		 * post-data.
 		 */
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 tag->data, namespace, NULL, owner,
+					 tag->data, namespace, NULL, owner, NULL,
 					 "COMMENT", SECTION_NONE,
 					 query->data, "", NULL,
 					 &(dumpId), 1,
@@ -9423,7 +9438,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
 						 tag->data,
 						 tbinfo->dobj.namespace->dobj.name,
-						 NULL, tbinfo->rolname,
+						 NULL, tbinfo->rolname, NULL,
 						 "COMMENT", SECTION_NONE,
 						 query->data, "", NULL,
 						 &(tbinfo->dobj.dumpId), 1,
@@ -9447,7 +9462,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
 						 tag->data,
 						 tbinfo->dobj.namespace->dobj.name,
-						 NULL, tbinfo->rolname,
+						 NULL, tbinfo->rolname, NULL,
 						 "COMMENT", SECTION_NONE,
 						 query->data, "", NULL,
 						 &(tbinfo->dobj.dumpId), 1,
@@ -9728,7 +9743,7 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 				TocEntry   *te;
 
 				te = ArchiveEntry(fout, dobj->catId, dobj->dumpId,
-								  dobj->name, NULL, NULL, "",
+								  dobj->name, NULL, NULL, "", NULL,
 								  "BLOBS", SECTION_DATA,
 								  "", "", NULL,
 								  NULL, 0,
@@ -9802,7 +9817,7 @@ dumpNamespace(Archive *fout, NamespaceInfo *nspinfo)
 		ArchiveEntry(fout, nspinfo->dobj.catId, nspinfo->dobj.dumpId,
 					 nspinfo->dobj.name,
 					 NULL, NULL,
-					 nspinfo->rolname,
+					 nspinfo->rolname, NULL,
 					 "SCHEMA", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -9938,7 +9953,7 @@ dumpExtension(Archive *fout, ExtensionInfo *extinfo)
 		ArchiveEntry(fout, extinfo->dobj.catId, extinfo->dobj.dumpId,
 					 extinfo->dobj.name,
 					 NULL, NULL,
-					 "",
+					 "", NULL,
 					 "EXTENSION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10090,6 +10105,7 @@ dumpEnumType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10217,6 +10233,7 @@ dumpRangeType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10290,6 +10307,7 @@ dumpUndefinedType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10572,6 +10590,7 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10729,6 +10748,7 @@ dumpDomain(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "DOMAIN", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -10951,6 +10971,7 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
 					 tyinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tyinfo->rolname,
+					 NULL,
 					 "TYPE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -11085,7 +11106,7 @@ dumpCompositeTypeColComments(Archive *fout, TypeInfo *tyinfo)
 			ArchiveEntry(fout, nilCatalogId, createDumpId(),
 						 target->data,
 						 tyinfo->dobj.namespace->dobj.name,
-						 NULL, tyinfo->rolname,
+						 NULL, tyinfo->rolname, NULL,
 						 "COMMENT", SECTION_NONE,
 						 query->data, "", NULL,
 						 &(tyinfo->dobj.dumpId), 1,
@@ -11142,6 +11163,7 @@ dumpShellType(Archive *fout, ShellTypeInfo *stinfo)
 					 stinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 stinfo->baseType->rolname,
+					 NULL,
 					 "SHELL TYPE", SECTION_PRE_DATA,
 					 q->data, "", NULL,
 					 NULL, 0,
@@ -11251,7 +11273,7 @@ dumpProcLang(Archive *fout, ProcLangInfo *plang)
 	if (plang->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, plang->dobj.catId, plang->dobj.dumpId,
 					 plang->dobj.name,
-					 NULL, NULL, plang->lanowner,
+					 NULL, NULL, plang->lanowner, NULL,
 					 "PROCEDURAL LANGUAGE", SECTION_PRE_DATA,
 					 defqry->data, delqry->data, NULL,
 					 NULL, 0,
@@ -11924,6 +11946,7 @@ dumpFunc(Archive *fout, FuncInfo *finfo)
 					 finfo->dobj.namespace->dobj.name,
 					 NULL,
 					 finfo->rolname,
+					 NULL,
 					 keyword, SECTION_PRE_DATA,
 					 q->data, delqry->data, NULL,
 					 NULL, 0,
@@ -12056,7 +12079,7 @@ dumpCast(Archive *fout, CastInfo *cast)
 	if (cast->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, cast->dobj.catId, cast->dobj.dumpId,
 					 labelq->data,
-					 NULL, NULL, "",
+					 NULL, NULL, "", NULL,
 					 "CAST", SECTION_PRE_DATA,
 					 defqry->data, delqry->data, NULL,
 					 NULL, 0,
@@ -12184,7 +12207,7 @@ dumpTransform(Archive *fout, TransformInfo *transform)
 	if (transform->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, transform->dobj.catId, transform->dobj.dumpId,
 					 labelq->data,
-					 NULL, NULL, "",
+					 NULL, NULL, "", NULL,
 					 "TRANSFORM", SECTION_PRE_DATA,
 					 defqry->data, delqry->data, NULL,
 					 transform->dobj.dependencies, transform->dobj.nDeps,
@@ -12400,6 +12423,7 @@ dumpOpr(Archive *fout, OprInfo *oprinfo)
 					 oprinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 oprinfo->rolname,
+					 NULL,
 					 "OPERATOR", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -12546,6 +12570,9 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
 		case AMTYPE_INDEX:
 			appendPQExpBuffer(q, "TYPE INDEX ");
 			break;
+		case AMTYPE_TABLE:
+			appendPQExpBuffer(q, "TYPE TABLE ");
+			break;
 		default:
 			write_msg(NULL, "WARNING: invalid type \"%c\" of access method \"%s\"\n",
 					  aminfo->amtype, qamname);
@@ -12570,6 +12597,7 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
 					 NULL,
 					 NULL,
 					 "",
+					 NULL,
 					 "ACCESS METHOD", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -12936,6 +12964,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
 					 opcinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 opcinfo->rolname,
+					 NULL,
 					 "OPERATOR CLASS", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13203,6 +13232,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
 					 opfinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 opfinfo->rolname,
+					 NULL,
 					 "OPERATOR FAMILY", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13346,6 +13376,7 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
 					 collinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 collinfo->rolname,
+					 NULL,
 					 "COLLATION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13441,6 +13472,7 @@ dumpConversion(Archive *fout, ConvInfo *convinfo)
 					 convinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 convinfo->rolname,
+					 NULL,
 					 "CONVERSION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -13930,6 +13962,7 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
 					 agginfo->aggfn.dobj.namespace->dobj.name,
 					 NULL,
 					 agginfo->aggfn.rolname,
+					 NULL,
 					 "AGGREGATE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14028,6 +14061,7 @@ dumpTSParser(Archive *fout, TSParserInfo *prsinfo)
 					 prsinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 "",
+					 NULL,
 					 "TEXT SEARCH PARSER", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14108,6 +14142,7 @@ dumpTSDictionary(Archive *fout, TSDictInfo *dictinfo)
 					 dictinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 dictinfo->rolname,
+					 NULL,
 					 "TEXT SEARCH DICTIONARY", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14169,6 +14204,7 @@ dumpTSTemplate(Archive *fout, TSTemplateInfo *tmplinfo)
 					 tmplinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 "",
+					 NULL,
 					 "TEXT SEARCH TEMPLATE", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14289,6 +14325,7 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
 					 cfginfo->dobj.namespace->dobj.name,
 					 NULL,
 					 cfginfo->rolname,
+					 NULL,
 					 "TEXT SEARCH CONFIGURATION", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14355,6 +14392,7 @@ dumpForeignDataWrapper(Archive *fout, FdwInfo *fdwinfo)
 					 NULL,
 					 NULL,
 					 fdwinfo->rolname,
+					 NULL,
 					 "FOREIGN DATA WRAPPER", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14446,6 +14484,7 @@ dumpForeignServer(Archive *fout, ForeignServerInfo *srvinfo)
 					 NULL,
 					 NULL,
 					 srvinfo->rolname,
+					 NULL,
 					 "SERVER", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -14564,6 +14603,7 @@ dumpUserMappings(Archive *fout,
 					 namespace,
 					 NULL,
 					 owner,
+					 NULL,
 					 "USER MAPPING", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 &dumpId, 1,
@@ -14643,6 +14683,7 @@ dumpDefaultACL(Archive *fout, DefaultACLInfo *daclinfo)
 					 daclinfo->dobj.namespace ? daclinfo->dobj.namespace->dobj.name : NULL,
 					 NULL,
 					 daclinfo->defaclrole,
+					 NULL,
 					 "DEFAULT ACL", SECTION_POST_DATA,
 					 q->data, "", NULL,
 					 NULL, 0,
@@ -14741,6 +14782,7 @@ dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
 					 tag->data, nspname,
 					 NULL,
 					 owner ? owner : "",
+					 NULL,
 					 "ACL", SECTION_NONE,
 					 sql->data, "", NULL,
 					 &(objDumpId), 1,
@@ -14826,7 +14868,7 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
 
 		appendPQExpBuffer(tag, "%s %s", type, name);
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
-					 tag->data, namespace, NULL, owner,
+					 tag->data, namespace, NULL, owner, NULL,
 					 "SECURITY LABEL", SECTION_NONE,
 					 query->data, "", NULL,
 					 &(dumpId), 1,
@@ -14908,7 +14950,7 @@ dumpTableSecLabel(Archive *fout, TableInfo *tbinfo, const char *reltypename)
 		ArchiveEntry(fout, nilCatalogId, createDumpId(),
 					 target->data,
 					 tbinfo->dobj.namespace->dobj.name,
-					 NULL, tbinfo->rolname,
+					 NULL, tbinfo->rolname, NULL,
 					 "SECURITY LABEL", SECTION_NONE,
 					 query->data, "", NULL,
 					 &(tbinfo->dobj.dumpId), 1,
@@ -15994,6 +16036,8 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 (tbinfo->relkind == RELKIND_VIEW) ? NULL : tbinfo->reltablespace,
 					 tbinfo->rolname,
+					 (tbinfo->relkind == RELKIND_RELATION) ?
+					 tbinfo->amname : NULL,
 					 reltypename,
 					 tbinfo->postponed_def ?
 					 SECTION_POST_DATA : SECTION_PRE_DATA,
@@ -16074,6 +16118,7 @@ dumpAttrDef(Archive *fout, AttrDefInfo *adinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "DEFAULT", SECTION_PRE_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -16190,6 +16235,7 @@ dumpIndex(Archive *fout, IndxInfo *indxinfo)
 						 tbinfo->dobj.namespace->dobj.name,
 						 indxinfo->tablespace,
 						 tbinfo->rolname,
+						 NULL,
 						 "INDEX", SECTION_POST_DATA,
 						 q->data, delq->data, NULL,
 						 NULL, 0,
@@ -16234,6 +16280,7 @@ dumpIndexAttach(Archive *fout, IndexAttachInfo *attachinfo)
 					 attachinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 "",
+					 NULL,
 					 "INDEX ATTACH", SECTION_POST_DATA,
 					 q->data, "", NULL,
 					 NULL, 0,
@@ -16289,6 +16336,7 @@ dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo)
 					 statsextinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 statsextinfo->rolname,
+					 NULL,
 					 "STATISTICS", SECTION_POST_DATA,
 					 q->data, delq->data, NULL,
 					 NULL, 0,
@@ -16450,6 +16498,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 						 tbinfo->dobj.namespace->dobj.name,
 						 indxinfo->tablespace,
 						 tbinfo->rolname,
+						 NULL,
 						 "CONSTRAINT", SECTION_POST_DATA,
 						 q->data, delq->data, NULL,
 						 NULL, 0,
@@ -16490,6 +16539,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 						 tbinfo->dobj.namespace->dobj.name,
 						 NULL,
 						 tbinfo->rolname,
+						 NULL,
 						 "FK CONSTRAINT", SECTION_POST_DATA,
 						 q->data, delq->data, NULL,
 						 NULL, 0,
@@ -16522,6 +16572,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 							 tbinfo->dobj.namespace->dobj.name,
 							 NULL,
 							 tbinfo->rolname,
+							 NULL,
 							 "CHECK CONSTRAINT", SECTION_POST_DATA,
 							 q->data, delq->data, NULL,
 							 NULL, 0,
@@ -16555,6 +16606,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 							 tyinfo->dobj.namespace->dobj.name,
 							 NULL,
 							 tyinfo->rolname,
+							 NULL,
 							 "CHECK CONSTRAINT", SECTION_POST_DATA,
 							 q->data, delq->data, NULL,
 							 NULL, 0,
@@ -16829,6 +16881,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "SEQUENCE", SECTION_PRE_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -16870,6 +16923,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
 							 tbinfo->dobj.namespace->dobj.name,
 							 NULL,
 							 tbinfo->rolname,
+							 NULL,
 							 "SEQUENCE OWNED BY", SECTION_PRE_DATA,
 							 query->data, "", NULL,
 							 &(tbinfo->dobj.dumpId), 1,
@@ -16938,6 +16992,7 @@ dumpSequenceData(Archive *fout, TableDataInfo *tdinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "SEQUENCE SET", SECTION_DATA,
 					 query->data, "", NULL,
 					 &(tbinfo->dobj.dumpId), 1,
@@ -17137,6 +17192,7 @@ dumpTrigger(Archive *fout, TriggerInfo *tginfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "TRIGGER", SECTION_POST_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -17223,7 +17279,7 @@ dumpEventTrigger(Archive *fout, EventTriggerInfo *evtinfo)
 	if (evtinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
 		ArchiveEntry(fout, evtinfo->dobj.catId, evtinfo->dobj.dumpId,
 					 evtinfo->dobj.name, NULL, NULL,
-					 evtinfo->evtowner,
+					 evtinfo->evtowner, NULL,
 					 "EVENT TRIGGER", SECTION_POST_DATA,
 					 query->data, delqry->data, NULL,
 					 NULL, 0,
@@ -17384,6 +17440,7 @@ dumpRule(Archive *fout, RuleInfo *rinfo)
 					 tbinfo->dobj.namespace->dobj.name,
 					 NULL,
 					 tbinfo->rolname,
+					 NULL,
 					 "RULE", SECTION_POST_DATA,
 					 cmd->data, delcmd->data, NULL,
 					 NULL, 0,
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 789d6a24e2..4024d0c1e3 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -324,6 +324,7 @@ typedef struct _tableInfo
 	char	   *partkeydef;		/* partition key definition */
 	char	   *partbound;		/* partition bound definition */
 	bool		needs_override; /* has GENERATED ALWAYS AS IDENTITY */
+	char	   *amname; 		/* table access method */
 
 	/*
 	 * Stuff computed only for dumpable tables.
diff --git a/src/test/modules/test_pg_dump/t/001_base.pl b/src/test/modules/test_pg_dump/t/001_base.pl
index fb4ecf8aca..4c0bc695fc 100644
--- a/src/test/modules/test_pg_dump/t/001_base.pl
+++ b/src/test/modules/test_pg_dump/t/001_base.pl
@@ -322,6 +322,19 @@ my %tests = (
 		like => { binary_upgrade => 1, },
 	},
 
+	'CREATE ACCESS METHOD regress_test_table_am' => {
+		create_order => 11,
+        create_sql   => 'CREATE ACCESS METHOD regress_table_am TYPE TABLE HANDLER heap_tableam_handler;',
+		regexp => qr/^
+			\QCREATE ACCESS METHOD regress_table_am TYPE TABLE HANDLER heap_tableam_handler;\E
+			\n/xm,
+		like => {
+            %full_runs,
+            schema_only         => 1,
+            section_pre_data    => 1,
+        },
+	},
+
 	'COMMENT ON EXTENSION test_pg_dump' => {
 		regexp => qr/^
 			\QCOMMENT ON EXTENSION test_pg_dump \E
@@ -537,6 +550,32 @@ my %tests = (
 			schema_only      => 1,
 			section_pre_data => 1,
 		},
+	},
+
+	'SET regress_pg_dump_table_am' => {
+		create_order => 12,
+		regexp => qr/^
+			\QSET default_table_access_method = regress_table_am;\E
+			\n/xm,
+		like => {
+            %full_runs,
+            schema_only         => 1,
+            section_pre_data    => 1,
+        },
+	},
+
+	'CREATE TABLE regress_pg_dump_table_am' => {
+		create_order => 13,
+		create_sql =>
+		  'CREATE TABLE regress_pg_dump_table_am (col1 int not null, col2 int) USING regress_table_am;',
+		regexp => qr/^
			\QCREATE TABLE public.regress_pg_dump_table_am (\E
+			\n\s+\Qcol1 integer NOT NULL,\E
+			\n\s+\Qcol2 integer\E
+			\n\);\n/xm,
+		like => {
+            binary_upgrade => 1,
+        },
 	},);
 
 #########################################
#86Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Andres Freund (#78)
1 attachment(s)
Re: Pluggable Storage - Andres's take

Hi,

Attached is a patch that adds test scenarios covering the dependency of
various object types on their table am. Besides plain tables, it covers
materialized views, partitioned tables, foreign tables, and composite
types, and verifies that the dependency is recorded only for those object
types that support a table access method.

This patch is based on commit 1bc7e6a4838 in
https://github.com/anarazel/postgres-pluggable-storage

Thanks
-Amit Khandekar

Attachments:

test_tableam_dependency.patch (application/octet-stream)
diff --git a/src/test/regress/expected/create_am.out b/src/test/regress/expected/create_am.out
index e15ba33..1a3686e 100644
--- a/src/test/regress/expected/create_am.out
+++ b/src/test/regress/expected/create_am.out
@@ -165,16 +165,98 @@ CREATE SEQUENCE test_seq USING heap2;
 ERROR:  syntax error at or near "USING"
 LINE 1: CREATE SEQUENCE test_seq USING heap2;
                                  ^
+--
+-- Test dependencies of various object types with their table access method.
+--
+set default_table_access_method = 'heap2';
+-- Partitioned tables : this should have tableam dependency.
+CREATE TABLE parted_heap2 ( a text, b int) PARTITION BY list (a) USING heap2;
+CREATE TABLE part1_heap2  PARTITION OF parted_heap2 for VALUES in ('a', 'b');
+-- Create these relation kinds so as to verify that no tableam dependency is
+-- created for them, in case of unsupported default_table_access_method.
+CREATE VIEW test_view AS SELECT * FROM tbl_heap2;
+CREATE SEQUENCE test_seq;
+-- Foreign tables.
+CREATE FOREIGN DATA WRAPPER fdw_heap2 VALIDATOR postgresql_fdw_validator;
+CREATE SERVER fs_heap2 FOREIGN DATA WRAPPER fdw_heap2 ;
+CREATE FOREIGN table ft_heap2 () SERVER fs_heap2;
+-- Toast tables: Base table should have the dependency, but it's toast table
+-- should not.
+CREATE TABLE toast_table_heap2(v varchar);
+-- Composite types.
+CREATE TYPE ct_heap2 AS (id int, v varchar);
+\set depend_query 'SELECT pg_describe_object(classid,objid,objsubid) AS obj FROM pg_depend, pg_am WHERE pg_depend.refclassid = $$pg_am$$::regclass AND pg_am.oid = pg_depend.refobjid AND pg_am.amname = $$heap2$$ ORDER BY 1'
+-- Show all the dependencies
+:depend_query;
+            obj             
+----------------------------
+ materialized view mv_heap2
+ table part1_heap2
+ table parted_heap2
+ table tblas_heap2
+ table tbl_heap2
+ table toast_table_heap2
+(6 rows)
+
 -- Drop table access method, but fails as objects depends on it
 DROP ACCESS METHOD heap2;
 ERROR:  cannot drop access method heap2 because other objects depend on it
 DETAIL:  table tbl_heap2 depends on access method heap2
+view test_view depends on table tbl_heap2
+table tblas_heap2 depends on access method heap2
+materialized view mv_heap2 depends on access method heap2
+table parted_heap2 depends on access method heap2
+table toast_table_heap2 depends on access method heap2
+HINT:  Use DROP ... CASCADE to drop the dependent objects too.
+-- Drop the dependent objects one by one, and verify the effect on dependencies.
+DROP TABLE parted_heap2;
+:depend_query;
+            obj             
+----------------------------
+ materialized view mv_heap2
+ table tblas_heap2
+ table tbl_heap2
+ table toast_table_heap2
+(4 rows)
+
+DROP ACCESS METHOD heap2;
+ERROR:  cannot drop access method heap2 because other objects depend on it
+DETAIL:  table tbl_heap2 depends on access method heap2
+view test_view depends on table tbl_heap2
 table tblas_heap2 depends on access method heap2
 materialized view mv_heap2 depends on access method heap2
+table toast_table_heap2 depends on access method heap2
+HINT:  Use DROP ... CASCADE to drop the dependent objects too.
+DROP MATERIALIZED VIEW mv_heap2;
+:depend_query;
+           obj           
+-------------------------
+ table tblas_heap2
+ table tbl_heap2
+ table toast_table_heap2
+(3 rows)
+
+DROP ACCESS METHOD heap2;
+ERROR:  cannot drop access method heap2 because other objects depend on it
+DETAIL:  table tbl_heap2 depends on access method heap2
+view test_view depends on table tbl_heap2
+table tblas_heap2 depends on access method heap2
+table toast_table_heap2 depends on access method heap2
 HINT:  Use DROP ... CASCADE to drop the dependent objects too.
--- Drop table access method with cascade
-DROP ACCESS METHOD heap2 CASCADE;
-NOTICE:  drop cascades to 3 other objects
-DETAIL:  drop cascades to table tbl_heap2
-drop cascades to table tblas_heap2
-drop cascades to materialized view mv_heap2
+DROP VIEW test_view;
+DROP TABLE tbl_heap2;
+DROP TABLE tblas_heap2;
+DROP TABLE toast_table_heap2;
+:depend_query;
+ obj 
+-----
+(0 rows)
+
+DROP ACCESS METHOD heap2;
+-- cleanup
+reset default_table_access_method;
+DROP FOREIGN TABLE ft_heap2;
+DROP SERVER fs_heap2;
+DROP FOREIGN DATA WRAPPER fdw_heap2;
+DROP SEQUENCE test_seq;
+DROP TYPE ct_heap2;
diff --git a/src/test/regress/sql/create_am.sql b/src/test/regress/sql/create_am.sql
index 2c7b481..75cb512 100644
--- a/src/test/regress/sql/create_am.sql
+++ b/src/test/regress/sql/create_am.sql
@@ -108,8 +108,58 @@ CREATE VIEW test_view USING heap2 AS SELECT * FROM tbl_heap2;
 CREATE SEQUENCE test_seq USING heap2;
 
 
+--
+-- Test dependencies of various object types with their table access method.
+--
+
+set default_table_access_method = 'heap2';
+
+-- Partitioned tables : this should have tableam dependency.
+CREATE TABLE parted_heap2 ( a text, b int) PARTITION BY list (a) USING heap2;
+CREATE TABLE part1_heap2  PARTITION OF parted_heap2 for VALUES in ('a', 'b');
+
+-- Create these relation kinds to verify that no tableam dependency is
+-- created for them, even though default_table_access_method is set.
+CREATE VIEW test_view AS SELECT * FROM tbl_heap2;
+CREATE SEQUENCE test_seq;
+-- Foreign tables.
+CREATE FOREIGN DATA WRAPPER fdw_heap2 VALIDATOR postgresql_fdw_validator;
+CREATE SERVER fs_heap2 FOREIGN DATA WRAPPER fdw_heap2 ;
+CREATE FOREIGN table ft_heap2 () SERVER fs_heap2;
+-- Toast tables: the base table should have the dependency, but its toast
+-- table should not.
+CREATE TABLE toast_table_heap2(v varchar);
+-- Composite types.
+CREATE TYPE ct_heap2 AS (id int, v varchar);
+
+\set depend_query 'SELECT pg_describe_object(classid,objid,objsubid) AS obj FROM pg_depend, pg_am WHERE pg_depend.refclassid = $$pg_am$$::regclass AND pg_am.oid = pg_depend.refobjid AND pg_am.amname = $$heap2$$ ORDER BY 1'
+
+-- Show all the dependencies
+:depend_query;
+
 -- Drop table access method, but it fails as objects depend on it
 DROP ACCESS METHOD heap2;
 
--- Drop table access method with cascade
-DROP ACCESS METHOD heap2 CASCADE;
+-- Drop the dependent objects one by one, and verify the effect on dependencies.
+DROP TABLE parted_heap2;
+:depend_query;
+DROP ACCESS METHOD heap2;
+
+DROP MATERIALIZED VIEW mv_heap2;
+:depend_query;
+DROP ACCESS METHOD heap2;
+
+DROP VIEW test_view;
+DROP TABLE tbl_heap2;
+DROP TABLE tblas_heap2;
+DROP TABLE toast_table_heap2;
+:depend_query;
+DROP ACCESS METHOD heap2;
+
+-- cleanup
+reset default_table_access_method;
+DROP FOREIGN TABLE ft_heap2;
+DROP SERVER fs_heap2;
+DROP FOREIGN DATA WRAPPER fdw_heap2;
+DROP SEQUENCE test_seq;
+DROP TYPE ct_heap2;
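
For context when reading the tests above: the heap2 access method must already
exist when these statements run. A minimal sketch of that setup, assuming the
stock heap handler is simply reused for the second AM (the handler name in the
actual test file may differ), looks like this:

CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;
CREATE TABLE tbl_heap2 (f1 int, f2 text) USING heap2;

The dependency query kept in the depend_query psql variable can also be run
directly, with the $$...$$ dollar quoting replaced by ordinary quotes:

SELECT pg_describe_object(classid, objid, objsubid) AS obj
FROM pg_depend, pg_am
WHERE pg_depend.refclassid = 'pg_am'::regclass
  AND pg_am.oid = pg_depend.refobjid
  AND pg_am.amname = 'heap2'
ORDER BY 1;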
#87Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#82)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Tue, Jan 22, 2019 at 1:43 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

OK. I will work on the doc changes.

Sorry for the delay.

Attached is a draft patch with the doc and comment changes that I worked on.
Currently I have added comments to the callbacks that are present in the
TableAmRoutine structure and copied them into the docs. I am not sure
whether that is a good approach or not.
I have yet to add a description of each parameter of the callbacks for
easier understanding.

Or should the docs instead describe each callback individually, grouped the
same way the callbacks are divided in the TableAmRoutine structure?
Currently the following divisions are available:
1. Table scan
2. Parallel table scan
3. Index scan
4. Manipulation of physical tuples
5. Non-modifying operations on individual tuples
6. DDL
7. Planner
8. Executor

Suggestions?
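
One possible shape for that layout (only a sketch, not part of the attached
patch) would be a <sect2> per division, for example:

 <sect2 id="table-am-scan-callbacks">
  <title>Table Scan Callbacks</title>
  <para>
   scan_begin, scan_end, scan_rescan and scan_getnextslot ...
  </para>
 </sect2>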

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0001-Doc-and-comments-update.patch (application/octet-stream)
From 3023a9c77892af520c392c8b6fc944ad0ff75096 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Fri, 25 Jan 2019 15:04:50 +1100
Subject: [PATCH] Doc and comments update

---
 doc/src/sgml/{indexam.sgml => am.sgml}        | 506 +++++++++++++++++-
 doc/src/sgml/catalogs.sgml                    |   5 +-
 doc/src/sgml/config.sgml                      |  24 +
 doc/src/sgml/filelist.sgml                    |   2 +-
 doc/src/sgml/postgres.sgml                    |   2 +-
 doc/src/sgml/ref/create_access_method.sgml    |  12 +-
 .../sgml/ref/create_materialized_view.sgml    |  14 +
 doc/src/sgml/ref/create_table.sgml            |  18 +-
 doc/src/sgml/ref/create_table_as.sgml         |  14 +
 doc/src/sgml/release-9.6.sgml                 |   2 +-
 doc/src/sgml/xindex.sgml                      |   2 +-
 src/backend/access/heap/heapam.c              |   8 +-
 src/backend/access/heap/heapam_handler.c      |  29 +-
 src/backend/access/table/tableam.c            |  10 +
 src/include/access/tableam.h                  | 165 ++++++
 15 files changed, 779 insertions(+), 34 deletions(-)
 rename doc/src/sgml/{indexam.sgml => am.sgml} (79%)

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/am.sgml
similarity index 79%
rename from doc/src/sgml/indexam.sgml
rename to doc/src/sgml/am.sgml
index 05102724ea..118df57a1c 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/am.sgml
@@ -1,16 +1,480 @@
-<!-- doc/src/sgml/indexam.sgml -->
+<!-- doc/src/sgml/am.sgml -->
 
-<chapter id="indexam">
- <title>Index Access Method Interface Definition</title>
+<chapter id="am">
+ <title>Access Method Interface Definition</title>
 
   <para>
    This chapter defines the interface between the core
-   <productname>PostgreSQL</productname> system and <firstterm>index access
-   methods</firstterm>, which manage individual index types.  The core system
-   knows nothing about indexes beyond what is specified here, so it is
-   possible to develop entirely new index types by writing add-on code.
+   <productname>PostgreSQL</productname> system and <firstterm>access
+   methods</firstterm>, which manage individual <literal>INDEX</literal> 
+   and <literal>TABLE</literal> types.  The core system knows nothing
+   about these access methods beyond what is specified here, so it is
+   possible to develop entirely new access method types by writing add-on code.
+  </para>
+ 
+ <sect1 id="table-access-methods">
+  <title>Overview of Table Access Methods</title>
+  
+  <para>
+   Tables are the primary data store in <productname>PostgreSQL</productname>.
+   Each table is stored as its own physical <firstterm>relation</firstterm>
+   and so is described by an entry in the <structname>pg_class</structname>
+   catalog. The contents of a table are entirely under the control of its
+   access method. (All the access methods furthermore use the standard page
+   layout described in <xref linkend="storage-page-layout"/>.)
+  </para>
+  
+ <sect2 id="table-api">
+  <title>Table Access Method API</title>
+  
+  <para>
+   Each table access method is described by a row in the
+   <link linkend="catalog-pg-am"><structname>pg_am</structname></link> system
+   catalog. The <structname>pg_am</structname> entry specifies a <firstterm>type</firstterm>
+   of the access method and a <firstterm>handler function</firstterm> for the
+   access method. These entries can be created and deleted using the <xref linkend="sql-create-access-method"/>
+   and <xref linkend="sql-drop-access-method"/> SQL commands.
+  </para>
+
+  <para>
+   A table access method handler function must be declared to accept a
+   single argument of type <type>internal</type> and to return the
+   pseudo-type <type>table_am_handler</type>.  The argument is a dummy value that
+   simply serves to prevent handler functions from being called directly from
+   SQL commands.  The result of the function must be a palloc'd struct of
+   type <structname>TableAmRoutine</structname>, which contains everything
+   that the core code needs to know to make use of the table access method.
+   The <structname>TableAmRoutine</structname> struct, also called the access
+   method's <firstterm>API struct</firstterm>, includes fields specifying assorted
+   fixed properties of the access method, such as whether it can support
+   bitmap scans.  More importantly, it contains pointers to support
+   functions for the access method, which do all of the real work to access
+   tables.  These support functions are plain C functions and are not
+   visible or callable at the SQL level.  The support functions are described
+   below.
+  </para>
+
+  <para>
+   The structure <structname>TableAmRoutine</structname> is defined thus:
+<programlisting>
+/*
+ * API struct for a table AM.  Note this must be allocated in a
+ * server-lifetime manner, typically as a static const struct, which then gets
+ * returned by FormData_pg_am.amhandler.
+ */
+typedef struct TableAmRoutine
+{
+	NodeTag		type;
+
+	/*
+	 * Return slot implementation suitable for storing a tuple of this AM.
+	 */
+	const TupleTableSlotOps *(*slot_callbacks) (Relation rel);
+
+
+	/* ------------------------------------------------------------------------
+	 * Table scan callbacks.
+	 * ------------------------------------------------------------------------
+	 */
+
+	/*
+	 * Returns a scan descriptor opened on the provided relation, restricted
+	 * by the provided scan keys.
+	 */
+	TableScanDesc (*scan_begin) (Relation rel,
+								 Snapshot snapshot,
+								 int nkeys, struct ScanKeyData *key,
+								 ParallelTableScanDesc parallel_scan,
+								 bool allow_strat,
+								 bool allow_sync,
+								 bool allow_pagemode,
+								 bool is_bitmapscan,
+								 bool is_samplescan,
+								 bool temp_snap);
+
+	/* End a scan that was previously started */
+	void		(*scan_end) (TableScanDesc scan);
+
+	/* Restart a scan that is already in progress on the relation */
+	void		(*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key, bool set_params,
+								bool allow_strat, bool allow_sync, bool allow_pagemode);
+
+	/* Returns the next qualifying tuple from the scan */
+	TupleTableSlot *(*scan_getnextslot) (TableScanDesc scan,
+										 ScanDirection direction, TupleTableSlot *slot);
+
+
+	/* ------------------------------------------------------------------------
+	 * Parallel table scan related functions.
+	 * ------------------------------------------------------------------------
+	 */
+
+	/*
+	 * Returns the total size that is required to perform a parallel
+	 * sequential scan on the relation.
+	 */
+	Size		(*parallelscan_estimate) (Relation rel);
+
+	/*
+	 * Initialize the parallel scan state for the relation, and return the
+	 * total size required for storing that state.
+	 */
+	Size		(*parallelscan_initialize) (Relation rel, ParallelTableScanDesc parallel_scan);
+
+	/*
+	 * Reinitialize the parallel scan structure so that the parallel scan
+	 * can be restarted.
+	 */
+	void		(*parallelscan_reinitialize) (Relation rel, ParallelTableScanDesc parallel_scan);
+
+
+	/* ------------------------------------------------------------------------
+	 * Index Scan Callbacks
+	 * ------------------------------------------------------------------------
+	 */
+
+	/*
+	 * Returns an allocated and initialized IndexFetchTableData structure
+	 * for the relation.
+	 */
+	struct IndexFetchTableData *(*begin_index_fetch) (Relation rel);
+
+	/* Resets the internal members of the IndexFetchTableData structure */
+	void		(*reset_index_fetch) (struct IndexFetchTableData *data);
+
+	/* Frees the IndexFetchTableData that is allocated */
+	void		(*end_index_fetch) (struct IndexFetchTableData *data);
+
+	/*
+	 * Compute the newest xid among the tuples pointed to by items. This is
+	 * used to compute what snapshots to conflict with when replaying WAL
+	 * records for page-level index vacuums.
+	 */
+	TransactionId (*compute_xid_horizon_for_tuples) (Relation rel,
+													 ItemPointerData *items,
+													 int nitems);
+
+
+	/* ------------------------------------------------------------------------
+	 * Manipulations of physical tuples.
+	 * ------------------------------------------------------------------------
+	 */
+
+	/*
+	 * Insert the tuple into the specified relation, reporting the location
+	 * of the new tuple in the form of an ItemPointerData, and using the
+	 * BulkInsertStateData if available.
+	 */
+	void		(*tuple_insert) (Relation rel, TupleTableSlot *slot, CommandId cid,
+								 int options, struct BulkInsertStateData *bistate);
+
+	/*
+	 * Like the tuple_insert API, but inserts the tuple with a speculative
+	 * token so that the success of the operation can be confirmed later.
+	 */
+	void		(*tuple_insert_speculative) (Relation rel,
+											 TupleTableSlot *slot,
+											 CommandId cid,
+											 int options,
+											 struct BulkInsertStateData *bistate,
+											 uint32 specToken);
+
+	/*
+	 * Complete the speculative insert performed by tuple_insert_speculative,
+	 * either confirming or killing the inserted tuple depending on whether
+	 * the operation succeeded.
+	 */
+	void		(*tuple_complete_speculative) (Relation rel,
+											   TupleTableSlot *slot,
+											   uint32 specToken,
+											   bool succeeded);
+
+	/*
+	 * Deletes the tuple of the relation pointed to by the ItemPointer
+	 * and returns the result of the operation. In case of failure, the
+	 * details are reported in hufd.
+	 */
+	HTSU_Result (*tuple_delete) (Relation rel,
+								 ItemPointer tid,
+								 CommandId cid,
+								 Snapshot snapshot,
+								 Snapshot crosscheck,
+								 bool wait,
+								 HeapUpdateFailureData *hufd,
+								 bool changingPart);
+
+	/*
+	 * Update a tuple, replacing the version pointed to by the ItemPointer
+	 * with the new tuple from the slot; returns the result of the operation
+	 * and sets a flag indicating whether the indexes need to be updated.
+	 * In case of failure, the details are reported in hufd.
+	 */
+	HTSU_Result (*tuple_update) (Relation rel,
+								 ItemPointer otid,
+								 TupleTableSlot *slot,
+								 CommandId cid,
+								 Snapshot snapshot,
+								 Snapshot crosscheck,
+								 bool wait,
+								 HeapUpdateFailureData *hufd,
+								 LockTupleMode *lockmode,
+								 bool *update_indexes);
+
+	/*
+	 * Insert multiple tuples into the relation for faster data insertion.
+	 * It can use the BulkInsertStateData if available.
+	 */
+	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
+								 CommandId cid, int options, struct BulkInsertStateData *bistate);
+
+	/*
+	 * Locks the latest version of the tuple specified by the ItemPointer
+	 * and returns the result of the operation. In case of failure, the
+	 * details are reported in hufd.
+	 */
+	HTSU_Result (*tuple_lock) (Relation rel,
+							   ItemPointer tid,
+							   Snapshot snapshot,
+							   TupleTableSlot *slot,
+							   CommandId cid,
+							   LockTupleMode mode,
+							   LockWaitPolicy wait_policy,
+							   uint8 flags,
+							   HeapUpdateFailureData *hufd);
+
+	/*
+	 * Perform operations necessary to complete insertions made via
+	 * tuple_insert and multi_insert with a BulkInsertState specified. This
+	 * may, for example, be used to flush the relation when the inserts
+	 * skipped WAL.
+	 *
+	 * May be NULL.
+	 */
+	void		(*finish_bulk_insert) (Relation rel, int options);
+
+
+	/* ------------------------------------------------------------------------
+	 * Non-modifying operations on individual tuples.
+	 * ------------------------------------------------------------------------
+	 */
+
+	/*
+	 * Fetches the tuple version specified by the ItemPointer and stores it
+	 * in the slot.
+	 */
+	bool		(*tuple_fetch_row_version) (Relation rel,
+											ItemPointer tid,
+											Snapshot snapshot,
+											TupleTableSlot *slot,
+											Relation stats_relation);
+
+	/*
+	 * Gets the latest ItemPointer of the tuple, starting from the specified
+	 * ItemPointer.
+	 *
+	 * For example, in the case of the heap AM, an update chain is created
+	 * whenever a tuple is updated. This API follows that chain to find the
+	 * latest ItemPointer.
+	 */
+	void		(*tuple_get_latest_tid) (Relation rel,
+										 Snapshot snapshot,
+										 ItemPointer tid);
+
+	/*
+	 * Fetches the tuple pointed to by the ItemPointer using the given
+	 * IndexFetchTableData, stores it in the specified slot, and updates
+	 * the call_again and all_dead flags.
+	 */
+	bool		(*tuple_fetch_follow) (struct IndexFetchTableData *scan,
+									   ItemPointer tid,
+									   Snapshot snapshot,
+									   TupleTableSlot *slot,
+									   bool *call_again, bool *all_dead);
+
+	/*
+	 * Checks the tuple's visibility according to the snapshot and returns
+	 * "true" if it is visible, otherwise "false".
+	 */
+	bool		(*tuple_satisfies_snapshot) (Relation rel,
+											 TupleTableSlot *slot,
+											 Snapshot snapshot);
+
+
+	/* ------------------------------------------------------------------------
+	 * DDL related functionality.
+	 * ------------------------------------------------------------------------
+	 */
+
+	/*
+	 * Creates the storage that is necessary to store the tuples of the
+	 * relation, and reports the freeze XID and minimum multixact ID to be
+	 * used for the new storage.
+	 */
+	void		(*relation_set_new_filenode) (Relation rel,
+											  char persistence,
+											  TransactionId *freezeXid,
+											  MultiXactId *minmulti);
+
+	/*
+	 * Truncate the specified relation; this operation cannot be rolled back.
+	 */
+	void		(*relation_nontransactional_truncate) (Relation rel);
+
+	/*
+	 * Copies the relation's existing data to the new filenode specified by
+	 * newrnode and removes the existing filenode.
+	 */
+	void		(*relation_copy_data) (Relation rel, RelFileNode newrnode);
+
+	/*
+	 * Performs vacuuming of the relation based on the specified params.
+	 * It gathers all the dead tuples of the relation and cleans them up,
+	 * including the corresponding index entries.
+	 */
+	void		(*relation_vacuum) (Relation onerel, int options,
+									struct VacuumParams *params, BufferAccessStrategy bstrategy);
+
+	/*
+	 * Prepares the specified block of the relation for analyzing the
+	 * tuples that are present in it.
+	 */
+	void		(*scan_analyze_next_block) (TableScanDesc scan, BlockNumber blockno,
+											BufferAccessStrategy bstrategy);
+
+	/*
+	 * Scans the tuples in the current block based on the snapshot, returns
+	 * the next visible tuple, and updates the live/dead row statistics.
+	 */
+	bool		(*scan_analyze_next_tuple) (TableScanDesc scan, TransactionId OldestXmin,
+											double *liverows, double *deadrows, TupleTableSlot *slot);
+
+	/*
+	 * Reorganizes the relation data into the new filenode according to the
+	 * specified index. The tuples in the new filenode are written in index
+	 * order, and dead tuples are discarded along the way.
+	 */
+	void		(*relation_copy_for_cluster) (Relation NewHeap, Relation OldHeap, Relation OldIndex,
+											  bool use_sort,
+											  TransactionId OldestXmin, TransactionId FreezeXid, MultiXactId MultiXactCutoff,
+											  double *num_tuples, double *tups_vacuumed, double *tups_recently_dead);
+
+	/*
+	 * Performs a range scan of the relation instead of a full scan. The
+	 * range is specified by start and end block numbers; if there is no
+	 * upper bound, specify InvalidBlockNumber for the end block.
+	 */
+	double		(*index_build_range_scan) (Relation heap_rel,
+										   Relation index_rel,
+										   IndexInfo *index_nfo,
+										   bool allow_sync,
+										   bool anyvisible,
+										   BlockNumber start_blockno,
+										   BlockNumber end_blockno,
+										   IndexBuildCallback callback,
+										   void *callback_state,
+										   TableScanDesc scan);
+
+	/*
+	 * Performs a table scan and inserts the qualifying records into the
+	 * index. This API is similar to index_build_range_scan, but it is used
+	 * when the index is being built concurrently.
+	 */
+	void		(*index_validate_scan) (Relation heap_rel,
+										Relation index_rel,
+										IndexInfo *index_info,
+										Snapshot snapshot,
+										struct ValidateIndexState *state);
+
+
+	/* ------------------------------------------------------------------------
+	 * Planner related functions.
+	 * ------------------------------------------------------------------------
+	 */
+
+	/*
+	 * Estimates the size of the relation, returning the number of pages,
+	 * the number of tuples, and other figures for the relation.
+	 */
+	void		(*relation_estimate_size) (Relation rel, int32 *attr_widths,
+										   BlockNumber *pages, double *tuples, double *allvisfrac);
+
+
+	/* ------------------------------------------------------------------------
+	 * Executor related functions.
+	 * ------------------------------------------------------------------------
+	 */
+
+	/*
+	 * Scans the page indicated by the bitmap and collects the ItemPointers
+	 * of all the tuples that are visible.
+	 */
+	bool		(*scan_bitmap_pagescan) (TableScanDesc scan,
+										 TBMIterateResult *tbmres);
+
+	/* Returns the next tuple of the scan */
+	bool		(*scan_bitmap_pagescan_next) (TableScanDesc scan,
+											  TupleTableSlot *slot);
+
+	/*
+	 * Get the next block for sampling, as chosen by the sampling method,
+	 * or simply the next block of the scan otherwise.
+	 */
+	bool		(*scan_sample_next_block) (TableScanDesc scan,
+										   struct SampleScanState *scanstate);
+
+	/*
+	 * Get the next tuple to sample from the current block, as chosen by
+	 * the sampling method, otherwise the next visible tuple of the block.
+	 */
+	bool		(*scan_sample_next_tuple) (TableScanDesc scan,
+										   struct SampleScanState *scanstate,
+										   TupleTableSlot *slot);
+} TableAmRoutine;
+
+</programlisting>
+  </para>
+ </sect2> 
+  
+ <sect2>
+  <title>Table scanning</title>
+  
+  <para>
   </para>
+ </sect2>
+ 
+ <sect2>
+  <title>Table insert/update/delete</title>
 
+  <para>
+  </para>
+  </sect2>
+ 
+ <sect2>
+  <title>Table locking</title>
+
+  <para>
+  </para>
+  </sect2>
+ 
+ <sect2>
+  <title>Table vacuum</title>
+
+  <para>
+  </para>
+ </sect2>
+ 
+ <sect2>
+  <title>Table fetch</title>
+
+  <para>
+  </para>
+ </sect2>
+ 
+ </sect1> 
+ 
+ <sect1 id="index-access-methods">
+  <title>Overview of Index Access Methods</title>
+  
   <para>
    All indexes in <productname>PostgreSQL</productname> are what are known
    technically as <firstterm>secondary indexes</firstterm>; that is, the index is
@@ -42,8 +506,8 @@
    dead tuples are reclaimed (by vacuuming) when the dead tuples themselves
    are reclaimed.
   </para>
-
- <sect1 id="index-api">
+  
+ <sect2 id="index-api">
   <title>Basic API Structure for Indexes</title>
 
   <para>
@@ -217,9 +681,9 @@ typedef struct IndexAmRoutine
    conditions.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-functions">
+ <sect2 id="index-functions">
   <title>Index Access Method Functions</title>
 
   <para>
@@ -710,9 +1174,11 @@ amparallelrescan (IndexScanDesc scan);
    the beginning.
   </para>
 
- </sect1>
+ </sect2>
+ 
+ 
 
- <sect1 id="index-scanning">
+ <sect2 id="index-scanning">
   <title>Index Scanning</title>
 
   <para>
@@ -865,9 +1331,9 @@ amparallelrescan (IndexScanDesc scan);
    if its internal implementation is unsuited to one API or the other.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-locking">
+ <sect2 id="index-locking">
   <title>Index Locking Considerations</title>
 
   <para>
@@ -979,9 +1445,9 @@ amparallelrescan (IndexScanDesc scan);
    reduce the frequency of such transaction cancellations.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-unique-checks">
+ <sect2 id="index-unique-checks">
   <title>Index Uniqueness Checks</title>
 
   <para>
@@ -1128,9 +1594,9 @@ amparallelrescan (IndexScanDesc scan);
     </itemizedlist>
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-cost-estimation">
+ <sect2 id="index-cost-estimation">
   <title>Index Cost Estimation Functions</title>
 
   <para>
@@ -1377,5 +1843,7 @@ cost_qual_eval(&amp;index_qual_cost, path-&gt;indexquals, root);
    Examples of cost estimator functions can be found in
    <filename>src/backend/utils/adt/selfuncs.c</filename>.
   </para>
+ </sect2>
  </sect1>
+ 
 </chapter>
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index af4d0625ea..35122035e5 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -587,8 +587,9 @@
    The catalog <structname>pg_am</structname> stores information about
    relation access methods.  There is one row for each access method supported
    by the system.
-   Currently, only indexes have access methods.  The requirements for index
-   access methods are discussed in detail in <xref linkend="indexam"/>.
+   Currently, only <literal>INDEX</literal> and <literal>TABLE</literal> have
+   access methods.  The requirements for access methods are discussed in detail
+   in <xref linkend="am"/>.
   </para>
 
   <table>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b6f5822b84..0f62270cee 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7168,6 +7168,30 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-default-table-access-method" xreflabel="default_table_access_method">
+      <term><varname>default_table_access_method</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>default_table_access_method</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        This variable specifies the default table access method to use when
+        creating objects (tables and materialized views), if the <command>CREATE</command>
+        command does not explicitly specify an access method.
+       </para>
+
+       <para>
+        The value is either the name of a table access method, or an empty string
+        to specify using the default table access method of the current database.
+        If the value does not match the name of any existing table access method,
+        <productname>PostgreSQL</productname> will automatically fall back to the
+        default table access method of the current database.
+       </para>
+
+      </listitem>
+     </varlistentry>
+     
      <varlistentry id="guc-default-tablespace" xreflabel="default_tablespace">
       <term><varname>default_tablespace</varname> (<type>string</type>)
       <indexterm>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 5dfdf54815..fed460b7c3 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -89,7 +89,7 @@
 <!ENTITY gin        SYSTEM "gin.sgml">
 <!ENTITY brin       SYSTEM "brin.sgml">
 <!ENTITY planstats    SYSTEM "planstats.sgml">
-<!ENTITY indexam    SYSTEM "indexam.sgml">
+<!ENTITY am         SYSTEM "am.sgml">
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 96d196d229..9dce0c5f81 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -250,7 +250,7 @@
   &tablesample-method;
   &custom-scan;
   &geqo;
-  &indexam;
+  &am;
   &generic-wal;
   &btree;
   &gist;
diff --git a/doc/src/sgml/ref/create_access_method.sgml b/doc/src/sgml/ref/create_access_method.sgml
index 851c5e63be..256914022a 100644
--- a/doc/src/sgml/ref/create_access_method.sgml
+++ b/doc/src/sgml/ref/create_access_method.sgml
@@ -61,7 +61,8 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
     <listitem>
      <para>
       This clause specifies the type of access method to define.
-      Only <literal>INDEX</literal> is supported at present.
+      Only <literal>INDEX</literal> and <literal>TABLE</literal>
+      are supported at present.
      </para>
     </listitem>
    </varlistentry>
@@ -76,9 +77,12 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
       declared to take a single argument of type <type>internal</type>,
       and its return type depends on the type of access method;
       for <literal>INDEX</literal> access methods, it must
-      be <type>index_am_handler</type>.  The C-level API that the handler
-      function must implement varies depending on the type of access method.
-      The index access method API is described in <xref linkend="indexam"/>.
+      be <type>index_am_handler</type> and for <literal>TABLE</literal>
+      access methods, it must be <type>table_am_handler</type>.
+      The C-level API that the handler function must implement varies
+      depending on the type of access method. The index access method API
+      is described in <xref linkend="index-access-methods"/> and the table access method
+      API is described in <xref linkend="table-access-methods"/>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/doc/src/sgml/ref/create_materialized_view.sgml b/doc/src/sgml/ref/create_materialized_view.sgml
index 7f31ab4d26..3a052ee6a4 100644
--- a/doc/src/sgml/ref/create_materialized_view.sgml
+++ b/doc/src/sgml/ref/create_materialized_view.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
+    [ USING <replaceable class="parameter">method</replaceable> ]
     [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
     AS <replaceable>query</replaceable>
@@ -85,6 +86,19 @@ CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies the optional access method for the new materialized view;
+      see <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, then the default table access method
+      is chosen for the new materialized view; see <xref linkend="guc-default-table-access-method"/>
+      for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 857515ec8f..72a1a785e7 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -29,6 +29,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 ] )
 [ INHERITS ( <replaceable>parent_table</replaceable> [, ... ] ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -40,6 +41,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -51,6 +53,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ] { FOR VALUES <replaceable class="parameter">partition_bound_spec</replaceable> | DEFAULT }
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -953,7 +956,7 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
 
      <para>
       The access method must support <literal>amgettuple</literal> (see <xref
-      linkend="indexam"/>); at present this means <acronym>GIN</acronym>
+      linkend="index-access-methods"/>); at present this means <acronym>GIN</acronym>
       cannot be used.  Although it's allowed, there is little point in using
       B-tree or hash indexes with an exclusion constraint, because this
       does nothing that an ordinary unique constraint doesn't do better.
@@ -1136,6 +1139,19 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies the optional access method for the new table;
+      see <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, then the default table access method
+      is chosen for the new table; see <xref linkend="guc-default-table-access-method"/>
+      for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 679e8f521e..90c9dbdaa5 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
+    [ USING <replaceable class="parameter">method</replaceable> ]
     [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
     [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -120,6 +121,19 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies the optional access method for the new table;
+      see <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, then the default table access method
+      is chosen for the new table; see <xref linkend="guc-default-table-access-method"/>
+      for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
diff --git a/doc/src/sgml/release-9.6.sgml b/doc/src/sgml/release-9.6.sgml
index acebcc6249..c0a96c2cce 100644
--- a/doc/src/sgml/release-9.6.sgml
+++ b/doc/src/sgml/release-9.6.sgml
@@ -10763,7 +10763,7 @@ This commit is also listed under libpq and PL/pgSQL
 2016-08-13 [ed0097e4f] Add SQL-accessible functions for inspecting index AM pro
 -->
        <para>
-        Restructure <link linkend="indexam">index access
+        Restructure <link linkend="index-access-methods">index access
         method <acronym>API</acronym></link> to hide most of it at
         the <application>C</application> level (Alexander Korotkov, Andrew Gierth)
        </para>
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index 9446f8b836..4fa821160c 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -36,7 +36,7 @@
    described in <classname>pg_am</classname>.  It is possible to add a
    new index access method by writing the necessary code and
    then creating an entry in <classname>pg_am</classname> &mdash; but that is
-   beyond the scope of this chapter (see <xref linkend="indexam"/>).
+   beyond the scope of this chapter (see <xref linkend="am"/>).
   </para>
 
   <para>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6655a95433..328626d633 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1098,7 +1098,10 @@ fastgetattr(HeapTuple tup, int attnum, TupleDesc tupleDesc,
  * ----------------------------------------------------------------
  */
 
-
+/*
+ * Returns a heap scan descriptor opened on the provided relation,
+ * initialized according to the remaining parameters.
+ */
 TableScanDesc
 heap_beginscan(Relation relation, Snapshot snapshot,
 			   int nkeys, ScanKey key,
@@ -1332,6 +1335,9 @@ heap_getnext(TableScanDesc sscan, ScanDirection direction)
 #define HEAPAMSLOTDEBUG_3
 #endif
 
+/*
+ * Retrieve slot with next tuple in scan
+ */
 TupleTableSlot *
 heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot)
 {
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3dc1444739..5f7b39360c 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -63,13 +63,18 @@ reform_and_rewrite_tuple(HeapTuple tuple,
  * ----------------------------------------------------------------
  */
 
+/*
+ * Return slot implementation suitable for storing a tuple.
+ */
 static const TupleTableSlotOps *
 heapam_slot_callbacks(Relation relation)
 {
 	return &TTSOpsBufferHeapTuple;
 }
 
-
+/*
+ * Prepare an IndexFetchTableData for fetching heap tuples via index entries.
+ */
 static IndexFetchTableData *
 heapam_begin_index_fetch(Relation rel)
 {
@@ -107,8 +112,7 @@ heapam_end_index_fetch(IndexFetchTableData *scan)
 
 
 /*
- * Insert a heap tuple from a slot, which may contain an OID and speculative
- * insertion token.
+ * Insert a heap tuple from a slot, which may contain an OID
  */
 static void
 heapam_heap_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
@@ -128,6 +132,10 @@ heapam_heap_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
 		pfree(tuple);
 }
 
+/*
+ * Insert a heap tuple from a slot, which may contain an OID and speculative
+ * insertion token.
+ */
 static void
 heapam_heap_insert_speculative(Relation relation, TupleTableSlot *slot, CommandId cid,
 							   int options, BulkInsertState bistate, uint32 specToken)
@@ -148,6 +156,9 @@ heapam_heap_insert_speculative(Relation relation, TupleTableSlot *slot, CommandI
 		pfree(tuple);
 }
 
+/*
+ * Complete the speculative insert based on the succeeded flag.
+ */
 static void
 heapam_heap_complete_speculative(Relation relation, TupleTableSlot *slot, uint32 spekToken,
 								 bool succeeded)
@@ -179,6 +190,11 @@ heapam_heap_delete(Relation relation, ItemPointer tid, CommandId cid,
 }
 
 
+/*
+ * Updates the heap tuple pointed to by otid with the new tuple from the
+ * slot, and sets a flag indicating whether the update also requires the
+ * indexes to be updated.
+ */
 static HTSU_Result
 heapam_heap_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 				   CommandId cid, Snapshot snapshot, Snapshot crosscheck,
@@ -448,6 +464,10 @@ retry:
 	return result;
 }
 
+/*
+ * Finish the bulk insert operation by syncing the relation
+ * to disk, in case the bulk insert skipped WAL.
+ */
 static void
 heapam_finish_bulk_insert(Relation relation, int options)
 {
@@ -460,6 +480,9 @@ heapam_finish_bulk_insert(Relation relation, int options)
 }
 
 
+/*
+ * Fetch the row version specified by the tid, if it is visible to the snapshot.
+ */
 static bool
 heapam_fetch_row_version(Relation relation,
 						 ItemPointer tid,
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a2da7b7809..8fed8a0e00 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -266,12 +266,19 @@ simple_table_delete(Relation rel, ItemPointer tid, Snapshot snapshot)
 }
 
 
+/*
+ * Returns the size required to perform a parallel block table scan.
+ */
 Size
 table_block_parallelscan_estimate(Relation rel)
 {
 	return sizeof(ParallelBlockTableScanDescData);
 }
 
+/*
+ * Initializes the table block parallel scan with the relation specific
+ * information.
+ */
 Size
 table_block_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan)
 {
@@ -290,6 +297,9 @@ table_block_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan)
 	return sizeof(ParallelBlockTableScanDescData);
 }
 
+/*
+ * Reinitialize the table block parallel scan information
+ */
 void
 table_block_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
 {
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 4aa4369366..b997552fdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -76,6 +76,10 @@ typedef struct TableAmRoutine
 	 * ------------------------------------------------------------------------
 	 */
 
+	/*
+	 * Returns a scan descriptor opened on the provided relation, restricted
+	 * by the provided scan keys.
+	 */
 	TableScanDesc (*scan_begin) (Relation rel,
 								 Snapshot snapshot,
 								 int nkeys, struct ScanKeyData *key,
@@ -86,9 +90,15 @@ typedef struct TableAmRoutine
 								 bool is_bitmapscan,
 								 bool is_samplescan,
 								 bool temp_snap);
+
+	/* End a scan that was previously started */
 	void		(*scan_end) (TableScanDesc scan);
+
+	/* Restart a scan that is already in progress on the relation */
 	void		(*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key, bool set_params,
 								bool allow_strat, bool allow_sync, bool allow_pagemode);
+
+	/* Returns the next qualifying tuple from the scan */
 	TupleTableSlot *(*scan_getnextslot) (TableScanDesc scan,
 										 ScanDirection direction, TupleTableSlot *slot);
 
@@ -97,8 +107,23 @@ typedef struct TableAmRoutine
 	 * Parallel table scan related functions.
 	 * ------------------------------------------------------------------------
 	 */
+
+	/*
+	 * Returns the total size that is required to perform a parallel
+	 * sequential scan on the relation.
+	 */
 	Size		(*parallelscan_estimate) (Relation rel);
+
+	/*
+	 * Initialize the parallel scan state for the relation, and return the
+	 * total size required for storing that state.
+	 */
 	Size		(*parallelscan_initialize) (Relation rel, ParallelTableScanDesc parallel_scan);
+
+	/*
+	 * Reinitialize the parallel scan structure so that the parallel scan
+	 * can be restarted.
+	 */
 	void		(*parallelscan_reinitialize) (Relation rel, ParallelTableScanDesc parallel_scan);
 
 
@@ -107,8 +132,16 @@ typedef struct TableAmRoutine
 	 * ------------------------------------------------------------------------
 	 */
 
+	/*
+	 * Returns an allocated and initialized IndexFetchTableData structure
+	 * for the relation.
+	 */
 	struct IndexFetchTableData *(*begin_index_fetch) (Relation rel);
+
+	/* Resets the internal members of the IndexFetchTableData structure */
 	void		(*reset_index_fetch) (struct IndexFetchTableData *data);
+
+	/* Frees the IndexFetchTableData that is allocated */
 	void		(*end_index_fetch) (struct IndexFetchTableData *data);
 
 	/*
@@ -126,18 +159,40 @@ typedef struct TableAmRoutine
 	 * ------------------------------------------------------------------------
 	 */
 
+	/*
+	 * Insert the tuple into the specified relation, reporting the location
+	 * of the new tuple in the form of an ItemPointerData, and using the
+	 * BulkInsertStateData if available.
+	 */
 	void		(*tuple_insert) (Relation rel, TupleTableSlot *slot, CommandId cid,
 								 int options, struct BulkInsertStateData *bistate);
+
+	/*
+	 * Like the tuple_insert API, but inserts the tuple with a speculative
+	 * token so that the success of the operation can be confirmed later.
+	 */
 	void		(*tuple_insert_speculative) (Relation rel,
 											 TupleTableSlot *slot,
 											 CommandId cid,
 											 int options,
 											 struct BulkInsertStateData *bistate,
 											 uint32 specToken);
+
+	/*
+	 * Complete the speculative insert performed by tuple_insert_speculative,
+	 * either confirming or killing the inserted tuple depending on whether
+	 * the operation succeeded.
+	 */
 	void		(*tuple_complete_speculative) (Relation rel,
 											   TupleTableSlot *slot,
 											   uint32 specToken,
 											   bool succeeded);
+
+	/*
+	 * Deletes the tuple of the relation pointed to by the ItemPointer
+	 * and returns the result of the operation. In case of failure, the
+	 * details are reported in hufd.
+	 */
 	HTSU_Result (*tuple_delete) (Relation rel,
 								 ItemPointer tid,
 								 CommandId cid,
@@ -146,6 +201,13 @@ typedef struct TableAmRoutine
 								 bool wait,
 								 HeapUpdateFailureData *hufd,
 								 bool changingPart);
+
+	/*
+	 * Update a tuple, replacing the version pointed to by the ItemPointer
+	 * with the new tuple from the slot; returns the result of the operation
+	 * and sets a flag indicating whether the indexes need to be updated.
+	 * In case of failure, the details are reported in hufd.
+	 */
 	HTSU_Result (*tuple_update) (Relation rel,
 								 ItemPointer otid,
 								 TupleTableSlot *slot,
@@ -156,8 +218,19 @@ typedef struct TableAmRoutine
 								 HeapUpdateFailureData *hufd,
 								 LockTupleMode *lockmode,
 								 bool *update_indexes);
+
+	/*
+	 * Insert multiple tuples into the relation for faster data insertion.
+	 * It can use the BulkInsertStateData if available.
+	 */
 	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
 								 CommandId cid, int options, struct BulkInsertStateData *bistate);
+
+	/*
+	 * Locks the latest version of the tuple specified by the ItemPointer
+	 * and returns the result of the operation. In case of failure, the
+	 * details are reported in hufd.
+	 */
 	HTSU_Result (*tuple_lock) (Relation rel,
 							   ItemPointer tid,
 							   Snapshot snapshot,
@@ -184,19 +257,43 @@ typedef struct TableAmRoutine
 	 * ------------------------------------------------------------------------
 	 */
 
+	/*
+	 * Fetches the tuple version specified by the ItemPointer and stores it
+	 * in the slot.
+	 */
 	bool		(*tuple_fetch_row_version) (Relation rel,
 											ItemPointer tid,
 											Snapshot snapshot,
 											TupleTableSlot *slot,
 											Relation stats_relation);
+
+	/*
+	 * Gets the latest ItemPointer of the tuple, starting from the specified
+	 * ItemPointer.
+	 *
+	 * For example, in the case of the heap AM, an update chain is created
+	 * whenever a tuple is updated. This API follows that chain to find the
+	 * latest ItemPointer.
+	 */
 	void		(*tuple_get_latest_tid) (Relation rel,
 										 Snapshot snapshot,
 										 ItemPointer tid);
+
+	/*
+	 * Fetches the tuple pointed to by the ItemPointer using the given
+	 * IndexFetchTableData, stores it in the specified slot, and updates
+	 * the call_again and all_dead flags.
+	 */
 	bool		(*tuple_fetch_follow) (struct IndexFetchTableData *scan,
 									   ItemPointer tid,
 									   Snapshot snapshot,
 									   TupleTableSlot *slot,
 									   bool *call_again, bool *all_dead);
+
+	/*
+	 * Checks the tuple's visibility according to the snapshot and returns
+	 * "true" if it is visible, otherwise "false".
+	 */
 	bool		(*tuple_satisfies_snapshot) (Relation rel,
 											 TupleTableSlot *slot,
 											 Snapshot snapshot);
@@ -207,22 +304,64 @@ typedef struct TableAmRoutine
 	 * ------------------------------------------------------------------------
 	 */
 
+	/*
+	 * Creates the storage that is necessary to store the tuples of the
+	 * relation, and reports the freeze XID and minimum multixact ID to be
+	 * used for the new storage.
+	 */
 	void		(*relation_set_new_filenode) (Relation rel,
 											  char persistence,
 											  TransactionId *freezeXid,
 											  MultiXactId *minmulti);
+
+	/*
+	 * Truncate the specified relation; this operation cannot be rolled back.
+	 */
 	void		(*relation_nontransactional_truncate) (Relation rel);
+
+	/*
+	 * Copies the relation's existing data to the new filenode specified by
+	 * newrnode and removes the existing filenode.
+	 */
 	void		(*relation_copy_data) (Relation rel, RelFileNode newrnode);
+
+	/*
+	 * Performs vacuuming of the relation based on the specified params.
+	 * It gathers all the dead tuples of the relation and cleans them up,
+	 * including the corresponding index entries.
+	 */
 	void		(*relation_vacuum) (Relation onerel, int options,
 									struct VacuumParams *params, BufferAccessStrategy bstrategy);
+
+	/*
+	 * Prepares the specified block of the relation for analyzing the
+	 * tuples that are present in it.
+	 */
 	void		(*scan_analyze_next_block) (TableScanDesc scan, BlockNumber blockno,
 											BufferAccessStrategy bstrategy);
+
+	/*
+	 * Scans the tuples in the current block based on the snapshot, returns
+	 * the next visible tuple, and updates the live/dead row statistics.
+	 */
 	bool		(*scan_analyze_next_tuple) (TableScanDesc scan, TransactionId OldestXmin,
 											double *liverows, double *deadrows, TupleTableSlot *slot);
+
+	/*
+	 * Reorganizes the relation data into the new filenode according to the
+	 * specified index. The tuples in the new filenode are written in index
+	 * order, and dead tuples are discarded along the way.
+	 */
 	void		(*relation_copy_for_cluster) (Relation NewHeap, Relation OldHeap, Relation OldIndex,
 											  bool use_sort,
 											  TransactionId OldestXmin, TransactionId FreezeXid, MultiXactId MultiXactCutoff,
 											  double *num_tuples, double *tups_vacuumed, double *tups_recently_dead);
+
+	/*
+	 * Performs a range scan of the relation instead of a full scan. The
+	 * range is specified by start and end block numbers; if there is no
+	 * upper bound, specify InvalidBlockNumber for the end block.
+	 */
 	double		(*index_build_range_scan) (Relation heap_rel,
 										   Relation index_rel,
 										   IndexInfo *index_nfo,
@@ -233,6 +372,12 @@ typedef struct TableAmRoutine
 										   IndexBuildCallback callback,
 										   void *callback_state,
 										   TableScanDesc scan);
+
+	/*
+	 * Performs a table scan and inserts the qualifying records into the
+	 * index. This API is similar to index_build_range_scan, but it is used
+	 * when the index is being built concurrently.
+	 */
 	void		(*index_validate_scan) (Relation heap_rel,
 										Relation index_rel,
 										IndexInfo *index_info,
@@ -245,6 +390,10 @@ typedef struct TableAmRoutine
 	 * ------------------------------------------------------------------------
 	 */
 
+	/*
+	 * Estimates the size of the relation, returning the number of pages,
+	 * the number of tuples, and other figures for the relation.
+	 */
 	void		(*relation_estimate_size) (Relation rel, int32 *attr_widths,
 										   BlockNumber *pages, double *tuples, double *allvisfrac);
 
@@ -254,12 +403,28 @@ typedef struct TableAmRoutine
 	 * ------------------------------------------------------------------------
 	 */
 
+	/*
+	 * Scans the page indicated by the bitmap and collects the ItemPointers
+	 * of all the tuples that are visible.
+	 */
 	bool		(*scan_bitmap_pagescan) (TableScanDesc scan,
 										 TBMIterateResult *tbmres);
+
+	/* Returns the next tuple of the scan */
 	bool		(*scan_bitmap_pagescan_next) (TableScanDesc scan,
 											  TupleTableSlot *slot);
+
+	/*
+	 * Get the next block for sampling, as chosen by the sampling method,
+	 * or simply the next block of the scan otherwise.
+	 */
 	bool		(*scan_sample_next_block) (TableScanDesc scan,
 										   struct SampleScanState *scanstate);
+
+	/*
+	 * Get the next tuple to sample from the current block, as chosen by
+	 * the sampling method, otherwise the next visible tuple of the block.
+	 */
 	bool		(*scan_sample_next_tuple) (TableScanDesc scan,
 										   struct SampleScanState *scanstate,
 										   TupleTableSlot *slot);
-- 
2.20.1.windows.1
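
To make the handler-function contract described in the am.sgml changes above
concrete, a minimal table AM handler could look like the sketch below. This is
only an illustration against the WIP API: the names myam_handler and
myam_methods are invented for the example, and a real AM would fill in all of
the callbacks instead of leaving them out.

#include "postgres.h"

#include "access/tableam.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/*
 * The API struct must be allocated with server lifetime, so a static const
 * struct is the usual choice; the callback members are omitted here purely
 * for brevity.
 */
static const TableAmRoutine myam_methods = {
	.type = T_TableAmRoutine,
	/* .slot_callbacks = ..., .scan_begin = ..., and so on */
};

PG_FUNCTION_INFO_V1(myam_handler);

/*
 * Handler: accepts the dummy "internal" argument and returns a pointer to
 * the API struct, as the documentation text above describes.
 */
Datum
myam_handler(PG_FUNCTION_ARGS)
{
	PG_RETURN_POINTER(&myam_methods);
}

The method would then be registered with CREATE ACCESS METHOD myam TYPE TABLE
HANDLER myam_handler, after which CREATE TABLE ... USING myam selects it.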

#88Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Andres Freund (#78)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Mon, 21 Jan 2019 at 08:31, Andres Freund <andres@anarazel.de> wrote:

Hi,

(resending with compressed attachements, perhaps that'll go through)

On 2018-12-10 18:13:40 -0800, Andres Freund wrote:

On 2018-11-26 17:55:57 -0800, Andres Freund wrote:

FWIW, now that oids are removed, and the tuple table slot abstraction
got in, I'm working on rebasing the pluggable storage patchset ontop of
that.

I've pushed a version of that to the git tree, including a rebased
version of zheap:
https://github.com/anarazel/postgres-pluggable-storage

I worked on a slight improvement to the
0040-WIP-Move-xid-horizon-computation-for-page-level patch. Instead
of pre-fetching all the required buffers beforehand, the attached WIP
patch pre-fetches the buffers keeping a constant distance ahead of the
buffer reads. It's a WIP patch because right now it just uses a
hard-coded 5 buffers ahead. Haven't used effective_io_concurrency like
how it is done in nodeBitmapHeapscan.c. Will do that next. But before
that, any comments on the way I did the improvements would be nice.

Note that for now, the patch is based on the pluggable-storage latest
commit; it does not replace the 0040 patch in the patch series.
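
To illustrate the intended invariant with the hard-coded distance: the first
call prefetches the first 5 distinct blocks referenced by the sorted tids
array; after that, each time the reading loop moves to a new block and calls
ReadBuffer(), prefetch_buffer() is asked for exactly one more block, so
roughly 5 block reads stay in flight ahead of the consuming loop.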

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

prefetch_xid_horizon_scan_WIP.patch (application/octet-stream)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8837f83..c87fd47 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -162,6 +162,14 @@ static const struct
 #define ConditionalLockTupleTuplock(rel, tup, mode) \
 	ConditionalLockTuple((rel), (tup), tupleLockExtraInfo[mode].hwlock)
 
+#ifdef USE_PREFETCH
+typedef struct
+{
+	int		next_item;
+	BlockNumber		cur_hblkno;
+} PrefetchState;
+#endif
+
 /*
  * This table maps tuple lock strength values for each particular
  * MultiXactStatus value.
@@ -6990,6 +6998,45 @@ HeapTupleHeaderAdvanceLatestRemovedXid(HeapTupleHeader tuple,
 	/* *latestRemovedXid may still be invalid at end */
 }
 
+#ifdef USE_PREFETCH
+/*
+ * prefetch_buffer
+ *
+ * Pre-fetch 'prefetch_count' number of buffers.
+ * Continues to scan the tids array from the last position that was scanned
+ * for previous pre-fetching.
+ */
+static void
+prefetch_buffer(Relation rel, PrefetchState *prefetch_state,
+				ItemPointerData *tids, int nitems, int prefetch_count)
+{
+	int		count = 0;
+	BlockNumber		hblkno = prefetch_state->cur_hblkno;
+	int		i;
+
+	for (i = prefetch_state->next_item; i < nitems && count < prefetch_count; i++)
+	{
+		ItemPointer htid = &tids[i];
+
+		if (hblkno == InvalidBlockNumber ||
+			ItemPointerGetBlockNumber(htid) != hblkno)
+		{
+			hblkno = ItemPointerGetBlockNumber(htid);
+
+			count++;
+			PrefetchBuffer(rel, MAIN_FORKNUM, hblkno);
+		}
+	}
+
+	/*
+	 * Keep track of the item pointer array position, so that next time we can
+	 * continue from this position.
+	 */
+	prefetch_state->next_item = i;
+	prefetch_state->cur_hblkno = hblkno;
+}
+#endif
+
 /*
  * Get the latestRemovedXid from the heap pages pointed at by the index
  * tuples being deleted.
@@ -7011,6 +7058,7 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
 	BlockNumber hblkno;
 	Buffer		buf = InvalidBuffer;
 	Page		hpage;
+	PrefetchState prefetch_state;
 
 	/*
 	 * Sort to avoid repeated lookups for the same page, and to make it more
@@ -7023,19 +7071,10 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
 
 	/* prefetch all pages */
 #ifdef USE_PREFETCH
-	hblkno = InvalidBlockNumber;
-	for (int i = 0; i < nitems; i++)
-	{
-		ItemPointer htid = &tids[i];
-
-		if (hblkno == InvalidBlockNumber ||
-			ItemPointerGetBlockNumber(htid) != hblkno)
-		{
-			hblkno = ItemPointerGetBlockNumber(htid);
-
-			PrefetchBuffer(rel, MAIN_FORKNUM, hblkno);
-		}
-	}
+	prefetch_state.next_item = 0;
+	prefetch_state.cur_hblkno = InvalidBlockNumber;
+	prefetch_buffer(rel, &prefetch_state, tids, nitems,
+					5 /* prefetch distance ; TODO: use effective_io_concurrency */);
 #endif
 
 	/* Iterate over all tids, and check their horizon */
@@ -7063,6 +7102,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
 			hblkno = ItemPointerGetBlockNumber(htid);
 
 			buf = ReadBuffer(rel, hblkno);
+
+			/*
+			 * Need to maintain the prefetch distance, so prefetch a page each
+			 * time we read a new page.
+			 */
+			prefetch_buffer(rel, &prefetch_state, tids, nitems, 1);
+
 			hpage = BufferGetPage(buf);
 
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
#89Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Amit Khandekar (#88)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Wed, 6 Feb 2019 at 18:30, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

On Mon, 21 Jan 2019 at 08:31, Andres Freund <andres@anarazel.de> wrote:

Hi,

(resending with compressed attachments, perhaps that'll go through)

On 2018-12-10 18:13:40 -0800, Andres Freund wrote:

On 2018-11-26 17:55:57 -0800, Andres Freund wrote:

FWIW, now that oids are removed, and the tuple table slot abstraction
got in, I'm working on rebasing the pluggable storage patchset ontop of
that.

I've pushed a version to that to the git tree, including a rebased
version of zheap:
https://github.com/anarazel/postgres-pluggable-storage

I worked on a slight improvement on the
0040-WIP-Move-xid-horizon-computation-for-page-level patch . Instead
of pre-fetching all the required buffers beforehand, the attached WIP
patch pre-fetches the buffers, keeping a constant distance ahead of the
buffer reads. It's a WIP patch because right now it just uses a
hard-coded distance of 5 buffers ahead. Haven't used
effective_io_concurrency the way it is done in nodeBitmapHeapscan.c;
will do that next. But before that, any comments on the way I did the
improvements would be welcome.

Note that for now, the patch is based on the pluggable-storage latest
commit; it does not replace the 0040 patch in the patch series.

In the attached v1 patch, the prefetch_distance is calculated as
effective_io_concurrency + 10. Also it has some cosmetic changes.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

prefetch_xid_horizon_scan_v1.patch (application/octet-stream)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8837f83..d0463c7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -67,6 +67,7 @@
 #include "utils/lsyscache.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
+#include "utils/spccache.h"
 
 
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
@@ -162,6 +163,26 @@ static const struct
 #define ConditionalLockTupleTuplock(rel, tup, mode) \
 	ConditionalLockTuple((rel), (tup), tupleLockExtraInfo[mode].hwlock)
 
+#ifdef USE_PREFETCH
+/*
+ * Maintains the current prefetch position, so that prefetching can stay ahead
+ * of the buffer reads. Used when prefetching the heap buffers pointed to by tids.
+ */
+typedef struct
+{
+	int		next_item;
+	BlockNumber cur_hblkno;
+} PrefetchState;
+
+/*
+ * An arbitrary way to come up with a pre-fetch distance that grows with io
+ * concurrency, but is at least 10 and not more than the max effective io
+ * concurrency.
+ */
+#define PREFETCH_DISTANCE(io_concurrency) Min((io_concurrency) + 10, MAX_IO_CONCURRENCY)
+
+#endif
+
 /*
  * This table maps tuple lock strength values for each particular
  * MultiXactStatus value.
@@ -6990,6 +7011,44 @@ HeapTupleHeaderAdvanceLatestRemovedXid(HeapTupleHeader tuple,
 	/* *latestRemovedXid may still be invalid at end */
 }
 
+#ifdef USE_PREFETCH
+/*
+ * prefetch_buffer
+ *
+ * Pre-fetch 'prefetch_count' number of buffers.
+ * Continues to scan the tids array from the last position that was scanned
+ * for previous pre-fetching.
+ */
+static void
+prefetch_buffer(Relation rel, PrefetchState *prefetch_state,
+				ItemPointerData *tids, int nitems, int prefetch_count)
+{
+	BlockNumber cur_hblkno = prefetch_state->cur_hblkno;
+	int		count = 0;
+	int		i;
+
+	for (i = prefetch_state->next_item; i < nitems && count < prefetch_count; i++)
+	{
+		ItemPointer htid = &tids[i];
+
+		if (cur_hblkno == InvalidBlockNumber ||
+			ItemPointerGetBlockNumber(htid) != cur_hblkno)
+		{
+			cur_hblkno = ItemPointerGetBlockNumber(htid);
+			PrefetchBuffer(rel, MAIN_FORKNUM, cur_hblkno);
+			count++;
+		}
+	}
+
+	/*
+	 * Save the prefetch position so that next time we can continue from that
+	 * position.
+	 */
+	prefetch_state->next_item = i;
+	prefetch_state->cur_hblkno = cur_hblkno;
+}
+#endif
+
 /*
  * Get the latestRemovedXid from the heap pages pointed at by the index
  * tuples being deleted.
@@ -7011,6 +7070,10 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
 	BlockNumber hblkno;
 	Buffer		buf = InvalidBuffer;
 	Page		hpage;
+#ifdef USE_PREFETCH
+	PrefetchState prefetch_state;
+	int			io_concurrency;
+#endif
 
 	/*
 	 * Sort to avoid repeated lookups for the same page, and to make it more
@@ -7021,21 +7084,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
 	qsort((void *) tids, nitems, sizeof(ItemPointerData),
 		  (int (*) (const void *, const void *)) ItemPointerCompare);
 
-	/* prefetch all pages */
+	/* prefetch a fixed number of pages beforehand. */
 #ifdef USE_PREFETCH
-	hblkno = InvalidBlockNumber;
-	for (int i = 0; i < nitems; i++)
-	{
-		ItemPointer htid = &tids[i];
-
-		if (hblkno == InvalidBlockNumber ||
-			ItemPointerGetBlockNumber(htid) != hblkno)
-		{
-			hblkno = ItemPointerGetBlockNumber(htid);
-
-			PrefetchBuffer(rel, MAIN_FORKNUM, hblkno);
-		}
-	}
+	prefetch_state.next_item = 0;
+	prefetch_state.cur_hblkno = InvalidBlockNumber;
+	io_concurrency = get_tablespace_io_concurrency(rel->rd_rel->reltablespace);
+	prefetch_buffer(rel, &prefetch_state, tids, nitems,
+					PREFETCH_DISTANCE(io_concurrency));
 #endif
 
 	/* Iterate over all tids, and check their horizon */
@@ -7063,6 +7118,15 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
 			hblkno = ItemPointerGetBlockNumber(htid);
 
 			buf = ReadBuffer(rel, hblkno);
+
+#ifdef USE_PREFETCH
+			/*
+			 * Need to maintain the prefetch distance, so prefetch a page each
+			 * time we read a new page.
+			 */
+			prefetch_buffer(rel, &prefetch_state, tids, nitems, 1);
+#endif
+
 			hpage = BufferGetPage(buf);
 
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
#90Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#87)
8 attachment(s)
Re: Pluggable Storage - Andres's take

On Mon, Feb 4, 2019 at 2:31 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Tue, Jan 22, 2019 at 1:43 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

OK. I will work on the doc changes.

Sorry for the delay.

Attached is a draft patch of the doc and comment changes that I worked on.
Currently I added comments to the callbacks that are present in the
TableAmRoutine structure and copied them into the docs. I am not sure
whether that is a good approach or not.
I am yet to add a description of each parameter of the callbacks for
easier understanding.

Or, should the docs give a description of each callback, grouped according
to the divisions used in the TableAmRoutine structure? Currently the
following divisions are available:
1. Table scan
2. Parallel table scan
3. Index scan
4. Manipulation of physical tuples
5. Non-modifying operations on individual tuples
6. DDL
7. Planner
8. Executor

Suggestions?

Here I attached the doc patches for pluggable storage. I divided the
APIs into the above-specified groups and explained them in the docs.
I can add further details if the approach seems fine.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0004-Doc-update-of-Create-access-method-type-table.patch (application/octet-stream)
From d573ef256a13cc803b1422b49c13581a4c6cc0d1 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 16:22:42 +1100
Subject: [PATCH 04/17] Doc update of Create access method type table

CREATE ACCESS METHOD now also supports creating
table access methods.
---
 doc/src/sgml/catalogs.sgml                 |  4 ++--
 doc/src/sgml/ref/create_access_method.sgml | 12 ++++++++----
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index af4d0625ea..9fd0668db3 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -587,8 +587,8 @@
    The catalog <structname>pg_am</structname> stores information about
    relation access methods.  There is one row for each access method supported
    by the system.
-   Currently, only indexes have access methods.  The requirements for index
-   access methods are discussed in detail in <xref linkend="indexam"/>.
+   Currently, only tables, indexes and materialized views have access methods.
+   The requirements for access methods are discussed in detail in <xref linkend="am"/>.
   </para>
 
   <table>
diff --git a/doc/src/sgml/ref/create_access_method.sgml b/doc/src/sgml/ref/create_access_method.sgml
index 851c5e63be..256914022a 100644
--- a/doc/src/sgml/ref/create_access_method.sgml
+++ b/doc/src/sgml/ref/create_access_method.sgml
@@ -61,7 +61,8 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
     <listitem>
      <para>
       This clause specifies the type of access method to define.
-      Only <literal>INDEX</literal> is supported at present.
+      Only <literal>INDEX</literal> and <literal>TABLE</literal>
+      are supported at present.
      </para>
     </listitem>
    </varlistentry>
@@ -76,9 +77,12 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
       declared to take a single argument of type <type>internal</type>,
       and its return type depends on the type of access method;
       for <literal>INDEX</literal> access methods, it must
-      be <type>index_am_handler</type>.  The C-level API that the handler
-      function must implement varies depending on the type of access method.
-      The index access method API is described in <xref linkend="indexam"/>.
+      be <type>index_am_handler</type> and for <literal>TABLE</literal>
+      access methods, it must be <type>table_am_handler</type>.
+      The C-level API that the handler function must implement varies
+      depending on the type of access method. The index access method API
+      is described in <xref linkend="index-access-methods"/> and the table access method
+      API is described in <xref linkend="table-access-methods"/>.
      </para>
     </listitem>
    </varlistentry>
-- 
2.20.1.windows.1

0005-Doc-update-of-create-materialized-view-.-USING-synta.patch (application/octet-stream)
From 171f435c364529a15f8028a8e4c7648acaa89d9a Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 16:24:24 +1100
Subject: [PATCH 05/17] Doc update of create materialized view ... USING syntax

CREATE MATERIALIZED VIEW now supports the USING syntax
to specify its own table access method.
---
 doc/src/sgml/ref/create_materialized_view.sgml | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/doc/src/sgml/ref/create_materialized_view.sgml b/doc/src/sgml/ref/create_materialized_view.sgml
index 7f31ab4d26..3a052ee6a4 100644
--- a/doc/src/sgml/ref/create_materialized_view.sgml
+++ b/doc/src/sgml/ref/create_materialized_view.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
+    [ USING <replaceable class="parameter">method</replaceable> ]
     [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
     AS <replaceable>query</replaceable>
@@ -85,6 +86,19 @@ CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies the optional access method for the new materialized view;
+      see <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, then the default table access method
+      is chosen for the new materialized view.  See <xref linkend="guc-default-table-access-method"/>
+      for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
-- 
2.20.1.windows.1

0006-Doc-update-of-CREATE-TABLE-.-USING-syntax.patch (application/octet-stream)
From 07919bf42387073d1ef9855a0174e56bbed78afd Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 16:25:49 +1100
Subject: [PATCH 06/17] Doc update of CREATE TABLE ... USING syntax

CREATE TABLE now supports the USING syntax
to specify the table access method.
---
 doc/src/sgml/ref/create_table.sgml | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 857515ec8f..72a1a785e7 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -29,6 +29,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 ] )
 [ INHERITS ( <replaceable>parent_table</replaceable> [, ... ] ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -40,6 +41,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -51,6 +53,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ] { FOR VALUES <replaceable class="parameter">partition_bound_spec</replaceable> | DEFAULT }
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -953,7 +956,7 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
 
      <para>
       The access method must support <literal>amgettuple</literal> (see <xref
-      linkend="indexam"/>); at present this means <acronym>GIN</acronym>
+      linkend="index-access-methods"/>); at present this means <acronym>GIN</acronym>
       cannot be used.  Although it's allowed, there is little point in using
       B-tree or hash indexes with an exclusion constraint, because this
       does nothing that an ordinary unique constraint doesn't do better.
@@ -1136,6 +1139,19 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies the optional access method for the new table;
+      see <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, then the default table access method
+      is chosen for the new table.  See <xref linkend="guc-default-table-access-method"/>
+      for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
-- 
2.20.1.windows.1

0007-Doc-of-CREATE-TABLE-AS-.-USING-syntax.patch (application/octet-stream)
From fb2080aa10c6d9f32f2c3567575ce05723fad751 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 16:27:20 +1100
Subject: [PATCH 07/17] Doc of CREATE TABLE AS ... USING syntax

CREATE TABLE AS now supports the USING syntax to specify the table
access method during table creation.
---
 doc/src/sgml/ref/create_table_as.sgml | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 679e8f521e..90c9dbdaa5 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
+    [ USING <replaceable class="parameter">method</replaceable> ]
     [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
     [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -120,6 +121,19 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies the optional access method for the new table;
+      see <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, then the default table access method
+      is chosen for the new table.  See <xref linkend="guc-default-table-access-method"/>
+      for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
-- 
2.20.1.windows.1

0008-Table-access-method-API-explanation.patch (application/octet-stream)
From 9cc966c261c1a9164128507ecca21b75d6fef191 Mon Sep 17 00:00:00 2001
From: Kommi <haribabuk@fast.au.fujitsu.com>
Date: Mon, 18 Feb 2019 12:41:34 +1100
Subject: [PATCH] Table access method API explanation

All the table access method APIs and their details are explained.
---
 doc/src/sgml/am.sgml          | 551 +++++++++++++++++++++++++++++++++-
 doc/src/sgml/release-9.6.sgml |   2 +-
 2 files changed, 548 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/am.sgml b/doc/src/sgml/am.sgml
index 579187ed1b..7cbe00fdbe 100644
--- a/doc/src/sgml/am.sgml
+++ b/doc/src/sgml/am.sgml
@@ -18,12 +18,555 @@
   <para>
    All Tables in <productname>PostgreSQL</productname> are the primary
    data store. Each table is stored as its own physical <firstterm>relation</firstterm>
-   and so is described by an entry in the <structname>pg_class</structname>
-   catalog. The contents of an table are entirely under the control of its
-   access method. (All the access methods furthermore use the standard page
-   layout described in <xref linkend="storage-page-layout"/>.)
+   and is described by an entry in the <structname>pg_class</structname>
+   catalog. A table's content is entirely controlled by its access method, although
+   all access methods use the same standard page layout described in <xref linkend="storage-page-layout"/>.
   </para>
+  
+  <sect2 id="table-access-methods-api">
+   <title>Table access method API</title>
+   
+   <para>
+    Each table access method is described by a row in the
+    <link linkend="catalog-pg-am"><structname>pg_am</structname></link> system
+    catalog. The <structname>pg_am</structname> entry specifies a <firstterm>type</firstterm>
+    of the access method and a <firstterm>handler function</firstterm> for the
+    access method. These entries can be created and deleted using the <xref linkend="sql-create-access-method"/>
+    and <xref linkend="sql-drop-access-method"/> SQL commands.
+   </para>
+  
+   <para>
+    A table access method handler function must be declared to accept a
+    single argument of type <type>internal</type> and to return the
+    pseudo-type <type>table_am_handler</type>.  The argument is a dummy value that
+    simply serves to prevent handler functions from being called directly from
+    SQL commands.  The result of the function must be a palloc'd struct of
+    type <structname>TableAmRoutine</structname>, which contains everything
+    that the core code needs to know to make use of the table access method.
+    The <structname>TableAmRoutine</structname> struct, also called the access
+    method's <firstterm>API struct</firstterm>, includes fields specifying assorted
+    fixed properties of the access method, such as whether it can support
+    bitmap scans.  More importantly, it contains pointers to support
+    functions for the access method, which do all of the real work to access
+    tables.  These support functions are plain C functions and are not
+    visible or callable at the SQL level.  The support functions are described
+    in the <structname>TableAmRoutine</structname> structure. For more details, please
+    refer to the file <filename>src/include/access/tableam.h</filename>.
+   </para>
+   
+   <para>
+    Developers of any new <literal>TABLE ACCESS METHOD</literal> can refer to the existing <literal>HEAP</literal>
+    implementation present in <filename>src/backend/access/heap/heapam_handler.c</filename> for the details of
+    how it is implemented for the heap access method.
+   </para>
+   
+   <para>
+    There are different types of APIs defined; their details are given below.
+   </para>
+  
+   <sect3 id="slot-implementation-function">
+    <title>Slot implementation functions</title>
+     
+   <para>
+<programlisting>
+const TupleTableSlotOps *(*slot_callbacks) (Relation rel);
+</programlisting>
+  
+    This API expects the function to return the slot implementation that is specific to the AM.
+    The following predefined slot implementations are available:
+    <literal>TTSOpsVirtual</literal>, <literal>TTSOpsHeapTuple</literal>,
+    <literal>TTSOpsMinimalTuple</literal> and <literal>TTSOpsBufferHeapTuple</literal>.
+    An AM implementation can use any one of them. For more details of these slot-specific
+    implementations, refer to <filename>src/include/executor/tuptable.h</filename>.
+   </para>
+   </sect3>
+   
+   <sect3 id="table-scan-functions">
+    <title>Table scan functions</title>
+     
+    <para>
+     The following APIs are used for scanning a table.
+    </para>
+   
+    <para>
+<programlisting>
+TableScanDesc (*scan_begin) (Relation rel,
+                             Snapshot snapshot,
+                             int nkeys, struct ScanKeyData *key,
+                             ParallelTableScanDesc parallel_scan,
+                             bool allow_strat,
+                             bool allow_sync,
+                             bool allow_pagemode,
+                             bool is_bitmapscan,
+                             bool is_samplescan,
+                             bool temp_snap);
+</programlisting>
+  
+     This API starts a scan of the relation pointed to by <literal>rel</literal> using the specified options
+     and returns a <structname>TableScanDesc</structname>. <literal>parallel_scan</literal> can be used
+     by the AM if it supports parallel scans.
+    </para>
+  
+    <para>
+<programlisting>
+void        (*scan_end) (TableScanDesc scan);
+</programlisting>
+  
+     This API expects the function to end the scan that was started by
+     <literal>scan_begin</literal>.
+    </para>
+  
+    <para>
+<programlisting>
+void        (*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key, bool set_params,
+                            bool allow_strat, bool allow_sync, bool allow_pagemode);
+</programlisting>
+  
+     This API restarts the given scan, already started by
+     <literal>scan_begin</literal>, using the provided options, releasing
+     any resources (such as buffer pins) that are held by the scan.
+    </para>
+   
+    <para>
+<programlisting>
+TupleTableSlot *(*scan_getnextslot) (TableScanDesc scan,
+                                     ScanDirection direction, TupleTableSlot *slot);
+</programlisting>
+  
+     This API expects the function to return the next satisfying tuple from the scan
+     started by <literal>scan_begin</literal>.
+    </para>
+    
+   </sect3>
+  
+   <sect3 id="parallel-table-scan-function">
+    <title>Parallel table scan functions</title>
+   
+    <para>
+     The following APIs are used to perform a parallel scan.
+    </para>  
+   
+    <para>
+<programlisting>
+Size        (*parallelscan_estimate) (Relation rel);
+</programlisting>
+  
+     This API expects the function to return the total size that is required for the AM
+     to perform the parallel table scan. The minimum required size is that of
+     <structname>ParallelBlockTableScanDescData</structname>.
+    </para>
+    
+    <para>
+<programlisting>
+Size        (*parallelscan_initialize) (Relation rel, ParallelTableScanDesc parallel_scan);
+</programlisting>
+  
+     This API expects the function to initialize the <literal>parallel_scan</literal>
+     state that the AM needs in order to perform a parallel scan, and to return
+     the total size that is required for the AM to perform the parallel table scan.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*parallelscan_reinitialize) (Relation rel, ParallelTableScanDesc parallel_scan);
+</programlisting>
+  
+     This API expects the function to reinitialize the parallel scan structure pointed to by
+     <literal>parallel_scan</literal>.
+    </para>
+    
+   </sect3>
+ 
+   <sect3 id="index-scan-functions">
+    <title>Index scan functions</title>
+     
+    <para>
+<programlisting>
+struct IndexFetchTableData *(*begin_index_fetch) (Relation rel);
+</programlisting>
+  
+     This API returns an allocated and initialized <structname>IndexFetchTableData</structname>
+     structure that is used to fetch table tuples during an index scan.
+    </para>
+  
+    <para>
+<programlisting>
+void        (*reset_index_fetch) (struct IndexFetchTableData *data);
+</programlisting>
+  
+     This API releases the AM-specific resources held by the <structname>IndexFetchTableData</structname>
+     of an index scan.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*end_index_fetch) (struct IndexFetchTableData *data);
+</programlisting>
+  
+     This API releases the AM-specific resources held by the <structname>IndexFetchTableData</structname>
+     of a given index scan and frees the memory of the <structname>IndexFetchTableData</structname> itself.
+    </para>
+    
+    <para>
+<programlisting>
+TransactionId (*compute_xid_horizon_for_tuples) (Relation rel,
+                                                 ItemPointerData *items,
+                                                 int nitems);
+</programlisting>
+  
+     This API gets the newest xid among the tuples identified by <literal>items</literal>. It is used
+     to compute which snapshots to conflict with when replaying WAL records
+     for page-level index vacuums.
+    </para>
+    
+   </sect3>
 
+   <sect3 id="manipulation-of-physical-tuples-functions">
+    <title>Manipulation of physical tuples functions</title>
+     
+    <para>
+<programlisting>
+void        (*tuple_insert) (Relation rel, TupleTableSlot *slot, CommandId cid,
+                             int options, struct BulkInsertStateData *bistate);
+</programlisting>
+  
+     This API expects the function to insert the tuple contained in the provided slot and to return
+     the unique identifier (<literal>ItemPointerData</literal>) of the tuple, using the BulkInsertStateData if available.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*tuple_insert_speculative) (Relation rel,
+                                         TupleTableSlot *slot,
+                                         CommandId cid,
+                                         int options,
+                                         struct BulkInsertStateData *bistate,
+                                         uint32 specToken);
+</programlisting>
+  
+     This API is similar to the <literal>tuple_insert</literal> API, but it inserts the tuple
+     with additional information that is necessary for speculative insertion; the insertion will be confirmed
+     later based on the success of the corresponding index insertion.
+    </para>
+    
+    <para>
+<programlisting>
+void        (*tuple_complete_speculative) (Relation rel,
+                                           TupleTableSlot *slot,
+                                           uint32 specToken,
+                                           bool succeeded);
+</programlisting>
+  
+     This API completes the speculative insertion of a tuple started by <literal>tuple_insert_speculative</literal>.
+     It is invoked after finishing the index insert; the <literal>succeeded</literal> parameter indicates whether that insert succeeded.
+    </para>
+   
+    <para>
+<programlisting>
+HTSU_Result (*tuple_delete) (Relation rel,
+                             ItemPointer tid,
+                             CommandId cid,
+                             Snapshot snapshot,
+                             Snapshot crosscheck,
+                             bool wait,
+                             HeapUpdateFailureData *hufd,
+                             bool changingPart);
+</programlisting>
+  
+     This API expects the function to delete the tuple of the relation pointed to by the ItemPointer
+     and to return the result of the operation. In case of failure it updates <literal>hufd</literal>.
+    </para>
+   
+    <para>
+<programlisting>
+HTSU_Result (*tuple_update) (Relation rel,
+                             ItemPointer otid,
+                             TupleTableSlot *slot,
+                             CommandId cid,
+                             Snapshot snapshot,
+                             Snapshot crosscheck,
+                             bool wait,
+                             HeapUpdateFailureData *hufd,
+                             LockTupleMode *lockmode,
+                             bool *update_indexes);
+</programlisting>
+  
+     This API expects the function to update the tuple identified by the ItemPointer with the new tuple,
+     to return the result of the operation, and to set a flag indicating whether the indexes need updating.
+     In case of failure it should update <literal>hufd</literal>.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
+                             CommandId cid, int options, struct BulkInsertStateData *bistate);
+</programlisting>
+  
+     This API expects the function to insert multiple tuples into the relation for faster data insertion,
+     using the BulkInsertStateData if available.
+    </para>
+   
+    <para>
+<programlisting>
+HTSU_Result (*tuple_lock) (Relation rel,
+                           ItemPointer tid,
+                           Snapshot snapshot,
+                           TupleTableSlot *slot,
+                           CommandId cid,
+                           LockTupleMode mode,
+                           LockWaitPolicy wait_policy,
+                           uint8 flags,
+                           HeapUpdateFailureData *hufd);
+</programlisting>
+  
+     This API expects the function to lock the newest version of the tuple pointed to by the ItemPointer <literal>tid</literal>
+     and to return the result of the operation. In case of failure it updates <literal>hufd</literal>.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*finish_bulk_insert) (Relation rel, int options);
+</programlisting>
+  
+     This API expects the function to perform the operations necessary to complete insertions made
+     via <literal>tuple_insert</literal> and <literal>multi_insert</literal> with a
+     BulkInsertState specified. This may, for example, be used to flush the relation when
+     inserting while skipping WAL, or it may be a no-op.
+    </para>
+   
+   </sect3>
+  
+   <sect3 id="non-modifying-tuple-functions">
+    <title>Non-modifying tuple functions</title>
+     
+    <para>
+<programlisting>
+bool        (*tuple_fetch_row_version) (Relation rel,
+                                        ItemPointer tid,
+                                        Snapshot snapshot,
+                                        TupleTableSlot *slot,
+                                        Relation stats_relation);
+</programlisting>
+  
+     This API expects the function to fetch the latest tuple specified by the
+     ItemPointer <literal>tid</literal> and store it in the slot. For example, in the
+     case of the heap AM, update chains are created whenever a tuple is updated,
+     so the function should fetch the latest version of the tuple.
+    </para>
+  
+    <para>
+<programlisting>
+void        (*tuple_get_latest_tid) (Relation rel,
+                                     Snapshot snapshot,
+                                     ItemPointer tid);
+</programlisting>
+  
+     This API gets the TID of the latest version of the tuple identified by the specified
+     ItemPointer. For example, in the case of the heap AM, update chains are created whenever
+     a tuple is updated, so this API is useful for finding the latest ItemPointer.
+    </para>
+   
+    <para>
+<programlisting>
+bool        (*tuple_fetch_follow) (struct IndexFetchTableData *scan,
+                                   ItemPointer tid,
+                                   Snapshot snapshot,
+                                   TupleTableSlot *slot,
+                                   bool *call_again, bool *all_dead);
+</programlisting>
+  
+     This API is used to fetch the tuple pointed to by the ItemPointer, based on the
+     IndexFetchTableData, store it in the specified slot, and update the flags.
+     It is called from the index scan operation.
+    </para>
+   
+    <para>
+<programlisting>
+bool        (*tuple_satisfies_snapshot) (Relation rel,
+                                         TupleTableSlot *slot,
+                                         Snapshot snapshot);
+</programlisting>
+  
+     This API performs the tuple visibility check based on the provided snapshot and returns
+     <literal>true</literal> if the current tuple is visible, otherwise <literal>false</literal>.
+    </para>
+    
+   </sect3>
+   
+   <sect3 id="ddl-related-functions">
+    <title>DDL-related functions</title>
+     
+    <para>
+<programlisting>
+void        (*relation_set_new_filenode) (Relation rel,
+                                          char persistence,
+                                          TransactionId *freezeXid,
+                                          MultiXactId *minmulti);
+</programlisting>
+  
+     This API expects the function to create the storage that is necessary to store the tuples of the
+     relation, and also to report the minimum XID with which tuples may be inserted. For example, the heap AM
+     should create the relfilenode that is necessary to store the heap tuples.
+    </para>
+  
+    <para>
+<programlisting>
+void        (*relation_nontransactional_truncate) (Relation rel);
+</programlisting>
+  
+     This API is used to truncate the specified relation; this operation is non-transactional and not reversible.
+    </para>
+  
+    <para>
+<programlisting>
+void        (*relation_copy_data) (Relation rel, RelFileNode newrnode);
+</programlisting>
+  
+     This API expects the function to copy the relation's data from the existing filenode to the new filenode
+     specified by <literal>newrnode</literal> and to remove the existing filenode.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*relation_vacuum) (Relation onerel, int options,
+                                struct VacuumParams *params, BufferAccessStrategy bstrategy);
+</programlisting>
+  
+     This API performs vacuuming of the relation based on the specified params.
+     It gathers all the dead tuples of the relation and cleans them up, including
+     the indexes.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*scan_analyze_next_block) (TableScanDesc scan, BlockNumber blockno,
+                                        BufferAccessStrategy bstrategy);
+</programlisting>
+  
+     This API expects the function to read the specified relation block for tuple analysis.
+     The gathered statistics are used by the planner to optimize query planning for this
+     relation.
+    </para>
+   
+    <para>
+<programlisting>
+bool        (*scan_analyze_next_tuple) (TableScanDesc scan, TransactionId OldestXmin,
+                                        double *liverows, double *deadrows, TupleTableSlot *slot);
+</programlisting>
+  
+     This API expects the function to get the next visible tuple from the block being scanned, based on the snapshot,
+     and to update the counts of live and dead tuples encountered.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*relation_copy_for_cluster) (Relation NewHeap, Relation OldHeap, Relation OldIndex,
+                                          bool use_sort,
+                                          TransactionId OldestXmin, TransactionId FreezeXid, MultiXactId MultiXactCutoff,
+                                          double *num_tuples, double *tups_vacuumed, double *tups_recently_dead);
+</programlisting>
+  
+     This API makes a copy of the content of a relation, ordered either using the specified index or by sorting
+     explicitly. It also removes the dead tuples.
+    </para>
+   
+    <para>
+<programlisting>
+double      (*index_build_range_scan) (Relation heap_rel,
+                                       Relation index_rel,
+                                       IndexInfo *index_info,
+                                       bool allow_sync,
+                                       bool anyvisible,
+                                       BlockNumber start_blockno,
+                                       BlockNumber end_blockno,
+                                       IndexBuildCallback callback,
+                                       void *callback_state,
+                                       TableScanDesc scan);
+</programlisting>
+  
+     This API scans the specified blocks of a given relation and inserts the resulting index entries into
+     the specified index using the provided callback function.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*index_validate_scan) (Relation heap_rel,
+                                    Relation index_rel,
+                                    IndexInfo *index_info,
+                                    Snapshot snapshot,
+                                    struct ValidateIndexState *state);
+</programlisting>
+  
+     This API scans the table according to the given snapshot and inserts tuples
+     satisfying the snapshot into the specified index, provided their TIDs are
+     not already present in the <structname>ValidateIndexState</structname> struct;
+     this API is used as the last phase of a concurrent index build.
+    </para>
+   
+   </sect3>
+   
+   <sect3 id="planner-functions">
+    <title>Planner functions</title>
+     
+    <para>
+<programlisting>
+void        (*relation_estimate_size) (Relation rel, int32 *attr_widths,
+                                       BlockNumber *pages, double *tuples, double *allvisfrac);
+</programlisting>
+  
+     This API estimates the size of the relation, returning the number of
+     pages, the number of tuples, and the fraction of all-visible pages of the relation.
+    </para>
+    
+   </sect3>
+   
+   <sect3 id="executor-functions">
+    <title>Executor functions</title>
+     
+    <para>
+<programlisting>
+bool        (*scan_bitmap_pagescan) (TableScanDesc scan,
+                                     TBMIterateResult *tbmres);
+</programlisting>
+  
+     This API scans the relation block specified in the scan descriptor, collecting and returning the
+     tuples requested by <structname>tbmres</structname> that satisfy visibility.
+    </para>
+  
+    <para>
+<programlisting>
+bool        (*scan_bitmap_pagescan_next) (TableScanDesc scan,
+                                          TupleTableSlot *slot);
+</programlisting>
+  
+     This API gets the next tuple from the set of tuples of the page specified in the scan descriptor
+     and stores it in the provided slot; it returns false if there are no more tuples.
+    </para>
+   
+    <para>
+<programlisting>
+bool        (*scan_sample_next_block) (TableScanDesc scan,
+                                       struct SampleScanState *scanstate);
+</programlisting>
+  
+     This API selects the next block of the relation, either using the given sampling method or sequentially,
+     and sets its information in the scan descriptor.
+    </para>
+   
+    <para>
+<programlisting>
+bool        (*scan_sample_next_tuple) (TableScanDesc scan,
+                                       struct SampleScanState *scanstate,
+                                       TupleTableSlot *slot);
+</programlisting>
+  
+     This API gets the next tuple to sample from the current block based on
+     the sampling method, otherwise it gets the next visible tuple of the block
+     chosen by <literal>scan_sample_next_block</literal>.
+    </para>
+    
+  </sect3>	
+  </sect2>
  </sect1> 
  
  <sect1 id="index-access-methods">
diff --git a/doc/src/sgml/release-9.6.sgml b/doc/src/sgml/release-9.6.sgml
index acebcc6249..c0a96c2cce 100644
--- a/doc/src/sgml/release-9.6.sgml
+++ b/doc/src/sgml/release-9.6.sgml
@@ -10763,7 +10763,7 @@ This commit is also listed under libpq and PL/pgSQL
 2016-08-13 [ed0097e4f] Add SQL-accessible functions for inspecting index AM pro
 -->
        <para>
-        Restructure <link linkend="indexam">index access
+        Restructure <link linkend="index-access-methods">index access
         method <acronym>API</acronym></link> to hide most of it at
         the <application>C</application> level (Alexander Korotkov, Andrew Gierth)
        </para>
-- 
2.20.1.windows.1

0001-Docs-of-default_table_access_method-GUC.patch (application/octet-stream)
From ff53bfe787c187aa1a682239cbf98e270518dc3b Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 15:59:48 +1100
Subject: [PATCH 01/17] Docs of default_table_access_method GUC

This GUC sets the default table access method to use for
tables that are created without specifying their own access
method.
---
 doc/src/sgml/config.sgml | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b6f5822b84..0183c439ab 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7168,6 +7168,30 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-default-table-access-method" xreflabel="default_table_access_method">
+      <term><varname>default_table_access_method</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>default_table_access_method</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        This parameter specifies the default table access method to use when creating
+        tables or materialized views if the <command>CREATE</command> command does
+        not explicitly specify an access method.
+       </para>
+
+       <para>
+        The value is either the name of a table access method, or an empty string
+        to specify using the default table access method of the current database.
+        If the value does not match the name of any existing table access method,
+        <productname>PostgreSQL</productname> will automatically use the default
+        table access method of the current database.
+       </para>
+
+      </listitem>
+     </varlistentry>
+     
      <varlistentry id="guc-default-tablespace" xreflabel="default_tablespace">
       <term><varname>default_tablespace</varname> (<type>string</type>)
       <indexterm>
-- 
2.20.1.windows.1

0002-Rename-indexam.sgml-to-am.sgml.patch (application/octet-stream)
From a50d4c45c48985d70d2396ccd8e20beddbeec017 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 16:13:56 +1100
Subject: [PATCH 02/17] Rename indexam.sgml to am.sgml

It is just a rename, plus the necessary updates.
---
 doc/src/sgml/{indexam.sgml => am.sgml} | 2 +-
 doc/src/sgml/filelist.sgml             | 2 +-
 doc/src/sgml/postgres.sgml             | 2 +-
 doc/src/sgml/xindex.sgml               | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)
 rename doc/src/sgml/{indexam.sgml => am.sgml} (99%)

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/am.sgml
similarity index 99%
rename from doc/src/sgml/indexam.sgml
rename to doc/src/sgml/am.sgml
index 05102724ea..a9f0838ee5 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/am.sgml
@@ -1,4 +1,4 @@
-<!-- doc/src/sgml/indexam.sgml -->
+<!-- doc/src/sgml/am.sgml -->
 
 <chapter id="indexam">
  <title>Index Access Method Interface Definition</title>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 5dfdf54815..fed460b7c3 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -89,7 +89,7 @@
 <!ENTITY gin        SYSTEM "gin.sgml">
 <!ENTITY brin       SYSTEM "brin.sgml">
 <!ENTITY planstats    SYSTEM "planstats.sgml">
-<!ENTITY indexam    SYSTEM "indexam.sgml">
+<!ENTITY am         SYSTEM "am.sgml">
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 96d196d229..9dce0c5f81 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -250,7 +250,7 @@
   &tablesample-method;
   &custom-scan;
   &geqo;
-  &indexam;
+  &am;
   &generic-wal;
   &btree;
   &gist;
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index 9446f8b836..4fa821160c 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -36,7 +36,7 @@
    described in <classname>pg_am</classname>.  It is possible to add a
    new index access method by writing the necessary code and
    then creating an entry in <classname>pg_am</classname> &mdash; but that is
-   beyond the scope of this chapter (see <xref linkend="indexam"/>).
+   beyond the scope of this chapter (see <xref linkend="am"/>).
   </para>
 
   <para>
-- 
2.20.1.windows.1

0003-Reorganize-am-as-both-table-and-index.patch (application/octet-stream)
From 4c685d4990dbf641914175628db1ba74ac3cabfa Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 16:21:40 +1100
Subject: [PATCH 03/17] Reorganize am as both table and index

There is not much table access method info yet.
---
 doc/src/sgml/am.sgml | 61 +++++++++++++++++++++++++++++---------------
 1 file changed, 41 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/am.sgml b/doc/src/sgml/am.sgml
index a9f0838ee5..579187ed1b 100644
--- a/doc/src/sgml/am.sgml
+++ b/doc/src/sgml/am.sgml
@@ -1,16 +1,34 @@
 <!-- doc/src/sgml/am.sgml -->
 
-<chapter id="indexam">
- <title>Index Access Method Interface Definition</title>
+<chapter id="am">
+ <title>Access Method Interface Definition</title>
 
   <para>
    This chapter defines the interface between the core
-   <productname>PostgreSQL</productname> system and <firstterm>index access
-   methods</firstterm>, which manage individual index types.  The core system
-   knows nothing about indexes beyond what is specified here, so it is
-   possible to develop entirely new index types by writing add-on code.
-  </para>
-
+   <productname>PostgreSQL</productname> system and <firstterm>access
+   methods</firstterm>, which manage individual <literal>INDEX</literal> 
+   and <literal>TABLE</literal> types.  The core system knows nothing
+   about these access methods beyond what is specified here, so it is
+   possible to develop entirely new access method types by writing add-on code.
+  </para>
+ 
+ <sect1 id="table-access-methods">
+  <title>Overview of Table access methods</title>
+  
+  <para>
+   All Tables in <productname>PostgreSQL</productname> are the primary
+   data store. Each table is stored as its own physical <firstterm>relation</firstterm>
+   and so is described by an entry in the <structname>pg_class</structname>
+   catalog. The contents of an table are entirely under the control of its
+   access method. (All the access methods furthermore use the standard page
+   layout described in <xref linkend="storage-page-layout"/>.)
+  </para>
+
+ </sect1> 
+ 
+ <sect1 id="index-access-methods">
+  <title>Overview of Index access methods</title>
+  
   <para>
    All indexes in <productname>PostgreSQL</productname> are what are known
    technically as <firstterm>secondary indexes</firstterm>; that is, the index is
@@ -42,8 +60,8 @@
    dead tuples are reclaimed (by vacuuming) when the dead tuples themselves
    are reclaimed.
   </para>
-
- <sect1 id="index-api">
+  
+ <sect2 id="index-api">
   <title>Basic API Structure for Indexes</title>
 
   <para>
@@ -217,9 +235,9 @@ typedef struct IndexAmRoutine
    conditions.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-functions">
+ <sect2 id="index-functions">
   <title>Index Access Method Functions</title>
 
   <para>
@@ -710,9 +728,11 @@ amparallelrescan (IndexScanDesc scan);
    the beginning.
   </para>
 
- </sect1>
+ </sect2>
+ 
+ 
 
- <sect1 id="index-scanning">
+ <sect2 id="index-scanning">
   <title>Index Scanning</title>
 
   <para>
@@ -865,9 +885,9 @@ amparallelrescan (IndexScanDesc scan);
    if its internal implementation is unsuited to one API or the other.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-locking">
+ <sect2 id="index-locking">
   <title>Index Locking Considerations</title>
 
   <para>
@@ -979,9 +999,9 @@ amparallelrescan (IndexScanDesc scan);
    reduce the frequency of such transaction cancellations.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-unique-checks">
+ <sect2 id="index-unique-checks">
   <title>Index Uniqueness Checks</title>
 
   <para>
@@ -1128,9 +1148,9 @@ amparallelrescan (IndexScanDesc scan);
     </itemizedlist>
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-cost-estimation">
+ <sect2 id="index-cost-estimation">
   <title>Index Cost Estimation Functions</title>
 
   <para>
@@ -1377,5 +1397,6 @@ cost_qual_eval(&amp;index_qual_cost, path-&gt;indexquals, root);
    Examples of cost estimator functions can be found in
    <filename>src/backend/utils/adt/selfuncs.c</filename>.
   </para>
+ </sect2>
  </sect1>
 </chapter>
-- 
2.20.1.windows.1

#91Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Amit Langote (#51)
Re: Pluggable Storage - Andres's take

On Tue, Nov 27, 2018 at 4:59 PM Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:

Hi,

On 2018/11/02 9:17, Haribabu Kommi wrote:

Here I attached the cumulative fixes of the patches, new API additions

for

zheap and
basic outline of the documentation.

I've read the documentation patch while also looking at the code and here
are some comments.

Thanks for the review, and apologies for the delay.
I have taken care of most of your comments in the latest version of the
doc patches.

+  <para>
+<programlisting>
+TupleTableSlotOps *
+slot_callbacks (Relation relation);
+</programlisting>
+   API to access the slot specific methods;
+   Following methods are available;
+   <structname>TTSOpsVirtual</structname>,
+   <structname>TTSOpsHeapTuple</structname>,
+   <structname>TTSOpsMinimalTuple</structname>,
+   <structname>TTSOpsBufferTuple</structname>,
+  </para>

Unless I'm misunderstanding what the TupleTableSlotOps abstraction is or
its relations to the TableAmRoutine abstraction, I think the text
description could better be written as:

"API to get the slot operations struct for a given table access method"

It's not clear to me why various TTSOps* structs are listed here? Is the
point that different AMs may choose one of the listed alternatives? For
example, I see that heap AM implementation returns TTOpsBufferTuple, so it
manipulates slots containing buffered tuples, right? Other AMs are free
to return any one of these? For example, some AMs may never use buffer
manager and hence not use TTOpsBufferTuple. Is that understanding correct?

Yes, the AM can decide which type of slot implementation it wants to use.
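
For illustration, a minimal sketch of such a callback (the myam_ prefix
is a made-up example AM, not anything in the patchset; the heap AM's
callback in heapam_handler.c returns TTSOpsBufferHeapTuple, since its
tuples normally live in shared buffers):

/* Assumes access/tableam.h and executor/tuptable.h are included. */
static const TupleTableSlotOps *
myam_slot_callbacks(Relation relation)
{
	/*
	 * A buffer-backed AM returns TTSOpsBufferHeapTuple; an AM that
	 * always materializes tuples into palloc'd memory could return
	 * TTSOpsHeapTuple (or TTSOpsVirtual) instead.
	 */
	return &TTSOpsBufferHeapTuple;
}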

Regards,
Haribabu Kommi
Fujitsu Australia

#92Robert Haas
robertmhaas@gmail.com
In reply to: Amit Khandekar (#89)
Re: Pluggable Storage - Andres's take

On Fri, Feb 8, 2019 at 5:18 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

In the attached v1 patch, the prefetch_distance is calculated as
effective_io_concurrency + 10. Also it has some cosmetic changes.

I did a little brief review of this patch and noticed the following things.

+} PrefetchState;

That name seems too generic.

+/*
+ * An arbitrary way to come up with a pre-fetch distance that grows with io
+ * concurrency, but is at least 10 and not more than the max effective io
+ * concurrency.
+ */

This comment is kinda useless, because it only tells us what the code
does (which is obvious anyway) and not why it does that. Saying that
your formula is arbitrary may not be the best way to attract support
for it.

+ for (i = prefetch_state->next_item; i < nitems && count < prefetch_count; i++)

It looks strange to me that next_item is stored in prefetch_state and
nitems is passed around as an argument. Is there some reason why it's
like that?

+ /* prefetch a fixed number of pages beforehand. */

Not accurate -- the number of pages we prefetch isn't fixed. It
depends on effective_io_concurrency.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#93Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Robert Haas (#92)
Re: Pluggable Storage - Andres's take

On Thu, 21 Feb 2019 at 04:17, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Feb 8, 2019 at 5:18 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

In the attached v1 patch, the prefetch_distance is calculated as
effective_io_concurrency + 10. Also it has some cosmetic changes.

I did a little brief review of this patch and noticed the following things.

+} PrefetchState;
That name seems too generic.

Ok, so something like XidHorizonPrefetchState? On similar lines, does
the prefetch_buffer() function name sound too generic as well?

+/*
+ * An arbitrary way to come up with a pre-fetch distance that grows with io
+ * concurrency, but is at least 10 and not more than the max effective io
+ * concurrency.
+ */

This comment is kinda useless, because it only tells us what the code
does (which is obvious anyway) and not why it does that. Saying that
your formula is arbitrary may not be the best way to attract support
for it.

Well, I had checked the way the number of drive spindles
(effective_io_concurrency) is used to calculate the prefetch distance
for bitmap heap scans (ComputeIoConcurrency). Basically I think the
intention behind that method is to come up with a number that makes it
highly likely that we pre-fetch a block from each of the drive
spindles. But I didn't get how exactly that works, even less so for
non-parallel bitmap scans. The same is true of the pre-fetching we do
here for the xid-horizon stuff, where we do the block reads
sequentially. Andres and I discussed this offline, and he was of the
opinion that this formula won't help here, and that we should instead
just keep a constant distance that is some number greater than
effective_io_concurrency. I agree that instead of saying "arbitrary"
we should explain why we have done that, and before that, come up with
an agreed-upon formula.

+ for (i = prefetch_state->next_item; i < nitems && count < prefetch_count; i++)

It looks strange to me that next_item is stored in prefetch_state and
nitems is passed around as an argument. Is there some reason why it's
like that?

We could keep the max count in the structure itself as well. There
isn't any specific reason for not keeping it there. It's just that
this function, prefetch_buffer(), is not a general function for
maintaining a prefetch state that spans function calls; so we might as
well just pass the max count to that function instead of having
another field in that structure. I am not inclined specifically
towards either of the approaches.

+ /* prefetch a fixed number of pages beforehand. */

Not accurate -- the number of pages we prefetch isn't fixed. It
depends on effective_io_concurrency.

Yeah, will change that in the next patch version, according to what we
conclude about the prefetch distance calculation.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#94Robert Haas
robertmhaas@gmail.com
In reply to: Amit Khandekar (#93)
Re: Pluggable Storage - Andres's take

On Thu, Feb 21, 2019 at 6:44 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

Ok, so something like XidHorizonPrefetchState? On similar lines, does
the prefetch_buffer() function name sound too generic as well?

Yeah, that sounds good. And, yeah, then maybe rename the function too.

+/*
+ * An arbitrary way to come up with a pre-fetch distance that grows with io
+ * concurrency, but is at least 10 and not more than the max effective io
+ * concurrency.
+ */

This comment is kinda useless, because it only tells us what the code
does (which is obvious anyway) and not why it does that. Saying that
your formula is arbitrary may not be the best way to attract support
for it.

Well, I had checked the way the number of drive spindles
(effective_io_concurrency) is used to calculate the prefetch distance
for bitmap heap scans (ComputeIoConcurrency). Basically I think the
intention behind that method is to come up with a number that makes it
highly likely that we pre-fetch a block from each of the drive
spindles. But I didn't get how exactly that works, even less so for
non-parallel bitmap scans. The same is true of the pre-fetching we do
here for the xid-horizon stuff, where we do the block reads
sequentially. Andres and I discussed this offline, and he was of the
opinion that this formula won't help here, and that we should instead
just keep a constant distance that is some number greater than
effective_io_concurrency. I agree that instead of saying "arbitrary"
we should explain why we have done that, and before that, come up with
an agreed-upon formula.

Maybe something like: We don't use the regular formula to determine
how much to prefetch here, but instead just add a constant to
effective_io_concurrency. That's because it seems best to do some
prefetching here even when effective_io_concurrency is set to 0, but
if the DBA thinks it's OK to do more prefetching for other operations,
then it's probably OK to do more prefetching in this case, too. It
may be that this formula is too simplistic, but at the moment we have
no evidence of that or any idea about what would work better.

+ for (i = prefetch_state->next_item; i < nitems && count < prefetch_count; i++)

It looks strange to me that next_item is stored in prefetch_state and
nitems is passed around as an argument. Is there some reason why it's
like that?

We could keep the max count in the structure itself as well. There
isn't any specific reason for not keeping it there. It's just that
this function, prefetch_buffer(), is not a general function for
maintaining a prefetch state that spans function calls; so we might as
well just pass the max count to that function instead of having
another field in that structure. I am not inclined specifically
towards either of the approaches.

All right, count me as +0.5 for putting a copy in the structure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#95Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Robert Haas (#94)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Thu, 21 Feb 2019 at 18:06, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Feb 21, 2019 at 6:44 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

Ok, so something like XidHorizonPrefetchState? On similar lines, does
the prefetch_buffer() function name sound too generic as well?

Yeah, that sounds good.

And, yeah, then maybe rename the function too.

Renamed the function to xid_horizon_prefetch_buffer().

+/*
+ * An arbitrary way to come up with a pre-fetch distance that grows with io
+ * concurrency, but is at least 10 and not more than the max effective io
+ * concurrency.
+ */

This comment is kinda useless, because it only tells us what the code
does (which is obvious anyway) and not why it does that. Saying that
your formula is arbitrary may not be the best way to attract support
for it.

Well, I had checked the way the number of drive spindles
(effective_io_concurrency) is used to calculate the prefetch distance
for bitmap heap scans (ComputeIoConcurrency). Basically I think the
intention behind that method is to come up with a number that makes it
highly likely that we pre-fetch a block from each of the drive
spindles. But I didn't get how exactly that works, even less so for
non-parallel bitmap scans. The same is true of the pre-fetching we do
here for the xid-horizon stuff, where we do the block reads
sequentially. Andres and I discussed this offline, and he was of the
opinion that this formula won't help here, and that we should instead
just keep a constant distance that is some number greater than
effective_io_concurrency. I agree that instead of saying "arbitrary"
we should explain why we have done that, and before that, come up with
an agreed-upon formula.

Maybe something like: We don't use the regular formula to determine
how much to prefetch here, but instead just add a constant to
effective_io_concurrency. That's because it seems best to do some
prefetching here even when effective_io_concurrency is set to 0, but
if the DBA thinks it's OK to do more prefetching for other operations,
then it's probably OK to do more prefetching in this case, too. It
may be that this formula is too simplistic, but at the moment we have
no evidence of that or any idea about what would work better.

Thanks for writing it down for me. I think this is good to go as a
comment, so I put it as-is into the patch.

+ for (i = prefetch_state->next_item; i < nitems && count < prefetch_count; i++)

It looks strange to me that next_item is stored in prefetch_state and
nitems is passed around as an argument. Is there some reason why it's
like that?

We could keep the max count in the structure itself as well. There
isn't any specific reason for not keeping it there. It's just that
this function, prefetch_buffer(), is not a general function for
maintaining a prefetch state that spans function calls; so we might as
well just pass the max count to that function instead of having
another field in that structure. I am not inclined specifically
towards either of the approaches.

All right, count me as +0.5 for putting a copy in the structure.

Have put the nitems into the structure.

Thanks for the review. Attached v2.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

prefetch_xid_horizon_scan_v2.patch (application/octet-stream)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8837f83..a2ba1f5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -67,6 +67,7 @@
 #include "utils/lsyscache.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
+#include "utils/spccache.h"
 
 
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
@@ -162,6 +163,31 @@ static const struct
 #define ConditionalLockTupleTuplock(rel, tup, mode) \
 	ConditionalLockTuple((rel), (tup), tupleLockExtraInfo[mode].hwlock)
 
+#ifdef USE_PREFETCH
+/*
+ * Maintains the current prefetch state so as to keep it ahead of buffer reads.
+ * Used to prefetch tid buffers.
+ */
+typedef struct
+{
+	int		next_item;
+	int		nitems;
+	BlockNumber cur_hblkno;
+} XidHorizonPrefetchState;
+
+/*
+ * We don't use the regular formula to determine how much to prefetch here, but
+ * instead just add a constant to effective_io_concurrency.  That's because it
+ * seems best to do some prefetching here even when effective_io_concurrency is
+ * set to 0, but if the DBA thinks it's OK to do more prefetching for other
+ * operations, then it's probably OK to do more prefetching in this case, too.
+ * It may be that this formula is too simplistic, but at the moment there is no
+ * evidence of that or any idea about what would work better.
+ */
+#define PREFETCH_DISTANCE(io_concurrency) Min((io_concurrency) + 10, MAX_IO_CONCURRENCY)
+
+#endif
+
 /*
  * This table maps tuple lock strength values for each particular
  * MultiXactStatus value.
@@ -6990,6 +7016,46 @@ HeapTupleHeaderAdvanceLatestRemovedXid(HeapTupleHeader tuple,
 	/* *latestRemovedXid may still be invalid at end */
 }
 
+#ifdef USE_PREFETCH
+/*
+ * xid_horizon_prefetch_buffer
+ *
+ * Pre-fetch 'prefetch_count' number of buffers.
+ * Continues to scan the tids array from the last position that was scanned
+ * for previous pre-fetching.
+ */
+static void
+xid_horizon_prefetch_buffer(Relation rel,
+							XidHorizonPrefetchState *prefetch_state,
+							ItemPointerData *tids, int prefetch_count)
+{
+	BlockNumber cur_hblkno = prefetch_state->cur_hblkno;
+	int		count = 0;
+	int		i;
+	int		nitems = prefetch_state->nitems;
+
+	for (i = prefetch_state->next_item; i < nitems && count < prefetch_count; i++)
+	{
+		ItemPointer htid = &tids[i];
+
+		if (cur_hblkno == InvalidBlockNumber ||
+			ItemPointerGetBlockNumber(htid) != cur_hblkno)
+		{
+			cur_hblkno = ItemPointerGetBlockNumber(htid);
+			PrefetchBuffer(rel, MAIN_FORKNUM, cur_hblkno);
+			count++;
+		}
+	}
+
+	/*
+	 * Save the prefetch position so that next time we can continue from that
+	 * position.
+	 */
+	prefetch_state->next_item = i;
+	prefetch_state->cur_hblkno = cur_hblkno;
+}
+#endif
+
 /*
  * Get the latestRemovedXid from the heap pages pointed at by the index
  * tuples being deleted.
@@ -7011,6 +7077,10 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
 	BlockNumber hblkno;
 	Buffer		buf = InvalidBuffer;
 	Page		hpage;
+#ifdef USE_PREFETCH
+	XidHorizonPrefetchState prefetch_state;
+	int			io_concurrency;
+#endif
 
 	/*
 	 * Sort to avoid repeated lookups for the same page, and to make it more
@@ -7021,21 +7091,14 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
 	qsort((void *) tids, nitems, sizeof(ItemPointerData),
 		  (int (*) (const void *, const void *)) ItemPointerCompare);
 
-	/* prefetch all pages */
+	/* prefetch a fixed number of pages beforehand. */
 #ifdef USE_PREFETCH
-	hblkno = InvalidBlockNumber;
-	for (int i = 0; i < nitems; i++)
-	{
-		ItemPointer htid = &tids[i];
-
-		if (hblkno == InvalidBlockNumber ||
-			ItemPointerGetBlockNumber(htid) != hblkno)
-		{
-			hblkno = ItemPointerGetBlockNumber(htid);
-
-			PrefetchBuffer(rel, MAIN_FORKNUM, hblkno);
-		}
-	}
+	prefetch_state.next_item = 0;
+	prefetch_state.nitems = nitems;
+	prefetch_state.cur_hblkno = InvalidBlockNumber;
+	io_concurrency = get_tablespace_io_concurrency(rel->rd_rel->reltablespace);
+	xid_horizon_prefetch_buffer(rel, &prefetch_state, tids,
+								PREFETCH_DISTANCE(io_concurrency));
 #endif
 
 	/* Iterate over all tids, and check their horizon */
@@ -7063,6 +7126,15 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
 			hblkno = ItemPointerGetBlockNumber(htid);
 
 			buf = ReadBuffer(rel, hblkno);
+
+#ifdef USE_PREFETCH
+			/*
+			 * Need to maintain the prefetch distance, so prefetch a page each
+			 * time we read a new page.
+			 */
+			xid_horizon_prefetch_buffer(rel, &prefetch_state, tids, 1);
+#endif
+
 			hpage = BufferGetPage(buf);
 
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
#96Robert Haas
robertmhaas@gmail.com
In reply to: Amit Khandekar (#95)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Fri, Feb 22, 2019 at 11:19 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

Thanks for the review. Attached v2.

Thanks. I took this, combined it with Andres's
v12-0040-WIP-Move-xid-horizon-computation-for-page-level-.patch, did
some polishing of the code and comments, and pgindented. Here's what
I ended up with; see what you think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Updated-lastRemovedXid-to-primary-patch.patch (application/octet-stream)
From 1a3d53094377ad815657d1107675de87e3ddf377 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 22 Feb 2019 13:58:09 -0500
Subject: [PATCH] Updated lastRemovedXid-to-primary patch.

---
 src/backend/access/hash/hash_xlog.c    | 153 +-----------------
 src/backend/access/hash/hashinsert.c   |  18 ++-
 src/backend/access/heap/heapam.c       | 213 +++++++++++++++++++++++++
 src/backend/access/index/genam.c       |  36 +++++
 src/backend/access/nbtree/nbtpage.c    |   7 +
 src/backend/access/nbtree/nbtxlog.c    | 156 +-----------------
 src/backend/access/rmgrdesc/hashdesc.c |   5 +-
 src/backend/access/rmgrdesc/nbtdesc.c  |   3 +-
 src/include/access/genam.h             |   5 +
 src/include/access/hash_xlog.h         |   1 +
 src/include/access/heapam.h            |   4 +
 src/include/access/nbtxlog.h           |   1 +
 src/tools/pgindent/typedefs.list       |   1 +
 13 files changed, 287 insertions(+), 316 deletions(-)

diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index c6d8726157..20441e307a 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -969,155 +969,6 @@ hash_xlog_update_meta_page(XLogReaderState *record)
 		UnlockReleaseBuffer(metabuf);
 }
 
-/*
- * Get the latestRemovedXid from the heap pages pointed at by the index
- * tuples being deleted. See also btree_xlog_delete_get_latestRemovedXid,
- * on which this function is based.
- */
-static TransactionId
-hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
-{
-	xl_hash_vacuum_one_page *xlrec;
-	OffsetNumber *unused;
-	Buffer		ibuffer,
-				hbuffer;
-	Page		ipage,
-				hpage;
-	RelFileNode rnode;
-	BlockNumber blkno;
-	ItemId		iitemid,
-				hitemid;
-	IndexTuple	itup;
-	HeapTupleHeader htuphdr;
-	BlockNumber hblkno;
-	OffsetNumber hoffnum;
-	TransactionId latestRemovedXid = InvalidTransactionId;
-	int			i;
-
-	xlrec = (xl_hash_vacuum_one_page *) XLogRecGetData(record);
-
-	/*
-	 * If there's nothing running on the standby we don't need to derive a
-	 * full latestRemovedXid value, so use a fast path out of here.  This
-	 * returns InvalidTransactionId, and so will conflict with all HS
-	 * transactions; but since we just worked out that that's zero people,
-	 * it's OK.
-	 *
-	 * XXX There is a race condition here, which is that a new backend might
-	 * start just after we look.  If so, it cannot need to conflict, but this
-	 * coding will result in throwing a conflict anyway.
-	 */
-	if (CountDBBackends(InvalidOid) == 0)
-		return latestRemovedXid;
-
-	/*
-	 * Check if WAL replay has reached a consistent database state. If not, we
-	 * must PANIC. See the definition of
-	 * btree_xlog_delete_get_latestRemovedXid for more details.
-	 */
-	if (!reachedConsistency)
-		elog(PANIC, "hash_xlog_vacuum_get_latestRemovedXid: cannot operate with inconsistent data");
-
-	/*
-	 * Get index page.  If the DB is consistent, this should not fail, nor
-	 * should any of the heap page fetches below.  If one does, we return
-	 * InvalidTransactionId to cancel all HS transactions.  That's probably
-	 * overkill, but it's safe, and certainly better than panicking here.
-	 */
-	XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
-	ibuffer = XLogReadBufferExtended(rnode, MAIN_FORKNUM, blkno, RBM_NORMAL);
-
-	if (!BufferIsValid(ibuffer))
-		return InvalidTransactionId;
-	LockBuffer(ibuffer, HASH_READ);
-	ipage = (Page) BufferGetPage(ibuffer);
-
-	/*
-	 * Loop through the deleted index items to obtain the TransactionId from
-	 * the heap items they point to.
-	 */
-	unused = (OffsetNumber *) ((char *) xlrec + SizeOfHashVacuumOnePage);
-
-	for (i = 0; i < xlrec->ntuples; i++)
-	{
-		/*
-		 * Identify the index tuple about to be deleted.
-		 */
-		iitemid = PageGetItemId(ipage, unused[i]);
-		itup = (IndexTuple) PageGetItem(ipage, iitemid);
-
-		/*
-		 * Locate the heap page that the index tuple points at
-		 */
-		hblkno = ItemPointerGetBlockNumber(&(itup->t_tid));
-		hbuffer = XLogReadBufferExtended(xlrec->hnode, MAIN_FORKNUM,
-										 hblkno, RBM_NORMAL);
-
-		if (!BufferIsValid(hbuffer))
-		{
-			UnlockReleaseBuffer(ibuffer);
-			return InvalidTransactionId;
-		}
-		LockBuffer(hbuffer, HASH_READ);
-		hpage = (Page) BufferGetPage(hbuffer);
-
-		/*
-		 * Look up the heap tuple header that the index tuple points at by
-		 * using the heap node supplied with the xlrec. We can't use
-		 * heap_fetch, since it uses ReadBuffer rather than XLogReadBuffer.
-		 * Note that we are not looking at tuple data here, just headers.
-		 */
-		hoffnum = ItemPointerGetOffsetNumber(&(itup->t_tid));
-		hitemid = PageGetItemId(hpage, hoffnum);
-
-		/*
-		 * Follow any redirections until we find something useful.
-		 */
-		while (ItemIdIsRedirected(hitemid))
-		{
-			hoffnum = ItemIdGetRedirect(hitemid);
-			hitemid = PageGetItemId(hpage, hoffnum);
-			CHECK_FOR_INTERRUPTS();
-		}
-
-		/*
-		 * If the heap item has storage, then read the header and use that to
-		 * set latestRemovedXid.
-		 *
-		 * Some LP_DEAD items may not be accessible, so we ignore them.
-		 */
-		if (ItemIdHasStorage(hitemid))
-		{
-			htuphdr = (HeapTupleHeader) PageGetItem(hpage, hitemid);
-			HeapTupleHeaderAdvanceLatestRemovedXid(htuphdr, &latestRemovedXid);
-		}
-		else if (ItemIdIsDead(hitemid))
-		{
-			/*
-			 * Conjecture: if hitemid is dead then it had xids before the xids
-			 * marked on LP_NORMAL items. So we just ignore this item and move
-			 * onto the next, for the purposes of calculating
-			 * latestRemovedxids.
-			 */
-		}
-		else
-			Assert(!ItemIdIsUsed(hitemid));
-
-		UnlockReleaseBuffer(hbuffer);
-	}
-
-	UnlockReleaseBuffer(ibuffer);
-
-	/*
-	 * If all heap tuples were LP_DEAD then we will be returning
-	 * InvalidTransactionId here, which avoids conflicts. This matches
-	 * existing logic which assumes that LP_DEAD tuples must already be older
-	 * than the latestRemovedXid on the cleanup record that set them as
-	 * LP_DEAD, hence must already have generated a conflict.
-	 */
-	return latestRemovedXid;
-}
-
 /*
  * replay delete operation in hash index to remove
  * tuples marked as DEAD during index tuple insertion.
@@ -1149,12 +1000,10 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
 	 */
 	if (InHotStandby)
 	{
-		TransactionId latestRemovedXid =
-		hash_xlog_vacuum_get_latestRemovedXid(record);
 		RelFileNode rnode;
 
 		XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
-		ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+		ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
 	}
 
 	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 970733f0cd..a248bd0743 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -23,8 +23,8 @@
 #include "storage/buf_internals.h"
 #include "storage/predicate.h"
 
-static void _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
-					  RelFileNode hnode);
+static void _hash_vacuum_one_page(Relation rel, Relation hrel,
+								  Buffer metabuf, Buffer buf);
 
 /*
  *	_hash_doinsert() -- Handle insertion of a single index tuple.
@@ -137,7 +137,7 @@ restart_insert:
 
 			if (IsBufferCleanupOK(buf))
 			{
-				_hash_vacuum_one_page(rel, metabuf, buf, heapRel->rd_node);
+				_hash_vacuum_one_page(rel, heapRel, metabuf, buf);
 
 				if (PageGetFreeSpace(page) >= itemsz)
 					break;		/* OK, now we have enough space */
@@ -335,8 +335,7 @@ _hash_pgaddmultitup(Relation rel, Buffer buf, IndexTuple *itups,
  */
 
 static void
-_hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
-					  RelFileNode hnode)
+_hash_vacuum_one_page(Relation rel, Relation hrel, Buffer metabuf, Buffer buf)
 {
 	OffsetNumber deletable[MaxOffsetNumber];
 	int			ndeletable = 0;
@@ -360,6 +359,12 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
 
 	if (ndeletable > 0)
 	{
+		TransactionId latestRemovedXid;
+
+		latestRemovedXid =
+			index_compute_xid_horizon_for_tuples(rel, hrel, buf,
+												 deletable, ndeletable);
+
 		/*
 		 * Write-lock the meta page so that we can decrement tuple count.
 		 */
@@ -393,7 +398,8 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
 			xl_hash_vacuum_one_page xlrec;
 			XLogRecPtr	recptr;
 
-			xlrec.hnode = hnode;
+			xlrec.latestRemovedXid = latestRemovedXid;
+			xlrec.hnode = hrel->rd_node;
 			xlrec.ntuples = ndeletable;
 
 			XLogBeginInsert();
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dc3499349b..e5c7365651 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -66,6 +66,7 @@
 #include "utils/lsyscache.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
+#include "utils/spccache.h"
 
 
 /* GUC variable */
@@ -177,6 +178,20 @@ static const struct
 #define ConditionalLockTupleTuplock(rel, tup, mode) \
 	ConditionalLockTuple((rel), (tup), tupleLockExtraInfo[mode].hwlock)
 
+#ifdef USE_PREFETCH
+/*
+ * heap_compute_xid_horizon_for_tuples and xid_horizon_prefetch_buffer use
+ * this structure to coordinate prefetching activity.
+ */
+typedef struct
+{
+	BlockNumber cur_hblkno;
+	int			next_item;
+	int			nitems;
+	ItemPointerData *tids;
+} XidHorizonPrefetchState;
+#endif
+
 /*
  * This table maps tuple lock strength values for each particular
  * MultiXactStatus value.
@@ -7177,6 +7192,204 @@ HeapTupleHeaderAdvanceLatestRemovedXid(HeapTupleHeader tuple,
 	/* *latestRemovedXid may still be invalid at end */
 }
 
+#ifdef USE_PREFETCH
+/*
+ * Helper function for heap_compute_xid_horizon_for_tuples.  Issue prefetch
+ * requests for the number of buffers indicated by prefetch_count.  The
+ * prefetch_state keeps track of all the buffers that we can prefetch and
+ * which ones have already been prefetched; each call to this function picks
+ * up where the previous call left off.
+ */
+static void
+xid_horizon_prefetch_buffer(Relation rel,
+							XidHorizonPrefetchState *prefetch_state,
+							int prefetch_count)
+{
+	BlockNumber cur_hblkno = prefetch_state->cur_hblkno;
+	int			count = 0;
+	int			i;
+	int			nitems = prefetch_state->nitems;
+	ItemPointerData *tids = prefetch_state->tids;
+
+	for (i = prefetch_state->next_item;
+		 i < nitems && count < prefetch_count;
+		 i++)
+	{
+		ItemPointer htid = &tids[i];
+
+		if (cur_hblkno == InvalidBlockNumber ||
+			ItemPointerGetBlockNumber(htid) != cur_hblkno)
+		{
+			cur_hblkno = ItemPointerGetBlockNumber(htid);
+			PrefetchBuffer(rel, MAIN_FORKNUM, cur_hblkno);
+			count++;
+		}
+	}
+
+	/*
+	 * Save the prefetch position so that next time we can continue from that
+	 * position.
+	 */
+	prefetch_state->next_item = i;
+	prefetch_state->cur_hblkno = cur_hblkno;
+}
+#endif
+
+/*
+ * Get the latestRemovedXid from the heap pages pointed at by the index
+ * tuples being deleted.
+ *
+ * We used to do this during recovery rather than on the primary, but that
+ * approach now appears inferior.  It meant that the master could generate
+ * a lot of work for the standby without any back-pressure to slow down the
+ * master, and it required the standby to have reached consistency, whereas
+ * we want to have correct information available even before that point.
+ *
+ * It's possible for this to generate a fair amount of I/O, since we may be
+ * deleting hundreds of tuples from a single index block.  To amortize that
+ * cost to some degree, this uses prefetching and combines repeat accesses to
+ * the same block.
+ */
+TransactionId
+heap_compute_xid_horizon_for_tuples(Relation rel,
+									ItemPointerData *tids,
+									int nitems)
+{
+	TransactionId latestRemovedXid = InvalidTransactionId;
+	BlockNumber hblkno;
+	Buffer		buf = InvalidBuffer;
+	Page		hpage;
+#ifdef USE_PREFETCH
+	XidHorizonPrefetchState prefetch_state;
+	int			io_concurrency;
+	int			prefetch_distance;
+#endif
+
+	/*
+	 * Sort to avoid repeated lookups for the same page, and to make it more
+	 * likely to access items in an efficient order. In particular, this
+	 * ensures that if there are multiple pointers to the same page, they all
+	 * get processed looking up and locking the page just once.
+	 */
+	qsort((void *) tids, nitems, sizeof(ItemPointerData),
+		  (int (*) (const void *, const void *)) ItemPointerCompare);
+
+#ifdef USE_PREFETCH
+	/* Initialize prefetch state. */
+	prefetch_state.cur_hblkno = InvalidBlockNumber;
+	prefetch_state.next_item = 0;
+	prefetch_state.nitems = nitems;
+	prefetch_state.tids = tids;
+
+	/*
+	 * Compute the prefetch distance that we will attempt to maintain.
+	 *
+	 * We don't use the regular formula to determine how much to prefetch
+	 * here, but instead just add a constant to effective_io_concurrency.
+	 * That's because it seems best to do some prefetching here even when
+	 * effective_io_concurrency is set to 0, but if the DBA thinks it's OK to
+	 * do more prefetching for other operations, then it's probably OK to do
+	 * more prefetching in this case, too. It may be that this formula is too
+	 * simplistic, but at the moment there is no evidence of that or any idea
+	 * about what would work better.
+	 */
+	io_concurrency = get_tablespace_io_concurrency(rel->rd_rel->reltablespace);
+	prefetch_distance = Min((io_concurrency) + 10, MAX_IO_CONCURRENCY);
+
+	/* Start prefetching. */
+	xid_horizon_prefetch_buffer(rel, &prefetch_state, prefetch_distance);
+#endif
+
+	/* Iterate over all tids, and check their horizon */
+	hblkno = InvalidBlockNumber;
+	for (int i = 0; i < nitems; i++)
+	{
+		ItemPointer htid = &tids[i];
+		ItemId		hitemid;
+		OffsetNumber hoffnum;
+
+		/*
+		 * Read heap buffer, but avoid refetching if it's the same block as
+		 * required for the last tid.
+		 */
+		if (hblkno == InvalidBlockNumber ||
+			ItemPointerGetBlockNumber(htid) != hblkno)
+		{
+			/* release old buffer */
+			if (BufferIsValid(buf))
+			{
+				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+				ReleaseBuffer(buf);
+			}
+
+			hblkno = ItemPointerGetBlockNumber(htid);
+
+			buf = ReadBuffer(rel, hblkno);
+
+#ifdef USE_PREFETCH
+
+			/*
+			 * To maintain the prefetch distance, prefetch one more page for
+			 * each page we read.
+			 */
+			xid_horizon_prefetch_buffer(rel, &prefetch_state, 1);
+#endif
+
+			hpage = BufferGetPage(buf);
+
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
+		}
+
+		hoffnum = ItemPointerGetOffsetNumber(htid);
+		hitemid = PageGetItemId(hpage, hoffnum);
+
+		/*
+		 * Follow any redirections until we find something useful.
+		 */
+		while (ItemIdIsRedirected(hitemid))
+		{
+			hoffnum = ItemIdGetRedirect(hitemid);
+			hitemid = PageGetItemId(hpage, hoffnum);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		/*
+		 * If the heap item has storage, then read the header and use that to
+		 * set latestRemovedXid.
+		 *
+		 * Some LP_DEAD items may not be accessible, so we ignore them.
+		 */
+		if (ItemIdHasStorage(hitemid))
+		{
+			HeapTupleHeader htuphdr;
+
+			htuphdr = (HeapTupleHeader) PageGetItem(hpage, hitemid);
+
+			HeapTupleHeaderAdvanceLatestRemovedXid(htuphdr, &latestRemovedXid);
+		}
+		else if (ItemIdIsDead(hitemid))
+		{
+			/*
+			 * Conjecture: if hitemid is dead then it had xids before the xids
+			 * marked on LP_NORMAL items. So we just ignore this item and move
+			 * onto the next, for the purposes of calculating
+			 * latestRemovedxids.
+			 */
+		}
+		else
+			Assert(!ItemIdIsUsed(hitemid));
+
+	}
+
+	if (BufferIsValid(buf))
+	{
+		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+		ReleaseBuffer(buf);
+	}
+
+	return latestRemovedXid;
+}
+
 /*
  * Perform XLogInsert to register a heap cleanup info message. These
  * messages are sent once per VACUUM and are required because
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index e0a5ea42d5..dd01ce6805 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -276,6 +276,42 @@ BuildIndexValueDescription(Relation indexRelation,
 	return buf.data;
 }
 
+/*
+ * Get the latestRemovedXid from the heap pages pointed at by the index
+ * tuples being deleted.
+ */
+TransactionId
+index_compute_xid_horizon_for_tuples(Relation irel,
+									 Relation hrel,
+									 Buffer ibuf,
+									 OffsetNumber *itemnos,
+									 int nitems)
+{
+	ItemPointerData *htids = (ItemPointerData *) palloc(sizeof(ItemPointerData) * nitems);
+	TransactionId latestRemovedXid = InvalidTransactionId;
+	Page		ipage = BufferGetPage(ibuf);
+	IndexTuple	itup;
+
+	/* identify what the index tuples about to be deleted point to */
+	for (int i = 0; i < nitems; i++)
+	{
+		ItemId		iitemid;
+
+		iitemid = PageGetItemId(ipage, itemnos[i]);
+		itup = (IndexTuple) PageGetItem(ipage, iitemid);
+
+		ItemPointerCopy(&itup->t_tid, &htids[i]);
+	}
+
+	/* determine the actual xid horizon */
+	latestRemovedXid =
+		heap_compute_xid_horizon_for_tuples(hrel, htids, nitems);
+
+	pfree(htids);
+
+	return latestRemovedXid;
+}
+
 
 /* ----------------------------------------------------------------
  *		heap-or-index-scan access to system catalogs
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 1d72fe5408..4c7dbb6523 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -1032,10 +1032,16 @@ _bt_delitems_delete(Relation rel, Buffer buf,
 {
 	Page		page = BufferGetPage(buf);
 	BTPageOpaque opaque;
+	TransactionId latestRemovedXid = InvalidTransactionId;
 
 	/* Shouldn't be called unless there's something to do */
 	Assert(nitems > 0);
 
+	if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
+		latestRemovedXid =
+			index_compute_xid_horizon_for_tuples(rel, heapRel, buf,
+												 itemnos, nitems);
+
 	/* No ereport(ERROR) until changes are logged */
 	START_CRIT_SECTION();
 
@@ -1065,6 +1071,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
 		XLogRecPtr	recptr;
 		xl_btree_delete xlrec_delete;
 
+		xlrec_delete.latestRemovedXid = latestRemovedXid;
 		xlrec_delete.hnode = heapRel->rd_node;
 		xlrec_delete.nitems = nitems;
 
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index b0666b42df..9c277f5016 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -518,159 +518,6 @@ btree_xlog_vacuum(XLogReaderState *record)
 		UnlockReleaseBuffer(buffer);
 }
 
-/*
- * Get the latestRemovedXid from the heap pages pointed at by the index
- * tuples being deleted. This puts the work for calculating latestRemovedXid
- * into the recovery path rather than the primary path.
- *
- * It's possible that this generates a fair amount of I/O, since an index
- * block may have hundreds of tuples being deleted. Repeat accesses to the
- * same heap blocks are common, though are not yet optimised.
- *
- * XXX optimise later with something like XLogPrefetchBuffer()
- */
-static TransactionId
-btree_xlog_delete_get_latestRemovedXid(XLogReaderState *record)
-{
-	xl_btree_delete *xlrec = (xl_btree_delete *) XLogRecGetData(record);
-	OffsetNumber *unused;
-	Buffer		ibuffer,
-				hbuffer;
-	Page		ipage,
-				hpage;
-	RelFileNode rnode;
-	BlockNumber blkno;
-	ItemId		iitemid,
-				hitemid;
-	IndexTuple	itup;
-	HeapTupleHeader htuphdr;
-	BlockNumber hblkno;
-	OffsetNumber hoffnum;
-	TransactionId latestRemovedXid = InvalidTransactionId;
-	int			i;
-
-	/*
-	 * If there's nothing running on the standby we don't need to derive a
-	 * full latestRemovedXid value, so use a fast path out of here.  This
-	 * returns InvalidTransactionId, and so will conflict with all HS
-	 * transactions; but since we just worked out that that's zero people,
-	 * it's OK.
-	 *
-	 * XXX There is a race condition here, which is that a new backend might
-	 * start just after we look.  If so, it cannot need to conflict, but this
-	 * coding will result in throwing a conflict anyway.
-	 */
-	if (CountDBBackends(InvalidOid) == 0)
-		return latestRemovedXid;
-
-	/*
-	 * In what follows, we have to examine the previous state of the index
-	 * page, as well as the heap page(s) it points to.  This is only valid if
-	 * WAL replay has reached a consistent database state; which means that
-	 * the preceding check is not just an optimization, but is *necessary*. We
-	 * won't have let in any user sessions before we reach consistency.
-	 */
-	if (!reachedConsistency)
-		elog(PANIC, "btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data");
-
-	/*
-	 * Get index page.  If the DB is consistent, this should not fail, nor
-	 * should any of the heap page fetches below.  If one does, we return
-	 * InvalidTransactionId to cancel all HS transactions.  That's probably
-	 * overkill, but it's safe, and certainly better than panicking here.
-	 */
-	XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
-	ibuffer = XLogReadBufferExtended(rnode, MAIN_FORKNUM, blkno, RBM_NORMAL);
-	if (!BufferIsValid(ibuffer))
-		return InvalidTransactionId;
-	LockBuffer(ibuffer, BT_READ);
-	ipage = (Page) BufferGetPage(ibuffer);
-
-	/*
-	 * Loop through the deleted index items to obtain the TransactionId from
-	 * the heap items they point to.
-	 */
-	unused = (OffsetNumber *) ((char *) xlrec + SizeOfBtreeDelete);
-
-	for (i = 0; i < xlrec->nitems; i++)
-	{
-		/*
-		 * Identify the index tuple about to be deleted
-		 */
-		iitemid = PageGetItemId(ipage, unused[i]);
-		itup = (IndexTuple) PageGetItem(ipage, iitemid);
-
-		/*
-		 * Locate the heap page that the index tuple points at
-		 */
-		hblkno = ItemPointerGetBlockNumber(&(itup->t_tid));
-		hbuffer = XLogReadBufferExtended(xlrec->hnode, MAIN_FORKNUM, hblkno, RBM_NORMAL);
-		if (!BufferIsValid(hbuffer))
-		{
-			UnlockReleaseBuffer(ibuffer);
-			return InvalidTransactionId;
-		}
-		LockBuffer(hbuffer, BT_READ);
-		hpage = (Page) BufferGetPage(hbuffer);
-
-		/*
-		 * Look up the heap tuple header that the index tuple points at by
-		 * using the heap node supplied with the xlrec. We can't use
-		 * heap_fetch, since it uses ReadBuffer rather than XLogReadBuffer.
-		 * Note that we are not looking at tuple data here, just headers.
-		 */
-		hoffnum = ItemPointerGetOffsetNumber(&(itup->t_tid));
-		hitemid = PageGetItemId(hpage, hoffnum);
-
-		/*
-		 * Follow any redirections until we find something useful.
-		 */
-		while (ItemIdIsRedirected(hitemid))
-		{
-			hoffnum = ItemIdGetRedirect(hitemid);
-			hitemid = PageGetItemId(hpage, hoffnum);
-			CHECK_FOR_INTERRUPTS();
-		}
-
-		/*
-		 * If the heap item has storage, then read the header and use that to
-		 * set latestRemovedXid.
-		 *
-		 * Some LP_DEAD items may not be accessible, so we ignore them.
-		 */
-		if (ItemIdHasStorage(hitemid))
-		{
-			htuphdr = (HeapTupleHeader) PageGetItem(hpage, hitemid);
-
-			HeapTupleHeaderAdvanceLatestRemovedXid(htuphdr, &latestRemovedXid);
-		}
-		else if (ItemIdIsDead(hitemid))
-		{
-			/*
-			 * Conjecture: if hitemid is dead then it had xids before the xids
-			 * marked on LP_NORMAL items. So we just ignore this item and move
-			 * onto the next, for the purposes of calculating
-			 * latestRemovedxids.
-			 */
-		}
-		else
-			Assert(!ItemIdIsUsed(hitemid));
-
-		UnlockReleaseBuffer(hbuffer);
-	}
-
-	UnlockReleaseBuffer(ibuffer);
-
-	/*
-	 * If all heap tuples were LP_DEAD then we will be returning
-	 * InvalidTransactionId here, which avoids conflicts. This matches
-	 * existing logic which assumes that LP_DEAD tuples must already be older
-	 * than the latestRemovedXid on the cleanup record that set them as
-	 * LP_DEAD, hence must already have generated a conflict.
-	 */
-	return latestRemovedXid;
-}
-
 static void
 btree_xlog_delete(XLogReaderState *record)
 {
@@ -693,12 +540,11 @@ btree_xlog_delete(XLogReaderState *record)
 	 */
 	if (InHotStandby)
 	{
-		TransactionId latestRemovedXid = btree_xlog_delete_get_latestRemovedXid(record);
 		RelFileNode rnode;
 
 		XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
 
-		ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+		ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
 	}
 
 	/*
diff --git a/src/backend/access/rmgrdesc/hashdesc.c b/src/backend/access/rmgrdesc/hashdesc.c
index ade1c61816..a29aa96e9c 100644
--- a/src/backend/access/rmgrdesc/hashdesc.c
+++ b/src/backend/access/rmgrdesc/hashdesc.c
@@ -113,8 +113,9 @@ hash_desc(StringInfo buf, XLogReaderState *record)
 			{
 				xl_hash_vacuum_one_page *xlrec = (xl_hash_vacuum_one_page *) rec;
 
-				appendStringInfo(buf, "ntuples %d",
-								 xlrec->ntuples);
+				appendStringInfo(buf, "ntuples %d, latest removed xid %u",
+								 xlrec->ntuples,
+								 xlrec->latestRemovedXid);
 				break;
 			}
 	}
diff --git a/src/backend/access/rmgrdesc/nbtdesc.c b/src/backend/access/rmgrdesc/nbtdesc.c
index 8d5c6ae0ab..64cf7ed02e 100644
--- a/src/backend/access/rmgrdesc/nbtdesc.c
+++ b/src/backend/access/rmgrdesc/nbtdesc.c
@@ -56,7 +56,8 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			{
 				xl_btree_delete *xlrec = (xl_btree_delete *) rec;
 
-				appendStringInfo(buf, "%d items", xlrec->nitems);
+				appendStringInfo(buf, "%d items, latest removed xid %u",
+								 xlrec->nitems, xlrec->latestRemovedXid);
 				break;
 			}
 		case XLOG_BTREE_MARK_PAGE_HALFDEAD:
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index c4aba39496..91efd6621f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -186,6 +186,11 @@ extern IndexScanDesc RelationGetIndexScan(Relation indexRelation,
 extern void IndexScanEnd(IndexScanDesc scan);
 extern char *BuildIndexValueDescription(Relation indexRelation,
 						   Datum *values, bool *isnull);
+extern TransactionId index_compute_xid_horizon_for_tuples(Relation irel,
+									 Relation hrel,
+									 Buffer ibuf,
+									 OffsetNumber *itemnos,
+									 int nitems);
 
 /*
  * heap-or-index access to system catalogs (in genam.c)
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 9cef1b7c25..045e2bf58b 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
  */
 typedef struct xl_hash_vacuum_one_page
 {
+	TransactionId latestRemovedXid;
 	RelFileNode hnode;
 	int			ntuples;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ab0879138f..2028a6446f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -166,6 +166,10 @@ extern void simple_heap_update(Relation relation, ItemPointer otid,
 extern void heap_sync(Relation relation);
 extern void heap_update_snapshot(HeapScanDesc scan, Snapshot snapshot);
 
+extern TransactionId heap_compute_xid_horizon_for_tuples(Relation rel,
+									ItemPointerData *items,
+									int nitems);
+
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern int heap_page_prune(Relation relation, Buffer buffer,
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index a605851c98..a294bd6fef 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -123,6 +123,7 @@ typedef struct xl_btree_split
  */
 typedef struct xl_btree_delete
 {
+	TransactionId latestRemovedXid;
 	RelFileNode hnode;			/* RelFileNode of the heap the index currently
 								 * points at */
 	int			nitems;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3d3c76d251..e3d0aef9d3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2602,6 +2602,7 @@ XactCallback
 XactCallbackItem
 XactEvent
 XactLockTableWaitInfo
+XidHorizonPrefetchState
 XidStatus
 XmlExpr
 XmlExprOp
-- 
2.17.2 (Apple Git-113)

#97Amit Khandekar
amitdkhan.pg@gmail.com
In reply to: Robert Haas (#96)
Re: Pluggable Storage - Andres's take

On Sat, 23 Feb 2019 at 01:22, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Feb 22, 2019 at 11:19 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

Thanks for the review. Attached v2.

Thanks. I took this, combined it with Andres's
v12-0040-WIP-Move-xid-horizon-computation-for-page-level-.patch, did
some polishing of the code and comments, and pgindented. Here's what
I ended up with; see what you think.

Thanks Robert! The changes look good.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#98Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#77)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-01-21 10:32:37 +1100, Haribabu Kommi wrote:

I am not able to remove t_tableOid completely from HeapTuple, because
of its use in triggers: the slot is not available in triggers, and I
need to store the tableOid as part of the tuple as well.

What precisely do you mean by "use in triggers"? You mean that a trigger
might access a HeapTuple's t_tableOid directly, even though all of the
information is available in the trigger context?

Greetings,

Andres Freund

#99Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#98)
Re: Pluggable Storage - Andres's take

On Wed, Feb 27, 2019 at 11:10 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-01-21 10:32:37 +1100, Haribabu Kommi wrote:

I am not able to remove t_tableOid completely from HeapTuple, because
of its use in triggers: the slot is not available in triggers, and I
need to store the tableOid as part of the tuple as well.

What precisely do you mean by "use in triggers"? You mean that a trigger
might access a HeapTuple's t_tableOid directly, even though all of the
information is available in the trigger context?

I forgot the exact scenario, but during trigger function execution, the
pl/pgsql executor accesses the TableOidAttributeNumber from the stored
tuple using the heap_get* functions. Because of the lack of slot support
in triggers, we still need to maintain t_tableOid with the proper OID.
The heap tuple's t_tableOid member is updated whenever the heap tuple is
generated from a slot.
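
For context, a rough sketch of the kind of access described above - the
PL executor reading the table OID system attribute from the stored
trigger tuple, which only gives the right answer if t_tableOid was set
when the tuple was materialized from the slot (the surrounding trigger
plumbing is elided):

	bool		isnull;
	Oid			tableoid;

	/* relies on trigdata->tg_trigtuple->t_tableOid being valid */
	tableoid = DatumGetObjectId(
		heap_getsysattr(trigdata->tg_trigtuple,
						TableOidAttributeNumber,
						RelationGetDescr(trigdata->tg_relation),
						&isnull));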

Regards,
Haribabu Kommi
Fujitsu Australia

#100Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Andres Freund (#78)
Re: Pluggable Storage - Andres's take

I haven't been following this thread closely, but I looked briefly at
some of the patches posted here:

On 21/01/2019 11:01, Andres Freund wrote:

The patchset is now pretty granularly split into individual pieces.

Wow, 42 patches, very granular indeed! That's nice for reviewing, but
are you planning to squash them before committing? Seems a bit excessive
for the git history.

Patches 1-4:

* v12-0001-WIP-Introduce-access-table.h-access-relation.h.patch
* v12-0002-Replace-heapam.h-includes-with-relation.h-table..patch
* v12-0003-Replace-uses-of-heap_open-et-al-with-table_open-.patch
* v12-0004-Remove-superfluous-tqual.h-includes.patch

These look good to me. I think it would make sense to squash these
together, and commit now.

Patches 20 and 21:
* v12-0020-WIP-Slotified-triggers.patch
* v12-0021-WIP-Slotify-EPQ.patch

I like this slotification of trigger and EPQ code. It seems like a nice
thing to do, independently of the other patches. You said you wanted to
polish that up to committable state, and commit separately: +1 on that.
Perhaps do that even before patches 1-4.

--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -35,8 +35,8 @@ typedef struct TriggerData
HeapTuple	tg_trigtuple;
HeapTuple	tg_newtuple;
Trigger    *tg_trigger;
-	Buffer		tg_trigtuplebuf;
-	Buffer		tg_newtuplebuf;
+	TupleTableSlot *tg_trigslot;
+	TupleTableSlot *tg_newslot;
Tuplestorestate *tg_oldtable;
Tuplestorestate *tg_newtable;
} TriggerData;

Do we still need tg_trigtuple and tg_newtuple? Can't we always use the
corresponding slots instead? Is it for backwards-compatibility with
user-defined triggers written in C? (Documentation also needs to be
updated for the changes in this struct)
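
For what it's worth, a C trigger ported to the slotified struct could
presumably obtain a HeapTuple on demand along these lines (a sketch; the
exact helper for materializing a heap tuple from a slot is an assumption
here, not something this patchset necessarily settles on):

	bool		shouldFree;
	HeapTuple	tuple;

	/* materialize a HeapTuple view of the slot the trigger fired with */
	tuple = ExecFetchSlotHeapTuple(trigdata->tg_trigslot, true, &shouldFree);

	/* ... examine the tuple ... */

	if (shouldFree)
		heap_freetuple(tuple);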

I didn't look at the rest of the patches yet...

- Heikki

#101Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#100)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-02-27 18:00:12 +0800, Heikki Linnakangas wrote:

I haven't been following this thread closely, but I looked briefly at some
of the patches posted here:

Thanks!

On 21/01/2019 11:01, Andres Freund wrote:

The patchset is now pretty granularly split into individual pieces.

Wow, 42 patches, very granular indeed! That's nice for reviewing, but are
you planning to squash them before committing? Seems a bit excessive for the
git history.

I've pushed a number of the preliminary patches since you replied. We're
down to 23 in my local count...

I do plan / did squash some, but not actually that many. I find that
patches beyond a certain size are just too hard to give the necessary
final polish, especially if they do several independent things. Keeping
things granular also allows pushing incrementally, even when later
patches aren't quite ready - imo pretty important for a project this
size.

Patches 1-4:

* v12-0001-WIP-Introduce-access-table.h-access-relation.h.patch
* v12-0002-Replace-heapam.h-includes-with-relation.h-table..patch
* v12-0003-Replace-uses-of-heap_open-et-al-with-table_open-.patch
* v12-0004-Remove-superfluous-tqual.h-includes.patch

These look good to me. I think it would make sense to squash these together,
and commit now.

I've pushed these a while ago.

Patches 20 and 21:
* v12-0020-WIP-Slotified-triggers.patch
* v12-0021-WIP-Slotify-EPQ.patch

I like this slotification of trigger and EPQ code. It seems like a nice
thing to do, independently of the other patches. You said you wanted to
polish that up to committable state, and commit separately: +1 on
that.

I pushed the trigger patch yesterday evening. Working to finalize the
EPQ patch now - I've polished it a fair bit since the version posted on
the list, but it still needs a bit more.

Once the EPQ patch (and two other simple preliminary ones) is pushed, I
plan to post a new rebased version to this thread. That's then really
only the core table AM work.

--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -35,8 +35,8 @@ typedef struct TriggerData
HeapTuple	tg_trigtuple;
HeapTuple	tg_newtuple;
Trigger    *tg_trigger;
-	Buffer		tg_trigtuplebuf;
-	Buffer		tg_newtuplebuf;
+	TupleTableSlot *tg_trigslot;
+	TupleTableSlot *tg_newslot;
Tuplestorestate *tg_oldtable;
Tuplestorestate *tg_newtable;
} TriggerData;

Do we still need tg_trigtuple and tg_newtuple? Can't we always use the
corresponding slots instead? Is it for backwards-compatibility with
user-defined triggers written in C?

Yes, the external trigger interface currently relies on those being
there. I think we probably ought to revise that, at the very least
because it'll otherwise be noticeably less efficient to have triggers on
!heap tables, but also because it's just cleaner. But I feel like I
don't want more significantly sized patches on my plate right now, so my
current goal is to just put that on the todo after the pluggable storage
work. Kinda wonder if we don't want to do that earlier in a release
cycle too, so we can do other breaking changes to the trigger interface
without breaking external code multiple times. There's probably also an
argument for just not breaking the interface.

(Documentation also needs to be updated for the changes in this
struct)

Ah, nice catch, will do that next.

Greetings,

Andres Freund

#102Ashwin Agrawal
aagrawal@pivotal.io
In reply to: Andres Freund (#101)
Re: Pluggable Storage - Andres's take

Hi,

While playing with the tableam (usage of which starts with commit
v12-0023-tableam-Introduce-and-use-begin-endscan-and-do-i.patch), should
we check for a NULL function pointer before actually calling it, and
ERROR out as NOT_SUPPORTED or something along those lines?

I understand it's the kind of thing that should get caught during
development. But currently it still segfaults if one forgets to define
some AM function; it might be easier for iterative development to error
out in a common place instead.

Or should there be an upfront check for NULL somewhere, if all the AM
functions are mandatory and must not be NULL?

#103Andres Freund
andres@anarazel.de
In reply to: Ashwin Agrawal (#102)
Re: Pluggable Storage - Andres's take

Hi,

Thanks for looking!

On 2019-03-05 18:27:45 -0800, Ashwin Agrawal wrote:

While playing with the tableam (usage of which starts with commit
v12-0023-tableam-Introduce-and-use-begin-endscan-and-do-i.patch), should
we check for a NULL function pointer before actually calling it, and
ERROR out as NOT_SUPPORTED or something along those lines?

Scans seem like an absolutely required part of the functionality, so I
don't think there's much point in that. It'd just bloat code and
runtime.

I understand it's the kind of thing that should get caught during
development. But currently it still segfaults if one forgets to define
some AM function;

The segfault itself doesn't bother me at all, it's just a NULL pointer
dereference. If we were to put Asserts somewhere it'd crash very
similarly. I think you have a point in that:

it might be easier for iterative development to error out in a common place instead.

Would make it a tiny bit easier to implement a new AM. We could
probably add a few asserts to GetTableAmRoutine(), to check that
required functions are implemented. Don't think that'd make a meaningful
difference for something like the scan functions, but it'd probably make
it easier to forward-port AMs to the next release - I'm pretty sure
we're going to add required callbacks in the next few releases.
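
For illustration, a minimal sketch of what such checks might look like
in GetTableAmRoutine() (the callback names follow the patch series and
are assumptions here, not the final implementation):

	/*
	 * Hypothetical validation: assert that the callbacks every AM must
	 * provide are filled in, so a missing one fails here rather than as
	 * a NULL-pointer dereference at first use.
	 */
	Assert(routine->slot_callbacks != NULL);
	Assert(routine->scan_begin != NULL);
	Assert(routine->scan_end != NULL);
	Assert(routine->scan_rescan != NULL);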

Greetings,

Andres Freund

#105Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#101)
23 attachment(s)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-02-27 09:29:31 -0800, Andres Freund wrote:

Once the EPQ patch (and two other simple preliminary ones) is pushed, I
plan to post a new rebased version to this thread. That's then really
only the core table AM work.

That's now done. Here's my current submission of remaining
patches. I've polished the first patch, adding DDL support, quite a bit,
I'm planning to push that soon.

Changes:

- I've removed the ability to specify a table AM for partitioned tables,
as discussed at [1]
- That happily shook out a number of bugs, where the partitioned table's
AM was used when the leaf partition's AM should have been used (via
the slot). In particular this necessitated refactoring the way slots
are used for ON CONFLICT on partitioned tables. That's the new WIP
patch in the series. But I think the result is actually clearer.
- I've integrated the pg_dump and psql patches, although I've made
HIDE_TABLEAM independent of whether \d+ is used on a table with the
default AM or not.
- There's a good number of new tests in both the DDL and the pg_dump
patch
- Lots of smaller cleanups

My next steps are:
- final polish & push the basic DDL and pg_dump patches
- cleanup & polish the ON CONFLICT refactoring
- cleanup & polish the patch adding the tableam based scan
interface. That's by far the largest patch in the series. I might try
to split it up further, but it's probably not worth it.
- improve documentation for the individual callbacks (integrating
work done by others on this thread), in the header
- integrate docs patch
- integrate the revised version of the xid horizon patch by Amit
Khandekar (reviewed by Robert Haas)
- fix the naive implementation of slot-based COPY, to not constantly
drop/recreate slots upon partition change (see the sketch after this
list). I've hackplemented a better approach, which makes it faster
than the current code in my testing.
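
For that last COPY item, the improvement presumably amounts to keeping
one slot per leaf partition alive across rows instead of recreating it
on every partition switch - roughly like this (entirely a sketch, with
made-up cache names):

	/* hypothetical per-partition slot cache, filled lazily */
	if (partition_slots[partidx] == NULL)
		partition_slots[partidx] =
			table_slot_create(partrel, &estate->es_tupleTable);
	slot = partition_slots[partidx];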

Notes:
- I'm currently not targeting v13 with "tableam: Fetch tuples for
triggers & EPQ using a proper snapshot.". While we need something like
that for some AMs like zheap, I think it's a crap approach and needs
more thought.

Greetings,

Andres Freund

[1]: /messages/by-id/20190304234700.w5tmhducs5wxgzls@alap3.anarazel.de

Attachments:

v15-0001-tableam-introduce-table-AM-infrastructure.patch.gz (application/x-patch-gzip)
v15-0002-tableam-Add-pg_dump-support.patch.gz (application/x-patch-gzip)
v15-0003-WIP-Use-per-partition-slots-for-ON-CONFLICT.patch.gz (application/x-patch-gzip)
v15-0004-tableam-Introduce-and-use-begin-endscan-and-do-i.patch.gz (application/x-patch-gzip)
v15-0005-tableam-Inquire-slot-type-from-AM-rather-than-ha.patch.gz
v15-0006-tableam-introduce-slot-based-table-getnext-and-u.patch.gz
v15-0007-tableam-Add-insert-delete-update-lock_tuple.patch.gz
v15-0008-tableam-Add-fetch_row_version.patch.gz
v15-0009-tableam-Add-use-tableam_fetch_follow_check.patch.gz
v15-0010-tableam-Add-table_get_latest_tid.patch.gz
v15-0011-tableam-multi_insert-and-slotify-COPY.patch.gz
v15-0012-tableam-finish_bulk_insert.patch.gz
v15-0013-tableam-slotify-CREATE-TABLE-AS-and-CREATE-MATER.patch.gz
v15-0014-tableam-index-builds.patch.gz
v15-0015-tableam-relation-creation-VACUUM-FULL-CLUSTER-SE.patch.gz
v15-0016-tableam-VACUUM-and-ANALYZE.patch.gz
v15-0017-tableam-planner-size-estimation.patch.gz
v15-0018-tableam-Sample-Scan-Support.patch.gz
v15-0019-tableam-bitmap-heap-scan.patch.gz
v15-0020-tableam-remaining-stuff.patch.gz
v15-0021-WIP-Move-xid-horizon-computation-for-page-level-.patch.gz
v15-0022-tableam-Add-function-to-determine-newest-xid-amo.patch.gz
v15-0023-tableam-Fetch-tuples-for-triggers-EPQ-using-a-pr.patch.gz
#106Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#105)
1 attachment(s)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-03-05 23:07:21 -0800, Andres Freund wrote:

My next steps are:
- final polish & push the basic DDL and pg_dump patches

Done and pushed. There was some collation-dependent fallout; I'm hoping
I've just fixed that.

- cleanup & polish the ON CONFLICT refactoring

Here's a cleaned up version of that patch. David, Alvaro, you also
played in that area, any objections? I think this makes that part of the
code easier to read actually. Robert, thanks for looking at that patch
already.

Greetings,

Andres Freund

Attachments:

v16-0001-Don-t-reuse-slots-between-root-and-partition-in-.patch
From e4caa7de3006370f52b4dafe204d45f9d99fa5a4 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 6 Mar 2019 11:30:33 -0800
Subject: [PATCH v16] Don't reuse slots between root and partition in ON
 CONFLICT ... UPDATE.

Until now the slot to store the conflicting tuple, and the result
of the ON CONFLICT SET, were reused between partitions. That
necessitated changing the slots' descriptors when switching partitions.

Besides the overhead of switching descriptors on a slot (which
requires memory allocations and prevents JITing), that's importantly
also problematic for tableam. There individual partitions might belong
to different tableams, needing different kinds of slots.

In passing also fix ExecOnConflictUpdate to clear the existing slot at
exit. Otherwise that slot could continue to hold a pin till the query
ends, which could be far too long if the input data set is large and
there are no further conflicts. While previously also problematic, it's
now more important as there will be more such slots when partitioning is
involved.

Author: Andres Freund
Reviewed-By: Robert Haas
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
---
 src/backend/executor/execPartition.c   | 52 +++++++++++++------
 src/backend/executor/nodeModifyTable.c | 70 +++++++++-----------------
 src/include/nodes/execnodes.h          |  5 +-
 3 files changed, 64 insertions(+), 63 deletions(-)

diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e41801662b3..4491ee69912 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -723,28 +723,55 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 		if (node->onConflictAction == ONCONFLICT_UPDATE)
 		{
 			TupleConversionMap *map;
+			TupleDesc	leaf_desc;
 
 			map = leaf_part_rri->ri_PartitionInfo->pi_RootToPartitionMap;
+			leaf_desc = RelationGetDescr(leaf_part_rri->ri_RelationDesc);
 
 			Assert(node->onConflictSet != NIL);
 			Assert(rootResultRelInfo->ri_onConflict != NULL);
 
+			leaf_part_rri->ri_onConflict = makeNode(OnConflictSetState);
+
+			/*
+			 * Need a separate existing slot for each partition, as the
+			 * partition could be of a different AM, even if the tuple
+			 * descriptors match.
+			 */
+			leaf_part_rri->ri_onConflict->oc_Existing =
+				ExecInitExtraTupleSlot(mtstate->ps.state,
+									   leaf_desc,
+									   &TTSOpsBufferHeapTuple);
+
 			/*
 			 * If the partition's tuple descriptor matches exactly the root
-			 * parent (the common case), we can simply re-use the parent's ON
+			 * parent (the common case), we can re-use most of the parent's ON
 			 * CONFLICT SET state, skipping a bunch of work.  Otherwise, we
 			 * need to create state specific to this partition.
 			 */
 			if (map == NULL)
-				leaf_part_rri->ri_onConflict = rootResultRelInfo->ri_onConflict;
+			{
+				/*
+				 * It's safe to reuse these from the partition root, as we
+				 * only process one tuple at a time (therefore we won't
+				 * overwrite needed data in slots), and the results of
+				 * projections are independent of the underlying
+				 * storage. Projections and where clauses themselves don't
+				 * store state / are independent of the underlying storage.
+				 */
+				leaf_part_rri->ri_onConflict->oc_ProjSlot =
+					rootResultRelInfo->ri_onConflict->oc_ProjSlot;
+				leaf_part_rri->ri_onConflict->oc_ProjInfo =
+					rootResultRelInfo->ri_onConflict->oc_ProjInfo;
+				leaf_part_rri->ri_onConflict->oc_WhereClause =
+					rootResultRelInfo->ri_onConflict->oc_WhereClause;
+			}
 			else
 			{
 				List	   *onconflset;
 				TupleDesc	tupDesc;
 				bool		found_whole_row;
 
-				leaf_part_rri->ri_onConflict = makeNode(OnConflictSetState);
-
 				/*
 				 * Translate expressions in onConflictSet to account for
 				 * different attribute numbers.  For that, map partition
@@ -778,20 +805,17 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 				/* Finally, adjust this tlist to match the partition. */
 				onconflset = adjust_partition_tlist(onconflset, map);
 
-				/*
-				 * Build UPDATE SET's projection info.  The user of this
-				 * projection is responsible for setting the slot's tupdesc!
-				 * We set aside a tupdesc that's good for the common case of a
-				 * partition that's tupdesc-equal to the partitioned table;
-				 * partitions of different tupdescs must generate their own.
-				 */
+				/* create the tuple slot for the UPDATE SET projection */
 				tupDesc = ExecTypeFromTL(onconflset);
-				ExecSetSlotDescriptor(mtstate->mt_conflproj, tupDesc);
+				leaf_part_rri->ri_onConflict->oc_ProjSlot =
+					ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc,
+										   &TTSOpsVirtual);
+
+				/* build UPDATE SET projection state */
 				leaf_part_rri->ri_onConflict->oc_ProjInfo =
 					ExecBuildProjectionInfo(onconflset, econtext,
-											mtstate->mt_conflproj,
+											leaf_part_rri->ri_onConflict->oc_ProjSlot,
 											&mtstate->ps, partrelDesc);
-				leaf_part_rri->ri_onConflict->oc_ProjTupdesc = tupDesc;
 
 				/*
 				 * If there is a WHERE clause, initialize state where it will
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 14d25fd2aa8..b9bd86ff8fd 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1304,6 +1304,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	ExprContext *econtext = mtstate->ps.ps_ExprContext;
 	Relation	relation = resultRelInfo->ri_RelationDesc;
 	ExprState  *onConflictSetWhere = resultRelInfo->ri_onConflict->oc_WhereClause;
+	TupleTableSlot *existing = resultRelInfo->ri_onConflict->oc_Existing;
 	HeapTupleData tuple;
 	HeapUpdateFailureData hufd;
 	LockTupleMode lockmode;
@@ -1413,7 +1414,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	ExecCheckHeapTupleVisible(estate, &tuple, buffer);
 
 	/* Store target's existing tuple in the state's dedicated slot */
-	ExecStoreBufferHeapTuple(&tuple, mtstate->mt_existing, buffer);
+	ExecStorePinnedBufferHeapTuple(&tuple, existing, buffer);
 
 	/*
 	 * Make tuple and any needed join variables available to ExecQual and
@@ -1422,13 +1423,13 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 * has been made to reference INNER_VAR in setrefs.c, but there is no
 	 * other redirection.
 	 */
-	econtext->ecxt_scantuple = mtstate->mt_existing;
+	econtext->ecxt_scantuple = existing;
 	econtext->ecxt_innertuple = excludedSlot;
 	econtext->ecxt_outertuple = NULL;
 
 	if (!ExecQual(onConflictSetWhere, econtext))
 	{
-		ReleaseBuffer(buffer);
+		ExecClearTuple(existing);	/* see return below */
 		InstrCountFiltered1(&mtstate->ps, 1);
 		return true;			/* done with the tuple */
 	}
@@ -1451,7 +1452,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 		 * INSERT or UPDATE path.
 		 */
 		ExecWithCheckOptions(WCO_RLS_CONFLICT_CHECK, resultRelInfo,
-							 mtstate->mt_existing,
+							 existing,
 							 mtstate->ps.state);
 	}
 
@@ -1469,11 +1470,17 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 
 	/* Execute UPDATE with projection */
 	*returning = ExecUpdate(mtstate, &tuple.t_self, NULL,
-							mtstate->mt_conflproj, planSlot,
+							resultRelInfo->ri_onConflict->oc_ProjSlot,
+							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
 							canSetTag);
 
-	ReleaseBuffer(buffer);
+	/*
+	 * Clear out existing tuple, as there might not be another conflict among
+	 * the next input rows. Don't want to hold resources till the end of the
+	 * query.
+	 */
+	ExecClearTuple(existing);
 	return true;
 }
 
@@ -1633,7 +1640,6 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						ResultRelInfo *targetRelInfo,
 						TupleTableSlot *slot)
 {
-	ModifyTable *node;
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
@@ -1698,19 +1704,6 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 		slot = execute_attr_map_slot(map->attrMap, slot, new_slot);
 	}
 
-	/* Initialize information needed to handle ON CONFLICT DO UPDATE. */
-	Assert(mtstate != NULL);
-	node = (ModifyTable *) mtstate->ps.plan;
-	if (node->onConflictAction == ONCONFLICT_UPDATE)
-	{
-		Assert(mtstate->mt_existing != NULL);
-		ExecSetSlotDescriptor(mtstate->mt_existing,
-							  RelationGetDescr(partrel->ri_RelationDesc));
-		Assert(mtstate->mt_conflproj != NULL);
-		ExecSetSlotDescriptor(mtstate->mt_conflproj,
-							  partrel->ri_onConflict->oc_ProjTupdesc);
-	}
-
 	return slot;
 }
 
@@ -2319,43 +2312,28 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		econtext = mtstate->ps.ps_ExprContext;
 		relationDesc = resultRelInfo->ri_RelationDesc->rd_att;
 
-		/*
-		 * Initialize slot for the existing tuple.  If we'll be performing
-		 * tuple routing, the tuple descriptor to use for this will be
-		 * determined based on which relation the update is actually applied
-		 * to, so we don't set its tuple descriptor here.
-		 */
-		mtstate->mt_existing =
-			ExecInitExtraTupleSlot(mtstate->ps.state,
-								   mtstate->mt_partition_tuple_routing ?
-								   NULL : relationDesc, &TTSOpsBufferHeapTuple);
-
 		/* carried forward solely for the benefit of explain */
 		mtstate->mt_excludedtlist = node->exclRelTlist;
 
 		/* create state for DO UPDATE SET operation */
 		resultRelInfo->ri_onConflict = makeNode(OnConflictSetState);
 
-		/*
-		 * Create the tuple slot for the UPDATE SET projection.
-		 *
-		 * Just like mt_existing above, we leave it without a tuple descriptor
-		 * in the case of partitioning tuple routing, so that it can be
-		 * changed by ExecPrepareTupleRouting.  In that case, we still save
-		 * the tupdesc in the parent's state: it can be reused by partitions
-		 * with an identical descriptor to the parent.
-		 */
+		/* initialize slot for the existing tuple */
+		resultRelInfo->ri_onConflict->oc_Existing =
+			ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc,
+								   &TTSOpsBufferHeapTuple);
+
+		/* create the tuple slot for the UPDATE SET projection */
 		tupDesc = ExecTypeFromTL((List *) node->onConflictSet);
-		mtstate->mt_conflproj =
-			ExecInitExtraTupleSlot(mtstate->ps.state,
-								   mtstate->mt_partition_tuple_routing ?
-								   NULL : tupDesc, &TTSOpsHeapTuple);
-		resultRelInfo->ri_onConflict->oc_ProjTupdesc = tupDesc;
+		resultRelInfo->ri_onConflict->oc_ProjSlot =
+			ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc,
+								   &TTSOpsVirtual);
 
 		/* build UPDATE SET projection state */
 		resultRelInfo->ri_onConflict->oc_ProjInfo =
 			ExecBuildProjectionInfo(node->onConflictSet, econtext,
-									mtstate->mt_conflproj, &mtstate->ps,
+									resultRelInfo->ri_onConflict->oc_ProjSlot,
+									&mtstate->ps,
 									relationDesc);
 
 		/* initialize state to evaluate the WHERE clause, if any */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 996d872c562..6a5411eba8c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -377,8 +377,9 @@ typedef struct OnConflictSetState
 {
 	NodeTag		type;
 
+	TupleTableSlot *oc_Existing;	/* slot to store existing target tuple in */
+	TupleTableSlot *oc_ProjSlot;	/* CONFLICT ... SET ... projection target */
 	ProjectionInfo *oc_ProjInfo;	/* for ON CONFLICT DO UPDATE SET */
-	TupleDesc	oc_ProjTupdesc; /* TupleDesc for the above projection */
 	ExprState  *oc_WhereClause; /* state for the WHERE clause */
 } OnConflictSetState;
 
@@ -1109,9 +1110,7 @@ typedef struct ModifyTableState
 	List	  **mt_arowmarks;	/* per-subplan ExecAuxRowMark lists */
 	EPQState	mt_epqstate;	/* for evaluating EvalPlanQual rechecks */
 	bool		fireBSTriggers; /* do we need to fire stmt triggers? */
-	TupleTableSlot *mt_existing;	/* slot to store existing target tuple in */
 	List	   *mt_excludedtlist;	/* the excluded pseudo relation's tlist  */
-	TupleTableSlot *mt_conflproj;	/* CONFLICT ... SET ... projection target */
 
 	/*
 	 * Slot for storing tuples in the root partitioned table's rowtype during
-- 
2.21.0.dirty

#107David Rowley
david.rowley@2ndquadrant.com
In reply to: Andres Freund (#106)
Re: Pluggable Storage - Andres's take

On Thu, 7 Mar 2019 at 08:33, Andres Freund <andres@anarazel.de> wrote:

Here's a cleaned up version of that patch. David, Alvaro, you also
played in that area, any objections? I think this makes that part of the
code easier to read actually. Robert, thanks for looking at that patch
already.

I only had a quick look and don't have a grasp of what the patch
series is doing to tuple slots, but I didn't see anything I found
alarming during the read.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#108Andres Freund
andres@anarazel.de
In reply to: David Rowley (#107)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-03-07 11:56:57 +1300, David Rowley wrote:

On Thu, 7 Mar 2019 at 08:33, Andres Freund <andres@anarazel.de> wrote:

Here's a cleaned up version of that patch. David, Alvaro, you also
played in that area, any objections? I think this makes that part of the
code easier to read actually. Robert, thanks for looking at that patch
already.

I only had a quick look and don't have a grasp of what the patch
series is doing to tuple slots, but I didn't see anything I found
alarming during the read.

Thanks for looking.

Re slots - the deal basically is that going forward low level
operations, like fetching a row from a table etc, have to be done by a
slot that's compatible with the "target" table. You can get compatible
slot callbacks by calling table_slot_callbacks(), or directly create one
by calling table_gimmegimmeslot() (likely to be renamed :)).

The problem here was that the partition root's slot was used to fetch /
store rows from a child partition. By moving mt_existing into
ResultRelInfo that's not the case anymore.
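
To make that rule concrete, here is a minimal sketch, assuming the
table_slot_callbacks() API described above; get_compatible_slot is a
hypothetical helper name used for illustration, not part of the patch
series.

#include "postgres.h"

#include "access/tableam.h"
#include "executor/executor.h"
#include "utils/rel.h"

static TupleTableSlot *
get_compatible_slot(EState *estate, Relation rel)
{
	/* ask the table's AM which slot implementation it requires */
	const TupleTableSlotOps *ops = table_slot_callbacks(rel);

	/* create a slot of that kind, using the table's tuple descriptor */
	return ExecInitExtraTupleSlot(estate, RelationGetDescr(rel), ops);
}

Such a slot can then be handed to the slot-based fetch/insert routines,
rather than assuming every partition can use a heap tuple slot.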

- Andres

#109Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#108)
Re: Pluggable Storage - Andres's take

On Wed, Mar 6, 2019 at 6:11 PM Andres Freund <andres@anarazel.de> wrote:

slot that's compatible with the "target" table. You can get compatible
slot callbacks by calling table_slot_callbacks(), or directly create one
by calling table_gimmegimmeslot() (likely to be renamed :)).

Hmm. I assume the issue is that table_createslot() was already taken
for another purpose, so then when you needed another callback you went
with table_givemeslot(), and then when you needed a third API to do
something in the same area the best thing available was
table_gimmeslot(), which meant that the fourth API could only be
table_gimmegimmeslot().

Does that sound about right?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#110Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#109)
Re: Pluggable Storage - Andres's take

On 2019-03-07 08:52:21 -0500, Robert Haas wrote:

On Wed, Mar 6, 2019 at 6:11 PM Andres Freund <andres@anarazel.de> wrote:

slot that's compatible with the "target" table. You can get compatible
slot callbacks by calling table_slot_callbacks(), or directly create one
by calling table_gimmegimmeslot() (likely to be renamed :)).

Hmm. I assume the issue is that table_createslot() was already taken
for another purpose, so then when you needed another callback you went
with table_givemeslot(), and then when you needed a third API to do
something in the same area the best thing available was
table_gimmeslot(), which meant that the fourth API could only be
table_gimmegimmeslot().

Does that sound about right?

It was 3 AM, and I thought it was hilarious...

#111Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#110)
Re: Pluggable Storage - Andres's take

On Thu, Mar 7, 2019 at 12:49 PM Andres Freund <andres@anarazel.de> wrote:

On 2019-03-07 08:52:21 -0500, Robert Haas wrote:

On Wed, Mar 6, 2019 at 6:11 PM Andres Freund <andres@anarazel.de> wrote:

slot that's compatible with the "target" table. You can get compatible
slot callbacks by calling table_slot_callbacks(), or directly create one
by calling table_gimmegimmeslot() (likely to be renamed :)).

Hmm. I assume the issue is that table_createslot() was already taken
for another purpose, so then when you needed another callback you went
with table_givemeslot(), and then when you needed a third API to do
something in the same area the best thing available was
table_gimmeslot(), which meant that the fourth API could only be
table_gimmegimmeslot().

Does that sound about right?

It was 3 AM, and I thought it was hilarious...

It is. Just like me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#112Dagfinn Ilmari Mannsåker
ilmari@ilmari.org
In reply to: Andres Freund (#110)
Re: Pluggable Storage - Andres's take

Andres Freund <andres@anarazel.de> writes:

On 2019-03-07 08:52:21 -0500, Robert Haas wrote:

On Wed, Mar 6, 2019 at 6:11 PM Andres Freund <andres@anarazel.de> wrote:

slot that's compatible with the "target" table. You can get compatible
slot callbacks by calling table_slot_callbacks(), or directly create one
by calling table_gimmegimmeslot() (likely to be renamed :)).

Hmm. I assume the issue is that table_createslot() was already taken
for another purpose, so then when you needed another callback you went
with table_givemeslot(), and then when you needed a third API to do
something in the same area the best thing available was
table_gimmeslot(), which meant that the fourth API could only be
table_gimmegimmeslot().

Does that sound about right?

It was 3 AM, and I thought it was hilarious...

♫ Gimme! Gimme! Gimme! A slot after midnight ♫

- ilmari (SCNR)
--
"I use RMS as a guide in the same way that a boat captain would use
a lighthouse. It's good to know where it is, but you generally
don't want to find yourself in the same spot." - Tollef Fog Heen

#113Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#106)
10 attachment(s)
Re: Pluggable Storage - Andres's take

On Thu, Mar 7, 2019 at 6:33 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-03-05 23:07:21 -0800, Andres Freund wrote:

My next steps are:
- final polish & push the basic DDL and pg_dump patches

Done and pushed. Some collation dependent fallout, I'm hoping I've just
fixed that.

Thanks for the corrections that I missed, and also for the extra changes.

Here I have attached the rebased patches that I shared earlier. I am adding
comments to explain the APIs in the code, and will share those patches later.

I observed a crash in the COPY command with the latest patch series.
I am not sure whether the problem comes from the patch that reduces the use
of t_tableOid; I will check it and correct the problem.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0010-Table-access-method-API-explanation.patch (application/octet-stream)
From bade818d2a77dd4f5cf93cfaba05f6a11899732c Mon Sep 17 00:00:00 2001
From: Kommi <haribabuk@fast.au.fujitsu.com>
Date: Mon, 18 Feb 2019 12:41:34 +1100
Subject: [PATCH 10/10] Table access method API explanation

All the table access method APIs and their details are explained.
---
 doc/src/sgml/am.sgml | 548 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 544 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/am.sgml b/doc/src/sgml/am.sgml
index 579187ed1b..d440ebeb58 100644
--- a/doc/src/sgml/am.sgml
+++ b/doc/src/sgml/am.sgml
@@ -18,12 +18,552 @@
   <para>
    All Tables in <productname>PostgreSQL</productname> are the primary
    data store. Each table is stored as its own physical <firstterm>relation</firstterm>
-   and so is described by an entry in the <structname>pg_class</structname>
-   catalog. The contents of an table are entirely under the control of its
-   access method. (All the access methods furthermore use the standard page
-   layout described in <xref linkend="storage-page-layout"/>.)
+   and is described by an entry in the <structname>pg_class</structname>
+   catalog. A table's content is entirely controlled by its access method, although
+   all access methods use the same standard page layout described in <xref linkend="storage-page-layout"/>.
   </para>
+  
+  <sect2 id="table-access-methods-api">
+   <title>Table access method API</title>
+   
+   <para>
+    Each table access method is described by a row in the
+    <link linkend="catalog-pg-am"><structname>pg_am</structname></link> system
+    catalog. The <structname>pg_am</structname> entry specifies the <firstterm>type</firstterm>
+    of the access method and a <firstterm>handler function</firstterm> for the
+    access method. These entries can be created and deleted using the <xref linkend="sql-create-access-method"/>
+    and <xref linkend="sql-drop-access-method"/> SQL commands.
+   </para>
+  
+   <para>
+    A table access method handler function must be declared to accept a
+    single argument of type <type>internal</type> and to return the
+    pseudo-type <type>table_am_handler</type>.  The argument is a dummy value that
+    simply serves to prevent handler functions from being called directly from
+    SQL commands.  The result of the function must be a pointer to a struct of
+    type <structname>TableAmRoutine</structname>, which contains everything
+    that the core code needs to know to make use of the table access method.
+    The <structname>TableAmRoutine</structname> struct, also called the access
+    method's <firstterm>API struct</firstterm>, includes fields specifying assorted
+    fixed properties of the access method, such as whether it can support
+    bitmap scans.  More importantly, it contains pointers to support
+    functions for the access method, which do all of the real work to access
+    tables.  These support functions are plain C functions and are not
+    visible or callable at the SQL level.  The support functions are described
+    in the <structname>TableAmRoutine</structname> structure. For more details, please
+    refer to the file <filename>src/include/access/tableam.h</filename>.
+   </para>
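+   
+   <para>
+    As a minimal sketch (the <literal>my_tableam</literal> names here are
+    hypothetical), a handler function can return a pointer to a constant
+    <structname>TableAmRoutine</structname> struct, as the heap implementation
+    does:
+<programlisting>
+/* hypothetical AM; only the callbacks described below need to be filled in */
+static const TableAmRoutine my_tableam_methods = {
+    .type = T_TableAmRoutine,
+    /* ... callback assignments ... */
+};
+
+Datum
+my_tableam_handler(PG_FUNCTION_ARGS)
+{
+    PG_RETURN_POINTER(&amp;my_tableam_methods);
+}
+</programlisting>
+   </para>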
+   
+   <para>
+    Developers of a new <literal>TABLE ACCESS METHOD</literal> can refer to the existing <literal>HEAP</literal>
+    implementation in <filename>src/backend/access/heap/heapam_handler.c</filename> for the details of
+    how the API is implemented for the heap access method.
+   </para>
+   
+   <para>
+    The different types of APIs that are defined are detailed below.
+   </para>
+  
+   <sect3 id="slot-implementation-function">
+    <title>Slot implementation functions</title>
+     
+   <para>
+<programlisting>
+const TupleTableSlotOps *(*slot_callbacks) (Relation rel);
+</programlisting>
+  
+    This API returns the slot implementation that is specific to the AM.
+    Following are the predefined types of slot implementations that are available,
+    <literal>TTSOpsVirtual</literal>, <literal>TTSOpsHeapTuple</literal>,
+    <literal>TTSOpsMinimalTuple</literal> and <literal>TTSOpsBufferHeapTuple</literal>.
+    An AM implementation can use any one of them. For more details of these slot 
+    specific implementations, refer to <filename>src/include/executor/tuptable.h</filename>.
+   </para>
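+   
+   <para>
+    As a minimal sketch (the function name is hypothetical), an AM that keeps
+    its tuples in shared buffers, as heap does, can simply return the
+    buffer-tuple implementation:
+<programlisting>
+/* hypothetical slot_callbacks implementation for a buffer-based AM */
+static const TupleTableSlotOps *
+my_slot_callbacks(Relation rel)
+{
+    return &amp;TTSOpsBufferHeapTuple;
+}
+</programlisting>
+   </para>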
+   </sect3>
+   
+   <sect3 id="table-scan-functions">
+    <title>Table scan functions</title>
+     
+    <para>
+     The following APIs are used for scanning a table.
+    </para>
+   
+    <para>
+<programlisting>
+TableScanDesc (*scan_begin) (Relation rel,
+                             Snapshot snapshot,
+                             int nkeys, struct ScanKeyData *key,
+                             ParallelTableScanDesc parallel_scan,
+                             bool allow_strat,
+                             bool allow_sync,
+                             bool allow_pagemode,
+                             bool is_bitmapscan,
+                             bool is_samplescan,
+                             bool temp_snap);
+</programlisting>
+  
+     This API starts a scan of the relation pointed to by <literal>rel</literal> using the specified options
+     and returns a <structname>TableScanDesc</structname>. <literal>parallel_scan</literal> can be used
+     by the AM if it supports parallel scans.
+    </para>
+  
+    <para>
+<programlisting>
+void        (*scan_end) (TableScanDesc scan);
+</programlisting>
+  
+     This API ends a scan started by the API <literal>scan_begin</literal>.
+    </para>
+  
+    <para>
+<programlisting>
+void        (*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key, bool set_params,
+                            bool allow_strat, bool allow_sync, bool allow_pagemode);
+</programlisting>
+  
+     This API restarts a given scan previously started by the
+     API <literal>scan_begin</literal> using the provided options, releasing
+     any resources (such as buffer pins) held by the scan.
+    </para>
+   
+    <para>
+<programlisting>
+TupleTableSlot *(*scan_getnextslot) (TableScanDesc scan,
+                                     ScanDirection direction, TupleTableSlot *slot);
+</programlisting>
+  
+     This API returns the next qualifying tuple from a scan started by the API
+     <literal>scan_begin</literal>.
+    </para>
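+    
+    <para>
+     Core code drives these callbacks through the corresponding
+     <literal>table_*</literal> wrapper functions declared in
+     <filename>src/include/access/tableam.h</filename>. A minimal sequential
+     scan loop might look like the following sketch (wrapper names assume the
+     <literal>table_</literal> prefix convention of this patch series):
+<programlisting>
+TableScanDesc scan;
+TupleTableSlot *slot;
+
+/* create a slot compatible with the table's AM */
+slot = MakeSingleTupleTableSlot(RelationGetDescr(rel),
+                                table_slot_callbacks(rel));
+scan = table_beginscan(rel, snapshot, 0, NULL);
+for (;;)
+{
+    slot = table_scan_getnextslot(scan, ForwardScanDirection, slot);
+    if (TupIsNull(slot))
+        break;
+    /* process the tuple stored in the slot */
+}
+table_endscan(scan);
+ExecDropSingleTupleTableSlot(slot);
+</programlisting>
+    </para>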
+    
+   </sect3>
+  
+   <sect3 id="parallel-table-scan-function">
+    <title>Parallel table scan functions</title>
+   
+    <para>
+     The following APIs are used to perform a parallel scan. 
+    </para>  
+   
+    <para>
+<programlisting>
+Size        (*parallelscan_estimate) (Relation rel);
+</programlisting>
+  
+     This API returns the total size of shared state required for the AM to perform
+     a parallel table scan. The minimum required size is that of 
+     <structname>ParallelBlockTableScanDescData</structname>.
+    </para>
+    
+    <para>
+<programlisting>
+Size        (*parallelscan_initialize) (Relation rel, ParallelTableScanDesc parallel_scan);
+</programlisting>
+  
+     This API initializes the <literal>parallel_scan</literal> structure
+     required for the AM to perform a parallel scan, and also returns
+     the total size required for the AM to perform the parallel table scan.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*parallelscan_reinitialize) (Relation rel, ParallelTableScanDesc parallel_scan);
+</programlisting>
+  
+     This API reinitializes the parallel scan structure pointed to by <literal>parallel_scan</literal>.
+    </para>
+    
+   </sect3>
+ 
+   <sect3 id="index-scan-functions">
+    <title>Index scan functions</title>
+     
+    <para>
+<programlisting>
+struct IndexFetchTableData *(*begin_index_fetch) (Relation rel);
+</programlisting>
+  
+     This API returns an allocated and initialized <structname>IndexFetchTableData</structname>
+     structure, which is used to fetch table tuples during an index scan.
+    </para>
+  
+    <para>
+<programlisting>
+void        (*reset_index_fetch) (struct IndexFetchTableData *data);
+</programlisting>
+  
+     This API releases the AM-specific resources held by the <structname>IndexFetchTableData</structname>
+     of an index scan.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*end_index_fetch) (struct IndexFetchTableData *data);
+</programlisting>
+  
+     This API releases the AM-specific resources held by the <structname>IndexFetchTableData</structname>
+     of a given index scan and frees the memory of the <structname>IndexFetchTableData</structname> itself.
+    </para>
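+    
+    <para>
+     A sketch of the expected call sequence (shown directly through the AM
+     routine struct for brevity; callers normally go through the
+     <literal>table_*</literal> wrapper functions):
+<programlisting>
+struct IndexFetchTableData *fetch;
+
+fetch = rel->rd_tableam->begin_index_fetch(rel);
+/* ... fetch tuples by TID via tuple_fetch_follow, described below ... */
+rel->rd_tableam->end_index_fetch(fetch);
+</programlisting>
+    </para>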
+    
+    <para>
+<programlisting>
+TransactionId (*compute_xid_horizon_for_tuples) (Relation rel,
+                                                 ItemPointerData *items,
+                                                 int nitems);
+</programlisting>
+  
+     This API returns the newest xid among the tuples given by <literal>items</literal>. This is used
+     to compute which snapshots to conflict with when replaying WAL records
+     for page-level index vacuums.
+    </para>
+    
+   </sect3>
 
+   <sect3 id="manipulation-of-physical-tuples-functions">
+    <title>Physical tuple manipulation functions</title>
+     
+    <para>
+<programlisting>
+void        (*tuple_insert) (Relation rel, TupleTableSlot *slot, CommandId cid,
+                             int options, struct BulkInsertStateData *bistate);
+</programlisting>
+  
+     This API inserts the tuple contained in the provided slot into the relation
+     and updates the tuple's unique identifier (<literal>ItemPointerData</literal>)
+     in the slot, using the BulkInsertStateData if available.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*tuple_insert_speculative) (Relation rel,
+                                         TupleTableSlot *slot,
+                                         CommandId cid,
+                                         int options,
+                                         struct BulkInsertStateData *bistate,
+                                         uint32 specToken);
+</programlisting>
+  
+     This API is similar to the <literal>tuple_insert</literal> API, but it inserts the tuple
+     with the additional information necessary for speculative insertion; the insertion is
+     confirmed later based on whether the insertion into the index succeeds.
+    </para>
+    
+    <para>
+<programlisting>
+void        (*tuple_complete_speculative) (Relation rel,
+                                           TupleTableSlot *slot,
+                                           uint32 specToken,
+                                           bool succeeded);
+</programlisting>
+  
+     This API completes a speculative insertion of a tuple started by <literal>tuple_insert_speculative</literal>;
+     it is invoked after finishing the index insert, with <literal>succeeded</literal> indicating
+     whether the operation was successful.
+    </para>
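+    
+    <para>
+     As a rough sketch of the intended caller flow (variable declarations,
+     error handling, and the actual unique-index check are omitted; the
+     wrapper names assume the <literal>table_</literal> prefix convention of
+     this patch series):
+<programlisting>
+specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+table_insert_speculative(rel, slot, cid, 0, NULL, specToken);
+/* attempt the unique index insertion; "conflict" records the outcome */
+table_complete_speculative(rel, slot, specToken, !conflict);
+</programlisting>
+    </para>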
+   
+    <para>
+<programlisting>
+HTSU_Result (*tuple_delete) (Relation rel,
+                             ItemPointer tid,
+                             CommandId cid,
+                             Snapshot snapshot,
+                             Snapshot crosscheck,
+                             bool wait,
+                             HeapUpdateFailureData *hufd,
+                             bool changingPart);
+</programlisting>
+  
+     This API deletes the tuple of the relation pointed to by the ItemPointer and returns the
+     result of the operation. In case of failure it updates <literal>hufd</literal>.
+    </para>
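+    
+    <para>
+     For example (a sketch; declarations are omitted, and the wrapper name
+     assumes the <literal>table_</literal> prefix convention of this patch
+     series), a delete in the executor might look like:
+<programlisting>
+/* sketch only: tupleid, hufd, changingPart declared elsewhere */
+result = table_delete(resultRelationDesc, tupleid,
+                      estate->es_output_cid,
+                      estate->es_snapshot,
+                      estate->es_crosscheck_snapshot,
+                      true /* wait for commit */ ,
+                      &amp;hufd, changingPart);
+if (result == HeapTupleUpdated)
+{
+    /* concurrently updated or deleted; handle per isolation level */
+}
+</programlisting>
+    </para>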
+   
+    <para>
+<programlisting>
+HTSU_Result (*tuple_update) (Relation rel,
+                             ItemPointer otid,
+                             TupleTableSlot *slot,
+                             CommandId cid,
+                             Snapshot snapshot,
+                             Snapshot crosscheck,
+                             bool wait,
+                             HeapUpdateFailureData *hufd,
+                             LockTupleMode *lockmode,
+                             bool *update_indexes);
+</programlisting>
+  
+     This API updates the tuple pointed to by the ItemPointer with the new tuple, returns
+     the result of the operation, and sets the flag indicating whether the indexes need updating.
+     In case of failure it updates <literal>hufd</literal>.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
+                             CommandId cid, int options, struct BulkInsertStateData *bistate);
+</programlisting>
+  
+     This API inserts multiple tuples into the relation for faster data loading,
+     using the BulkInsertStateData if available.
+    </para>
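+    
+    <para>
+     For example, <command>COPY</command> batches rows into an array of slots
+     and flushes them in one call (a sketch; the wrapper name assumes the
+     <literal>table_</literal> prefix convention of this patch series, and the
+     variable names follow <filename>copy.c</filename>):
+<programlisting>
+table_multi_insert(rel, slots, nslots, mycid, ti_options, bistate);
+</programlisting>
+    </para>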
+   
+    <para>
+<programlisting>
+HTSU_Result (*tuple_lock) (Relation rel,
+                           ItemPointer tid,
+                           Snapshot snapshot,
+                           TupleTableSlot *slot,
+                           CommandId cid,
+                           LockTupleMode mode,
+                           LockWaitPolicy wait_policy,
+                           uint8 flags,
+                           HeapUpdateFailureData *hufd);
+</programlisting>
+  
+     This API locks the newest version of the tuple pointed to by the ItemPointer <literal>tid</literal>
+     and returns the result of the operation. In case of failure it updates <literal>hufd</literal>.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*finish_bulk_insert) (Relation rel, int options);
+</programlisting>
+  
+     This API performs the operations necessary to complete insertions made
+     via <literal>tuple_insert</literal> and <literal>multi_insert</literal> with a
+     BulkInsertState specified. It may, for example, be used to flush the relation
+     after insertions that skipped WAL, or it may be a no-op.
+    </para>
+   
+   </sect3>
+  
+   <sect3 id="non-modifying-tuple-functions">
+    <title>Non-modifying tuple functions</title>
+     
+    <para>
+<programlisting>
+bool        (*tuple_fetch_row_version) (Relation rel,
+                                        ItemPointer tid,
+                                        Snapshot snapshot,
+                                        TupleTableSlot *slot,
+                                        Relation stats_relation);
+</programlisting>
+  
+     This API fetches the latest tuple specified by the ItemPointer <literal>tid</literal>
+     and stores it in the slot. For example, in the case of the heap AM, update chains are created
+     whenever a tuple is updated, so the function should fetch the latest tuple.
+    </para>
+  
+    <para>
+<programlisting>
+void        (*tuple_get_latest_tid) (Relation rel,
+                                     Snapshot snapshot,
+                                     ItemPointer tid);
+</programlisting>
+  
+     This API gets the TID of the latest version of the tuple identified by the specified
+     ItemPointer. For example, in the case of the heap AM, update chains are created whenever
+     a tuple is updated; this API is useful for finding the latest ItemPointer.
+    </para>
+   
+    <para>
+<programlisting>
+bool        (*tuple_fetch_follow) (struct IndexFetchTableData *scan,
+                                   ItemPointer tid,
+                                   Snapshot snapshot,
+                                   TupleTableSlot *slot,
+                                   bool *call_again, bool *all_dead);
+</programlisting>
+  
+     This API fetches the tuple pointed to by the ItemPointer, using the
+     IndexFetchTableData, stores it in the specified slot, and updates the flags.
+     It is called from the index scan code.
+    </para>
+   
+    <para>
+<programlisting>
+bool        (*tuple_satisfies_snapshot) (Relation rel,
+                                         TupleTableSlot *slot,
+                                         Snapshot snapshot);
+</programlisting>
+  
+     This API performs a tuple visibility check against the provided snapshot and returns
+     <literal>true</literal> if the current tuple is visible, otherwise <literal>false</literal>.
+    </para>
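+    
+    <para>
+     For example, <function>systable_recheck_tuple</function> rechecks a
+     catalog tuple through the corresponding wrapper:
+<programlisting>
+result = table_tuple_satisfies_snapshot(sysscan->heap_rel,
+                                        sysscan->slot,
+                                        freshsnap);
+</programlisting>
+    </para>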
+    
+   </sect3>
+   
+   <sect3 id="ddl-related-functions">
+    <title>DDL-related functions</title>
+     
+    <para>
+<programlisting>
+void        (*relation_set_new_filenode) (Relation rel,
+                                          char persistence,
+                                          TransactionId *freezeXid,
+                                          MultiXactId *minmulti);
+</programlisting>
+  
+     This API creates the storage necessary to store the tuples of the relation,
+     and also reports (via <literal>freezeXid</literal> and <literal>minmulti</literal>) the
+     horizon values to use for it. For example, the heap AM creates the relfilenode
+     necessary to store the heap tuples.
+    </para>
+  
+    <para>
+<programlisting>
+void        (*relation_nontransactional_truncate) (Relation rel);
+</programlisting>
+  
+     This API is used to truncate the specified relation; this operation is non-transactional
+     and cannot be rolled back.
+    </para>
+  
+    <para>
+<programlisting>
+void        (*relation_copy_data) (Relation rel, RelFileNode newrnode);
+</programlisting>
+  
+     This API copies the relation's data from the existing filenode to the new filenode
+     specified by <literal>newrnode</literal> and removes the existing filenode.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*relation_vacuum) (Relation onerel, int options,
+                                struct VacuumParams *params, BufferAccessStrategy bstrategy);
+</programlisting>
+  
+     This API performs vacuuming of the relation based on the specified params.
+     It gathers all the dead tuples of the relation and cleans them up, including
+     the indexes.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*scan_analyze_next_block) (TableScanDesc scan, BlockNumber blockno,
+                                        BufferAccessStrategy bstrategy);
+</programlisting>
+  
+     This API reads the specified relation block in preparation for tuple analysis. The
+     statistics gathered are used by the planner to optimize query planning for this relation.
+    </para>
+   
+    <para>
+<programlisting>
+bool        (*scan_analyze_next_tuple) (TableScanDesc scan, TransactionId OldestXmin,
+                                        double *liverows, double *deadrows, TupleTableSlot *slot);
+</programlisting>
+  
+     This API gets the next visible tuple from the block being scanned, based on the snapshot,
+     and updates the counts of live and dead tuples encountered.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*relation_copy_for_cluster) (Relation NewHeap, Relation OldHeap, Relation OldIndex,
+                                          bool use_sort,
+                                          TransactionId OldestXmin, TransactionId FreezeXid, MultiXactId MultiXactCutoff,
+                                          double *num_tuples, double *tups_vacuumed, double *tups_recently_dead);
+</programlisting>
+  
+     This API makes a copy of the content of a relation, optionally sorted either using the
+     specified index or by sorting explicitly. It also removes dead tuples.
+    </para>
+   
+    <para>
+<programlisting>
+double      (*index_build_range_scan) (Relation heap_rel,
+                                       Relation index_rel,
+                                       IndexInfo *index_info,
+                                       bool allow_sync,
+                                       bool anyvisible,
+                                       BlockNumber start_blockno,
+                                       BlockNumber end_blockno,
+                                       IndexBuildCallback callback,
+                                       void *callback_state,
+                                       TableScanDesc scan);
+</programlisting>
+  
+     This API scans the specified blocks of the given relation and inserts the resulting
+     entries into the specified index using the provided callback function.
+    </para>
+   
+    <para>
+<programlisting>
+void        (*index_validate_scan) (Relation heap_rel,
+                                    Relation index_rel,
+                                    IndexInfo *index_info,
+                                    Snapshot snapshot,
+                                    struct ValidateIndexState *state);
+</programlisting>
+  
+     This API scans the table according to the given snapshot and inserts tuples
+     satisfying the snapshot into the specified index, provided their TIDs are
+     also present in the <structname>ValidateIndexState</structname> struct;
+     this API is used as the last phase of a concurrent index build.
+    </para>
+   
+   </sect3>
+   
+   <sect3 id="planner-functions">
+    <title>Planner functions</title>
+     
+    <para>
+<programlisting>
+void        (*relation_estimate_size) (Relation rel, int32 *attr_widths,
+                                       BlockNumber *pages, double *tuples, double *allvisfrac);
+</programlisting>
+  
+     This API estimates the total size of the relation and also returns the number of
+     pages, the number of tuples, and related information for the relation.
+    </para>
+    
+   </sect3>
+   
+   <sect3 id="executor-functions">
+    <title>Executor functions</title>
+     
+    <para>
+<programlisting>
+bool        (*scan_bitmap_pagescan) (TableScanDesc scan,
+                                     TBMIterateResult *tbmres);
+</programlisting>
+  
+     This API scans the relation block specified in the scan descriptor to collect and return the
+     tuples requested by <structname>tbmres</structname>, subject to visibility checks.
+    </para>
+  
+    <para>
+<programlisting>
+bool        (*scan_bitmap_pagescan_next) (TableScanDesc scan,
+                                          TupleTableSlot *slot);
+</programlisting>
+  
+     This API gets the next tuple from the set of tuples of the page specified in the scan descriptor
+     and returns it in the provided slot; it returns false when there are no more tuples. 
+    </para>
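+    
+    <para>
+     A sketch of how a bitmap heap scan node is expected to consume these two
+     callbacks (shown directly through the AM routine struct for brevity;
+     callers normally use the <literal>table_*</literal> wrappers):
+<programlisting>
+if (rel->rd_tableam->scan_bitmap_pagescan(scan, tbmres))
+{
+    while (rel->rd_tableam->scan_bitmap_pagescan_next(scan, slot))
+    {
+        /* emit the tuple stored in the slot */
+    }
+}
+</programlisting>
+    </para>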
+   
+    <para>
+<programlisting>
+bool        (*scan_sample_next_block) (TableScanDesc scan,
+                                       struct SampleScanState *scanstate);
+</programlisting>
+  
+     This API selects the next block of the relation, either using the given sampling method or
+     sequentially, and records its information in the scan descriptor.
+    </para>
+   
+    <para>
+<programlisting>
+bool        (*scan_sample_next_tuple) (TableScanDesc scan,
+                                       struct SampleScanState *scanstate,
+                                       TupleTableSlot *slot);
+</programlisting>
+  
+     This API gets the next tuple to sample from the current sample block based on
+     the sampling method; otherwise it gets the next visible tuple of the block 
+     chosen by <literal>scan_sample_next_block</literal>.
+    </para>
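+    
+    <para>
+     A sketch of how <literal>TABLESAMPLE</literal> execution alternates these
+     two callbacks (shown directly through the AM routine struct for brevity):
+<programlisting>
+while (rel->rd_tableam->scan_sample_next_block(scan, scanstate))
+{
+    while (rel->rd_tableam->scan_sample_next_tuple(scan, scanstate, slot))
+    {
+        /* emit the sampled tuple stored in the slot */
+    }
+}
+</programlisting>
+    </para>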
+    
+  </sect3>  
+  </sect2>
  </sect1> 
  
  <sect1 id="index-access-methods">
-- 
2.20.1.windows.1

0001-Reduce-the-use-of-HeapTuple-t_tableOid.patch (application/octet-stream)
From cd447144c3d9ac16d7668f1da0536cdab99c618a Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 16 Jan 2019 18:43:47 +1100
Subject: [PATCH 01/10] Reduce the use of HeapTuple t_tableOid

t_tableOid is still used in triggers and where a HeapTuple is
generated and passed around via slots. These uses need to be
replaced once the table OID is stored as a separate
variable/parameter.
---
 contrib/hstore/hstore_io.c                    |  2 --
 contrib/pg_visibility/pg_visibility.c         |  1 -
 contrib/pgstattuple/pgstatapprox.c            |  1 -
 contrib/pgstattuple/pgstattuple.c             |  3 +-
 contrib/postgres_fdw/postgres_fdw.c           | 11 ++++--
 src/backend/access/common/heaptuple.c         |  7 ----
 src/backend/access/heap/heapam.c              | 35 +++++--------------
 src/backend/access/heap/heapam_handler.c      | 27 +++++++-------
 src/backend/access/heap/heapam_visibility.c   | 20 ++++-------
 src/backend/access/heap/pruneheap.c           |  2 --
 src/backend/access/heap/tuptoaster.c          |  3 --
 src/backend/access/heap/vacuumlazy.c          |  2 --
 src/backend/access/index/genam.c              |  1 -
 src/backend/catalog/indexing.c                |  2 +-
 src/backend/commands/analyze.c                |  2 +-
 src/backend/commands/functioncmds.c           |  3 +-
 src/backend/commands/schemacmds.c             |  1 -
 src/backend/commands/trigger.c                | 21 +++++------
 src/backend/executor/execExprInterp.c         |  1 -
 src/backend/executor/execTuples.c             | 29 +++++++++------
 src/backend/executor/execUtils.c              |  2 --
 src/backend/executor/nodeAgg.c                |  3 +-
 src/backend/executor/nodeGather.c             |  1 +
 src/backend/executor/nodeGatherMerge.c        |  1 +
 src/backend/executor/nodeIndexonlyscan.c      |  4 +--
 src/backend/executor/nodeIndexscan.c          |  3 +-
 src/backend/executor/nodeModifyTable.c        |  6 +---
 src/backend/executor/nodeSetOp.c              |  1 +
 src/backend/executor/spi.c                    |  1 -
 src/backend/executor/tqueue.c                 |  1 -
 src/backend/replication/logical/decode.c      |  9 -----
 .../replication/logical/reorderbuffer.c       |  4 +--
 src/backend/utils/adt/expandedrecord.c        |  1 -
 src/backend/utils/adt/jsonfuncs.c             |  2 --
 src/backend/utils/adt/rowtypes.c              | 10 ------
 src/backend/utils/cache/catcache.c            |  1 -
 src/backend/utils/sort/tuplesort.c            |  7 ++--
 src/include/access/heapam.h                   |  3 +-
 src/include/executor/tuptable.h               |  5 ++-
 src/pl/plpgsql/src/pl_exec.c                  |  2 --
 src/test/regress/regress.c                    |  1 -
 41 files changed, 92 insertions(+), 150 deletions(-)

diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 745497c76f..05244e77ef 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -845,7 +845,6 @@ hstore_from_record(PG_FUNCTION_ARGS)
 		/* Build a temporary HeapTuple control structure */
 		tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 		ItemPointerSetInvalid(&(tuple.t_self));
-		tuple.t_tableOid = InvalidOid;
 		tuple.t_data = rec;
 
 		values = (Datum *) palloc(ncolumns * sizeof(Datum));
@@ -998,7 +997,6 @@ hstore_populate_record(PG_FUNCTION_ARGS)
 		/* Build a temporary HeapTuple control structure */
 		tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 		ItemPointerSetInvalid(&(tuple.t_self));
-		tuple.t_tableOid = InvalidOid;
 		tuple.t_data = rec;
 	}
 
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index c9166730fe..503f00408c 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -657,7 +657,6 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 			ItemPointerSet(&(tuple.t_self), blkno, offnum);
 			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 			tuple.t_len = ItemIdGetLength(itemid);
-			tuple.t_tableOid = relid;
 
 			/*
 			 * If we're checking whether the page is all-visible, we expect
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index ed62aef766..17879f115a 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -155,7 +155,6 @@ statapprox_heap(Relation rel, output_type *stat)
 
 			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 			tuple.t_len = ItemIdGetLength(itemid);
-			tuple.t_tableOid = RelationGetRelid(rel);
 
 			/*
 			 * We follow VACUUM's lead in counting INSERT_IN_PROGRESS tuples
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 9bcb640884..a0e7abe748 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -344,7 +344,8 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 		/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
 		LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-		if (HeapTupleSatisfiesVisibility(tuple, &SnapshotDirty, hscan->rs_cbuf))
+		if (HeapTupleSatisfiesVisibility(tuple, RelationGetRelid(hscan->rs_scan.rs_rd),
+								&SnapshotDirty, hscan->rs_cbuf))
 		{
 			stat.tuple_len += tuple->t_len;
 			stat.tuple_count++;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2f387fac42..bbe7f3010f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1471,6 +1471,8 @@ postgresIterateForeignScan(ForeignScanState *node)
 	 */
 	ExecStoreHeapTuple(fsstate->tuples[fsstate->next_tuple++],
 					   slot,
+					   fsstate->rel ?
+							   RelationGetRelid(fsstate->rel) : InvalidOid,
 					   false);
 
 	return slot;
@@ -3511,7 +3513,8 @@ store_returning_result(PgFdwModifyState *fmstate,
 		 * The returning slot will not necessarily be suitable to store
 		 * heaptuples directly, so allow for conversion.
 		 */
-		ExecForceStoreHeapTuple(newtup, slot);
+		ExecForceStoreHeapTuple(newtup, slot,
+					fmstate->rel ? RelationGetRelid(fmstate->rel) : InvalidOid);
 		ExecMaterializeSlot(slot);
 		pfree(newtup);
 	}
@@ -3785,7 +3788,11 @@ get_returning_data(ForeignScanState *node)
 												dmstate->retrieved_attrs,
 												node,
 												dmstate->temp_cxt);
-			ExecStoreHeapTuple(newtup, slot, false);
+			ExecStoreHeapTuple(newtup,
+								slot,
+								dmstate->rel ?
+								RelationGetRelid(dmstate->rel) : InvalidOid,
+								false);
 		}
 		PG_CATCH();
 		{
diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index 783b04a3cb..9e4c3c6cef 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -687,7 +687,6 @@ heap_copytuple(HeapTuple tuple)
 	newTuple = (HeapTuple) palloc(HEAPTUPLESIZE + tuple->t_len);
 	newTuple->t_len = tuple->t_len;
 	newTuple->t_self = tuple->t_self;
-	newTuple->t_tableOid = tuple->t_tableOid;
 	newTuple->t_data = (HeapTupleHeader) ((char *) newTuple + HEAPTUPLESIZE);
 	memcpy((char *) newTuple->t_data, (char *) tuple->t_data, tuple->t_len);
 	return newTuple;
@@ -713,7 +712,6 @@ heap_copytuple_with_tuple(HeapTuple src, HeapTuple dest)
 
 	dest->t_len = src->t_len;
 	dest->t_self = src->t_self;
-	dest->t_tableOid = src->t_tableOid;
 	dest->t_data = (HeapTupleHeader) palloc(src->t_len);
 	memcpy((char *) dest->t_data, (char *) src->t_data, src->t_len);
 }
@@ -848,7 +846,6 @@ expand_tuple(HeapTuple *targetHeapTuple,
 			= targetTHeader
 			= (HeapTupleHeader) ((char *) *targetHeapTuple + HEAPTUPLESIZE);
 		(*targetHeapTuple)->t_len = len;
-		(*targetHeapTuple)->t_tableOid = sourceTuple->t_tableOid;
 		(*targetHeapTuple)->t_self = sourceTuple->t_self;
 
 		targetTHeader->t_infomask = sourceTHeader->t_infomask;
@@ -1076,7 +1073,6 @@ heap_form_tuple(TupleDesc tupleDescriptor,
 	 */
 	tuple->t_len = len;
 	ItemPointerSetInvalid(&(tuple->t_self));
-	tuple->t_tableOid = InvalidOid;
 
 	HeapTupleHeaderSetDatumLength(td, len);
 	HeapTupleHeaderSetTypeId(td, tupleDescriptor->tdtypeid);
@@ -1160,7 +1156,6 @@ heap_modify_tuple(HeapTuple tuple,
 	 */
 	newTuple->t_data->t_ctid = tuple->t_data->t_ctid;
 	newTuple->t_self = tuple->t_self;
-	newTuple->t_tableOid = tuple->t_tableOid;
 
 	return newTuple;
 }
@@ -1223,7 +1218,6 @@ heap_modify_tuple_by_cols(HeapTuple tuple,
 	 */
 	newTuple->t_data->t_ctid = tuple->t_data->t_ctid;
 	newTuple->t_self = tuple->t_self;
-	newTuple->t_tableOid = tuple->t_tableOid;
 
 	return newTuple;
 }
@@ -1463,7 +1457,6 @@ heap_tuple_from_minimal_tuple(MinimalTuple mtup)
 	result = (HeapTuple) palloc(HEAPTUPLESIZE + len);
 	result->t_len = len;
 	ItemPointerSetInvalid(&(result->t_self));
-	result->t_tableOid = InvalidOid;
 	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
 	memcpy((char *) result->t_data + MINIMAL_TUPLE_OFFSET, mtup, mtup->t_len);
 	memset(result->t_data, 0, offsetof(HeapTupleHeaderData, t_infomask2));
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2a88982576..c1e4d07864 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -414,7 +414,6 @@ heapgetpage(TableScanDesc sscan, BlockNumber page)
 			HeapTupleData loctup;
 			bool		valid;
 
-			loctup.t_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
 			loctup.t_len = ItemIdGetLength(lpp);
 			ItemPointerSet(&(loctup.t_self), page, lineoff);
@@ -422,7 +421,8 @@ heapgetpage(TableScanDesc sscan, BlockNumber page)
 			if (all_visible)
 				valid = true;
 			else
-				valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
+				valid = HeapTupleSatisfiesVisibility(&loctup, RelationGetRelid(scan->rs_scan.rs_rd),
+									snapshot, buffer);
 
 			CheckForSerializableConflictOut(valid, scan->rs_scan.rs_rd, &loctup,
 											buffer, snapshot);
@@ -640,7 +640,7 @@ heapgettup(HeapScanDesc scan,
 				/*
 				 * if current tuple qualifies, return it.
 				 */
-				valid = HeapTupleSatisfiesVisibility(tuple,
+				valid = HeapTupleSatisfiesVisibility(tuple, RelationGetRelid(scan->rs_scan.rs_rd),
 													 snapshot,
 													 scan->rs_cbuf);
 
@@ -1156,9 +1156,6 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	if (!is_bitmapscan && snapshot)
 		PredicateLockRelation(relation, snapshot);
 
-	/* we only need to set this up once */
-	scan->rs_ctup.t_tableOid = RelationGetRelid(relation);
-
 	/*
 	 * we do this here instead of in initscan() because heap_rescan also calls
 	 * initscan() and we don't want to allocate memory again
@@ -1383,6 +1380,7 @@ heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *s
 
 	pgstat_count_heap_getnext(scan->rs_scan.rs_rd);
 
+	slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 	return ExecStoreBufferHeapTuple(&scan->rs_ctup, slot,
 									scan->rs_cbuf);
 }
@@ -1486,12 +1484,11 @@ heap_fetch(Relation relation,
 	ItemPointerCopy(tid, &(tuple->t_self));
 	tuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tuple->t_len = ItemIdGetLength(lp);
-	tuple->t_tableOid = RelationGetRelid(relation);
 
 	/*
 	 * check tuple visibility, then release lock
 	 */
-	valid = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
+	valid = HeapTupleSatisfiesVisibility(tuple, RelationGetRelid(relation), snapshot, buffer);
 
 	if (valid)
 		PredicateLockTuple(relation, tuple, snapshot);
@@ -1596,7 +1593,6 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 
 		heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
 		heapTuple->t_len = ItemIdGetLength(lp);
-		heapTuple->t_tableOid = RelationGetRelid(relation);
 		ItemPointerSetOffsetNumber(&heapTuple->t_self, offnum);
 
 		/*
@@ -1633,7 +1629,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 			ItemPointerSet(&(heapTuple->t_self), BufferGetBlockNumber(buffer), offnum);
 
 			/* If it's visible per the snapshot, we must return it */
-			valid = HeapTupleSatisfiesVisibility(heapTuple, snapshot, buffer);
+			valid = HeapTupleSatisfiesVisibility(heapTuple, RelationGetRelid(relation), snapshot, buffer);
 			CheckForSerializableConflictOut(valid, relation, heapTuple,
 											buffer, snapshot);
 			/* reset to original, non-redirected, tid */
@@ -1790,7 +1786,6 @@ heap_get_latest_tid(Relation relation,
 		tp.t_self = ctid;
 		tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 		tp.t_len = ItemIdGetLength(lp);
-		tp.t_tableOid = RelationGetRelid(relation);
 
 		/*
 		 * After following a t_ctid link, we might arrive at an unrelated
@@ -1807,7 +1802,7 @@ heap_get_latest_tid(Relation relation,
 		 * Check tuple visibility; if visible, set it as the new result
 		 * candidate.
 		 */
-		valid = HeapTupleSatisfiesVisibility(&tp, snapshot, buffer);
+		valid = HeapTupleSatisfiesVisibility(&tp, RelationGetRelid(relation), snapshot, buffer);
 		CheckForSerializableConflictOut(valid, relation, &tp, buffer, snapshot);
 		if (valid)
 			*tid = ctid;
@@ -2157,7 +2152,6 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 
 	HeapTupleHeaderSetCmin(tup->t_data, cid);
 	HeapTupleHeaderSetXmax(tup->t_data, 0); /* for cleanliness */
-	tup->t_tableOid = RelationGetRelid(relation);
 
 	/*
 	 * If the new tuple is too big for storage or contains already toasted
@@ -2215,9 +2209,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	{
 		heaptuples[i] = heap_prepare_insert(relation, ExecFetchSlotHeapTuple(slots[i], true, NULL),
 											xid, cid, options);
-
-		if (slots[i]->tts_tableOid != InvalidOid)
-			heaptuples[i]->t_tableOid = slots[i]->tts_tableOid;
 	}
 
 	/*
@@ -2607,7 +2598,6 @@ heap_delete(Relation relation, ItemPointer tid,
 	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
 	Assert(ItemIdIsNormal(lp));
 
-	tp.t_tableOid = RelationGetRelid(relation);
 	tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tp.t_len = ItemIdGetLength(lp);
 	tp.t_self = *tid;
@@ -2724,7 +2714,7 @@ l1:
 	if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated)
 	{
 		/* Perform additional check for transaction-snapshot mode RI updates */
-		if (!HeapTupleSatisfiesVisibility(&tp, crosscheck, buffer))
+		if (!HeapTupleSatisfiesVisibility(&tp, RelationGetRelid(relation), crosscheck, buffer))
 			result = HeapTupleUpdated;
 	}
 
@@ -3126,14 +3116,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	 * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
 	 * properly.
 	 */
-	oldtup.t_tableOid = RelationGetRelid(relation);
 	oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	oldtup.t_len = ItemIdGetLength(lp);
 	oldtup.t_self = *otid;
 
-	/* the new tuple is ready, except for this: */
-	newtup->t_tableOid = RelationGetRelid(relation);
-
 	/* Determine columns modified by the update. */
 	modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
 												  &oldtup, newtup);
@@ -3364,7 +3350,7 @@ l2:
 	if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated)
 	{
 		/* Perform additional check for transaction-snapshot mode RI updates */
-		if (!HeapTupleSatisfiesVisibility(&oldtup, crosscheck, buffer))
+		if (!HeapTupleSatisfiesVisibility(&oldtup, RelationGetRelid(relation), crosscheck, buffer))
 			result = HeapTupleUpdated;
 	}
 
@@ -4118,7 +4104,6 @@ heap_lock_tuple(Relation relation, ItemPointer tid,
 
 	tuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tuple->t_len = ItemIdGetLength(lp);
-	tuple->t_tableOid = RelationGetRelid(relation);
 	tuple->t_self = *tid;
 
 l3:
@@ -5672,7 +5657,6 @@ heap_abort_speculative(Relation relation, HeapTuple tuple)
 	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
 	Assert(ItemIdIsNormal(lp));
 
-	tp.t_tableOid = RelationGetRelid(relation);
 	tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tp.t_len = ItemIdGetLength(lp);
 	tp.t_self = *tid;
@@ -7504,7 +7488,6 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
 	HeapTupleHeader hdr = tup->t_data;
 
 	Assert(ItemPointerIsValid(&tup->t_self));
-	Assert(tup->t_tableOid != InvalidOid);
 
 	xlrec.top_xid = GetTopTransactionId();
 	xlrec.target_node = relation->rd_node;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index fec649b842..f2719bb017 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -118,8 +118,6 @@ heapam_heap_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
 
 	/* Update the tuple with table oid */
 	slot->tts_tableOid = RelationGetRelid(relation);
-	if (slot->tts_tableOid != InvalidOid)
-		tuple->t_tableOid = slot->tts_tableOid;
 
 	/* Perform the insertion, and copy the resulting ItemPointer */
 	heap_insert(relation, tuple, cid, options, bistate);
@@ -138,8 +136,6 @@ heapam_heap_insert_speculative(Relation relation, TupleTableSlot *slot, CommandI
 
 	/* Update the tuple with table oid */
 	slot->tts_tableOid = RelationGetRelid(relation);
-	if (slot->tts_tableOid != InvalidOid)
-		tuple->t_tableOid = slot->tts_tableOid;
 
 	HeapTupleHeaderSetSpeculativeToken(tuple->t_data, specToken);
 
@@ -566,7 +562,9 @@ heapam_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot, Snapshot sna
 	 * Caller should be holding pin, but not lock.
 	 */
 	LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
-	res = HeapTupleSatisfiesVisibility(bslot->base.tuple, snapshot,
+	res = HeapTupleSatisfiesVisibility(bslot->base.tuple,
+									   RelationGetRelid(rel),
+									   snapshot,
 									   bslot->buffer);
 	LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
 
@@ -732,7 +730,6 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 
 		ItemPointerSet(&targtuple->t_self, scan->rs_cblock, scan->rs_cindex);
 
-		targtuple->t_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 		targtuple->t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
 		targtuple->t_len = ItemIdGetLength(itemid);
 
@@ -817,6 +814,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 		if (sample_it)
 		{
 			ExecStoreBufferHeapTuple(targtuple, slot, scan->rs_cbuf);
+			slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 			scan->rs_cindex++;
 
 			/* note that we leave the buffer locked here! */
@@ -1477,7 +1475,7 @@ heapam_index_build_range_scan(Relation heapRelation,
 		MemoryContextReset(econtext->ecxt_per_tuple_memory);
 
 		/* Set up for predicate or expression evaluation */
-		ExecStoreHeapTuple(heapTuple, slot, false);
+		ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(sscan->rs_rd), false);
 
 		/*
 		 * In a partial index, discard tuples that don't satisfy the
@@ -1731,7 +1729,7 @@ heapam_index_validate_scan(Relation heapRelation,
 			MemoryContextReset(econtext->ecxt_per_tuple_memory);
 
 			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
+			ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(sscan->rs_rd), false);
 
 			/*
 			 * In a partial index, discard tuples that don't satisfy the
@@ -1884,9 +1882,9 @@ heapam_scan_bitmap_pagescan(TableScanDesc sscan,
 				continue;
 			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
 			loctup.t_len = ItemIdGetLength(lp);
-			loctup.t_tableOid = scan->rs_scan.rs_rd->rd_id;
 			ItemPointerSet(&loctup.t_self, page, offnum);
-			valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
+			valid = HeapTupleSatisfiesVisibility(&loctup,
+					RelationGetRelid(scan->rs_scan.rs_rd), snapshot, buffer);
 			if (valid)
 			{
 				scan->rs_vistuples[ntup++] = offnum;
@@ -1923,7 +1921,6 @@ heapam_scan_bitmap_pagescan_next(TableScanDesc sscan, TupleTableSlot *slot)
 
 	scan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
 	scan->rs_ctup.t_len = ItemIdGetLength(lp);
-	scan->rs_ctup.t_tableOid = scan->rs_scan.rs_rd->rd_id;
 	ItemPointerSet(&scan->rs_ctup.t_self, scan->rs_cblock, targoffset);
 
 	pgstat_count_heap_fetch(scan->rs_scan.rs_rd);
@@ -1935,6 +1932,7 @@ heapam_scan_bitmap_pagescan_next(TableScanDesc sscan, TupleTableSlot *slot)
 	ExecStoreBufferHeapTuple(&scan->rs_ctup,
 							 slot,
 							 scan->rs_cbuf);
+	slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 
 	scan->rs_cindex++;
 
@@ -1981,8 +1979,10 @@ SampleHeapTupleVisible(HeapScanDesc scan, Buffer buffer,
 	else
 	{
 		/* Otherwise, we have to check the tuple individually. */
-		return HeapTupleSatisfiesVisibility(tuple, scan->rs_scan.rs_snapshot,
-											buffer);
+		return HeapTupleSatisfiesVisibility(tuple,
+				RelationGetRelid(scan->rs_scan.rs_rd),
+				scan->rs_scan.rs_snapshot,
+				buffer);
 	}
 }
 
@@ -2131,6 +2131,7 @@ heapam_scan_sample_next_tuple(TableScanDesc sscan, struct SampleScanState *scans
 				LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 
 			ExecStoreBufferHeapTuple(tuple, slot, scan->rs_cbuf);
+			slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 
 			/* Count successfully-fetched tuples as heap fetches */
 			pgstat_count_heap_getnext(scan->rs_scan.rs_rd);
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 5e8fdacb95..af63038d4a 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -173,7 +173,6 @@ HeapTupleSatisfiesSelf(HeapTuple htup, Snapshot snapshot, Buffer buffer)
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -366,7 +365,6 @@ HeapTupleSatisfiesToast(HeapTuple htup, Snapshot snapshot,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -460,7 +458,6 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -750,7 +747,6 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	snapshot->xmin = snapshot->xmax = InvalidTransactionId;
 	snapshot->speculativeToken = 0;
@@ -967,7 +963,6 @@ HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -1168,7 +1163,6 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	/*
 	 * Has inserting transaction committed?
@@ -1425,7 +1419,6 @@ HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin)
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	/*
 	 * If the inserting transaction is marked invalid, then it aborted, and
@@ -1541,7 +1534,7 @@ TransactionIdInArray(TransactionId xid, TransactionId *xip, Size num)
  * complicated than when dealing "only" with the present.
  */
 static bool
-HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
+HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Oid relid, Snapshot snapshot,
 							   Buffer buffer)
 {
 	HeapTupleHeader tuple = htup->t_data;
@@ -1549,7 +1542,6 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 	TransactionId xmax = HeapTupleHeaderGetRawXmax(tuple);
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	/* inserting transaction aborted */
 	if (HeapTupleHeaderXminInvalid(tuple))
@@ -1570,7 +1562,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 		 * values externally.
 		 */
 		resolved = ResolveCminCmaxDuringDecoding(HistoricSnapshotGetTupleCids(), snapshot,
-												 htup, buffer,
+												 htup, relid, buffer,
 												 &cmin, &cmax);
 
 		if (!resolved)
@@ -1641,7 +1633,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 
 		/* Lookup actual cmin/cmax values */
 		resolved = ResolveCminCmaxDuringDecoding(HistoricSnapshotGetTupleCids(), snapshot,
-												 htup, buffer,
+												 htup, relid, buffer,
 												 &cmin, &cmax);
 
 		if (!resolved)
@@ -1689,8 +1681,10 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
  *	if so, the indicated buffer is marked dirty.
  */
 bool
-HeapTupleSatisfiesVisibility(HeapTuple tup, Snapshot snapshot, Buffer buffer)
+HeapTupleSatisfiesVisibility(HeapTuple tup, Oid relid, Snapshot snapshot, Buffer buffer)
 {
+	Assert(relid != InvalidOid);
+
 	switch (snapshot->snapshot_type)
 	{
 		case SNAPSHOT_MVCC:
@@ -1709,7 +1703,7 @@ HeapTupleSatisfiesVisibility(HeapTuple tup, Snapshot snapshot, Buffer buffer)
 			return HeapTupleSatisfiesDirty(tup, snapshot, buffer);
 			break;
 		case SNAPSHOT_HISTORIC_MVCC:
-			return HeapTupleSatisfiesHistoricMVCC(tup, snapshot, buffer);
+			return HeapTupleSatisfiesHistoricMVCC(tup, relid, snapshot, buffer);
 			break;
 		case SNAPSHOT_NON_VACUUMABLE:
 			return HeapTupleSatisfiesNonVacuumable(tup, snapshot, buffer);
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a3e51922d8..e09a9d7340 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -366,8 +366,6 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 				i;
 	HeapTupleData tup;
 
-	tup.t_tableOid = RelationGetRelid(relation);
-
 	rootlp = PageGetItemId(dp, rootoffnum);
 
 	/*
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 7ea964c493..257ae9761b 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1022,7 +1022,6 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 		result_tuple = (HeapTuple) palloc0(HEAPTUPLESIZE + new_tuple_len);
 		result_tuple->t_len = new_tuple_len;
 		result_tuple->t_self = newtup->t_self;
-		result_tuple->t_tableOid = newtup->t_tableOid;
 		new_data = (HeapTupleHeader) ((char *) result_tuple + HEAPTUPLESIZE);
 		result_tuple->t_data = new_data;
 
@@ -1123,7 +1122,6 @@ toast_flatten_tuple(HeapTuple tup, TupleDesc tupleDesc)
 	 * a syscache entry.
 	 */
 	new_tuple->t_self = tup->t_self;
-	new_tuple->t_tableOid = tup->t_tableOid;
 
 	new_tuple->t_data->t_choice = tup->t_data->t_choice;
 	new_tuple->t_data->t_ctid = tup->t_data->t_ctid;
@@ -1194,7 +1192,6 @@ toast_flatten_tuple_to_datum(HeapTupleHeader tup,
 	/* Build a temporary HeapTuple control structure */
 	tmptup.t_len = tup_len;
 	ItemPointerSetInvalid(&(tmptup.t_self));
-	tmptup.t_tableOid = InvalidOid;
 	tmptup.t_data = tup;
 
 	/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9416c31889..2056dde239 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1009,7 +1009,6 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 			tuple.t_len = ItemIdGetLength(itemid);
-			tuple.t_tableOid = RelationGetRelid(onerel);
 
 			tupgone = false;
 
@@ -2244,7 +2243,6 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(rel);
 
 		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
 		{
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index f4a527b126..250c746971 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -510,7 +510,6 @@ systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
 	result = table_tuple_satisfies_snapshot(sysscan->heap_rel,
 											sysscan->slot,
 											freshsnap);
-
 	return result;
 }
 
diff --git a/src/backend/catalog/indexing.c b/src/backend/catalog/indexing.c
index 0c994122d8..7d443f8fe7 100644
--- a/src/backend/catalog/indexing.c
+++ b/src/backend/catalog/indexing.c
@@ -99,7 +99,7 @@ CatalogIndexInsert(CatalogIndexState indstate, HeapTuple heapTuple)
 	/* Need a slot to hold the tuple being examined */
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
 									&TTSOpsHeapTuple);
-	ExecStoreHeapTuple(heapTuple, slot, false);
+	ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(heapRelation), false);
 
 	/*
 	 * for each index, form and insert the index tuple
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index fb4384d556..a71d7b658e 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -760,7 +760,7 @@ compute_index_stats(Relation onerel, double totalrows,
 			ResetExprContext(econtext);
 
 			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
+			ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(onerel), false);
 
 			/* If index is partial, check predicate */
 			if (predicate != NULL)
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index 4f62e48d98..0f3e52802b 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -2441,10 +2441,9 @@ ExecuteCallStmt(CallStmt *stmt, ParamListInfo params, bool atomic, DestReceiver
 
 		rettupdata.t_len = HeapTupleHeaderGetDatumLength(td);
 		ItemPointerSetInvalid(&(rettupdata.t_self));
-		rettupdata.t_tableOid = InvalidOid;
 		rettupdata.t_data = td;
 
-		slot = ExecStoreHeapTuple(&rettupdata, tstate->slot, false);
+		slot = ExecStoreHeapTuple(&rettupdata, tstate->slot, InvalidOid, false);
 		tstate->dest->receiveSlot(slot, tstate->dest);
 
 		end_tup_output(tstate);
diff --git a/src/backend/commands/schemacmds.c b/src/backend/commands/schemacmds.c
index 6cf94a3140..4492ae2b0e 100644
--- a/src/backend/commands/schemacmds.c
+++ b/src/backend/commands/schemacmds.c
@@ -355,7 +355,6 @@ AlterSchemaOwner_internal(HeapTuple tup, Relation rel, Oid newOwnerId)
 {
 	Form_pg_namespace nspForm;
 
-	Assert(tup->t_tableOid == NamespaceRelationId);
 	Assert(RelationGetRelid(rel) == NamespaceRelationId);
 
 	nspForm = (Form_pg_namespace) GETSTRUCT(tup);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 1653c37567..bae5de9764 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2567,7 +2567,7 @@ ExecBRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 		}
 		else if (newtuple != oldtuple)
 		{
-			ExecForceStoreHeapTuple(newtuple, slot);
+			ExecForceStoreHeapTuple(newtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 			if (should_free)
 				heap_freetuple(oldtuple);
@@ -2649,7 +2649,8 @@ ExecIRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 		}
 		else if (newtuple != oldtuple)
 		{
-			ExecForceStoreHeapTuple(newtuple, slot);
+			ExecForceStoreHeapTuple(newtuple, LocTriggerData.tg_trigslot,
+									RelationGetRelid(relinfo->ri_RelationDesc));
 
 			if (should_free)
 				heap_freetuple(oldtuple);
@@ -2778,7 +2779,7 @@ ExecBRDeleteTriggers(EState *estate, EPQState *epqstate,
 	else
 	{
 		trigtuple = fdw_trigtuple;
-		ExecForceStoreHeapTuple(trigtuple, slot);
+		ExecForceStoreHeapTuple(trigtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
 	}
 
 	LocTriggerData.type = T_TriggerData;
@@ -2850,7 +2851,7 @@ ExecARDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
 							   slot,
 							   NULL);
 		else
-			ExecForceStoreHeapTuple(fdw_trigtuple, slot);
+			ExecForceStoreHeapTuple(fdw_trigtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_DELETE,
 							  true, slot, NULL, NIL, NULL,
@@ -2879,7 +2880,7 @@ ExecIRDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
 	LocTriggerData.tg_oldtable = NULL;
 	LocTriggerData.tg_newtable = NULL;
 
-	ExecForceStoreHeapTuple(trigtuple, slot);
+	ExecForceStoreHeapTuple(trigtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 	for (i = 0; i < trigdesc->numtriggers; i++)
 	{
@@ -3038,7 +3039,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 	}
 	else
 	{
-		ExecForceStoreHeapTuple(fdw_trigtuple, oldslot);
+		ExecForceStoreHeapTuple(fdw_trigtuple, oldslot, RelationGetRelid(relinfo->ri_RelationDesc));
 		trigtuple = fdw_trigtuple;
 	}
 
@@ -3088,7 +3089,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 		}
 		else if (newtuple != oldtuple)
 		{
-			ExecForceStoreHeapTuple(newtuple, newslot);
+			ExecForceStoreHeapTuple(newtuple, newslot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 			if (should_free_new)
 				heap_freetuple(oldtuple);
@@ -3136,7 +3137,7 @@ ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 							   oldslot,
 							   NULL);
 		else if (fdw_trigtuple != NULL)
-			ExecForceStoreHeapTuple(fdw_trigtuple, oldslot);
+			ExecForceStoreHeapTuple(fdw_trigtuple, oldslot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_UPDATE,
 							  true, oldslot, newslot, recheckIndexes,
@@ -3164,7 +3165,7 @@ ExecIRUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 	LocTriggerData.tg_oldtable = NULL;
 	LocTriggerData.tg_newtable = NULL;
 
-	ExecForceStoreHeapTuple(trigtuple, oldslot);
+	ExecForceStoreHeapTuple(trigtuple, oldslot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 	for (i = 0; i < trigdesc->numtriggers; i++)
 	{
@@ -3200,7 +3201,7 @@ ExecIRUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 		}
 		else if (newtuple != oldtuple)
 		{
-			ExecForceStoreHeapTuple(newtuple, newslot);
+			ExecForceStoreHeapTuple(newtuple, newslot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 			if (should_free)
 				heap_freetuple(oldtuple);
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index a018925d4e..11aee64d19 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3016,7 +3016,6 @@ ExecEvalFieldStoreDeForm(ExprState *state, ExprEvalStep *op, ExprContext *econte
 		tuphdr = DatumGetHeapTupleHeader(tupDatum);
 		tmptup.t_len = HeapTupleHeaderGetDatumLength(tuphdr);
 		ItemPointerSetInvalid(&(tmptup.t_self));
-		tmptup.t_tableOid = InvalidOid;
 		tmptup.t_data = tuphdr;
 
 		heap_deform_tuple(&tmptup, tupDesc,
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 41fa374b6f..cde87c9d4b 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -389,7 +389,7 @@ tts_heap_copyslot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
 	tuple = ExecCopySlotHeapTuple(srcslot);
 	MemoryContextSwitchTo(oldcontext);
 
-	ExecStoreHeapTuple(tuple, dstslot, true);
+	ExecStoreHeapTuple(tuple, dstslot, srcslot->tts_tableOid, true);
 }
 
 static HeapTuple
@@ -1102,6 +1102,7 @@ MakeTupleTableSlot(TupleDesc tupleDesc,
 	slot->tts_tupleDescriptor = tupleDesc;
 	slot->tts_mcxt = CurrentMemoryContext;
 	slot->tts_nvalid = 0;
+	slot->tts_tableOid = InvalidOid;
 
 	if (tupleDesc != NULL)
 	{
@@ -1314,6 +1315,7 @@ ExecSetSlotDescriptor(TupleTableSlot *slot, /* slot to change */
 TupleTableSlot *
 ExecStoreHeapTuple(HeapTuple tuple,
 				   TupleTableSlot *slot,
+				   Oid relid,
 				   bool shouldFree)
 {
 	/*
@@ -1327,7 +1329,7 @@ ExecStoreHeapTuple(HeapTuple tuple,
 		elog(ERROR, "trying to store a heap tuple into wrong type of slot");
 	tts_heap_store_tuple(slot, tuple, shouldFree);
 
-	slot->tts_tableOid = tuple->t_tableOid;
+	slot->tts_tableOid = relid;
 
 	return slot;
 }
@@ -1349,6 +1351,8 @@ ExecStoreHeapTuple(HeapTuple tuple,
  *
  * If the target slot is not guaranteed to be TTSOpsBufferHeapTuple type slot,
  * use the, more expensive, ExecForceStoreHeapTuple().
+ *
+ * NOTE: Don't set tts_tableOid from the tuple's t_tableOid.
  * --------------------------------
  */
 TupleTableSlot *
@@ -1368,8 +1372,6 @@ ExecStoreBufferHeapTuple(HeapTuple tuple,
 		elog(ERROR, "trying to store an on-disk heap tuple into wrong type of slot");
 	tts_buffer_heap_store_tuple(slot, tuple, buffer, false);
 
-	slot->tts_tableOid = tuple->t_tableOid;
-
 	return slot;
 }
 
@@ -1394,8 +1396,6 @@ ExecStorePinnedBufferHeapTuple(HeapTuple tuple,
 		elog(ERROR, "trying to store an on-disk heap tuple into wrong type of slot");
 	tts_buffer_heap_store_tuple(slot, tuple, buffer, true);
 
-	slot->tts_tableOid = tuple->t_tableOid;
-
 	return slot;
 }
 
@@ -1430,11 +1430,12 @@ ExecStoreMinimalTuple(MinimalTuple mtup,
  */
 void
 ExecForceStoreHeapTuple(HeapTuple tuple,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						Oid relid)
 {
 	if (TTS_IS_HEAPTUPLE(slot))
 	{
-		ExecStoreHeapTuple(tuple, slot, false);
+		ExecStoreHeapTuple(tuple, slot, relid, false);
 	}
 	else if (TTS_IS_BUFFERTUPLE(slot))
 	{
@@ -1447,6 +1448,7 @@ ExecForceStoreHeapTuple(HeapTuple tuple,
 		oldContext = MemoryContextSwitchTo(slot->tts_mcxt);
 		bslot->base.tuple = heap_copytuple(tuple);
 		MemoryContextSwitchTo(oldContext);
+		slot->tts_tableOid = relid;
 	}
 	else
 	{
@@ -1454,6 +1456,7 @@ ExecForceStoreHeapTuple(HeapTuple tuple,
 		heap_deform_tuple(tuple, slot->tts_tupleDescriptor,
 						  slot->tts_values, slot->tts_isnull);
 		ExecStoreVirtualTuple(slot);
+		slot->tts_tableOid = relid;
 	}
 }
 
@@ -1590,6 +1593,8 @@ ExecStoreHeapTupleDatum(Datum data, TupleTableSlot *slot)
 HeapTuple
 ExecFetchSlotHeapTuple(TupleTableSlot *slot, bool materialize, bool *shouldFree)
 {
+	HeapTuple htup;
+
 	/*
 	 * sanity checks
 	 */
@@ -1604,14 +1609,18 @@ ExecFetchSlotHeapTuple(TupleTableSlot *slot, bool materialize, bool *shouldFree)
 	{
 		if (shouldFree)
 			*shouldFree = true;
-		return slot->tts_ops->copy_heap_tuple(slot);
+		htup = slot->tts_ops->copy_heap_tuple(slot);
 	}
 	else
 	{
 		if (shouldFree)
 			*shouldFree = false;
-		return slot->tts_ops->get_heap_tuple(slot);
+		htup = slot->tts_ops->get_heap_tuple(slot);
 	}
+
+	htup->t_tableOid = slot->tts_tableOid;
+
+	return htup;
 }
 
 /* --------------------------------
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 7f33fe933b..adbbe17bde 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1010,7 +1010,6 @@ GetAttributeByName(HeapTupleHeader tuple, const char *attname, bool *isNull)
 	 */
 	tmptup.t_len = HeapTupleHeaderGetDatumLength(tuple);
 	ItemPointerSetInvalid(&(tmptup.t_self));
-	tmptup.t_tableOid = InvalidOid;
 	tmptup.t_data = tuple;
 
 	result = heap_getattr(&tmptup,
@@ -1058,7 +1057,6 @@ GetAttributeByNum(HeapTupleHeader tuple,
 	 */
 	tmptup.t_len = HeapTupleHeaderGetDatumLength(tuple);
 	ItemPointerSetInvalid(&(tmptup.t_self));
-	tmptup.t_tableOid = InvalidOid;
 	tmptup.t_data = tuple;
 
 	result = heap_getattr(&tmptup,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index bae7989a42..07479bc5c4 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -1806,7 +1806,8 @@ agg_retrieve_direct(AggState *aggstate)
 				 * cleared from the slot.
 				 */
 				ExecForceStoreHeapTuple(aggstate->grp_firstTuple,
-								   firstSlot);
+								   firstSlot,
+								   InvalidOid);
 				aggstate->grp_firstTuple = NULL;	/* don't keep two pointers */
 
 				/* set up for first advance_aggregates call */
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 69d5a1f239..b467db19fb 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -280,6 +280,7 @@ gather_getnext(GatherState *gatherstate)
 			{
 				ExecStoreHeapTuple(tup, /* tuple to store */
 								   fslot,	/* slot to store the tuple */
+								   InvalidOid,
 								   true);	/* pfree tuple when done with it */
 				return fslot;
 			}
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 4de1d2b484..a04da7bbeb 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -703,6 +703,7 @@ gather_merge_readnext(GatherMergeState *gm_state, int reader, bool nowait)
 	ExecStoreHeapTuple(tup,			/* tuple to store */
 					   gm_state->gm_slots[reader],	/* slot in which to store
 													 * the tuple */
+					   InvalidOid,
 					   true);		/* pfree tuple when done with it */
 
 	return true;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 85c5a1fb79..383c9a8e22 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -206,8 +206,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 			 */
 			Assert(slot->tts_tupleDescriptor->natts ==
 				   scandesc->xs_hitupdesc->natts);
-			ExecForceStoreHeapTuple(scandesc->xs_hitup, slot);
-			slot->tts_tableOid = RelationGetRelid(scandesc->heapRelation);
+			ExecForceStoreHeapTuple(scandesc->xs_hitup, slot,
+					RelationGetRelid(scandesc->heapRelation));
 		}
 		else if (scandesc->xs_itup)
 			StoreIndexTuple(slot, scandesc->xs_itup, scandesc->xs_itupdesc);
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 84e8e872ee..a5401e9c02 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -246,8 +246,7 @@ IndexNextWithReorder(IndexScanState *node)
 				tuple = reorderqueue_pop(node);
 
 				/* Pass 'true', as the tuple in the queue is a palloc'd copy */
-				slot->tts_tableOid = RelationGetRelid(scandesc->heapRelation);
-				ExecStoreHeapTuple(tuple, slot, true);
+				ExecStoreHeapTuple(tuple, slot, RelationGetRelid(scandesc->heapRelation), true);
 				return slot;
 			}
 		}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index f4f95428af..fc6b53a148 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -844,7 +844,7 @@ ldelete:;
 			slot = ExecGetReturningSlot(estate, resultRelInfo);
 			if (oldtuple != NULL)
 			{
-				ExecForceStoreHeapTuple(oldtuple, slot);
+				ExecForceStoreHeapTuple(oldtuple, slot, RelationGetRelid(resultRelationDesc));
 			}
 			else
 			{
@@ -2017,10 +2017,6 @@ ExecModifyTable(PlanState *pstate)
 					oldtupdata.t_len =
 						HeapTupleHeaderGetDatumLength(oldtupdata.t_data);
 					ItemPointerSetInvalid(&(oldtupdata.t_self));
-					/* Historically, view triggers see invalid t_tableOid. */
-					oldtupdata.t_tableOid =
-						(relkind == RELKIND_VIEW) ? InvalidOid :
-						RelationGetRelid(resultRelInfo->ri_RelationDesc);
 
 					oldtuple = &oldtupdata;
 				}
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 26aeaee083..8432c54ddb 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -270,6 +270,7 @@ setop_retrieve_direct(SetOpState *setopstate)
 		 */
 		ExecStoreHeapTuple(setopstate->grp_firstTuple,
 						   resultTupleSlot,
+						   InvalidOid,
 						   true);
 		setopstate->grp_firstTuple = NULL;	/* don't keep two pointers */
 
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 70c03e0f60..c44147c618 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -870,7 +870,6 @@ SPI_modifytuple(Relation rel, HeapTuple tuple, int natts, int *attnum,
 		 */
 		mtuple->t_data->t_ctid = tuple->t_data->t_ctid;
 		mtuple->t_self = tuple->t_self;
-		mtuple->t_tableOid = tuple->t_tableOid;
 	}
 	else
 	{
diff --git a/src/backend/executor/tqueue.c b/src/backend/executor/tqueue.c
index 6e2eaa5dcf..3ebf1e347e 100644
--- a/src/backend/executor/tqueue.c
+++ b/src/backend/executor/tqueue.c
@@ -208,7 +208,6 @@ TupleQueueReaderNext(TupleQueueReader *reader, bool nowait, bool *done)
 	 * (which had better be sufficiently aligned).
 	 */
 	ItemPointerSetInvalid(&htup.t_self);
-	htup.t_tableOid = InvalidOid;
 	htup.t_len = nbytes;
 	htup.t_data = data;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index eec3a22842..f26dac0f70 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -940,12 +940,6 @@ DecodeMultiInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			/* not a disk based tuple */
 			ItemPointerSetInvalid(&tuple->tuple.t_self);
 
-			/*
-			 * We can only figure this out after reassembling the
-			 * transactions.
-			 */
-			tuple->tuple.t_tableOid = InvalidOid;
-
 			tuple->tuple.t_len = datalen + SizeofHeapTupleHeader;
 
 			memset(header, 0, SizeofHeapTupleHeader);
@@ -1033,9 +1027,6 @@ DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tuple)
 	/* not a disk based tuple */
 	ItemPointerSetInvalid(&tuple->tuple.t_self);
 
-	/* we can only figure this out after reassembling the transactions */
-	tuple->tuple.t_tableOid = InvalidOid;
-
 	/* data is not stored aligned, copy to aligned storage */
 	memcpy((char *) &xlhdr,
 		   data,
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 2b486b5e9f..e95a5bbb3d 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3486,7 +3486,7 @@ UpdateLogicalMappings(HTAB *tuplecid_data, Oid relid, Snapshot snapshot)
 bool
 ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data,
 							  Snapshot snapshot,
-							  HeapTuple htup, Buffer buffer,
+							  HeapTuple htup, Oid relid, Buffer buffer,
 							  CommandId *cmin, CommandId *cmax)
 {
 	ReorderBufferTupleCidKey key;
@@ -3528,7 +3528,7 @@ restart:
 	 */
 	if (ent == NULL && !updated_mapping)
 	{
-		UpdateLogicalMappings(tuplecid_data, htup->t_tableOid, snapshot);
+		UpdateLogicalMappings(tuplecid_data, relid, snapshot);
 		/* now check but don't update for a mapping again */
 		updated_mapping = true;
 		goto restart;
diff --git a/src/backend/utils/adt/expandedrecord.c b/src/backend/utils/adt/expandedrecord.c
index 9971abd71f..a49cf9b467 100644
--- a/src/backend/utils/adt/expandedrecord.c
+++ b/src/backend/utils/adt/expandedrecord.c
@@ -610,7 +610,6 @@ make_expanded_record_from_datum(Datum recorddatum, MemoryContext parentcontext)
 
 	tmptup.t_len = HeapTupleHeaderGetDatumLength(tuphdr);
 	ItemPointerSetInvalid(&(tmptup.t_self));
-	tmptup.t_tableOid = InvalidOid;
 	tmptup.t_data = tuphdr;
 
 	oldcxt = MemoryContextSwitchTo(objcxt);
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index dd88c09e6d..314777909d 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -3147,7 +3147,6 @@ populate_record(TupleDesc tupdesc,
 		/* Build a temporary HeapTuple control structure */
 		tuple.t_len = HeapTupleHeaderGetDatumLength(defaultval);
 		ItemPointerSetInvalid(&(tuple.t_self));
-		tuple.t_tableOid = InvalidOid;
 		tuple.t_data = defaultval;
 
 		/* Break down the tuple into fields */
@@ -3546,7 +3545,6 @@ populate_recordset_record(PopulateRecordsetState *state, JsObject *obj)
 	/* ok, save into tuplestore */
 	tuple.t_len = HeapTupleHeaderGetDatumLength(tuphead);
 	ItemPointerSetInvalid(&(tuple.t_self));
-	tuple.t_tableOid = InvalidOid;
 	tuple.t_data = tuphead;
 
 	tuplestore_puttuple(state->tuple_store, &tuple);
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index 5bbf568610..7f1adce08a 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -324,7 +324,6 @@ record_out(PG_FUNCTION_ARGS)
 	/* Build a temporary HeapTuple control structure */
 	tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 	ItemPointerSetInvalid(&(tuple.t_self));
-	tuple.t_tableOid = InvalidOid;
 	tuple.t_data = rec;
 
 	/*
@@ -671,7 +670,6 @@ record_send(PG_FUNCTION_ARGS)
 	/* Build a temporary HeapTuple control structure */
 	tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 	ItemPointerSetInvalid(&(tuple.t_self));
-	tuple.t_tableOid = InvalidOid;
 	tuple.t_data = rec;
 
 	/*
@@ -821,11 +819,9 @@ record_cmp(FunctionCallInfo fcinfo)
 	/* Build temporary HeapTuple control structures */
 	tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
 	ItemPointerSetInvalid(&(tuple1.t_self));
-	tuple1.t_tableOid = InvalidOid;
 	tuple1.t_data = record1;
 	tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
 	ItemPointerSetInvalid(&(tuple2.t_self));
-	tuple2.t_tableOid = InvalidOid;
 	tuple2.t_data = record2;
 
 	/*
@@ -1063,11 +1059,9 @@ record_eq(PG_FUNCTION_ARGS)
 	/* Build temporary HeapTuple control structures */
 	tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
 	ItemPointerSetInvalid(&(tuple1.t_self));
-	tuple1.t_tableOid = InvalidOid;
 	tuple1.t_data = record1;
 	tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
 	ItemPointerSetInvalid(&(tuple2.t_self));
-	tuple2.t_tableOid = InvalidOid;
 	tuple2.t_data = record2;
 
 	/*
@@ -1326,11 +1320,9 @@ record_image_cmp(FunctionCallInfo fcinfo)
 	/* Build temporary HeapTuple control structures */
 	tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
 	ItemPointerSetInvalid(&(tuple1.t_self));
-	tuple1.t_tableOid = InvalidOid;
 	tuple1.t_data = record1;
 	tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
 	ItemPointerSetInvalid(&(tuple2.t_self));
-	tuple2.t_tableOid = InvalidOid;
 	tuple2.t_data = record2;
 
 	/*
@@ -1570,11 +1562,9 @@ record_image_eq(PG_FUNCTION_ARGS)
 	/* Build temporary HeapTuple control structures */
 	tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
 	ItemPointerSetInvalid(&(tuple1.t_self));
-	tuple1.t_tableOid = InvalidOid;
 	tuple1.t_data = record1;
 	tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
 	ItemPointerSetInvalid(&(tuple2.t_self));
-	tuple2.t_tableOid = InvalidOid;
 	tuple2.t_data = record2;
 
 	/*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 78dd5714fa..3d93f2b84b 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1832,7 +1832,6 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 								MAXIMUM_ALIGNOF + dtp->t_len);
 		ct->tuple.t_len = dtp->t_len;
 		ct->tuple.t_self = dtp->t_self;
-		ct->tuple.t_tableOid = dtp->t_tableOid;
 		ct->tuple.t_data = (HeapTupleHeader)
 			MAXALIGN(((char *) ct) + sizeof(CatCTup));
 		/* copy tuple contents */
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 60b96df8f9..a3ed15214a 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -3792,11 +3792,11 @@ comparetup_cluster(const SortTuple *a, const SortTuple *b,
 
 		ecxt_scantuple = GetPerTupleExprContext(state->estate)->ecxt_scantuple;
 
-		ExecStoreHeapTuple(ltup, ecxt_scantuple, false);
+		ExecStoreHeapTuple(ltup, ecxt_scantuple, InvalidOid, false);
 		FormIndexDatum(state->indexInfo, ecxt_scantuple, state->estate,
 					   l_index_values, l_index_isnull);
 
-		ExecStoreHeapTuple(rtup, ecxt_scantuple, false);
+		ExecStoreHeapTuple(rtup, ecxt_scantuple, InvalidOid, false);
 		FormIndexDatum(state->indexInfo, ecxt_scantuple, state->estate,
 					   r_index_values, r_index_isnull);
 
@@ -3926,8 +3926,7 @@ readtup_cluster(Tuplesortstate *state, SortTuple *stup,
 	tuple->t_len = t_len;
 	LogicalTapeReadExact(state->tapeset, tapenum,
 						 &tuple->t_self, sizeof(ItemPointerData));
-	/* We don't currently bother to reconstruct t_tableOid */
-	tuple->t_tableOid = InvalidOid;
+
 	/* Read in the tuple body */
 	LogicalTapeReadExact(state->tapeset, tapenum,
 						 tuple->t_data, tuple->t_len);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8401dac483..a290e4f053 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -211,7 +211,7 @@ extern void heap_vacuum_rel(Relation onerel, int options,
 				struct VacuumParams *params, BufferAccessStrategy bstrategy);
 
 /* in heap/heapam_visibility.c */
-extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
+extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Oid relid, Snapshot snapshot,
 										 Buffer buffer);
 extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTuple stup, CommandId curcid,
 						 Buffer buffer);
@@ -231,6 +231,7 @@ struct HTAB;
 extern bool ResolveCminCmaxDuringDecoding(struct HTAB *tuplecid_data,
 							  Snapshot snapshot,
 							  HeapTuple htup,
+							  Oid relid,
 							  Buffer buffer,
 							  CommandId *cmin, CommandId *cmax);
 
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index b0561ebe29..cb09ccb859 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -304,9 +304,8 @@ extern TupleTableSlot *MakeSingleTupleTableSlot(TupleDesc tupdesc,
 extern void ExecDropSingleTupleTableSlot(TupleTableSlot *slot);
 extern void ExecSetSlotDescriptor(TupleTableSlot *slot, TupleDesc tupdesc);
 extern TupleTableSlot *ExecStoreHeapTuple(HeapTuple tuple,
-				   TupleTableSlot *slot,
-				   bool shouldFree);
-extern void ExecForceStoreHeapTuple(HeapTuple tuple, TupleTableSlot *slot);
+				   TupleTableSlot *slot, Oid relid, bool shouldFree);
+extern void ExecForceStoreHeapTuple(HeapTuple tuple, TupleTableSlot *slot, Oid relid);
 extern TupleTableSlot *ExecStoreBufferHeapTuple(HeapTuple tuple,
 						 TupleTableSlot *slot,
 						 Buffer buffer);
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index a5aafa8c09..45ebd5814d 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -7235,7 +7235,6 @@ deconstruct_composite_datum(Datum value, HeapTupleData *tmptup)
 	/* Build a temporary HeapTuple control structure */
 	tmptup->t_len = HeapTupleHeaderGetDatumLength(td);
 	ItemPointerSetInvalid(&(tmptup->t_self));
-	tmptup->t_tableOid = InvalidOid;
 	tmptup->t_data = td;
 
 	/* Extract rowtype info and find a tupdesc */
@@ -7404,7 +7403,6 @@ exec_move_row_from_datum(PLpgSQL_execstate *estate,
 		/* Build a temporary HeapTuple control structure */
 		tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
 		ItemPointerSetInvalid(&(tmptup.t_self));
-		tmptup.t_tableOid = InvalidOid;
 		tmptup.t_data = td;
 
 		/* Extract rowtype info */
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index ad3e803899..9e60e88242 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -528,7 +528,6 @@ make_tuple_indirect(PG_FUNCTION_ARGS)
 	/* Build a temporary HeapTuple control structure */
 	tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 	ItemPointerSetInvalid(&(tuple.t_self));
-	tuple.t_tableOid = InvalidOid;
 	tuple.t_data = rec;
 
 	values = (Datum *) palloc(ncolumns * sizeof(Datum));
-- 
2.20.1.windows.1
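
The common thread in the hunks above: code no longer maintains
t_tableOid on individual HeapTuples. Instead the caller of
ExecStoreHeapTuple()/ExecForceStoreHeapTuple() supplies the owning
table's OID for the slot's tts_tableOid, and ExecFetchSlotHeapTuple()
copies it back onto the returned tuple. A minimal sketch of the revised
calling convention (variable names are illustrative):

    #include "executor/tuptable.h"
    #include "utils/rel.h"

    static void
    store_examples(Relation rel, HeapTuple tuple, TupleTableSlot *slot)
    {
        /* new third argument: the OID of the table the tuple belongs to */
        ExecStoreHeapTuple(tuple, slot, RelationGetRelid(rel), false);

        /* ... or InvalidOid for tuples not backed by a table, e.g. the
         * composite-datum case in ExecuteCallStmt() above */
        ExecStoreHeapTuple(tuple, slot, InvalidOid, false);
    }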

0003-Docs-of-default_table_access_method-GUC.patch
From 56f8c89f0770561088e1af6d48ec1762e69cb397 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 15:59:48 +1100
Subject: [PATCH 03/10] Docs of default_table_access_method GUC

This GUC sets the default table access method to use for
tables that are created without specifying their own
access method.
---
 doc/src/sgml/config.sgml | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6d42b7afe7..17a8871c51 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7214,6 +7214,30 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-default-table-access-method" xreflabel="default_table_access_method">
+      <term><varname>default_table_access_method</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>default_table_access_method</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        This parameter specifies the default table access method to use when creating
+        tables or materialized views if the <command>CREATE</command> command does
+        not explicitly specify an access method.
+       </para>
+
+       <para>
+        The value is either the name of a table access method, or an empty string
+        to specify using the default table access method of the current database.
+        If the value does not match the name of any existing table access method,
+        <productname>PostgreSQL</productname> will automatically use the default
+        table access method of the current database.
+       </para>
+
+      </listitem>
+     </varlistentry>
+     
      <varlistentry id="guc-default-tablespace" xreflabel="default_tablespace">
       <term><varname>default_tablespace</varname> (<type>string</type>)
       <indexterm>
-- 
2.20.1.windows.1
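
As a usage sketch ('heap' is the built-in method; any other name would
first have to be registered via CREATE ACCESS METHOD ... TYPE TABLE):

    -- subsequent CREATE TABLE without USING picks this method
    SET default_table_access_method = 'heap';
    CREATE TABLE t1 (a int);    -- stored with the heap access method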

0004-Rename-indexam.sgml-to-am.sgml.patch
From 063ac32355d1e112083e272527ba918f2088bad8 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 16:13:56 +1100
Subject: [PATCH 04/10] Rename indexam.sgml to am.sgml

It is just a rename plus the necessary reference updates.
---
 doc/src/sgml/{indexam.sgml => am.sgml} | 2 +-
 doc/src/sgml/filelist.sgml             | 2 +-
 doc/src/sgml/postgres.sgml             | 2 +-
 doc/src/sgml/xindex.sgml               | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)
 rename doc/src/sgml/{indexam.sgml => am.sgml} (99%)

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/am.sgml
similarity index 99%
rename from doc/src/sgml/indexam.sgml
rename to doc/src/sgml/am.sgml
index 05102724ea..a9f0838ee5 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/am.sgml
@@ -1,4 +1,4 @@
-<!-- doc/src/sgml/indexam.sgml -->
+<!-- doc/src/sgml/am.sgml -->
 
 <chapter id="indexam">
  <title>Index Access Method Interface Definition</title>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index a03ea1427b..52a5efca94 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -89,7 +89,7 @@
 <!ENTITY gin        SYSTEM "gin.sgml">
 <!ENTITY brin       SYSTEM "brin.sgml">
 <!ENTITY planstats    SYSTEM "planstats.sgml">
-<!ENTITY indexam    SYSTEM "indexam.sgml">
+<!ENTITY am         SYSTEM "am.sgml">
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 96d196d229..9dce0c5f81 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -250,7 +250,7 @@
   &tablesample-method;
   &custom-scan;
   &geqo;
-  &indexam;
+  &am;
   &generic-wal;
   &btree;
   &gist;
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index 9446f8b836..4fa821160c 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -36,7 +36,7 @@
    described in <classname>pg_am</classname>.  It is possible to add a
    new index access method by writing the necessary code and
    then creating an entry in <classname>pg_am</classname> &mdash; but that is
-   beyond the scope of this chapter (see <xref linkend="indexam"/>).
+   beyond the scope of this chapter (see <xref linkend="am"/>).
   </para>
 
   <para>
-- 
2.20.1.windows.1

0002-Removal-of-scan_update_snapshot-callback.patch
From 80a3c95429e081d5df636b5876e050f53ea7b9e4 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Tue, 22 Jan 2019 11:29:20 +1100
Subject: [PATCH 02/10] Removal of scan_update_snapshot callback

The snapshot is available in the TableScanDesc structure
itself, so it can be accessed generically from outside the
AM; no callback is needed.
---
 src/backend/access/heap/heapam.c         | 18 ------------------
 src/backend/access/heap/heapam_handler.c |  1 -
 src/backend/access/table/tableam.c       | 13 +++++++++++++
 src/include/access/heapam.h              |  1 -
 src/include/access/tableam.h             | 16 ++++++----------
 5 files changed, 19 insertions(+), 30 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c1e4d07864..16a3a378eb 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1252,24 +1252,6 @@ heap_endscan(TableScanDesc sscan)
 	pfree(scan);
 }
 
-/* ----------------
- *		heap_update_snapshot
- *
- *		Update snapshot info in heap scan descriptor.
- * ----------------
- */
-void
-heap_update_snapshot(TableScanDesc sscan, Snapshot snapshot)
-{
-	HeapScanDesc scan = (HeapScanDesc) sscan;
-
-	Assert(IsMVCCSnapshot(snapshot));
-
-	RegisterSnapshot(snapshot);
-	scan->rs_scan.rs_snapshot = snapshot;
-	scan->rs_scan.rs_temp_snap = true;
-}
-
 /* ----------------
  *		heap_getnext	- retrieve next tuple in scan
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index f2719bb017..eab6a107a6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2307,7 +2307,6 @@ static const TableAmRoutine heapam_methods = {
 	.scan_begin = heap_beginscan,
 	.scan_end = heap_endscan,
 	.scan_rescan = heap_rescan,
-	.scan_update_snapshot = heap_update_snapshot,
 	.scan_getnextslot = heap_getnextslot,
 
 	.parallelscan_estimate = table_block_parallelscan_estimate,
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 43e5444dcb..9b17cf1cd9 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -395,3 +395,16 @@ table_block_parallelscan_nextpage(Relation rel, ParallelBlockTableScanDesc pbsca
 
 	return page;
 }
+
+/*
+ * Update snapshot info in table scan descriptor.
+ */
+void
+table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot)
+{
+	Assert(IsMVCCSnapshot(snapshot));
+
+	RegisterSnapshot(snapshot);
+	scan->rs_snapshot = snapshot;
+	scan->rs_temp_snap = true;
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a290e4f053..0ae7923c95 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -182,7 +182,6 @@ extern void simple_heap_update(Relation relation, ItemPointer otid,
 				   HeapTuple tup);
 
 extern void heap_sync(Relation relation);
-extern void heap_update_snapshot(TableScanDesc scan, Snapshot snapshot);
 
 extern TransactionId heap_compute_xid_horizon_for_tuples(Relation rel,
 														 ItemPointerData *items,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index d0240d46f7..054f2102c8 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -90,7 +90,6 @@ typedef struct TableAmRoutine
 	void		(*scan_end) (TableScanDesc scan);
 	void		(*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key, bool set_params,
 								bool allow_strat, bool allow_sync, bool allow_pagemode);
-	void		(*scan_update_snapshot) (TableScanDesc scan, Snapshot snapshot);
 	TupleTableSlot *(*scan_getnextslot) (TableScanDesc scan,
 										 ScanDirection direction, TupleTableSlot *slot);
 
@@ -390,15 +389,6 @@ table_rescan_set_params(TableScanDesc scan, struct ScanKeyData *key,
 										 allow_pagemode);
 }
 
-/*
- * Update snapshot info in heap scan descriptor.
- */
-static inline void
-table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot)
-{
-	scan->rs_rd->rd_tableam->scan_update_snapshot(scan, snapshot);
-}
-
 static inline TupleTableSlot *
 table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot)
 {
@@ -800,6 +790,12 @@ extern BlockNumber table_block_parallelscan_nextpage(Relation rel, ParallelBlock
 extern void table_block_parallelscan_startblock_init(Relation rel, ParallelBlockTableScanDesc pbscan);
 
 
+/* ----------------------------------------------------------------------------
+ * Helper function to update the snapshot of the scan descriptor
+ * ----------------------------------------------------------------------------
+ */
+extern void table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot);
+
 /* ----------------------------------------------------------------------------
  * Functions in tableamapi.c
  * ----------------------------------------------------------------------------
-- 
2.20.1.windows.1
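
Callers are unaffected by the removal: table_scan_update_snapshot()
keeps its signature and now works for any AM, since rs_snapshot and
rs_temp_snap live in the AM-independent TableScanDesc. A hypothetical
call site:

    /* e.g. after restoring a snapshot in a parallel worker */
    table_scan_update_snapshot(scan, snapshot);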

0005-Reorganize-am-as-both-table-and-index.patch
From 739209fc4930847229b96acd279aa3dfe556b44f Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 16:21:40 +1100
Subject: [PATCH 05/10] Reorganize am as both table and index

There is not much table access methods info.
---
 doc/src/sgml/am.sgml | 61 +++++++++++++++++++++++++++++---------------
 1 file changed, 41 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/am.sgml b/doc/src/sgml/am.sgml
index a9f0838ee5..579187ed1b 100644
--- a/doc/src/sgml/am.sgml
+++ b/doc/src/sgml/am.sgml
@@ -1,16 +1,34 @@
 <!-- doc/src/sgml/am.sgml -->
 
-<chapter id="indexam">
- <title>Index Access Method Interface Definition</title>
+<chapter id="am">
+ <title>Access Method Interface Definition</title>
 
   <para>
    This chapter defines the interface between the core
-   <productname>PostgreSQL</productname> system and <firstterm>index access
-   methods</firstterm>, which manage individual index types.  The core system
-   knows nothing about indexes beyond what is specified here, so it is
-   possible to develop entirely new index types by writing add-on code.
-  </para>
-
+   <productname>PostgreSQL</productname> system and <firstterm>access
+   methods</firstterm>, which manage individual <literal>INDEX</literal> 
+   and <literal>TABLE</literal> types.  The core system knows nothing
+   about these access methods beyond what is specified here, so it is
+   possible to develop entirely new access method types by writing add-on code.
+  </para>
+ 
+ <sect1 id="table-access-methods">
+  <title>Overview of Table access methods</title>
+  
+  <para>
+   All tables in <productname>PostgreSQL</productname> are the primary
+   data store. Each table is stored as its own physical <firstterm>relation</firstterm>
+   and so is described by an entry in the <structname>pg_class</structname>
+   catalog. The contents of a table are entirely under the control of its
+   access method. (All the access methods furthermore use the standard page
+   layout described in <xref linkend="storage-page-layout"/>.)
+  </para>
+
+ </sect1> 
+ 
+ <sect1 id="index-access-methods">
+  <title>Overview of Index access methods</title>
+  
   <para>
    All indexes in <productname>PostgreSQL</productname> are what are known
    technically as <firstterm>secondary indexes</firstterm>; that is, the index is
@@ -42,8 +60,8 @@
    dead tuples are reclaimed (by vacuuming) when the dead tuples themselves
    are reclaimed.
   </para>
-
- <sect1 id="index-api">
+  
+ <sect2 id="index-api">
   <title>Basic API Structure for Indexes</title>
 
   <para>
@@ -217,9 +235,9 @@ typedef struct IndexAmRoutine
    conditions.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-functions">
+ <sect2 id="index-functions">
   <title>Index Access Method Functions</title>
 
   <para>
@@ -710,9 +728,11 @@ amparallelrescan (IndexScanDesc scan);
    the beginning.
   </para>
 
- </sect1>
+ </sect2>
+ 
+ 
 
- <sect1 id="index-scanning">
+ <sect2 id="index-scanning">
   <title>Index Scanning</title>
 
   <para>
@@ -865,9 +885,9 @@ amparallelrescan (IndexScanDesc scan);
    if its internal implementation is unsuited to one API or the other.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-locking">
+ <sect2 id="index-locking">
   <title>Index Locking Considerations</title>
 
   <para>
@@ -979,9 +999,9 @@ amparallelrescan (IndexScanDesc scan);
    reduce the frequency of such transaction cancellations.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-unique-checks">
+ <sect2 id="index-unique-checks">
   <title>Index Uniqueness Checks</title>
 
   <para>
@@ -1128,9 +1148,9 @@ amparallelrescan (IndexScanDesc scan);
     </itemizedlist>
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-cost-estimation">
+ <sect2 id="index-cost-estimation">
   <title>Index Cost Estimation Functions</title>
 
   <para>
@@ -1377,5 +1397,6 @@ cost_qual_eval(&amp;index_qual_cost, path-&gt;indexquals, root);
    Examples of cost estimator functions can be found in
    <filename>src/backend/utils/adt/selfuncs.c</filename>.
   </para>
+ </sect2>
  </sect1>
 </chapter>
-- 
2.20.1.windows.1

0006-Doc-update-of-Create-access-method-type-table.patch
From cbafa7c383c994d29ca38fd2f2cb9cfa8606e1ee Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 16:22:42 +1100
Subject: [PATCH 06/10] Doc update of Create access method type table

Create access method has added the support to create
table access methods also.
---
 doc/src/sgml/catalogs.sgml                 |  4 ++--
 doc/src/sgml/ref/create_access_method.sgml | 12 ++++++++----
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 0fd792ff1a..21deba139c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -587,8 +587,8 @@
    The catalog <structname>pg_am</structname> stores information about
    relation access methods.  There is one row for each access method supported
    by the system.
-   Currently, only indexes have access methods.  The requirements for index
-   access methods are discussed in detail in <xref linkend="indexam"/>.
+   Currently, only tables, indexes, and materialized views have access methods.
+   The requirements for access methods are discussed in detail in <xref linkend="am"/>.
   </para>
 
   <table>
diff --git a/doc/src/sgml/ref/create_access_method.sgml b/doc/src/sgml/ref/create_access_method.sgml
index 851c5e63be..256914022a 100644
--- a/doc/src/sgml/ref/create_access_method.sgml
+++ b/doc/src/sgml/ref/create_access_method.sgml
@@ -61,7 +61,8 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
     <listitem>
      <para>
       This clause specifies the type of access method to define.
-      Only <literal>INDEX</literal> is supported at present.
+      Only <literal>INDEX</literal> and <literal>TABLE</literal>
+      are supported at present.
      </para>
     </listitem>
    </varlistentry>
@@ -76,9 +77,12 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
       declared to take a single argument of type <type>internal</type>,
       and its return type depends on the type of access method;
       for <literal>INDEX</literal> access methods, it must
-      be <type>index_am_handler</type>.  The C-level API that the handler
-      function must implement varies depending on the type of access method.
-      The index access method API is described in <xref linkend="indexam"/>.
+      be <type>index_am_handler</type> and for <literal>TABLE</literal>
+      access methods, it must be <type>table_am_handler</type>.
+      The C-level API that the handler function must implement varies
+      depending on the type of access method. The index access method API
+      is described in <xref linkend="index-access-methods"/> and the table access method
+      API is described in <xref linkend="table-access-methods"/>.
      </para>
     </listitem>
    </varlistentry>
-- 
2.20.1.windows.1
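
Putting both pieces together, registering a table access method would
look roughly like this (extension and handler names are hypothetical):

    CREATE FUNCTION my_tableam_handler(internal)
        RETURNS table_am_handler
        AS 'my_extension', 'my_tableam_handler'
        LANGUAGE C STRICT;

    CREATE ACCESS METHOD myam TYPE TABLE HANDLER my_tableam_handler;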

0009-Doc-of-CREATE-TABLE-AS-.-USING-syntax.patch
From ee88a862e62eea723df9ebd7ff1f9f11138a8171 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 16:27:20 +1100
Subject: [PATCH 09/10] Doc of CREATE TABLE AS ... USING syntax

CREATE TABLE AS has added USING syntax to specify table
access method during the table creation
---
 doc/src/sgml/ref/create_table_as.sgml | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 679e8f521e..90c9dbdaa5 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
+    [ USING <replaceable class="parameter">method</replaceable> ]
     [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
     [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -120,6 +121,19 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This optional clause specifies the table access method to use for the
+      new table; see <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, the default table access method is
+      chosen for the new table.  See <xref linkend="guc-default-table-access-method"/>
+      for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
-- 
2.20.1.windows.1

0007-Doc-update-of-create-materialized-view-.-USING-synta.patch
From c16239811b2bf62a913f6e14e89ee42b154df607 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 16:24:24 +1100
Subject: [PATCH 07/10] Doc update of create materialized view ... USING syntax

CREATE MATERIALIZED VIEW has added the support of USING
syntax to specify its own table access method.
---
 doc/src/sgml/ref/create_materialized_view.sgml | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/doc/src/sgml/ref/create_materialized_view.sgml b/doc/src/sgml/ref/create_materialized_view.sgml
index 7f31ab4d26..3a052ee6a4 100644
--- a/doc/src/sgml/ref/create_materialized_view.sgml
+++ b/doc/src/sgml/ref/create_materialized_view.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
+    [ USING <replaceable class="parameter">method</replaceable> ]
     [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
     AS <replaceable>query</replaceable>
@@ -85,6 +86,19 @@ CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This optional clause specifies the table access method to use for the
+      new materialized view; see <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, the default table access method is
+      chosen for the new materialized view.  See <xref linkend="guc-default-table-access-method"/>
+      for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
-- 
2.20.1.windows.1

0008-Doc-update-of-CREATE-TABLE-.-USING-syntax.patch
From 9188646bdf7c5bb8ab5d48c4e2d0a3f742812223 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 16:25:49 +1100
Subject: [PATCH 08/10] Doc update of CREATE TABLE ... USING syntax

CREATE TABLE has added the support of USING syntax
to specify the table access method.
---
 doc/src/sgml/ref/create_table.sgml | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 22dbc07b23..e605e0a1c4 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -29,6 +29,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 ] )
 [ INHERITS ( <replaceable>parent_table</replaceable> [, ... ] ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -40,6 +41,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -51,6 +53,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ] { FOR VALUES <replaceable class="parameter">partition_bound_spec</replaceable> | DEFAULT }
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -954,7 +957,7 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
 
      <para>
       The access method must support <literal>amgettuple</literal> (see <xref
-      linkend="indexam"/>); at present this means <acronym>GIN</acronym>
+      linkend="index-access-methods"/>); at present this means <acronym>GIN</acronym>
       cannot be used.  Although it's allowed, there is little point in using
       B-tree or hash indexes with an exclusion constraint, because this
       does nothing that an ordinary unique constraint doesn't do better.
@@ -1137,6 +1140,19 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This optional clause specifies the table access method to use for the
+      new table; see <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, the default table access method is
+      chosen for the new table.  See <xref linkend="guc-default-table-access-method"/>
+      for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
-- 
2.20.1.windows.1
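
Taken together, the three USING additions let the access method be
pinned per object; assuming an access method named myam exists:

    CREATE TABLE t (a int) USING myam;
    CREATE TABLE t2 USING myam AS SELECT * FROM t;
    CREATE MATERIALIZED VIEW mv USING myam AS SELECT * FROM t;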

#114Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#105)
18 attachment(s)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-03-05 23:07:21 -0800, Andres Freund wrote:

> My next steps are:
> - final polish & push the basic DDL and pg_dump patches
> - cleanup & polish the ON CONFLICT refactoring

Those are pushed.

> - cleanup & polish the patch adding the tableam based scan
> interface. That's by far the largest patch in the series. I might try
> to split it up further, but it's probably not worth it.

I decided not to split it up further, and even merged two small commits
into it. Subdividing it cleanly would have required making some changes
just to undo them in a subsequent patch.

> - improve documentation for the individual callbacks (integrating
> work done by others on this thread), in the header

I've done that for the callbacks in the above commit.

Changes:
- I've added comments to all the callbacks in the first commit / the
scan commit
- I've renamed table_gimmegimmeslot to table_slot_create
- I've made the callbacks and their wrappers more consistently named
- I've added asserts for necessary callbacks in scan commit
- Lots of smaller cleanup
- Added a commit message

While 0001 is pretty bulky, the interesting bits concentrate on a
comparatively small area. I'd appreciate if somebody could give the
comments added in tableam.h a read (both on callbacks, and their
wrappers, as they have different audiences). It'd make sense to first
read the commit message, to understand the goal (and I'd obviously also
appreciate suggestions for improvements there as well).

I'm pretty happy with the current state of the scan patch. I plan to do
two more passes through it (formatting, comment polishing, etc.; I don't
know of any functional changes needed), and then commit it, unless
somebody objects.

Greetings,

Andres Freund

Attachments:

v18-0001-tableam-Add-and-use-scan-APIs.patch
From c22834aecb625ac02d1e21be718b4288326f6b37 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 8 Mar 2019 18:30:05 -0800
Subject: [PATCH v18 01/18] tableam: Add and use scan APIs.

To allow table accesses to not be directly dependent on heap, several
new abstractions are needed. Specifically:

1) Heap scans need to be generalized into table scans. Do this by
   introducing TableScanDesc, which will be the "base class" for
   individual AMs. This contains the AM independent fields from
   HeapScanDesc.

   The previous heap_{beginscan,rescan,endscan} et al have been
   replaced with a table_ version.

   There's no direct replacement for heap_getnext(), as that returned
   a HeapTuple, which is undesirable for other AMs. Instead there's
   table_scan_getnextslot(); a caller-side sketch follows this list.
   But note that heap_getnext() lives on,
   it's still used widely to access catalog tables.

   This is achieved by new scan_begin, scan_end, scan_rescan,
   scan_getnextslot callbacks.

2) The portion of parallel scans that's shared between backends needs
   to be set up without the user doing per-AM work. To achieve
   that new parallelscan_{estimate, initialize, reinitialize}
   callbacks are introduced, which operate on a new
   ParallelTableScanDesc, which again can be subclassed by AMs.

   As it is likely that several AMs are going to be block oriented,
   block oriented callbacks that can be shared between such AMs are
   provided and used by heap. table_block_parallelscan_{estimate,
   initialize, reinitialize} as callbacks, and
   table_block_parallelscan_{nextpage, init} for use in AMs. These
   operate on a ParallelBlockTableScanDesc.

3) Index scans need to be able to access tables to return a tuple, and
   there needs to be state across individual accesses to the heap to
   store state like buffers. That's now handled by introducing a
   sort-of-scan IndexFetchTable, which again is intended to be
   subclassed by individual AMs (for heap IndexFetchHeap).

   The relevant callbacks for an AM are index_fetch_{end, begin,
   reset} to create the necessary state, and index_fetch_tuple to
   retrieve an indexed tuple.  Note that index_fetch_tuple
   implementations need to be smarter than just blindly fetching the
   tuples for AMs that have optimizations similar to heap's HOT - the
   currently alive tuple in the update chain needs to be fetched if
   appropriate.

   Similar to table_scan_getnextslot(), it's undesirable to continue
   to return HeapTuples. Thus index_fetch_heap (might want to rename
   that later) now accepts a slot as an argument. Core code doesn't
   have a lot of call sites performing index scans without going
   through the systable_* API (in contrast to loads of heap_getnext
   calls and working directly with HeapTuples).

   Index scans now store the result of a search in
   IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
   target is not generally a HeapTuple anymore, that seems cleaner.
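
A rough caller-side sketch of 1), as referenced above (snapshot setup,
scan keys and error handling elided; names per this commit message,
argument lists indicative; the loop condition follows the
TupleTableSlot*-returning signature of table_scan_getnextslot()):

    TupleTableSlot *slot;
    TableScanDesc scan;

    slot = MakeSingleTupleTableSlot(RelationGetDescr(rel),
                                    table_slot_callbacks(rel));
    scan = table_beginscan(rel, snapshot, 0, NULL);

    while (table_scan_getnextslot(scan, ForwardScanDirection, slot) != NULL)
    {
        /* one tuple is available in the slot; no HeapTuple assumption */
    }

    table_endscan(scan);
    ExecDropSingleTupleTableSlot(slot);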

To be able to sensibly adapt code to use the above, two further
callbacks have been introduced:

a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
   slots capable of holding a tuple of the AM's
   type. table_slot_callbacks() and table_slot_create() are based
   upon that, but have additional logic to deal with views, foreign
   tables, etc.

   While this change could have been done separately, nearly all the
   call sites that needed to be adapted for the rest of this commit
   also would have been needed to be adapted for
   table_slot_callbacks(), making separation not worthwhile.

b) tuple_satisfies_snapshot checks whether the tuple in a slot is
   currently visible according to a snapshot. That's required as a few
   places now don't have a buffer + HeapTuple around, but only a slot
   (which in heap's case internally carries that information). A
   sketch follows this list.
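
For the latter, a sketch of a caller that only has a slot in hand
(slot_visible_to is illustrative and goes through rd_tableam directly;
any buffer locking the AM requires happens inside the callback):

    #include "postgres.h"

    #include "access/tableam.h"
    #include "utils/rel.h"

    static bool
    slot_visible_to(Relation rel, TupleTableSlot *slot, Snapshot snapshot)
    {
        return rel->rd_tableam->tuple_satisfies_snapshot(rel, slot,
                                                         snapshot);
    }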

Additionally a few infrastructure changes were needed:

I) SysScanDesc, as used by systable_{beginscan, getnext} et al, now
   internally uses a slot to keep track of tuples. While
   systable_getnext() still returns HeapTuples, and will do so for the
   foreseeable future, the index API (see 3) above) now only deals
   with slots.

The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
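
To illustrate the index path from 3) above, a slot-based index-scan
consumer now looks roughly like this (count_index_matches is a made-up
helper; the index_beginscan()/index_endscan() bracketing is omitted):

    #include "postgres.h"

    #include "access/genam.h"
    #include "access/tableam.h"
    #include "executor/tuptable.h"

    static uint64
    count_index_matches(IndexScanDesc iscan, Relation heaprel)
    {
        TupleTableSlot *slot = table_slot_create(heaprel, NULL);
        uint64      nmatches = 0;

        /*
         * index_getnext_slot() resolves each TID via the AM's
         * index_fetch_tuple callback, which also follows HOT-style
         * chains internally.
         */
        while (index_getnext_slot(iscan, ForwardScanDirection, slot))
            nmatches++;

        ExecDropSingleTupleTableSlot(slot);
        return nmatches;
    }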

Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
---
 contrib/amcheck/verify_nbtree.c            |  15 +-
 contrib/pgrowlocks/pgrowlocks.c            |  20 +-
 contrib/pgstattuple/pgstattuple.c          |  22 +-
 contrib/tsm_system_rows/tsm_system_rows.c  |  14 +-
 contrib/tsm_system_time/tsm_system_time.c  |   9 +-
 src/backend/access/gist/gistget.c          |   4 +-
 src/backend/access/hash/hashsearch.c       |   4 +-
 src/backend/access/heap/heapam.c           | 650 +++++++--------------
 src/backend/access/heap/heapam_handler.c   | 166 ++++++
 src/backend/access/index/genam.c           | 107 ++--
 src/backend/access/index/indexam.c         | 162 ++---
 src/backend/access/nbtree/nbtree.c         |   2 +-
 src/backend/access/nbtree/nbtsearch.c      |   6 +-
 src/backend/access/nbtree/nbtsort.c        |  20 +-
 src/backend/access/spgist/spgscan.c        |   2 +-
 src/backend/access/table/tableam.c         | 293 +++++++++-
 src/backend/access/table/tableamapi.c      |  26 +-
 src/backend/access/tablesample/system.c    |   7 +-
 src/backend/bootstrap/bootstrap.c          |  21 +-
 src/backend/catalog/aclchk.c               |  13 +-
 src/backend/catalog/index.c                | 112 ++--
 src/backend/catalog/pg_conversion.c        |   7 +-
 src/backend/catalog/pg_db_role_setting.c   |   7 +-
 src/backend/catalog/pg_publication.c       |   7 +-
 src/backend/catalog/pg_subscription.c      |   7 +-
 src/backend/commands/cluster.c             |  29 +-
 src/backend/commands/constraint.c          |  67 ++-
 src/backend/commands/copy.c                |   7 +-
 src/backend/commands/dbcommands.c          |  19 +-
 src/backend/commands/indexcmds.c           |   7 +-
 src/backend/commands/tablecmds.c           | 135 +++--
 src/backend/commands/tablespace.c          |  37 +-
 src/backend/commands/typecmds.c            |  29 +-
 src/backend/commands/vacuum.c              |  13 +-
 src/backend/executor/execCurrent.c         |   2 +-
 src/backend/executor/execIndexing.c        |  18 +-
 src/backend/executor/execMain.c            |   6 +-
 src/backend/executor/execPartition.c       |   8 +-
 src/backend/executor/execReplication.c     |  62 +-
 src/backend/executor/execUtils.c           |   7 +-
 src/backend/executor/nodeBitmapHeapscan.c  |  74 +--
 src/backend/executor/nodeIndexonlyscan.c   |  25 +-
 src/backend/executor/nodeIndexscan.c       |  32 +-
 src/backend/executor/nodeModifyTable.c     |  17 +-
 src/backend/executor/nodeSamplescan.c      |  86 +--
 src/backend/executor/nodeSeqscan.c         |  67 +--
 src/backend/executor/nodeTidscan.c         |   3 +-
 src/backend/partitioning/partbounds.c      |  18 +-
 src/backend/postmaster/autovacuum.c        |  17 +-
 src/backend/postmaster/pgstat.c            |   7 +-
 src/backend/replication/logical/launcher.c |   7 +-
 src/backend/replication/logical/worker.c   |  13 +-
 src/backend/rewrite/rewriteDefine.c        |  12 +-
 src/backend/utils/adt/ri_triggers.c        |  23 +-
 src/backend/utils/adt/selfuncs.c           |  17 +-
 src/backend/utils/init/postinit.c          |   7 +-
 src/include/access/genam.h                 |   5 +-
 src/include/access/heapam.h                |  90 ++-
 src/include/access/relscan.h               | 112 ++--
 src/include/access/tableam.h               | 470 ++++++++++++++-
 src/include/catalog/index.h                |   5 +-
 src/include/nodes/execnodes.h              |   2 +-
 src/tools/pgindent/typedefs.list           |  11 +-
 63 files changed, 2019 insertions(+), 1250 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 964200a7678..bb6442de82d 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -26,6 +26,7 @@
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/nbtree.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
 #include "catalog/index.h"
@@ -481,7 +482,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool readonly,
 	if (state->heapallindexed)
 	{
 		IndexInfo  *indexinfo = BuildIndexInfo(state->rel);
-		HeapScanDesc scan;
+		TableScanDesc scan;
 
 		/* Report on extra downlink checks performed in readonly case */
 		if (state->readonly)
@@ -500,12 +501,12 @@ bt_check_every_level(Relation rel, Relation heaprel, bool readonly,
 		 *
 		 * Note that IndexBuildHeapScan() calls heap_endscan() for us.
 		 */
-		scan = heap_beginscan_strat(state->heaprel, /* relation */
-									snapshot,	/* snapshot */
-									0,	/* number of keys */
-									NULL,	/* scan key */
-									true,	/* buffer access strategy OK */
-									true);	/* syncscan OK? */
+		scan = table_beginscan_strat(state->heaprel, /* relation */
+									 snapshot,	/* snapshot */
+									 0,	/* number of keys */
+									 NULL,	/* scan key */
+									 true,	/* buffer access strategy OK */
+									 true);	/* syncscan OK? */
 
 		/*
 		 * Scan will behave as the first scan of a CREATE INDEX CONCURRENTLY
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index df2ad7f2c9d..2d2a6cf1533 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -27,6 +27,7 @@
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -55,7 +56,7 @@ PG_FUNCTION_INFO_V1(pgrowlocks);
 typedef struct
 {
 	Relation	rel;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	int			ncolumns;
 } MyData;
 
@@ -70,7 +71,8 @@ Datum
 pgrowlocks(PG_FUNCTION_ARGS)
 {
 	FuncCallContext *funcctx;
-	HeapScanDesc scan;
+	TableScanDesc scan;
+	HeapScanDesc hscan;
 	HeapTuple	tuple;
 	TupleDesc	tupdesc;
 	AttInMetadata *attinmeta;
@@ -124,7 +126,8 @@ pgrowlocks(PG_FUNCTION_ARGS)
 			aclcheck_error(aclresult, get_relkind_objtype(rel->rd_rel->relkind),
 						   RelationGetRelationName(rel));
 
-		scan = heap_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+		scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+		hscan = (HeapScanDesc) scan;
 		mydata = palloc(sizeof(*mydata));
 		mydata->rel = rel;
 		mydata->scan = scan;
@@ -138,6 +141,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 	attinmeta = funcctx->attinmeta;
 	mydata = (MyData *) funcctx->user_fctx;
 	scan = mydata->scan;
+	hscan = (HeapScanDesc) scan;
 
 	/* scan the relation */
 	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
@@ -147,11 +151,11 @@ pgrowlocks(PG_FUNCTION_ARGS)
 		uint16		infomask;
 
 		/* must hold a buffer lock to call HeapTupleSatisfiesUpdate */
-		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+		LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
 
 		htsu = HeapTupleSatisfiesUpdate(tuple,
 										GetCurrentCommandId(false),
-										scan->rs_cbuf);
+										hscan->rs_cbuf);
 		xmax = HeapTupleHeaderGetRawXmax(tuple->t_data);
 		infomask = tuple->t_data->t_infomask;
 
@@ -284,7 +288,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 						 BackendXidGetPid(xmax));
 			}
 
-			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 
 			/* build a tuple */
 			tuple = BuildTupleFromCStrings(attinmeta, values);
@@ -301,11 +305,11 @@ pgrowlocks(PG_FUNCTION_ARGS)
 		}
 		else
 		{
-			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 		}
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
 	table_close(mydata->rel, AccessShareLock);
 
 	SRF_RETURN_DONE(funcctx);
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 2ac9863463b..7e1c3080006 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -29,6 +29,7 @@
 #include "access/heapam.h"
 #include "access/nbtree.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_am.h"
 #include "funcapi.h"
@@ -317,7 +318,8 @@ pgstat_relation(Relation rel, FunctionCallInfo fcinfo)
 static Datum
 pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 {
-	HeapScanDesc scan;
+	TableScanDesc scan;
+	HeapScanDesc hscan;
 	HeapTuple	tuple;
 	BlockNumber nblocks;
 	BlockNumber block = 0;		/* next block to count free space in */
@@ -327,10 +329,12 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 	SnapshotData SnapshotDirty;
 
 	/* Disable syncscan because we assume we scan from block zero upwards */
-	scan = heap_beginscan_strat(rel, SnapshotAny, 0, NULL, true, false);
+	scan = table_beginscan_strat(rel, SnapshotAny, 0, NULL, true, false);
+	hscan = (HeapScanDesc) scan;
+
 	InitDirtySnapshot(SnapshotDirty);
 
-	nblocks = scan->rs_nblocks; /* # blocks to be scanned */
+	nblocks = hscan->rs_nblocks; /* # blocks to be scanned */
 
 	/* scan the relation */
 	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
@@ -338,9 +342,9 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 		CHECK_FOR_INTERRUPTS();
 
 		/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
-		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+		LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-		if (HeapTupleSatisfiesVisibility(tuple, &SnapshotDirty, scan->rs_cbuf))
+		if (HeapTupleSatisfiesVisibility(tuple, &SnapshotDirty, hscan->rs_cbuf))
 		{
 			stat.tuple_len += tuple->t_len;
 			stat.tuple_count++;
@@ -351,7 +355,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 			stat.dead_tuple_count++;
 		}
 
-		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+		LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 
 		/*
 		 * To avoid physically reading the table twice, try to do the
@@ -366,7 +370,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 			CHECK_FOR_INTERRUPTS();
 
 			buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block,
-										RBM_NORMAL, scan->rs_strategy);
+										RBM_NORMAL, hscan->rs_strategy);
 			LockBuffer(buffer, BUFFER_LOCK_SHARE);
 			stat.free_space += PageGetHeapFreeSpace((Page) BufferGetPage(buffer));
 			UnlockReleaseBuffer(buffer);
@@ -379,14 +383,14 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 		CHECK_FOR_INTERRUPTS();
 
 		buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block,
-									RBM_NORMAL, scan->rs_strategy);
+									RBM_NORMAL, hscan->rs_strategy);
 		LockBuffer(buffer, BUFFER_LOCK_SHARE);
 		stat.free_space += PageGetHeapFreeSpace((Page) BufferGetPage(buffer));
 		UnlockReleaseBuffer(buffer);
 		block++;
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
 	relation_close(rel, AccessShareLock);
 
 	stat.table_len = (uint64) nblocks * BLCKSZ;
diff --git a/contrib/tsm_system_rows/tsm_system_rows.c b/contrib/tsm_system_rows/tsm_system_rows.c
index c92490f9389..1d35ea3c53a 100644
--- a/contrib/tsm_system_rows/tsm_system_rows.c
+++ b/contrib/tsm_system_rows/tsm_system_rows.c
@@ -209,7 +209,8 @@ static BlockNumber
 system_rows_nextsampleblock(SampleScanState *node)
 {
 	SystemRowsSamplerData *sampler = (SystemRowsSamplerData *) node->tsm_state;
-	HeapScanDesc scan = node->ss.ss_currentScanDesc;
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+	HeapScanDesc hscan = (HeapScanDesc) scan;
 
 	/* First call within scan? */
 	if (sampler->doneblocks == 0)
@@ -221,14 +222,14 @@ system_rows_nextsampleblock(SampleScanState *node)
 			SamplerRandomState randstate;
 
 			/* If relation is empty, there's nothing to scan */
-			if (scan->rs_nblocks == 0)
+			if (hscan->rs_nblocks == 0)
 				return InvalidBlockNumber;
 
 			/* We only need an RNG during this setup step */
 			sampler_random_init_state(sampler->seed, randstate);
 
 			/* Compute nblocks/firstblock/step only once per query */
-			sampler->nblocks = scan->rs_nblocks;
+			sampler->nblocks = hscan->rs_nblocks;
 
 			/* Choose random starting block within the relation */
 			/* (Actually this is the predecessor of the first block visited) */
@@ -258,7 +259,7 @@ system_rows_nextsampleblock(SampleScanState *node)
 	{
 		/* Advance lb, using uint64 arithmetic to forestall overflow */
 		sampler->lb = ((uint64) sampler->lb + sampler->step) % sampler->nblocks;
-	} while (sampler->lb >= scan->rs_nblocks);
+	} while (sampler->lb >= hscan->rs_nblocks);
 
 	return sampler->lb;
 }
@@ -278,7 +279,8 @@ system_rows_nextsampletuple(SampleScanState *node,
 							OffsetNumber maxoffset)
 {
 	SystemRowsSamplerData *sampler = (SystemRowsSamplerData *) node->tsm_state;
-	HeapScanDesc scan = node->ss.ss_currentScanDesc;
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+	HeapScanDesc hscan = (HeapScanDesc) scan;
 	OffsetNumber tupoffset = sampler->lt;
 
 	/* Quit if we've returned all needed tuples */
@@ -308,7 +310,7 @@ system_rows_nextsampletuple(SampleScanState *node,
 		}
 
 		/* Found a candidate? */
-		if (SampleOffsetVisible(tupoffset, scan))
+		if (SampleOffsetVisible(tupoffset, hscan))
 		{
 			sampler->donetuples++;
 			break;
diff --git a/contrib/tsm_system_time/tsm_system_time.c b/contrib/tsm_system_time/tsm_system_time.c
index edeacf0b539..1cc7264e084 100644
--- a/contrib/tsm_system_time/tsm_system_time.c
+++ b/contrib/tsm_system_time/tsm_system_time.c
@@ -216,7 +216,8 @@ static BlockNumber
 system_time_nextsampleblock(SampleScanState *node)
 {
 	SystemTimeSamplerData *sampler = (SystemTimeSamplerData *) node->tsm_state;
-	HeapScanDesc scan = node->ss.ss_currentScanDesc;
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+	HeapScanDesc hscan = (HeapScanDesc) scan;
 	instr_time	cur_time;
 
 	/* First call within scan? */
@@ -229,14 +230,14 @@ system_time_nextsampleblock(SampleScanState *node)
 			SamplerRandomState randstate;
 
 			/* If relation is empty, there's nothing to scan */
-			if (scan->rs_nblocks == 0)
+			if (hscan->rs_nblocks == 0)
 				return InvalidBlockNumber;
 
 			/* We only need an RNG during this setup step */
 			sampler_random_init_state(sampler->seed, randstate);
 
 			/* Compute nblocks/firstblock/step only once per query */
-			sampler->nblocks = scan->rs_nblocks;
+			sampler->nblocks = hscan->rs_nblocks;
 
 			/* Choose random starting block within the relation */
 			/* (Actually this is the predecessor of the first block visited) */
@@ -272,7 +273,7 @@ system_time_nextsampleblock(SampleScanState *node)
 	{
 		/* Advance lb, using uint64 arithmetic to forestall overflow */
 		sampler->lb = ((uint64) sampler->lb + sampler->step) % sampler->nblocks;
-	} while (sampler->lb >= scan->rs_nblocks);
+	} while (sampler->lb >= hscan->rs_nblocks);
 
 	return sampler->lb;
 }
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index a96ef5c3acc..b54d5991622 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -561,7 +561,7 @@ getNextNearest(IndexScanDesc scan)
 		if (GISTSearchItemIsHeap(*item))
 		{
 			/* found a heap item at currently minimal distance */
-			scan->xs_ctup.t_self = item->data.heap.heapPtr;
+			scan->xs_heaptid = item->data.heap.heapPtr;
 			scan->xs_recheck = item->data.heap.recheck;
 
 			index_store_float8_orderby_distances(scan, so->orderByTypes,
@@ -650,7 +650,7 @@ gistgettuple(IndexScanDesc scan, ScanDirection dir)
 							so->pageData[so->curPageData - 1].offnum;
 				}
 				/* continuing to return tuples from a leaf page */
-				scan->xs_ctup.t_self = so->pageData[so->curPageData].heapPtr;
+				scan->xs_heaptid = so->pageData[so->curPageData].heapPtr;
 				scan->xs_recheck = so->pageData[so->curPageData].recheck;
 
 				/* in an index-only scan, also return the reconstructed tuple */
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index ccd3fdceac0..61c90e6bb78 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -119,7 +119,7 @@ _hash_next(IndexScanDesc scan, ScanDirection dir)
 
 	/* OK, itemIndex says what to return */
 	currItem = &so->currPos.items[so->currPos.itemIndex];
-	scan->xs_ctup.t_self = currItem->heapTid;
+	scan->xs_heaptid = currItem->heapTid;
 
 	return true;
 }
@@ -432,7 +432,7 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	/* OK, itemIndex says what to return */
 	currItem = &so->currPos.items[so->currPos.itemIndex];
-	scan->xs_ctup.t_self = currItem->heapTid;
+	scan->xs_heaptid = currItem->heapTid;
 
 	/* if we're here, _hash_readpage found a valid tuples */
 	return true;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dc3499349b6..d6e32d6ce21 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -40,6 +40,7 @@
 #include "access/multixact.h"
 #include "access/parallel.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "access/sysattr.h"
 #include "access/transam.h"
 #include "access/tuptoaster.h"
@@ -68,22 +69,6 @@
 #include "utils/snapmgr.h"
 
 
-/* GUC variable */
-bool		synchronize_seqscans = true;
-
-
-static HeapScanDesc heap_beginscan_internal(Relation relation,
-						Snapshot snapshot,
-						int nkeys, ScanKey key,
-						ParallelHeapScanDesc parallel_scan,
-						bool allow_strat,
-						bool allow_sync,
-						bool allow_pagemode,
-						bool is_bitmapscan,
-						bool is_samplescan,
-						bool temp_snap);
-static void heap_parallelscan_startblock_init(HeapScanDesc scan);
-static BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 					TransactionId xid, CommandId cid, int options);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
@@ -207,6 +192,7 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
 static void
 initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 {
+	ParallelBlockTableScanDesc bpscan = NULL;
 	bool		allow_strat;
 	bool		allow_sync;
 
@@ -221,10 +207,13 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	 * results for a non-MVCC snapshot, the caller must hold some higher-level
 	 * lock that ensures the interesting tuple(s) won't change.)
 	 */
-	if (scan->rs_parallel != NULL)
-		scan->rs_nblocks = scan->rs_parallel->phs_nblocks;
+	if (scan->rs_base.rs_parallel != NULL)
+	{
+		bpscan = (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
+		scan->rs_nblocks = bpscan->phs_nblocks;
+	}
 	else
-		scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_rd);
+		scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_base.rs_rd);
 
 	/*
 	 * If the table is large relative to NBuffers, use a bulk-read access
@@ -238,11 +227,11 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	 * Note that heap_parallelscan_initialize has a very similar test; if you
 	 * change this, consider changing that one, too.
 	 */
-	if (!RelationUsesLocalBuffers(scan->rs_rd) &&
+	if (!RelationUsesLocalBuffers(scan->rs_base.rs_rd) &&
 		scan->rs_nblocks > NBuffers / 4)
 	{
-		allow_strat = scan->rs_allow_strat;
-		allow_sync = scan->rs_allow_sync;
+		allow_strat = scan->rs_base.rs_allow_strat;
+		allow_sync = scan->rs_base.rs_allow_sync;
 	}
 	else
 		allow_strat = allow_sync = false;
@@ -260,10 +249,10 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 		scan->rs_strategy = NULL;
 	}
 
-	if (scan->rs_parallel != NULL)
+	if (scan->rs_base.rs_parallel != NULL)
 	{
-		/* For parallel scan, believe whatever ParallelHeapScanDesc says. */
-		scan->rs_syncscan = scan->rs_parallel->phs_syncscan;
+		/* For parallel scan, believe whatever ParallelTableScanDesc says. */
+		scan->rs_base.rs_syncscan = scan->rs_base.rs_parallel->phs_syncscan;
 	}
 	else if (keep_startblock)
 	{
@@ -272,16 +261,16 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 		 * so that rewinding a cursor doesn't generate surprising results.
 		 * Reset the active syncscan setting, though.
 		 */
-		scan->rs_syncscan = (allow_sync && synchronize_seqscans);
+		scan->rs_base.rs_syncscan = (allow_sync && synchronize_seqscans);
 	}
 	else if (allow_sync && synchronize_seqscans)
 	{
-		scan->rs_syncscan = true;
-		scan->rs_startblock = ss_get_location(scan->rs_rd, scan->rs_nblocks);
+		scan->rs_base.rs_syncscan = true;
+		scan->rs_startblock = ss_get_location(scan->rs_base.rs_rd, scan->rs_nblocks);
 	}
 	else
 	{
-		scan->rs_syncscan = false;
+		scan->rs_base.rs_syncscan = false;
 		scan->rs_startblock = 0;
 	}
 
@@ -298,15 +287,15 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	 * copy the scan key, if appropriate
 	 */
 	if (key != NULL)
-		memcpy(scan->rs_key, key, scan->rs_nkeys * sizeof(ScanKeyData));
+		memcpy(scan->rs_base.rs_key, key, scan->rs_base.rs_nkeys * sizeof(ScanKeyData));
 
 	/*
 	 * Currently, we don't have a stats counter for bitmap heap scans (but the
 	 * underlying bitmap index scans will be counted) or sample scans (we only
 	 * update stats for tuple fetches there)
 	 */
-	if (!scan->rs_bitmapscan && !scan->rs_samplescan)
-		pgstat_count_heap_scan(scan->rs_rd);
+	if (!scan->rs_base.rs_bitmapscan && !scan->rs_base.rs_samplescan)
+		pgstat_count_heap_scan(scan->rs_base.rs_rd);
 }
 
 /*
@@ -316,10 +305,12 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
  * numBlks is number of pages to scan (InvalidBlockNumber means "all")
  */
 void
-heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk, BlockNumber numBlks)
+heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks)
 {
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+
 	Assert(!scan->rs_inited);	/* else too late to change */
-	Assert(!scan->rs_syncscan); /* else rs_startblock is significant */
+	Assert(!scan->rs_base.rs_syncscan); /* else rs_startblock is significant */
 
 	/* Check startBlk is valid (but allow case of zero blocks...) */
 	Assert(startBlk == 0 || startBlk < scan->rs_nblocks);
@@ -336,8 +327,9 @@ heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk, BlockNumber numBlks)
  * which tuples on the page are visible.
  */
 void
-heapgetpage(HeapScanDesc scan, BlockNumber page)
+heapgetpage(TableScanDesc sscan, BlockNumber page)
 {
+	HeapScanDesc scan = (HeapScanDesc) sscan;
 	Buffer		buffer;
 	Snapshot	snapshot;
 	Page		dp;
@@ -364,20 +356,20 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
 	CHECK_FOR_INTERRUPTS();
 
 	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_rd, MAIN_FORKNUM, page,
+	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, page,
 									   RBM_NORMAL, scan->rs_strategy);
 	scan->rs_cblock = page;
 
-	if (!scan->rs_pageatatime)
+	if (!scan->rs_base.rs_pageatatime)
 		return;
 
 	buffer = scan->rs_cbuf;
-	snapshot = scan->rs_snapshot;
+	snapshot = scan->rs_base.rs_snapshot;
 
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -387,7 +379,7 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
 	LockBuffer(buffer, BUFFER_LOCK_SHARE);
 
 	dp = BufferGetPage(buffer);
-	TestForOldSnapshot(snapshot, scan->rs_rd, dp);
+	TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, dp);
 	lines = PageGetMaxOffsetNumber(dp);
 	ntup = 0;
 
@@ -422,7 +414,7 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
 			HeapTupleData loctup;
 			bool		valid;
 
-			loctup.t_tableOid = RelationGetRelid(scan->rs_rd);
+			loctup.t_tableOid = RelationGetRelid(scan->rs_base.rs_rd);
 			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
 			loctup.t_len = ItemIdGetLength(lpp);
 			ItemPointerSet(&(loctup.t_self), page, lineoff);
@@ -432,8 +424,8 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
 			else
 				valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
 
-			CheckForSerializableConflictOut(valid, scan->rs_rd, &loctup,
-											buffer, snapshot);
+			CheckForSerializableConflictOut(valid, scan->rs_base.rs_rd,
+											&loctup, buffer, snapshot);
 
 			if (valid)
 				scan->rs_vistuples[ntup++] = lineoff;
@@ -476,7 +468,7 @@ heapgettup(HeapScanDesc scan,
 		   ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	Snapshot	snapshot = scan->rs_snapshot;
+	Snapshot	snapshot = scan->rs_base.rs_snapshot;
 	bool		backward = ScanDirectionIsBackward(dir);
 	BlockNumber page;
 	bool		finished;
@@ -502,11 +494,16 @@ heapgettup(HeapScanDesc scan,
 				tuple->t_data = NULL;
 				return;
 			}
-			if (scan->rs_parallel != NULL)
+			if (scan->rs_base.rs_parallel != NULL)
 			{
-				heap_parallelscan_startblock_init(scan);
+				ParallelBlockTableScanDesc pbscan =
+				(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
 
-				page = heap_parallelscan_nextpage(scan);
+				table_block_parallelscan_startblock_init(scan->rs_base.rs_rd,
+														 pbscan);
+
+				page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
+														 pbscan);
 
 				/* Other processes might have already finished the scan. */
 				if (page == InvalidBlockNumber)
@@ -518,7 +515,7 @@ heapgettup(HeapScanDesc scan,
 			}
 			else
 				page = scan->rs_startblock; /* first page */
-			heapgetpage(scan, page);
+			heapgetpage((TableScanDesc) scan, page);
 			lineoff = FirstOffsetNumber;	/* first offnum */
 			scan->rs_inited = true;
 		}
@@ -533,7 +530,7 @@ heapgettup(HeapScanDesc scan,
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
 		dp = BufferGetPage(scan->rs_cbuf);
-		TestForOldSnapshot(snapshot, scan->rs_rd, dp);
+		TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, dp);
 		lines = PageGetMaxOffsetNumber(dp);
 		/* page and lineoff now reference the physically next tid */
 
@@ -542,7 +539,7 @@ heapgettup(HeapScanDesc scan,
 	else if (backward)
 	{
 		/* backward parallel scan not supported */
-		Assert(scan->rs_parallel == NULL);
+		Assert(scan->rs_base.rs_parallel == NULL);
 
 		if (!scan->rs_inited)
 		{
@@ -562,13 +559,13 @@ heapgettup(HeapScanDesc scan,
 			 * time, and much more likely that we'll just bollix things for
 			 * forward scanners.
 			 */
-			scan->rs_syncscan = false;
+			scan->rs_base.rs_syncscan = false;
 			/* start from last page of the scan */
 			if (scan->rs_startblock > 0)
 				page = scan->rs_startblock - 1;
 			else
 				page = scan->rs_nblocks - 1;
-			heapgetpage(scan, page);
+			heapgetpage((TableScanDesc) scan, page);
 		}
 		else
 		{
@@ -579,7 +576,7 @@ heapgettup(HeapScanDesc scan,
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
 		dp = BufferGetPage(scan->rs_cbuf);
-		TestForOldSnapshot(snapshot, scan->rs_rd, dp);
+		TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, dp);
 		lines = PageGetMaxOffsetNumber(dp);
 
 		if (!scan->rs_inited)
@@ -610,11 +607,11 @@ heapgettup(HeapScanDesc scan,
 
 		page = ItemPointerGetBlockNumber(&(tuple->t_self));
 		if (page != scan->rs_cblock)
-			heapgetpage(scan, page);
+			heapgetpage((TableScanDesc) scan, page);
 
 		/* Since the tuple was previously fetched, needn't lock page here */
 		dp = BufferGetPage(scan->rs_cbuf);
-		TestForOldSnapshot(snapshot, scan->rs_rd, dp);
+		TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, dp);
 		lineoff = ItemPointerGetOffsetNumber(&(tuple->t_self));
 		lpp = PageGetItemId(dp, lineoff);
 		Assert(ItemIdIsNormal(lpp));
@@ -649,11 +646,12 @@ heapgettup(HeapScanDesc scan,
 													 snapshot,
 													 scan->rs_cbuf);
 
-				CheckForSerializableConflictOut(valid, scan->rs_rd, tuple,
-												scan->rs_cbuf, snapshot);
+				CheckForSerializableConflictOut(valid, scan->rs_base.rs_rd,
+												tuple, scan->rs_cbuf,
+												snapshot);
 
 				if (valid && key != NULL)
-					HeapKeyTest(tuple, RelationGetDescr(scan->rs_rd),
+					HeapKeyTest(tuple, RelationGetDescr(scan->rs_base.rs_rd),
 								nkeys, key, valid);
 
 				if (valid)
@@ -696,9 +694,13 @@ heapgettup(HeapScanDesc scan,
 				page = scan->rs_nblocks;
 			page--;
 		}
-		else if (scan->rs_parallel != NULL)
+		else if (scan->rs_base.rs_parallel != NULL)
 		{
-			page = heap_parallelscan_nextpage(scan);
+			ParallelBlockTableScanDesc pbscan =
+			(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
+
+			page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
+													 pbscan);
 			finished = (page == InvalidBlockNumber);
 		}
 		else
@@ -721,8 +723,8 @@ heapgettup(HeapScanDesc scan,
 			 * a little bit backwards on every invocation, which is confusing.
 			 * We don't guarantee any specific ordering in general, though.
 			 */
-			if (scan->rs_syncscan)
-				ss_report_location(scan->rs_rd, page);
+			if (scan->rs_base.rs_syncscan)
+				ss_report_location(scan->rs_base.rs_rd, page);
 		}
 
 		/*
@@ -739,12 +741,12 @@ heapgettup(HeapScanDesc scan,
 			return;
 		}
 
-		heapgetpage(scan, page);
+		heapgetpage((TableScanDesc) scan, page);
 
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
 		dp = BufferGetPage(scan->rs_cbuf);
-		TestForOldSnapshot(snapshot, scan->rs_rd, dp);
+		TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, dp);
 		lines = PageGetMaxOffsetNumber((Page) dp);
 		linesleft = lines;
 		if (backward)
@@ -806,11 +808,16 @@ heapgettup_pagemode(HeapScanDesc scan,
 				tuple->t_data = NULL;
 				return;
 			}
-			if (scan->rs_parallel != NULL)
+			if (scan->rs_base.rs_parallel != NULL)
 			{
-				heap_parallelscan_startblock_init(scan);
+				ParallelBlockTableScanDesc pbscan =
+				(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
 
-				page = heap_parallelscan_nextpage(scan);
+				table_block_parallelscan_startblock_init(scan->rs_base.rs_rd,
+														 pbscan);
+
+				page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
+														 pbscan);
 
 				/* Other processes might have already finished the scan. */
 				if (page == InvalidBlockNumber)
@@ -822,7 +829,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 			}
 			else
 				page = scan->rs_startblock; /* first page */
-			heapgetpage(scan, page);
+			heapgetpage((TableScanDesc) scan, page);
 			lineindex = 0;
 			scan->rs_inited = true;
 		}
@@ -834,7 +841,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 		}
 
 		dp = BufferGetPage(scan->rs_cbuf);
-		TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp);
+		TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, dp);
 		lines = scan->rs_ntuples;
 		/* page and lineindex now reference the next visible tid */
 
@@ -843,7 +850,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 	else if (backward)
 	{
 		/* backward parallel scan not supported */
-		Assert(scan->rs_parallel == NULL);
+		Assert(scan->rs_base.rs_parallel == NULL);
 
 		if (!scan->rs_inited)
 		{
@@ -863,13 +870,13 @@ heapgettup_pagemode(HeapScanDesc scan,
 			 * time, and much more likely that we'll just bollix things for
 			 * forward scanners.
 			 */
-			scan->rs_syncscan = false;
+			scan->rs_base.rs_syncscan = false;
 			/* start from last page of the scan */
 			if (scan->rs_startblock > 0)
 				page = scan->rs_startblock - 1;
 			else
 				page = scan->rs_nblocks - 1;
-			heapgetpage(scan, page);
+			heapgetpage((TableScanDesc) scan, page);
 		}
 		else
 		{
@@ -878,7 +885,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 		}
 
 		dp = BufferGetPage(scan->rs_cbuf);
-		TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp);
+		TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, dp);
 		lines = scan->rs_ntuples;
 
 		if (!scan->rs_inited)
@@ -908,11 +915,11 @@ heapgettup_pagemode(HeapScanDesc scan,
 
 		page = ItemPointerGetBlockNumber(&(tuple->t_self));
 		if (page != scan->rs_cblock)
-			heapgetpage(scan, page);
+			heapgetpage((TableScanDesc) scan, page);
 
 		/* Since the tuple was previously fetched, needn't lock page here */
 		dp = BufferGetPage(scan->rs_cbuf);
-		TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp);
+		TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, dp);
 		lineoff = ItemPointerGetOffsetNumber(&(tuple->t_self));
 		lpp = PageGetItemId(dp, lineoff);
 		Assert(ItemIdIsNormal(lpp));
@@ -950,7 +957,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 			{
 				bool		valid;
 
-				HeapKeyTest(tuple, RelationGetDescr(scan->rs_rd),
+				HeapKeyTest(tuple, RelationGetDescr(scan->rs_base.rs_rd),
 							nkeys, key, valid);
 				if (valid)
 				{
@@ -986,9 +993,13 @@ heapgettup_pagemode(HeapScanDesc scan,
 				page = scan->rs_nblocks;
 			page--;
 		}
-		else if (scan->rs_parallel != NULL)
+		else if (scan->rs_base.rs_parallel != NULL)
 		{
-			page = heap_parallelscan_nextpage(scan);
+			ParallelBlockTableScanDesc pbscan =
+			(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
+
+			page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
+													 pbscan);
 			finished = (page == InvalidBlockNumber);
 		}
 		else
@@ -1011,8 +1022,8 @@ heapgettup_pagemode(HeapScanDesc scan,
 			 * a little bit backwards on every invocation, which is confusing.
 			 * We don't guarantee any specific ordering in general, though.
 			 */
-			if (scan->rs_syncscan)
-				ss_report_location(scan->rs_rd, page);
+			if (scan->rs_base.rs_syncscan)
+				ss_report_location(scan->rs_base.rs_rd, page);
 		}
 
 		/*
@@ -1029,10 +1040,10 @@ heapgettup_pagemode(HeapScanDesc scan,
 			return;
 		}
 
-		heapgetpage(scan, page);
+		heapgetpage((TableScanDesc) scan, page);
 
 		dp = BufferGetPage(scan->rs_cbuf);
-		TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp);
+		TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, dp);
 		lines = scan->rs_ntuples;
 		linesleft = lines;
 		if (backward)
@@ -1095,86 +1106,16 @@ fastgetattr(HeapTuple tup, int attnum, TupleDesc tupleDesc,
  */
 
 
-/* ----------------
- *		heap_beginscan	- begin relation scan
- *
- * heap_beginscan is the "standard" case.
- *
- * heap_beginscan_catalog differs in setting up its own temporary snapshot.
- *
- * heap_beginscan_strat offers an extended API that lets the caller control
- * whether a nondefault buffer access strategy can be used, and whether
- * syncscan can be chosen (possibly resulting in the scan not starting from
- * block zero).  Both of these default to true with plain heap_beginscan.
- *
- * heap_beginscan_bm is an alternative entry point for setting up a
- * HeapScanDesc for a bitmap heap scan.  Although that scan technology is
- * really quite unlike a standard seqscan, there is just enough commonality
- * to make it worth using the same data structure.
- *
- * heap_beginscan_sampling is an alternative entry point for setting up a
- * HeapScanDesc for a TABLESAMPLE scan.  As with bitmap scans, it's worth
- * using the same data structure although the behavior is rather different.
- * In addition to the options offered by heap_beginscan_strat, this call
- * also allows control of whether page-mode visibility checking is used.
- * ----------------
- */
-HeapScanDesc
+TableScanDesc
 heap_beginscan(Relation relation, Snapshot snapshot,
-			   int nkeys, ScanKey key)
-{
-	return heap_beginscan_internal(relation, snapshot, nkeys, key, NULL,
-								   true, true, true, false, false, false);
-}
-
-HeapScanDesc
-heap_beginscan_catalog(Relation relation, int nkeys, ScanKey key)
-{
-	Oid			relid = RelationGetRelid(relation);
-	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
-
-	return heap_beginscan_internal(relation, snapshot, nkeys, key, NULL,
-								   true, true, true, false, false, true);
-}
-
-HeapScanDesc
-heap_beginscan_strat(Relation relation, Snapshot snapshot,
-					 int nkeys, ScanKey key,
-					 bool allow_strat, bool allow_sync)
-{
-	return heap_beginscan_internal(relation, snapshot, nkeys, key, NULL,
-								   allow_strat, allow_sync, true,
-								   false, false, false);
-}
-
-HeapScanDesc
-heap_beginscan_bm(Relation relation, Snapshot snapshot,
-				  int nkeys, ScanKey key)
-{
-	return heap_beginscan_internal(relation, snapshot, nkeys, key, NULL,
-								   false, false, true, true, false, false);
-}
-
-HeapScanDesc
-heap_beginscan_sampling(Relation relation, Snapshot snapshot,
-						int nkeys, ScanKey key,
-						bool allow_strat, bool allow_sync, bool allow_pagemode)
-{
-	return heap_beginscan_internal(relation, snapshot, nkeys, key, NULL,
-								   allow_strat, allow_sync, allow_pagemode,
-								   false, true, false);
-}
-
-static HeapScanDesc
-heap_beginscan_internal(Relation relation, Snapshot snapshot,
-						int nkeys, ScanKey key,
-						ParallelHeapScanDesc parallel_scan,
-						bool allow_strat,
-						bool allow_sync,
-						bool allow_pagemode,
-						bool is_bitmapscan,
-						bool is_samplescan,
-						bool temp_snap)
+			   int nkeys, ScanKey key,
+			   ParallelTableScanDesc parallel_scan,
+			   bool allow_strat,
+			   bool allow_sync,
+			   bool allow_pagemode,
+			   bool is_bitmapscan,
+			   bool is_samplescan,
+			   bool temp_snap)
 {
 	HeapScanDesc scan;
 
@@ -1192,21 +1133,22 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
 	 */
 	scan = (HeapScanDesc) palloc(sizeof(HeapScanDescData));
 
-	scan->rs_rd = relation;
-	scan->rs_snapshot = snapshot;
-	scan->rs_nkeys = nkeys;
-	scan->rs_bitmapscan = is_bitmapscan;
-	scan->rs_samplescan = is_samplescan;
+	scan->rs_base.rs_rd = relation;
+	scan->rs_base.rs_snapshot = snapshot;
+	scan->rs_base.rs_nkeys = nkeys;
+	scan->rs_base.rs_bitmapscan = is_bitmapscan;
+	scan->rs_base.rs_samplescan = is_samplescan;
 	scan->rs_strategy = NULL;	/* set in initscan */
-	scan->rs_allow_strat = allow_strat;
-	scan->rs_allow_sync = allow_sync;
-	scan->rs_temp_snap = temp_snap;
-	scan->rs_parallel = parallel_scan;
+	scan->rs_base.rs_allow_strat = allow_strat;
+	scan->rs_base.rs_allow_sync = allow_sync;
+	scan->rs_base.rs_temp_snap = temp_snap;
+	scan->rs_base.rs_parallel = parallel_scan;
 
 	/*
 	 * we can use page-at-a-time mode if it's an MVCC-safe snapshot
 	 */
-	scan->rs_pageatatime = allow_pagemode && IsMVCCSnapshot(snapshot);
+	scan->rs_base.rs_pageatatime =
+		allow_pagemode && snapshot && IsMVCCSnapshot(snapshot);
 
 	/*
 	 * For a seqscan in a serializable transaction, acquire a predicate lock
@@ -1230,23 +1172,29 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
 	 * initscan() and we don't want to allocate memory again
 	 */
 	if (nkeys > 0)
-		scan->rs_key = (ScanKey) palloc(sizeof(ScanKeyData) * nkeys);
+		scan->rs_base.rs_key = (ScanKey) palloc(sizeof(ScanKeyData) * nkeys);
 	else
-		scan->rs_key = NULL;
+		scan->rs_base.rs_key = NULL;
 
 	initscan(scan, key, false);
 
-	return scan;
+	return (TableScanDesc) scan;
 }
 
-/* ----------------
- *		heap_rescan		- restart a relation scan
- * ----------------
- */
 void
-heap_rescan(HeapScanDesc scan,
-			ScanKey key)
+heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
+			bool allow_strat, bool allow_sync, bool allow_pagemode)
 {
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+
+	if (set_params)
+	{
+		scan->rs_base.rs_allow_strat = allow_strat;
+		scan->rs_base.rs_allow_sync = allow_sync;
+		scan->rs_base.rs_pageatatime =
+			allow_pagemode && IsMVCCSnapshot(scan->rs_base.rs_snapshot);
+	}
+
 	/*
 	 * unpin scan buffers
 	 */
@@ -1259,37 +1207,11 @@ heap_rescan(HeapScanDesc scan,
 	initscan(scan, key, true);
 }
 
-/* ----------------
- *		heap_rescan_set_params	- restart a relation scan after changing params
- *
- * This call allows changing the buffer strategy, syncscan, and pagemode
- * options before starting a fresh scan.  Note that although the actual use
- * of syncscan might change (effectively, enabling or disabling reporting),
- * the previously selected startblock will be kept.
- * ----------------
- */
 void
-heap_rescan_set_params(HeapScanDesc scan, ScanKey key,
-					   bool allow_strat, bool allow_sync, bool allow_pagemode)
+heap_endscan(TableScanDesc sscan)
 {
-	/* adjust parameters */
-	scan->rs_allow_strat = allow_strat;
-	scan->rs_allow_sync = allow_sync;
-	scan->rs_pageatatime = allow_pagemode && IsMVCCSnapshot(scan->rs_snapshot);
-	/* ... and rescan */
-	heap_rescan(scan, key);
-}
+	HeapScanDesc scan = (HeapScanDesc) sscan;
 
-/* ----------------
- *		heap_endscan	- end relation scan
- *
- *		See how to integrate with index scans.
- *		Check handling if reldesc caching.
- * ----------------
- */
-void
-heap_endscan(HeapScanDesc scan)
-{
 	/* Note: no locking manipulations needed */
 
 	/*
@@ -1301,246 +1223,20 @@ heap_endscan(HeapScanDesc scan)
 	/*
 	 * decrement relation reference count and free scan descriptor storage
 	 */
-	RelationDecrementReferenceCount(scan->rs_rd);
+	RelationDecrementReferenceCount(scan->rs_base.rs_rd);
 
-	if (scan->rs_key)
-		pfree(scan->rs_key);
+	if (scan->rs_base.rs_key)
+		pfree(scan->rs_base.rs_key);
 
 	if (scan->rs_strategy != NULL)
 		FreeAccessStrategy(scan->rs_strategy);
 
-	if (scan->rs_temp_snap)
-		UnregisterSnapshot(scan->rs_snapshot);
+	if (scan->rs_base.rs_temp_snap)
+		UnregisterSnapshot(scan->rs_base.rs_snapshot);
 
 	pfree(scan);
 }
 
-/* ----------------
- *		heap_parallelscan_estimate - estimate storage for ParallelHeapScanDesc
- *
- *		Sadly, this doesn't reduce to a constant, because the size required
- *		to serialize the snapshot can vary.
- * ----------------
- */
-Size
-heap_parallelscan_estimate(Snapshot snapshot)
-{
-	Size		sz = offsetof(ParallelHeapScanDescData, phs_snapshot_data);
-
-	if (IsMVCCSnapshot(snapshot))
-		sz = add_size(sz, EstimateSnapshotSpace(snapshot));
-	else
-		Assert(snapshot == SnapshotAny);
-
-	return sz;
-}
-
-/* ----------------
- *		heap_parallelscan_initialize - initialize ParallelHeapScanDesc
- *
- *		Must allow as many bytes of shared memory as returned by
- *		heap_parallelscan_estimate.  Call this just once in the leader
- *		process; then, individual workers attach via heap_beginscan_parallel.
- * ----------------
- */
-void
-heap_parallelscan_initialize(ParallelHeapScanDesc target, Relation relation,
-							 Snapshot snapshot)
-{
-	target->phs_relid = RelationGetRelid(relation);
-	target->phs_nblocks = RelationGetNumberOfBlocks(relation);
-	/* compare phs_syncscan initialization to similar logic in initscan */
-	target->phs_syncscan = synchronize_seqscans &&
-		!RelationUsesLocalBuffers(relation) &&
-		target->phs_nblocks > NBuffers / 4;
-	SpinLockInit(&target->phs_mutex);
-	target->phs_startblock = InvalidBlockNumber;
-	pg_atomic_init_u64(&target->phs_nallocated, 0);
-	if (IsMVCCSnapshot(snapshot))
-	{
-		SerializeSnapshot(snapshot, target->phs_snapshot_data);
-		target->phs_snapshot_any = false;
-	}
-	else
-	{
-		Assert(snapshot == SnapshotAny);
-		target->phs_snapshot_any = true;
-	}
-}
-
-/* ----------------
- *		heap_parallelscan_reinitialize - reset a parallel scan
- *
- *		Call this in the leader process.  Caller is responsible for
- *		making sure that all workers have finished the scan beforehand.
- * ----------------
- */
-void
-heap_parallelscan_reinitialize(ParallelHeapScanDesc parallel_scan)
-{
-	pg_atomic_write_u64(&parallel_scan->phs_nallocated, 0);
-}
-
-/* ----------------
- *		heap_beginscan_parallel - join a parallel scan
- *
- *		Caller must hold a suitable lock on the correct relation.
- * ----------------
- */
-HeapScanDesc
-heap_beginscan_parallel(Relation relation, ParallelHeapScanDesc parallel_scan)
-{
-	Snapshot	snapshot;
-
-	Assert(RelationGetRelid(relation) == parallel_scan->phs_relid);
-
-	if (!parallel_scan->phs_snapshot_any)
-	{
-		/* Snapshot was serialized -- restore it */
-		snapshot = RestoreSnapshot(parallel_scan->phs_snapshot_data);
-		RegisterSnapshot(snapshot);
-	}
-	else
-	{
-		/* SnapshotAny passed by caller (not serialized) */
-		snapshot = SnapshotAny;
-	}
-
-	return heap_beginscan_internal(relation, snapshot, 0, NULL, parallel_scan,
-								   true, true, true, false, false,
-								   !parallel_scan->phs_snapshot_any);
-}
-
-/* ----------------
- *		heap_parallelscan_startblock_init - find and set the scan's startblock
- *
- *		Determine where the parallel seq scan should start.  This function may
- *		be called many times, once by each parallel worker.  We must be careful
- *		only to set the startblock once.
- * ----------------
- */
-static void
-heap_parallelscan_startblock_init(HeapScanDesc scan)
-{
-	BlockNumber sync_startpage = InvalidBlockNumber;
-	ParallelHeapScanDesc parallel_scan;
-
-	Assert(scan->rs_parallel);
-	parallel_scan = scan->rs_parallel;
-
-retry:
-	/* Grab the spinlock. */
-	SpinLockAcquire(&parallel_scan->phs_mutex);
-
-	/*
-	 * If the scan's startblock has not yet been initialized, we must do so
-	 * now.  If this is not a synchronized scan, we just start at block 0, but
-	 * if it is a synchronized scan, we must get the starting position from
-	 * the synchronized scan machinery.  We can't hold the spinlock while
-	 * doing that, though, so release the spinlock, get the information we
-	 * need, and retry.  If nobody else has initialized the scan in the
-	 * meantime, we'll fill in the value we fetched on the second time
-	 * through.
-	 */
-	if (parallel_scan->phs_startblock == InvalidBlockNumber)
-	{
-		if (!parallel_scan->phs_syncscan)
-			parallel_scan->phs_startblock = 0;
-		else if (sync_startpage != InvalidBlockNumber)
-			parallel_scan->phs_startblock = sync_startpage;
-		else
-		{
-			SpinLockRelease(&parallel_scan->phs_mutex);
-			sync_startpage = ss_get_location(scan->rs_rd, scan->rs_nblocks);
-			goto retry;
-		}
-	}
-	SpinLockRelease(&parallel_scan->phs_mutex);
-}
-
-/* ----------------
- *		heap_parallelscan_nextpage - get the next page to scan
- *
- *		Get the next page to scan.  Even if there are no pages left to scan,
- *		another backend could have grabbed a page to scan and not yet finished
- *		looking at it, so it doesn't follow that the scan is done when the
- *		first backend gets an InvalidBlockNumber return.
- * ----------------
- */
-static BlockNumber
-heap_parallelscan_nextpage(HeapScanDesc scan)
-{
-	BlockNumber page;
-	ParallelHeapScanDesc parallel_scan;
-	uint64		nallocated;
-
-	Assert(scan->rs_parallel);
-	parallel_scan = scan->rs_parallel;
-
-	/*
-	 * phs_nallocated tracks how many pages have been allocated to workers
-	 * already.  When phs_nallocated >= rs_nblocks, all blocks have been
-	 * allocated.
-	 *
-	 * Because we use an atomic fetch-and-add to fetch the current value, the
-	 * phs_nallocated counter will exceed rs_nblocks, because workers will
-	 * still increment the value, when they try to allocate the next block but
-	 * all blocks have been allocated already. The counter must be 64 bits
-	 * wide because of that, to avoid wrapping around when rs_nblocks is close
-	 * to 2^32.
-	 *
-	 * The actual page to return is calculated by adding the counter to the
-	 * starting block number, modulo nblocks.
-	 */
-	nallocated = pg_atomic_fetch_add_u64(&parallel_scan->phs_nallocated, 1);
-	if (nallocated >= scan->rs_nblocks)
-		page = InvalidBlockNumber;	/* all blocks have been allocated */
-	else
-		page = (nallocated + parallel_scan->phs_startblock) % scan->rs_nblocks;
-
-	/*
-	 * Report scan location.  Normally, we report the current page number.
-	 * When we reach the end of the scan, though, we report the starting page,
-	 * not the ending page, just so the starting positions for later scans
-	 * doesn't slew backwards.  We only report the position at the end of the
-	 * scan once, though: subsequent callers will report nothing.
-	 */
-	if (scan->rs_syncscan)
-	{
-		if (page != InvalidBlockNumber)
-			ss_report_location(scan->rs_rd, page);
-		else if (nallocated == scan->rs_nblocks)
-			ss_report_location(scan->rs_rd, parallel_scan->phs_startblock);
-	}
-
-	return page;
-}
-
-/* ----------------
- *		heap_update_snapshot
- *
- *		Update snapshot info in heap scan descriptor.
- * ----------------
- */
-void
-heap_update_snapshot(HeapScanDesc scan, Snapshot snapshot)
-{
-	Assert(IsMVCCSnapshot(snapshot));
-
-	RegisterSnapshot(snapshot);
-	scan->rs_snapshot = snapshot;
-	scan->rs_temp_snap = true;
-}
-
-/* ----------------
- *		heap_getnext	- retrieve next tuple in scan
- *
- *		Fix to work with index relations.
- *		We don't return the buffer anymore, but you can get it from the
- *		returned HeapTuple.
- * ----------------
- */
-
 #ifdef HEAPDEBUGALL
 #define HEAPDEBUG_1 \
 	elog(DEBUG2, "heap_getnext([%s,nkeys=%d],dir=%d) called", \
@@ -1557,17 +1253,32 @@ heap_update_snapshot(HeapScanDesc scan, Snapshot snapshot)
 
 
 HeapTuple
-heap_getnext(HeapScanDesc scan, ScanDirection direction)
+heap_getnext(TableScanDesc sscan, ScanDirection direction)
 {
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+
+	/*
+	 * This is still widely used directly, without going through the table AM,
+	 * so add a safety check.  It's possible we should, at a later point,
+	 * downgrade this to an assert. The reason for checking the AM routine,
+	 * rather than the AM oid, is that this allows writing regression tests
+	 * that create another AM reusing the heap handler.
+	 */
+	if (unlikely(sscan->rs_rd->rd_tableam != GetHeapamTableAmRoutine()))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("only heap AM is supported")));
+
 	/* Note: no locking manipulations needed */
 
 	HEAPDEBUG_1;				/* heap_getnext( info ) */
 
-	if (scan->rs_pageatatime)
+	if (scan->rs_base.rs_pageatatime)
 		heapgettup_pagemode(scan, direction,
-							scan->rs_nkeys, scan->rs_key);
+							scan->rs_base.rs_nkeys, scan->rs_base.rs_key);
 	else
-		heapgettup(scan, direction, scan->rs_nkeys, scan->rs_key);
+		heapgettup(scan, direction,
+				   scan->rs_base.rs_nkeys, scan->rs_base.rs_key);
 
 	if (scan->rs_ctup.t_data == NULL)
 	{
@@ -1581,9 +1292,58 @@ heap_getnext(HeapScanDesc scan, ScanDirection direction)
 	 */
 	HEAPDEBUG_3;				/* heap_getnext returning tuple */
 
-	pgstat_count_heap_getnext(scan->rs_rd);
+	pgstat_count_heap_getnext(scan->rs_base.rs_rd);
 
-	return &(scan->rs_ctup);
+	return &scan->rs_ctup;
+}
+
+#ifdef HEAPAMSLOTDEBUGALL
+#define HEAPAMSLOTDEBUG_1 \
+	elog(DEBUG2, "heapam_getnextslot([%s,nkeys=%d],dir=%d) called", \
+		 RelationGetRelationName(scan->rs_base.rs_rd), scan->rs_base.rs_nkeys, (int) direction)
+#define HEAPAMSLOTDEBUG_2 \
+	elog(DEBUG2, "heapam_getnextslot returning EOS")
+#define HEAPAMSLOTDEBUG_3 \
+	elog(DEBUG2, "heapam_getnextslot returning tuple")
+#else
+#define HEAPAMSLOTDEBUG_1
+#define HEAPAMSLOTDEBUG_2
+#define HEAPAMSLOTDEBUG_3
+#endif
+
+bool
+heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+
+	/* Note: no locking manipulations needed */
+
+	HEAPAMSLOTDEBUG_1;			/* heap_getnextslot( info ) */
+
+	if (scan->rs_base.rs_pageatatime)
+		heapgettup_pagemode(scan, direction,
+							scan->rs_base.rs_nkeys, scan->rs_base.rs_key);
+	else
+		heapgettup(scan, direction, scan->rs_base.rs_nkeys, scan->rs_base.rs_key);
+
+	if (scan->rs_ctup.t_data == NULL)
+	{
+		HEAPAMSLOTDEBUG_2;		/* heap_getnextslot returning EOS */
+		ExecClearTuple(slot);
+		return false;
+	}
+
+	/*
+	 * if we get here it means we have a new current scan tuple, so point to
+	 * the proper return buffer and return the tuple.
+	 */
+	HEAPAMSLOTDEBUG_3;			/* heap_getnextslot returning tuple */
+
+	pgstat_count_heap_getnext(scan->rs_base.rs_rd);
+
+	ExecStoreBufferHeapTuple(&scan->rs_ctup, slot,
+							 scan->rs_cbuf);
+	return true;
 }
 
 /*
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 518d1df84a1..6a26fcef94c 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -19,15 +19,181 @@
  */
 #include "postgres.h"
 
+#include "access/heapam.h"
 #include "access/tableam.h"
+#include "storage/bufmgr.h"
 #include "utils/builtins.h"
 
 
 static const TableAmRoutine heapam_methods;
 
 
+/* ------------------------------------------------------------------------
+ * Slot related callbacks for heap AM
+ * ------------------------------------------------------------------------
+ */
+
+static const TupleTableSlotOps *
+heapam_slot_callbacks(Relation relation)
+{
+	return &TTSOpsBufferHeapTuple;
+}
+
+
+/* ------------------------------------------------------------------------
+ * Index Scan Callbacks for heap AM
+ * ------------------------------------------------------------------------
+ */
+
+static IndexFetchTableData *
+heapam_index_fetch_begin(Relation rel)
+{
+	IndexFetchHeapData *hscan = palloc0(sizeof(IndexFetchHeapData));
+
+	hscan->xs_base.rel = rel;
+	hscan->xs_cbuf = InvalidBuffer;
+
+	return &hscan->xs_base;
+}
+
+static void
+heapam_index_fetch_reset(IndexFetchTableData *scan)
+{
+	IndexFetchHeapData *hscan = (IndexFetchHeapData *) scan;
+
+	if (BufferIsValid(hscan->xs_cbuf))
+	{
+		ReleaseBuffer(hscan->xs_cbuf);
+		hscan->xs_cbuf = InvalidBuffer;
+	}
+}
+
+static void
+heapam_index_fetch_end(IndexFetchTableData *scan)
+{
+	IndexFetchHeapData *hscan = (IndexFetchHeapData *) scan;
+
+	heapam_index_fetch_reset(scan);
+
+	pfree(hscan);
+}
+
+static bool
+heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
+						 ItemPointer tid,
+						 Snapshot snapshot,
+						 TupleTableSlot *slot,
+						 bool *call_again, bool *all_dead)
+{
+	IndexFetchHeapData *hscan = (IndexFetchHeapData *) scan;
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	bool		got_heap_tuple;
+
+	Assert(TTS_IS_BUFFERTUPLE(slot));
+
+	/* We can skip the buffer-switching logic if we're in mid-HOT chain. */
+	if (!*call_again)
+	{
+		/* Switch to correct buffer if we don't have it already */
+		Buffer		prev_buf = hscan->xs_cbuf;
+
+		hscan->xs_cbuf = ReleaseAndReadBuffer(hscan->xs_cbuf,
+											  hscan->xs_base.rel,
+											  ItemPointerGetBlockNumber(tid));
+
+		/*
+		 * Prune page, but only if we weren't already on this page
+		 */
+		if (prev_buf != hscan->xs_cbuf)
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+	}
+
+	/* Obtain share-lock on the buffer so we can examine visibility */
+	LockBuffer(hscan->xs_cbuf, BUFFER_LOCK_SHARE);
+	got_heap_tuple = heap_hot_search_buffer(tid,
+											hscan->xs_base.rel,
+											hscan->xs_cbuf,
+											snapshot,
+											&bslot->base.tupdata,
+											all_dead,
+											!*call_again);
+	bslot->base.tupdata.t_self = *tid;
+	LockBuffer(hscan->xs_cbuf, BUFFER_LOCK_UNLOCK);
+
+	if (got_heap_tuple)
+	{
+		/*
+		 * Only in a non-MVCC snapshot can more than one member of the HOT
+		 * chain be visible.
+		 */
+		*call_again = !IsMVCCSnapshot(snapshot);
+
+		slot->tts_tableOid = RelationGetRelid(scan->rel);
+		ExecStoreBufferHeapTuple(&bslot->base.tupdata, slot, hscan->xs_cbuf);
+	}
+	else
+	{
+		/* We've reached the end of the HOT chain. */
+		*call_again = false;
+	}
+
+	return got_heap_tuple;
+}
+
+
+/* ------------------------------------------------------------------------
+ * Callbacks for non-modifying operations on individual tuples for heap AM
+ * ------------------------------------------------------------------------
+ */
+
+static bool
+heapam_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot,
+								Snapshot snapshot)
+{
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	bool		res;
+
+	Assert(TTS_IS_BUFFERTUPLE(slot));
+	Assert(BufferIsValid(bslot->buffer));
+
+	/*
+	 * We need buffer pin and lock to call HeapTupleSatisfiesVisibility.
+	 * Caller should be holding pin, but not lock.
+	 */
+	LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
+	res = HeapTupleSatisfiesVisibility(bslot->base.tuple, snapshot,
+									   bslot->buffer);
+	LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
+
+	return res;
+}
+
+
+/* ------------------------------------------------------------------------
+ * Definition of the heap table access method.
+ * ------------------------------------------------------------------------
+ */
+
 static const TableAmRoutine heapam_methods = {
 	.type = T_TableAmRoutine,
+
+	.slot_callbacks = heapam_slot_callbacks,
+
+	.scan_begin = heap_beginscan,
+	.scan_end = heap_endscan,
+	.scan_rescan = heap_rescan,
+	.scan_getnextslot = heap_getnextslot,
+
+	.parallelscan_estimate = table_block_parallelscan_estimate,
+	.parallelscan_initialize = table_block_parallelscan_initialize,
+	.parallelscan_reinitialize = table_block_parallelscan_reinitialize,
+
+	.index_fetch_begin = heapam_index_fetch_begin,
+	.index_fetch_reset = heapam_index_fetch_reset,
+	.index_fetch_end = heapam_index_fetch_end,
+	.index_fetch_tuple = heapam_index_fetch_tuple,
+
+	.tuple_satisfies_snapshot = heapam_tuple_satisfies_snapshot,
 };
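
As an aside: the callback table above is everything core currently needs from an
AM. An out-of-core AM would hand back its own table from a handler function, the
same way heap does. A minimal sketch (myam_handler and myam_methods are
hypothetical names; myam_methods would be filled in like heapam_methods above):

#include "postgres.h"
#include "fmgr.h"
#include "access/tableam.h"

PG_FUNCTION_INFO_V1(myam_handler);

Datum
myam_handler(PG_FUNCTION_ARGS)
{
	/* hand back the AM's static callback table */
	PG_RETURN_POINTER(&myam_methods);
}
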
 
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index e0a5ea42d52..d34e4ccd3d5 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -22,6 +22,7 @@
 #include "access/genam.h"
 #include "access/heapam.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "catalog/index.h"
 #include "lib/stringinfo.h"
@@ -83,6 +84,7 @@ RelationGetIndexScan(Relation indexRelation, int nkeys, int norderbys)
 	scan = (IndexScanDesc) palloc(sizeof(IndexScanDescData));
 
 	scan->heapRelation = NULL;	/* may be set later */
+	scan->xs_heapfetch = NULL;
 	scan->indexRelation = indexRelation;
 	scan->xs_snapshot = InvalidSnapshot;	/* caller must initialize this */
 	scan->numberOfKeys = nkeys;
@@ -123,11 +125,6 @@ RelationGetIndexScan(Relation indexRelation, int nkeys, int norderbys)
 	scan->xs_hitup = NULL;
 	scan->xs_hitupdesc = NULL;
 
-	ItemPointerSetInvalid(&scan->xs_ctup.t_self);
-	scan->xs_ctup.t_data = NULL;
-	scan->xs_cbuf = InvalidBuffer;
-	scan->xs_continue_hot = false;
-
 	return scan;
 }
 
@@ -335,6 +332,7 @@ systable_beginscan(Relation heapRelation,
 
 	sysscan->heap_rel = heapRelation;
 	sysscan->irel = irel;
+	sysscan->slot = table_slot_create(heapRelation, NULL);
 
 	if (snapshot == NULL)
 	{
@@ -384,9 +382,9 @@ systable_beginscan(Relation heapRelation,
 		 * disadvantage; and there are no compensating advantages, because
 		 * it's unlikely that such scans will occur in parallel.
 		 */
-		sysscan->scan = heap_beginscan_strat(heapRelation, snapshot,
-											 nkeys, key,
-											 true, false);
+		sysscan->scan = table_beginscan_strat(heapRelation, snapshot,
+											  nkeys, key,
+											  true, false);
 		sysscan->iscan = NULL;
 	}
 
@@ -401,28 +399,45 @@ systable_beginscan(Relation heapRelation,
  * Note that returned tuple is a reference to data in a disk buffer;
  * it must not be modified, and should be presumed inaccessible after
  * next getnext() or endscan() call.
+ *
+ * XXX: It'd probably make sense to start offering a slot-based interface.
  */
 HeapTuple
 systable_getnext(SysScanDesc sysscan)
 {
-	HeapTuple	htup;
+	HeapTuple	htup = NULL;
 
 	if (sysscan->irel)
 	{
-		htup = index_getnext(sysscan->iscan, ForwardScanDirection);
+		if (index_getnext_slot(sysscan->iscan, ForwardScanDirection, sysscan->slot))
+		{
+			bool		shouldFree;
 
-		/*
-		 * We currently don't need to support lossy index operators for any
-		 * system catalog scan.  It could be done here, using the scan keys to
-		 * drive the operator calls, if we arranged to save the heap attnums
-		 * during systable_beginscan(); this is practical because we still
-		 * wouldn't need to support indexes on expressions.
-		 */
-		if (htup && sysscan->iscan->xs_recheck)
-			elog(ERROR, "system catalog scans with lossy index conditions are not implemented");
+			htup = ExecFetchSlotHeapTuple(sysscan->slot, false, &shouldFree);
+			Assert(!shouldFree);
+
+			/*
+			 * We currently don't need to support lossy index operators for
+			 * any system catalog scan.  It could be done here, using the scan
+			 * keys to drive the operator calls, if we arranged to save the
+			 * heap attnums during systable_beginscan(); this is practical
+			 * because we still wouldn't need to support indexes on
+			 * expressions.
+			 */
+			if (sysscan->iscan->xs_recheck)
+				elog(ERROR, "system catalog scans with lossy index conditions are not implemented");
+		}
 	}
 	else
-		htup = heap_getnext(sysscan->scan, ForwardScanDirection);
+	{
+		if (table_scan_getnextslot(sysscan->scan, ForwardScanDirection, sysscan->slot))
+		{
+			bool		shouldFree;
+
+			htup = ExecFetchSlotHeapTuple(sysscan->slot, false, &shouldFree);
+			Assert(!shouldFree);
+		}
+	}
 
 	return htup;
 }
@@ -446,37 +461,20 @@ systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
 	Snapshot	freshsnap;
 	bool		result;
 
+	Assert(tup == ExecFetchSlotHeapTuple(sysscan->slot, false, NULL));
+
 	/*
-	 * Trust that LockBuffer() and HeapTupleSatisfiesMVCC() do not themselves
+	 * Trust that table_tuple_satisfies_snapshot() and its subsidiaries
+	 * (commonly LockBuffer() and HeapTupleSatisfiesMVCC()) do not themselves
 	 * acquire snapshots, so we need not register the snapshot.  Those
 	 * facilities are too low-level to have any business scanning tables.
 	 */
 	freshsnap = GetCatalogSnapshot(RelationGetRelid(sysscan->heap_rel));
 
-	if (sysscan->irel)
-	{
-		IndexScanDesc scan = sysscan->iscan;
+	result = table_tuple_satisfies_snapshot(sysscan->heap_rel,
+											sysscan->slot,
+											freshsnap);
 
-		Assert(IsMVCCSnapshot(scan->xs_snapshot));
-		Assert(tup == &scan->xs_ctup);
-		Assert(BufferIsValid(scan->xs_cbuf));
-		/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
-		LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE);
-		result = HeapTupleSatisfiesVisibility(tup, freshsnap, scan->xs_cbuf);
-		LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
-	}
-	else
-	{
-		HeapScanDesc scan = sysscan->scan;
-
-		Assert(IsMVCCSnapshot(scan->rs_snapshot));
-		Assert(tup == &scan->rs_ctup);
-		Assert(BufferIsValid(scan->rs_cbuf));
-		/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
-		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
-		result = HeapTupleSatisfiesVisibility(tup, freshsnap, scan->rs_cbuf);
-		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-	}
 	return result;
 }
 
@@ -488,13 +486,19 @@ systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
 void
 systable_endscan(SysScanDesc sysscan)
 {
+	if (sysscan->slot)
+	{
+		ExecDropSingleTupleTableSlot(sysscan->slot);
+		sysscan->slot = NULL;
+	}
+
 	if (sysscan->irel)
 	{
 		index_endscan(sysscan->iscan);
 		index_close(sysscan->irel, AccessShareLock);
 	}
 	else
-		heap_endscan(sysscan->scan);
+		table_endscan(sysscan->scan);
 
 	if (sysscan->snapshot)
 		UnregisterSnapshot(sysscan->snapshot);
@@ -541,6 +545,7 @@ systable_beginscan_ordered(Relation heapRelation,
 
 	sysscan->heap_rel = heapRelation;
 	sysscan->irel = indexRelation;
+	sysscan->slot = table_slot_create(heapRelation, NULL);
 
 	if (snapshot == NULL)
 	{
@@ -586,10 +591,12 @@ systable_beginscan_ordered(Relation heapRelation,
 HeapTuple
 systable_getnext_ordered(SysScanDesc sysscan, ScanDirection direction)
 {
-	HeapTuple	htup;
+	HeapTuple	htup = NULL;
 
 	Assert(sysscan->irel);
-	htup = index_getnext(sysscan->iscan, direction);
+	if (index_getnext_slot(sysscan->iscan, direction, sysscan->slot))
+		htup = ExecFetchSlotHeapTuple(sysscan->slot, false, NULL);
+
 	/* See notes in systable_getnext */
 	if (htup && sysscan->iscan->xs_recheck)
 		elog(ERROR, "system catalog scans with lossy index conditions are not implemented");
@@ -603,6 +610,12 @@ systable_getnext_ordered(SysScanDesc sysscan, ScanDirection direction)
 void
 systable_endscan_ordered(SysScanDesc sysscan)
 {
+	if (sysscan->slot)
+	{
+		ExecDropSingleTupleTableSlot(sysscan->slot);
+		sysscan->slot = NULL;
+	}
+
 	Assert(sysscan->irel);
 	index_endscan(sysscan->iscan);
 	if (sysscan->snapshot)
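
To be clear about what this means for systable_* callers: nothing changes
syntactically, but the returned HeapTuple now points into the scan's slot rather
than into a buffer pinned by the scan itself. A hedged sketch of the usual
caller pattern (rel, indexoid and key are assumed to be set up elsewhere):

/* sketch; needs access/genam.h and access/htup_details.h */
SysScanDesc sscan;
HeapTuple	tup;

sscan = systable_beginscan(rel, indexoid, true, NULL, 1, &key);
while (HeapTupleIsValid(tup = systable_getnext(sscan)))
{
	/* tup lives in the scan's slot; copy it if it must survive the loop */
	HeapTuple	mytup = heap_copytuple(tup);

	/* ... inspect GETSTRUCT(mytup) ... */
}
systable_endscan(sscan);
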
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 4ad30186d97..ae1c87ebadd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -72,6 +72,7 @@
 #include "access/amapi.h"
 #include "access/heapam.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "catalog/index.h"
@@ -235,6 +236,9 @@ index_beginscan(Relation heapRelation,
 	scan->heapRelation = heapRelation;
 	scan->xs_snapshot = snapshot;
 
+	/* prepare to fetch index matches from table */
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+
 	return scan;
 }
 
@@ -318,16 +322,12 @@ index_rescan(IndexScanDesc scan,
 	Assert(nkeys == scan->numberOfKeys);
 	Assert(norderbys == scan->numberOfOrderBys);
 
-	/* Release any held pin on a heap page */
-	if (BufferIsValid(scan->xs_cbuf))
-	{
-		ReleaseBuffer(scan->xs_cbuf);
-		scan->xs_cbuf = InvalidBuffer;
-	}
-
-	scan->xs_continue_hot = false;
+	/* Release resources (like buffer pins) from table accesses */
+	if (scan->xs_heapfetch)
+		table_index_fetch_reset(scan->xs_heapfetch);
 
 	scan->kill_prior_tuple = false; /* for safety */
+	scan->xs_heap_continue = false;
 
 	scan->indexRelation->rd_indam->amrescan(scan, keys, nkeys,
 											orderbys, norderbys);
@@ -343,11 +343,11 @@ index_endscan(IndexScanDesc scan)
 	SCAN_CHECKS;
 	CHECK_SCAN_PROCEDURE(amendscan);
 
-	/* Release any held pin on a heap page */
-	if (BufferIsValid(scan->xs_cbuf))
+	/* Release resources (like buffer pins) from table accesses */
+	if (scan->xs_heapfetch)
 	{
-		ReleaseBuffer(scan->xs_cbuf);
-		scan->xs_cbuf = InvalidBuffer;
+		table_index_fetch_end(scan->xs_heapfetch);
+		scan->xs_heapfetch = NULL;
 	}
 
 	/* End the AM's scan */
@@ -379,17 +379,16 @@ index_markpos(IndexScanDesc scan)
 /* ----------------
  *		index_restrpos	- restore a scan position
  *
- * NOTE: this only restores the internal scan state of the index AM.
- * The current result tuple (scan->xs_ctup) doesn't change.  See comments
- * for ExecRestrPos().
+ * NOTE: this only restores the internal scan state of the index AM.  See
+ * comments for ExecRestrPos().
  *
- * NOTE: in the presence of HOT chains, mark/restore only works correctly
- * if the scan's snapshot is MVCC-safe; that ensures that there's at most one
- * returnable tuple in each HOT chain, and so restoring the prior state at the
- * granularity of the index AM is sufficient.  Since the only current user
- * of mark/restore functionality is nodeMergejoin.c, this effectively means
- * that merge-join plans only work for MVCC snapshots.  This could be fixed
- * if necessary, but for now it seems unimportant.
+ * NOTE: For heap, in the presence of HOT chains, mark/restore only works
+ * correctly if the scan's snapshot is MVCC-safe; that ensures that there's at
+ * most one returnable tuple in each HOT chain, and so restoring the prior
+ * state at the granularity of the index AM is sufficient.  Since the only
+ * current user of mark/restore functionality is nodeMergejoin.c, this
+ * effectively means that merge-join plans only work for MVCC snapshots.  This
+ * could be fixed if necessary, but for now it seems unimportant.
  * ----------------
  */
 void
@@ -400,9 +399,12 @@ index_restrpos(IndexScanDesc scan)
 	SCAN_CHECKS;
 	CHECK_SCAN_PROCEDURE(amrestrpos);
 
-	scan->xs_continue_hot = false;
+	/* release resources (like buffer pins) from table accesses */
+	if (scan->xs_heapfetch)
+		table_index_fetch_reset(scan->xs_heapfetch);
 
 	scan->kill_prior_tuple = false; /* for safety */
+	scan->xs_heap_continue = false;
 
 	scan->indexRelation->rd_indam->amrestrpos(scan);
 }
@@ -483,6 +485,9 @@ index_parallelrescan(IndexScanDesc scan)
 {
 	SCAN_CHECKS;
 
+	if (scan->xs_heapfetch)
+		table_index_fetch_reset(scan->xs_heapfetch);
+
 	/* amparallelrescan is optional; assume no-op if not provided by AM */
 	if (scan->indexRelation->rd_indam->amparallelrescan != NULL)
 		scan->indexRelation->rd_indam->amparallelrescan(scan);
@@ -513,6 +518,9 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel, int nkeys,
 	scan->heapRelation = heaprel;
 	scan->xs_snapshot = snapshot;
 
+	/* prepare to fetch index matches from table */
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+
 	return scan;
 }
 
@@ -535,7 +543,7 @@ index_getnext_tid(IndexScanDesc scan, ScanDirection direction)
 
 	/*
 	 * The AM's amgettuple proc finds the next index entry matching the scan
-	 * keys, and puts the TID into scan->xs_ctup.t_self.  It should also set
+	 * keys, and puts the TID into scan->xs_heaptid.  It should also set
 	 * scan->xs_recheck and possibly scan->xs_itup/scan->xs_hitup, though we
 	 * pay no attention to those fields here.
 	 */
@@ -543,23 +551,23 @@ index_getnext_tid(IndexScanDesc scan, ScanDirection direction)
 
 	/* Reset kill flag immediately for safety */
 	scan->kill_prior_tuple = false;
+	scan->xs_heap_continue = false;
 
 	/* If we're out of index entries, we're done */
 	if (!found)
 	{
-		/* ... but first, release any held pin on a heap page */
-		if (BufferIsValid(scan->xs_cbuf))
-		{
-			ReleaseBuffer(scan->xs_cbuf);
-			scan->xs_cbuf = InvalidBuffer;
-		}
+		/* release resources (like buffer pins) from table accesses */
+		if (scan->xs_heapfetch)
+			table_index_fetch_reset(scan->xs_heapfetch);
+
 		return NULL;
 	}
+	Assert(ItemPointerIsValid(&scan->xs_heaptid));
 
 	pgstat_count_index_tuples(scan->indexRelation, 1);
 
 	/* Return the TID of the tuple we found. */
-	return &scan->xs_ctup.t_self;
+	return &scan->xs_heaptid;
 }
 
 /* ----------------
@@ -580,53 +588,18 @@ index_getnext_tid(IndexScanDesc scan, ScanDirection direction)
  * enough information to do it efficiently in the general case.
  * ----------------
  */
-HeapTuple
-index_fetch_heap(IndexScanDesc scan)
+bool
+index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot)
 {
-	ItemPointer tid = &scan->xs_ctup.t_self;
 	bool		all_dead = false;
-	bool		got_heap_tuple;
+	bool		found;
 
-	/* We can skip the buffer-switching logic if we're in mid-HOT chain. */
-	if (!scan->xs_continue_hot)
-	{
-		/* Switch to correct buffer if we don't have it already */
-		Buffer		prev_buf = scan->xs_cbuf;
+	found = table_index_fetch_tuple(scan->xs_heapfetch, &scan->xs_heaptid,
+									scan->xs_snapshot, slot,
+									&scan->xs_heap_continue, &all_dead);
 
-		scan->xs_cbuf = ReleaseAndReadBuffer(scan->xs_cbuf,
-											 scan->heapRelation,
-											 ItemPointerGetBlockNumber(tid));
-
-		/*
-		 * Prune page, but only if we weren't already on this page
-		 */
-		if (prev_buf != scan->xs_cbuf)
-			heap_page_prune_opt(scan->heapRelation, scan->xs_cbuf);
-	}
-
-	/* Obtain share-lock on the buffer so we can examine visibility */
-	LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE);
-	got_heap_tuple = heap_hot_search_buffer(tid, scan->heapRelation,
-											scan->xs_cbuf,
-											scan->xs_snapshot,
-											&scan->xs_ctup,
-											&all_dead,
-											!scan->xs_continue_hot);
-	LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
-
-	if (got_heap_tuple)
-	{
-		/*
-		 * Only in a non-MVCC snapshot can more than one member of the HOT
-		 * chain be visible.
-		 */
-		scan->xs_continue_hot = !IsMVCCSnapshot(scan->xs_snapshot);
+	if (found)
 		pgstat_count_heap_fetch(scan->indexRelation);
-		return &scan->xs_ctup;
-	}
-
-	/* We've reached the end of the HOT chain. */
-	scan->xs_continue_hot = false;
 
 	/*
 	 * If we scanned a whole HOT chain and found only dead tuples, tell index
@@ -638,17 +611,17 @@ index_fetch_heap(IndexScanDesc scan)
 	if (!scan->xactStartedInRecovery)
 		scan->kill_prior_tuple = all_dead;
 
-	return NULL;
+	return found;
 }
 
 /* ----------------
- *		index_getnext - get the next heap tuple from a scan
+ *		index_getnext_slot - get the next tuple from a scan
  *
- * The result is the next heap tuple satisfying the scan keys and the
- * snapshot, or NULL if no more matching tuples exist.
+ * The result is true if a tuple satisfying the scan keys and the snapshot was
+ * found, false otherwise.  The tuple is stored in the specified slot.
  *
- * On success, the buffer containing the heap tup is pinned (the pin will be
- * dropped in a future index_getnext_tid, index_fetch_heap or index_endscan
+ * On success, resources (like buffer pins) are likely to be held, and will be
+ * dropped by a future index_getnext_tid, index_fetch_heap or index_endscan
  * call.
  *
  * Note: caller must check scan->xs_recheck, and perform rechecking of the
@@ -656,32 +629,23 @@ index_fetch_heap(IndexScanDesc scan)
  * enough information to do it efficiently in the general case.
  * ----------------
  */
-HeapTuple
-index_getnext(IndexScanDesc scan, ScanDirection direction)
+bool
+index_getnext_slot(IndexScanDesc scan, ScanDirection direction, TupleTableSlot *slot)
 {
-	HeapTuple	heapTuple;
-	ItemPointer tid;
-
 	for (;;)
 	{
-		if (scan->xs_continue_hot)
-		{
-			/*
-			 * We are resuming scan of a HOT chain after having returned an
-			 * earlier member.  Must still hold pin on current heap page.
-			 */
-			Assert(BufferIsValid(scan->xs_cbuf));
-			Assert(ItemPointerGetBlockNumber(&scan->xs_ctup.t_self) ==
-				   BufferGetBlockNumber(scan->xs_cbuf));
-		}
-		else
+		if (!scan->xs_heap_continue)
 		{
+			ItemPointer tid;
+
 			/* Time to fetch the next TID from the index */
 			tid = index_getnext_tid(scan, direction);
 
 			/* If we're out of index entries, we're done */
 			if (tid == NULL)
 				break;
+
+			Assert(ItemPointerEquals(tid, &scan->xs_heaptid));
 		}
 
 		/*
@@ -689,12 +653,12 @@ index_getnext(IndexScanDesc scan, ScanDirection direction)
 		 * If we don't find anything, loop around and grab the next TID from
 		 * the index.
 		 */
-		heapTuple = index_fetch_heap(scan);
-		if (heapTuple != NULL)
-			return heapTuple;
+		Assert(ItemPointerIsValid(&scan->xs_heaptid));
+		if (index_fetch_heap(scan, slot))
+			return true;
 	}
 
-	return NULL;				/* failure exit */
+	return false;
 }
 
 /* ----------------
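
Putting the indexam.c changes together, the scan loop now looks roughly like
this from a caller's perspective (a sketch only; heapRel, indexRel, skeys and
nkeys are assumed, error handling omitted):

/* sketch; needs access/genam.h, access/tableam.h, utils/snapmgr.h */
IndexScanDesc iscan;
TupleTableSlot *slot = table_slot_create(heapRel, NULL);

iscan = index_beginscan(heapRel, indexRel, GetActiveSnapshot(), nkeys, 0);
index_rescan(iscan, skeys, nkeys, NULL, 0);

while (index_getnext_slot(iscan, ForwardScanDirection, slot))
{
	/* slot holds a visible tuple; pins are owned by scan->xs_heapfetch */
}

index_endscan(iscan);
ExecDropSingleTupleTableSlot(slot);
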
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 98917de2efd..60e0b90ccf2 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -310,7 +310,7 @@ btgetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 		if (_bt_first(scan, ForwardScanDirection))
 		{
 			/* Save tuple ID, and continue scanning */
-			heapTid = &scan->xs_ctup.t_self;
+			heapTid = &scan->xs_heaptid;
 			tbm_add_tuples(tbm, heapTid, 1, false);
 			ntids++;
 
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 92832237a8b..af3da3aa5b6 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -1135,7 +1135,7 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
 readcomplete:
 	/* OK, itemIndex says what to return */
 	currItem = &so->currPos.items[so->currPos.itemIndex];
-	scan->xs_ctup.t_self = currItem->heapTid;
+	scan->xs_heaptid = currItem->heapTid;
 	if (scan->xs_want_itup)
 		scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
 
@@ -1185,7 +1185,7 @@ _bt_next(IndexScanDesc scan, ScanDirection dir)
 
 	/* OK, itemIndex says what to return */
 	currItem = &so->currPos.items[so->currPos.itemIndex];
-	scan->xs_ctup.t_self = currItem->heapTid;
+	scan->xs_heaptid = currItem->heapTid;
 	if (scan->xs_want_itup)
 		scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
 
@@ -1964,7 +1964,7 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir)
 
 	/* OK, itemIndex says what to return */
 	currItem = &so->currPos.items[so->currPos.itemIndex];
-	scan->xs_ctup.t_self = currItem->heapTid;
+	scan->xs_heaptid = currItem->heapTid;
 	if (scan->xs_want_itup)
 		scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
 
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index dc398e11867..e37cbac7b3c 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -61,6 +61,7 @@
 #include "access/nbtree.h"
 #include "access/parallel.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
@@ -158,9 +159,9 @@ typedef struct BTShared
 	/*
 	 * This variable-sized field must come last.
 	 *
-	 * See _bt_parallel_estimate_shared() and heap_parallelscan_estimate().
+	 * See _bt_parallel_estimate_shared() and table_parallelscan_estimate().
 	 */
-	ParallelHeapScanDescData heapdesc;
+	ParallelTableScanDescData heapdesc;
 } BTShared;
 
 /*
@@ -282,7 +283,7 @@ static void _bt_load(BTWriteState *wstate,
 static void _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent,
 				   int request);
 static void _bt_end_parallel(BTLeader *btleader);
-static Size _bt_parallel_estimate_shared(Snapshot snapshot);
+static Size _bt_parallel_estimate_shared(Relation heap, Snapshot snapshot);
 static double _bt_parallel_heapscan(BTBuildState *buildstate,
 					  bool *brokenhotchain);
 static void _bt_leader_participate_as_worker(BTBuildState *buildstate);
@@ -1275,7 +1276,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	 * Estimate size for our own PARALLEL_KEY_BTREE_SHARED workspace, and
 	 * PARALLEL_KEY_TUPLESORT tuplesort workspace
 	 */
-	estbtshared = _bt_parallel_estimate_shared(snapshot);
+	estbtshared = _bt_parallel_estimate_shared(btspool->heap, snapshot);
 	shm_toc_estimate_chunk(&pcxt->estimator, estbtshared);
 	estsort = tuplesort_estimate_shared(scantuplesortstates);
 	shm_toc_estimate_chunk(&pcxt->estimator, estsort);
@@ -1316,7 +1317,8 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	btshared->havedead = false;
 	btshared->indtuples = 0.0;
 	btshared->brokenhotchain = false;
-	heap_parallelscan_initialize(&btshared->heapdesc, btspool->heap, snapshot);
+	table_parallelscan_initialize(btspool->heap, &btshared->heapdesc,
+								  snapshot);
 
 	/*
 	 * Store shared tuplesort-private state, for which we reserved space.
@@ -1403,10 +1405,10 @@ _bt_end_parallel(BTLeader *btleader)
  * btree index build based on the snapshot its parallel scan will use.
  */
 static Size
-_bt_parallel_estimate_shared(Snapshot snapshot)
+_bt_parallel_estimate_shared(Relation heap, Snapshot snapshot)
 {
 	return add_size(offsetof(BTShared, heapdesc),
-					heap_parallelscan_estimate(snapshot));
+					table_parallelscan_estimate(heap, snapshot));
 }
 
 /*
@@ -1617,7 +1619,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 {
 	SortCoordinate coordinate;
 	BTBuildState buildstate;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	double		reltuples;
 	IndexInfo  *indexInfo;
 
@@ -1670,7 +1672,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	/* Join parallel scan */
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
-	scan = heap_beginscan_parallel(btspool->heap, &btshared->heapdesc);
+	scan = table_beginscan_parallel(btspool->heap, &btshared->heapdesc);
 	reltuples = IndexBuildHeapScan(btspool->heap, btspool->index, indexInfo,
 								   true, _bt_build_callback,
 								   (void *) &buildstate, scan);
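
The nbtsort.c changes above are an instance of the generic pattern for parallel
table scans: the leader sizes and initializes a ParallelTableScanDesc in shared
memory, and each participant then attaches to it. A minimal sketch, assuming
toc, rel and snapshot are already set up:

/* sketch; needs access/tableam.h and storage/shm_toc.h */

/* leader */
Size		sz = table_parallelscan_estimate(rel, snapshot);
ParallelTableScanDesc pscan = (ParallelTableScanDesc) shm_toc_allocate(toc, sz);

table_parallelscan_initialize(rel, pscan, snapshot);

/* every participant, leader included */
TableScanDesc scan = table_beginscan_parallel(rel, pscan);
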
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index dc0d63924db..9365bc57ad5 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -927,7 +927,7 @@ spggettuple(IndexScanDesc scan, ScanDirection dir)
 		if (so->iPtr < so->nPtrs)
 		{
 			/* continuing to return reported tuples */
-			scan->xs_ctup.t_self = so->heapPtrs[so->iPtr];
+			scan->xs_heaptid = so->heapPtrs[so->iPtr];
 			scan->xs_recheck = so->recheck[so->iPtr];
 			scan->xs_hitup = so->reconTups[so->iPtr];
 
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 84851e4ff88..628d930c130 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -6,13 +6,304 @@
  * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * src/backend/access/table/tableam.c
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/table/tableam.c
+ *
+ * NOTES
+ *	  Note that most functions in here are documented in tableam.h, rather
+ *	  than here. That's because there are a lot of inline functions in
+ *	  tableam.h, and it'd be harder to understand if one constantly had to
+ *	  switch between files.
+ *
  *----------------------------------------------------------------------
  */
 #include "postgres.h"
 
+#include "access/heapam.h"		/* for ss_* */
 #include "access/tableam.h"
+#include "access/xact.h"
+#include "storage/bufmgr.h"
+#include "storage/shmem.h"
 
 
 /* GUC variables */
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
+bool		synchronize_seqscans = true;
+
+
+/* ----------------------------------------------------------------------------
+ * Slot functions.
+ * ----------------------------------------------------------------------------
+ */
+
+const TupleTableSlotOps *
+table_slot_callbacks(Relation relation)
+{
+	const TupleTableSlotOps *tts_cb;
+
+	if (relation->rd_tableam)
+		tts_cb = relation->rd_tableam->slot_callbacks(relation);
+	else if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		/*
+		 * Historically, FDWs expect to store heap tuples in slots. Continue
+		 * handing them one, to make it less painful to adapt FDWs to new
+		 * versions. The cost of a heap slot over a virtual slot is pretty
+		 * small.
+		 */
+		tts_cb = &TTSOpsHeapTuple;
+	}
+	else
+	{
+		/*
+		 * These need to be supported, as some parts of the code (like COPY)
+		 * need to create slots for such relations too. It seems better to
+		 * centralize the knowledge that a heap slot is the right thing in
+		 * that case here.
+		 */
+		Assert(relation->rd_rel->relkind == RELKIND_VIEW ||
+			   relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+		tts_cb = &TTSOpsVirtual;
+	}
+
+	return tts_cb;
+}
+
+TupleTableSlot *
+table_slot_create(Relation relation, List **reglist)
+{
+	const TupleTableSlotOps *tts_cb;
+	TupleTableSlot *slot;
+
+	tts_cb = table_slot_callbacks(relation);
+	slot = MakeSingleTupleTableSlot(RelationGetDescr(relation), tts_cb);
+
+	if (reglist)
+		*reglist = lappend(*reglist, slot);
+
+	return slot;
+}
+
+
+/* ----------------------------------------------------------------------------
+ * Table scan functions.
+ * ----------------------------------------------------------------------------
+ */
+
+TableScanDesc
+table_beginscan_catalog(Relation relation, int nkeys, struct ScanKeyData *key)
+{
+	Oid			relid = RelationGetRelid(relation);
+	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
+
+	return relation->rd_tableam->scan_begin(relation, snapshot, nkeys, key, NULL,
+											true, true, true, false, false, true);
+}
+
+void
+table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot)
+{
+	Assert(IsMVCCSnapshot(snapshot));
+
+	RegisterSnapshot(snapshot);
+	scan->rs_snapshot = snapshot;
+	scan->rs_temp_snap = true;
+}
+
+
+/* ----------------------------------------------------------------------------
+ * Parallel table scan related functions.
+ * ----------------------------------------------------------------------------
+ */
+
+Size
+table_parallelscan_estimate(Relation rel, Snapshot snapshot)
+{
+	Size		sz = 0;
+
+	if (IsMVCCSnapshot(snapshot))
+		sz = add_size(sz, EstimateSnapshotSpace(snapshot));
+	else
+		Assert(snapshot == SnapshotAny);
+
+	sz = add_size(sz, rel->rd_tableam->parallelscan_estimate(rel));
+
+	return sz;
+}
+
+void
+table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
+							  Snapshot snapshot)
+{
+	Size		snapshot_off = rel->rd_tableam->parallelscan_initialize(rel, pscan);
+
+	pscan->phs_snapshot_off = snapshot_off;
+
+	if (IsMVCCSnapshot(snapshot))
+	{
+		SerializeSnapshot(snapshot, (char *) pscan + pscan->phs_snapshot_off);
+		pscan->phs_snapshot_any = false;
+	}
+	else
+	{
+		Assert(snapshot == SnapshotAny);
+		pscan->phs_snapshot_any = true;
+	}
+}
+
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc parallel_scan)
+{
+	Snapshot	snapshot;
+
+	Assert(RelationGetRelid(relation) == parallel_scan->phs_relid);
+
+	if (!parallel_scan->phs_snapshot_any)
+	{
+		/* Snapshot was serialized -- restore it */
+		snapshot = RestoreSnapshot((char *) parallel_scan +
+								   parallel_scan->phs_snapshot_off);
+		RegisterSnapshot(snapshot);
+	}
+	else
+	{
+		/* SnapshotAny passed by caller (not serialized) */
+		snapshot = SnapshotAny;
+	}
+
+	return relation->rd_tableam->scan_begin(relation, snapshot, 0, NULL, parallel_scan,
+											true, true, true, false, false, !parallel_scan->phs_snapshot_any);
+}
+
+
+/* ----------------------------------------------------------------------------
+ * Helper functions to implement parallel scans for block oriented AMs.
+ * ----------------------------------------------------------------------------
+ */
+
+Size
+table_block_parallelscan_estimate(Relation rel)
+{
+	return sizeof(ParallelBlockTableScanDescData);
+}
+
+Size
+table_block_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan)
+{
+	ParallelBlockTableScanDesc bpscan = (ParallelBlockTableScanDesc) pscan;
+
+	bpscan->base.phs_relid = RelationGetRelid(rel);
+	bpscan->phs_nblocks = RelationGetNumberOfBlocks(rel);
+	/* compare phs_syncscan initialization to similar logic in initscan */
+	bpscan->base.phs_syncscan = synchronize_seqscans &&
+		!RelationUsesLocalBuffers(rel) &&
+		bpscan->phs_nblocks > NBuffers / 4;
+	SpinLockInit(&bpscan->phs_mutex);
+	bpscan->phs_startblock = InvalidBlockNumber;
+	pg_atomic_init_u64(&bpscan->phs_nallocated, 0);
+
+	return sizeof(ParallelBlockTableScanDescData);
+}
+
+void
+table_block_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
+{
+	ParallelBlockTableScanDesc bpscan = (ParallelBlockTableScanDesc) pscan;
+
+	pg_atomic_write_u64(&bpscan->phs_nallocated, 0);
+}
+
+/*
+ * find and set the scan's startblock
+ *
+ * Determine where the parallel seq scan should start.  This function may be
+ * called many times, once by each parallel worker.  We must be careful only
+ * to set the startblock once.
+ */
+void
+table_block_parallelscan_startblock_init(Relation rel, ParallelBlockTableScanDesc pbscan)
+{
+	BlockNumber sync_startpage = InvalidBlockNumber;
+
+retry:
+	/* Grab the spinlock. */
+	SpinLockAcquire(&pbscan->phs_mutex);
+
+	/*
+	 * If the scan's startblock has not yet been initialized, we must do so
+	 * now.  If this is not a synchronized scan, we just start at block 0, but
+	 * if it is a synchronized scan, we must get the starting position from
+	 * the synchronized scan machinery.  We can't hold the spinlock while
+	 * doing that, though, so release the spinlock, get the information we
+	 * need, and retry.  If nobody else has initialized the scan in the
+	 * meantime, we'll fill in the value we fetched on the second time
+	 * through.
+	 */
+	if (pbscan->phs_startblock == InvalidBlockNumber)
+	{
+		if (!pbscan->base.phs_syncscan)
+			pbscan->phs_startblock = 0;
+		else if (sync_startpage != InvalidBlockNumber)
+			pbscan->phs_startblock = sync_startpage;
+		else
+		{
+			SpinLockRelease(&pbscan->phs_mutex);
+			sync_startpage = ss_get_location(rel, pbscan->phs_nblocks);
+			goto retry;
+		}
+	}
+	SpinLockRelease(&pbscan->phs_mutex);
+}
+
+/*
+ * get the next page to scan
+ *
+ * Even if there are no pages left to scan, another backend could have
+ * grabbed a page to scan and not yet finished looking at it, so it doesn't
+ * follow that the scan is done when the first backend gets an
+ * InvalidBlockNumber return.
+ */
+BlockNumber
+table_block_parallelscan_nextpage(Relation rel, ParallelBlockTableScanDesc pbscan)
+{
+	BlockNumber page;
+	uint64		nallocated;
+
+	/*
+	 * phs_nallocated tracks how many pages have been allocated to workers
+	 * already.  When phs_nallocated >= rs_nblocks, all blocks have been
+	 * allocated.
+	 *
+	 * Because we use an atomic fetch-and-add to fetch the current value, the
+	 * phs_nallocated counter will exceed rs_nblocks: workers still increment
+	 * the value when they try to allocate the next block even though all
+	 * blocks have already been handed out. Because of that, the counter must
+	 * be 64 bits wide, to avoid wrapping around when rs_nblocks is close to
+	 * 2^32.
+	 *
+	 * The actual page to return is calculated by adding the counter to the
+	 * starting block number, modulo nblocks.
+	 */
+	nallocated = pg_atomic_fetch_add_u64(&pbscan->phs_nallocated, 1);
+	if (nallocated >= pbscan->phs_nblocks)
+		page = InvalidBlockNumber;	/* all blocks have been allocated */
+	else
+		page = (nallocated + pbscan->phs_startblock) % pbscan->phs_nblocks;
+
+	/*
+	 * Report scan location.  Normally, we report the current page number.
+	 * When we reach the end of the scan, though, we report the starting page,
+	 * not the ending page, just so the starting positions for later scans
+	 * don't slew backwards.  We only report the position at the end of the
+	 * scan once, though: subsequent callers will report nothing.
+	 */
+	if (pbscan->base.phs_syncscan)
+	{
+		if (page != InvalidBlockNumber)
+			ss_report_location(rel, page);
+		else if (nallocated == pbscan->phs_nblocks)
+			ss_report_location(rel, pbscan->phs_startblock);
+	}
+
+	return page;
+}
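
For a block-oriented AM, the startblock_init/nextpage helpers above are consumed
from the AM's own scan code roughly as follows (this mirrors what heap does;
rel and pscan assumed):

/* sketch; needs access/tableam.h */
ParallelBlockTableScanDesc pbscan = (ParallelBlockTableScanDesc) pscan;
BlockNumber blkno;

table_block_parallelscan_startblock_init(rel, pbscan);

while ((blkno = table_block_parallelscan_nextpage(rel, pbscan)) !=
	   InvalidBlockNumber)
{
	/* read block blkno and emit its visible tuples */
}
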
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index d49607e7f85..48981427d19 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -44,6 +44,26 @@ GetTableAmRoutine(Oid amhandler)
 		elog(ERROR, "Table access method handler %u did not return a TableAmRoutine struct",
 			 amhandler);
 
+	/*
+	 * Assert that all required callbacks are present. That makes it a bit
+	 * easier to keep AMs up to date, e.g. when forward-porting them to a new
+	 * major version.
+	 */
+	Assert(routine->scan_begin != NULL);
+	Assert(routine->scan_end != NULL);
+	Assert(routine->scan_rescan != NULL);
+
+	Assert(routine->parallelscan_estimate != NULL);
+	Assert(routine->parallelscan_initialize != NULL);
+	Assert(routine->parallelscan_reinitialize != NULL);
+
+	Assert(routine->index_fetch_begin != NULL);
+	Assert(routine->index_fetch_reset != NULL);
+	Assert(routine->index_fetch_end != NULL);
+	Assert(routine->index_fetch_tuple != NULL);
+
+	Assert(routine->tuple_satisfies_snapshot != NULL);
+
 	return routine;
 }
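
Spelled out, the Assert checklist amounts to the following minimal shape for a
third-party AM's callback table (myam_* names are hypothetical; a block-oriented
AM can point the parallelscan callbacks at the generic helpers, as heap does):

static const TableAmRoutine myam_methods = {
	.type = T_TableAmRoutine,

	.slot_callbacks = myam_slot_callbacks,

	.scan_begin = myam_scan_begin,
	.scan_end = myam_scan_end,
	.scan_rescan = myam_scan_rescan,
	.scan_getnextslot = myam_scan_getnextslot,

	.parallelscan_estimate = table_block_parallelscan_estimate,
	.parallelscan_initialize = table_block_parallelscan_initialize,
	.parallelscan_reinitialize = table_block_parallelscan_reinitialize,

	.index_fetch_begin = myam_index_fetch_begin,
	.index_fetch_reset = myam_index_fetch_reset,
	.index_fetch_end = myam_index_fetch_end,
	.index_fetch_tuple = myam_index_fetch_tuple,

	.tuple_satisfies_snapshot = myam_tuple_satisfies_snapshot,
};
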
 
@@ -98,7 +118,7 @@ get_table_am_oid(const char *tableamname, bool missing_ok)
 {
 	Oid			result;
 	Relation	rel;
-	HeapScanDesc scandesc;
+	TableScanDesc scandesc;
 	HeapTuple	tuple;
 	ScanKeyData entry[1];
 
@@ -113,7 +133,7 @@ get_table_am_oid(const char *tableamname, bool missing_ok)
 				Anum_pg_am_amname,
 				BTEqualStrategyNumber, F_NAMEEQ,
 				CStringGetDatum(tableamname));
-	scandesc = heap_beginscan_catalog(rel, 1, entry);
+	scandesc = table_beginscan_catalog(rel, 1, entry);
 	tuple = heap_getnext(scandesc, ForwardScanDirection);
 
 	/* We assume that there can be at most one matching tuple */
@@ -123,7 +143,7 @@ get_table_am_oid(const char *tableamname, bool missing_ok)
 	else
 		result = InvalidOid;
 
-	heap_endscan(scandesc);
+	table_endscan(scandesc);
 	heap_close(rel, AccessShareLock);
 
 	if (!OidIsValid(result) && !missing_ok)
diff --git a/src/backend/access/tablesample/system.c b/src/backend/access/tablesample/system.c
index 298e0ab4a09..fe62a73341e 100644
--- a/src/backend/access/tablesample/system.c
+++ b/src/backend/access/tablesample/system.c
@@ -180,7 +180,8 @@ static BlockNumber
 system_nextsampleblock(SampleScanState *node)
 {
 	SystemSamplerData *sampler = (SystemSamplerData *) node->tsm_state;
-	HeapScanDesc scan = node->ss.ss_currentScanDesc;
+	TableScanDesc scan = node->ss.ss_currentScanDesc;
+	HeapScanDesc hscan = (HeapScanDesc) scan;
 	BlockNumber nextblock = sampler->nextblock;
 	uint32		hashinput[2];
 
@@ -199,7 +200,7 @@ system_nextsampleblock(SampleScanState *node)
 	 * Loop over block numbers until finding suitable block or reaching end of
 	 * relation.
 	 */
-	for (; nextblock < scan->rs_nblocks; nextblock++)
+	for (; nextblock < hscan->rs_nblocks; nextblock++)
 	{
 		uint32		hash;
 
@@ -211,7 +212,7 @@ system_nextsampleblock(SampleScanState *node)
 			break;
 	}
 
-	if (nextblock < scan->rs_nblocks)
+	if (nextblock < hscan->rs_nblocks)
 	{
 		/* Found a suitable block; remember where we should start next time */
 		sampler->nextblock = nextblock + 1;
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 4d7ed8ad1a7..d8776e192ea 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -20,6 +20,7 @@
 #include "access/genam.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xlog_internal.h"
 #include "bootstrap/bootstrap.h"
@@ -594,7 +595,7 @@ boot_openrel(char *relname)
 	int			i;
 	struct typmap **app;
 	Relation	rel;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	HeapTuple	tup;
 
 	if (strlen(relname) >= NAMEDATALEN)
@@ -604,16 +605,16 @@ boot_openrel(char *relname)
 	{
 		/* We can now load the pg_type data */
 		rel = table_open(TypeRelationId, NoLock);
-		scan = heap_beginscan_catalog(rel, 0, NULL);
+		scan = table_beginscan_catalog(rel, 0, NULL);
 		i = 0;
 		while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
 			++i;
-		heap_endscan(scan);
+		table_endscan(scan);
 		app = Typ = ALLOC(struct typmap *, i + 1);
 		while (i-- > 0)
 			*app++ = ALLOC(struct typmap, 1);
 		*app = NULL;
-		scan = heap_beginscan_catalog(rel, 0, NULL);
+		scan = table_beginscan_catalog(rel, 0, NULL);
 		app = Typ;
 		while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
 		{
@@ -623,7 +624,7 @@ boot_openrel(char *relname)
 				   sizeof((*app)->am_typ));
 			app++;
 		}
-		heap_endscan(scan);
+		table_endscan(scan);
 		table_close(rel, NoLock);
 	}
 
@@ -915,7 +916,7 @@ gettype(char *type)
 {
 	int			i;
 	Relation	rel;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	HeapTuple	tup;
 	struct typmap **app;
 
@@ -939,16 +940,16 @@ gettype(char *type)
 		}
 		elog(DEBUG4, "external type: %s", type);
 		rel = table_open(TypeRelationId, NoLock);
-		scan = heap_beginscan_catalog(rel, 0, NULL);
+		scan = table_beginscan_catalog(rel, 0, NULL);
 		i = 0;
 		while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
 			++i;
-		heap_endscan(scan);
+		table_endscan(scan);
 		app = Typ = ALLOC(struct typmap *, i + 1);
 		while (i-- > 0)
 			*app++ = ALLOC(struct typmap, 1);
 		*app = NULL;
-		scan = heap_beginscan_catalog(rel, 0, NULL);
+		scan = table_beginscan_catalog(rel, 0, NULL);
 		app = Typ;
 		while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
 		{
@@ -957,7 +958,7 @@ gettype(char *type)
 					(char *) GETSTRUCT(tup),
 					sizeof((*app)->am_typ));
 		}
-		heap_endscan(scan);
+		table_endscan(scan);
 		table_close(rel, NoLock);
 		return gettype(type);
 	}
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 11ddce2a8b5..a600f43a675 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -21,6 +21,7 @@
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/catalog.h"
@@ -821,7 +822,7 @@ objectsInSchemaToOids(ObjectType objtype, List *nspnames)
 					ScanKeyData key[2];
 					int			keycount;
 					Relation	rel;
-					HeapScanDesc scan;
+					TableScanDesc scan;
 					HeapTuple	tuple;
 
 					keycount = 0;
@@ -843,7 +844,7 @@ objectsInSchemaToOids(ObjectType objtype, List *nspnames)
 									CharGetDatum(PROKIND_PROCEDURE));
 
 					rel = table_open(ProcedureRelationId, AccessShareLock);
-					scan = heap_beginscan_catalog(rel, keycount, key);
+					scan = table_beginscan_catalog(rel, keycount, key);
 
 					while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 					{
@@ -852,7 +853,7 @@ objectsInSchemaToOids(ObjectType objtype, List *nspnames)
 						objects = lappend_oid(objects, oid);
 					}
 
-					heap_endscan(scan);
+					table_endscan(scan);
 					table_close(rel, AccessShareLock);
 				}
 				break;
@@ -877,7 +878,7 @@ getRelationsInNamespace(Oid namespaceId, char relkind)
 	List	   *relations = NIL;
 	ScanKeyData key[2];
 	Relation	rel;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	HeapTuple	tuple;
 
 	ScanKeyInit(&key[0],
@@ -890,7 +891,7 @@ getRelationsInNamespace(Oid namespaceId, char relkind)
 				CharGetDatum(relkind));
 
 	rel = table_open(RelationRelationId, AccessShareLock);
-	scan = heap_beginscan_catalog(rel, 2, key);
+	scan = table_beginscan_catalog(rel, 2, key);
 
 	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
@@ -899,7 +900,7 @@ getRelationsInNamespace(Oid namespaceId, char relkind)
 		relations = lappend_oid(relations, oid);
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
 	table_close(rel, AccessShareLock);
 
 	return relations;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 1ee1ed28946..ff1a18c4d4e 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -27,6 +27,7 @@
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "access/sysattr.h"
 #include "access/transam.h"
 #include "access/visibilitymap.h"
@@ -2138,7 +2139,7 @@ index_update_stats(Relation rel,
 		ReindexIsProcessingHeap(RelationRelationId))
 	{
 		/* don't assume syscache will work */
-		HeapScanDesc pg_class_scan;
+		TableScanDesc pg_class_scan;
 		ScanKeyData key[1];
 
 		ScanKeyInit(&key[0],
@@ -2146,10 +2147,10 @@ index_update_stats(Relation rel,
 					BTEqualStrategyNumber, F_OIDEQ,
 					ObjectIdGetDatum(relid));
 
-		pg_class_scan = heap_beginscan_catalog(pg_class, 1, key);
+		pg_class_scan = table_beginscan_catalog(pg_class, 1, key);
 		tuple = heap_getnext(pg_class_scan, ForwardScanDirection);
 		tuple = heap_copytuple(tuple);
-		heap_endscan(pg_class_scan);
+		table_endscan(pg_class_scan);
 	}
 	else
 	{
@@ -2431,7 +2432,7 @@ IndexBuildHeapScan(Relation heapRelation,
 				   bool allow_sync,
 				   IndexBuildCallback callback,
 				   void *callback_state,
-				   HeapScanDesc scan)
+				   TableScanDesc scan)
 {
 	return IndexBuildHeapRangeScan(heapRelation, indexRelation,
 								   indexInfo, allow_sync,
@@ -2460,8 +2461,9 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 						BlockNumber numblocks,
 						IndexBuildCallback callback,
 						void *callback_state,
-						HeapScanDesc scan)
+						TableScanDesc scan)
 {
+	HeapScanDesc hscan;
 	bool		is_system_catalog;
 	bool		checking_uniqueness;
 	HeapTuple	heapTuple;
@@ -2502,8 +2504,7 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 	 */
 	estate = CreateExecutorState();
 	econtext = GetPerTupleExprContext(estate);
-	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
-									&TTSOpsHeapTuple);
+	slot = table_slot_create(heapRelation, NULL);
 
 	/* Arrange for econtext's scan tuple to be the tuple under test */
 	econtext->ecxt_scantuple = slot;
@@ -2540,12 +2541,12 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 		else
 			snapshot = SnapshotAny;
 
-		scan = heap_beginscan_strat(heapRelation,	/* relation */
-									snapshot,	/* snapshot */
-									0,	/* number of keys */
-									NULL,	/* scan key */
-									true,	/* buffer access strategy OK */
-									allow_sync);	/* syncscan OK? */
+		scan = table_beginscan_strat(heapRelation,	/* relation */
+									 snapshot,	/* snapshot */
+									 0,	/* number of keys */
+									 NULL,	/* scan key */
+									 true,	/* buffer access strategy OK */
+									 allow_sync);	/* syncscan OK? */
 	}
 	else
 	{
@@ -2561,6 +2562,8 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 		snapshot = scan->rs_snapshot;
 	}
 
+	hscan = (HeapScanDesc) scan;
+
 	/*
 	 * Must call GetOldestXmin() with SnapshotAny.  Should never call
 	 * GetOldestXmin() with MVCC snapshot. (It's especially worth checking
@@ -2618,15 +2621,15 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 		 * tuple per HOT-chain --- else we could create more than one index
 		 * entry pointing to the same root tuple.
 		 */
-		if (scan->rs_cblock != root_blkno)
+		if (hscan->rs_cblock != root_blkno)
 		{
-			Page		page = BufferGetPage(scan->rs_cbuf);
+			Page		page = BufferGetPage(hscan->rs_cbuf);
 
-			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
 			heap_get_root_tuples(page, root_offsets);
-			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 
-			root_blkno = scan->rs_cblock;
+			root_blkno = hscan->rs_cblock;
 		}
 
 		if (snapshot == SnapshotAny)
@@ -2643,7 +2646,7 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 			 * be conservative about it.  (This remark is still correct even
 			 * with HOT-pruning: our pin on the buffer prevents pruning.)
 			 */
-			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
 
 			/*
 			 * The criteria for counting a tuple as live in this block need to
@@ -2652,7 +2655,7 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 			 * values, e.g. when there are many recently-dead tuples.
 			 */
 			switch (HeapTupleSatisfiesVacuum(heapTuple, OldestXmin,
-											 scan->rs_cbuf))
+											 hscan->rs_cbuf))
 			{
 				case HEAPTUPLE_DEAD:
 					/* Definitely dead, we can ignore it */
@@ -2733,7 +2736,7 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 							/*
 							 * Must drop the lock on the buffer before we wait
 							 */
-							LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+							LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 							XactLockTableWait(xwait, heapRelation,
 											  &heapTuple->t_self,
 											  XLTW_InsertIndexUnique);
@@ -2800,7 +2803,7 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 							/*
 							 * Must drop the lock on the buffer before we wait
 							 */
-							LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+							LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 							XactLockTableWait(xwait, heapRelation,
 											  &heapTuple->t_self,
 											  XLTW_InsertIndexUnique);
@@ -2852,7 +2855,7 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 					break;
 			}
 
-			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 
 			if (!indexIt)
 				continue;
@@ -2867,7 +2870,7 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 		MemoryContextReset(econtext->ecxt_per_tuple_memory);
 
 		/* Set up for predicate or expression evaluation */
-		ExecStoreHeapTuple(heapTuple, slot, false);
+		ExecStoreBufferHeapTuple(heapTuple, slot, hscan->rs_cbuf);
 
 		/*
 		 * In a partial index, discard tuples that don't satisfy the
@@ -2931,7 +2934,7 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 		}
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
 
 	/* we can now forget our snapshot, if set and registered by us */
 	if (need_unregister_snapshot)
@@ -2966,8 +2969,7 @@ IndexCheckExclusion(Relation heapRelation,
 					Relation indexRelation,
 					IndexInfo *indexInfo)
 {
-	HeapScanDesc scan;
-	HeapTuple	heapTuple;
+	TableScanDesc scan;
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
 	ExprState  *predicate;
@@ -2990,8 +2992,7 @@ IndexCheckExclusion(Relation heapRelation,
 	 */
 	estate = CreateExecutorState();
 	econtext = GetPerTupleExprContext(estate);
-	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
-									&TTSOpsHeapTuple);
+	slot = table_slot_create(heapRelation, NULL);
 
 	/* Arrange for econtext's scan tuple to be the tuple under test */
 	econtext->ecxt_scantuple = slot;
@@ -3003,22 +3004,17 @@ IndexCheckExclusion(Relation heapRelation,
 	 * Scan all live tuples in the base relation.
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = heap_beginscan_strat(heapRelation,	/* relation */
-								snapshot,	/* snapshot */
-								0,	/* number of keys */
-								NULL,	/* scan key */
-								true,	/* buffer access strategy OK */
-								true);	/* syncscan OK */
+	scan = table_beginscan_strat(heapRelation,	/* relation */
+								 snapshot,	/* snapshot */
+								 0, /* number of keys */
+								 NULL,	/* scan key */
+								 true,	/* buffer access strategy OK */
+								 true); /* syncscan OK */
 
-	while ((heapTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 	{
 		CHECK_FOR_INTERRUPTS();
 
-		MemoryContextReset(econtext->ecxt_per_tuple_memory);
-
-		/* Set up for predicate or expression evaluation */
-		ExecStoreHeapTuple(heapTuple, slot, false);
-
 		/*
 		 * In a partial index, ignore tuples that don't satisfy the predicate.
 		 */
@@ -3042,11 +3038,13 @@ IndexCheckExclusion(Relation heapRelation,
 		 */
 		check_exclusion_constraint(heapRelation,
 								   indexRelation, indexInfo,
-								   &(heapTuple->t_self), values, isnull,
+								   &(slot->tts_tid), values, isnull,
 								   estate, true);
+
+		MemoryContextReset(econtext->ecxt_per_tuple_memory);
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
 	UnregisterSnapshot(snapshot);
 
 	ExecDropSingleTupleTableSlot(slot);
@@ -3281,7 +3279,8 @@ validate_index_heapscan(Relation heapRelation,
 						Snapshot snapshot,
 						v_i_state *state)
 {
-	HeapScanDesc scan;
+	TableScanDesc scan;
+	HeapScanDesc hscan;
 	HeapTuple	heapTuple;
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
@@ -3324,12 +3323,13 @@ validate_index_heapscan(Relation heapRelation,
 	 * here, because it's critical that we read from block zero forward to
 	 * match the sorted TIDs.
 	 */
-	scan = heap_beginscan_strat(heapRelation,	/* relation */
-								snapshot,	/* snapshot */
-								0,	/* number of keys */
-								NULL,	/* scan key */
-								true,	/* buffer access strategy OK */
-								false); /* syncscan not OK */
+	scan = table_beginscan_strat(heapRelation,	/* relation */
+								 snapshot,	/* snapshot */
+								 0,	/* number of keys */
+								 NULL,	/* scan key */
+								 true,	/* buffer access strategy OK */
+								 false); /* syncscan not OK */
+	hscan = (HeapScanDesc) scan;
 
 	/*
 	 * Scan all tuples matching the snapshot.
@@ -3358,17 +3358,17 @@ validate_index_heapscan(Relation heapRelation,
 		 * already-passed-over tuplesort output TIDs of the current page. We
 		 * clear that array here, when advancing onto a new heap page.
 		 */
-		if (scan->rs_cblock != root_blkno)
+		if (hscan->rs_cblock != root_blkno)
 		{
-			Page		page = BufferGetPage(scan->rs_cbuf);
+			Page		page = BufferGetPage(hscan->rs_cbuf);
 
-			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
 			heap_get_root_tuples(page, root_offsets);
-			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 
 			memset(in_index, 0, sizeof(in_index));
 
-			root_blkno = scan->rs_cblock;
+			root_blkno = hscan->rs_cblock;
 		}
 
 		/* Convert actual tuple TID to root TID */
@@ -3493,7 +3493,7 @@ validate_index_heapscan(Relation heapRelation,
 		}
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
 
 	ExecDropSingleTupleTableSlot(slot);
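
The IndexCheckExclusion conversion above boils down to an idiom that should
eventually replace the old heap_getnext() loops throughout; distilled (rel and
snapshot assumed):

/* sketch; needs access/tableam.h */
TupleTableSlot *slot = table_slot_create(rel, NULL);
TableScanDesc scan = table_beginscan(rel, snapshot, 0, NULL);

while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
	/* the slot owns pin/memory; slot->tts_tid carries the tuple's TID */
}

table_endscan(scan);
ExecDropSingleTupleTableSlot(slot);
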
 
diff --git a/src/backend/catalog/pg_conversion.c b/src/backend/catalog/pg_conversion.c
index a3bd8c2c152..b7c7e5e1a74 100644
--- a/src/backend/catalog/pg_conversion.c
+++ b/src/backend/catalog/pg_conversion.c
@@ -16,6 +16,7 @@
 
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/sysattr.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
@@ -152,7 +153,7 @@ RemoveConversionById(Oid conversionOid)
 {
 	Relation	rel;
 	HeapTuple	tuple;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	ScanKeyData scanKeyData;
 
 	ScanKeyInit(&scanKeyData,
@@ -163,14 +164,14 @@ RemoveConversionById(Oid conversionOid)
 	/* open pg_conversion */
 	rel = table_open(ConversionRelationId, RowExclusiveLock);
 
-	scan = heap_beginscan_catalog(rel, 1, &scanKeyData);
+	scan = table_beginscan_catalog(rel, 1, &scanKeyData);
 
 	/* search for the target tuple */
 	if (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
 		CatalogTupleDelete(rel, &tuple->t_self);
 	else
 		elog(ERROR, "could not find tuple for conversion %u", conversionOid);
-	heap_endscan(scan);
+	table_endscan(scan);
 	table_close(rel, RowExclusiveLock);
 }
 
diff --git a/src/backend/catalog/pg_db_role_setting.c b/src/backend/catalog/pg_db_role_setting.c
index 5189c6f7a5f..20acac2eea9 100644
--- a/src/backend/catalog/pg_db_role_setting.c
+++ b/src/backend/catalog/pg_db_role_setting.c
@@ -13,6 +13,7 @@
 #include "access/genam.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "catalog/indexing.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_db_role_setting.h"
@@ -169,7 +170,7 @@ void
 DropSetting(Oid databaseid, Oid roleid)
 {
 	Relation	relsetting;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	ScanKeyData keys[2];
 	HeapTuple	tup;
 	int			numkeys = 0;
@@ -195,12 +196,12 @@ DropSetting(Oid databaseid, Oid roleid)
 		numkeys++;
 	}
 
-	scan = heap_beginscan_catalog(relsetting, numkeys, keys);
+	scan = table_beginscan_catalog(relsetting, numkeys, keys);
 	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
 	{
 		CatalogTupleDelete(relsetting, &tup->t_self);
 	}
-	heap_endscan(scan);
+	table_endscan(scan);
 
 	table_close(relsetting, RowExclusiveLock);
 }
diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 96f92750728..1c322655e45 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -21,6 +21,7 @@
 #include "access/hash.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 
 #include "catalog/catalog.h"
@@ -329,7 +330,7 @@ GetAllTablesPublicationRelations(void)
 {
 	Relation	classRel;
 	ScanKeyData key[1];
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	HeapTuple	tuple;
 	List	   *result = NIL;
 
@@ -340,7 +341,7 @@ GetAllTablesPublicationRelations(void)
 				BTEqualStrategyNumber, F_CHAREQ,
 				CharGetDatum(RELKIND_RELATION));
 
-	scan = heap_beginscan_catalog(classRel, 1, key);
+	scan = table_beginscan_catalog(classRel, 1, key);
 
 	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
@@ -351,7 +352,7 @@ GetAllTablesPublicationRelations(void)
 			result = lappend_oid(result, relid);
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
 	table_close(classRel, AccessShareLock);
 
 	return result;
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 935d7670e42..afee2838cc2 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -19,6 +19,7 @@
 #include "access/genam.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 
 #include "catalog/indexing.h"
@@ -390,7 +391,7 @@ void
 RemoveSubscriptionRel(Oid subid, Oid relid)
 {
 	Relation	rel;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	ScanKeyData skey[2];
 	HeapTuple	tup;
 	int			nkeys = 0;
@@ -416,12 +417,12 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
 	}
 
 	/* Do the search and delete what we found. */
-	scan = heap_beginscan_catalog(rel, nkeys, skey);
+	scan = table_beginscan_catalog(rel, nkeys, skey);
 	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
 	{
 		CatalogTupleDelete(rel, &tup->t_self);
 	}
-	heap_endscan(scan);
+	table_endscan(scan);
 
 	table_close(rel, RowExclusiveLock);
 }
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 4d6453d9241..3e2a807640f 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -22,6 +22,7 @@
 #include "access/multixact.h"
 #include "access/relscan.h"
 #include "access/rewriteheap.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/tuptoaster.h"
 #include "access/xact.h"
@@ -764,6 +765,7 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	Datum	   *values;
 	bool	   *isnull;
 	IndexScanDesc indexScan;
+	TableScanDesc tableScan;
 	HeapScanDesc heapScan;
 	bool		use_wal;
 	bool		is_system_catalog;
@@ -779,6 +781,8 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	BlockNumber num_pages;
 	int			elevel = verbose ? INFO : DEBUG2;
 	PGRUsage	ru0;
+	TupleTableSlot *slot;
+	BufferHeapTupleTableSlot *hslot;
 
 	pg_rusage_init(&ru0);
 
@@ -924,16 +928,21 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	 */
 	if (OldIndex != NULL && !use_sort)
 	{
+		tableScan = NULL;
 		heapScan = NULL;
 		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
 	{
-		heapScan = heap_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 	}
 
+	slot = table_slot_create(OldHeap, NULL);
+	hslot = (BufferHeapTupleTableSlot *) slot;
+
 	/* Log what we're doing */
 	if (indexScan != NULL)
 		ereport(elevel,
@@ -968,19 +977,19 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 
 		if (indexScan != NULL)
 		{
-			tuple = index_getnext(indexScan, ForwardScanDirection);
-			if (tuple == NULL)
+			if (!index_getnext_slot(indexScan, ForwardScanDirection, slot))
 				break;
 
 			/* Since we used no scan keys, should never need to recheck */
 			if (indexScan->xs_recheck)
 				elog(ERROR, "CLUSTER does not support lossy index conditions");
 
-			buf = indexScan->xs_cbuf;
+			tuple = hslot->base.tuple;
+			buf = hslot->buffer;
 		}
 		else
 		{
-			tuple = heap_getnext(heapScan, ForwardScanDirection);
+			tuple = heap_getnext(tableScan, ForwardScanDirection);
 			if (tuple == NULL)
 				break;
 
@@ -1066,7 +1075,9 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	if (indexScan != NULL)
 		index_endscan(indexScan);
 	if (heapScan != NULL)
-		heap_endscan(heapScan);
+		table_endscan(tableScan);
+	if (slot)
+		ExecDropSingleTupleTableSlot(slot);
 
 	/*
 	 * In scan-and-sort mode, complete the sort, then read out all live tuples
@@ -1694,7 +1705,7 @@ static List *
 get_tables_to_cluster(MemoryContext cluster_context)
 {
 	Relation	indRelation;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	ScanKeyData entry;
 	HeapTuple	indexTuple;
 	Form_pg_index index;
@@ -1713,7 +1724,7 @@ get_tables_to_cluster(MemoryContext cluster_context)
 				Anum_pg_index_indisclustered,
 				BTEqualStrategyNumber, F_BOOLEQ,
 				BoolGetDatum(true));
-	scan = heap_beginscan_catalog(indRelation, 1, &entry);
+	scan = table_beginscan_catalog(indRelation, 1, &entry);
 	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
 		index = (Form_pg_index) GETSTRUCT(indexTuple);
@@ -1734,7 +1745,7 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 		MemoryContextSwitchTo(old_context);
 	}
-	heap_endscan(scan);
+	table_endscan(scan);
 
 	relation_close(indRelation, AccessShareLock);
 
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index f9ada29af84..b285bc9fe5e 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -15,6 +15,7 @@
 
 #include "access/genam.h"
 #include "access/heapam.h"
+#include "access/tableam.h"
 #include "catalog/index.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
@@ -41,7 +42,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 {
 	TriggerData *trigdata = castNode(TriggerData, fcinfo->context);
 	const char *funcname = "unique_key_recheck";
-	HeapTuple	new_row;
+	ItemPointerData checktid;
 	ItemPointerData tmptid;
 	Relation	indexRel;
 	IndexInfo  *indexInfo;
@@ -73,28 +74,30 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 * Get the new data that was inserted/updated.
 	 */
 	if (TRIGGER_FIRED_BY_INSERT(trigdata->tg_event))
-		new_row = trigdata->tg_trigtuple;
+		checktid = trigdata->tg_trigslot->tts_tid;
 	else if (TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
-		new_row = trigdata->tg_newtuple;
+		checktid = trigdata->tg_newslot->tts_tid;
 	else
 	{
 		ereport(ERROR,
 				(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
 				 errmsg("function \"%s\" must be fired for INSERT or UPDATE",
 						funcname)));
-		new_row = NULL;			/* keep compiler quiet */
+		ItemPointerSetInvalid(&checktid);		/* keep compiler quiet */
 	}
 
+	slot = table_slot_create(trigdata->tg_relation, NULL);
+
 	/*
-	 * If the new_row is now dead (ie, inserted and then deleted within our
-	 * transaction), we can skip the check.  However, we have to be careful,
-	 * because this trigger gets queued only in response to index insertions;
-	 * which means it does not get queued for HOT updates.  The row we are
-	 * called for might now be dead, but have a live HOT child, in which case
-	 * we still need to make the check --- effectively, we're applying the
-	 * check against the live child row, although we can use the values from
-	 * this row since by definition all columns of interest to us are the
-	 * same.
+	 * If the row pointed at by checktid is now dead (ie, inserted and then
+	 * deleted within our transaction), we can skip the check.  However, we
+	 * have to be careful, because this trigger gets queued only in response
+	 * to index insertions; which means it does not get queued for HOT
+	 * updates.  The row we are called for might now be dead, but have a live
+	 * HOT child, in which case we still need to make the check ---
+	 * effectively, we're applying the check against the live child row,
+	 * although we can use the values from this row since by definition all
+	 * columns of interest to us are the same.
 	 *
 	 * This might look like just an optimization, because the index AM will
 	 * make this identical test before throwing an error.  But it's actually
@@ -103,13 +106,22 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 * it's possible the index entry has also been marked dead, and even
 	 * removed.
 	 */
-	tmptid = new_row->t_self;
-	if (!heap_hot_search(&tmptid, trigdata->tg_relation, SnapshotSelf, NULL))
+	tmptid = checktid;
 	{
-		/*
-		 * All rows in the HOT chain are dead, so skip the check.
-		 */
-		return PointerGetDatum(NULL);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		bool call_again = false;
+
+		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
+									 &call_again, NULL))
+		{
+			/*
+			 * All rows referenced by the index are dead, so skip the check.
+			 */
+			ExecDropSingleTupleTableSlot(slot);
+			table_index_fetch_end(scan);
+			return PointerGetDatum(NULL);
+		}
+		table_index_fetch_end(scan);
 	}
 
 	/*
@@ -121,14 +133,6 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 						  RowExclusiveLock);
 	indexInfo = BuildIndexInfo(indexRel);
 
-	/*
-	 * The heap tuple must be put into a slot for FormIndexDatum.
-	 */
-	slot = MakeSingleTupleTableSlot(RelationGetDescr(trigdata->tg_relation),
-									&TTSOpsHeapTuple);
-
-	ExecStoreHeapTuple(new_row, slot, false);
-
 	/*
 	 * Typically the index won't have expressions, but if it does we need an
 	 * EState to evaluate them.  We need it for exclusion constraints too,
@@ -163,11 +167,12 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	{
 		/*
 		 * Note: this is not a real insert; it is a check that the index entry
-		 * that has already been inserted is unique.  Passing t_self is
-		 * correct even if t_self is now dead, because that is the TID the
-		 * index will know about.
+		 * that has already been inserted is unique.  Passing the tuple's tid
+		 * (i.e. unmodified by table_index_fetch_tuple()) is correct even if
+		 * the row is now dead, because that is the TID the index will know
+		 * about.
 		 */
-		index_insert(indexRel, values, isnull, &(new_row->t_self),
+		index_insert(indexRel, values, isnull, &checktid,
 					 trigdata->tg_relation, UNIQUE_CHECK_EXISTING,
 					 indexInfo);
 	}
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 12415b4e99f..f2731b40757 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -20,6 +20,7 @@
 
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/sysattr.h"
 #include "access/xact.h"
 #include "access/xlog.h"
@@ -2073,13 +2074,13 @@ CopyTo(CopyState cstate)
 	{
 		Datum	   *values;
 		bool	   *nulls;
-		HeapScanDesc scandesc;
+		TableScanDesc scandesc;
 		HeapTuple	tuple;
 
 		values = (Datum *) palloc(num_phys_attrs * sizeof(Datum));
 		nulls = (bool *) palloc(num_phys_attrs * sizeof(bool));
 
-		scandesc = heap_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
+		scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
 
 		processed = 0;
 		while ((tuple = heap_getnext(scandesc, ForwardScanDirection)) != NULL)
@@ -2094,7 +2095,7 @@ CopyTo(CopyState cstate)
 			processed++;
 		}
 
-		heap_endscan(scandesc);
+		table_endscan(scandesc);
 
 		pfree(values);
 		pfree(nulls);
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index d207cd899f8..35cad0b6294 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -26,6 +26,7 @@
 #include "access/genam.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
@@ -97,7 +98,7 @@ static int	errdetail_busy_db(int notherbackends, int npreparedxacts);
 Oid
 createdb(ParseState *pstate, const CreatedbStmt *stmt)
 {
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	Relation	rel;
 	Oid			src_dboid;
 	Oid			src_owner;
@@ -589,7 +590,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		 * each one to the new database.
 		 */
 		rel = table_open(TableSpaceRelationId, AccessShareLock);
-		scan = heap_beginscan_catalog(rel, 0, NULL);
+		scan = table_beginscan_catalog(rel, 0, NULL);
 		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 		{
 			Form_pg_tablespace spaceform = (Form_pg_tablespace) GETSTRUCT(tuple);
@@ -643,7 +644,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 								  XLOG_DBASE_CREATE | XLR_SPECIAL_REL_UPDATE);
 			}
 		}
-		heap_endscan(scan);
+		table_endscan(scan);
 		table_close(rel, AccessShareLock);
 
 		/*
@@ -1870,11 +1871,11 @@ static void
 remove_dbtablespaces(Oid db_id)
 {
 	Relation	rel;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	HeapTuple	tuple;
 
 	rel = table_open(TableSpaceRelationId, AccessShareLock);
-	scan = heap_beginscan_catalog(rel, 0, NULL);
+	scan = table_beginscan_catalog(rel, 0, NULL);
 	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
 		Form_pg_tablespace spcform = (Form_pg_tablespace) GETSTRUCT(tuple);
@@ -1917,7 +1918,7 @@ remove_dbtablespaces(Oid db_id)
 		pfree(dstpath);
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
 	table_close(rel, AccessShareLock);
 }
 
@@ -1938,11 +1939,11 @@ check_db_file_conflict(Oid db_id)
 {
 	bool		result = false;
 	Relation	rel;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	HeapTuple	tuple;
 
 	rel = table_open(TableSpaceRelationId, AccessShareLock);
-	scan = heap_beginscan_catalog(rel, 0, NULL);
+	scan = table_beginscan_catalog(rel, 0, NULL);
 	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
 		Form_pg_tablespace spcform = (Form_pg_tablespace) GETSTRUCT(tuple);
@@ -1967,7 +1968,7 @@ check_db_file_conflict(Oid db_id)
 		pfree(dstpath);
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
 	table_close(rel, AccessShareLock);
 
 	return result;
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 5dcedc337aa..7cf18377156 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -19,6 +19,7 @@
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/reloptions.h"
+#include "access/tableam.h"
 #include "access/sysattr.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
@@ -2336,7 +2337,7 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
 {
 	Oid			objectOid;
 	Relation	relationRelation;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	ScanKeyData scan_keys[1];
 	HeapTuple	tuple;
 	MemoryContext private_context;
@@ -2410,7 +2411,7 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
 	 * rels will be processed indirectly by reindex_relation).
 	 */
 	relationRelation = table_open(RelationRelationId, AccessShareLock);
-	scan = heap_beginscan_catalog(relationRelation, num_keys, scan_keys);
+	scan = table_beginscan_catalog(relationRelation, num_keys, scan_keys);
 	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
 		Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple);
@@ -2469,7 +2470,7 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
 
 		MemoryContextSwitchTo(old);
 	}
-	heap_endscan(scan);
+	table_endscan(scan);
 	table_close(relationRelation, AccessShareLock);
 
 	/* Now reindex each rel in a separate transaction */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 93f13a4778c..40839e14dbe 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -4735,12 +4735,9 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	if (newrel || needscan)
 	{
 		ExprContext *econtext;
-		Datum	   *values;
-		bool	   *isnull;
 		TupleTableSlot *oldslot;
 		TupleTableSlot *newslot;
-		HeapScanDesc scan;
-		HeapTuple	tuple;
+		TableScanDesc scan;
 		MemoryContext oldCxt;
 		List	   *dropped_attrs = NIL;
 		ListCell   *lc;
@@ -4768,19 +4765,27 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 		econtext = GetPerTupleExprContext(estate);
 
 		/*
-		 * Make tuple slots for old and new tuples.  Note that even when the
-		 * tuples are the same, the tupDescs might not be (consider ADD COLUMN
-		 * without a default).
+		 * Create necessary tuple slots. When rewriting, two slots are needed,
+		 * otherwise one suffices. In the case where one slot suffices, we
+		 * need to use the new tuple descriptor, otherwise some constraints
+		 * can't be evaluated.  Note that even when the tuple layout is the
+		 * same and no rewrite is required, the tupDescs might not be
+		 * (consider ADD COLUMN without a default).
 		 */
-		oldslot = MakeSingleTupleTableSlot(oldTupDesc, &TTSOpsHeapTuple);
-		newslot = MakeSingleTupleTableSlot(newTupDesc, &TTSOpsHeapTuple);
-
-		/* Preallocate values/isnull arrays */
-		i = Max(newTupDesc->natts, oldTupDesc->natts);
-		values = (Datum *) palloc(i * sizeof(Datum));
-		isnull = (bool *) palloc(i * sizeof(bool));
-		memset(values, 0, i * sizeof(Datum));
-		memset(isnull, true, i * sizeof(bool));
+		if (tab->rewrite)
+		{
+			Assert(newrel != NULL);
+			oldslot = MakeSingleTupleTableSlot(oldTupDesc,
+											   table_slot_callbacks(oldrel));
+			newslot = MakeSingleTupleTableSlot(newTupDesc,
+											   table_slot_callbacks(newrel));
+		}
+		else
+		{
+			oldslot = MakeSingleTupleTableSlot(newTupDesc,
+											   table_slot_callbacks(oldrel));
+			newslot = NULL;
+		}
 
 		/*
 		 * Any attributes that are dropped according to the new tuple
@@ -4798,7 +4803,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = heap_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -4806,55 +4811,69 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 		 */
 		oldCxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
 
-		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		while (table_scan_getnextslot(scan, ForwardScanDirection, oldslot))
 		{
+			TupleTableSlot *insertslot;
+
 			if (tab->rewrite > 0)
 			{
 				/* Extract data from old tuple */
-				heap_deform_tuple(tuple, oldTupDesc, values, isnull);
+				slot_getallattrs(oldslot);
+				ExecClearTuple(newslot);
+
+				/* copy attributes */
+				memcpy(newslot->tts_values, oldslot->tts_values,
+					   sizeof(Datum) * oldslot->tts_nvalid);
+				memcpy(newslot->tts_isnull, oldslot->tts_isnull,
+					   sizeof(bool) * oldslot->tts_nvalid);
 
 				/* Set dropped attributes to null in new tuple */
 				foreach(lc, dropped_attrs)
-					isnull[lfirst_int(lc)] = true;
+					newslot->tts_isnull[lfirst_int(lc)] = true;
 
 				/*
 				 * Process supplied expressions to replace selected columns.
 				 * Expression inputs come from the old tuple.
 				 */
-				ExecStoreHeapTuple(tuple, oldslot, false);
 				econtext->ecxt_scantuple = oldslot;
 
 				foreach(l, tab->newvals)
 				{
 					NewColumnValue *ex = lfirst(l);
 
-					values[ex->attnum - 1] = ExecEvalExpr(ex->exprstate,
-														  econtext,
-														  &isnull[ex->attnum - 1]);
+					newslot->tts_values[ex->attnum - 1]
+						= ExecEvalExpr(ex->exprstate,
+									   econtext,
+									   &newslot->tts_isnull[ex->attnum - 1]);
 				}
 
-				/*
-				 * Form the new tuple. Note that we don't explicitly pfree it,
-				 * since the per-tuple memory context will be reset shortly.
-				 */
-				tuple = heap_form_tuple(newTupDesc, values, isnull);
+				ExecStoreVirtualTuple(newslot);
 
 				/*
 				 * Constraints might reference the tableoid column, so
 				 * initialize t_tableOid before evaluating them.
 				 */
-				tuple->t_tableOid = RelationGetRelid(oldrel);
+				newslot->tts_tableOid = RelationGetRelid(oldrel);
+				insertslot = newslot;
+			}
+			else
+			{
+				/*
+				 * If there's no rewrite, old and new table are guaranteed to
+				 * have the same AM, so we can just use the old slot to
+				 * verify new constraints etc.
+				 */
+				insertslot = oldslot;
 			}
 
 			/* Now check any constraints on the possibly-changed tuple */
-			ExecStoreHeapTuple(tuple, newslot, false);
-			econtext->ecxt_scantuple = newslot;
+			econtext->ecxt_scantuple = insertslot;
 
 			foreach(l, notnull_attrs)
 			{
 				int			attn = lfirst_int(l);
 
-				if (heap_attisnull(tuple, attn + 1, newTupDesc))
+				if (slot_attisnull(insertslot, attn + 1))
 				{
 					Form_pg_attribute attr = TupleDescAttr(newTupDesc, attn);
 
@@ -4904,6 +4923,9 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 			/* Write the tuple out to the new relation */
 			if (newrel)
 			{
+				HeapTuple	tuple;
+
+				tuple = ExecFetchSlotHeapTuple(newslot, true, NULL);
 				heap_insert(newrel, tuple, mycid, hi_options, bistate);
 				ItemPointerCopy(&tuple->t_self, &newslot->tts_tid);
 			}
@@ -4914,11 +4936,12 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 		}
 
 		MemoryContextSwitchTo(oldCxt);
-		heap_endscan(scan);
+		table_endscan(scan);
 		UnregisterSnapshot(snapshot);
 
 		ExecDropSingleTupleTableSlot(oldslot);
-		ExecDropSingleTupleTableSlot(newslot);
+		if (newslot)
+			ExecDropSingleTupleTableSlot(newslot);
 	}
 
 	FreeExecutorState(estate);
@@ -5309,7 +5332,7 @@ find_typed_table_dependencies(Oid typeOid, const char *typeName, DropBehavior be
 {
 	Relation	classRel;
 	ScanKeyData key[1];
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	HeapTuple	tuple;
 	List	   *result = NIL;
 
@@ -5320,7 +5343,7 @@ find_typed_table_dependencies(Oid typeOid, const char *typeName, DropBehavior be
 				BTEqualStrategyNumber, F_OIDEQ,
 				ObjectIdGetDatum(typeOid));
 
-	scan = heap_beginscan_catalog(classRel, 1, key);
+	scan = table_beginscan_catalog(classRel, 1, key);
 
 	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
@@ -5336,7 +5359,7 @@ find_typed_table_dependencies(Oid typeOid, const char *typeName, DropBehavior be
 			result = lappend_oid(result, classform->oid);
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
 	table_close(classRel, AccessShareLock);
 
 	return result;
@@ -8821,9 +8844,7 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup)
 	char	   *conbin;
 	Expr	   *origexpr;
 	ExprState  *exprstate;
-	TupleDesc	tupdesc;
-	HeapScanDesc scan;
-	HeapTuple	tuple;
+	TableScanDesc scan;
 	ExprContext *econtext;
 	MemoryContext oldcxt;
 	TupleTableSlot *slot;
@@ -8858,12 +8879,11 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup)
 	exprstate = ExecPrepareExpr(origexpr, estate);
 
 	econtext = GetPerTupleExprContext(estate);
-	tupdesc = RelationGetDescr(rel);
-	slot = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
+	slot = table_slot_create(rel, NULL);
 	econtext->ecxt_scantuple = slot;
 
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = heap_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
@@ -8871,10 +8891,8 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup)
 	 */
 	oldcxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
 
-	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 	{
-		ExecStoreHeapTuple(tuple, slot, false);
-
 		if (!ExecCheck(exprstate, econtext))
 			ereport(ERROR,
 					(errcode(ERRCODE_CHECK_VIOLATION),
@@ -8886,7 +8904,7 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup)
 	}
 
 	MemoryContextSwitchTo(oldcxt);
-	heap_endscan(scan);
+	table_endscan(scan);
 	UnregisterSnapshot(snapshot);
 	ExecDropSingleTupleTableSlot(slot);
 	FreeExecutorState(estate);
@@ -8905,8 +8923,8 @@ validateForeignKeyConstraint(char *conname,
 							 Oid pkindOid,
 							 Oid constraintOid)
 {
-	HeapScanDesc scan;
-	HeapTuple	tuple;
+	TupleTableSlot *slot;
+	TableScanDesc scan;
 	Trigger		trig;
 	Snapshot	snapshot;
 
@@ -8941,9 +8959,10 @@ validateForeignKeyConstraint(char *conname,
 	 * ereport(ERROR) and that's that.
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = heap_beginscan(rel, snapshot, 0, NULL);
+	slot = table_slot_create(rel, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL);
 
-	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 	{
 		LOCAL_FCINFO(fcinfo, 0);
 		TriggerData trigdata;
@@ -8961,7 +8980,8 @@ validateForeignKeyConstraint(char *conname,
 		trigdata.type = T_TriggerData;
 		trigdata.tg_event = TRIGGER_EVENT_INSERT | TRIGGER_EVENT_ROW;
 		trigdata.tg_relation = rel;
-		trigdata.tg_trigtuple = tuple;
+		trigdata.tg_trigtuple = ExecFetchSlotHeapTuple(slot, true, NULL);
+		trigdata.tg_trigslot = slot;
 		trigdata.tg_newtuple = NULL;
 		trigdata.tg_trigger = &trig;
 
@@ -8970,8 +8990,9 @@ validateForeignKeyConstraint(char *conname,
 		RI_FKey_check_ins(fcinfo);
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
 	UnregisterSnapshot(snapshot);
+	ExecDropSingleTupleTableSlot(slot);
 }
 
 static void
@@ -11596,7 +11617,7 @@ AlterTableMoveAll(AlterTableMoveAllStmt *stmt)
 	ListCell   *l;
 	ScanKeyData key[1];
 	Relation	rel;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	HeapTuple	tuple;
 	Oid			orig_tablespaceoid;
 	Oid			new_tablespaceoid;
@@ -11661,7 +11682,7 @@ AlterTableMoveAll(AlterTableMoveAllStmt *stmt)
 				ObjectIdGetDatum(orig_tablespaceoid));
 
 	rel = table_open(RelationRelationId, AccessShareLock);
-	scan = heap_beginscan_catalog(rel, 1, key);
+	scan = table_beginscan_catalog(rel, 1, key);
 	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
 		Form_pg_class relForm = (Form_pg_class) GETSTRUCT(tuple);
@@ -11720,7 +11741,7 @@ AlterTableMoveAll(AlterTableMoveAllStmt *stmt)
 		relations = lappend_oid(relations, relOid);
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
 	table_close(rel, AccessShareLock);
 
 	if (relations == NIL)
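
The validateCheckConstraint() and validateForeignKeyConstraint() hunks
above both follow the generic slot-based scan loop this patch uses
wherever a full-table scan is needed; in sketch form:

    TupleTableSlot *slot = table_slot_create(rel, NULL);
    TableScanDesc scan = table_beginscan(rel, snapshot, 0, NULL);

    while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
    {
        /* attributes are deformed lazily; force the ones we need */
        slot_getallattrs(slot);
        /* ... use slot->tts_values[] / slot->tts_isnull[] ... */
    }

    table_endscan(scan);
    ExecDropSingleTupleTableSlot(slot);

No buffer, pin, or raw HeapTuple is visible to the caller anymore; all of
that stays inside the AM.
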
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 4afd178e971..3784ea4b4fa 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -54,6 +54,7 @@
 #include "access/reloptions.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
@@ -405,7 +406,7 @@ DropTableSpace(DropTableSpaceStmt *stmt)
 {
 #ifdef HAVE_SYMLINK
 	char	   *tablespacename = stmt->tablespacename;
-	HeapScanDesc scandesc;
+	TableScanDesc scandesc;
 	Relation	rel;
 	HeapTuple	tuple;
 	Form_pg_tablespace spcform;
@@ -421,7 +422,7 @@ DropTableSpace(DropTableSpaceStmt *stmt)
 				Anum_pg_tablespace_spcname,
 				BTEqualStrategyNumber, F_NAMEEQ,
 				CStringGetDatum(tablespacename));
-	scandesc = heap_beginscan_catalog(rel, 1, entry);
+	scandesc = table_beginscan_catalog(rel, 1, entry);
 	tuple = heap_getnext(scandesc, ForwardScanDirection);
 
 	if (!HeapTupleIsValid(tuple))
@@ -439,7 +440,7 @@ DropTableSpace(DropTableSpaceStmt *stmt)
 					(errmsg("tablespace \"%s\" does not exist, skipping",
 							tablespacename)));
 			/* XXX I assume I need one or both of these next two calls */
-			heap_endscan(scandesc);
+			table_endscan(scandesc);
 			table_close(rel, NoLock);
 		}
 		return;
@@ -467,7 +468,7 @@ DropTableSpace(DropTableSpaceStmt *stmt)
 	 */
 	CatalogTupleDelete(rel, &tuple->t_self);
 
-	heap_endscan(scandesc);
+	table_endscan(scandesc);
 
 	/*
 	 * Remove any comments or security labels on this tablespace.
@@ -918,7 +919,7 @@ RenameTableSpace(const char *oldname, const char *newname)
 	Oid			tspId;
 	Relation	rel;
 	ScanKeyData entry[1];
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	HeapTuple	tup;
 	HeapTuple	newtuple;
 	Form_pg_tablespace newform;
@@ -931,7 +932,7 @@ RenameTableSpace(const char *oldname, const char *newname)
 				Anum_pg_tablespace_spcname,
 				BTEqualStrategyNumber, F_NAMEEQ,
 				CStringGetDatum(oldname));
-	scan = heap_beginscan_catalog(rel, 1, entry);
+	scan = table_beginscan_catalog(rel, 1, entry);
 	tup = heap_getnext(scan, ForwardScanDirection);
 	if (!HeapTupleIsValid(tup))
 		ereport(ERROR,
@@ -943,7 +944,7 @@ RenameTableSpace(const char *oldname, const char *newname)
 	newform = (Form_pg_tablespace) GETSTRUCT(newtuple);
 	tspId = newform->oid;
 
-	heap_endscan(scan);
+	table_endscan(scan);
 
 	/* Must be owner */
 	if (!pg_tablespace_ownercheck(tspId, GetUserId()))
@@ -961,7 +962,7 @@ RenameTableSpace(const char *oldname, const char *newname)
 				Anum_pg_tablespace_spcname,
 				BTEqualStrategyNumber, F_NAMEEQ,
 				CStringGetDatum(newname));
-	scan = heap_beginscan_catalog(rel, 1, entry);
+	scan = table_beginscan_catalog(rel, 1, entry);
 	tup = heap_getnext(scan, ForwardScanDirection);
 	if (HeapTupleIsValid(tup))
 		ereport(ERROR,
@@ -969,7 +970,7 @@ RenameTableSpace(const char *oldname, const char *newname)
 				 errmsg("tablespace \"%s\" already exists",
 						newname)));
 
-	heap_endscan(scan);
+	table_endscan(scan);
 
 	/* OK, update the entry */
 	namestrcpy(&(newform->spcname), newname);
@@ -993,7 +994,7 @@ AlterTableSpaceOptions(AlterTableSpaceOptionsStmt *stmt)
 {
 	Relation	rel;
 	ScanKeyData entry[1];
-	HeapScanDesc scandesc;
+	TableScanDesc scandesc;
 	HeapTuple	tup;
 	Oid			tablespaceoid;
 	Datum		datum;
@@ -1011,7 +1012,7 @@ AlterTableSpaceOptions(AlterTableSpaceOptionsStmt *stmt)
 				Anum_pg_tablespace_spcname,
 				BTEqualStrategyNumber, F_NAMEEQ,
 				CStringGetDatum(stmt->tablespacename));
-	scandesc = heap_beginscan_catalog(rel, 1, entry);
+	scandesc = table_beginscan_catalog(rel, 1, entry);
 	tup = heap_getnext(scandesc, ForwardScanDirection);
 	if (!HeapTupleIsValid(tup))
 		ereport(ERROR,
@@ -1053,7 +1054,7 @@ AlterTableSpaceOptions(AlterTableSpaceOptionsStmt *stmt)
 	heap_freetuple(newtuple);
 
 	/* Conclude heap scan. */
-	heap_endscan(scandesc);
+	table_endscan(scandesc);
 	table_close(rel, NoLock);
 
 	return tablespaceoid;
@@ -1387,7 +1388,7 @@ get_tablespace_oid(const char *tablespacename, bool missing_ok)
 {
 	Oid			result;
 	Relation	rel;
-	HeapScanDesc scandesc;
+	TableScanDesc scandesc;
 	HeapTuple	tuple;
 	ScanKeyData entry[1];
 
@@ -1402,7 +1403,7 @@ get_tablespace_oid(const char *tablespacename, bool missing_ok)
 				Anum_pg_tablespace_spcname,
 				BTEqualStrategyNumber, F_NAMEEQ,
 				CStringGetDatum(tablespacename));
-	scandesc = heap_beginscan_catalog(rel, 1, entry);
+	scandesc = table_beginscan_catalog(rel, 1, entry);
 	tuple = heap_getnext(scandesc, ForwardScanDirection);
 
 	/* We assume that there can be at most one matching tuple */
@@ -1411,7 +1412,7 @@ get_tablespace_oid(const char *tablespacename, bool missing_ok)
 	else
 		result = InvalidOid;
 
-	heap_endscan(scandesc);
+	table_endscan(scandesc);
 	table_close(rel, AccessShareLock);
 
 	if (!OidIsValid(result) && !missing_ok)
@@ -1433,7 +1434,7 @@ get_tablespace_name(Oid spc_oid)
 {
 	char	   *result;
 	Relation	rel;
-	HeapScanDesc scandesc;
+	TableScanDesc scandesc;
 	HeapTuple	tuple;
 	ScanKeyData entry[1];
 
@@ -1448,7 +1449,7 @@ get_tablespace_name(Oid spc_oid)
 				Anum_pg_tablespace_oid,
 				BTEqualStrategyNumber, F_OIDEQ,
 				ObjectIdGetDatum(spc_oid));
-	scandesc = heap_beginscan_catalog(rel, 1, entry);
+	scandesc = table_beginscan_catalog(rel, 1, entry);
 	tuple = heap_getnext(scandesc, ForwardScanDirection);
 
 	/* We assume that there can be at most one matching tuple */
@@ -1457,7 +1458,7 @@ get_tablespace_name(Oid spc_oid)
 	else
 		result = NULL;
 
-	heap_endscan(scandesc);
+	table_endscan(scandesc);
 	table_close(rel, AccessShareLock);
 
 	return result;
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 448926db125..f94248dc958 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -34,6 +34,7 @@
 #include "access/genam.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/catalog.h"
@@ -2362,14 +2363,15 @@ AlterDomainNotNull(List *names, bool notNull)
 			RelToCheck *rtc = (RelToCheck *) lfirst(rt);
 			Relation	testrel = rtc->rel;
 			TupleDesc	tupdesc = RelationGetDescr(testrel);
-			HeapScanDesc scan;
-			HeapTuple	tuple;
+			TupleTableSlot *slot;
+			TableScanDesc scan;
 			Snapshot	snapshot;
 
 			/* Scan all tuples in this relation */
 			snapshot = RegisterSnapshot(GetLatestSnapshot());
-			scan = heap_beginscan(testrel, snapshot, 0, NULL);
-			while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+			scan = table_beginscan(testrel, snapshot, 0, NULL);
+			slot = table_slot_create(testrel, NULL);
+			while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 			{
 				int			i;
 
@@ -2379,7 +2381,7 @@ AlterDomainNotNull(List *names, bool notNull)
 					int			attnum = rtc->atts[i];
 					Form_pg_attribute attr = TupleDescAttr(tupdesc, attnum - 1);
 
-					if (heap_attisnull(tuple, attnum, tupdesc))
+					if (slot_attisnull(slot, attnum))
 					{
 						/*
 						 * In principle the auxiliary information for this
@@ -2398,7 +2400,8 @@ AlterDomainNotNull(List *names, bool notNull)
 					}
 				}
 			}
-			heap_endscan(scan);
+			ExecDropSingleTupleTableSlot(slot);
+			table_endscan(scan);
 			UnregisterSnapshot(snapshot);
 
 			/* Close each rel after processing, but keep lock */
@@ -2776,14 +2779,15 @@ validateDomainConstraint(Oid domainoid, char *ccbin)
 		RelToCheck *rtc = (RelToCheck *) lfirst(rt);
 		Relation	testrel = rtc->rel;
 		TupleDesc	tupdesc = RelationGetDescr(testrel);
-		HeapScanDesc scan;
-		HeapTuple	tuple;
+		TupleTableSlot *slot;
+		TableScanDesc scan;
 		Snapshot	snapshot;
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = heap_beginscan(testrel, snapshot, 0, NULL);
-		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		slot = table_slot_create(testrel, NULL);
+		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
 			int			i;
 
@@ -2796,7 +2800,7 @@ validateDomainConstraint(Oid domainoid, char *ccbin)
 				Datum		conResult;
 				Form_pg_attribute attr = TupleDescAttr(tupdesc, attnum - 1);
 
-				d = heap_getattr(tuple, attnum, tupdesc, &isNull);
+				d = slot_getattr(slot, attnum, &isNull);
 
 				econtext->domainValue_datum = d;
 				econtext->domainValue_isNull = isNull;
@@ -2826,7 +2830,8 @@ validateDomainConstraint(Oid domainoid, char *ccbin)
 
 			ResetExprContext(econtext);
 		}
-		heap_endscan(scan);
+		ExecDropSingleTupleTableSlot(slot);
+		table_endscan(scan);
 		UnregisterSnapshot(snapshot);
 
 		/* Hold relation lock till commit (XXX bad for concurrency) */
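
Note that the typecmds.c loops only look at individual attributes, so they
use slot_attisnull()/slot_getattr() rather than slot_getallattrs(); those
deform the tuple incrementally, only up to the requested attribute. E.g.:

    bool        isNull;
    Datum       d;

    d = slot_getattr(slot, attnum, &isNull);    /* deforms up to attnum */
    if (slot_attisnull(slot, attnum))
        elog(ERROR, "unexpected NULL");         /* or whatever applies */
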
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index e91df2171e0..3763a8c39e0 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -28,6 +28,7 @@
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/multixact.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
 #include "catalog/namespace.h"
@@ -745,12 +746,12 @@ get_all_vacuum_rels(int options)
 {
 	List	   *vacrels = NIL;
 	Relation	pgclass;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	HeapTuple	tuple;
 
 	pgclass = table_open(RelationRelationId, AccessShareLock);
 
-	scan = heap_beginscan_catalog(pgclass, 0, NULL);
+	scan = table_beginscan_catalog(pgclass, 0, NULL);
 
 	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
@@ -784,7 +785,7 @@ get_all_vacuum_rels(int options)
 		MemoryContextSwitchTo(oldcontext);
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
 	table_close(pgclass, AccessShareLock);
 
 	return vacrels;
@@ -1381,7 +1382,7 @@ vac_truncate_clog(TransactionId frozenXID,
 {
 	TransactionId nextXID = ReadNewTransactionId();
 	Relation	relation;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	HeapTuple	tuple;
 	Oid			oldestxid_datoid;
 	Oid			minmulti_datoid;
@@ -1412,7 +1413,7 @@ vac_truncate_clog(TransactionId frozenXID,
 	 */
 	relation = table_open(DatabaseRelationId, AccessShareLock);
 
-	scan = heap_beginscan_catalog(relation, 0, NULL);
+	scan = table_beginscan_catalog(relation, 0, NULL);
 
 	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
@@ -1451,7 +1452,7 @@ vac_truncate_clog(TransactionId frozenXID,
 		}
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
 
 	table_close(relation, AccessShareLock);
 
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index fe99096efc2..fdb2c36246d 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -204,7 +204,7 @@ execCurrentOf(CurrentOfExpr *cexpr,
 			 */
 			IndexScanDesc scan = ((IndexOnlyScanState *) scanstate)->ioss_ScanDesc;
 
-			*current_tid = scan->xs_ctup.t_self;
+			*current_tid = scan->xs_heaptid;
 		}
 		else
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index fd0520105dc..e67dd6750c6 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -108,6 +108,7 @@
 
 #include "access/genam.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/index.h"
 #include "executor/executor.h"
@@ -651,7 +652,6 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 	Oid		   *index_collations = index->rd_indcollation;
 	int			indnkeyatts = IndexRelationGetNumberOfKeyAttributes(index);
 	IndexScanDesc index_scan;
-	HeapTuple	tup;
 	ScanKeyData scankeys[INDEX_MAX_KEYS];
 	SnapshotData DirtySnapshot;
 	int			i;
@@ -707,8 +707,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 	 * to this slot.  Be sure to save and restore caller's value for
 	 * scantuple.
 	 */
-	existing_slot = MakeSingleTupleTableSlot(RelationGetDescr(heap),
-											 &TTSOpsHeapTuple);
+	existing_slot = table_slot_create(heap, NULL);
 
 	econtext = GetPerTupleExprContext(estate);
 	save_scantuple = econtext->ecxt_scantuple;
@@ -724,11 +723,9 @@ retry:
 	index_scan = index_beginscan(heap, index, &DirtySnapshot, indnkeyatts, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
-	while ((tup = index_getnext(index_scan,
-								ForwardScanDirection)) != NULL)
+	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
 	{
 		TransactionId xwait;
-		ItemPointerData ctid_wait;
 		XLTW_Oper	reason_wait;
 		Datum		existing_values[INDEX_MAX_KEYS];
 		bool		existing_isnull[INDEX_MAX_KEYS];
@@ -739,7 +736,7 @@ retry:
 		 * Ignore the entry for the tuple we're trying to check.
 		 */
 		if (ItemPointerIsValid(tupleid) &&
-			ItemPointerEquals(tupleid, &tup->t_self))
+			ItemPointerEquals(tupleid, &existing_slot->tts_tid))
 		{
 			if (found_self)		/* should not happen */
 				elog(ERROR, "found self tuple multiple times in index \"%s\"",
@@ -752,7 +749,6 @@ retry:
 		 * Extract the index column values and isnull flags from the existing
 		 * tuple.
 		 */
-		ExecStoreHeapTuple(tup, existing_slot, false);
 		FormIndexDatum(indexInfo, existing_slot, estate,
 					   existing_values, existing_isnull);
 
@@ -787,7 +783,6 @@ retry:
 			  DirtySnapshot.speculativeToken &&
 			  TransactionIdPrecedes(GetCurrentTransactionId(), xwait))))
 		{
-			ctid_wait = tup->t_data->t_ctid;
 			reason_wait = indexInfo->ii_ExclusionOps ?
 				XLTW_RecheckExclusionConstr : XLTW_InsertIndex;
 			index_endscan(index_scan);
@@ -795,7 +790,8 @@ retry:
 				SpeculativeInsertionWait(DirtySnapshot.xmin,
 										 DirtySnapshot.speculativeToken);
 			else
-				XactLockTableWait(xwait, heap, &ctid_wait, reason_wait);
+				XactLockTableWait(xwait, heap,
+								  &existing_slot->tts_tid, reason_wait);
 			goto retry;
 		}
 
@@ -807,7 +803,7 @@ retry:
 		{
 			conflict = true;
 			if (conflictTid)
-				*conflictTid = tup->t_self;
+				*conflictTid = existing_slot->tts_tid;
 			break;
 		}
 
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 61be56fe0b7..5a9ffe59c47 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -39,6 +39,7 @@
 
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/sysattr.h"
 #include "access/transam.h"
 #include "access/xact.h"
@@ -2802,9 +2803,8 @@ EvalPlanQualSlot(EPQState *epqstate,
 		oldcontext = MemoryContextSwitchTo(epqstate->estate->es_query_cxt);
 
 		if (relation)
-			*slot = ExecAllocTableSlot(&epqstate->estate->es_tupleTable,
-									   RelationGetDescr(relation),
-									   &TTSOpsBufferHeapTuple);
+			*slot = table_slot_create(relation,
+									  &epqstate->estate->es_tupleTable);
 		else
 			*slot = ExecAllocTableSlot(&epqstate->estate->es_tupleTable,
 									   epqstate->origslot->tts_tupleDescriptor,
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index aaa81f0620e..4054f64c49d 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 
 #include "access/table.h"
+#include "access/tableam.h"
 #include "catalog/partition.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_type.h"
@@ -728,9 +729,11 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 		{
 			TupleConversionMap *map;
 			TupleDesc	leaf_desc;
+			const TupleTableSlotOps *tts_cb;
 
 			map = leaf_part_rri->ri_PartitionInfo->pi_RootToPartitionMap;
 			leaf_desc = RelationGetDescr(leaf_part_rri->ri_RelationDesc);
+			tts_cb = table_slot_callbacks(leaf_part_rri->ri_RelationDesc);
 
 			Assert(node->onConflictSet != NIL);
 			Assert(rootResultRelInfo->ri_onConflict != NULL);
@@ -745,7 +748,7 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 			leaf_part_rri->ri_onConflict->oc_Existing =
 				ExecInitExtraTupleSlot(mtstate->ps.state,
 									   leaf_desc,
-									   &TTSOpsBufferHeapTuple);
+									   tts_cb);
 
 			/*
 			 * If the partition's tuple descriptor matches exactly the root
@@ -920,8 +923,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 		 * end of the command.
 		 */
 		partrouteinfo->pi_PartitionTupleSlot =
-			ExecInitExtraTupleSlot(estate, RelationGetDescr(partrel),
-								   &TTSOpsHeapTuple);
+			table_slot_create(partrel, &estate->es_tupleTable);
 	}
 	else
 		partrouteinfo->pi_PartitionTupleSlot = NULL;
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 5c5aa96e7fb..95dfc4987de 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -17,6 +17,7 @@
 #include "access/genam.h"
 #include "access/heapam.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
 #include "commands/trigger.h"
@@ -118,7 +119,6 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 							 TupleTableSlot *searchslot,
 							 TupleTableSlot *outslot)
 {
-	HeapTuple	scantuple;
 	ScanKeyData skey[INDEX_MAX_KEYS];
 	IndexScanDesc scan;
 	SnapshotData snap;
@@ -144,10 +144,9 @@ retry:
 	index_rescan(scan, skey, IndexRelationGetNumberOfKeyAttributes(idxrel), NULL, 0);
 
 	/* Try to find the tuple */
-	if ((scantuple = index_getnext(scan, ForwardScanDirection)) != NULL)
+	if (index_getnext_slot(scan, ForwardScanDirection, outslot))
 	{
 		found = true;
-		ExecStoreHeapTuple(scantuple, outslot, false);
 		ExecMaterializeSlot(outslot);
 
 		xwait = TransactionIdIsValid(snap.xmin) ?
@@ -222,19 +221,21 @@ retry:
 }
 
 /*
- * Compare the tuple and slot and check if they have equal values.
+ * Compare the tuples in the slots by checking if they have equal values.
  */
 static bool
-tuple_equals_slot(TupleDesc desc, HeapTuple tup, TupleTableSlot *slot)
+tuples_equal(TupleTableSlot *slot1, TupleTableSlot *slot2)
 {
-	Datum		values[MaxTupleAttributeNumber];
-	bool		isnull[MaxTupleAttributeNumber];
-	int			attrnum;
+	int			attrnum;
 
-	heap_deform_tuple(tup, desc, values, isnull);
+	Assert(slot1->tts_tupleDescriptor->natts ==
+		   slot2->tts_tupleDescriptor->natts);
+
+	slot_getallattrs(slot1);
+	slot_getallattrs(slot2);
 
 	/* Check equality of the attributes. */
-	for (attrnum = 0; attrnum < desc->natts; attrnum++)
+	for (attrnum = 0; attrnum < slot1->tts_tupleDescriptor->natts; attrnum++)
 	{
 		Form_pg_attribute att;
 		TypeCacheEntry *typentry;
@@ -243,16 +244,16 @@ tuple_equals_slot(TupleDesc desc, HeapTuple tup, TupleTableSlot *slot)
 		 * If one value is NULL and other is not, then they are certainly not
 		 * equal
 		 */
-		if (isnull[attrnum] != slot->tts_isnull[attrnum])
+		if (slot1->tts_isnull[attrnum] != slot2->tts_isnull[attrnum])
 			return false;
 
 		/*
 		 * If both are NULL, they can be considered equal.
 		 */
-		if (isnull[attrnum])
+		if (slot1->tts_isnull[attrnum] || slot2->tts_isnull[attrnum])
 			continue;
 
-		att = TupleDescAttr(desc, attrnum);
+		att = TupleDescAttr(slot1->tts_tupleDescriptor, attrnum);
 
 		typentry = lookup_type_cache(att->atttypid, TYPECACHE_EQ_OPR_FINFO);
 		if (!OidIsValid(typentry->eq_opr_finfo.fn_oid))
@@ -262,8 +263,8 @@ tuple_equals_slot(TupleDesc desc, HeapTuple tup, TupleTableSlot *slot)
 							format_type_be(att->atttypid))));
 
 		if (!DatumGetBool(FunctionCall2(&typentry->eq_opr_finfo,
-										values[attrnum],
-										slot->tts_values[attrnum])))
+										slot1->tts_values[attrnum],
+										slot2->tts_values[attrnum])))
 			return false;
 	}
 
@@ -284,33 +285,33 @@ bool
 RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 						 TupleTableSlot *searchslot, TupleTableSlot *outslot)
 {
-	HeapTuple	scantuple;
-	HeapScanDesc scan;
+	TupleTableSlot *scanslot;
+	TableScanDesc scan;
 	SnapshotData snap;
 	TransactionId xwait;
 	bool		found;
-	TupleDesc	desc = RelationGetDescr(rel);
+	TupleDesc	desc PG_USED_FOR_ASSERTS_ONLY = RelationGetDescr(rel);
 
 	Assert(equalTupleDescs(desc, outslot->tts_tupleDescriptor));
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = heap_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL);
+	scanslot = table_slot_create(rel, NULL);
 
 retry:
 	found = false;
 
-	heap_rescan(scan, NULL);
+	table_rescan(scan, NULL);
 
 	/* Try to find the tuple */
-	while ((scantuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	while (table_scan_getnextslot(scan, ForwardScanDirection, scanslot))
 	{
-		if (!tuple_equals_slot(desc, scantuple, searchslot))
+		if (!tuples_equal(scanslot, searchslot))
 			continue;
 
 		found = true;
-		ExecStoreHeapTuple(scantuple, outslot, false);
-		ExecMaterializeSlot(outslot);
+		ExecCopySlot(outslot, scanslot);
 
 		xwait = TransactionIdIsValid(snap.xmin) ?
 			snap.xmin : snap.xmax;
@@ -375,7 +376,8 @@ retry:
 		}
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
+	ExecDropSingleTupleTableSlot(scanslot);
 
 	return found;
 }
@@ -458,11 +460,9 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	HeapTupleTableSlot *hsearchslot = (HeapTupleTableSlot *)searchslot;
-	HeapTupleTableSlot *hslot = (HeapTupleTableSlot *)slot;
 
-	/* We expect both searchslot and the slot to contain a heap tuple. */
+	/* We expect the searchslot to contain a heap tuple. */
 	Assert(TTS_IS_HEAPTUPLE(searchslot) || TTS_IS_BUFFERTUPLE(searchslot));
-	Assert(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot));
 
 	/* For now we support only tables. */
 	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
@@ -493,11 +493,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		tuple = ExecFetchSlotHeapTuple(slot, true, NULL);
 
 		/* OK, update the tuple and index entries for it */
-		simple_heap_update(rel, &hsearchslot->tuple->t_self, hslot->tuple);
-		ItemPointerCopy(&hslot->tuple->t_self, &slot->tts_tid);
+		simple_heap_update(rel, &hsearchslot->tuple->t_self, tuple);
+		ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
 
 		if (resultRelInfo->ri_NumIndices > 0 &&
-			!HeapTupleIsHeapOnly(hslot->tuple))
+			!HeapTupleIsHeapOnly(tuple))
 			recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
 												   estate, false, NULL,
 												   NIL);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 044d62a56e1..3b23de9fac5 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -48,6 +48,7 @@
 #include "access/parallel.h"
 #include "access/relscan.h"
 #include "access/table.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "executor/executor.h"
 #include "jit/jit.h"
@@ -1121,7 +1122,7 @@ ExecGetTriggerOldSlot(EState *estate, ResultRelInfo *relInfo)
 		relInfo->ri_TrigOldSlot =
 			ExecInitExtraTupleSlot(estate,
 								   RelationGetDescr(rel),
-								   &TTSOpsBufferHeapTuple);
+								   table_slot_callbacks(rel));
 
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -1143,7 +1144,7 @@ ExecGetTriggerNewSlot(EState *estate, ResultRelInfo *relInfo)
 		relInfo->ri_TrigNewSlot =
 			ExecInitExtraTupleSlot(estate,
 								   RelationGetDescr(rel),
-								   &TTSOpsBufferHeapTuple);
+								   table_slot_callbacks(rel));
 
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -1165,7 +1166,7 @@ ExecGetReturningSlot(EState *estate, ResultRelInfo *relInfo)
 		relInfo->ri_ReturningSlot =
 			ExecInitExtraTupleSlot(estate,
 								   RelationGetDescr(rel),
-								   &TTSOpsBufferHeapTuple);
+								   table_slot_callbacks(rel));
 
 		MemoryContextSwitchTo(oldcontext);
 	}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 5e74585d5e4..3a82857770c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -39,6 +39,7 @@
 
 #include "access/heapam.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/visibilitymap.h"
 #include "executor/execdebug.h"
@@ -61,7 +62,7 @@ static inline void BitmapAdjustPrefetchIterator(BitmapHeapScanState *node,
 							 TBMIterateResult *tbmres);
 static inline void BitmapAdjustPrefetchTarget(BitmapHeapScanState *node);
 static inline void BitmapPrefetch(BitmapHeapScanState *node,
-			   HeapScanDesc scan);
+			   TableScanDesc scan);
 static bool BitmapShouldInitializeSharedState(
 								  ParallelBitmapHeapState *pstate);
 
@@ -76,7 +77,8 @@ static TupleTableSlot *
 BitmapHeapNext(BitmapHeapScanState *node)
 {
 	ExprContext *econtext;
-	HeapScanDesc scan;
+	TableScanDesc scan;
+	HeapScanDesc hscan;
 	TIDBitmap  *tbm;
 	TBMIterator *tbmiterator = NULL;
 	TBMSharedIterator *shared_tbmiterator = NULL;
@@ -92,6 +94,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
 	econtext = node->ss.ps.ps_ExprContext;
 	slot = node->ss.ss_ScanTupleSlot;
 	scan = node->ss.ss_currentScanDesc;
+	hscan = (HeapScanDesc) scan;
 	tbm = node->tbm;
 	if (pstate == NULL)
 		tbmiterator = node->tbmiterator;
@@ -219,7 +222,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			 * least AccessShareLock on the table before performing any of the
 			 * indexscans, but let's be safe.)
 			 */
-			if (tbmres->blockno >= scan->rs_nblocks)
+			if (tbmres->blockno >= hscan->rs_nblocks)
 			{
 				node->tbmres = tbmres = NULL;
 				continue;
@@ -242,14 +245,14 @@ BitmapHeapNext(BitmapHeapScanState *node)
 				 * The number of tuples on this page is put into
 				 * scan->rs_ntuples; note we don't fill scan->rs_vistuples.
 				 */
-				scan->rs_ntuples = tbmres->ntuples;
+				hscan->rs_ntuples = tbmres->ntuples;
 			}
 			else
 			{
 				/*
 				 * Fetch the current heap page and identify candidate tuples.
 				 */
-				bitgetpage(scan, tbmres);
+				bitgetpage(hscan, tbmres);
 			}
 
 			if (tbmres->ntuples >= 0)
@@ -260,7 +263,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			/*
 			 * Set rs_cindex to first slot to examine
 			 */
-			scan->rs_cindex = 0;
+			hscan->rs_cindex = 0;
 
 			/* Adjust the prefetch target */
 			BitmapAdjustPrefetchTarget(node);
@@ -270,7 +273,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			/*
 			 * Continuing in previously obtained page; advance rs_cindex
 			 */
-			scan->rs_cindex++;
+			hscan->rs_cindex++;
 
 #ifdef USE_PREFETCH
 
@@ -297,7 +300,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
 		/*
 		 * Out of range?  If so, nothing more to look at on this page
 		 */
-		if (scan->rs_cindex < 0 || scan->rs_cindex >= scan->rs_ntuples)
+		if (hscan->rs_cindex < 0 || hscan->rs_cindex >= hscan->rs_ntuples)
 		{
 			node->tbmres = tbmres = NULL;
 			continue;
@@ -324,15 +327,15 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			/*
 			 * Okay to fetch the tuple.
 			 */
-			targoffset = scan->rs_vistuples[scan->rs_cindex];
-			dp = (Page) BufferGetPage(scan->rs_cbuf);
+			targoffset = hscan->rs_vistuples[hscan->rs_cindex];
+			dp = (Page) BufferGetPage(hscan->rs_cbuf);
 			lp = PageGetItemId(dp, targoffset);
 			Assert(ItemIdIsNormal(lp));
 
-			scan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
-			scan->rs_ctup.t_len = ItemIdGetLength(lp);
-			scan->rs_ctup.t_tableOid = scan->rs_rd->rd_id;
-			ItemPointerSet(&scan->rs_ctup.t_self, tbmres->blockno, targoffset);
+			hscan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
+			hscan->rs_ctup.t_len = ItemIdGetLength(lp);
+			hscan->rs_ctup.t_tableOid = scan->rs_rd->rd_id;
+			ItemPointerSet(&hscan->rs_ctup.t_self, tbmres->blockno, targoffset);
 
 			pgstat_count_heap_fetch(scan->rs_rd);
 
@@ -340,9 +343,9 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			 * Set up the result slot to point to this tuple.  Note that the
 			 * slot acquires a pin on the buffer.
 			 */
-			ExecStoreBufferHeapTuple(&scan->rs_ctup,
+			ExecStoreBufferHeapTuple(&hscan->rs_ctup,
 									 slot,
-									 scan->rs_cbuf);
+									 hscan->rs_cbuf);
 
 			/*
 			 * If we are using lossy info, we have to recheck the qual
@@ -392,17 +395,17 @@ bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres)
 	Assert(page < scan->rs_nblocks);
 
 	scan->rs_cbuf = ReleaseAndReadBuffer(scan->rs_cbuf,
-										 scan->rs_rd,
+										 scan->rs_base.rs_rd,
 										 page);
 	buffer = scan->rs_cbuf;
-	snapshot = scan->rs_snapshot;
+	snapshot = scan->rs_base.rs_snapshot;
 
 	ntup = 0;
 
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -430,8 +433,8 @@ bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres)
 			HeapTupleData heapTuple;
 
 			ItemPointerSet(&tid, page, offnum);
-			if (heap_hot_search_buffer(&tid, scan->rs_rd, buffer, snapshot,
-									   &heapTuple, NULL, true))
+			if (heap_hot_search_buffer(&tid, scan->rs_base.rs_rd, buffer,
+									   snapshot, &heapTuple, NULL, true))
 				scan->rs_vistuples[ntup++] = ItemPointerGetOffsetNumber(&tid);
 		}
 	}
@@ -456,16 +459,16 @@ bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres)
 				continue;
 			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
 			loctup.t_len = ItemIdGetLength(lp);
-			loctup.t_tableOid = scan->rs_rd->rd_id;
+			loctup.t_tableOid = scan->rs_base.rs_rd->rd_id;
 			ItemPointerSet(&loctup.t_self, page, offnum);
 			valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
 			if (valid)
 			{
 				scan->rs_vistuples[ntup++] = offnum;
-				PredicateLockTuple(scan->rs_rd, &loctup, snapshot);
+				PredicateLockTuple(scan->rs_base.rs_rd, &loctup, snapshot);
 			}
-			CheckForSerializableConflictOut(valid, scan->rs_rd, &loctup,
-											buffer, snapshot);
+			CheckForSerializableConflictOut(valid, scan->rs_base.rs_rd,
+											&loctup, buffer, snapshot);
 		}
 	}
 
@@ -598,7 +601,7 @@ BitmapAdjustPrefetchTarget(BitmapHeapScanState *node)
  * BitmapPrefetch - Prefetch, if prefetch_pages are behind prefetch_target
  */
 static inline void
-BitmapPrefetch(BitmapHeapScanState *node, HeapScanDesc scan)
+BitmapPrefetch(BitmapHeapScanState *node, TableScanDesc scan)
 {
 #ifdef USE_PREFETCH
 	ParallelBitmapHeapState *pstate = node->pstate;
@@ -741,7 +744,7 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
 	PlanState  *outerPlan = outerPlanState(node);
 
 	/* rescan to release any page pin */
-	heap_rescan(node->ss.ss_currentScanDesc, NULL);
+	table_rescan(node->ss.ss_currentScanDesc, NULL);
 
 	/* release bitmaps and buffers if any */
 	if (node->tbmiterator)
@@ -785,7 +788,7 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
 void
 ExecEndBitmapHeapScan(BitmapHeapScanState *node)
 {
-	HeapScanDesc scanDesc;
+	TableScanDesc scanDesc;
 
 	/*
 	 * extract information from the node
@@ -830,7 +833,7 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
 	/*
 	 * close heap scan
 	 */
-	heap_endscan(scanDesc);
+	table_endscan(scanDesc);
 }
 
 /* ----------------------------------------------------------------
@@ -914,8 +917,7 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
 	 */
 	ExecInitScanTupleSlot(estate, &scanstate->ss,
 						  RelationGetDescr(currentRelation),
-						  &TTSOpsBufferHeapTuple);
-
+						  table_slot_callbacks(currentRelation));
 
 	/*
 	 * Initialize result type and projection.
@@ -953,10 +955,10 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
 	 * Even though we aren't going to do a conventional seqscan, it is useful
 	 * to create a HeapScanDesc --- most of the fields in it are usable.
 	 */
-	scanstate->ss.ss_currentScanDesc = heap_beginscan_bm(currentRelation,
-														 estate->es_snapshot,
-														 0,
-														 NULL);
+	scanstate->ss.ss_currentScanDesc = table_beginscan_bm(currentRelation,
+														  estate->es_snapshot,
+														  0,
+														  NULL);
 
 	/*
 	 * all done.
@@ -1104,5 +1106,5 @@ ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node,
 	node->pstate = pstate;
 
 	snapshot = RestoreSnapshot(pstate->phs_snapshot_data);
-	heap_update_snapshot(node->ss.ss_currentScanDesc, snapshot);
+	table_scan_update_snapshot(node->ss.ss_currentScanDesc, snapshot);
 }
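
The bitmap heap scan changes look noisier than they are: most of them just
reflect that HeapScanDescData now embeds the AM-independent descriptor as
its first member, so generic code passes a TableScanDesc around and
heap-specific code downcasts. Roughly, going by the accesses above (the
full definitions live in the header changes elsewhere in the patch):

    typedef struct TableScanDescData
    {
        Relation    rs_rd;          /* relation being scanned */
        Snapshot    rs_snapshot;    /* snapshot used by the scan */
        /* ... */
    } TableScanDescData;

    typedef struct HeapScanDescData
    {
        TableScanDescData rs_base;  /* AM-independent part, must be first */

        /* heap-specific state */
        BlockNumber rs_nblocks;     /* total number of blocks in rel */
        Buffer      rs_cbuf;        /* current buffer, if any */
        HeapTupleData rs_ctup;      /* current tuple, if any */
        /* ... */
    } HeapScanDescData;

Hence the hscan = (HeapScanDesc) scan cast in BitmapHeapNext() and the
scan->rs_base.rs_rd accesses in bitgetpage().
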
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 26758e77039..2d954b722a7 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -32,6 +32,7 @@
 
 #include "access/genam.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "access/tupdesc.h"
 #include "access/visibilitymap.h"
 #include "executor/execdebug.h"
@@ -119,7 +120,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 	 */
 	while ((tid = index_getnext_tid(scandesc, direction)) != NULL)
 	{
-		HeapTuple	tuple = NULL;
+		bool		tuple_from_heap = false;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -165,17 +166,18 @@ IndexOnlyNext(IndexOnlyScanState *node)
 			 * Rats, we have to visit the heap to check visibility.
 			 */
 			InstrCountTuples2(node, 1);
-			tuple = index_fetch_heap(scandesc);
-			if (tuple == NULL)
+			if (!index_fetch_heap(scandesc, slot))
 				continue;		/* no visible tuple, try next index entry */
 
+			ExecClearTuple(slot);
+
 			/*
 			 * Only MVCC snapshots are supported here, so there should be no
 			 * need to keep following the HOT chain once a visible entry has
 			 * been found.  If we did want to allow that, we'd need to keep
 			 * more state to remember not to call index_getnext_tid next time.
 			 */
-			if (scandesc->xs_continue_hot)
+			if (scandesc->xs_heap_continue)
 				elog(ERROR, "non-MVCC snapshots are not supported in index-only scans");
 
 			/*
@@ -184,13 +186,15 @@ IndexOnlyNext(IndexOnlyScanState *node)
 			 * but it's not clear whether it's a win to do so.  The next index
 			 * entry might require a visit to the same heap page.
 			 */
+
+			tuple_from_heap = true;
 		}
 
 		/*
 		 * Fill the scan tuple slot with data from the index.  This might be
-		 * provided in either HeapTuple or IndexTuple format.  Conceivably an
-		 * index AM might fill both fields, in which case we prefer the heap
-		 * format, since it's probably a bit cheaper to fill a slot from.
+		 * provided in either HeapTuple or IndexTuple format.  Conceivably
+		 * an index AM might fill both fields, in which case we prefer the
+		 * heap format, since it's probably a bit cheaper to fill a slot from.
 		 */
 		if (scandesc->xs_hitup)
 		{
@@ -201,7 +205,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 			 */
 			Assert(slot->tts_tupleDescriptor->natts ==
 				   scandesc->xs_hitupdesc->natts);
-			ExecStoreHeapTuple(scandesc->xs_hitup, slot, false);
+			ExecForceStoreHeapTuple(scandesc->xs_hitup, slot);
 		}
 		else if (scandesc->xs_itup)
 			StoreIndexTuple(slot, scandesc->xs_itup, scandesc->xs_itupdesc);
@@ -244,7 +248,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 * anyway, then we already have the tuple-level lock and can skip the
 		 * page lock.
 		 */
-		if (tuple == NULL)
+		if (!tuple_from_heap)
 			PredicateLockPage(scandesc->heapRelation,
 							  ItemPointerGetBlockNumber(tid),
 							  estate->es_snapshot);
@@ -523,7 +527,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	 * suitable data anyway.)
 	 */
 	tupDesc = ExecTypeFromTL(node->indextlist);
-	ExecInitScanTupleSlot(estate, &indexstate->ss, tupDesc, &TTSOpsHeapTuple);
+	ExecInitScanTupleSlot(estate, &indexstate->ss, tupDesc,
+						  table_slot_callbacks(currentRelation));
 
 	/*
 	 * Initialize result type and projection info.  The node's targetlist will
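
Worth calling out here: index_fetch_heap() now reports success through its return value and stores the fetched tuple in the caller's slot. In the index-only path the slot is cleared right after a successful fetch, because the heap visit was only needed for the visibility check; the slot is refilled from the index data (xs_hitup / xs_itup) further down. Condensed from the code above, the new convention is:

while ((tid = index_getnext_tid(scandesc, direction)) != NULL)
{
	if (!index_fetch_heap(scandesc, slot))
		continue;			/* no visible tuple, try next index entry */

	/* only the visibility check was needed; refill from index data */
	ExecClearTuple(slot);
}
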
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 337b561c241..ca4a3bf4981 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -31,6 +31,7 @@
 
 #include "access/nbtree.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "catalog/pg_am.h"
 #include "executor/execdebug.h"
 #include "executor/nodeIndexscan.h"
@@ -83,7 +84,6 @@ IndexNext(IndexScanState *node)
 	ExprContext *econtext;
 	ScanDirection direction;
 	IndexScanDesc scandesc;
-	HeapTuple	tuple;
 	TupleTableSlot *slot;
 
 	/*
@@ -130,20 +130,10 @@ IndexNext(IndexScanState *node)
 	/*
 	 * ok, now that we have what we need, fetch the next tuple.
 	 */
-	while ((tuple = index_getnext(scandesc, direction)) != NULL)
+	while (index_getnext_slot(scandesc, direction, slot))
 	{
 		CHECK_FOR_INTERRUPTS();
 
-		/*
-		 * Store the scanned tuple in the scan tuple slot of the scan state.
-		 * Note: we pass 'false' because tuples returned by amgetnext are
-		 * pointers onto disk pages and must not be pfree()'d.
-		 */
-		ExecStoreBufferHeapTuple(tuple, /* tuple to store */
-								 slot,	/* slot to store in */
-								 scandesc->xs_cbuf);	/* buffer containing
-														 * tuple */
-
 		/*
 		 * If the index was lossy, we have to recheck the index quals using
 		 * the fetched tuple.
@@ -183,7 +173,6 @@ IndexNextWithReorder(IndexScanState *node)
 	EState	   *estate;
 	ExprContext *econtext;
 	IndexScanDesc scandesc;
-	HeapTuple	tuple;
 	TupleTableSlot *slot;
 	ReorderTuple *topmost = NULL;
 	bool		was_exact;
@@ -252,6 +241,8 @@ IndexNextWithReorder(IndexScanState *node)
 								scandesc->xs_orderbynulls,
 								node) <= 0)
 			{
+				HeapTuple tuple;
+
 				tuple = reorderqueue_pop(node);
 
 				/* Pass 'true', as the tuple in the queue is a palloc'd copy */
@@ -271,8 +262,7 @@ IndexNextWithReorder(IndexScanState *node)
 		 */
 next_indextuple:
 		slot = node->ss.ss_ScanTupleSlot;
-		tuple = index_getnext(scandesc, ForwardScanDirection);
-		if (!tuple)
+		if (!index_getnext_slot(scandesc, ForwardScanDirection, slot))
 		{
 			/*
 			 * No more tuples from the index.  But we still need to drain any
@@ -282,14 +272,6 @@ next_indextuple:
 			continue;
 		}
 
-		/*
-		 * Store the scanned tuple in the scan tuple slot of the scan state.
-		 */
-		ExecStoreBufferHeapTuple(tuple, /* tuple to store */
-								 slot,	/* slot to store in */
-								 scandesc->xs_cbuf);	/* buffer containing
-														 * tuple */
-
 		/*
 		 * If the index was lossy, we have to recheck the index quals and
 		 * ORDER BY expressions using the fetched tuple.
@@ -357,6 +339,8 @@ next_indextuple:
 													  topmost->orderbynulls,
 													  node) > 0))
 		{
+			HeapTuple tuple = ExecFetchSlotHeapTuple(slot, true, NULL);
+
 			/* Put this tuple to the queue */
 			reorderqueue_push(node, tuple, lastfetched_vals, lastfetched_nulls);
 			continue;
@@ -949,7 +933,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	 */
 	ExecInitScanTupleSlot(estate, &indexstate->ss,
 						  RelationGetDescr(currentRelation),
-						  &TTSOpsBufferHeapTuple);
+						  table_slot_callbacks(currentRelation));
 
 	/*
 	 * Initialize result type and projection.
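
One wrinkle in the reordering path: the reorder queue stores palloc'd HeapTuple copies (see the reorderqueue_pop() comment above), so the slot contents have to be converted to heap-tuple form before queueing. That is what the new ExecFetchSlotHeapTuple() call is for:

/* get a HeapTuple view of the slot, materializing it if necessary;
 * reorderqueue_push() then makes its own palloc'd copy */
HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, NULL);

reorderqueue_push(node, tuple, lastfetched_vals, lastfetched_nulls);
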
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index a7efe8dcaec..e316ff99012 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -39,6 +39,7 @@
 
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
 #include "commands/trigger.h"
@@ -2147,7 +2148,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
 		mtstate->mt_scans[i] =
 			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
-								   &TTSOpsHeapTuple);
+								   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
 
 		/* Also let FDWs init themselves for foreign-table result rels */
 		if (!resultRelInfo->ri_usesFdwDirectModify &&
@@ -2207,8 +2208,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	if (update_tuple_routing_needed)
 	{
 		ExecSetupChildParentMapForSubplan(mtstate);
-		mtstate->mt_root_tuple_slot = MakeTupleTableSlot(RelationGetDescr(rel),
-														 &TTSOpsHeapTuple);
+		mtstate->mt_root_tuple_slot = table_slot_create(rel, NULL);
 	}
 
 	/*
@@ -2301,6 +2301,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		ExprContext *econtext;
 		TupleDesc	relationDesc;
 		TupleDesc	tupDesc;
+		const TupleTableSlotOps *tts_cb;
 
 		/* insert may only have one plan, inheritance is not expanded */
 		Assert(nplans == 1);
@@ -2311,6 +2312,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 		econtext = mtstate->ps.ps_ExprContext;
 		relationDesc = resultRelInfo->ri_RelationDesc->rd_att;
+		tts_cb = table_slot_callbacks(resultRelInfo->ri_RelationDesc);
 
 		/* carried forward solely for the benefit of explain */
 		mtstate->mt_excludedtlist = node->exclRelTlist;
@@ -2321,7 +2323,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		/* initialize slot for the existing tuple */
 		resultRelInfo->ri_onConflict->oc_Existing =
 			ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc,
-								   &TTSOpsBufferHeapTuple);
+								   tts_cb);
 
 		/* create the tuple slot for the UPDATE SET projection */
 		tupDesc = ExecTypeFromTL((List *) node->onConflictSet);
@@ -2430,15 +2432,18 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			for (i = 0; i < nplans; i++)
 			{
 				JunkFilter *j;
+				TupleTableSlot *junkresslot;
 
 				subplan = mtstate->mt_plans[i]->plan;
 				if (operation == CMD_INSERT || operation == CMD_UPDATE)
 					ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc,
 										subplan->targetlist);
 
+				junkresslot =
+					ExecInitExtraTupleSlot(estate, NULL,
+										   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
 				j = ExecInitJunkFilter(subplan->targetlist,
-									   ExecInitExtraTupleSlot(estate, NULL,
-															  &TTSOpsHeapTuple));
+									   junkresslot);
 
 				if (operation == CMD_UPDATE || operation == CMD_DELETE)
 				{
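
table_slot_create() replaces the various hard-coded MakeTupleTableSlot() / ExecInitExtraTupleSlot(..., &TTSOpsHeapTuple) calls: it picks the slot ops via table_slot_callbacks() and optionally registers the slot for cleanup. A usage sketch, matching the declaration in tableam.h further down:

TupleTableSlot *slot;

/* caller-managed: must be freed with ExecDropSingleTupleTableSlot() */
slot = table_slot_create(rel, NULL);

/* executor-managed: registered on the tuple table, freed with the EState */
slot = table_slot_create(rel, &estate->es_tupleTable);
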
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index da4a65fd30a..3a00d648099 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -17,6 +17,7 @@
 #include "access/hash.h"
 #include "access/heapam.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "access/tsmapi.h"
 #include "executor/executor.h"
 #include "executor/nodeSamplescan.h"
@@ -48,6 +49,7 @@ SampleNext(SampleScanState *node)
 {
 	HeapTuple	tuple;
 	TupleTableSlot *slot;
+	HeapScanDesc hscan;
 
 	/*
 	 * if this is first call within a scan, initialize
@@ -61,11 +63,12 @@ SampleNext(SampleScanState *node)
 	tuple = tablesample_getnext(node);
 
 	slot = node->ss.ss_ScanTupleSlot;
+	hscan = (HeapScanDesc) node->ss.ss_currentScanDesc;
 
 	if (tuple)
 		ExecStoreBufferHeapTuple(tuple, /* tuple to store */
 								 slot,	/* slot to store in */
-								 node->ss.ss_currentScanDesc->rs_cbuf); /* tuple's buffer */
+								 hscan->rs_cbuf); /* tuple's buffer */
 	else
 		ExecClearTuple(slot);
 
@@ -147,7 +150,7 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
 	/* and create slot with appropriate rowtype */
 	ExecInitScanTupleSlot(estate, &scanstate->ss,
 						  RelationGetDescr(scanstate->ss.ss_currentRelation),
-						  &TTSOpsBufferHeapTuple);
+						  table_slot_callbacks(scanstate->ss.ss_currentRelation));
 
 	/*
 	 * Initialize result type and projection.
@@ -219,7 +222,7 @@ ExecEndSampleScan(SampleScanState *node)
 	 * close heap scan
 	 */
 	if (node->ss.ss_currentScanDesc)
-		heap_endscan(node->ss.ss_currentScanDesc);
+		table_endscan(node->ss.ss_currentScanDesc);
 }
 
 /* ----------------------------------------------------------------
@@ -319,19 +322,19 @@ tablesample_init(SampleScanState *scanstate)
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
 		scanstate->ss.ss_currentScanDesc =
-			heap_beginscan_sampling(scanstate->ss.ss_currentRelation,
-									scanstate->ss.ps.state->es_snapshot,
-									0, NULL,
-									scanstate->use_bulkread,
-									allow_sync,
-									scanstate->use_pagemode);
+			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
+									 scanstate->ss.ps.state->es_snapshot,
+									 0, NULL,
+									 scanstate->use_bulkread,
+									 allow_sync,
+									 scanstate->use_pagemode);
 	}
 	else
 	{
-		heap_rescan_set_params(scanstate->ss.ss_currentScanDesc, NULL,
-							   scanstate->use_bulkread,
-							   allow_sync,
-							   scanstate->use_pagemode);
+		table_rescan_set_params(scanstate->ss.ss_currentScanDesc, NULL,
+								scanstate->use_bulkread,
+								allow_sync,
+								scanstate->use_pagemode);
 	}
 
 	pfree(params);
@@ -350,8 +353,9 @@ static HeapTuple
 tablesample_getnext(SampleScanState *scanstate)
 {
 	TsmRoutine *tsm = scanstate->tsmroutine;
-	HeapScanDesc scan = scanstate->ss.ss_currentScanDesc;
-	HeapTuple	tuple = &(scan->rs_ctup);
+	TableScanDesc scan = scanstate->ss.ss_currentScanDesc;
+	HeapScanDesc hscan = (HeapScanDesc) scan;
+	HeapTuple	tuple = &(hscan->rs_ctup);
 	Snapshot	snapshot = scan->rs_snapshot;
 	bool		pagemode = scan->rs_pageatatime;
 	BlockNumber blockno;
@@ -359,14 +363,14 @@ tablesample_getnext(SampleScanState *scanstate)
 	bool		all_visible;
 	OffsetNumber maxoffset;
 
-	if (!scan->rs_inited)
+	if (!hscan->rs_inited)
 	{
 		/*
 		 * return null immediately if relation is empty
 		 */
-		if (scan->rs_nblocks == 0)
+		if (hscan->rs_nblocks == 0)
 		{
-			Assert(!BufferIsValid(scan->rs_cbuf));
+			Assert(!BufferIsValid(hscan->rs_cbuf));
 			tuple->t_data = NULL;
 			return NULL;
 		}
@@ -380,15 +384,15 @@ tablesample_getnext(SampleScanState *scanstate)
 			}
 		}
 		else
-			blockno = scan->rs_startblock;
-		Assert(blockno < scan->rs_nblocks);
+			blockno = hscan->rs_startblock;
+		Assert(blockno < hscan->rs_nblocks);
 		heapgetpage(scan, blockno);
-		scan->rs_inited = true;
+		hscan->rs_inited = true;
 	}
 	else
 	{
 		/* continue from previously returned page/tuple */
-		blockno = scan->rs_cblock;	/* current page */
+		blockno = hscan->rs_cblock;	/* current page */
 	}
 
 	/*
@@ -396,9 +400,9 @@ tablesample_getnext(SampleScanState *scanstate)
 	 * visibility checks.
 	 */
 	if (!pagemode)
-		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+		LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-	page = (Page) BufferGetPage(scan->rs_cbuf);
+	page = (Page) BufferGetPage(hscan->rs_cbuf);
 	all_visible = PageIsAllVisible(page) && !snapshot->takenDuringRecovery;
 	maxoffset = PageGetMaxOffsetNumber(page);
 
@@ -431,18 +435,18 @@ tablesample_getnext(SampleScanState *scanstate)
 			if (all_visible)
 				visible = true;
 			else
-				visible = SampleTupleVisible(tuple, tupoffset, scan);
+				visible = SampleTupleVisible(tuple, tupoffset, hscan);
 
 			/* in pagemode, heapgetpage did this for us */
 			if (!pagemode)
 				CheckForSerializableConflictOut(visible, scan->rs_rd, tuple,
-												scan->rs_cbuf, snapshot);
+												hscan->rs_cbuf, snapshot);
 
 			if (visible)
 			{
 				/* Found visible tuple, return it. */
 				if (!pagemode)
-					LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+					LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 				break;
 			}
 			else
@@ -457,7 +461,7 @@ tablesample_getnext(SampleScanState *scanstate)
 		 * it's time to move to the next.
 		 */
 		if (!pagemode)
-			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 
 		if (tsm->NextSampleBlock)
 		{
@@ -469,7 +473,7 @@ tablesample_getnext(SampleScanState *scanstate)
 		{
 			/* Without NextSampleBlock, just do a plain forward seqscan. */
 			blockno++;
-			if (blockno >= scan->rs_nblocks)
+			if (blockno >= hscan->rs_nblocks)
 				blockno = 0;
 
 			/*
@@ -485,7 +489,7 @@ tablesample_getnext(SampleScanState *scanstate)
 			if (scan->rs_syncscan)
 				ss_report_location(scan->rs_rd, blockno);
 
-			finished = (blockno == scan->rs_startblock);
+			finished = (blockno == hscan->rs_startblock);
 		}
 
 		/*
@@ -493,23 +497,23 @@ tablesample_getnext(SampleScanState *scanstate)
 		 */
 		if (finished)
 		{
-			if (BufferIsValid(scan->rs_cbuf))
-				ReleaseBuffer(scan->rs_cbuf);
-			scan->rs_cbuf = InvalidBuffer;
-			scan->rs_cblock = InvalidBlockNumber;
+			if (BufferIsValid(hscan->rs_cbuf))
+				ReleaseBuffer(hscan->rs_cbuf);
+			hscan->rs_cbuf = InvalidBuffer;
+			hscan->rs_cblock = InvalidBlockNumber;
 			tuple->t_data = NULL;
-			scan->rs_inited = false;
+			hscan->rs_inited = false;
 			return NULL;
 		}
 
-		Assert(blockno < scan->rs_nblocks);
+		Assert(blockno < hscan->rs_nblocks);
 		heapgetpage(scan, blockno);
 
 		/* Re-establish state for new page */
 		if (!pagemode)
-			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-		page = (Page) BufferGetPage(scan->rs_cbuf);
+		page = (Page) BufferGetPage(hscan->rs_cbuf);
 		all_visible = PageIsAllVisible(page) && !snapshot->takenDuringRecovery;
 		maxoffset = PageGetMaxOffsetNumber(page);
 	}
@@ -517,7 +521,7 @@ tablesample_getnext(SampleScanState *scanstate)
 	/* Count successfully-fetched tuples as heap fetches */
 	pgstat_count_heap_getnext(scan->rs_rd);
 
-	return &(scan->rs_ctup);
+	return &(hscan->rs_ctup);
 }
 
 /*
@@ -526,7 +530,7 @@ tablesample_getnext(SampleScanState *scanstate)
 static bool
 SampleTupleVisible(HeapTuple tuple, OffsetNumber tupoffset, HeapScanDesc scan)
 {
-	if (scan->rs_pageatatime)
+	if (scan->rs_base.rs_pageatatime)
 	{
 		/*
 		 * In pageatatime mode, heapgetpage() already did visibility checks,
@@ -559,7 +563,7 @@ SampleTupleVisible(HeapTuple tuple, OffsetNumber tupoffset, HeapScanDesc scan)
 	{
 		/* Otherwise, we have to check the tuple individually. */
 		return HeapTupleSatisfiesVisibility(tuple,
-											scan->rs_snapshot,
+											scan->rs_base.rs_snapshot,
 											scan->rs_cbuf);
 	}
 }
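
The sample scan still reaches into heap-specific scan state, hence all the hscan casts above. The downcast is valid because HeapScanDescData embeds the generic descriptor as its first member (see the heapam.h hunk below), poor-man's-inheritance style:

TableScanDesc	scan = scanstate->ss.ss_currentScanDesc;
HeapScanDesc	hscan = (HeapScanDesc) scan;	/* heap AM only! */

Buffer		buf = hscan->rs_cbuf;		/* AM-specific state */
Snapshot	snap = scan->rs_snapshot;	/* generic state */
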
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index e5482859efc..8bd7430a918 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -27,8 +27,8 @@
  */
 #include "postgres.h"
 
-#include "access/heapam.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "executor/execdebug.h"
 #include "executor/nodeSeqscan.h"
 #include "utils/rel.h"
@@ -49,8 +49,7 @@ static TupleTableSlot *SeqNext(SeqScanState *node);
 static TupleTableSlot *
 SeqNext(SeqScanState *node)
 {
-	HeapTuple	tuple;
-	HeapScanDesc scandesc;
+	TableScanDesc scandesc;
 	EState	   *estate;
 	ScanDirection direction;
 	TupleTableSlot *slot;
@@ -69,34 +68,18 @@ SeqNext(SeqScanState *node)
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
-		scandesc = heap_beginscan(node->ss.ss_currentRelation,
-								  estate->es_snapshot,
-								  0, NULL);
+		scandesc = table_beginscan(node->ss.ss_currentRelation,
+								   estate->es_snapshot,
+								   0, NULL);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
 	/*
 	 * get the next tuple from the table
 	 */
-	tuple = heap_getnext(scandesc, direction);
-
-	/*
-	 * save the tuple and the buffer returned to us by the access methods in
-	 * our scan tuple slot and return the slot.  Note: we pass 'false' because
-	 * tuples returned by heap_getnext() are pointers onto disk pages and were
-	 * not created with palloc() and so should not be pfree()'d.  Note also
-	 * that ExecStoreHeapTuple will increment the refcount of the buffer; the
-	 * refcount will not be dropped until the tuple table slot is cleared.
-	 */
-	if (tuple)
-		ExecStoreBufferHeapTuple(tuple, /* tuple to store */
-								 slot,	/* slot to store in */
-								 scandesc->rs_cbuf);	/* buffer associated
-														 * with this tuple */
-	else
-		ExecClearTuple(slot);
-
-	return slot;
+	if (table_scan_getnextslot(scandesc, direction, slot))
+		return slot;
+	return NULL;
 }
 
 /*
@@ -174,7 +157,7 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 	/* and create slot with the appropriate rowtype */
 	ExecInitScanTupleSlot(estate, &scanstate->ss,
 						  RelationGetDescr(scanstate->ss.ss_currentRelation),
-						  &TTSOpsBufferHeapTuple);
+						  table_slot_callbacks(scanstate->ss.ss_currentRelation));
 
 	/*
 	 * Initialize result type and projection.
@@ -200,7 +183,7 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 void
 ExecEndSeqScan(SeqScanState *node)
 {
-	HeapScanDesc scanDesc;
+	TableScanDesc scanDesc;
 
 	/*
 	 * get information from node
@@ -223,7 +206,7 @@ ExecEndSeqScan(SeqScanState *node)
 	 * close heap scan
 	 */
 	if (scanDesc != NULL)
-		heap_endscan(scanDesc);
+		table_endscan(scanDesc);
 }
 
 /* ----------------------------------------------------------------
@@ -240,13 +223,13 @@ ExecEndSeqScan(SeqScanState *node)
 void
 ExecReScanSeqScan(SeqScanState *node)
 {
-	HeapScanDesc scan;
+	TableScanDesc scan;
 
 	scan = node->ss.ss_currentScanDesc;
 
 	if (scan != NULL)
-		heap_rescan(scan,		/* scan desc */
-					NULL);		/* new scan keys */
+		table_rescan(scan,	/* scan desc */
+					 NULL);	/* new scan keys */
 
 	ExecScanReScan((ScanState *) node);
 }
@@ -269,7 +252,8 @@ ExecSeqScanEstimate(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 
-	node->pscan_len = heap_parallelscan_estimate(estate->es_snapshot);
+	node->pscan_len = table_parallelscan_estimate(node->ss.ss_currentRelation,
+												  estate->es_snapshot);
 	shm_toc_estimate_chunk(&pcxt->estimator, node->pscan_len);
 	shm_toc_estimate_keys(&pcxt->estimator, 1);
 }
@@ -285,15 +269,15 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 						 ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
-	ParallelHeapScanDesc pscan;
+	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
-	heap_parallelscan_initialize(pscan,
-								 node->ss.ss_currentRelation,
-								 estate->es_snapshot);
+	table_parallelscan_initialize(node->ss.ss_currentRelation,
+								  pscan,
+								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		heap_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
 }
 
 /* ----------------------------------------------------------------
@@ -306,9 +290,10 @@ void
 ExecSeqScanReInitializeDSM(SeqScanState *node,
 						   ParallelContext *pcxt)
 {
-	HeapScanDesc scan = node->ss.ss_currentScanDesc;
+	ParallelTableScanDesc pscan;
 
-	heap_parallelscan_reinitialize(scan->rs_parallel);
+	pscan = node->ss.ss_currentScanDesc->rs_parallel;
+	table_parallelscan_reinitialize(node->ss.ss_currentRelation, pscan);
 }
 
 /* ----------------------------------------------------------------
@@ -321,9 +306,9 @@ void
 ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
-	ParallelHeapScanDesc pscan;
+	ParallelTableScanDesc pscan;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		heap_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
 }
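
The parallel scan keeps its three-phase shape, just routed through the AM so that non-block-based AMs can size and lay out the shared state themselves. Condensed from the leader-side code above:

Size		sz;
ParallelTableScanDesc pscan;
TableScanDesc scan;

sz = table_parallelscan_estimate(rel, snapshot);	/* for DSM sizing */
pscan = shm_toc_allocate(pcxt->toc, sz);
table_parallelscan_initialize(rel, pscan, snapshot);
scan = table_beginscan_parallel(rel, pscan);	/* workers attach the same way */
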
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 9a877874b75..08872ef9b4f 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -24,6 +24,7 @@
 
 #include "access/heapam.h"
 #include "access/sysattr.h"
+#include "access/tableam.h"
 #include "catalog/pg_type.h"
 #include "executor/execdebug.h"
 #include "executor/nodeTidscan.h"
@@ -538,7 +539,7 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
 	 */
 	ExecInitScanTupleSlot(estate, &tidstate->ss,
 						  RelationGetDescr(currentRelation),
-						  &TTSOpsBufferHeapTuple);
+						  table_slot_callbacks(currentRelation));
 
 	/*
 	 * Initialize result type and projection.
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index e71eb3793bc..5b897d50eed 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -14,7 +14,9 @@
 
 #include "postgres.h"
 
-#include "access/heapam.h"
+#include "access/relation.h"
+#include "access/table.h"
+#include "access/tableam.h"
 #include "catalog/partition.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_type.h"
@@ -1202,12 +1204,10 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		Expr	   *constr;
 		Expr	   *partition_constraint;
 		EState	   *estate;
-		HeapTuple	tuple;
 		ExprState  *partqualstate = NULL;
 		Snapshot	snapshot;
-		TupleDesc	tupdesc;
 		ExprContext *econtext;
-		HeapScanDesc scan;
+		TableScanDesc scan;
 		MemoryContext oldCxt;
 		TupleTableSlot *tupslot;
 
@@ -1254,7 +1254,6 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 			continue;
 		}
 
-		tupdesc = CreateTupleDescCopy(RelationGetDescr(part_rel));
 		constr = linitial(def_part_constraints);
 		partition_constraint = (Expr *)
 			map_partition_varattnos((List *) constr,
@@ -1266,8 +1265,8 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = heap_beginscan(part_rel, snapshot, 0, NULL);
-		tupslot = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
+		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -1275,9 +1274,8 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		 */
 		oldCxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
 
-		while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+		while (table_scan_getnextslot(scan, ForwardScanDirection, tupslot))
 		{
-			ExecStoreHeapTuple(tuple, tupslot, false);
 			econtext->ecxt_scantuple = tupslot;
 
 			if (!ExecCheck(partqualstate, econtext))
@@ -1291,7 +1289,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		}
 
 		MemoryContextSwitchTo(oldCxt);
-		heap_endscan(scan);
+		table_endscan(scan);
 		UnregisterSnapshot(snapshot);
 		ExecDropSingleTupleTableSlot(tupslot);
 		FreeExecutorState(estate);
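
This hunk shows the canonical shape that most heap_beginscan()/heap_getnext() call sites end up with after conversion; rewriteDefine.c and selfuncs.c below follow the same pattern:

TupleTableSlot *slot = table_slot_create(rel, NULL);
TableScanDesc scan = table_beginscan(rel, snapshot, 0, NULL);

while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
	/* process the slot contents */
}

table_endscan(scan);
ExecDropSingleTupleTableSlot(slot);
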
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 347f91e937b..8ed306d5d98 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -69,6 +69,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/reloptions.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
 #include "catalog/dependency.h"
@@ -1865,7 +1866,7 @@ get_database_list(void)
 {
 	List	   *dblist = NIL;
 	Relation	rel;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	HeapTuple	tup;
 	MemoryContext resultcxt;
 
@@ -1883,7 +1884,7 @@ get_database_list(void)
 	(void) GetTransactionSnapshot();
 
 	rel = table_open(DatabaseRelationId, AccessShareLock);
-	scan = heap_beginscan_catalog(rel, 0, NULL);
+	scan = table_beginscan_catalog(rel, 0, NULL);
 
 	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
 	{
@@ -1912,7 +1913,7 @@ get_database_list(void)
 		MemoryContextSwitchTo(oldcxt);
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
 	table_close(rel, AccessShareLock);
 
 	CommitTransactionCommand();
@@ -1931,7 +1932,7 @@ do_autovacuum(void)
 {
 	Relation	classRel;
 	HeapTuple	tuple;
-	HeapScanDesc relScan;
+	TableScanDesc relScan;
 	Form_pg_database dbForm;
 	List	   *table_oids = NIL;
 	List	   *orphan_oids = NIL;
@@ -2043,7 +2044,7 @@ do_autovacuum(void)
 	 * wide tables there might be proportionally much more activity in the
 	 * TOAST table than in its parent.
 	 */
-	relScan = heap_beginscan_catalog(classRel, 0, NULL);
+	relScan = table_beginscan_catalog(classRel, 0, NULL);
 
 	/*
 	 * On the first pass, we collect main tables to vacuum, and also the main
@@ -2132,7 +2133,7 @@ do_autovacuum(void)
 		}
 	}
 
-	heap_endscan(relScan);
+	table_endscan(relScan);
 
 	/* second pass: check TOAST tables */
 	ScanKeyInit(&key,
@@ -2140,7 +2141,7 @@ do_autovacuum(void)
 				BTEqualStrategyNumber, F_CHAREQ,
 				CharGetDatum(RELKIND_TOASTVALUE));
 
-	relScan = heap_beginscan_catalog(classRel, 1, &key);
+	relScan = table_beginscan_catalog(classRel, 1, &key);
 	while ((tuple = heap_getnext(relScan, ForwardScanDirection)) != NULL)
 	{
 		Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);
@@ -2187,7 +2188,7 @@ do_autovacuum(void)
 			table_oids = lappend_oid(table_oids, relid);
 	}
 
-	heap_endscan(relScan);
+	table_endscan(relScan);
 	table_close(classRel, AccessShareLock);
 
 	/*
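
Note that the catalog call sites here (and in pgstat.c, launcher.c and postinit.c below) begin the scan through the generic API but still fetch with heap_getnext(). Presumably that is acceptable only because system catalogs can be assumed to always be of the heap AM; the idiom is:

TableScanDesc scan;
HeapTuple	tup;

scan = table_beginscan_catalog(rel, 0, NULL);
while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
{
	/* catalog rows can safely be treated as HeapTuples */
}
table_endscan(scan);
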
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 81c64992518..b6ac6e1a531 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -36,6 +36,7 @@
 
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/twophase_rmgr.h"
 #include "access/xact.h"
@@ -1205,7 +1206,7 @@ pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid)
 	HTAB	   *htab;
 	HASHCTL		hash_ctl;
 	Relation	rel;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	HeapTuple	tup;
 	Snapshot	snapshot;
 
@@ -1220,7 +1221,7 @@ pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid)
 
 	rel = table_open(catalogid, AccessShareLock);
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = heap_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL);
 	while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
 		Oid			thisoid;
@@ -1233,7 +1234,7 @@ pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid)
 
 		(void) hash_search(htab, (void *) &thisoid, HASH_ENTER, NULL);
 	}
-	heap_endscan(scan);
+	table_endscan(scan);
 	UnregisterSnapshot(snapshot);
 	table_close(rel, AccessShareLock);
 
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 55b91b5e12c..186057bd932 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -24,6 +24,7 @@
 #include "access/heapam.h"
 #include "access/htup.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 
 #include "catalog/pg_subscription.h"
@@ -118,7 +119,7 @@ get_subscription_list(void)
 {
 	List	   *res = NIL;
 	Relation	rel;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	HeapTuple	tup;
 	MemoryContext resultcxt;
 
@@ -136,7 +137,7 @@ get_subscription_list(void)
 	(void) GetTransactionSnapshot();
 
 	rel = table_open(SubscriptionRelationId, AccessShareLock);
-	scan = heap_beginscan_catalog(rel, 0, NULL);
+	scan = table_beginscan_catalog(rel, 0, NULL);
 
 	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
 	{
@@ -164,7 +165,7 @@ get_subscription_list(void)
 		MemoryContextSwitchTo(oldcxt);
 	}
 
-	heap_endscan(scan);
+	table_endscan(scan);
 	table_close(rel, AccessShareLock);
 
 	CommitTransactionCommand();
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index a5e5007e810..8acd22465a3 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -24,6 +24,7 @@
 #include "postgres.h"
 
 #include "access/table.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xlog_internal.h"
 #include "catalog/catalog.h"
@@ -698,10 +699,9 @@ apply_handle_update(StringInfo s)
 	estate = create_estate_for_relation(rel);
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
-										&TTSOpsHeapTuple);
-	localslot = ExecInitExtraTupleSlot(estate,
-									   RelationGetDescr(rel->localrel),
-									   &TTSOpsHeapTuple);
+										&TTSOpsVirtual);
+	localslot = table_slot_create(rel->localrel,
+									 &estate->es_tupleTable);
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
@@ -819,9 +819,8 @@ apply_handle_delete(StringInfo s)
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
-	localslot = ExecInitExtraTupleSlot(estate,
-									   RelationGetDescr(rel->localrel),
-									   &TTSOpsHeapTuple);
+	localslot = table_slot_create(rel->localrel,
+									 &estate->es_tupleTable);
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index 7ad470d34a9..6bd889461e3 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -17,6 +17,7 @@
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/multixact.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
@@ -423,8 +424,9 @@ DefineQueryRewrite(const char *rulename,
 		if (event_relation->rd_rel->relkind != RELKIND_VIEW &&
 			event_relation->rd_rel->relkind != RELKIND_MATVIEW)
 		{
-			HeapScanDesc scanDesc;
+			TableScanDesc scanDesc;
 			Snapshot	snapshot;
+			TupleTableSlot *slot;
 
 			if (event_relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 				ereport(ERROR,
@@ -439,13 +441,15 @@ DefineQueryRewrite(const char *rulename,
 								RelationGetRelationName(event_relation))));
 
 			snapshot = RegisterSnapshot(GetLatestSnapshot());
-			scanDesc = heap_beginscan(event_relation, snapshot, 0, NULL);
-			if (heap_getnext(scanDesc, ForwardScanDirection) != NULL)
+			scanDesc = table_beginscan(event_relation, snapshot, 0, NULL);
+			slot = table_slot_create(event_relation, NULL);
+			if (table_scan_getnextslot(scanDesc, ForwardScanDirection, slot))
 				ereport(ERROR,
 						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 						 errmsg("could not convert table \"%s\" to a view because it is not empty",
 								RelationGetRelationName(event_relation))));
-			heap_endscan(scanDesc);
+			ExecDropSingleTupleTableSlot(slot);
+			table_endscan(scanDesc);
 			UnregisterSnapshot(snapshot);
 
 			if (event_relation->rd_rel->relhastriggers)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index ef04fa5009b..d715709b7cd 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,10 +23,10 @@
 
 #include "postgres.h"
 
-#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_constraint.h"
@@ -253,26 +253,9 @@ RI_FKey_check(TriggerData *trigdata)
 	 * checked).  Test its liveness according to SnapshotSelf.  We need pin
 	 * and lock on the buffer to call HeapTupleSatisfiesVisibility.  Caller
 	 * should be holding pin, but not lock.
-	 *
-	 * XXX: Note that the buffer-tuple specificity will be removed in the near
-	 * future.
 	 */
-	if (TTS_IS_BUFFERTUPLE(newslot))
-	{
-		BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) newslot;
-
-		Assert(BufferIsValid(bslot->buffer));
-
-		LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
-		if (!HeapTupleSatisfiesVisibility(bslot->base.tuple, SnapshotSelf, bslot->buffer))
-		{
-			LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
-			return PointerGetDatum(NULL);
-		}
-		LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
-	}
-	else
-		elog(ERROR, "expected buffer tuple");
+	if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, newslot, SnapshotSelf))
+		return PointerGetDatum(NULL);
 
 	/*
 	 * Get the relation descriptors of the FK and PK tables.
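
This hunk illustrates the visibility-check move nicely: the buffer pin/lock dance deleted above now belongs inside the AM's tuple_satisfies_snapshot callback. A sketch of the heap implementation, reconstructed from the removed code (the function name is hypothetical):

static bool
heapam_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot,
								Snapshot snapshot)
{
	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
	bool		res;

	Assert(TTS_IS_BUFFERTUPLE(slot));
	Assert(BufferIsValid(bslot->buffer));

	/* callers no longer need to know about this locking protocol */
	LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
	res = HeapTupleSatisfiesVisibility(bslot->base.tuple, snapshot,
									   bslot->buffer);
	LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);

	return res;
}
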
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index e6837869cf6..bf3b5b551a3 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -104,6 +104,7 @@
 #include "access/brin.h"
 #include "access/gin.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/index.h"
@@ -5099,7 +5100,6 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 			bool		typByVal;
 			ScanKeyData scankeys[1];
 			IndexScanDesc index_scan;
-			HeapTuple	tup;
 			Datum		values[INDEX_MAX_KEYS];
 			bool		isnull[INDEX_MAX_KEYS];
 			SnapshotData SnapshotNonVacuumable;
@@ -5122,8 +5122,7 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 			indexInfo = BuildIndexInfo(indexRel);
 
 			/* some other stuff */
-			slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRel),
-											&TTSOpsHeapTuple);
+			slot = table_slot_create(heapRel, NULL);
 			econtext->ecxt_scantuple = slot;
 			get_typlenbyval(vardata->atttype, &typLen, &typByVal);
 			InitNonVacuumableSnapshot(SnapshotNonVacuumable, RecentGlobalXmin);
@@ -5175,11 +5174,9 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 				index_rescan(index_scan, scankeys, 1, NULL, 0);
 
 				/* Fetch first tuple in sortop's direction */
-				if ((tup = index_getnext(index_scan,
-										 indexscandir)) != NULL)
+				if (index_getnext_slot(index_scan, indexscandir, slot))
 				{
-					/* Extract the index column values from the heap tuple */
-					ExecStoreHeapTuple(tup, slot, false);
+					/* Extract the index column values from the slot */
 					FormIndexDatum(indexInfo, slot, estate,
 								   values, isnull);
 
@@ -5208,11 +5205,9 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 				index_rescan(index_scan, scankeys, 1, NULL, 0);
 
 				/* Fetch first tuple in reverse direction */
-				if ((tup = index_getnext(index_scan,
-										 -indexscandir)) != NULL)
+				if (index_getnext_slot(index_scan, -indexscandir, slot))
 				{
-					/* Extract the index column values from the heap tuple */
-					ExecStoreHeapTuple(tup, slot, false);
+					/* Extract the index column values from the slot */
 					FormIndexDatum(indexInfo, slot, estate,
 								   values, isnull);
 
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index a5ee209f910..0b51a6f1484 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -23,6 +23,7 @@
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/session.h"
+#include "access/tableam.h"
 #include "access/sysattr.h"
 #include "access/xact.h"
 #include "access/xlog.h"
@@ -1245,15 +1246,15 @@ static bool
 ThereIsAtLeastOneRole(void)
 {
 	Relation	pg_authid_rel;
-	HeapScanDesc scan;
+	TableScanDesc scan;
 	bool		result;
 
 	pg_authid_rel = table_open(AuthIdRelationId, AccessShareLock);
 
-	scan = heap_beginscan_catalog(pg_authid_rel, 0, NULL);
+	scan = table_beginscan_catalog(pg_authid_rel, 0, NULL);
 	result = (heap_getnext(scan, ForwardScanDirection) != NULL);
 
-	heap_endscan(scan);
+	table_endscan(scan);
 	table_close(pg_authid_rel, AccessShareLock);
 
 	return result;
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index c4aba39496f..1936195c535 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -159,8 +159,9 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 						 ParallelIndexScanDesc pscan);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 				  ScanDirection direction);
-extern HeapTuple index_fetch_heap(IndexScanDesc scan);
-extern HeapTuple index_getnext(IndexScanDesc scan, ScanDirection direction);
+struct TupleTableSlot;
+extern bool index_fetch_heap(IndexScanDesc scan, struct TupleTableSlot *slot);
+extern bool index_getnext_slot(IndexScanDesc scan, ScanDirection direction, struct TupleTableSlot *slot);
 extern int64 index_getbitmap(IndexScanDesc scan, TIDBitmap *bitmap);
 
 extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ab0879138f0..a369716ce31 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -15,6 +15,7 @@
 #define HEAPAM_H
 
 #include "access/relation.h"	/* for backward compatibility */
+#include "access/relscan.h"
 #include "access/sdir.h"
 #include "access/skey.h"
 #include "access/table.h"		/* for backward compatibility */
@@ -60,6 +61,48 @@ typedef struct HeapUpdateFailureData
 	CommandId	cmax;
 } HeapUpdateFailureData;
 
+/*
+ * Descriptor for heap table scans.
+ */
+typedef struct HeapScanDescData
+{
+	TableScanDescData rs_base;	/* AM independent part of the descriptor */
+
+	/* state set up at initscan time */
+	BlockNumber rs_nblocks;		/* total number of blocks in rel */
+	BlockNumber rs_startblock;	/* block # to start at */
+	BlockNumber rs_numblocks;	/* max number of blocks to scan */
+	/* rs_numblocks is usually InvalidBlockNumber, meaning "scan whole rel" */
+
+	/* scan current state */
+	bool		rs_inited;		/* false = scan not init'd yet */
+	BlockNumber rs_cblock;		/* current block # in scan, if any */
+	Buffer		rs_cbuf;		/* current buffer in scan, if any */
+	/* NB: if rs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+	BufferAccessStrategy rs_strategy;	/* access strategy for reads */
+
+	HeapTupleData rs_ctup;		/* current tuple in scan, if any */
+
+	/* these fields only used in page-at-a-time mode and for bitmap scans */
+	int			rs_cindex;		/* current tuple's index in vistuples */
+	int			rs_ntuples;		/* number of visible tuples on page */
+	OffsetNumber rs_vistuples[MaxHeapTuplesPerPage];	/* their offsets */
+}			HeapScanDescData;
+typedef struct HeapScanDescData *HeapScanDesc;
+
+/*
+ * Descriptor for fetches from heap via an index.
+ */
+typedef struct IndexFetchHeapData
+{
+	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
+
+	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
+	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+} IndexFetchHeapData;
+
 /* Result codes for HeapTupleSatisfiesVacuum */
 typedef enum
 {
@@ -79,42 +122,32 @@ typedef enum
  */
 
 
-/* struct definitions appear in relscan.h */
-typedef struct HeapScanDescData *HeapScanDesc;
-typedef struct ParallelHeapScanDescData *ParallelHeapScanDesc;
-
 /*
  * HeapScanIsValid
  *		True iff the heap scan is valid.
  */
 #define HeapScanIsValid(scan) PointerIsValid(scan)
 
-extern HeapScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
-			   int nkeys, ScanKey key);
-extern HeapScanDesc heap_beginscan_catalog(Relation relation, int nkeys,
-					   ScanKey key);
-extern HeapScanDesc heap_beginscan_strat(Relation relation, Snapshot snapshot,
-					 int nkeys, ScanKey key,
-					 bool allow_strat, bool allow_sync);
-extern HeapScanDesc heap_beginscan_bm(Relation relation, Snapshot snapshot,
-				  int nkeys, ScanKey key);
-extern HeapScanDesc heap_beginscan_sampling(Relation relation,
-						Snapshot snapshot, int nkeys, ScanKey key,
-						bool allow_strat, bool allow_sync, bool allow_pagemode);
-extern void heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk,
+extern TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
+			   int nkeys, ScanKey key,
+			   ParallelTableScanDesc parallel_scan,
+			   bool allow_strat,
+			   bool allow_sync,
+			   bool allow_pagemode,
+			   bool is_bitmapscan,
+			   bool is_samplescan,
+			   bool temp_snap);
+extern void heap_setscanlimits(TableScanDesc scan, BlockNumber startBlk,
 				   BlockNumber endBlk);
-extern void heapgetpage(HeapScanDesc scan, BlockNumber page);
-extern void heap_rescan(HeapScanDesc scan, ScanKey key);
-extern void heap_rescan_set_params(HeapScanDesc scan, ScanKey key,
+extern void heapgetpage(TableScanDesc scan, BlockNumber page);
+extern void heap_rescan(TableScanDesc scan, ScanKey key, bool set_params,
+			bool allow_strat, bool allow_sync, bool allow_pagemode);
+extern void heap_rescan_set_params(TableScanDesc scan, ScanKey key,
 					   bool allow_strat, bool allow_sync, bool allow_pagemode);
-extern void heap_endscan(HeapScanDesc scan);
-extern HeapTuple heap_getnext(HeapScanDesc scan, ScanDirection direction);
-
-extern Size heap_parallelscan_estimate(Snapshot snapshot);
-extern void heap_parallelscan_initialize(ParallelHeapScanDesc target,
-							 Relation relation, Snapshot snapshot);
-extern void heap_parallelscan_reinitialize(ParallelHeapScanDesc parallel_scan);
-extern HeapScanDesc heap_beginscan_parallel(Relation, ParallelHeapScanDesc);
+extern void heap_endscan(TableScanDesc scan);
+extern HeapTuple heap_getnext(TableScanDesc scan, ScanDirection direction);
+extern bool heap_getnextslot(TableScanDesc sscan,
+			ScanDirection direction, struct TupleTableSlot *slot);
 
 extern bool heap_fetch(Relation relation, Snapshot snapshot,
 		   HeapTuple tuple, Buffer *userbuf, bool keep_buf,
@@ -164,7 +197,6 @@ extern void simple_heap_update(Relation relation, ItemPointer otid,
 				   HeapTuple tup);
 
 extern void heap_sync(Relation relation);
-extern void heap_update_snapshot(HeapScanDesc scan, Snapshot snapshot);
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b78ef2f47d0..5f77d23195c 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -21,29 +21,14 @@
 #include "storage/spin.h"
 #include "utils/relcache.h"
 
-/*
- * Shared state for parallel heap scan.
- *
- * Each backend participating in a parallel heap scan has its own
- * HeapScanDesc in backend-private memory, and those objects all contain
- * a pointer to this structure.  The information here must be sufficient
- * to properly initialize each new HeapScanDesc as workers join the scan,
- * and it must act as a font of block numbers for those workers.
- */
-typedef struct ParallelHeapScanDescData
-{
-	Oid			phs_relid;		/* OID of relation to scan */
-	bool		phs_syncscan;	/* report location to syncscan logic? */
-	BlockNumber phs_nblocks;	/* # blocks in relation at start of scan */
-	slock_t		phs_mutex;		/* mutual exclusion for setting startblock */
-	BlockNumber phs_startblock; /* starting block number */
-	pg_atomic_uint64 phs_nallocated;	/* number of blocks allocated to
-										 * workers so far. */
-	bool		phs_snapshot_any;	/* SnapshotAny, not phs_snapshot_data? */
-	char		phs_snapshot_data[FLEXIBLE_ARRAY_MEMBER];
-} ParallelHeapScanDescData;
 
-typedef struct HeapScanDescData
+struct ParallelTableScanDescData;
+
+/*
+ * Generic descriptor for table scans. This is the base class, which the
+ * scan descriptors of individual AMs need to embed.
+ */
+typedef struct TableScanDescData
 {
 	/* scan parameters */
 	Relation	rs_rd;			/* heap relation descriptor */
@@ -56,28 +41,55 @@ typedef struct HeapScanDescData
 	bool		rs_allow_strat; /* allow or disallow use of access strategy */
 	bool		rs_allow_sync;	/* allow or disallow use of syncscan */
 	bool		rs_temp_snap;	/* unregister snapshot at scan end? */
-
-	/* state set up at initscan time */
-	BlockNumber rs_nblocks;		/* total number of blocks in rel */
-	BlockNumber rs_startblock;	/* block # to start at */
-	BlockNumber rs_numblocks;	/* max number of blocks to scan */
-	/* rs_numblocks is usually InvalidBlockNumber, meaning "scan whole rel" */
-	BufferAccessStrategy rs_strategy;	/* access strategy for reads */
 	bool		rs_syncscan;	/* report location to syncscan logic? */
 
-	/* scan current state */
-	bool		rs_inited;		/* false = scan not init'd yet */
-	HeapTupleData rs_ctup;		/* current tuple in scan, if any */
-	BlockNumber rs_cblock;		/* current block # in scan, if any */
-	Buffer		rs_cbuf;		/* current buffer in scan, if any */
-	/* NB: if rs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
-	struct ParallelHeapScanDescData *rs_parallel;	/* parallel scan information */
+	struct ParallelTableScanDescData *rs_parallel;	/* parallel scan information */
 
-	/* these fields only used in page-at-a-time mode and for bitmap scans */
-	int			rs_cindex;		/* current tuple's index in vistuples */
-	int			rs_ntuples;		/* number of visible tuples on page */
-	OffsetNumber rs_vistuples[MaxHeapTuplesPerPage];	/* their offsets */
-}			HeapScanDescData;
+}			TableScanDescData;
+typedef struct TableScanDescData *TableScanDesc;
+
+/*
+ * Shared state for parallel table scan.
+ *
+ * Each backend participating in a parallel table scan has its own
+ * TableScanDesc in backend-private memory, and those objects all contain a
+ * pointer to this structure.  The information here must be sufficient to
+ * properly initialize each new TableScanDesc as workers join the scan, and
+ * it must provide the information needed to tell those workers what to scan.
+ */
+typedef struct ParallelTableScanDescData
+{
+	Oid			phs_relid;		/* OID of relation to scan */
+	bool		phs_syncscan;	/* report location to syncscan logic? */
+	bool		phs_snapshot_any;	/* SnapshotAny, not phs_snapshot_data? */
+	Size		phs_snapshot_off;	/* offset of serialized snapshot data */
+} ParallelTableScanDescData;
+typedef struct ParallelTableScanDescData *ParallelTableScanDesc;
+
+/*
+ * Shared state for parallel table scans, for block oriented storage.
+ */
+typedef struct ParallelBlockTableScanDescData
+{
+	ParallelTableScanDescData base;
+
+	BlockNumber phs_nblocks;	/* # blocks in relation at start of scan */
+	slock_t		phs_mutex;		/* mutual exclusion for setting startblock */
+	BlockNumber phs_startblock; /* starting block number */
+	pg_atomic_uint64 phs_nallocated;	/* number of blocks allocated to
+										 * workers so far. */
+} ParallelBlockTableScanDescData;
+typedef struct ParallelBlockTableScanDescData *ParallelBlockTableScanDesc;
+
+/*
+ * Base class for fetches from a table via an index. Individual AMs need to
+ * embed this in the descriptor struct they use for such fetches, in the
+ * same way TableScanDescData is embedded for scans.
+ */
+typedef struct IndexFetchTableData
+{
+	Relation rel;
+} IndexFetchTableData;
 
 /*
  * We use the same IndexScanDescData structure for both amgettuple-based
@@ -117,10 +129,10 @@ typedef struct IndexScanDescData
 	HeapTuple	xs_hitup;		/* index data returned by AM, as HeapTuple */
 	struct TupleDescData *xs_hitupdesc;	/* rowtype descriptor of xs_hitup */
 
-	/* xs_ctup/xs_cbuf/xs_recheck are valid after a successful index_getnext */
-	HeapTupleData xs_ctup;		/* current heap tuple, if any */
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	ItemPointerData xs_heaptid; /* result */
+	bool		xs_heap_continue;	/* T if must keep walking; potentially more results */
+	IndexFetchTableData *xs_heapfetch;
+
 	bool		xs_recheck;		/* T means scan keys must be rechecked */
 
 	/*
@@ -134,9 +146,6 @@ typedef struct IndexScanDescData
 	bool	   *xs_orderbynulls;
 	bool		xs_recheckorderby;
 
-	/* state data for traversing HOT chains in index_getnext */
-	bool		xs_continue_hot;	/* T if must keep walking HOT chain */
-
 	/* parallel index scan information, in shared memory */
 	struct ParallelIndexScanDescData *parallel_scan;
 }			IndexScanDescData;
@@ -150,14 +159,17 @@ typedef struct ParallelIndexScanDescData
 	char		ps_snapshot_data[FLEXIBLE_ARRAY_MEMBER];
 }			ParallelIndexScanDescData;
 
-/* Struct for heap-or-index scans of system tables */
+struct TupleTableSlot;
+
+/* Struct for storage-or-index scans of system tables */
 typedef struct SysScanDescData
 {
 	Relation	heap_rel;		/* catalog being scanned */
 	Relation	irel;			/* NULL if doing heap scan */
-	struct HeapScanDescData *scan;			/* only valid in heap-scan case */
+	struct TableScanDescData *scan;		/* only valid in storage-scan case */
 	struct IndexScanDescData *iscan;		/* only valid in index-scan case */
 	struct SnapshotData *snapshot;		/* snapshot to unregister at end of scan */
+	struct TupleTableSlot *slot;	/* slot for fetching tuples during the scan */
 }			SysScanDescData;
 
 #endif							/* RELSCAN_H */
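
The ParallelTableScanDescData / ParallelBlockTableScanDescData split mirrors the TableScanDesc / HeapScanDesc embedding above. phs_snapshot_off gives the offset of the serialized snapshot within the shared chunk, so a worker would restore it along these lines (a sketch, assuming that layout):

Snapshot	snapshot;

if (pscan->phs_snapshot_any)
	snapshot = SnapshotAny;		/* no serialized snapshot stored */
else
	snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
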
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index caeb5887d5d..758a7309961 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -14,31 +14,497 @@
 #ifndef TABLEAM_H
 #define TABLEAM_H
 
+#include "access/relscan.h"
+#include "access/sdir.h"
 #include "utils/guc.h"
+#include "utils/rel.h"
+#include "utils/snapshot.h"
 
 
 #define DEFAULT_TABLE_ACCESS_METHOD	"heap"
 
 extern char *default_table_access_method;
-
+extern bool synchronize_seqscans;
 
 
 /*
  * API struct for a table AM.  Note this must be allocated in a
  * server-lifetime manner, typically as a static const struct, which then gets
  * returned by FormData_pg_am.amhandler.
+ *
+ * In most cases it's not appropriate to call the callbacks directly;
+ * instead use the table_* wrapper functions.
+ *
+ * GetTableAmRoutine() asserts that required callbacks are filled in; remember
+ * to update it when adding a callback.
  */
 typedef struct TableAmRoutine
 {
 	/* this must be set to T_TableAmRoutine */
 	NodeTag		type;
+
+
+	/* ------------------------------------------------------------------------
+	 * Slot related callbacks.
+	 * ------------------------------------------------------------------------
+	 */
+
+	/*
+	 * Return slot implementation suitable for storing a tuple of this AM.
+	 */
+	const TupleTableSlotOps *(*slot_callbacks) (Relation rel);
+
+
+	/* ------------------------------------------------------------------------
+	 * Table scan callbacks.
+	 * ------------------------------------------------------------------------
+	 */
+
+	/*
+	 * Start a scan of `rel`.  The callback has to return a TableScanDesc,
+	 * which will typically be embedded in a larger, AM specific, struct.
+	 *
+	 * If nkeys != 0, the results need to be filtered by those scan keys.
+	 *
+	 * pscan, if not NULL, will have already been initialized with
+	 * parallelscan_initialize(), and has to be for the same relation. Will
+	 * only be set coming from table_beginscan_parallel().
+	 *
+	 * allow_{strat, sync, pagemode} specify whether a scan strategy,
+	 * synchronized scans, or page mode may be used (although not every AM
+	 * will support those).
+	 *
+	 * is_{bitmapscan, samplescan} specify whether the scan is intended to
+	 * support those types of scans.
+	 *
+	 * If temp_snap is true, the snapshot will need to be deallocated at
+	 * scan_end.
+	 */
+	TableScanDesc (*scan_begin) (Relation rel,
+								 Snapshot snapshot,
+								 int nkeys, struct ScanKeyData *key,
+								 ParallelTableScanDesc pscan,
+								 bool allow_strat,
+								 bool allow_sync,
+								 bool allow_pagemode,
+								 bool is_bitmapscan,
+								 bool is_samplescan,
+								 bool temp_snap);
+
+	/*
+	 * Release resources and deallocate scan. If TableScanDesc.temp_snap,
+	 * TableScanDesc.rs_snapshot needs to be unregistered.
+	 */
+	void		(*scan_end) (TableScanDesc scan);
+
+	/*
+	 * Restart relation scan.  If set_params is set to true, allow{strat,
+	 * sync, pagemode} (see scan_begin) changes should be taken into account.
+	 */
+	void		(*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key, bool set_params,
+								bool allow_strat, bool allow_sync, bool allow_pagemode);
+
+	/*
+	 * Return next tuple from `scan`, store in slot.
+	 */
+	bool		(*scan_getnextslot) (TableScanDesc scan,
+									 ScanDirection direction, TupleTableSlot *slot);
+
+
+	/* ------------------------------------------------------------------------
+	 * Parallel table scan related functions.
+	 * ------------------------------------------------------------------------
+	 */
+
+	/*
+	 * Estimate the size of shared memory needed for a parallel scan of this
+	 * relation. The snapshot does not need to be accounted for.
+	 */
+	Size		(*parallelscan_estimate) (Relation rel);
+
+	/*
+	 * Initialize ParallelTableScanDesc for a parallel scan of this relation.
+	 * pscan will be sized according to parallelscan_estimate() for the same
+	 * relation.
+	 */
+	Size		(*parallelscan_initialize) (Relation rel, ParallelTableScanDesc pscan);
+
+	/*
+	 * Reinitialize `pscan` for a new scan. `rel` will be the same relation as
+	 * when `pscan` was initialized by parallelscan_initialize.
+	 */
+	void		(*parallelscan_reinitialize) (Relation rel, ParallelTableScanDesc pscan);
+
+
+	/* ------------------------------------------------------------------------
+	 * Index Scan Callbacks
+	 * ------------------------------------------------------------------------
+	 */
+
+	/*
+	 * Prepare to fetch tuples from the relation, as needed when fetching
+	 * tuples for an index scan.  The callback has to return an
+	 * IndexFetchTableData, which the AM will typically embed in a larger
+	 * structure with additional information.
+	 *
+	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
+	 */
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+
+	/*
+	 * Reset an index fetch. Typically this will release resources that are
+	 * held across individual index fetches in IndexFetchTableData.
+	 */
+	void		(*index_fetch_reset) (struct IndexFetchTableData *data);
+
+	/*
+	 * Release resources and deallocate index fetch.
+	 */
+	void		(*index_fetch_end) (struct IndexFetchTableData *data);
+
+	/*
+	 * Fetch tuple at `tid` into `slot`, after doing a visibility test
+	 * according to `snapshot`. If a tuple was found and passed the visibility
+	 * test, return true, false otherwise.
+	 *
+	 * Note that AMs that do not necessarily update indexes when indexed
+	 * columns do not change, need to return the current/correct version of a
+	 * tuple as appropriate, even if the tid points to an older version of the
+	 * tuple.
+	 *
+	 * *call_again is false on the first call to index_fetch_tuple for a tid.
+	 * If there potentially is another tuple matching the tid, *call_again
+	 * needs be set to true by index_fetch_tuple, signalling to the caller
+	 * that index_fetch_tuple should be called again for the same tid.
+	 *
+	 * *all_dead should be set to true by index_fetch_tuple iff it is
+	 * guaranteed that no backend needs to see that tuple. Index AMs can use
+	 * that to avoid returning that tid in future searches.
+	 */
+	bool		(*index_fetch_tuple) (struct IndexFetchTableData *scan,
+									  ItemPointer tid,
+									  Snapshot snapshot,
+									  TupleTableSlot *slot,
+									  bool *call_again, bool *all_dead);
+
+	/* ------------------------------------------------------------------------
+	 * Callbacks for non-modifying operations on individual tuples
+	 * ------------------------------------------------------------------------
+	 */
+
+	/*
+	 * Does the tuple in `slot` satisfy `snapshot`?  The slot needs to be of
+	 * the appropriate type for the AM.
+	 */
+	bool		(*tuple_satisfies_snapshot) (Relation rel,
+											 TupleTableSlot *slot,
+											 Snapshot snapshot);
+
 } TableAmRoutine;
 
 
+/* ----------------------------------------------------------------------------
+ * Slot functions.
+ * ----------------------------------------------------------------------------
+ */
 
 /*
- * Functions in tableamapi.c
+ * Returns slot callbacks suitable for holding tuples of the appropriate type
+ * for the relation.  Works for tables, views, foreign tables and partitioned
+ * tables.
  */
+extern const TupleTableSlotOps *table_slot_callbacks(Relation rel);
+
+/*
+ * Returns slot using the callbacks returned by table_slot_callbacks(), and
+ * registers it on *reglist.
+ */
+extern TupleTableSlot *table_slot_create(Relation rel, List **reglist);
+
+
+/* ----------------------------------------------------------------------------
+ * Table scan functions.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * Start a scan of `rel`. Returned tuples pass a visibility test of
+ * `snapshot`, and if nkeys != 0, the results are filtered by those scan keys.
+ */
+static inline TableScanDesc
+table_beginscan(Relation rel, Snapshot snapshot,
+				int nkeys, struct ScanKeyData *key)
+{
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL,
+									   true, true, true, false, false, false);
+}
+
+/*
+ * Like table_beginscan(), but for scanning catalog relations. It'll
+ * automatically use a snapshot appropriate for them.
+ */
+extern TableScanDesc table_beginscan_catalog(Relation rel, int nkeys,
+						struct ScanKeyData *key);
+
+/*
+ * Like table_beginscan(), but table_beginscan_strat() offers an extended API
+ * that lets the caller control whether a nondefault buffer access strategy
+ * can be used, and whether syncscan can be chosen (possibly resulting in the
+ * scan not starting from block zero).  Both of these default to true with
+ * plain table_beginscan.
+ */
+static inline TableScanDesc
+table_beginscan_strat(Relation rel, Snapshot snapshot,
+					  int nkeys, struct ScanKeyData *key,
+					  bool allow_strat, bool allow_sync)
+{
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL,
+									   allow_strat, allow_sync, true,
+									   false, false, false);
+}
+
+
+/*
+ * table_beginscan_bm is an alternative entry point for setting up a
+ * TableScanDesc for a bitmap heap scan.  Although that scan technology is
+ * really quite unlike a standard seqscan, there is just enough commonality to
+ * make it worth using the same data structure.
+ */
+static inline TableScanDesc
+table_beginscan_bm(Relation rel, Snapshot snapshot,
+				   int nkeys, struct ScanKeyData *key)
+{
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL,
+									   false, false, true, true, false, false);
+}
+
+/*
+ * table_beginscan_sampling is an alternative entry point for setting up a
+ * TableScanDesc for a TABLESAMPLE scan.  As with bitmap scans, it's worth
+ * using the same data structure although the behavior is rather different.
+ * In addition to the options offered by table_beginscan_strat, this call
+ * also allows control of whether page-mode visibility checking is used.
+ */
+static inline TableScanDesc
+table_beginscan_sampling(Relation rel, Snapshot snapshot,
+						 int nkeys, struct ScanKeyData *key,
+						 bool allow_strat, bool allow_sync, bool allow_pagemode)
+{
+	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL,
+									   allow_strat, allow_sync, allow_pagemode,
+									   false, true, false);
+}
+
+/*
+ * table_beginscan_analyze is an alternative entry point for setting up a
+ * TableScanDesc for an ANALYZE scan.  As with bitmap scans, it's worth using
+ * the same data structure although the behavior is rather different.
+ */
+static inline TableScanDesc
+table_beginscan_analyze(Relation rel)
+{
+	return rel->rd_tableam->scan_begin(rel, NULL, 0, NULL, NULL,
+									   true, false, true,
+									   false, true, false);
+}
+
+/*
+ * End relation scan.
+ */
+static inline void
+table_endscan(TableScanDesc scan)
+{
+	scan->rs_rd->rd_tableam->scan_end(scan);
+}
+
+
+/*
+ * Restart a relation scan.
+ */
+static inline void
+table_rescan(TableScanDesc scan,
+			 struct ScanKeyData *key)
+{
+	scan->rs_rd->rd_tableam->scan_rescan(scan, key, false, false, false, false);
+}
+
+/*
+ * Restart a relation scan after changing params.
+ *
+ * This call allows changing the buffer strategy, syncscan, and pagemode
+ * options before starting a fresh scan.  Note that although the actual use of
+ * syncscan might change (effectively, enabling or disabling reporting), the
+ * previously selected startblock will be kept.
+ */
+static inline void
+table_rescan_set_params(TableScanDesc scan, struct ScanKeyData *key,
+						bool allow_strat, bool allow_sync, bool allow_pagemode)
+{
+	scan->rs_rd->rd_tableam->scan_rescan(scan, key, true,
+										 allow_strat, allow_sync,
+										 allow_pagemode);
+}
+
+/*
+ * Update snapshot used by the scan.
+ */
+extern void table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot);
+
+
+/*
+ * Return next tuple from `scan`, store in slot.
+ */
+static inline bool
+table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot)
+{
+	slot->tts_tableOid = RelationGetRelid(sscan->rs_rd);
+	return sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot);
+}
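
To illustrate how the slot and scan wrappers above are intended to be
driven, here's a minimal sketch of a caller-side sequential scan (the
function name is made up, and error handling is omitted):

/*
 * Hypothetical example: count the tuples in `rel` visible to `snapshot`,
 * using only the slot-based scan API sketched above.
 */
static uint64
count_visible_tuples(Relation rel, Snapshot snapshot)
{
	TupleTableSlot *slot = table_slot_create(rel, NULL);
	TableScanDesc scan = table_beginscan(rel, snapshot, 0, NULL);
	uint64		ntuples = 0;

	while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
		ntuples++;

	table_endscan(scan);
	ExecDropSingleTupleTableSlot(slot);

	return ntuples;
}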
+
+
+/* ----------------------------------------------------------------------------
+ * Parallel table scan related functions.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * Estimate the size of shared memory needed for a parallel scan of this
+ * relation.
+ */
+extern Size table_parallelscan_estimate(Relation rel, Snapshot snapshot);
+
+/*
+ * Initialize ParallelTableScanDesc for a parallel scan of this
+ * relation. `pscan` needs to be sized according to parallelscan_estimate()
+ * for the same relation.  Call this just once in the leader process; then,
+ * individual workers attach via table_beginscan_parallel.
+ */
+extern void table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan, Snapshot snapshot);
+
+/*
+ * Begin a parallel scan. `pscan` needs to have been initialized with
+ * table_parallelscan_initialize(), for the same relation. The initialization
+ * does not need to have happened in this backend.
+ *
+ * Caller must hold a suitable lock on the correct relation.
+ */
+extern TableScanDesc table_beginscan_parallel(Relation rel, ParallelTableScanDesc pscan);
+
+/*
+ * Restart a parallel scan.  Call this in the leader process.  Caller is
+ * responsible for making sure that all workers have finished the scan
+ * beforehand.
+ */
+static inline void
+table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
+{
+	return rel->rd_tableam->parallelscan_reinitialize(rel, pscan);
+}
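
To sketch the intended division of labor (`pcxt`, `toc` and the toc key
are assumed to come from the usual parallel-context setup and are not
part of this API):

/* Hypothetical leader-side setup for a parallel scan. */
Size		sz = table_parallelscan_estimate(rel, snapshot);
ParallelTableScanDesc pscan;

pscan = (ParallelTableScanDesc) shm_toc_allocate(pcxt->toc, sz);
table_parallelscan_initialize(rel, pscan, snapshot);
shm_toc_insert(pcxt->toc, MY_PSCAN_KEY, pscan);

/* ... launch workers; each worker then attaches with: */
pscan = (ParallelTableScanDesc) shm_toc_lookup(toc, MY_PSCAN_KEY, false);
scan = table_beginscan_parallel(rel, pscan);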
+
+
+/* ----------------------------------------------------------------------------
+ *  Index scan related functions.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * Prepare to fetch tuples from the relation, as needed when fetching tuples
+ * for an index scan.
+ *
+ * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
+ */
+static inline IndexFetchTableData *
+table_index_fetch_begin(Relation rel)
+{
+	return rel->rd_tableam->index_fetch_begin(rel);
+}
+
+/*
+ * Reset index fetch. Typically this will release resources held across index
+ * fetches in the IndexFetchTableData.
+ */
+static inline void
+table_index_fetch_reset(struct IndexFetchTableData *scan)
+{
+	scan->rel->rd_tableam->index_fetch_reset(scan);
+}
+
+/*
+ * Release resources and deallocate index fetch.
+ */
+static inline void
+table_index_fetch_end(struct IndexFetchTableData *scan)
+{
+	scan->rel->rd_tableam->index_fetch_end(scan);
+}
+
+/*
+ * Fetches tuple at `tid` into `slot`, after doing a visibility test according
+ * to `snapshot`. If a tuple was found and passed the visibility test, returns
+ * true, false otherwise.
+ *
+ * *call_again needs to be false on the first call to table_index_fetch_tuple() for
+ * a tid. If there potentially is another tuple matching the tid, *call_again
+ * will be set to true, signalling that table_index_fetch_tuple() should be called
+ * again for the same tid.
+ *
+ * *all_dead will be set to true by table_index_fetch_tuple() iff it is guaranteed
+ * that no backend needs to see that tuple. Index AMs can use that to avoid
+ * returning that tid in future searches.
+ */
+static inline bool
+table_index_fetch_tuple(struct IndexFetchTableData *scan,
+						ItemPointer tid,
+						Snapshot snapshot,
+						TupleTableSlot *slot,
+						bool *call_again, bool *all_dead)
+{
+	return scan->rel->rd_tableam->index_fetch_tuple(scan, tid, snapshot,
+													slot, call_again,
+													all_dead);
+}
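
The *call_again protocol is easiest to see as a loop; a minimal sketch,
assuming `rel`, `tid`, `snapshot` and a suitably typed `slot` have
already been set up by the caller:

/* Hypothetical: visit each tuple version reachable from `tid`. */
IndexFetchTableData *fetch = table_index_fetch_begin(rel);
bool		call_again = false;
bool		all_dead = false;

do
{
	if (table_index_fetch_tuple(fetch, &tid, snapshot, slot,
								&call_again, &all_dead))
	{
		/* process the tuple now stored in `slot` */
	}
} while (call_again);

table_index_fetch_end(fetch);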
+
+
+/* ------------------------------------------------------------------------
+ * Functions for non-modifying operations on individual tuples
+ * ------------------------------------------------------------------------
+ */
+
+/*
+ * Return true iff tuple in slot satisfies the snapshot.
+ *
+ * This assumes the slot's tuple is valid, and of the appropriate type for the
+ * AM.
+ *
+ * Some AMs might modify the data underlying the tuple as a side-effect. If so
+ * they ought to mark the relevant buffer dirty.
+ */
+static inline bool
+table_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot, Snapshot snapshot)
+{
+	return rel->rd_tableam->tuple_satisfies_snapshot(rel, slot, snapshot);
+}
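
For example, a caller that has fetched a tuple into a slot could recheck
its visibility like this (a sketch; how the snapshot is obtained is up to
the caller):

/* Hypothetical recheck of an already-fetched tuple. */
if (!table_tuple_satisfies_snapshot(rel, slot, GetActiveSnapshot()))
	elog(ERROR, "tuple is no longer visible");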
+
+
+/* ----------------------------------------------------------------------------
+ * Helper functions to implement parallel scans for block oriented AMs.
+ * ----------------------------------------------------------------------------
+ */
+
+extern Size table_block_parallelscan_estimate(Relation rel);
+extern Size table_block_parallelscan_initialize(Relation rel,
+									ParallelTableScanDesc pscan);
+extern void table_block_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan);
+extern BlockNumber table_block_parallelscan_nextpage(Relation rel, ParallelBlockTableScanDesc pbscan);
+extern void table_block_parallelscan_startblock_init(Relation rel, ParallelBlockTableScanDesc pbscan);
+
+
+/* ----------------------------------------------------------------------------
+ * Functions in tableamapi.c
+ * ----------------------------------------------------------------------------
+ */
+
 extern const TableAmRoutine *GetTableAmRoutine(Oid amhandler);
 extern const TableAmRoutine *GetTableAmRoutineByAmId(Oid amoid);
 extern const TableAmRoutine *GetHeapamTableAmRoutine(void);
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 330c481a8b7..29f7ed62379 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -110,13 +110,14 @@ extern void index_build(Relation heapRelation,
 			bool isreindex,
 			bool parallel);
 
+struct TableScanDescData;
 extern double IndexBuildHeapScan(Relation heapRelation,
 				   Relation indexRelation,
 				   IndexInfo *indexInfo,
 				   bool allow_sync,
 				   IndexBuildCallback callback,
 				   void *callback_state,
-				   struct HeapScanDescData *scan);
+				   struct TableScanDescData *scan);
 extern double IndexBuildHeapRangeScan(Relation heapRelation,
 						Relation indexRelation,
 						IndexInfo *indexInfo,
@@ -126,7 +127,7 @@ extern double IndexBuildHeapRangeScan(Relation heapRelation,
 						BlockNumber end_blockno,
 						IndexBuildCallback callback,
 						void *callback_state,
-						struct HeapScanDescData *scan);
+						struct TableScanDescData *scan);
 
 extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index fd13c170d79..62eb1a06eef 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1270,7 +1270,7 @@ typedef struct ScanState
 {
 	PlanState	ps;				/* its first field is NodeTag */
 	Relation	ss_currentRelation;
-	struct HeapScanDescData *ss_currentScanDesc;
+	struct TableScanDescData *ss_currentScanDesc;
 	TupleTableSlot *ss_ScanTupleSlot;
 } ScanState;
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 7a5d8c47e12..b821df9e712 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1018,6 +1018,8 @@ IndexBulkDeleteCallback
 IndexBulkDeleteResult
 IndexClauseSet
 IndexElem
+IndexFetchHeapData
+IndexFetchTableData
 IndexInfo
 IndexList
 IndexOnlyScan
@@ -1602,6 +1604,8 @@ PagetableEntry
 Pairs
 ParallelAppendState
 ParallelBitmapHeapState
+ParallelBlockTableScanDesc
+ParallelBlockTableScanDescData
 ParallelCompletionPtr
 ParallelContext
 ParallelExecutorInfo
@@ -1609,8 +1613,8 @@ ParallelHashGrowth
 ParallelHashJoinBatch
 ParallelHashJoinBatchAccessor
 ParallelHashJoinState
-ParallelHeapScanDesc
-ParallelHeapScanDescData
+ParallelTableScanDesc
+ParallelTableScanDescData
 ParallelIndexScanDesc
 ParallelSlot
 ParallelState
@@ -2316,6 +2320,8 @@ TableFuncScanState
 TableInfo
 TableLikeClause
 TableSampleClause
+TableScanDesc
+TableScanDescData
 TableSpaceCacheEntry
 TableSpaceOpts
 TablespaceList
@@ -2410,6 +2416,7 @@ TupleHashIterator
 TupleHashTable
 TupleQueueReader
 TupleTableSlot
+TupleTableSlotOps
 TuplesortInstrumentation
 TuplesortMethod
 TuplesortSpaceType
-- 
2.21.0.dirty

v18-0002-tableam-Only-allow-heap-in-a-number-of-contrib-m.patch (text/x-diff; charset=us-ascii)
From 5b3fc889578871de617688ae6d2cab3db8966a18 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 8 Mar 2019 12:34:24 -0800
Subject: [PATCH v18 02/18] tableam: Only allow heap in a number of contrib
 modules.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 contrib/pgrowlocks/pgrowlocks.c    | 5 +++++
 contrib/pgstattuple/pgstatapprox.c | 7 ++++++-
 contrib/pgstattuple/pgstattuple.c  | 5 +++++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index 2d2a6cf1533..82b60d08cf0 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -30,6 +30,7 @@
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_am_d.h"
 #include "catalog/pg_authid.h"
 #include "funcapi.h"
 #include "miscadmin.h"
@@ -101,6 +102,10 @@ pgrowlocks(PG_FUNCTION_ARGS)
 		relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
 		rel = relation_openrv(relrv, AccessShareLock);
 
+		if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+			ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							errmsg("only heap AM is supported")));
+
 		if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 			ereport(ERROR,
 					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index d36758af9a6..ed62aef7669 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -20,6 +20,8 @@
 #include "access/multixact.h"
 #include "access/htup_details.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_am_d.h"
+#include "commands/vacuum.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
@@ -27,7 +29,6 @@
 #include "storage/procarray.h"
 #include "storage/lmgr.h"
 #include "utils/builtins.h"
-#include "commands/vacuum.h"
 
 PG_FUNCTION_INFO_V1(pgstattuple_approx);
 PG_FUNCTION_INFO_V1(pgstattuple_approx_v1_5);
@@ -291,6 +292,10 @@ pgstattuple_approx_internal(Oid relid, FunctionCallInfo fcinfo)
 				 errmsg("\"%s\" is not a table or materialized view",
 						RelationGetRelationName(rel))));
 
+	if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+		ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						errmsg("only heap AM is supported")));
+
 	statapprox_heap(rel, &stat);
 
 	relation_close(rel, AccessShareLock);
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 7e1c3080006..ac7a203f1ac 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -328,6 +328,11 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 	pgstattuple_type stat = {0};
 	SnapshotData SnapshotDirty;
 
+	if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("only heap AM is supported")));
+
 	/* Disable syncscan because we assume we scan from block zero upwards */
 	scan = table_beginscan_strat(rel, SnapshotAny, 0, NULL, true, false);
 	hscan = (HeapScanDesc) scan;
-- 
2.21.0.dirty

v18-0003-tableam-Add-insert-delete-update-lock_tuple.patch (text/x-diff; charset=utf-8)
From 27a3b4226e83e016f6e427ce9907a9c415b9cd59 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 7 Mar 2019 16:23:34 -0800
Subject: [PATCH v18 03/18] tableam: Add insert, delete, update, lock_tuple.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/heapam.c              |  88 ++---
 src/backend/access/heap/heapam_handler.c      | 353 ++++++++++++++++++
 src/backend/access/heap/heapam_visibility.c   |  23 +-
 src/backend/access/heap/tuptoaster.c          |   2 +-
 src/backend/access/table/tableam.c            | 101 +++++
 src/backend/commands/copy.c                   |   3 +-
 src/backend/commands/createas.c               |   3 -
 src/backend/commands/trigger.c                | 102 +++--
 src/backend/executor/execIndexing.c           |   4 +-
 src/backend/executor/execMain.c               | 279 +-------------
 src/backend/executor/execReplication.c        | 106 ++----
 src/backend/executor/nodeLockRows.c           |  73 ++--
 src/backend/executor/nodeModifyTable.c        | 324 +++++++++-------
 src/backend/executor/nodeTidscan.c            |   2 +-
 src/include/access/heapam.h                   |  45 +--
 src/include/access/tableam.h                  | 158 ++++++++
 src/include/executor/executor.h               |  12 +-
 src/include/nodes/lockoptions.h               |   5 +
 src/include/utils/snapshot.h                  |   1 +
 .../expected/partition-key-update-1.out       |   2 +-
 20 files changed, 1027 insertions(+), 659 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d6e32d6ce21..a2bb1701e40 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1386,13 +1386,12 @@ heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *s
  */
 bool
 heap_fetch(Relation relation,
+		   ItemPointer tid,
 		   Snapshot snapshot,
 		   HeapTuple tuple,
 		   Buffer *userbuf,
-		   bool keep_buf,
 		   Relation stats_relation)
 {
-	ItemPointer tid = &(tuple->t_self);
 	ItemId		lp;
 	Buffer		buffer;
 	Page		page;
@@ -1419,13 +1418,8 @@ heap_fetch(Relation relation,
 	if (offnum < FirstOffsetNumber || offnum > PageGetMaxOffsetNumber(page))
 	{
 		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-		if (keep_buf)
-			*userbuf = buffer;
-		else
-		{
-			ReleaseBuffer(buffer);
-			*userbuf = InvalidBuffer;
-		}
+		ReleaseBuffer(buffer);
+		*userbuf = InvalidBuffer;
 		tuple->t_data = NULL;
 		return false;
 	}
@@ -1441,20 +1435,16 @@ heap_fetch(Relation relation,
 	if (!ItemIdIsNormal(lp))
 	{
 		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-		if (keep_buf)
-			*userbuf = buffer;
-		else
-		{
-			ReleaseBuffer(buffer);
-			*userbuf = InvalidBuffer;
-		}
+		ReleaseBuffer(buffer);
+		*userbuf = InvalidBuffer;
 		tuple->t_data = NULL;
 		return false;
 	}
 
 	/*
-	 * fill in *tuple fields
+	 * fill in tuple fields, including the passed-in tid
 	 */
+	ItemPointerCopy(tid, &(tuple->t_self));
 	tuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tuple->t_len = ItemIdGetLength(lp);
 	tuple->t_tableOid = RelationGetRelid(relation);
@@ -1486,14 +1476,9 @@ heap_fetch(Relation relation,
 		return true;
 	}
 
-	/* Tuple failed time qual, but maybe caller wants to see it anyway. */
-	if (keep_buf)
-		*userbuf = buffer;
-	else
-	{
-		ReleaseBuffer(buffer);
-		*userbuf = InvalidBuffer;
-	}
+	/* Tuple failed time qual */
+	ReleaseBuffer(buffer);
+	*userbuf = InvalidBuffer;
 
 	return false;
 }
@@ -2703,6 +2688,7 @@ l1:
 	{
 		Assert(result == HeapTupleSelfUpdated ||
 			   result == HeapTupleUpdated ||
+			   result == HeapTupleDeleted ||
 			   result == HeapTupleBeingUpdated);
 		Assert(!(tp.t_data->t_infomask & HEAP_XMAX_INVALID));
 		hufd->ctid = tp.t_data->t_ctid;
@@ -2716,6 +2702,8 @@ l1:
 			UnlockTupleTuplock(relation, &(tp.t_self), LockTupleExclusive);
 		if (vmbuffer != InvalidBuffer)
 			ReleaseBuffer(vmbuffer);
+		if (result == HeapTupleUpdated && ItemPointerEquals(tid, &hufd->ctid))
+			result = HeapTupleDeleted;
 		return result;
 	}
 
@@ -2932,6 +2920,10 @@ simple_heap_delete(Relation relation, ItemPointer tid)
 			elog(ERROR, "tuple concurrently updated");
 			break;
 
+		case HeapTupleDeleted:
+			elog(ERROR, "tuple concurrently deleted");
+			break;
+
 		default:
 			elog(ERROR, "unrecognized heap_delete status: %u", result);
 			break;
@@ -3336,6 +3328,7 @@ l2:
 	{
 		Assert(result == HeapTupleSelfUpdated ||
 			   result == HeapTupleUpdated ||
+			   result == HeapTupleDeleted ||
 			   result == HeapTupleBeingUpdated);
 		Assert(!(oldtup.t_data->t_infomask & HEAP_XMAX_INVALID));
 		hufd->ctid = oldtup.t_data->t_ctid;
@@ -3354,6 +3347,8 @@ l2:
 		bms_free(id_attrs);
 		bms_free(modified_attrs);
 		bms_free(interesting_attrs);
+		if (result == HeapTupleUpdated && ItemPointerEquals(otid, &hufd->ctid))
+			result = HeapTupleDeleted;
 		return result;
 	}
 
@@ -3971,6 +3966,10 @@ simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup)
 			elog(ERROR, "tuple concurrently updated");
 			break;
 
+		case HeapTupleDeleted:
+			elog(ERROR, "tuple concurrently deleted");
+			break;
+
 		default:
 			elog(ERROR, "unrecognized heap_update status: %u", result);
 			break;
@@ -4005,7 +4004,7 @@ get_mxact_status_for_lock(LockTupleMode mode, bool is_update)
  *
  * Input parameters:
  *	relation: relation containing tuple (caller must hold suitable lock)
- *	tuple->t_self: TID of tuple to lock (rest of struct need not be valid)
+ *	tid: TID of tuple to lock
  *	cid: current command ID (used for visibility test, and stored into
  *		tuple's cmax if lock is successful)
  *	mode: indicates if shared or exclusive tuple lock is desired
@@ -4023,6 +4022,7 @@ get_mxact_status_for_lock(LockTupleMode mode, bool is_update)
  *	HeapTupleInvisible: lock failed because tuple was never visible to us
  *	HeapTupleSelfUpdated: lock failed because tuple updated by self
  *	HeapTupleUpdated: lock failed because tuple updated by other xact
+ *	HeapTupleDeleted: lock failed because tuple deleted by other xact
  *	HeapTupleWouldBlock: lock couldn't be acquired and wait_policy is skip
  *
  * In the failure cases other than HeapTupleInvisible, the routine fills
@@ -4035,13 +4035,12 @@ get_mxact_status_for_lock(LockTupleMode mode, bool is_update)
  * See README.tuplock for a thorough explanation of this mechanism.
  */
 HTSU_Result
-heap_lock_tuple(Relation relation, HeapTuple tuple,
+heap_lock_tuple(Relation relation, ItemPointer tid,
 				CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 				bool follow_updates,
-				Buffer *buffer, HeapUpdateFailureData *hufd)
+				HeapTuple tuple, Buffer *buffer, HeapUpdateFailureData *hufd)
 {
 	HTSU_Result result;
-	ItemPointer tid = &(tuple->t_self);
 	ItemId		lp;
 	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
@@ -4076,6 +4075,7 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
 	tuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tuple->t_len = ItemIdGetLength(lp);
 	tuple->t_tableOid = RelationGetRelid(relation);
+	tuple->t_self = *tid;
 
 l3:
 	result = HeapTupleSatisfiesUpdate(tuple, cid, *buffer);
@@ -4091,7 +4091,7 @@ l3:
 		result = HeapTupleInvisible;
 		goto out_locked;
 	}
-	else if (result == HeapTupleBeingUpdated || result == HeapTupleUpdated)
+	else if (result == HeapTupleBeingUpdated || result == HeapTupleUpdated || result == HeapTupleDeleted)
 	{
 		TransactionId xwait;
 		uint16		infomask;
@@ -4371,7 +4371,7 @@ l3:
 		 * or we must wait for the locking transaction or multixact; so below
 		 * we ensure that we grab buffer lock after the sleep.
 		 */
-		if (require_sleep && result == HeapTupleUpdated)
+		if (require_sleep && (result == HeapTupleUpdated || result == HeapTupleDeleted))
 		{
 			LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
 			goto failed;
@@ -4531,6 +4531,8 @@ l3:
 			HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_data->t_infomask) ||
 			HeapTupleHeaderIsOnlyLocked(tuple->t_data))
 			result = HeapTupleMayBeUpdated;
+		else if (ItemPointerEquals(&tuple->t_self, &tuple->t_data->t_ctid))
+			result = HeapTupleDeleted;
 		else
 			result = HeapTupleUpdated;
 	}
@@ -4539,7 +4541,7 @@ failed:
 	if (result != HeapTupleMayBeUpdated)
 	{
 		Assert(result == HeapTupleSelfUpdated || result == HeapTupleUpdated ||
-			   result == HeapTupleWouldBlock);
+			   result == HeapTupleWouldBlock || result == HeapTupleDeleted);
 		Assert(!(tuple->t_data->t_infomask & HEAP_XMAX_INVALID));
 		hufd->ctid = tuple->t_data->t_ctid;
 		hufd->xmax = HeapTupleHeaderGetUpdateXid(tuple->t_data);
@@ -5143,9 +5145,8 @@ heap_lock_updated_tuple_rec(Relation rel, ItemPointer tid, TransactionId xid,
 		new_infomask = 0;
 		new_xmax = InvalidTransactionId;
 		block = ItemPointerGetBlockNumber(&tupid);
-		ItemPointerCopy(&tupid, &(mytup.t_self));
 
-		if (!heap_fetch(rel, SnapshotAny, &mytup, &buf, false, NULL))
+		if (!heap_fetch(rel, &tupid, SnapshotAny, &mytup, &buf, NULL))
 		{
 			/*
 			 * if we fail to find the updated version of the tuple, it's
@@ -5428,6 +5429,10 @@ next:
 	result = HeapTupleMayBeUpdated;
 
 out_locked:
+
+	if (result == HeapTupleUpdated && ItemPointerEquals(&mytup.t_self, &mytup.t_data->t_ctid))
+		result = HeapTupleDeleted;
+
 	UnlockReleaseBuffer(buf);
 
 out_unlocked:
@@ -5505,7 +5510,7 @@ heap_lock_updated_tuple(Relation rel, HeapTuple tuple, ItemPointer ctid,
  * An explicit confirmation WAL record also makes logical decoding simpler.
  */
 void
-heap_finish_speculative(Relation relation, HeapTuple tuple)
+heap_finish_speculative(Relation relation, ItemPointer tid)
 {
 	Buffer		buffer;
 	Page		page;
@@ -5513,11 +5518,11 @@ heap_finish_speculative(Relation relation, HeapTuple tuple)
 	ItemId		lp = NULL;
 	HeapTupleHeader htup;
 
-	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
+	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 	page = (Page) BufferGetPage(buffer);
 
-	offnum = ItemPointerGetOffsetNumber(&(tuple->t_self));
+	offnum = ItemPointerGetOffsetNumber(tid);
 	if (PageGetMaxOffsetNumber(page) >= offnum)
 		lp = PageGetItemId(page, offnum);
 
@@ -5533,7 +5538,7 @@ heap_finish_speculative(Relation relation, HeapTuple tuple)
 	/* NO EREPORT(ERROR) from here till changes are logged */
 	START_CRIT_SECTION();
 
-	Assert(HeapTupleHeaderIsSpeculative(tuple->t_data));
+	Assert(HeapTupleHeaderIsSpeculative(htup));
 
 	MarkBufferDirty(buffer);
 
@@ -5541,7 +5546,7 @@ heap_finish_speculative(Relation relation, HeapTuple tuple)
 	 * Replace the speculative insertion token with a real t_ctid, pointing to
 	 * itself like it does on regular tuples.
 	 */
-	htup->t_ctid = tuple->t_self;
+	htup->t_ctid = *tid;
 
 	/* XLOG stuff */
 	if (RelationNeedsWAL(relation))
@@ -5549,7 +5554,7 @@ heap_finish_speculative(Relation relation, HeapTuple tuple)
 		xl_heap_confirm xlrec;
 		XLogRecPtr	recptr;
 
-		xlrec.offnum = ItemPointerGetOffsetNumber(&tuple->t_self);
+		xlrec.offnum = ItemPointerGetOffsetNumber(tid);
 
 		XLogBeginInsert();
 
@@ -5596,10 +5601,9 @@ heap_finish_speculative(Relation relation, HeapTuple tuple)
  * confirmation records.
  */
 void
-heap_abort_speculative(Relation relation, HeapTuple tuple)
+heap_abort_speculative(Relation relation, ItemPointer tid)
 {
 	TransactionId xid = GetCurrentTransactionId();
-	ItemPointer tid = &(tuple->t_self);
 	ItemId		lp;
 	HeapTupleData tp;
 	Page		page;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6a26fcef94c..3285197c558 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -21,7 +21,9 @@
 
 #include "access/heapam.h"
 #include "access/tableam.h"
+#include "access/xact.h"
 #include "storage/bufmgr.h"
+#include "storage/lmgr.h"
 #include "utils/builtins.h"
 
 
@@ -169,6 +171,350 @@ heapam_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot,
 }
 
 
+/* ----------------------------------------------------------------------------
+ *  Functions for manipulations of physical tuples for heap AM.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * Insert a heap tuple from a slot, which may contain an OID and speculative
+ * insertion token.
+ */
+static void
+heapam_heap_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
+				   int options, BulkInsertState bistate)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+
+	/* Update the tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(relation);
+	if (slot->tts_tableOid != InvalidOid)
+		tuple->t_tableOid = slot->tts_tableOid;
+
+	/* Perform the insertion, and copy the resulting ItemPointer */
+	heap_insert(relation, tuple, cid, options, bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+static void
+heapam_heap_insert_speculative(Relation relation, TupleTableSlot *slot, CommandId cid,
+							   int options, BulkInsertState bistate, uint32 specToken)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+
+	/* Update the tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(relation);
+	if (slot->tts_tableOid != InvalidOid)
+		tuple->t_tableOid = slot->tts_tableOid;
+
+	HeapTupleHeaderSetSpeculativeToken(tuple->t_data, specToken);
+
+	/* Perform the insertion, and copy the resulting ItemPointer */
+	heap_insert(relation, tuple, cid, options, bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+static void
+heapam_heap_complete_speculative(Relation relation, TupleTableSlot *slot, uint32 specToken,
+								 bool succeeded)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+
+	/*
+	 * Adjust the tuple's state accordingly: if the speculative insertion
+	 * succeeded, make the tuple permanent; otherwise kill it.
+	 */
+	if (succeeded)
+		heap_finish_speculative(relation, &slot->tts_tid);
+	else
+		heap_abort_speculative(relation, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
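
For context, the intended calling sequence for the two functions above is
roughly the following sketch of nodeModifyTable.c's ON CONFLICT path
(variable setup omitted; it assumes the tableam.h wrapper names match the
callbacks, and that the last argument of table_complete_speculative()
means "no conflict occurred"):

/* Hypothetical sketch of the speculative-insertion dance. */
uint32		specToken;
bool		specConflict = false;
List	   *recheckIndexes;

specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
table_insert_speculative(rel, slot, cid, 0, NULL, specToken);

/* insert index entries, noticing conflicts with concurrent insertions */
recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
									   &specConflict, arbiterIndexes);

/* either make the tuple permanent, or kill it again */
table_complete_speculative(rel, slot, specToken, !specConflict);
SpeculativeInsertionLockRelease(GetCurrentTransactionId());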
+
+static HTSU_Result
+heapam_heap_delete(Relation relation, ItemPointer tid, CommandId cid,
+				   Snapshot snapshot, Snapshot crosscheck, bool wait,
+				   HeapUpdateFailureData *hufd, bool changingPart)
+{
+	/*
+	 * Currently, index tuple deletion is handled at VACUUM time.  If a
+	 * storage AM cleans up dead tuples by itself instead, that is also the
+	 * point at which it would have to trigger deletion of the corresponding
+	 * index tuples.
+	 */
+	return heap_delete(relation, tid, cid, crosscheck, wait, hufd, changingPart);
+}
+
+
+static HTSU_Result
+heapam_heap_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
+				   CommandId cid, Snapshot snapshot, Snapshot crosscheck,
+				   bool wait, HeapUpdateFailureData *hufd,
+				   LockTupleMode *lockmode, bool *update_indexes)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+	HTSU_Result result;
+
+	/* Update the tuple with table oid */
+	if (slot->tts_tableOid != InvalidOid)
+		tuple->t_tableOid = slot->tts_tableOid;
+
+	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
+						 hufd, lockmode);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	slot->tts_tableOid = RelationGetRelid(relation);
+
+	/*
+	 * Decide whether new index entries are needed for the tuple
+	 *
+	 * Note: heap_update returns the tid (location) of the new tuple in the
+	 * t_self field.
+	 *
+	 * If it's a HOT update, we mustn't insert new index entries.
+	 */
+	*update_indexes = result == HeapTupleMayBeUpdated &&
+		!HeapTupleIsHeapOnly(tuple);
+
+	if (shouldFree)
+		pfree(tuple);
+
+	return result;
+}
+
+/*
+ * Locks tuple and fetches its newest version and TID.
+ *
+ *	relation - table containing tuple
+ *	tid - TID of tuple to lock
+ *	snapshot - snapshot identifying required version (used for assert check only)
+ *	slot - tuple to be returned
+ *	cid - current command ID (used for visibility test, and stored into
+ *		  tuple's cmax if lock is successful)
+ *	mode - indicates if shared or exclusive tuple lock is desired
+ *	wait_policy - what to do if tuple lock is not available
+ *	flags - bitmask of flags indicating how updated tuples are handled
+ *	*hufd - filled in on failure; see below for details
+ *
+ * Function result may be:
+ *	HeapTupleMayBeUpdated: lock was successfully acquired
+ *	HeapTupleInvisible: lock failed because tuple was never visible to us
+ *	HeapTupleSelfUpdated: lock failed because tuple updated by self
+ *	HeapTupleUpdated: lock failed because tuple updated by other xact
+ *	HeapTupleDeleted: lock failed because tuple deleted by other xact
+ *	HeapTupleWouldBlock: lock couldn't be acquired and wait_policy is skip
+ *
+ * In the failure cases other than HeapTupleInvisible, the routine fills
+ * *hufd with the tuple's t_ctid, t_xmax (resolving a possible MultiXact,
+ * if necessary), and t_cmax (the last only for HeapTupleSelfUpdated,
+ * since we cannot obtain cmax from a combocid generated by another
+ * transaction).
+ * See comments for struct HeapUpdateFailureData for additional info.
+ */
+static HTSU_Result
+heapam_lock_tuple(Relation relation, ItemPointer tid, Snapshot snapshot,
+				  TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
+				  LockWaitPolicy wait_policy, uint8 flags,
+				  HeapUpdateFailureData *hufd)
+{
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	HTSU_Result result;
+	Buffer		buffer;
+	HeapTuple	tuple = &bslot->base.tupdata;
+
+	hufd->traversed = false;
+
+	Assert(TTS_IS_BUFFERTUPLE(slot));
+
+retry:
+	result = heap_lock_tuple(relation, tid, cid, mode, wait_policy,
+							 (flags & TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS) ? true : false,
+							 tuple, &buffer, hufd);
+
+	if (result == HeapTupleUpdated &&
+		(flags & TUPLE_LOCK_FLAG_FIND_LAST_VERSION))
+	{
+		ReleaseBuffer(buffer);
+		/* Should not encounter speculative tuple on recheck */
+		Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
+
+		if (!ItemPointerEquals(&hufd->ctid, &tuple->t_self))
+		{
+			SnapshotData SnapshotDirty;
+			TransactionId priorXmax;
+
+			/* it was updated, so look at the updated version */
+			*tid = hufd->ctid;
+			/* updated row should have xmin matching this xmax */
+			priorXmax = hufd->xmax;
+
+			/*
+			 * fetch target tuple
+			 *
+			 * Loop here to deal with updated or busy tuples
+			 */
+			InitDirtySnapshot(SnapshotDirty);
+			for (;;)
+			{
+				if (ItemPointerIndicatesMovedPartitions(tid))
+					ereport(ERROR,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
+
+				if (heap_fetch(relation, tid, &SnapshotDirty, tuple, &buffer, NULL))
+				{
+					/*
+					 * If xmin isn't what we're expecting, the slot must have
+					 * been recycled and reused for an unrelated tuple.  This
+					 * implies that the latest version of the row was deleted,
+					 * so we need do nothing.  (Should be safe to examine xmin
+					 * without getting buffer's content lock.  We assume
+					 * reading a TransactionId to be atomic, and Xmin never
+					 * changes in an existing tuple, except to invalid or
+					 * frozen, and neither of those can match priorXmax.)
+					 */
+					if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple->t_data),
+											 priorXmax))
+					{
+						ReleaseBuffer(buffer);
+						return HeapTupleDeleted;
+					}
+
+					/* otherwise xmin should not be dirty... */
+					if (TransactionIdIsValid(SnapshotDirty.xmin))
+						elog(ERROR, "t_xmin is uncommitted in tuple to be updated");
+
+					/*
+					 * If tuple is being updated by other transaction then we
+					 * have to wait for its commit/abort, or die trying.
+					 */
+					if (TransactionIdIsValid(SnapshotDirty.xmax))
+					{
+						ReleaseBuffer(buffer);
+						switch (wait_policy)
+						{
+							case LockWaitBlock:
+								XactLockTableWait(SnapshotDirty.xmax,
+												  relation, &tuple->t_self,
+												  XLTW_FetchUpdated);
+								break;
+							case LockWaitSkip:
+								if (!ConditionalXactLockTableWait(SnapshotDirty.xmax))
+									return result;	/* skip instead of waiting */
+								break;
+							case LockWaitError:
+								if (!ConditionalXactLockTableWait(SnapshotDirty.xmax))
+									ereport(ERROR,
+											(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
+											 errmsg("could not obtain lock on row in relation \"%s\"",
+													RelationGetRelationName(relation))));
+								break;
+						}
+						continue;	/* loop back to repeat heap_fetch */
+					}
+
+					/*
+					 * If tuple was inserted by our own transaction, we have
+					 * to check cmin against es_output_cid: cmin >= current
+					 * CID means our command cannot see the tuple, so we
+					 * should ignore it. Otherwise heap_lock_tuple() will
+					 * throw an error, and so would any later attempt to
+					 * update or delete the tuple.  (We need not check cmax
+					 * because HeapTupleSatisfiesDirty will consider a tuple
+					 * deleted by our transaction dead, regardless of cmax.)
+					 * We just checked that priorXmax == xmin, so we can test
+					 * that variable instead of doing HeapTupleHeaderGetXmin
+					 * again.
+					 */
+					if (TransactionIdIsCurrentTransactionId(priorXmax) &&
+						HeapTupleHeaderGetCmin(tuple->t_data) >= cid)
+					{
+						ReleaseBuffer(buffer);
+						return result;
+					}
+
+					hufd->traversed = true;
+					*tid = tuple->t_data->t_ctid;
+					ReleaseBuffer(buffer);
+					goto retry;
+				}
+
+				/*
+				 * If the referenced slot was actually empty, the latest
+				 * version of the row must have been deleted, so we need do
+				 * nothing.
+				 */
+				if (tuple->t_data == NULL)
+				{
+					return HeapTupleDeleted;
+				}
+
+				/*
+				 * As above, if xmin isn't what we're expecting, do nothing.
+				 */
+				if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple->t_data),
+										 priorXmax))
+				{
+					if (BufferIsValid(buffer))
+						ReleaseBuffer(buffer);
+					return HeapTupleDeleted;
+				}
+
+				/*
+				 * If we get here, the tuple was found but failed
+				 * SnapshotDirty. Assuming the xmin is either a committed xact
+				 * or our own xact (as it certainly should be if we're trying
+				 * to modify the tuple), this must mean that the row was
+				 * updated or deleted by either a committed xact or our own
+				 * xact.  If it was deleted, we can ignore it; if it was
+				 * updated then chain up to the next version and repeat the
+				 * whole process.
+				 *
+				 * As above, it should be safe to examine xmax and t_ctid
+				 * without the buffer content lock, because they can't be
+				 * changing.
+				 */
+				if (ItemPointerEquals(&tuple->t_self, &tuple->t_data->t_ctid))
+				{
+					/* deleted, so forget about it */
+					if (BufferIsValid(buffer))
+						ReleaseBuffer(buffer);
+					return HeapTupleDeleted;
+				}
+
+				/* updated, so look at the updated row */
+				*tid = tuple->t_data->t_ctid;
+				/* updated row should have xmin matching this xmax */
+				priorXmax = HeapTupleHeaderGetUpdateXid(tuple->t_data);
+				if (BufferIsValid(buffer))
+					ReleaseBuffer(buffer);
+				/* loop back to fetch next in chain */
+			}
+		}
+		else
+		{
+			/* tuple was deleted, so give up */
+			return HeapTupleDeleted;
+		}
+	}
+
+	slot->tts_tableOid = RelationGetRelid(relation);
+	/* store in slot, transferring existing pin */
+	ExecStorePinnedBufferHeapTuple(tuple, slot, buffer);
+
+	return result;
+}
+
+
 /* ------------------------------------------------------------------------
  * Definition of the heap table access method.
  * ------------------------------------------------------------------------
@@ -193,6 +539,13 @@ static const TableAmRoutine heapam_methods = {
 	.index_fetch_end = heapam_index_fetch_end,
 	.index_fetch_tuple = heapam_index_fetch_tuple,
 
+	.tuple_insert = heapam_heap_insert,
+	.tuple_insert_speculative = heapam_heap_insert_speculative,
+	.tuple_complete_speculative = heapam_heap_complete_speculative,
+	.tuple_delete = heapam_heap_delete,
+	.tuple_update = heapam_heap_update,
+	.tuple_lock = heapam_lock_tuple,
+
 	.tuple_satisfies_snapshot = heapam_tuple_satisfies_snapshot,
 };
 
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 6cb38f80c68..5e8fdacb951 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -112,6 +112,9 @@ static inline void
 SetHintBits(HeapTupleHeader tuple, Buffer buffer,
 			uint16 infomask, TransactionId xid)
 {
+	if (!BufferIsValid(buffer))
+		return;
+
 	if (TransactionIdIsValid(xid))
 	{
 		/* NB: xid must be known committed here! */
@@ -606,7 +609,11 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 	{
 		if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
 			return HeapTupleMayBeUpdated;
-		return HeapTupleUpdated;	/* updated by other */
+		/* updated by other */
+		if (ItemPointerEquals(&htup->t_self, &tuple->t_ctid))
+			return HeapTupleDeleted;
+		else
+			return HeapTupleUpdated;
 	}
 
 	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
@@ -647,7 +654,12 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 			return HeapTupleBeingUpdated;
 
 		if (TransactionIdDidCommit(xmax))
-			return HeapTupleUpdated;
+		{
+			if (ItemPointerEquals(&htup->t_self, &tuple->t_ctid))
+				return HeapTupleDeleted;
+			else
+				return HeapTupleUpdated;
+		}
 
 		/*
 		 * By here, the update in the Xmax is either aborted or crashed, but
@@ -703,7 +715,12 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 
 	SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
 				HeapTupleHeaderGetRawXmax(tuple));
-	return HeapTupleUpdated;	/* updated by other */
+
+	/* updated by other */
+	if (ItemPointerEquals(&htup->t_self, &tuple->t_ctid))
+		return HeapTupleDeleted;
+	else
+		return HeapTupleUpdated;
 }
 
 /*
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index cd921a46005..a40cfcf1954 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1763,7 +1763,7 @@ toast_delete_datum(Relation rel, Datum value, bool is_speculative)
 		 * Have a chunk, delete it
 		 */
 		if (is_speculative)
-			heap_abort_speculative(toastrel, toasttup);
+			heap_abort_speculative(toastrel, &toasttup->t_self);
 		else
 			simple_heap_delete(toastrel, &toasttup->t_self);
 	}
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 628d930c130..9a01f74d8fe 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -176,6 +176,107 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc parallel_scan)
 }
 
 
+/* ----------------------------------------------------------------------------
+ * Functions to make modifications a bit simpler.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ *	simple_table_update - replace a tuple
+ *
+ * This routine may be used to update a tuple when concurrent updates of
+ * the target tuple are not expected (for example, because we have a lock
+ * on the relation associated with the tuple).  Any failure is reported
+ * via ereport().
+ */
+void
+simple_table_update(Relation rel, ItemPointer otid,
+					TupleTableSlot *slot,
+					Snapshot snapshot,
+					bool *update_indexes)
+{
+	HTSU_Result result;
+	HeapUpdateFailureData hufd;
+	LockTupleMode lockmode;
+
+	result = table_update(rel, otid, slot,
+						  GetCurrentCommandId(true),
+						  snapshot, InvalidSnapshot,
+						  true /* wait for commit */ ,
+						  &hufd, &lockmode, update_indexes);
+
+	switch (result)
+	{
+		case HeapTupleSelfUpdated:
+			/* Tuple was already updated in current command? */
+			elog(ERROR, "tuple already updated by self");
+			break;
+
+		case HeapTupleMayBeUpdated:
+			/* done successfully */
+			break;
+
+		case HeapTupleUpdated:
+			elog(ERROR, "tuple concurrently updated");
+			break;
+
+		case HeapTupleDeleted:
+			elog(ERROR, "tuple concurrently deleted");
+			break;
+
+		default:
+			elog(ERROR, "unrecognized heap_update status: %u", result);
+			break;
+	}
+}
+
+/*
+ *	simple_table_delete - delete a tuple
+ *
+ * This routine may be used to delete a tuple when concurrent updates of
+ * the target tuple are not expected (for example, because we have a lock
+ * on the relation associated with the tuple).  Any failure is reported
+ * via ereport().
+ */
+void
+simple_table_delete(Relation rel, ItemPointer tid, Snapshot snapshot)
+{
+	HTSU_Result result;
+	HeapUpdateFailureData hufd;
+
+	result = table_delete(rel, tid,
+						  GetCurrentCommandId(true),
+						  snapshot, InvalidSnapshot,
+						  true /* wait for commit */ ,
+						  &hufd, false /* changingPart */ );
+
+	switch (result)
+	{
+		case HeapTupleSelfUpdated:
+			/* Tuple was already updated in current command? */
+			elog(ERROR, "tuple already updated by self");
+			break;
+
+		case HeapTupleMayBeUpdated:
+			/* done successfully */
+			break;
+
+		case HeapTupleUpdated:
+			elog(ERROR, "tuple concurrently updated");
+			break;
+
+		case HeapTupleDeleted:
+			elog(ERROR, "tuple concurrently deleted");
+			break;
+
+		default:
+			elog(ERROR, "unrecognized heap_delete status: %u", result);
+			break;
+	}
+}
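
As a usage sketch, a caller that already holds a lock strong enough to
rule out concurrent modifications could combine these wrappers with the
scan API like so (hypothetical; `scan` and `slot` are set up as for any
other table scan):

/* Hypothetical: delete every tuple the scan returns. */
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
	simple_table_delete(rel, &slot->tts_tid, snapshot);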
+
+
 /* ----------------------------------------------------------------------------
  * Helper functions to implement parallel scans for block oriented AMs.
  * ----------------------------------------------------------------------------
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f2731b40757..aba93262383 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3003,7 +3003,6 @@ CopyFrom(CopyState cstate)
 					/* And create index entries for it */
 					if (resultRelInfo->ri_NumIndices > 0)
 						recheckIndexes = ExecInsertIndexTuples(slot,
-															   &(tuple->t_self),
 															   estate,
 															   false,
 															   NULL,
@@ -3147,7 +3146,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 			cstate->cur_lineno = firstBufferedLineNo + i;
 			ExecStoreHeapTuple(bufferedTuples[i], myslot, false);
 			recheckIndexes =
-				ExecInsertIndexTuples(myslot, &(bufferedTuples[i]->t_self),
+				ExecInsertIndexTuples(myslot,
 									  estate, false, NULL, NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 myslot,
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 36e3d44aad6..0ac295cea3f 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -588,9 +588,6 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 
 	/* We know this is a newly created relation, so there are no indexes */
 
-	/* Free the copied tuple. */
-	heap_freetuple(tuple);
-
 	return true;
 }
 
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 71098896947..5cc15bcfef0 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -15,6 +15,7 @@
 
 #include "access/genam.h"
 #include "access/heapam.h"
+#include "access/tableam.h"
 #include "access/sysattr.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
@@ -3285,14 +3286,6 @@ GetTupleForTrigger(EState *estate,
 				   TupleTableSlot **newSlot)
 {
 	Relation	relation = relinfo->ri_RelationDesc;
-	HeapTuple	tuple;
-	Buffer		buffer;
-	BufferHeapTupleTableSlot *boldslot;
-
-	Assert(TTS_IS_BUFFERTUPLE(oldslot));
-	ExecClearTuple(oldslot);
-	boldslot = (BufferHeapTupleTableSlot *) oldslot;
-	tuple = &boldslot->base.tupdata;
 
 	if (newSlot != NULL)
 	{
@@ -3307,12 +3300,12 @@ GetTupleForTrigger(EState *estate,
 		/*
 		 * lock tuple for update
 		 */
-ltrmark:;
-		tuple->t_self = *tid;
-		test = heap_lock_tuple(relation, tuple,
-							   estate->es_output_cid,
-							   lockmode, LockWaitBlock,
-							   false, &buffer, &hufd);
+		test = table_lock_tuple(relation, tid, estate->es_snapshot, oldslot,
+								estate->es_output_cid,
+								lockmode, LockWaitBlock,
+								IsolationUsesXactSnapshot() ? 0 : TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
+								&hufd);
+
 		switch (test)
 		{
 			case HeapTupleSelfUpdated:
@@ -3332,57 +3325,50 @@ ltrmark:;
 							 errhint("Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows.")));
 
 				/* treat it as deleted; do not process */
-				ReleaseBuffer(buffer);
 				return false;
 
 			case HeapTupleMayBeUpdated:
-				ExecStorePinnedBufferHeapTuple(tuple, oldslot, buffer);
-
-				break;
-
-			case HeapTupleUpdated:
-				ReleaseBuffer(buffer);
-				if (IsolationUsesXactSnapshot())
-					ereport(ERROR,
-							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-							 errmsg("could not serialize access due to concurrent update")));
-				if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
-					ereport(ERROR,
-							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-							 errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
-
-				if (!ItemPointerEquals(&hufd.ctid, &tuple->t_self))
+				if (hufd.traversed)
 				{
-					/* it was updated, so look at the updated version */
+					TupleTableSlot *testslot;
 					TupleTableSlot *epqslot;
 
+					EvalPlanQualBegin(epqstate, estate);
+
+					testslot = EvalPlanQualSlot(epqstate, relation, relinfo->ri_RangeTableIndex);
+					ExecCopySlot(testslot, oldslot);
+
 					epqslot = EvalPlanQual(estate,
 										   epqstate,
 										   relation,
 										   relinfo->ri_RangeTableIndex,
-										   lockmode,
-										   &hufd.ctid,
-										   hufd.xmax);
-					if (!TupIsNull(epqslot))
-					{
-						*tid = hufd.ctid;
+										   testslot);
 
-						*newSlot = epqslot;
+					/*
+					 * If PlanQual failed for updated tuple - we must not
+					 * process this tuple!
+					 */
+					if (TupIsNull(epqslot))
+						return false;
 
-						/*
-						 * EvalPlanQual already locked the tuple, but we
-						 * re-call heap_lock_tuple anyway as an easy way of
-						 * re-fetching the correct tuple.  Speed is hardly a
-						 * criterion in this path anyhow.
-						 */
-						goto ltrmark;
-					}
+					*newSlot = epqslot;
 				}
+				break;
 
-				/*
-				 * if tuple was deleted or PlanQual failed for updated tuple -
-				 * we must not process this tuple!
-				 */
+			case HeapTupleUpdated:
+				if (IsolationUsesXactSnapshot())
+					ereport(ERROR,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("could not serialize access due to concurrent update")));
+				elog(ERROR, "wrong heap_lock_tuple status: %u", test);
+				break;
+
+			case HeapTupleDeleted:
+				if (IsolationUsesXactSnapshot())
+					ereport(ERROR,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("could not serialize access due to concurrent update")));
+				/* tuple was deleted */
 				return false;
 
 			case HeapTupleInvisible:
@@ -3390,15 +3376,23 @@ ltrmark:;
 				break;
 
 			default:
-				ReleaseBuffer(buffer);
 				elog(ERROR, "unrecognized heap_lock_tuple status: %u", test);
 				return false;	/* keep compiler quiet */
 		}
 	}
 	else
 	{
 		Page		page;
 		ItemId		lp;
+		Buffer		buffer;
+		BufferHeapTupleTableSlot *boldslot;
+		HeapTuple tuple;
+
+		Assert(TTS_IS_BUFFERTUPLE(oldslot));
+		ExecClearTuple(oldslot);
+		boldslot = (BufferHeapTupleTableSlot *) oldslot;
+		tuple = &boldslot->base.tupdata;
 
 		buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
 
@@ -4286,7 +4280,7 @@ AfterTriggerExecute(EState *estate,
 				LocTriggerData.tg_trigslot = ExecGetTriggerOldSlot(estate, relInfo);
 
 				ItemPointerCopy(&(event->ate_ctid1), &(tuple1.t_self));
-				if (!heap_fetch(rel, SnapshotAny, &tuple1, &buffer, false, NULL))
+				if (!heap_fetch(rel, &(tuple1.t_self), SnapshotAny, &tuple1, &buffer, NULL))
 					elog(ERROR, "failed to fetch tuple1 for AFTER trigger");
 				ExecStorePinnedBufferHeapTuple(&tuple1,
 											   LocTriggerData.tg_trigslot,
@@ -4310,7 +4304,7 @@ AfterTriggerExecute(EState *estate,
 				LocTriggerData.tg_newslot = ExecGetTriggerNewSlot(estate, relInfo);
 
 				ItemPointerCopy(&(event->ate_ctid2), &(tuple2.t_self));
-				if (!heap_fetch(rel, SnapshotAny, &tuple2, &buffer, false, NULL))
+				if (!heap_fetch(rel, &(tuple2.t_self), SnapshotAny, &tuple2, &buffer, NULL))
 					elog(ERROR, "failed to fetch tuple2 for AFTER trigger");
 				ExecStorePinnedBufferHeapTuple(&tuple2,
 											   LocTriggerData.tg_newslot,
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index e67dd6750c6..3b602bb8baf 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -271,12 +271,12 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  */
 List *
 ExecInsertIndexTuples(TupleTableSlot *slot,
-					  ItemPointer tupleid,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
 					  List *arbiterIndexes)
 {
+	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
 	ResultRelInfo *resultRelInfo;
 	int			i;
@@ -288,6 +288,8 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
 
+	Assert(ItemPointerIsValid(tupleid));
+
 	/*
 	 * Get information from the result relation info structure.
 	 */
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 5a9ffe59c47..8723e32dd58 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2425,9 +2425,7 @@ ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist)
  *	epqstate - state for EvalPlanQual rechecking
  *	relation - table containing tuple
  *	rti - rangetable index of table containing tuple
- *	lockmode - requested tuple lock mode
- *	*tid - t_ctid from the outdated tuple (ie, next updated version)
- *	priorXmax - t_xmax from the outdated tuple
+ *	testslot - slot containing the already-locked newest tuple version
  *
  * *tid is also an output parameter: it's modified to hold the TID of the
  * latest version of the tuple (note this may be changed even on failure)
@@ -2437,11 +2435,9 @@ ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist)
  */
 TupleTableSlot *
 EvalPlanQual(EState *estate, EPQState *epqstate,
-			 Relation relation, Index rti, LockTupleMode lockmode,
-			 ItemPointer tid, TransactionId priorXmax)
+			 Relation relation, Index rti, TupleTableSlot *testslot)
 {
 	TupleTableSlot *slot;
-	TupleTableSlot *testslot;
 
 	Assert(rti > 0);
 
@@ -2450,20 +2446,7 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
 	 */
 	EvalPlanQualBegin(epqstate, estate);
 
-	/*
-	 * Get and lock the updated version of the row; if fail, return NULL.
-	 */
-	testslot = EvalPlanQualSlot(epqstate, relation, rti);
-	if (!EvalPlanQualFetch(estate, relation, lockmode, LockWaitBlock,
-						   tid, priorXmax,
-						   testslot))
-		return NULL;
-
-	/*
-	 * For UPDATE/DELETE we have to return tid of actual row we're executing
-	 * PQ for.
-	 */
-	*tid = testslot->tts_tid;
+	Assert(testslot == epqstate->estate->es_epqTupleSlot[rti - 1]);
 
 	/*
 	 * Fetch any non-locked source rows
@@ -2495,258 +2478,6 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
 	return slot;
 }
 
-/*
- * Fetch a copy of the newest version of an outdated tuple
- *
- *	estate - executor state data
- *	relation - table containing tuple
- *	lockmode - requested tuple lock mode
- *	wait_policy - requested lock wait policy
- *	*tid - t_ctid from the outdated tuple (ie, next updated version)
- *	priorXmax - t_xmax from the outdated tuple
- *	slot - slot to store newest tuple version
- *
- * Returns true, with slot containing the newest tuple version, or false if we
- * find that there is no newest version (ie, the row was deleted not updated).
- * We also return false if the tuple is locked and the wait policy is to skip
- * such tuples.
- *
- * If successful, we have locked the newest tuple version, so caller does not
- * need to worry about it changing anymore.
- */
-bool
-EvalPlanQualFetch(EState *estate, Relation relation, LockTupleMode lockmode,
-				  LockWaitPolicy wait_policy,
-				  ItemPointer tid, TransactionId priorXmax,
-				  TupleTableSlot *slot)
-{
-	HeapTupleData tuple;
-	SnapshotData SnapshotDirty;
-
-	/*
-	 * fetch target tuple
-	 *
-	 * Loop here to deal with updated or busy tuples
-	 */
-	InitDirtySnapshot(SnapshotDirty);
-	tuple.t_self = *tid;
-	for (;;)
-	{
-		Buffer		buffer;
-
-		if (heap_fetch(relation, &SnapshotDirty, &tuple, &buffer, true, NULL))
-		{
-			HTSU_Result test;
-			HeapUpdateFailureData hufd;
-
-			/*
-			 * If xmin isn't what we're expecting, the slot must have been
-			 * recycled and reused for an unrelated tuple.  This implies that
-			 * the latest version of the row was deleted, so we need do
-			 * nothing.  (Should be safe to examine xmin without getting
-			 * buffer's content lock.  We assume reading a TransactionId to be
-			 * atomic, and Xmin never changes in an existing tuple, except to
-			 * invalid or frozen, and neither of those can match priorXmax.)
-			 */
-			if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
-									 priorXmax))
-			{
-				ReleaseBuffer(buffer);
-				return false;
-			}
-
-			/* otherwise xmin should not be dirty... */
-			if (TransactionIdIsValid(SnapshotDirty.xmin))
-				elog(ERROR, "t_xmin is uncommitted in tuple to be updated");
-
-			/*
-			 * If tuple is being updated by other transaction then we have to
-			 * wait for its commit/abort, or die trying.
-			 */
-			if (TransactionIdIsValid(SnapshotDirty.xmax))
-			{
-				ReleaseBuffer(buffer);
-				switch (wait_policy)
-				{
-					case LockWaitBlock:
-						XactLockTableWait(SnapshotDirty.xmax,
-										  relation, &tuple.t_self,
-										  XLTW_FetchUpdated);
-						break;
-					case LockWaitSkip:
-						if (!ConditionalXactLockTableWait(SnapshotDirty.xmax))
-							return false;	/* skip instead of waiting */
-						break;
-					case LockWaitError:
-						if (!ConditionalXactLockTableWait(SnapshotDirty.xmax))
-							ereport(ERROR,
-									(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
-									 errmsg("could not obtain lock on row in relation \"%s\"",
-											RelationGetRelationName(relation))));
-						break;
-				}
-				continue;		/* loop back to repeat heap_fetch */
-			}
-
-			/*
-			 * If tuple was inserted by our own transaction, we have to check
-			 * cmin against es_output_cid: cmin >= current CID means our
-			 * command cannot see the tuple, so we should ignore it. Otherwise
-			 * heap_lock_tuple() will throw an error, and so would any later
-			 * attempt to update or delete the tuple.  (We need not check cmax
-			 * because HeapTupleSatisfiesDirty will consider a tuple deleted
-			 * by our transaction dead, regardless of cmax.) We just checked
-			 * that priorXmax == xmin, so we can test that variable instead of
-			 * doing HeapTupleHeaderGetXmin again.
-			 */
-			if (TransactionIdIsCurrentTransactionId(priorXmax) &&
-				HeapTupleHeaderGetCmin(tuple.t_data) >= estate->es_output_cid)
-			{
-				ReleaseBuffer(buffer);
-				return false;
-			}
-
-			/*
-			 * This is a live tuple, so now try to lock it.
-			 */
-			test = heap_lock_tuple(relation, &tuple,
-								   estate->es_output_cid,
-								   lockmode, wait_policy,
-								   false, &buffer, &hufd);
-			/* We now have two pins on the buffer, get rid of one */
-			ReleaseBuffer(buffer);
-
-			switch (test)
-			{
-				case HeapTupleSelfUpdated:
-
-					/*
-					 * The target tuple was already updated or deleted by the
-					 * current command, or by a later command in the current
-					 * transaction.  We *must* ignore the tuple in the former
-					 * case, so as to avoid the "Halloween problem" of
-					 * repeated update attempts.  In the latter case it might
-					 * be sensible to fetch the updated tuple instead, but
-					 * doing so would require changing heap_update and
-					 * heap_delete to not complain about updating "invisible"
-					 * tuples, which seems pretty scary (heap_lock_tuple will
-					 * not complain, but few callers expect
-					 * HeapTupleInvisible, and we're not one of them).  So for
-					 * now, treat the tuple as deleted and do not process.
-					 */
-					ReleaseBuffer(buffer);
-					return false;
-
-				case HeapTupleMayBeUpdated:
-					/* successfully locked */
-					break;
-
-				case HeapTupleUpdated:
-					ReleaseBuffer(buffer);
-					if (IsolationUsesXactSnapshot())
-						ereport(ERROR,
-								(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-								 errmsg("could not serialize access due to concurrent update")));
-					if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
-						ereport(ERROR,
-								(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-								 errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
-
-					/* Should not encounter speculative tuple on recheck */
-					Assert(!HeapTupleHeaderIsSpeculative(tuple.t_data));
-					if (!ItemPointerEquals(&hufd.ctid, &tuple.t_self))
-					{
-						/* it was updated, so look at the updated version */
-						tuple.t_self = hufd.ctid;
-						/* updated row should have xmin matching this xmax */
-						priorXmax = hufd.xmax;
-						continue;
-					}
-					/* tuple was deleted, so give up */
-					return false;
-
-				case HeapTupleWouldBlock:
-					ReleaseBuffer(buffer);
-					return false;
-
-				case HeapTupleInvisible:
-					elog(ERROR, "attempted to lock invisible tuple");
-					break;
-
-				default:
-					ReleaseBuffer(buffer);
-					elog(ERROR, "unrecognized heap_lock_tuple status: %u",
-						 test);
-					return false;	/* keep compiler quiet */
-			}
-
-			/*
-			 * We got tuple - store it for use by the recheck query.
-			 */
-			ExecStorePinnedBufferHeapTuple(&tuple, slot, buffer);
-			ExecMaterializeSlot(slot);
-			break;
-		}
-
-		/*
-		 * If the referenced slot was actually empty, the latest version of
-		 * the row must have been deleted, so we need do nothing.
-		 */
-		if (tuple.t_data == NULL)
-		{
-			ReleaseBuffer(buffer);
-			return false;
-		}
-
-		/*
-		 * As above, if xmin isn't what we're expecting, do nothing.
-		 */
-		if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
-								 priorXmax))
-		{
-			ReleaseBuffer(buffer);
-			return false;
-		}
-
-		/*
-		 * If we get here, the tuple was found but failed SnapshotDirty.
-		 * Assuming the xmin is either a committed xact or our own xact (as it
-		 * certainly should be if we're trying to modify the tuple), this must
-		 * mean that the row was updated or deleted by either a committed xact
-		 * or our own xact.  If it was deleted, we can ignore it; if it was
-		 * updated then chain up to the next version and repeat the whole
-		 * process.
-		 *
-		 * As above, it should be safe to examine xmax and t_ctid without the
-		 * buffer content lock, because they can't be changing.
-		 */
-
-		/* check whether next version would be in a different partition */
-		if (HeapTupleHeaderIndicatesMovedPartitions(tuple.t_data))
-			ereport(ERROR,
-					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-					 errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
-
-		/* check whether tuple has been deleted */
-		if (ItemPointerEquals(&tuple.t_self, &tuple.t_data->t_ctid))
-		{
-			/* deleted, so forget about it */
-			ReleaseBuffer(buffer);
-			return false;
-		}
-
-		/* updated, so look at the updated row */
-		tuple.t_self = tuple.t_data->t_ctid;
-		/* updated row should have xmin matching this xmax */
-		priorXmax = HeapTupleHeaderGetUpdateXid(tuple.t_data);
-		ReleaseBuffer(buffer);
-		/* loop back to fetch next in chain */
-	}
-
-	/* signal success */
-	return true;
-}
-
 /*
  * EvalPlanQualInit -- initialize during creation of a plan state node
  * that might need to invoke EPQ processing.
@@ -2911,8 +2642,8 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 				Buffer		buffer;
 
 				tuple.t_self = *((ItemPointer) DatumGetPointer(datum));
-				if (!heap_fetch(erm->relation, SnapshotAny, &tuple, &buffer,
-								false, NULL))
+				if (!heap_fetch(erm->relation, &tuple.t_self, SnapshotAny,
+								&tuple, &buffer, NULL))
 					elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck");
 
 				/* successful, store tuple */
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 95dfc4987de..73090d47d19 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -15,7 +15,6 @@
 #include "postgres.h"
 
 #include "access/genam.h"
-#include "access/heapam.h"
 #include "access/relscan.h"
 #include "access/tableam.h"
 #include "access/transam.h"
@@ -166,25 +165,18 @@ retry:
 	/* Found tuple, try to lock it in the lockmode. */
 	if (found)
 	{
-		Buffer		buf;
 		HeapUpdateFailureData hufd;
 		HTSU_Result res;
-		HeapTupleData locktup;
-		HeapTupleTableSlot *hslot = (HeapTupleTableSlot *)outslot;
-
-		/* Only a heap tuple has item pointers. */
-		Assert(TTS_IS_HEAPTUPLE(outslot) || TTS_IS_BUFFERTUPLE(outslot));
-		ItemPointerCopy(&hslot->tuple->t_self, &locktup.t_self);
 
 		PushActiveSnapshot(GetLatestSnapshot());
 
-		res = heap_lock_tuple(rel, &locktup, GetCurrentCommandId(false),
-							  lockmode,
-							  LockWaitBlock,
-							  false /* don't follow updates */ ,
-							  &buf, &hufd);
-		/* the tuple slot already has the buffer pinned */
-		ReleaseBuffer(buf);
+		res = table_lock_tuple(rel, &(outslot->tts_tid), GetLatestSnapshot(),
+							   outslot,
+							   GetCurrentCommandId(false),
+							   lockmode,
+							   LockWaitBlock,
+							   0 /* don't follow updates */ ,
+							   &hufd);
 
 		PopActiveSnapshot();
 
@@ -203,6 +195,12 @@ retry:
 							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 							 errmsg("concurrent update, retrying")));
 				goto retry;
+			case HeapTupleDeleted:
+				/* XXX: Improve handling here */
+				ereport(LOG,
+						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+						 errmsg("concurrent delete, retrying")));
+				goto retry;
 			case HeapTupleInvisible:
 				elog(ERROR, "attempted to lock invisible tuple");
 				break;
@@ -330,25 +328,18 @@ retry:
 	/* Found tuple, try to lock it in the lockmode. */
 	if (found)
 	{
-		Buffer		buf;
 		HeapUpdateFailureData hufd;
 		HTSU_Result res;
-		HeapTupleData locktup;
-		HeapTupleTableSlot *hslot = (HeapTupleTableSlot *)outslot;
-
-		/* Only a heap tuple has item pointers. */
-		Assert(TTS_IS_HEAPTUPLE(outslot) || TTS_IS_BUFFERTUPLE(outslot));
-		ItemPointerCopy(&hslot->tuple->t_self, &locktup.t_self);
 
 		PushActiveSnapshot(GetLatestSnapshot());
 
-		res = heap_lock_tuple(rel, &locktup, GetCurrentCommandId(false),
-							  lockmode,
-							  LockWaitBlock,
-							  false /* don't follow updates */ ,
-							  &buf, &hufd);
-		/* the tuple slot already has the buffer pinned */
-		ReleaseBuffer(buf);
+		res = table_lock_tuple(rel, &(outslot->tts_tid), GetLatestSnapshot(),
+							   outslot,
+							   GetCurrentCommandId(false),
+							   lockmode,
+							   LockWaitBlock,
+							   0 /* don't follow updates */ ,
+							   &hufd);
 
 		PopActiveSnapshot();
 
@@ -367,6 +358,12 @@ retry:
 							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 							 errmsg("concurrent update, retrying")));
 				goto retry;
+			case HeapTupleDeleted:
+				/* XXX: Improve handling here */
+				ereport(LOG,
+						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+						 errmsg("concurrent delete, retrying")));
+				goto retry;
 			case HeapTupleInvisible:
 				elog(ERROR, "attempted to lock invisible tuple");
 				break;
@@ -392,7 +389,6 @@ void
 ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	HeapTuple	tuple;
 	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
@@ -419,16 +415,12 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		if (resultRelInfo->ri_PartitionCheck)
 			ExecPartitionCheck(resultRelInfo, slot, estate, true);
 
-		/* Materialize slot into a tuple that we can scribble upon. */
-		tuple = ExecFetchSlotHeapTuple(slot, true, NULL);
-
 		/* OK, store the tuple and create index entries for it */
-		simple_heap_insert(rel, tuple);
-		ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+		table_insert(resultRelInfo->ri_RelationDesc, slot,
+					 GetCurrentCommandId(true), 0, NULL);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
-												   estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -456,13 +448,9 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	HeapTuple	tuple;
 	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
-	HeapTupleTableSlot *hsearchslot = (HeapTupleTableSlot *)searchslot;
-
-	/* We expect the searchslot to contain a heap tuple. */
-	Assert(TTS_IS_HEAPTUPLE(searchslot) || TTS_IS_BUFFERTUPLE(searchslot));
+	ItemPointer tid = &(searchslot->tts_tid);
 
 	/* For now we support only tables. */
 	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
@@ -474,14 +462,14 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		resultRelInfo->ri_TrigDesc->trig_update_before_row)
 	{
 		if (!ExecBRUpdateTriggers(estate, epqstate, resultRelInfo,
-								  &hsearchslot->tuple->t_self,
-								  NULL, slot))
+								  tid, NULL, slot))
 			skip_tuple = true;		/* "do nothing" */
 	}
 
 	if (!skip_tuple)
 	{
 		List	   *recheckIndexes = NIL;
+		bool		update_indexes;
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -489,23 +477,16 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		if (resultRelInfo->ri_PartitionCheck)
 			ExecPartitionCheck(resultRelInfo, slot, estate, true);
 
-		/* Materialize slot into a tuple that we can scribble upon. */
-		tuple = ExecFetchSlotHeapTuple(slot, true, NULL);
+		simple_table_update(rel, tid, slot,
+							estate->es_snapshot, &update_indexes);
 
-		/* OK, update the tuple and index entries for it */
-		simple_heap_update(rel, &hsearchslot->tuple->t_self, tuple);
-		ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
-
-		if (resultRelInfo->ri_NumIndices > 0 &&
-			!HeapTupleIsHeapOnly(tuple))
-			recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
-												   estate, false, NULL,
+		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
+			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
 		ExecARUpdateTriggers(estate, resultRelInfo,
-							 &(tuple->t_self),
-							 NULL, slot,
+							 tid, NULL, slot,
 							 recheckIndexes, NULL);
 
 		list_free(recheckIndexes);
@@ -525,7 +506,7 @@ ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
 	bool		skip_tuple = false;
 	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
-	HeapTupleTableSlot *hsearchslot = (HeapTupleTableSlot *)searchslot;
+	ItemPointer tid = &(searchslot->tts_tid);
 
 	/* For now we support only tables and heap tuples. */
 	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
@@ -538,23 +519,18 @@ ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
 		resultRelInfo->ri_TrigDesc->trig_delete_before_row)
 	{
 		skip_tuple = !ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
-										   &hsearchslot->tuple->t_self,
-										   NULL, NULL);
+										   tid, NULL, NULL);
 
 	}
 
 	if (!skip_tuple)
 	{
-		List	   *recheckIndexes = NIL;
-
 		/* OK, delete the tuple */
-		simple_heap_delete(rel, &hsearchslot->tuple->t_self);
+		simple_table_delete(rel, tid, estate->es_snapshot);
 
 		/* AFTER ROW DELETE Triggers */
 		ExecARDeleteTriggers(estate, resultRelInfo,
-							 &hsearchslot->tuple->t_self, NULL, NULL);
-
-		list_free(recheckIndexes);
+							 tid, NULL, NULL);
 	}
 }
 
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 76f0f9d66e5..91f46b88ed8 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -23,6 +23,7 @@
 
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeLockRows.h"
@@ -82,8 +83,7 @@ lnext:
 		ExecRowMark *erm = aerm->rowmark;
 		Datum		datum;
 		bool		isNull;
-		HeapTupleData tuple;
-		Buffer		buffer;
+		ItemPointerData tid;
 		HeapUpdateFailureData hufd;
 		LockTupleMode lockmode;
 		HTSU_Result test;
@@ -161,7 +161,7 @@ lnext:
 		}
 
 		/* okay, try to lock the tuple */
-		tuple.t_self = *((ItemPointer) DatumGetPointer(datum));
+		tid = *((ItemPointer) DatumGetPointer(datum));
 		switch (erm->markType)
 		{
 			case ROW_MARK_EXCLUSIVE:
@@ -182,11 +182,13 @@ lnext:
 				break;
 		}
 
-		test = heap_lock_tuple(erm->relation, &tuple,
-							   estate->es_output_cid,
-							   lockmode, erm->waitPolicy, true,
-							   &buffer, &hufd);
-		ReleaseBuffer(buffer);
+		test = table_lock_tuple(erm->relation, &tid, estate->es_snapshot,
+								markSlot, estate->es_output_cid,
+								lockmode, erm->waitPolicy,
+								(IsolationUsesXactSnapshot() ? 0 : TUPLE_LOCK_FLAG_FIND_LAST_VERSION)
+								| TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS,
+								&hufd);
+
 		switch (test)
 		{
 			case HeapTupleWouldBlock:
@@ -213,6 +215,15 @@ lnext:
 
 			case HeapTupleMayBeUpdated:
 				/* got the lock successfully */
+				if (hufd.traversed)
+				{
+					/* locked tuple saved in markSlot for EvalPlanQual testing below */
+
+					/* Remember we need to do EPQ testing */
+					epq_needed = true;
+
+					/* Continue loop until we have all target tuples */
+				}
 				break;
 
 			case HeapTupleUpdated:
@@ -220,37 +231,19 @@ lnext:
 					ereport(ERROR,
 							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 							 errmsg("could not serialize access due to concurrent update")));
-				if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
+				/* skip lock */
+				goto lnext;
+
+			case HeapTupleDeleted:
+				if (IsolationUsesXactSnapshot())
 					ereport(ERROR,
 							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-							 errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
-
-				if (ItemPointerEquals(&hufd.ctid, &tuple.t_self))
-				{
-					/* Tuple was deleted, so don't return it */
-					goto lnext;
-				}
-
-				/* updated, so fetch and lock the updated version */
-				if (!EvalPlanQualFetch(estate, erm->relation,
-									   lockmode, erm->waitPolicy,
-									   &hufd.ctid, hufd.xmax,
-									   markSlot))
-				{
-					/*
-					 * Tuple was deleted; or it's locked and we're under SKIP
-					 * LOCKED policy, so don't return it
-					 */
-					goto lnext;
-				}
-				/* remember the actually locked tuple's TID */
-				tuple.t_self = markSlot->tts_tid;
-
-				/* Remember we need to do EPQ testing */
-				epq_needed = true;
-
-				/* Continue loop until we have all target tuples */
-				break;
+							 errmsg("could not serialize access due to concurrent delete")));
+				/*
+				 * Tuple was deleted by a concurrent transaction, so don't
+				 * return it
+				 */
+				goto lnext;
 
 			case HeapTupleInvisible:
 				elog(ERROR, "attempted to lock invisible tuple");
@@ -262,7 +255,7 @@ lnext:
 		}
 
 		/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
-		erm->curCtid = tuple.t_self;
+		erm->curCtid = tid;
 	}
 
 	/*
@@ -305,8 +298,8 @@ lnext:
 
 			/* okay, fetch the tuple */
 			tuple.t_self = erm->curCtid;
-			if (!heap_fetch(erm->relation, SnapshotAny, &tuple, &buffer,
-							false, NULL))
+			if (!heap_fetch(erm->relation, &tuple.t_self, SnapshotAny, &tuple, &buffer,
+							NULL))
 				elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck");
 			ExecStorePinnedBufferHeapTuple(&tuple, markSlot, buffer);
 			ExecMaterializeSlot(markSlot);
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index e316ff99012..5b079d8302a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -42,6 +42,7 @@
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_am.h"
 #include "commands/trigger.h"
 #include "executor/execPartition.h"
 #include "executor/executor.h"
@@ -190,31 +191,33 @@ ExecProcessReturning(ResultRelInfo *resultRelInfo,
  */
 static void
 ExecCheckHeapTupleVisible(EState *estate,
-						  HeapTuple tuple,
-						  Buffer buffer)
+						  Relation rel,
+						  TupleTableSlot *slot)
 {
 	if (!IsolationUsesXactSnapshot())
 		return;
 
-	/*
-	 * We need buffer pin and lock to call HeapTupleSatisfiesVisibility.
-	 * Caller should be holding pin, but not lock.
-	 */
-	LockBuffer(buffer, BUFFER_LOCK_SHARE);
-	if (!HeapTupleSatisfiesVisibility(tuple, estate->es_snapshot, buffer))
+	if (!table_tuple_satisfies_snapshot(rel, slot, estate->es_snapshot))
 	{
+		Datum		xminDatum;
+		TransactionId xmin;
+		bool		isnull;
+
+		xminDatum = slot_getsysattr(slot, MinTransactionIdAttributeNumber, &isnull);
+		Assert(!isnull);
+		xmin = DatumGetTransactionId(xminDatum);
+
 		/*
 		 * We should not raise a serialization failure if the conflict is
 		 * against a tuple inserted by our own transaction, even if it's not
 		 * visible to our snapshot.  (This would happen, for example, if
 		 * conflicting keys are proposed for insertion in a single command.)
 		 */
-		if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
+		if (!TransactionIdIsCurrentTransactionId(xmin))
 			ereport(ERROR,
 					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 					 errmsg("could not serialize access due to concurrent update")));
 	}
-	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 }
 
 /*
@@ -223,7 +226,8 @@ ExecCheckHeapTupleVisible(EState *estate,
 static void
 ExecCheckTIDVisible(EState *estate,
 					ResultRelInfo *relinfo,
-					ItemPointer tid)
+					ItemPointer tid,
+					TupleTableSlot *tempSlot)
 {
 	Relation	rel = relinfo->ri_RelationDesc;
 	Buffer		buffer;
@@ -234,10 +238,10 @@ ExecCheckTIDVisible(EState *estate,
 		return;
 
 	tuple.t_self = *tid;
-	if (!heap_fetch(rel, SnapshotAny, &tuple, &buffer, false, NULL))
+	if (!heap_fetch(rel, &tuple.t_self, SnapshotAny, &tuple, &buffer, NULL))
 		elog(ERROR, "failed to fetch conflicting tuple for ON CONFLICT");
-	ExecCheckHeapTupleVisible(estate, &tuple, buffer);
-	ReleaseBuffer(buffer);
+	ExecStorePinnedBufferHeapTuple(&tuple, tempSlot, buffer);
+	ExecCheckHeapTupleVisible(estate, rel, tempSlot);
 }
 
 /* ----------------------------------------------------------------
@@ -319,7 +323,6 @@ ExecInsert(ModifyTableState *mtstate,
 	else
 	{
 		WCOKind		wco_kind;
-		HeapTuple	inserttuple;
 
 		/*
 		 * Constraints might reference the tableoid column, so (re-)initialize
@@ -417,16 +420,19 @@ ExecInsert(ModifyTableState *mtstate,
 					 * In case of ON CONFLICT DO NOTHING, do nothing. However,
 					 * verify that the tuple is visible to the executor's MVCC
 					 * snapshot at higher isolation levels.
+					 *
+					 * FIXME: Either comment or replace usage of
+					 * ExecGetReturningSlot(). Need a slot that's compatible
+					 * with the resultRelInfo table.
 					 */
 					Assert(onconflict == ONCONFLICT_NOTHING);
-					ExecCheckTIDVisible(estate, resultRelInfo, &conflictTid);
+					ExecCheckTIDVisible(estate, resultRelInfo, &conflictTid,
+										ExecGetReturningSlot(estate, resultRelInfo));
 					InstrCountTuples2(&mtstate->ps, 1);
 					return NULL;
 				}
 			}
 
-			inserttuple = ExecFetchSlotHeapTuple(slot, true, NULL);
-
 			/*
 			 * Before we start insertion proper, acquire our "speculative
 			 * insertion lock".  Others can use that to wait for us to decide
@@ -434,26 +440,23 @@ ExecInsert(ModifyTableState *mtstate,
 			 * waiting for the whole transaction to complete.
 			 */
 			specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
-			HeapTupleHeaderSetSpeculativeToken(inserttuple->t_data, specToken);
 
 			/* insert the tuple, with the speculative token */
-			heap_insert(resultRelationDesc, inserttuple,
-						estate->es_output_cid,
-						HEAP_INSERT_SPECULATIVE,
-						NULL);
+			table_insert_speculative(resultRelationDesc, slot,
+									 estate->es_output_cid,
+									 HEAP_INSERT_SPECULATIVE,
+									 NULL,
+									 specToken);
 			slot->tts_tableOid = RelationGetRelid(resultRelationDesc);
-			ItemPointerCopy(&inserttuple->t_self, &slot->tts_tid);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, &(inserttuple->t_self),
+			recheckIndexes = ExecInsertIndexTuples(slot,
 												   estate, true, &specConflict,
 												   arbiterIndexes);
 
 			/* adjust the tuple's state accordingly */
-			if (!specConflict)
-				heap_finish_speculative(resultRelationDesc, inserttuple);
-			else
-				heap_abort_speculative(resultRelationDesc, inserttuple);
+			table_complete_speculative(resultRelationDesc, slot,
+									   specToken, specConflict);
 
 			/*
 			 * Wake up anyone waiting for our decision.  They will re-check
@@ -481,21 +484,15 @@ ExecInsert(ModifyTableState *mtstate,
 		{
 			/*
 			 * insert the tuple normally.
-			 *
-			 * Note: heap_insert returns the tid (location) of the new tuple
-			 * in the t_self field.
 			 */
-			inserttuple = ExecFetchSlotHeapTuple(slot, true, NULL);
-			heap_insert(resultRelationDesc, inserttuple,
-						estate->es_output_cid,
-						0, NULL);
+			table_insert(resultRelationDesc, slot,
+						 estate->es_output_cid,
+						 0, NULL);
 			slot->tts_tableOid = RelationGetRelid(resultRelationDesc);
-			ItemPointerCopy(&inserttuple->t_self, &slot->tts_tid);
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, &(inserttuple->t_self),
-													   estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -671,12 +668,58 @@ ExecDelete(ModifyTableState *mtstate,
 		 * mode transactions.
 		 */
 ldelete:;
-		result = heap_delete(resultRelationDesc, tupleid,
-							 estate->es_output_cid,
-							 estate->es_crosscheck_snapshot,
-							 true /* wait for commit */ ,
-							 &hufd,
-							 changingPart);
+		result = table_delete(resultRelationDesc, tupleid,
+							  estate->es_output_cid,
+							  estate->es_snapshot,
+							  estate->es_crosscheck_snapshot,
+							  true /* wait for commit */ ,
+							  &hufd,
+							  changingPart);
+
+		if (result == HeapTupleUpdated && !IsolationUsesXactSnapshot())
+		{
+			EvalPlanQualBegin(epqstate, estate);
+			slot = EvalPlanQualSlot(epqstate, resultRelationDesc, resultRelInfo->ri_RangeTableIndex);
+
+			result = table_lock_tuple(resultRelationDesc, tupleid,
+									  estate->es_snapshot,
+									  slot, estate->es_output_cid,
+									  LockTupleExclusive, LockWaitBlock,
+									  TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
+									  &hufd);
+			/*hari FIXME*/
+			/*Assert(result != HeapTupleUpdated && hufd.traversed);*/
+			if (result == HeapTupleMayBeUpdated)
+			{
+				TupleTableSlot *epqslot;
+
+				epqslot = EvalPlanQual(estate,
+									   epqstate,
+									   resultRelationDesc,
+									   resultRelInfo->ri_RangeTableIndex,
+									   slot);
+				if (TupIsNull(epqslot))
+				{
+					/* Tuple no longer passes the quals, so exit */
+					return NULL;
+				}
+
+				/* If requested, skip delete and pass back the updated row */
+				if (epqreturnslot)
+				{
+					*epqreturnslot = epqslot;
+					return NULL;
+				}
+
+				goto ldelete;
+			}
+			else if (result == HeapTupleInvisible)
+			{
+				/* tuple is not visible; nothing to do */
+				return NULL;
+			}
+		}
+
 		switch (result)
 		{
 			case HeapTupleSelfUpdated:
@@ -722,39 +765,16 @@ ldelete:;
 					ereport(ERROR,
 							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 							 errmsg("could not serialize access due to concurrent update")));
-				if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
+				else
+					/* shouldn't get here */
+					elog(ERROR, "unexpected table_delete status: %u", result);
+				break;
+
+			case HeapTupleDeleted:
+				if (IsolationUsesXactSnapshot())
 					ereport(ERROR,
 							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-							 errmsg("tuple to be deleted was already moved to another partition due to concurrent update")));
-
-				if (!ItemPointerEquals(tupleid, &hufd.ctid))
-				{
-					TupleTableSlot *my_epqslot;
-
-					my_epqslot = EvalPlanQual(estate,
-											  epqstate,
-											  resultRelationDesc,
-											  resultRelInfo->ri_RangeTableIndex,
-											  LockTupleExclusive,
-											  &hufd.ctid,
-											  hufd.xmax);
-					if (!TupIsNull(my_epqslot))
-					{
-						*tupleid = hufd.ctid;
-
-						/*
-						 * If requested, skip delete and pass back the updated
-						 * row.
-						 */
-						if (epqreturnslot)
-						{
-							*epqreturnslot = my_epqslot;
-							return NULL;
-						}
-						else
-							goto ldelete;
-					}
-				}
+							 errmsg("could not serialize access due to concurrent delete")));
 				/* tuple already deleted; nothing to do */
 				return NULL;
 
@@ -841,8 +861,8 @@ ldelete:;
 				deltuple = &bslot->base.tupdata;
 
 				deltuple->t_self = *tupleid;
-				if (!heap_fetch(resultRelationDesc, SnapshotAny,
-								deltuple, &buffer, false, NULL))
+				if (!heap_fetch(resultRelationDesc, &deltuple->t_self, SnapshotAny,
+								deltuple, &buffer, NULL))
 					elog(ERROR, "failed to fetch deleted tuple for DELETE RETURNING");
 
 				ExecStorePinnedBufferHeapTuple(deltuple, slot, buffer);
@@ -897,7 +917,6 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	HeapTuple	updatetuple;
 	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	HTSU_Result result;
@@ -925,7 +944,7 @@ ExecUpdate(ModifyTableState *mtstate,
 	{
 		if (!ExecBRUpdateTriggers(estate, epqstate, resultRelInfo,
 								  tupleid, oldtuple, slot))
-			return NULL;        /* "do nothing" */
+			return NULL;		/* "do nothing" */
 	}
 
 	/* INSTEAD OF ROW UPDATE Triggers */
@@ -934,7 +953,7 @@ ExecUpdate(ModifyTableState *mtstate,
 	{
 		if (!ExecIRUpdateTriggers(estate, resultRelInfo,
 								  oldtuple, slot))
-			return NULL;        /* "do nothing" */
+			return NULL;		/* "do nothing" */
 	}
 	else if (resultRelInfo->ri_FdwRoutine)
 	{
@@ -960,6 +979,7 @@ ExecUpdate(ModifyTableState *mtstate,
 	{
 		LockTupleMode lockmode;
 		bool		partition_constraint_failed;
+		bool		update_indexes;
 
 		/*
 		 * Constraints might reference the tableoid column, so (re-)initialize
@@ -978,6 +998,9 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 lreplace:;
 
+		/* ensure slot is independent, consider e.g. EPQ */
+		ExecMaterializeSlot(slot);
+
 		/*
 		 * If partition constraint fails, this row might get moved to another
 		 * partition, in which case we should check the RLS CHECK policy just
@@ -1145,14 +1168,53 @@ lreplace:;
 		 * needed for referential integrity updates in transaction-snapshot
 		 * mode transactions.
 		 */
-		updatetuple = ExecFetchSlotHeapTuple(slot, true, NULL);
-		result = heap_update(resultRelationDesc, tupleid,
-							 updatetuple,
-							 estate->es_output_cid,
-							 estate->es_crosscheck_snapshot,
-							 true /* wait for commit */ ,
-							 &hufd, &lockmode);
-		ItemPointerCopy(&updatetuple->t_self, &slot->tts_tid);
+		result = table_update(resultRelationDesc, tupleid, slot,
+							  estate->es_output_cid,
+							  estate->es_snapshot,
+							  estate->es_crosscheck_snapshot,
+							  true /* wait for commit */ ,
+							  &hufd, &lockmode, &update_indexes);
+
+		if (result == HeapTupleUpdated && !IsolationUsesXactSnapshot())
+		{
+			TupleTableSlot *inputslot;
+
+			EvalPlanQualBegin(epqstate, estate);
+
+			inputslot = EvalPlanQualSlot(epqstate, resultRelationDesc, resultRelInfo->ri_RangeTableIndex);
+			ExecCopySlot(inputslot, slot);
+
+			result = table_lock_tuple(resultRelationDesc, tupleid,
+									  estate->es_snapshot,
+									  inputslot, estate->es_output_cid,
+									  lockmode, LockWaitBlock,
+									  TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
+									  &hufd);
+			/* hari FIXME*/
+			/*Assert(result != HeapTupleUpdated && hufd.traversed);*/
+			if (result == HeapTupleMayBeUpdated)
+			{
+				TupleTableSlot *epqslot;
+
+				epqslot = EvalPlanQual(estate,
+									   epqstate,
+									   resultRelationDesc,
+									   resultRelInfo->ri_RangeTableIndex,
+									   inputslot);
+				if (TupIsNull(epqslot))
+				{
+					/* Tuple no longer passes the quals, so exit */
+					return NULL;
+				}
+				slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+				goto lreplace;
+			}
+			else if (result == HeapTupleInvisible)
+			{
+				/* tuple is not visible; nothing to do */
+				return NULL;
+			}
+		}
 
 		switch (result)
 		{
@@ -1194,33 +1256,22 @@ lreplace:;
 				break;
 
 			case HeapTupleUpdated:
+
+				/*
+				 * The lower level isolation case for HeapTupleUpdated is
+				 * handled above.
+				 */
+				Assert(IsolationUsesXactSnapshot());
+				ereport(ERROR,
+						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+						 errmsg("could not serialize access due to concurrent update")));
+				break;
+
+			case HeapTupleDeleted:
 				if (IsolationUsesXactSnapshot())
 					ereport(ERROR,
 							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-							 errmsg("could not serialize access due to concurrent update")));
-				if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
-					ereport(ERROR,
-							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-							 errmsg("tuple to be updated was already moved to another partition due to concurrent update")));
-
-				if (!ItemPointerEquals(tupleid, &hufd.ctid))
-				{
-					TupleTableSlot *epqslot;
-
-					epqslot = EvalPlanQual(estate,
-										   epqstate,
-										   resultRelationDesc,
-										   resultRelInfo->ri_RangeTableIndex,
-										   lockmode,
-										   &hufd.ctid,
-										   hufd.xmax);
-					if (!TupIsNull(epqslot))
-					{
-						*tupleid = hufd.ctid;
-						slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
-						goto lreplace;
-					}
-				}
+							 errmsg("could not serialize access due to concurrent delete")));
 				/* tuple already deleted; nothing to do */
 				return NULL;
 
@@ -1241,13 +1292,12 @@ lreplace:;
 		 * insert index entries for tuple
 		 *
 		 * Note: heap_update returns the tid (location) of the new tuple in
-		 * the t_self field.
+		 * the t_self field.  FIXME
 		 *
 		 * If it's a HOT update, we mustn't insert new index entries.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0 && !HeapTupleIsHeapOnly(updatetuple))
-			recheckIndexes = ExecInsertIndexTuples(slot, &(updatetuple->t_self),
-												   estate, false, NULL, NIL);
+		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
+			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1306,11 +1356,12 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	Relation	relation = resultRelInfo->ri_RelationDesc;
 	ExprState  *onConflictSetWhere = resultRelInfo->ri_onConflict->oc_WhereClause;
 	TupleTableSlot *existing = resultRelInfo->ri_onConflict->oc_Existing;
-	HeapTupleData tuple;
 	HeapUpdateFailureData hufd;
 	LockTupleMode lockmode;
 	HTSU_Result test;
-	Buffer		buffer;
+	Datum		xminDatum;
+	TransactionId xmin;
+	bool		isnull;
 
 	/* Determine lock mode to use */
 	lockmode = ExecUpdateLockMode(estate, resultRelInfo);
@@ -1321,10 +1372,11 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 * previous conclusion that the tuple is conclusively committed is not
 	 * true anymore.
 	 */
-	tuple.t_self = *conflictTid;
-	test = heap_lock_tuple(relation, &tuple, estate->es_output_cid,
-						   lockmode, LockWaitBlock, false, &buffer,
-						   &hufd);
+	test = table_lock_tuple(relation, conflictTid,
+							estate->es_snapshot,
+							existing, estate->es_output_cid,
+							lockmode, LockWaitBlock, 0,
+							&hufd);
 	switch (test)
 	{
 		case HeapTupleMayBeUpdated:
@@ -1349,7 +1401,13 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 			 * that for SQL MERGE, an exception must be raised in the event of
 			 * an attempt to update the same row twice.
 			 */
-			if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple.t_data)))
+			xminDatum = slot_getsysattr(existing,
+										MinTransactionIdAttributeNumber,
+										&isnull);
+			Assert(!isnull);
+			xmin = DatumGetTransactionId(xminDatum);
+
+			if (TransactionIdIsCurrentTransactionId(xmin))
 				ereport(ERROR,
 						(errcode(ERRCODE_CARDINALITY_VIOLATION),
 						 errmsg("ON CONFLICT DO UPDATE command cannot affect row a second time"),
@@ -1390,7 +1448,16 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 			 * loop here, as the new version of the row might not conflict
 			 * anymore, or the conflicting tuple has actually been deleted.
 			 */
-			ReleaseBuffer(buffer);
+			ExecClearTuple(existing);
+			return false;
+
+		case HeapTupleDeleted:
+			if (IsolationUsesXactSnapshot())
+				ereport(ERROR,
+						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+						 errmsg("could not serialize access due to concurrent delete")));
+
+			ExecClearTuple(existing);
 			return false;
 
 		default:
@@ -1412,10 +1479,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 * snapshot.  This is in line with the way UPDATE deals with newer tuple
 	 * versions.
 	 */
-	ExecCheckHeapTupleVisible(estate, &tuple, buffer);
-
-	/* Store target's existing tuple in the state's dedicated slot */
-	ExecStorePinnedBufferHeapTuple(&tuple, existing, buffer);
+	ExecCheckHeapTupleVisible(estate, relation, existing);
 
 	/*
 	 * Make tuple and any needed join variables available to ExecQual and
@@ -1470,7 +1534,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, &tuple.t_self, NULL,
+	*returning = ExecUpdate(mtstate, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 08872ef9b4f..b819cf2383e 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -376,7 +376,7 @@ TidNext(TidScanState *node)
 		if (node->tss_isCurrentOf)
 			heap_get_latest_tid(heapRelation, snapshot, &tuple->t_self);
 
-		if (heap_fetch(heapRelation, snapshot, tuple, &buffer, false, NULL))
+		if (heap_fetch(heapRelation, &tuple->t_self, snapshot, tuple, &buffer, NULL))
 		{
 			/*
 			 * Store the scanned tuple in the scan tuple slot of the scan
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a369716ce31..02bfc914d11 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -35,32 +35,10 @@
 #define HEAP_INSERT_NO_LOGICAL	0x0010
 
 typedef struct BulkInsertStateData *BulkInsertState;
+struct HeapUpdateFailureData;
 
 #define MaxLockTupleMode	LockTupleExclusive
 
-/*
- * When heap_update, heap_delete, or heap_lock_tuple fail because the target
- * tuple is already outdated, they fill in this struct to provide information
- * to the caller about what happened.
- * ctid is the target's ctid link: it is the same as the target's TID if the
- * target was deleted, or the location of the replacement tuple if the target
- * was updated.
- * xmax is the outdating transaction's XID.  If the caller wants to visit the
- * replacement tuple, it must check that this matches before believing the
- * replacement is really a match.
- * cmax is the outdating command's CID, but only when the failure code is
- * HeapTupleSelfUpdated (i.e., something in the current transaction outdated
- * the tuple); otherwise cmax is zero.  (We make this restriction because
- * HeapTupleHeaderGetCmax doesn't work for tuples outdated in other
- * transactions.)
- */
-typedef struct HeapUpdateFailureData
-{
-	ItemPointerData ctid;
-	TransactionId xmax;
-	CommandId	cmax;
-} HeapUpdateFailureData;
-
 /*
  * Descriptor for heap table scans.
  */
@@ -149,8 +127,8 @@ extern HeapTuple heap_getnext(TableScanDesc scan, ScanDirection direction);
 extern bool heap_getnextslot(TableScanDesc sscan,
 			ScanDirection direction, struct TupleTableSlot *slot);
 
-extern bool heap_fetch(Relation relation, Snapshot snapshot,
-		   HeapTuple tuple, Buffer *userbuf, bool keep_buf,
+extern bool heap_fetch(Relation relation, ItemPointer tid, Snapshot snapshot,
+		   HeapTuple tuple, Buffer *userbuf,
 		   Relation stats_relation);
 extern bool heap_hot_search_buffer(ItemPointer tid, Relation relation,
 					   Buffer buffer, Snapshot snapshot, HeapTuple heapTuple,
@@ -163,7 +141,7 @@ extern void heap_get_latest_tid(Relation relation, Snapshot snapshot,
 extern void setLastTid(const ItemPointer tid);
 
 extern BulkInsertState GetBulkInsertState(void);
-extern void FreeBulkInsertState(BulkInsertState);
+extern void FreeBulkInsertState(BulkInsertState bistate);
 extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
 
 extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
@@ -172,17 +150,18 @@ extern void heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 				  CommandId cid, int options, BulkInsertState bistate);
 extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			HeapUpdateFailureData *hufd, bool changingPart);
-extern void heap_finish_speculative(Relation relation, HeapTuple tuple);
-extern void heap_abort_speculative(Relation relation, HeapTuple tuple);
+			struct HeapUpdateFailureData *hufd, bool changingPart);
+extern void heap_finish_speculative(Relation relation, ItemPointer tid);
+extern void heap_abort_speculative(Relation relation, ItemPointer tid);
 extern HTSU_Result heap_update(Relation relation, ItemPointer otid,
 			HeapTuple newtup,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			HeapUpdateFailureData *hufd, LockTupleMode *lockmode);
-extern HTSU_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
+			struct HeapUpdateFailureData *hufd, LockTupleMode *lockmode);
+extern HTSU_Result heap_lock_tuple(Relation relation, ItemPointer tid,
 				CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
-				bool follow_update,
-				Buffer *buffer, HeapUpdateFailureData *hufd);
+				bool follow_update, HeapTuple tuple,
+				Buffer *buffer, struct HeapUpdateFailureData *hufd);
+
 extern void heap_inplace_update(Relation relation, HeapTuple tuple);
 extern bool heap_freeze_tuple(HeapTupleHeader tuple,
 				  TransactionId relfrozenxid, TransactionId relminmxid,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 758a7309961..b34e903d84d 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -27,6 +27,32 @@ extern char *default_table_access_method;
 extern bool synchronize_seqscans;
 
 
+struct BulkInsertStateData;
+
+/*
+ * When table_update, table_delete, or table_lock_tuple fail because the target
+ * tuple is already outdated, they fill in this struct to provide information
+ * to the caller about what happened.
+ * ctid is the target's ctid link: it is the same as the target's TID if the
+ * target was deleted, or the location of the replacement tuple if the target
+ * was updated.
+ * xmax is the outdating transaction's XID.  If the caller wants to visit the
+ * replacement tuple, it must check that this matches before believing the
+ * replacement is really a match.
+ * cmax is the outdating command's CID, but only when the failure code is
+ * HeapTupleSelfUpdated (i.e., something in the current transaction outdated
+ * the tuple); otherwise cmax is zero.  (We make this restriction because
+ * HeapTupleHeaderGetCmax doesn't work for tuples outdated in other
+ * transactions.)
+ */
+typedef struct HeapUpdateFailureData
+{
+	ItemPointerData ctid;
+	TransactionId xmax;
+	CommandId	cmax;
+	bool		traversed;
+} HeapUpdateFailureData;
+
 /*
  * API struct for a table AM.  Note this must be allocated in a
  * server-lifetime manner, typically as a static const struct, which then gets
@@ -200,6 +226,51 @@ typedef struct TableAmRoutine
 											 TupleTableSlot *slot,
 											 Snapshot snapshot);
 
+	/* ------------------------------------------------------------------------
+	 * Manipulations of physical tuples.
+	 * ------------------------------------------------------------------------
+	 */
+
+	void		(*tuple_insert) (Relation rel, TupleTableSlot *slot, CommandId cid,
+								 int options, struct BulkInsertStateData *bistate);
+	void		(*tuple_insert_speculative) (Relation rel,
+											 TupleTableSlot *slot,
+											 CommandId cid,
+											 int options,
+											 struct BulkInsertStateData *bistate,
+											 uint32 specToken);
+	void		(*tuple_complete_speculative) (Relation rel,
+											   TupleTableSlot *slot,
+											   uint32 specToken,
+											   bool succeeded);
+	HTSU_Result (*tuple_delete) (Relation rel,
+								 ItemPointer tid,
+								 CommandId cid,
+								 Snapshot snapshot,
+								 Snapshot crosscheck,
+								 bool wait,
+								 HeapUpdateFailureData *hufd,
+								 bool changingPart);
+	HTSU_Result (*tuple_update) (Relation rel,
+								 ItemPointer otid,
+								 TupleTableSlot *slot,
+								 CommandId cid,
+								 Snapshot snapshot,
+								 Snapshot crosscheck,
+								 bool wait,
+								 HeapUpdateFailureData *hufd,
+								 LockTupleMode *lockmode,
+								 bool *update_indexes);
+	HTSU_Result (*tuple_lock) (Relation rel,
+							   ItemPointer tid,
+							   Snapshot snapshot,
+							   TupleTableSlot *slot,
+							   CommandId cid,
+							   LockTupleMode mode,
+							   LockWaitPolicy wait_policy,
+							   uint8 flags,
+							   HeapUpdateFailureData *hufd);
+
 } TableAmRoutine;
 
 
@@ -487,6 +558,93 @@ table_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot, Snapshot snap
 }
 
 
+/* ----------------------------------------------------------------------------
+ *  Functions for manipulations of physical tuples.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * Insert a tuple from a slot, via the table's access method routine.
+ */
+static inline void
+table_insert(Relation rel, TupleTableSlot *slot, CommandId cid,
+			 int options, struct BulkInsertStateData *bistate)
+{
+	rel->rd_tableam->tuple_insert(rel, slot, cid, options,
+								  bistate);
+}
+
+static inline void
+table_insert_speculative(Relation rel, TupleTableSlot *slot, CommandId cid,
+						 int options, struct BulkInsertStateData *bistate, uint32 specToken)
+{
+	rel->rd_tableam->tuple_insert_speculative(rel, slot, cid, options,
+											  bistate, specToken);
+}
+
+static inline void
+table_complete_speculative(Relation rel, TupleTableSlot *slot, uint32 specToken,
+						   bool succeeded)
+{
+	rel->rd_tableam->tuple_complete_speculative(rel, slot, specToken,
+												succeeded);
+}
+
+/*
+ * Delete the tuple identified by tid, via the table's access method routine.
+ */
+static inline HTSU_Result
+table_delete(Relation rel, ItemPointer tid, CommandId cid,
+			 Snapshot snapshot, Snapshot crosscheck, bool wait,
+			 HeapUpdateFailureData *hufd, bool changingPart)
+{
+	return rel->rd_tableam->tuple_delete(rel, tid, cid,
+										 snapshot, crosscheck,
+										 wait, hufd, changingPart);
+}
+
+/*
+ * Update the tuple identified by otid, via the table's access method routine.
+ */
+static inline HTSU_Result
+table_update(Relation rel, ItemPointer otid, TupleTableSlot *slot,
+			 CommandId cid, Snapshot snapshot, Snapshot crosscheck, bool wait,
+			 HeapUpdateFailureData *hufd, LockTupleMode *lockmode,
+			 bool *update_indexes)
+{
+	return rel->rd_tableam->tuple_update(rel, otid, slot,
+										 cid, snapshot, crosscheck,
+										 wait, hufd,
+										 lockmode, update_indexes);
+}
+
+/*
+ * Lock a tuple in the specified mode.
+ */
+static inline HTSU_Result
+table_lock_tuple(Relation rel, ItemPointer tid, Snapshot snapshot,
+				 TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
+				 LockWaitPolicy wait_policy, uint8 flags,
+				 HeapUpdateFailureData *hufd)
+{
+	return rel->rd_tableam->tuple_lock(rel, tid, snapshot, slot,
+									   cid, mode, wait_policy,
+									   flags, hufd);
+}
+
+
+/* ----------------------------------------------------------------------------
+ * Functions to make modifications a bit simpler.
+ * ----------------------------------------------------------------------------
+ */
+
+extern void simple_table_delete(Relation rel, ItemPointer tid,
+					Snapshot snapshot);
+extern void simple_table_update(Relation rel, ItemPointer otid,
+					TupleTableSlot *slot,
+					Snapshot snapshot, bool *update_indexes);
+
+
 /* ----------------------------------------------------------------------------
  * Helper functions to implement parallel scans for block oriented AMs.
  * ----------------------------------------------------------------------------
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9003f2ce583..ceacd1c6370 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -195,12 +195,7 @@ extern LockTupleMode ExecUpdateLockMode(EState *estate, ResultRelInfo *relinfo);
 extern ExecRowMark *ExecFindRowMark(EState *estate, Index rti, bool missing_ok);
 extern ExecAuxRowMark *ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist);
 extern TupleTableSlot *EvalPlanQual(EState *estate, EPQState *epqstate,
-			 Relation relation, Index rti, LockTupleMode lockmode,
-			 ItemPointer tid, TransactionId priorXmax);
-extern bool EvalPlanQualFetch(EState *estate, Relation relation,
-				  LockTupleMode lockmode, LockWaitPolicy wait_policy,
-				  ItemPointer tid, TransactionId priorXmax,
-				  TupleTableSlot *slot);
+			 Relation relation, Index rti, TupleTableSlot *testslot);
 extern void EvalPlanQualInit(EPQState *epqstate, EState *estate,
 				 Plan *subplan, List *auxrowmarks, int epqParam);
 extern void EvalPlanQualSetPlan(EPQState *epqstate,
@@ -569,9 +564,8 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, ItemPointer tupleid,
-					  EState *estate, bool noDupErr, bool *specConflict,
-					  List *arbiterIndexes);
+extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+					  bool *specConflict, List *arbiterIndexes);
 extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
 						  ItemPointer conflictTid, List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
diff --git a/src/include/nodes/lockoptions.h b/src/include/nodes/lockoptions.h
index 8e8ccff43ca..d6b1160ab4b 100644
--- a/src/include/nodes/lockoptions.h
+++ b/src/include/nodes/lockoptions.h
@@ -58,4 +58,9 @@ typedef enum LockTupleMode
 	LockTupleExclusive
 } LockTupleMode;
 
+/* Follow tuples whose update is in progress if lock modes don't conflict */
+#define TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS	(1 << 0)
+/* Follow the update chain and lock the latest version of the tuple */
+#define TUPLE_LOCK_FLAG_FIND_LAST_VERSION		(1 << 1)
+
 #endif							/* LOCKOPTIONS_H */
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index e7ea5cf7b56..b976c13cb53 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -193,6 +193,7 @@ typedef enum
 	HeapTupleInvisible,
 	HeapTupleSelfUpdated,
 	HeapTupleUpdated,
+	HeapTupleDeleted,
 	HeapTupleBeingUpdated,
 	HeapTupleWouldBlock			/* can be returned by heap_tuple_lock */
 } HTSU_Result;
diff --git a/src/test/isolation/expected/partition-key-update-1.out b/src/test/isolation/expected/partition-key-update-1.out
index 37fe6a7b277..a632d7f7bad 100644
--- a/src/test/isolation/expected/partition-key-update-1.out
+++ b/src/test/isolation/expected/partition-key-update-1.out
@@ -15,7 +15,7 @@ step s1u: UPDATE foo SET a=2 WHERE a=1;
 step s2d: DELETE FROM foo WHERE a=1; <waiting ...>
 step s1c: COMMIT;
 step s2d: <... completed>
-error in steps s1c s2d: ERROR:  tuple to be deleted was already moved to another partition due to concurrent update
+error in steps s1c s2d: ERROR:  tuple to be locked was already moved to another partition due to concurrent update
 step s2c: COMMIT;
 
 starting permutation: s1b s2b s2d s1u s2c s1c
-- 
2.21.0.dirty
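
(Not part of the patches, just to make the new calling convention concrete:
the sketch below shows how executor code is expected to drive the slot-based
table_lock_tuple() under READ COMMITTED, mirroring the converted
ExecDelete/ExecUpdate paths above. The helper name lock_latest_for_modify and
its exact error handling are invented for illustration; it assumes the
declarations from access/tableam.h, executor/executor.h and
nodes/lockoptions.h as changed by this series.)

/*
 * Illustrative sketch, not part of the patch.  With
 * TUPLE_LOCK_FLAG_FIND_LAST_VERSION the AM follows the update chain
 * itself and returns the locked, latest row version in the EPQ slot;
 * the caller merely rechecks quals via EvalPlanQual().
 */
static TupleTableSlot *
lock_latest_for_modify(EState *estate, EPQState *epqstate,
					   Relation rel, Index rti,
					   ItemPointer tid, LockTupleMode lockmode)
{
	HeapUpdateFailureData hufd;
	TupleTableSlot *slot;
	HTSU_Result res;

	EvalPlanQualBegin(epqstate, estate);
	slot = EvalPlanQualSlot(epqstate, rel, rti);

	res = table_lock_tuple(rel, tid, estate->es_snapshot, slot,
						   estate->es_output_cid, lockmode, LockWaitBlock,
						   TUPLE_LOCK_FLAG_FIND_LAST_VERSION, &hufd);

	switch (res)
	{
		case HeapTupleMayBeUpdated:
			/* hufd.traversed is set iff a newer version was chased down */
			if (hufd.traversed)
				return EvalPlanQual(estate, epqstate, rel, rti, slot);
			return slot;

		case HeapTupleDeleted:
			/* row vanished under us; nothing left to modify */
			return NULL;

		case HeapTupleInvisible:
			elog(ERROR, "attempted to lock invisible tuple");
			break;

		default:
			elog(ERROR, "unexpected table_lock_tuple status: %u", res);
			break;
	}

	return NULL;				/* keep compiler quiet */
}

Compared to the removed EvalPlanQualFetch() loop, all the chain-following,
buffer pinning and locking now happen inside the AM; the caller only sees a
slot and an HTSU_Result.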

v18-0004-tableam-Add-fetch_row_version.patch (text/x-diff; charset=us-ascii)
From 9ad369d158b2bb07dff11c4b035a30aca10a9443 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 7 Mar 2019 16:30:12 -0800
Subject: [PATCH v18 04/18] tableam: Add fetch_row_version.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/heapam_handler.c | 28 ++++++++++
 src/backend/commands/trigger.c           | 70 ++++--------------------
 src/backend/executor/execMain.c          | 12 +---
 src/backend/executor/nodeLockRows.c      | 10 +---
 src/backend/executor/nodeModifyTable.c   | 23 ++------
 src/backend/executor/nodeTidscan.c       | 15 +----
 src/include/access/tableam.h             | 25 +++++++++
 7 files changed, 73 insertions(+), 110 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3285197c558..318e393dbde 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -148,6 +148,33 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
  * ------------------------------------------------------------------------
  */
 
+static bool
+heapam_fetch_row_version(Relation relation,
+						 ItemPointer tid,
+						 Snapshot snapshot,
+						 TupleTableSlot *slot,
+						 Relation stats_relation)
+{
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	Buffer		buffer;
+
+	Assert(TTS_IS_BUFFERTUPLE(slot));
+
+	if (heap_fetch(relation, tid, snapshot, &bslot->base.tupdata, &buffer, stats_relation))
+	{
+		/* store in slot, transferring existing pin */
+		ExecStorePinnedBufferHeapTuple(&bslot->base.tupdata, slot, buffer);
+
+		slot->tts_tableOid = RelationGetRelid(relation);
+
+		return true;
+	}
+
+	slot->tts_tableOid = RelationGetRelid(relation);
+
+	return false;
+}
+
 static bool
 heapam_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot,
 								Snapshot snapshot)
@@ -546,6 +573,7 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_update = heapam_heap_update,
 	.tuple_lock = heapam_lock_tuple,
 
+	.tuple_fetch_row_version = heapam_fetch_row_version,
 	.tuple_satisfies_snapshot = heapam_tuple_satisfies_snapshot,
 };
 
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 5cc15bcfef0..6fbf0c2b81e 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -14,10 +14,11 @@
 #include "postgres.h"
 
 #include "access/genam.h"
-#include "access/heapam.h"
-#include "access/tableam.h"
-#include "access/sysattr.h"
 #include "access/htup_details.h"
+#include "access/relation.h"
+#include "access/sysattr.h"
+#include "access/table.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
@@ -3382,43 +3383,8 @@ GetTupleForTrigger(EState *estate,
 	}
 	else
 	{
-
-		Page		page;
-		ItemId		lp;
-		Buffer		buffer;
-		BufferHeapTupleTableSlot *boldslot;
-		HeapTuple tuple;
-
-		Assert(TTS_IS_BUFFERTUPLE(oldslot));
-		ExecClearTuple(oldslot);
-		boldslot = (BufferHeapTupleTableSlot *) oldslot;
-		tuple = &boldslot->base.tupdata;
-
-		buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
-
-		/*
-		 * Although we already know this tuple is valid, we must lock the
-		 * buffer to ensure that no one has a buffer cleanup lock; otherwise
-		 * they might move the tuple while we try to copy it.  But we can
-		 * release the lock before actually doing the heap_copytuple call,
-		 * since holding pin is sufficient to prevent anyone from getting a
-		 * cleanup lock they don't already hold.
-		 */
-		LockBuffer(buffer, BUFFER_LOCK_SHARE);
-
-		page = BufferGetPage(buffer);
-		lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
-
-		Assert(ItemIdIsNormal(lp));
-
-		tuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
-		tuple->t_len = ItemIdGetLength(lp);
-		tuple->t_self = *tid;
-		tuple->t_tableOid = RelationGetRelid(relation);
-
-		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-
-		ExecStorePinnedBufferHeapTuple(tuple, oldslot, buffer);
+		if (!table_fetch_row_version(relation, tid, SnapshotAny, oldslot, NULL))
+			elog(ERROR, "couldn't fetch tuple");
 	}
 
 	return true;
@@ -4197,8 +4163,6 @@ AfterTriggerExecute(EState *estate,
 	AfterTriggerShared evtshared = GetTriggerSharedData(event);
 	Oid			tgoid = evtshared->ats_tgoid;
 	TriggerData LocTriggerData;
-	HeapTupleData tuple1;
-	HeapTupleData tuple2;
 	HeapTuple	rettuple;
 	int			tgindx;
 	bool		should_free_trig = false;
@@ -4275,19 +4239,12 @@ AfterTriggerExecute(EState *estate,
 		default:
 			if (ItemPointerIsValid(&(event->ate_ctid1)))
 			{
-				Buffer buffer;
-
 				LocTriggerData.tg_trigslot = ExecGetTriggerOldSlot(estate, relInfo);
 
-				ItemPointerCopy(&(event->ate_ctid1), &(tuple1.t_self));
-				if (!heap_fetch(rel, &(tuple1.t_self), SnapshotAny, &tuple1, &buffer, NULL))
+				if (!table_fetch_row_version(rel, &(event->ate_ctid1), SnapshotAny, LocTriggerData.tg_trigslot, NULL))
 					elog(ERROR, "failed to fetch tuple1 for AFTER trigger");
-				ExecStorePinnedBufferHeapTuple(&tuple1,
-											   LocTriggerData.tg_trigslot,
-											   buffer);
 				LocTriggerData.tg_trigtuple =
-					ExecFetchSlotHeapTuple(LocTriggerData.tg_trigslot, false,
-										   &should_free_trig);
+					ExecFetchSlotHeapTuple(LocTriggerData.tg_trigslot, false, &should_free_trig);
 			}
 			else
 			{
@@ -4299,19 +4256,12 @@ AfterTriggerExecute(EState *estate,
 				AFTER_TRIGGER_2CTID &&
 				ItemPointerIsValid(&(event->ate_ctid2)))
 			{
-				Buffer buffer;
-
 				LocTriggerData.tg_newslot = ExecGetTriggerNewSlot(estate, relInfo);
 
-				ItemPointerCopy(&(event->ate_ctid2), &(tuple2.t_self));
-				if (!heap_fetch(rel, &(tuple2.t_self), SnapshotAny, &tuple2, &buffer, NULL))
+				if (!table_fetch_row_version(rel, &(event->ate_ctid2), SnapshotAny, LocTriggerData.tg_newslot, NULL))
 					elog(ERROR, "failed to fetch tuple2 for AFTER trigger");
-				ExecStorePinnedBufferHeapTuple(&tuple2,
-											   LocTriggerData.tg_newslot,
-											   buffer);
 				LocTriggerData.tg_newtuple =
-					ExecFetchSlotHeapTuple(LocTriggerData.tg_newslot, false,
-										   &should_free_new);
+					ExecFetchSlotHeapTuple(LocTriggerData.tg_newslot, false, &should_free_new);
 			}
 			else
 			{
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 8723e32dd58..e70e9f08e42 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2638,17 +2638,9 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 			else
 			{
 				/* ordinary table, fetch the tuple */
-				HeapTupleData tuple;
-				Buffer		buffer;
-
-				tuple.t_self = *((ItemPointer) DatumGetPointer(datum));
-				if (!heap_fetch(erm->relation, &tuple.t_self, SnapshotAny,
-								&tuple, &buffer, NULL))
+				if (!table_fetch_row_version(erm->relation, (ItemPointer) DatumGetPointer(datum),
+											 SnapshotAny, slot, NULL))
 					elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck");
-
-				/* successful, store tuple */
-				ExecStorePinnedBufferHeapTuple(&tuple, slot, buffer);
-				ExecMaterializeSlot(slot);
 			}
 		}
 		else
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 91f46b88ed8..aedc5297e3b 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -21,7 +21,6 @@
 
 #include "postgres.h"
 
-#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/tableam.h"
 #include "access/xact.h"
@@ -275,8 +274,6 @@ lnext:
 			ExecAuxRowMark *aerm = (ExecAuxRowMark *) lfirst(lc);
 			ExecRowMark *erm = aerm->rowmark;
 			TupleTableSlot *markSlot;
-			HeapTupleData tuple;
-			Buffer buffer;
 
 			markSlot = EvalPlanQualSlot(&node->lr_epqstate, erm->relation, erm->rti);
 
@@ -297,12 +294,9 @@ lnext:
 			Assert(ItemPointerIsValid(&(erm->curCtid)));
 
 			/* okay, fetch the tuple */
-			tuple.t_self = erm->curCtid;
-			if (!heap_fetch(erm->relation, &tuple.t_self, SnapshotAny, &tuple, &buffer,
-							NULL))
+			if (!table_fetch_row_version(erm->relation, &erm->curCtid, SnapshotAny, markSlot,
+							   NULL))
 				elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck");
-			ExecStorePinnedBufferHeapTuple(&tuple, markSlot, buffer);
-			ExecMaterializeSlot(markSlot);
 			/* successful, use tuple in slot */
 		}
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 5b079d8302a..b4247ec33b4 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -230,18 +230,15 @@ ExecCheckTIDVisible(EState *estate,
 					TupleTableSlot *tempSlot)
 {
 	Relation	rel = relinfo->ri_RelationDesc;
-	Buffer		buffer;
-	HeapTupleData tuple;
 
 	/* Redundantly check isolation level */
 	if (!IsolationUsesXactSnapshot())
 		return;
 
-	tuple.t_self = *tid;
-	if (!heap_fetch(rel, &tuple.t_self, SnapshotAny, &tuple, &buffer, NULL))
+	if (!table_fetch_row_version(rel, tid, SnapshotAny, tempSlot, NULL))
 		elog(ERROR, "failed to fetch conflicting tuple for ON CONFLICT");
-	ExecStorePinnedBufferHeapTuple(&tuple, tempSlot, buffer);
 	ExecCheckHeapTupleVisible(estate, rel, tempSlot);
+	ExecClearTuple(tempSlot);
 }
 
 /* ----------------------------------------------------------------
@@ -851,21 +848,9 @@ ldelete:;
 			}
 			else
 			{
-				BufferHeapTupleTableSlot *bslot;
-				HeapTuple deltuple;
-				Buffer buffer;
-
-				Assert(TTS_IS_BUFFERTUPLE(slot));
-				ExecClearTuple(slot);
-				bslot = (BufferHeapTupleTableSlot *) slot;
-				deltuple = &bslot->base.tupdata;
-
-				deltuple->t_self = *tupleid;
-				if (!heap_fetch(resultRelationDesc, &deltuple->t_self, SnapshotAny,
-								deltuple, &buffer, NULL))
+				if (!table_fetch_row_version(resultRelationDesc, tupleid, SnapshotAny,
+											 slot, NULL))
 					elog(ERROR, "failed to fetch deleted tuple for DELETE RETURNING");
-
-				ExecStorePinnedBufferHeapTuple(deltuple, slot, buffer);
 			}
 		}
 
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index b819cf2383e..7d496cc4105 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -310,7 +310,6 @@ TidNext(TidScanState *node)
 	Relation	heapRelation;
 	HeapTuple	tuple;
 	TupleTableSlot *slot;
-	Buffer		buffer = InvalidBuffer;
 	ItemPointerData *tidList;
 	int			numTids;
 	bool		bBackward;
@@ -376,19 +375,9 @@ TidNext(TidScanState *node)
 		if (node->tss_isCurrentOf)
 			heap_get_latest_tid(heapRelation, snapshot, &tuple->t_self);
 
-		if (heap_fetch(heapRelation, &tuple->t_self, snapshot, tuple, &buffer, NULL))
-		{
-			/*
-			 * Store the scanned tuple in the scan tuple slot of the scan
-			 * state, transferring the pin to the slot.
-			 */
-			ExecStorePinnedBufferHeapTuple(tuple, /* tuple to store */
-										   slot,	/* slot to store in */
-										   buffer);	/* buffer associated with
-													 * tuple */
-
+		if (table_fetch_row_version(heapRelation, &tuple->t_self, snapshot, slot, NULL))
 			return slot;
-		}
+
 		/* Bad TID or failed snapshot qual; try next */
 		if (bBackward)
 			node->tss_TidPtr--;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index b34e903d84d..abbd7f63385 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -218,6 +218,13 @@ typedef struct TableAmRoutine
 	 * ------------------------------------------------------------------------
 	 */
 
+
+	bool		(*tuple_fetch_row_version) (Relation rel,
+											ItemPointer tid,
+											Snapshot snapshot,
+											TupleTableSlot *slot,
+											Relation stats_relation);
+
 	/*
 	 * Does the tuple in `slot` satisfy `snapshot`?  The slot needs to be of
 	 * the appropriate type for the AM.
@@ -542,6 +549,24 @@ table_index_fetch_tuple(struct IndexFetchTableData *scan,
  * ------------------------------------------------------------------------
  */
 
+
+/*
+ *	table_fetch_row_version		- retrieve tuple with given tid
+ *
+ *  XXX: This shouldn't just take a tid, but tid + additional information
+ */
+static inline bool
+table_fetch_row_version(Relation rel,
+						ItemPointer tid,
+						Snapshot snapshot,
+						TupleTableSlot *slot,
+						Relation stats_relation)
+{
+	return rel->rd_tableam->tuple_fetch_row_version(rel, tid,
+													snapshot, slot,
+													stats_relation);
+}
+
 /*
  * Return true iff tuple in slot satisfies the snapshot.
  *
-- 
2.21.0.dirty
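
For illustration, a minimal sketch (not part of the patch) of the calling
convention the new function establishes: the caller creates a slot suitable
for the relation's AM, and all buffer pinning/locking stays behind the
callback. fetch_example and the error message are illustrative only; the
usual tableam.h / tuptable.h includes are assumed.

	static void
	fetch_example(Relation rel, ItemPointer tid)
	{
		TupleTableSlot *slot = table_slot_create(rel, NULL);

		if (!table_fetch_row_version(rel, tid, SnapshotAny, slot, NULL))
			elog(ERROR, "failed to fetch tuple");

		/* consume the row, e.g. via the deformed columns */
		slot_getallattrs(slot);

		ExecDropSingleTupleTableSlot(slot);
	}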

Attachment: v18-0005-tableam-Add-use-tableam_fetch_follow_check.patch (text/x-diff; charset=us-ascii)
From a39cfd010c9292f5cfee2a8bdfc9c1f439e90243 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 8 Mar 2019 18:33:24 -0800
Subject: [PATCH v18 05/18] tableam: Add & use tableam_fetch_follow_check().

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/nbtree/nbtinsert.c | 10 ++++++----
 src/backend/access/nbtree/nbtsort.c   |  2 +-
 src/backend/access/table/tableam.c    | 20 ++++++++++++++++++++
 src/include/access/tableam.h          |  5 +++++
 4 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 2b180288239..fe9e8a6e1a3 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -15,9 +15,9 @@
 
 #include "postgres.h"
 
-#include "access/heapam.h"
 #include "access/nbtree.h"
 #include "access/nbtxlog.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xloginsert.h"
 #include "miscadmin.h"
@@ -414,8 +414,9 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
 				 * that satisfies SnapshotDirty.  This is necessary because we
 				 * have just a single index entry for the entire chain.
 				 */
-				else if (heap_hot_search(&htid, heapRel, &SnapshotDirty,
-										 &all_dead))
+				else if (table_index_fetch_tuple_check(heapRel, &htid,
+													   &SnapshotDirty,
+													   &all_dead))
 				{
 					TransactionId xwait;
 
@@ -468,7 +469,8 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
 					 * entry.
 					 */
 					htid = itup->t_tid;
-					if (heap_hot_search(&htid, heapRel, SnapshotSelf, NULL))
+					if (table_index_fetch_tuple_check(heapRel, &htid,
+													  SnapshotSelf, NULL))
 					{
 						/* Normal case --- it's still live */
 					}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index e37cbac7b3c..05a9b03aed5 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -57,10 +57,10 @@
 
 #include "postgres.h"
 
-#include "access/heapam.h"
 #include "access/nbtree.h"
 #include "access/parallel.h"
 #include "access/relscan.h"
+#include "access/table.h"
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 9a01f74d8fe..b1cf8245e3f 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -181,6 +181,26 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc parallel_scan)
  * ----------------------------------------------------------------------------
  */
 
+bool
+table_index_fetch_tuple_check(Relation rel,
+							  ItemPointer tid,
+							  Snapshot snapshot,
+							  bool *all_dead)
+{
+	IndexFetchTableData *scan = table_index_fetch_begin(rel);
+	TupleTableSlot *slot = table_slot_create(rel, NULL);
+	bool		call_again = false;
+	bool		found;
+
+	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again, all_dead);
+
+	table_index_fetch_end(scan);
+	ExecDropSingleTupleTableSlot(slot);
+
+	return found;
+}
+
+
 /*
  *	simple_table_update - replace a tuple
  *
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index abbd7f63385..cac3a5fdd83 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -543,6 +543,11 @@ table_index_fetch_tuple(struct IndexFetchTableData *scan,
 													all_dead);
 }
 
+extern bool table_index_fetch_tuple_check(Relation rel,
+							  ItemPointer tid,
+							  Snapshot snapshot,
+							  bool *all_dead);
+
 
 /* ------------------------------------------------------------------------
  * Functions for non-modifying operations on individual tuples
-- 
2.21.0.dirty
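
Note the wrapper sets up and tears down the index-fetch state on every
call, which is fine for the one-off checks in nbtinsert.c. A caller looking
up many TIDs would presumably keep that state alive across calls; a rough
sketch, where next_tid() stands in for whatever produces the TIDs
(hypothetical, as is fetch_many_example itself):

	static void
	fetch_many_example(Relation rel, Snapshot snapshot)
	{
		IndexFetchTableData *scan = table_index_fetch_begin(rel);
		TupleTableSlot *slot = table_slot_create(rel, NULL);
		ItemPointerData tid;
		bool		call_again;
		bool		all_dead;

		while (next_tid(&tid))	/* hypothetical TID source */
		{
			call_again = false;
			if (table_index_fetch_tuple(scan, &tid, snapshot, slot,
										&call_again, &all_dead))
			{
				/* process the fetched row in the slot */
			}
		}

		ExecDropSingleTupleTableSlot(slot);
		table_index_fetch_end(scan);
	}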

Attachment: v18-0006-tableam-Add-table_get_latest_tid.patch (text/x-diff; charset=us-ascii)
From efa35a092b2527ce5124b273c53c006ad1bc1dc7 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 19 Jan 2019 22:55:39 -0800
Subject: [PATCH v18 06/18] tableam: Add table_get_latest_tid.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/heapam_handler.c |  1 +
 src/backend/executor/nodeTidscan.c       | 14 +++-----------
 src/backend/utils/adt/tid.c              |  5 +++--
 src/include/access/tableam.h             | 12 ++++++++++++
 4 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 318e393dbde..f4bbf24412e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -574,6 +574,7 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_lock = heapam_lock_tuple,
 
 	.tuple_fetch_row_version = heapam_fetch_row_version,
+	.tuple_get_latest_tid = heap_get_latest_tid,
 	.tuple_satisfies_snapshot = heapam_tuple_satisfies_snapshot,
 };
 
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 7d496cc4105..354dcc16d41 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -22,7 +22,6 @@
  */
 #include "postgres.h"
 
-#include "access/heapam.h"
 #include "access/sysattr.h"
 #include "access/tableam.h"
 #include "catalog/pg_type.h"
@@ -308,7 +307,6 @@ TidNext(TidScanState *node)
 	ScanDirection direction;
 	Snapshot	snapshot;
 	Relation	heapRelation;
-	HeapTuple	tuple;
 	TupleTableSlot *slot;
 	ItemPointerData *tidList;
 	int			numTids;
@@ -332,12 +330,6 @@ TidNext(TidScanState *node)
 	tidList = node->tss_TidList;
 	numTids = node->tss_NumTids;
 
-	/*
-	 * We use node->tss_htup as the tuple pointer; note this can't just be a
-	 * local variable here, as the scan tuple slot will keep a pointer to it.
-	 */
-	tuple = &(node->tss_htup);
-
 	/*
 	 * Initialize or advance scan position, depending on direction.
 	 */
@@ -365,7 +357,7 @@ TidNext(TidScanState *node)
 
 	while (node->tss_TidPtr >= 0 && node->tss_TidPtr < numTids)
 	{
-		tuple->t_self = tidList[node->tss_TidPtr];
+		ItemPointerData tid = tidList[node->tss_TidPtr];
 
 		/*
 		 * For WHERE CURRENT OF, the tuple retrieved from the cursor might
@@ -373,9 +365,9 @@ TidNext(TidScanState *node)
 		 * current according to our snapshot.
 		 */
 		if (node->tss_isCurrentOf)
-			heap_get_latest_tid(heapRelation, snapshot, &tuple->t_self);
+			table_get_latest_tid(heapRelation, snapshot, &tid);
 
-		if (table_fetch_row_version(heapRelation, &tuple->t_self, snapshot, slot, NULL))
+		if (table_fetch_row_version(heapRelation, &tid, snapshot, slot, NULL))
 			return slot;
 
 		/* Bad TID or failed snapshot qual; try next */
diff --git a/src/backend/utils/adt/tid.c b/src/backend/utils/adt/tid.c
index f5ffd12cfc9..333d9801c98 100644
--- a/src/backend/utils/adt/tid.c
+++ b/src/backend/utils/adt/tid.c
@@ -23,6 +23,7 @@
 #include "access/hash.h"
 #include "access/heapam.h"
 #include "access/sysattr.h"
+#include "access/tableam.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "libpq/pqformat.h"
@@ -379,7 +380,7 @@ currtid_byreloid(PG_FUNCTION_ARGS)
 	ItemPointerCopy(tid, result);
 
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	heap_get_latest_tid(rel, snapshot, result);
+	table_get_latest_tid(rel, snapshot, result);
 	UnregisterSnapshot(snapshot);
 
 	table_close(rel, AccessShareLock);
@@ -414,7 +415,7 @@ currtid_byrelname(PG_FUNCTION_ARGS)
 	ItemPointerCopy(tid, result);
 
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	heap_get_latest_tid(rel, snapshot, result);
+	table_get_latest_tid(rel, snapshot, result);
 	UnregisterSnapshot(snapshot);
 
 	table_close(rel, AccessShareLock);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index cac3a5fdd83..0fdb6b78e7f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -225,6 +225,10 @@ typedef struct TableAmRoutine
 											TupleTableSlot *slot,
 											Relation stats_relation);
 
+	void		(*tuple_get_latest_tid) (Relation rel,
+										 Snapshot snapshot,
+										 ItemPointer tid);
+
 	/*
 	 * Does the tuple in `slot` satisfy `snapshot`?  The slot needs to be of
 	 * the appropriate type for the AM.
@@ -572,6 +576,14 @@ table_fetch_row_version(Relation rel,
 													stats_relation);
 }
 
+static inline void
+table_get_latest_tid(Relation rel,
+					 Snapshot snapshot,
+					 ItemPointer tid)
+{
+	rel->rd_tableam->tuple_get_latest_tid(rel, snapshot, tid);
+}
+
 /*
  * Return true iff tuple in slot satisfies the snapshot.
  *
-- 
2.21.0.dirty
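
The calling convention is the same as heap_get_latest_tid's: the TID is
updated in place, under a snapshot the caller registered. That is also why
the heap AM can plug heap_get_latest_tid straight into the callback.
Condensed from the tid.c changes above:

	snapshot = RegisterSnapshot(GetLatestSnapshot());
	table_get_latest_tid(rel, snapshot, result);	/* TID updated in place */
	UnregisterSnapshot(snapshot);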

Attachment: v18-0007-tableam-multi_insert-and-slotify-COPY.patch (text/x-diff; charset=us-ascii)
From f2b3532bc28cee586ce06eaec57db785ed355100 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 20 Jan 2019 13:28:26 -0800
Subject: [PATCH v18 07/18] tableam: multi_insert and slotify COPY.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/heapam.c         |  13 +-
 src/backend/access/heap/heapam_handler.c |   1 +
 src/backend/commands/copy.c              | 301 ++++++++++++-----------
 src/include/access/heapam.h              |   3 +-
 src/include/access/tableam.h             |  13 +
 5 files changed, 180 insertions(+), 151 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a2bb1701e40..0321d6bab9a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2149,7 +2149,7 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
  * temporary context before calling this, if that's a problem.
  */
 void
-heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
+heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 				  CommandId cid, int options, BulkInsertState bistate)
 {
 	TransactionId xid = GetCurrentTransactionId();
@@ -2170,12 +2170,17 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	saveFreeSpace = RelationGetTargetPageFreeSpace(relation,
 												   HEAP_DEFAULT_FILLFACTOR);
 
-	/* Toast and set header data in all the tuples */
+	/* Toast and set header data in all the slots */
 	heaptuples = palloc(ntuples * sizeof(HeapTuple));
 	for (i = 0; i < ntuples; i++)
-		heaptuples[i] = heap_prepare_insert(relation, tuples[i],
+	{
+		heaptuples[i] = heap_prepare_insert(relation, ExecFetchSlotHeapTuple(slots[i], true, NULL),
 											xid, cid, options);
 
+		if (slots[i]->tts_tableOid != InvalidOid)
+			heaptuples[i]->t_tableOid = slots[i]->tts_tableOid;
+	}
+
 	/*
 	 * We're about to do the actual inserts -- but check for conflict first,
 	 * to minimize the possibility of having to roll back work we've just
@@ -2410,7 +2415,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	 * probably faster to always copy than check.
 	 */
 	for (i = 0; i < ntuples; i++)
-		tuples[i]->t_self = heaptuples[i]->t_self;
+		slots[i]->tts_tid = heaptuples[i]->t_self;
 
 	pgstat_count_heap_insert(relation, ntuples);
 }
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index f4bbf24412e..ea8e3ee9ce5 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -571,6 +571,7 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_complete_speculative = heapam_heap_complete_speculative,
 	.tuple_delete = heapam_heap_delete,
 	.tuple_update = heapam_heap_update,
+	.multi_insert = heap_multi_insert,
 	.tuple_lock = heapam_lock_tuple,
 
 	.tuple_fetch_row_version = heapam_fetch_row_version,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index aba93262383..312fd3bed31 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -319,9 +319,9 @@ static void CopyOneRowTo(CopyState cstate,
 			 Datum *values, bool *nulls);
 static void CopyFromInsertBatch(CopyState cstate, EState *estate,
 					CommandId mycid, int hi_options,
-					ResultRelInfo *resultRelInfo, TupleTableSlot *myslot,
+					ResultRelInfo *resultRelInfo,
 					BulkInsertState bistate,
-					int nBufferedTuples, HeapTuple *bufferedTuples,
+					int nBufferedTuples, TupleTableSlot **bufferedSlots,
 					uint64 firstBufferedLineNo);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
@@ -2072,33 +2072,27 @@ CopyTo(CopyState cstate)
 
 	if (cstate->rel)
 	{
-		Datum	   *values;
-		bool	   *nulls;
+		TupleTableSlot *slot;
 		TableScanDesc scandesc;
-		HeapTuple	tuple;
-
-		values = (Datum *) palloc(num_phys_attrs * sizeof(Datum));
-		nulls = (bool *) palloc(num_phys_attrs * sizeof(bool));
 
 		scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
+		slot = table_slot_create(cstate->rel, NULL);
 
 		processed = 0;
-		while ((tuple = heap_getnext(scandesc, ForwardScanDirection)) != NULL)
+		while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
 		{
 			CHECK_FOR_INTERRUPTS();
 
-			/* Deconstruct the tuple ... faster than repeated heap_getattr */
-			heap_deform_tuple(tuple, tupDesc, values, nulls);
+			/* Deconstruct the tuple ... */
+			slot_getallattrs(slot);
 
 			/* Format and send the data */
-			CopyOneRowTo(cstate, values, nulls);
+			CopyOneRowTo(cstate, slot->tts_values, slot->tts_isnull);
 			processed++;
 		}
 
+		ExecDropSingleTupleTableSlot(slot);
 		table_endscan(scandesc);
-
-		pfree(values);
-		pfree(nulls);
 	}
 	else
 	{
@@ -2310,26 +2304,21 @@ limit_printout_length(const char *str)
 uint64
 CopyFrom(CopyState cstate)
 {
-	HeapTuple	tuple;
-	TupleDesc	tupDesc;
-	Datum	   *values;
-	bool	   *nulls;
 	ResultRelInfo *resultRelInfo;
 	ResultRelInfo *target_resultRelInfo;
 	ResultRelInfo *prevResultRelInfo = NULL;
 	EState	   *estate = CreateExecutorState(); /* for ExecConstraints() */
 	ModifyTableState *mtstate;
 	ExprContext *econtext;
-	TupleTableSlot *myslot;
+	TupleTableSlot *singleslot = NULL;
 	MemoryContext oldcontext = CurrentMemoryContext;
-	MemoryContext batchcontext;
 
 	PartitionTupleRouting *proute = NULL;
 	ErrorContextCallback errcallback;
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			hi_options = 0; /* start with default heap_insert options */
-	BulkInsertState bistate;
 	CopyInsertMethod insertMethod;
+	BulkInsertState bistate;
 	uint64		processed = 0;
 	int			nBufferedTuples = 0;
 	bool		has_before_insert_row_trig;
@@ -2338,8 +2327,8 @@ CopyFrom(CopyState cstate)
 
 #define MAX_BUFFERED_TUPLES 1000
 #define RECHECK_MULTI_INSERT_THRESHOLD 1000
-	HeapTuple  *bufferedTuples = NULL;	/* initialize to silence warning */
-	Size		bufferedTuplesSize = 0;
+	TupleTableSlot  **bufferedSlots = NULL;	/* initialize to silence warning */
+	Size		bufferedSlotsSize = 0;
 	uint64		firstBufferedLineNo = 0;
 	uint64		lastPartitionSampleLineNo = 0;
 	uint64		nPartitionChanges = 0;
@@ -2381,8 +2370,6 @@ CopyFrom(CopyState cstate)
 							RelationGetRelationName(cstate->rel))));
 	}
 
-	tupDesc = RelationGetDescr(cstate->rel);
-
 	/*----------
 	 * Check to see if we can avoid writing WAL
 	 *
@@ -2517,10 +2504,6 @@ CopyFrom(CopyState cstate)
 
 	ExecInitRangeTable(estate, cstate->range_table);
 
-	/* Set up a tuple slot too */
-	myslot = ExecInitExtraTupleSlot(estate, tupDesc,
-									&TTSOpsHeapTuple);
-
 	/*
 	 * Set up a ModifyTableState so we can let FDW(s) init themselves for
 	 * foreign-table result relation(s).
@@ -2642,7 +2625,17 @@ CopyFrom(CopyState cstate)
 		else
 			insertMethod = CIM_MULTI;
 
-		bufferedTuples = palloc(MAX_BUFFERED_TUPLES * sizeof(HeapTuple));
+		bufferedSlots = palloc0(MAX_BUFFERED_TUPLES * sizeof(TupleTableSlot *));
+	}
+
+	/*
+	 * If not using batch mode (which allocates slots as needed), set up a
+	 * tuple slot too.
+	 */
+	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
+	{
+		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
+									   &estate->es_tupleTable);
 	}
 
 	has_before_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
@@ -2659,9 +2652,6 @@ CopyFrom(CopyState cstate)
 	 */
 	ExecBSInsertTriggers(estate, resultRelInfo);
 
-	values = (Datum *) palloc(tupDesc->natts * sizeof(Datum));
-	nulls = (bool *) palloc(tupDesc->natts * sizeof(bool));
-
 	bistate = GetBulkInsertState();
 	econtext = GetPerTupleExprContext(estate);
 
@@ -2671,17 +2661,9 @@ CopyFrom(CopyState cstate)
 	errcallback.previous = error_context_stack;
 	error_context_stack = &errcallback;
 
-	/*
-	 * Set up memory context for batches. For cases without batching we could
-	 * use the per-tuple context, but it's simpler to just use it every time.
-	 */
-	batchcontext = AllocSetContextCreate(CurrentMemoryContext,
-										 "batch context",
-										 ALLOCSET_DEFAULT_SIZES);
-
 	for (;;)
 	{
-		TupleTableSlot *slot;
+		TupleTableSlot *myslot;
 		bool		skip_tuple;
 
 		CHECK_FOR_INTERRUPTS();
@@ -2692,20 +2674,39 @@ CopyFrom(CopyState cstate)
 		 */
 		ResetPerTupleExprContext(estate);
 
+		if (insertMethod == CIM_SINGLE || proute)
+		{
+			myslot = singleslot;
+			Assert(myslot != NULL);
+		}
+		else
+		{
+			if (bufferedSlots[nBufferedTuples] == NULL)
+			{
+				const TupleTableSlotOps *tts_cb;
+
+				tts_cb = table_slot_callbacks(resultRelInfo->ri_RelationDesc);
+
+				bufferedSlots[nBufferedTuples] =
+					MakeSingleTupleTableSlot(RelationGetDescr(resultRelInfo->ri_RelationDesc),
+											 tts_cb);
+			}
+			myslot = bufferedSlots[nBufferedTuples];
+		}
+
 		/*
 		 * Switch to per-tuple context before calling NextCopyFrom, which does
 		 * evaluate default expressions etc. and requires per-tuple context.
 		 */
 		MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
 
-		if (!NextCopyFrom(cstate, econtext, values, nulls))
+		ExecClearTuple(myslot);
+
+		/* Directly store the values/nulls array in the slot */
+		if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
 			break;
 
-		/* Switch into per-batch memory context before forming the tuple. */
-		MemoryContextSwitchTo(batchcontext);
-
-		/* And now we can form the input tuple. */
-		tuple = heap_form_tuple(tupDesc, values, nulls);
+		ExecStoreVirtualTuple(myslot);
 
 		/*
 		 * Constraints might reference the tableoid column, so (re-)initialize
@@ -2716,10 +2717,6 @@ CopyFrom(CopyState cstate)
 		/* Triggers and stuff need to be invoked in query context. */
 		MemoryContextSwitchTo(oldcontext);
 
-		/* Place tuple in tuple slot --- but slot shouldn't free it */
-		slot = myslot;
-		ExecStoreHeapTuple(tuple, slot, false);
-
 		if (cstate->whereClause)
 		{
 			econtext->ecxt_scantuple = myslot;
@@ -2738,7 +2735,7 @@ CopyFrom(CopyState cstate)
 			 * if the found partition is not suitable for INSERTs.
 			 */
 			resultRelInfo = ExecFindPartition(mtstate, target_resultRelInfo,
-											  proute, slot, estate);
+											  proute, myslot, estate);
 
 			if (prevResultRelInfo != resultRelInfo)
 			{
@@ -2752,38 +2749,20 @@ CopyFrom(CopyState cstate)
 					 */
 					if (nBufferedTuples > 0)
 					{
-						MemoryContext	oldcontext;
-
 						CopyFromInsertBatch(cstate, estate, mycid, hi_options,
-											prevResultRelInfo, myslot, bistate,
-											nBufferedTuples, bufferedTuples,
+											prevResultRelInfo, bistate,
+											nBufferedTuples, bufferedSlots,
 											firstBufferedLineNo);
 						nBufferedTuples = 0;
-						bufferedTuplesSize = 0;
 
-						/*
-						 * The tuple is already allocated in the batch context, which
-						 * we want to reset.  So to keep the tuple we copy it into the
-						 * short-lived (per-tuple) context, reset the batch context
-						 * and then copy it back into the per-batch one.
-						 */
-						oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-						tuple = heap_copytuple(tuple);
-						MemoryContextSwitchTo(oldcontext);
-
-						/* cleanup the old batch */
-						MemoryContextReset(batchcontext);
-
-						/* copy the tuple back to the per-batch context */
-						oldcontext = MemoryContextSwitchTo(batchcontext);
-						tuple = heap_copytuple(tuple);
-						MemoryContextSwitchTo(oldcontext);
-
-						/*
-						 * Also push the tuple copy to the slot (resetting the context
-						 * invalidated the slot contents).
-						 */
-						ExecStoreHeapTuple(tuple, slot, false);
+						/* force new slots to be used */
+						for (int i = 0; i < MAX_BUFFERED_TUPLES; i++)
+						{
+							if (bufferedSlots[i] == NULL)
+								continue;
+							ExecDropSingleTupleTableSlot(bufferedSlots[i]);
+							bufferedSlots[i] = NULL;
+						}
 					}
 
 					nPartitionChanges++;
@@ -2878,26 +2857,47 @@ CopyFrom(CopyState cstate)
 			 * rowtype.
 			 */
 			map = resultRelInfo->ri_PartitionInfo->pi_RootToPartitionMap;
-			if (map != NULL)
+			if (insertMethod == CIM_SINGLE ||
+				(insertMethod == CIM_MULTI_CONDITIONAL && !leafpart_use_multi_insert))
+			{
+				if (map != NULL)
+				{
+					TupleTableSlot *new_slot;
+
+					new_slot = resultRelInfo->ri_PartitionInfo->pi_PartitionTupleSlot;
+					myslot = execute_attr_map_slot(map->attrMap, myslot, new_slot);
+				}
+			}
+			else if (insertMethod == CIM_MULTI_CONDITIONAL)
 			{
 				TupleTableSlot *new_slot;
-				MemoryContext oldcontext;
 
-				new_slot = resultRelInfo->ri_PartitionInfo->pi_PartitionTupleSlot;
-				Assert(new_slot != NULL);
+				if (bufferedSlots[nBufferedTuples] == NULL)
+				{
+					const TupleTableSlotOps *tts_cb;
 
-				slot = execute_attr_map_slot(map->attrMap, slot, new_slot);
+					tts_cb = table_slot_callbacks(resultRelInfo->ri_RelationDesc);
+					bufferedSlots[nBufferedTuples] =
+						MakeSingleTupleTableSlot(RelationGetDescr(resultRelInfo->ri_RelationDesc),
+												 tts_cb);
+				}
 
-				/*
-				 * Get the tuple in the per-batch context, so that it will be
-				 * freed after each batch insert.
-				 */
-				oldcontext = MemoryContextSwitchTo(batchcontext);
-				tuple = ExecCopySlotHeapTuple(slot);
-				MemoryContextSwitchTo(oldcontext);
+				new_slot = bufferedSlots[nBufferedTuples];
+
+				if (map != NULL)
+					myslot = execute_attr_map_slot(map->attrMap, myslot, new_slot);
+				else
+				{
+					ExecCopySlot(new_slot, myslot);
+					myslot = new_slot;
+				}
 			}
-
-			slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+			else
+			{
+				elog(ERROR, "huh");
+			}
+			/* FIXME: needed? */
+			myslot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 		}
 
 		skip_tuple = false;
@@ -2905,7 +2905,7 @@ CopyFrom(CopyState cstate)
 		/* BEFORE ROW INSERT Triggers */
 		if (has_before_insert_row_trig)
 		{
-			if (!ExecBRInsertTriggers(estate, resultRelInfo, slot))
+			if (!ExecBRInsertTriggers(estate, resultRelInfo, myslot))
 				skip_tuple = true;	/* "do nothing" */
 		}
 
@@ -2914,7 +2914,7 @@ CopyFrom(CopyState cstate)
 			if (has_instead_insert_row_trig)
 			{
 				/* Pass the data to the INSTEAD ROW INSERT trigger */
-				ExecIRInsertTriggers(estate, resultRelInfo, slot);
+				ExecIRInsertTriggers(estate, resultRelInfo, myslot);
 			}
 			else
 			{
@@ -2924,7 +2924,7 @@ CopyFrom(CopyState cstate)
 				 */
 				if (resultRelInfo->ri_FdwRoutine == NULL &&
 					resultRelInfo->ri_RelationDesc->rd_att->constr)
-					ExecConstraints(resultRelInfo, slot, estate);
+					ExecConstraints(resultRelInfo, myslot, estate);
 
 				/*
 				 * Also check the tuple against the partition constraint, if
@@ -2934,7 +2934,7 @@ CopyFrom(CopyState cstate)
 				 */
 				if (resultRelInfo->ri_PartitionCheck &&
 					(proute == NULL || has_before_insert_row_trig))
-					ExecPartitionCheck(resultRelInfo, slot, estate, true);
+					ExecPartitionCheck(resultRelInfo, myslot, estate, true);
 
 				/*
 				 * Perform multi-inserts when enabled, or when loading a
@@ -2946,8 +2946,16 @@ CopyFrom(CopyState cstate)
 					/* Add this tuple to the tuple buffer */
 					if (nBufferedTuples == 0)
 						firstBufferedLineNo = cstate->cur_lineno;
-					bufferedTuples[nBufferedTuples++] = tuple;
-					bufferedTuplesSize += tuple->t_len;
+
+					/*
+					 * The slot's contents may still point into the per-tuple
+					 * context.
+					 */
+					ExecMaterializeSlot(myslot);
+
+					Assert(bufferedSlots[nBufferedTuples] == myslot);
+					nBufferedTuples++;
+					bufferedSlotsSize += cstate->line_buf.len;
 
 					/*
 					 * If the buffer filled up, flush it.  Also flush if the
@@ -2956,17 +2964,14 @@ CopyFrom(CopyState cstate)
 					 * buffer when the tuples are exceptionally wide.
 					 */
 					if (nBufferedTuples == MAX_BUFFERED_TUPLES ||
-						bufferedTuplesSize > 65535)
+						bufferedSlotsSize > 65535)
 					{
 						CopyFromInsertBatch(cstate, estate, mycid, hi_options,
-											resultRelInfo, myslot, bistate,
-											nBufferedTuples, bufferedTuples,
+											resultRelInfo, bistate,
+											nBufferedTuples, bufferedSlots,
 											firstBufferedLineNo);
 						nBufferedTuples = 0;
-						bufferedTuplesSize = 0;
-
-						/* free memory occupied by tuples from the batch */
-						MemoryContextReset(batchcontext);
+						bufferedSlotsSize = 0;
 					}
 				}
 				else
@@ -2976,12 +2981,12 @@ CopyFrom(CopyState cstate)
 					/* OK, store the tuple */
 					if (resultRelInfo->ri_FdwRoutine != NULL)
 					{
-						slot = resultRelInfo->ri_FdwRoutine->ExecForeignInsert(estate,
-																			   resultRelInfo,
-																			   slot,
-																			   NULL);
+						myslot = resultRelInfo->ri_FdwRoutine->ExecForeignInsert(estate,
+																				 resultRelInfo,
+																				 myslot,
+																				 NULL);
 
-						if (slot == NULL)	/* "do nothing" */
+						if (myslot == NULL)	/* "do nothing" */
 							continue;	/* next tuple please */
 
 						/*
@@ -2989,27 +2994,27 @@ CopyFrom(CopyState cstate)
 						 * column, so (re-)initialize tts_tableOid before
 						 * evaluating them.
 						 */
-						slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+						myslot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
 					else
 					{
-						tuple = ExecFetchSlotHeapTuple(slot, true, NULL);
-						heap_insert(resultRelInfo->ri_RelationDesc, tuple,
-									mycid, hi_options, bistate);
-						ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
-						slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+						/* OK, store the tuple */
+						table_insert(resultRelInfo->ri_RelationDesc, myslot, mycid, hi_options,
+									 bistate);
+						myslot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 					}
 
+
 					/* And create index entries for it */
 					if (resultRelInfo->ri_NumIndices > 0)
-						recheckIndexes = ExecInsertIndexTuples(slot,
+						recheckIndexes = ExecInsertIndexTuples(myslot,
 															   estate,
 															   false,
 															   NULL,
 															   NIL);
 
 					/* AFTER ROW INSERT Triggers */
-					ExecARInsertTriggers(estate, resultRelInfo, slot,
+					ExecARInsertTriggers(estate, resultRelInfo, myslot,
 										 recheckIndexes, cstate->transition_capture);
 
 					list_free(recheckIndexes);
@@ -3031,26 +3036,36 @@ CopyFrom(CopyState cstate)
 		if (insertMethod == CIM_MULTI_CONDITIONAL)
 		{
 			CopyFromInsertBatch(cstate, estate, mycid, hi_options,
-								prevResultRelInfo, myslot, bistate,
-								nBufferedTuples, bufferedTuples,
+								prevResultRelInfo, bistate,
+								nBufferedTuples, bufferedSlots,
 								firstBufferedLineNo);
 		}
 		else
 			CopyFromInsertBatch(cstate, estate, mycid, hi_options,
-								resultRelInfo, myslot, bistate,
-								nBufferedTuples, bufferedTuples,
+								resultRelInfo, bistate,
+								nBufferedTuples, bufferedSlots,
 								firstBufferedLineNo);
 	}
 
+	/* free slots */
+	if (bufferedSlots)
+	{
+		for (int i = 0; i < MAX_BUFFERED_TUPLES; i++)
+		{
+			if (bufferedSlots[i] == NULL)
+				continue;
+			ExecDropSingleTupleTableSlot(bufferedSlots[i]);
+			bufferedSlots[i] = NULL;
+		}
+	}
+
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	FreeBulkInsertState(bistate);
+	ReleaseBulkInsertStatePin(bistate);
 
 	MemoryContextSwitchTo(oldcontext);
 
-	MemoryContextDelete(batchcontext);
-
 	/*
 	 * In the old protocol, tell pqcomm that we can process normal protocol
 	 * messages again.
@@ -3064,9 +3079,6 @@ CopyFrom(CopyState cstate)
 	/* Handle queued AFTER triggers */
 	AfterTriggerEndQuery(estate);
 
-	pfree(values);
-	pfree(nulls);
-
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
 	/* Allow the FDW to shut down */
@@ -3104,8 +3116,7 @@ CopyFrom(CopyState cstate)
 static void
 CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 					int hi_options, ResultRelInfo *resultRelInfo,
-					TupleTableSlot *myslot, BulkInsertState bistate,
-					int nBufferedTuples, HeapTuple *bufferedTuples,
+					BulkInsertState bistate, int nBufferedTuples, TupleTableSlot **bufferedSlots,
 					uint64 firstBufferedLineNo)
 {
 	MemoryContext oldcontext;
@@ -3125,12 +3136,12 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 	 * before calling it.
 	 */
 	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	heap_multi_insert(resultRelInfo->ri_RelationDesc,
-					  bufferedTuples,
-					  nBufferedTuples,
-					  mycid,
-					  hi_options,
-					  bistate);
+	table_multi_insert(resultRelInfo->ri_RelationDesc,
+					   bufferedSlots,
+					   nBufferedTuples,
+					   mycid,
+					   hi_options,
+					   bistate);
 	MemoryContextSwitchTo(oldcontext);
 
 	/*
@@ -3144,12 +3155,11 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 			List	   *recheckIndexes;
 
 			cstate->cur_lineno = firstBufferedLineNo + i;
-			ExecStoreHeapTuple(bufferedTuples[i], myslot, false);
 			recheckIndexes =
-				ExecInsertIndexTuples(myslot,
-									  estate, false, NULL, NIL);
+				ExecInsertIndexTuples(bufferedSlots[i], estate, false, NULL,
+									  NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
-								 myslot,
+								 bufferedSlots[i],
 								 recheckIndexes, cstate->transition_capture);
 			list_free(recheckIndexes);
 		}
@@ -3166,9 +3176,8 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 		for (i = 0; i < nBufferedTuples; i++)
 		{
 			cstate->cur_lineno = firstBufferedLineNo + i;
-			ExecStoreHeapTuple(bufferedTuples[i], myslot, false);
 			ExecARInsertTriggers(estate, resultRelInfo,
-								 myslot,
+								 bufferedSlots[i],
 								 NIL, cstate->transition_capture);
 		}
 	}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 02bfc914d11..044d6b04e86 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -35,6 +35,7 @@
 #define HEAP_INSERT_NO_LOGICAL	0x0010
 
 typedef struct BulkInsertStateData *BulkInsertState;
+struct TupleTableSlot;
 struct HeapUpdateFailureData;
 
 #define MaxLockTupleMode	LockTupleExclusive
@@ -146,7 +147,7 @@ extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
 
 extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 			int options, BulkInsertState bistate);
-extern void heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
+extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots, int ntuples,
 				  CommandId cid, int options, BulkInsertState bistate);
 extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 0fdb6b78e7f..7213d425e12 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -272,6 +272,8 @@ typedef struct TableAmRoutine
 								 HeapUpdateFailureData *hufd,
 								 LockTupleMode *lockmode,
 								 bool *update_indexes);
+	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
+								 CommandId cid, int options, struct BulkInsertStateData *bistate);
 	HTSU_Result (*tuple_lock) (Relation rel,
 							   ItemPointer tid,
 							   Snapshot snapshot,
@@ -660,6 +662,17 @@ table_update(Relation rel, ItemPointer otid, TupleTableSlot *slot,
 										 lockmode, update_indexes);
 }
 
+/*
+ *	table_multi_insert	- insert multiple tuples into a table
+ */
+static inline void
+table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
+				   CommandId cid, int options, struct BulkInsertStateData *bistate)
+{
+	rel->rd_tableam->multi_insert(rel, slots, nslots,
+								  cid, options, bistate);
+}
+
 /*
  * Lock a tuple in the specified mode.
  */
-- 
2.21.0.dirty
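
Stripped of COPY's partitioning and trigger logic, the buffering pattern
the patch introduces looks roughly like the sketch below. Slot creation is
elided and fill_slot_from_input() is a made-up stand-in for NextCopyFrom():

	/* bufferedSlots[] pre-created with table_slot_create(rel, NULL) */
	TupleTableSlot *bufferedSlots[MAX_BUFFERED_TUPLES];
	int			nBufferedTuples = 0;

	for (;;)
	{
		TupleTableSlot *myslot = bufferedSlots[nBufferedTuples];

		ExecClearTuple(myslot);
		if (!fill_slot_from_input(myslot))	/* hypothetical input step */
			break;
		ExecStoreVirtualTuple(myslot);

		/* the slot's datums may point into the per-tuple context */
		ExecMaterializeSlot(myslot);

		if (++nBufferedTuples == MAX_BUFFERED_TUPLES)
		{
			table_multi_insert(rel, bufferedSlots, nBufferedTuples,
							   mycid, options, bistate);
			nBufferedTuples = 0;
		}
	}

	if (nBufferedTuples > 0)
		table_multi_insert(rel, bufferedSlots, nBufferedTuples,
						   mycid, options, bistate);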

Attachment: v18-0008-tableam-finish_bulk_insert.patch (text/x-diff; charset=us-ascii)
From 91dbe1b63237c15a6d25fd6a79f3a834205b66be Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 7 Mar 2019 16:31:46 -0800
Subject: [PATCH v18 08/18] tableam: finish_bulk_insert().

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/heapam_handler.c | 12 ++++++++++++
 src/backend/commands/copy.c              |  7 +------
 src/backend/commands/createas.c          |  5 ++---
 src/backend/commands/matview.c           |  5 ++---
 src/backend/commands/tablecmds.c         |  4 +---
 src/include/access/tableam.h             | 18 ++++++++++++++++++
 6 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index ea8e3ee9ce5..3098cb96b60 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -541,6 +541,17 @@ retry:
 	return result;
 }
 
+static void
+heapam_finish_bulk_insert(Relation relation, int options)
+{
+	/*
+	 * If we skipped writing WAL, then we need to sync the heap (but not
+	 * indexes since those use WAL anyway)
+	 */
+	if (options & HEAP_INSERT_SKIP_WAL)
+		heap_sync(relation);
+}
+
 
 /* ------------------------------------------------------------------------
  * Definition of the heap table access method.
@@ -573,6 +584,7 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_update = heapam_heap_update,
 	.multi_insert = heap_multi_insert,
 	.tuple_lock = heapam_lock_tuple,
+	.finish_bulk_insert = heapam_finish_bulk_insert,
 
 	.tuple_fetch_row_version = heapam_fetch_row_version,
 	.tuple_get_latest_tid = heap_get_latest_tid,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 312fd3bed31..1e7a06a72fb 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3098,12 +3098,7 @@ CopyFrom(CopyState cstate)
 
 	FreeExecutorState(estate);
 
-	/*
-	 * If we skipped writing WAL, then we need to sync the heap (but not
-	 * indexes since those use WAL anyway)
-	 */
-	if (hi_options & HEAP_INSERT_SKIP_WAL)
-		heap_sync(cstate->rel);
+	table_finish_bulk_insert(cstate->rel, hi_options);
 
 	return processed;
 }
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 0ac295cea3f..55f61854614 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -28,6 +28,7 @@
 #include "access/reloptions.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "catalog/namespace.h"
@@ -601,9 +602,7 @@ intorel_shutdown(DestReceiver *self)
 
 	FreeBulkInsertState(myState->bistate);
 
-	/* If we skipped using WAL, must heap_sync before commit */
-	if (myState->hi_options & HEAP_INSERT_SKIP_WAL)
-		heap_sync(myState->rel);
+	table_finish_bulk_insert(myState->rel, myState->hi_options);
 
 	/* close rel, but keep lock until commit */
 	table_close(myState->rel, NoLock);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 5a47be4b33c..62b76cfd358 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -18,6 +18,7 @@
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/multixact.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
@@ -509,9 +510,7 @@ transientrel_shutdown(DestReceiver *self)
 
 	FreeBulkInsertState(myState->bistate);
 
-	/* If we skipped using WAL, must heap_sync before commit */
-	if (myState->hi_options & HEAP_INSERT_SKIP_WAL)
-		heap_sync(myState->transientrel);
+	table_finish_bulk_insert(myState->transientrel, myState->hi_options);
 
 	/* close transientrel, but keep lock until commit */
 	table_close(myState->transientrel, NoLock);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 40839e14dbe..1f5a7e93155 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -4951,9 +4951,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	{
 		FreeBulkInsertState(bistate);
 
-		/* If we skipped writing WAL, then we need to sync the heap. */
-		if (hi_options & HEAP_INSERT_SKIP_WAL)
-			heap_sync(newrel);
+		table_finish_bulk_insert(newrel, hi_options);
 
 		table_close(newrel, NoLock);
 	}
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7213d425e12..2c2d388dda6 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -284,6 +284,16 @@ typedef struct TableAmRoutine
 							   uint8 flags,
 							   HeapUpdateFailureData *hufd);
 
+	/*
+	 * Perform operations necessary to complete insertions made via
+	 * tuple_insert and multi_insert with a BulkInsertState specified. This
+	 * may e.g. be used to flush the relation when the insertions skipped
+	 * WAL.
+	 *
+	 * May be NULL.
+	 */
+	void		(*finish_bulk_insert) (Relation rel, int options);
+
 } TableAmRoutine;
 
 
@@ -687,6 +697,14 @@ table_lock_tuple(Relation rel, ItemPointer tid, Snapshot snapshot,
 									   flags, hufd);
 }
 
+static inline void
+table_finish_bulk_insert(Relation rel, int options)
+{
+	/* optional */
+	if (rel->rd_tableam && rel->rd_tableam->finish_bulk_insert)
+		rel->rd_tableam->finish_bulk_insert(rel, options);
+}
+
 
 /* ----------------------------------------------------------------------------
  * Functions to make modifications a bit simpler.
-- 
2.21.0.dirty
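
Since table_finish_bulk_insert() treats a NULL callback as a no-op, only
AMs that actually defer work during bulk inserts need to provide one. A
hypothetical AM that buffers unlogged writes might implement it like this
(myam_finish_bulk_insert and myam_flush_buffers are made up):

	static void
	myam_finish_bulk_insert(Relation relation, int options)
	{
		/* flush whatever the bulk-insert path kept out of WAL */
		if (options & HEAP_INSERT_SKIP_WAL)
			myam_flush_buffers(relation);
	}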

Attachment: v18-0009-tableam-slotify-CREATE-TABLE-AS-and-CREATE-MATER.patch (text/x-diff; charset=us-ascii)
From ecf76b433a30591b259f2c62c8e7348b36dcd851 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 19 Jan 2019 23:22:39 -0800
Subject: [PATCH v18 09/18] tableam: slotify CREATE TABLE AS and CREATE
 MATERIALIZED VIEW.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/commands/createas.c | 26 ++++++++++++++++----------
 src/backend/commands/matview.c  | 26 ++++++++++++++------------
 2 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 55f61854614..31b8691c988 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -27,6 +27,7 @@
 #include "access/heapam.h"
 #include "access/reloptions.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/sysattr.h"
 #include "access/tableam.h"
 #include "access/xact.h"
@@ -61,7 +62,8 @@ typedef struct
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
 	CommandId	output_cid;		/* cmin to insert in output tuples */
 	int			hi_options;		/* heap_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	BulkInsertState bistate;		/* bulk insert state */
+	TupleTableSlot *slot;
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -553,6 +555,7 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
 	myState->output_cid = GetCurrentCommandId(true);
+	myState->slot = table_slot_create(intoRelationDesc, NULL);
 
 	/*
 	 * We can skip WAL-logging the insertions, unless PITR or streaming
@@ -573,19 +576,21 @@ static bool
 intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 {
 	DR_intorel *myState = (DR_intorel *) self;
-	HeapTuple	tuple;
 
 	/*
-	 * get the heap tuple out of the tuple table slot, making sure we have a
-	 * writable copy
+	 * Ensure input tuple is the right format for the target relation.
 	 */
-	tuple = ExecCopySlotHeapTuple(slot);
+	if (slot->tts_ops != myState->slot->tts_ops)
+	{
+		ExecCopySlot(myState->slot, slot);
+		slot = myState->slot;
+	}
 
-	heap_insert(myState->rel,
-				tuple,
-				myState->output_cid,
-				myState->hi_options,
-				myState->bistate);
+	table_insert(myState->rel,
+				 slot,
+				 myState->output_cid,
+				 myState->hi_options,
+				 myState->bistate);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -600,6 +605,7 @@ intorel_shutdown(DestReceiver *self)
 {
 	DR_intorel *myState = (DR_intorel *) self;
 
+	ExecDropSingleTupleTableSlot(myState->slot);
 	FreeBulkInsertState(myState->bistate);
 
 	table_finish_bulk_insert(myState->rel, myState->hi_options);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 62b76cfd358..e291ad0c547 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -56,6 +56,7 @@ typedef struct
 	CommandId	output_cid;		/* cmin to insert in output tuples */
 	int			hi_options;		/* heap_insert performance options */
 	BulkInsertState bistate;	/* bulk insert state */
+	TupleTableSlot *slot;
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -457,6 +458,7 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->transientrel = transientrel;
 	myState->output_cid = GetCurrentCommandId(true);
+	myState->slot = table_slot_create(transientrel, NULL);
 
 	/*
 	 * We can skip WAL-logging the insertions, unless PITR or streaming
@@ -478,25 +480,24 @@ static bool
 transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
-	HeapTuple	tuple;
 
 	/*
-	 * get the heap tuple out of the tuple table slot, making sure we have a
-	 * writable copy
+	 * Ensure input tuple is the right format for the target relation.
 	 */
-	tuple = ExecCopySlotHeapTuple(slot);
+	if (slot->tts_ops != myState->slot->tts_ops)
+	{
+		ExecCopySlot(myState->slot, slot);
+		slot = myState->slot;
+	}
 
-	heap_insert(myState->transientrel,
-				tuple,
-				myState->output_cid,
-				myState->hi_options,
-				myState->bistate);
+	table_insert(myState->transientrel,
+				 slot,
+				 myState->output_cid,
+				 myState->hi_options,
+				 myState->bistate);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
-	/* Free the copied tuple. */
-	heap_freetuple(tuple);
-
 	return true;
 }
 
@@ -508,6 +509,7 @@ transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
 
+	ExecDropSingleTupleTableSlot(myState->slot);
 	FreeBulkInsertState(myState->bistate);
 
 	table_finish_bulk_insert(myState->transientrel, myState->hi_options);
-- 
2.21.0.dirty
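
Note the receive callbacks now convert only when the incoming slot's ops
differ from the target AM's, so the common case where the source already
produces the right slot type avoids a copy entirely. The pattern, condensed
from the two callbacks above:

	/* ensure the slot is of the AM's type; copy only when necessary */
	if (slot->tts_ops != myState->slot->tts_ops)
	{
		ExecCopySlot(myState->slot, slot);
		slot = myState->slot;
	}
	table_insert(myState->rel, slot, myState->output_cid,
				 myState->hi_options, myState->bistate);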

Attachment: v18-0010-tableam-index-builds.patch (text/x-diff; charset=us-ascii)
From 54266c60d6d0186f6ff01807ce8315a61b60ce7d Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 19 Jan 2019 23:45:07 -0800
Subject: [PATCH v18 10/18] tableam: index builds.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 contrib/amcheck/verify_nbtree.c          |   6 +-
 contrib/bloom/blinsert.c                 |   9 +-
 src/backend/access/brin/brin.c           |  18 +-
 src/backend/access/gin/gininsert.c       |   6 +-
 src/backend/access/gist/gistbuild.c      |   8 +-
 src/backend/access/hash/hash.c           |   8 +-
 src/backend/access/heap/heapam_handler.c | 759 ++++++++++++++++++++
 src/backend/access/nbtree/nbtsort.c      |  14 +-
 src/backend/access/spgist/spginsert.c    |   9 +-
 src/backend/catalog/index.c              | 861 +----------------------
 src/include/access/tableam.h             |  89 +++
 src/include/catalog/index.h              |  71 +-
 12 files changed, 950 insertions(+), 908 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index bb6442de82d..cd261983690 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -23,9 +23,9 @@
  */
 #include "postgres.h"
 
-#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/nbtree.h"
+#include "access/table.h"
 #include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
@@ -535,8 +535,8 @@ bt_check_every_level(Relation rel, Relation heaprel, bool readonly,
 			 RelationGetRelationName(state->rel),
 			 RelationGetRelationName(state->heaprel));
 
-		IndexBuildHeapScan(state->heaprel, state->rel, indexinfo, true,
-						   bt_tuple_present_callback, (void *) state, scan);
+		table_index_build_scan(state->heaprel, state->rel, indexinfo, true,
+							   bt_tuple_present_callback, (void *) state, scan);
 
 		ereport(DEBUG1,
 				(errmsg_internal("finished verifying presence of " INT64_FORMAT " tuples from table \"%s\" with bitset %.2f%% set",
diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index e43fbe0005f..3b704312665 100644
--- a/contrib/bloom/blinsert.c
+++ b/contrib/bloom/blinsert.c
@@ -14,6 +14,7 @@
 
 #include "access/genam.h"
 #include "access/generic_xlog.h"
+#include "access/tableam.h"
 #include "catalog/index.h"
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
@@ -69,7 +70,7 @@ initCachedPage(BloomBuildState *buildstate)
 }
 
 /*
- * Per-tuple callback from IndexBuildHeapScan.
+ * Per-tuple callback from table_index_build_scan.
  */
 static void
 bloomBuildCallback(Relation index, HeapTuple htup, Datum *values,
@@ -141,9 +142,9 @@ blbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	initCachedPage(&buildstate);
 
 	/* Do the heap scan */
-	reltuples = IndexBuildHeapScan(heap, index, indexInfo, true,
-								   bloomBuildCallback, (void *) &buildstate,
-								   NULL);
+	reltuples = table_index_build_scan(heap, index, indexInfo, true,
+									   bloomBuildCallback, (void *) &buildstate,
+									   NULL);
 
 	/* Flush last page if needed (it will be, unless heap was empty) */
 	if (buildstate.count > 0)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 8f008dd0080..6e96d24ca22 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -23,6 +23,7 @@
 #include "access/reloptions.h"
 #include "access/relscan.h"
 #include "access/table.h"
+#include "access/tableam.h"
 #include "access/xloginsert.h"
 #include "catalog/index.h"
 #include "catalog/pg_am.h"
@@ -587,7 +588,7 @@ brinendscan(IndexScanDesc scan)
 }
 
 /*
- * Per-heap-tuple callback for IndexBuildHeapScan.
+ * Per-heap-tuple callback for table_index_build_scan.
  *
  * Note we don't worry about the page range at the end of the table here; it is
  * present in the build state struct after we're called the last time, but not
@@ -718,8 +719,8 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	 * Now scan the relation.  No syncscan allowed here because we want the
 	 * heap blocks in physical order.
 	 */
-	reltuples = IndexBuildHeapScan(heap, index, indexInfo, false,
-								   brinbuildCallback, (void *) state, NULL);
+	reltuples = table_index_build_scan(heap, index, indexInfo, false,
+									   brinbuildCallback, (void *) state, NULL);
 
 	/* process the final batch */
 	form_and_insert_tuple(state);
@@ -1230,13 +1231,14 @@ summarize_range(IndexInfo *indexInfo, BrinBuildState *state, Relation heapRel,
 	 * short of brinbuildCallback creating the new index entry.
 	 *
 	 * Note that it is critical we use the "any visible" mode of
-	 * IndexBuildHeapRangeScan here: otherwise, we would miss tuples inserted
-	 * by transactions that are still in progress, among other corner cases.
+	 * table_index_build_range_scan here: otherwise, we would miss tuples
+	 * inserted by transactions that are still in progress, among other corner
+	 * cases.
 	 */
 	state->bs_currRangeStart = heapBlk;
-	IndexBuildHeapRangeScan(heapRel, state->bs_irel, indexInfo, false, true,
-							heapBlk, scanNumBlks,
-							brinbuildCallback, (void *) state, NULL);
+	table_index_build_range_scan(heapRel, state->bs_irel, indexInfo, false, true,
+								 heapBlk, scanNumBlks,
+								 brinbuildCallback, (void *) state, NULL);
 
 	/*
 	 * Now we update the values obtained by the scan with the placeholder
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 524ac5be8b5..b02f69b0dcb 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -17,6 +17,7 @@
 #include "access/gin_private.h"
 #include "access/ginxlog.h"
 #include "access/xloginsert.h"
+#include "access/tableam.h"
 #include "catalog/index.h"
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
@@ -394,8 +395,9 @@ ginbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	 * Do the heap scan.  We disallow sync scan here because dataPlaceToPage
 	 * prefers to receive tuples in TID order.
 	 */
-	reltuples = IndexBuildHeapScan(heap, index, indexInfo, false,
-								   ginBuildCallback, (void *) &buildstate, NULL);
+	reltuples = table_index_build_scan(heap, index, indexInfo, false,
+									   ginBuildCallback, (void *) &buildstate,
+									   NULL);
 
 	/* dump remaining entries to the index */
 	oldCtx = MemoryContextSwitchTo(buildstate.tmpCtx);
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index bd142a3560d..b76a587e945 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -19,6 +19,7 @@
 #include "access/genam.h"
 #include "access/gist_private.h"
 #include "access/gistxlog.h"
+#include "access/tableam.h"
 #include "access/xloginsert.h"
 #include "catalog/index.h"
 #include "miscadmin.h"
@@ -204,8 +205,9 @@ gistbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	/*
 	 * Do the heap scan.
 	 */
-	reltuples = IndexBuildHeapScan(heap, index, indexInfo, true,
-								   gistBuildCallback, (void *) &buildstate, NULL);
+	reltuples = table_index_build_scan(heap, index, indexInfo, true,
+									   gistBuildCallback,
+									   (void *) &buildstate, NULL);
 
 	/*
 	 * If buffering was used, flush out all the tuples that are still in the
@@ -454,7 +456,7 @@ calculatePagesPerBuffer(GISTBuildState *buildstate, int levelStep)
 }
 
 /*
- * Per-tuple callback from IndexBuildHeapScan.
+ * Per-tuple callback from table_index_build_scan.
  */
 static void
 gistBuildCallback(Relation index,
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index f1f01a0956d..38747ba3d7a 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -21,6 +21,7 @@
 #include "access/hash.h"
 #include "access/hash_xlog.h"
 #include "access/relscan.h"
+#include "access/tableam.h"
 #include "catalog/index.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
@@ -159,8 +160,9 @@ hashbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	buildstate.heapRel = heap;
 
 	/* do the heap scan */
-	reltuples = IndexBuildHeapScan(heap, index, indexInfo, true,
-								   hashbuildCallback, (void *) &buildstate, NULL);
+	reltuples = table_index_build_scan(heap, index, indexInfo, true,
+									   hashbuildCallback,
+									   (void *) &buildstate, NULL);
 
 	if (buildstate.spool)
 	{
@@ -190,7 +192,7 @@ hashbuildempty(Relation index)
 }
 
 /*
- * Per-tuple callback from IndexBuildHeapScan
+ * Per-tuple callback from table_index_build_scan
  */
 static void
 hashbuildCallback(Relation index,
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3098cb96b60..1bf08b47250 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -19,11 +19,19 @@
  */
 #include "postgres.h"
 
+#include "miscadmin.h"
+
+#include "access/genam.h"
 #include "access/heapam.h"
 #include "access/tableam.h"
 #include "access/xact.h"
+#include "catalog/catalog.h"
+#include "catalog/index.h"
+#include "executor/executor.h"
 #include "storage/bufmgr.h"
+#include "storage/bufpage.h"
 #include "storage/lmgr.h"
+#include "storage/procarray.h"
 #include "utils/builtins.h"
 
 
@@ -558,6 +566,754 @@ heapam_finish_bulk_insert(Relation relation, int options)
  * ------------------------------------------------------------------------
  */
 
+/*
+ * Scan a range of the heap relation, and invoke the given build callback
+ * for each tuple that is to be indexed.
+ *
+ * Scan to end-of-rel can be signalled by passing InvalidBlockNumber as
+ * numblocks.  Note that restricting the range to scan cannot be done when
+ * requesting syncscan.
+ *
+ * When "anyvisible" mode is requested, all tuples visible to any transaction
+ * are indexed and counted as live, including those inserted or deleted by
+ * transactions that are still in progress.
+ */
+static double
+heapam_index_build_range_scan(Relation heapRelation,
+							  Relation indexRelation,
+							  IndexInfo *indexInfo,
+							  bool allow_sync,
+							  bool anyvisible,
+							  BlockNumber start_blockno,
+							  BlockNumber numblocks,
+							  IndexBuildCallback callback,
+							  void *callback_state,
+							  TableScanDesc sscan)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+	bool		is_system_catalog;
+	bool		checking_uniqueness;
+	HeapTuple	heapTuple;
+	Datum		values[INDEX_MAX_KEYS];
+	bool		isnull[INDEX_MAX_KEYS];
+	double		reltuples;
+	ExprState  *predicate;
+	TupleTableSlot *slot;
+	EState	   *estate;
+	ExprContext *econtext;
+	Snapshot	snapshot;
+	bool		need_unregister_snapshot = false;
+	TransactionId OldestXmin;
+	BlockNumber root_blkno = InvalidBlockNumber;
+	OffsetNumber root_offsets[MaxHeapTuplesPerPage];
+
+	/*
+	 * sanity checks
+	 */
+	Assert(OidIsValid(indexRelation->rd_rel->relam));
+
+	/* Remember if it's a system catalog */
+	is_system_catalog = IsSystemRelation(heapRelation);
+
+	/* See whether we're verifying uniqueness/exclusion properties */
+	checking_uniqueness = (indexInfo->ii_Unique ||
+						   indexInfo->ii_ExclusionOps != NULL);
+
+	/*
+	 * "Any visible" mode is not compatible with uniqueness checks; make sure
+	 * only one of those is requested.
+	 */
+	Assert(!(anyvisible && checking_uniqueness));
+
+	/*
+	 * Need an EState for evaluation of index expressions and partial-index
+	 * predicates.  Also a slot to hold the current tuple.
+	 */
+	estate = CreateExecutorState();
+	econtext = GetPerTupleExprContext(estate);
+	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
+									&TTSOpsHeapTuple);
+
+	/* Arrange for econtext's scan tuple to be the tuple under test */
+	econtext->ecxt_scantuple = slot;
+
+	/* Set up execution state for predicate, if any. */
+	predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
+
+	/*
+	 * Prepare for scan of the base relation.  In a normal index build, we use
+	 * SnapshotAny because we must retrieve all tuples and do our own time
+	 * qual checks (because we have to index RECENTLY_DEAD tuples). In a
+	 * concurrent build, or during bootstrap, we take a regular MVCC snapshot
+	 * and index whatever's live according to that.
+	 */
+	OldestXmin = InvalidTransactionId;
+
+	/* okay to ignore lazy VACUUMs here */
+	if (!IsBootstrapProcessingMode() && !indexInfo->ii_Concurrent)
+		OldestXmin = GetOldestXmin(heapRelation, PROCARRAY_FLAGS_VACUUM);
+
+	if (!scan)
+	{
+		/*
+		 * Serial index build.
+		 *
+		 * Must begin our own heap scan in this case.  We may also need to
+		 * register a snapshot whose lifetime is under our direct control.
+		 */
+		if (!TransactionIdIsValid(OldestXmin))
+		{
+			snapshot = RegisterSnapshot(GetTransactionSnapshot());
+			need_unregister_snapshot = true;
+		}
+		else
+			snapshot = SnapshotAny;
+
+		sscan = table_beginscan_strat(heapRelation, /* relation */
+									  snapshot, /* snapshot */
+									  0,	/* number of keys */
+									  NULL, /* scan key */
+									  true, /* buffer access strategy OK */
+									  allow_sync);	/* syncscan OK? */
+		scan = (HeapScanDesc) sscan;
+	}
+	else
+	{
+		/*
+		 * Parallel index build.
+		 *
+		 * Parallel case never registers/unregisters own snapshot.  Snapshot
+		 * is taken from parallel heap scan, and is SnapshotAny or an MVCC
+		 * snapshot, based on same criteria as serial case.
+		 */
+		Assert(!IsBootstrapProcessingMode());
+		Assert(allow_sync);
+		snapshot = scan->rs_base.rs_snapshot;
+	}
+
+	/*
+	 * Must call GetOldestXmin() with SnapshotAny.  Should never call
+	 * GetOldestXmin() with MVCC snapshot. (It's especially worth checking
+	 * this for parallel builds, since ambuild routines that support parallel
+	 * builds must work these details out for themselves.)
+	 */
+	Assert(snapshot == SnapshotAny || IsMVCCSnapshot(snapshot));
+	Assert(snapshot == SnapshotAny ? TransactionIdIsValid(OldestXmin) :
+		   !TransactionIdIsValid(OldestXmin));
+	Assert(snapshot == SnapshotAny || !anyvisible);
+
+	/* set our scan endpoints */
+	if (!allow_sync)
+		heap_setscanlimits(sscan, start_blockno, numblocks);
+	else
+	{
+		/* syncscan can only be requested on whole relation */
+		Assert(start_blockno == 0);
+		Assert(numblocks == InvalidBlockNumber);
+	}
+
+	reltuples = 0;
+
+	/*
+	 * Scan all tuples in the base relation.
+	 */
+	while ((heapTuple = heap_getnext(sscan, ForwardScanDirection)) != NULL)
+	{
+		bool		tupleIsAlive;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * When dealing with a HOT-chain of updated tuples, we want to index
+		 * the values of the live tuple (if any), but index it under the TID
+		 * of the chain's root tuple.  This approach is necessary to preserve
+		 * the HOT-chain structure in the heap. So we need to be able to find
+		 * the root item offset for every tuple that's in a HOT-chain.  When
+		 * first reaching a new page of the relation, call
+		 * heap_get_root_tuples() to build a map of root item offsets on the
+		 * page.
+		 *
+		 * It might look unsafe to use this information across buffer
+		 * lock/unlock.  However, we hold ShareLock on the table so no
+		 * ordinary insert/update/delete should occur; and we hold pin on the
+		 * buffer continuously while visiting the page, so no pruning
+		 * operation can occur either.
+		 *
+		 * Also, although our opinions about tuple liveness could change while
+		 * we scan the page (due to concurrent transaction commits/aborts),
+		 * the chain root locations won't, so this info doesn't need to be
+		 * rebuilt after waiting for another transaction.
+		 *
+		 * Note the implied assumption that there is no more than one live
+		 * tuple per HOT-chain --- else we could create more than one index
+		 * entry pointing to the same root tuple.
+		 */
+		if (scan->rs_cblock != root_blkno)
+		{
+			Page		page = BufferGetPage(scan->rs_cbuf);
+
+			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+			heap_get_root_tuples(page, root_offsets);
+			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+
+			root_blkno = scan->rs_cblock;
+		}
+
+		if (snapshot == SnapshotAny)
+		{
+			/* do our own time qual check */
+			bool		indexIt;
+			TransactionId xwait;
+
+	recheck:
+
+			/*
+			 * We could possibly get away with not locking the buffer here,
+			 * since caller should hold ShareLock on the relation, but let's
+			 * be conservative about it.  (This remark is still correct even
+			 * with HOT-pruning: our pin on the buffer prevents pruning.)
+			 */
+			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+
+			/*
+			 * The criteria for counting a tuple as live in this block need to
+			 * match what analyze.c's acquire_sample_rows() does, otherwise
+			 * CREATE INDEX and ANALYZE may produce wildly different reltuples
+			 * values, e.g. when there are many recently-dead tuples.
+			 */
+			switch (HeapTupleSatisfiesVacuum(heapTuple, OldestXmin, scan->rs_cbuf))
+			{
+				case HEAPTUPLE_DEAD:
+					/* Definitely dead, we can ignore it */
+					indexIt = false;
+					tupleIsAlive = false;
+					break;
+				case HEAPTUPLE_LIVE:
+					/* Normal case, index and unique-check it */
+					indexIt = true;
+					tupleIsAlive = true;
+					/* Count it as live, too */
+					reltuples += 1;
+					break;
+				case HEAPTUPLE_RECENTLY_DEAD:
+
+					/*
+					 * If tuple is recently deleted then we must index it
+					 * anyway to preserve MVCC semantics.  (Pre-existing
+					 * transactions could try to use the index after we finish
+					 * building it, and may need to see such tuples.)
+					 *
+					 * However, if it was HOT-updated then we must only index
+					 * the live tuple at the end of the HOT-chain.  Since this
+					 * breaks semantics for pre-existing snapshots, mark the
+					 * index as unusable for them.
+					 *
+					 * We don't count recently-dead tuples in reltuples, even
+					 * if we index them; see acquire_sample_rows().
+					 */
+					if (HeapTupleIsHotUpdated(heapTuple))
+					{
+						indexIt = false;
+						/* mark the index as unsafe for old snapshots */
+						indexInfo->ii_BrokenHotChain = true;
+					}
+					else
+						indexIt = true;
+					/* In any case, exclude the tuple from unique-checking */
+					tupleIsAlive = false;
+					break;
+				case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+					/*
+					 * In "anyvisible" mode, this tuple is visible and we
+					 * don't need any further checks.
+					 */
+					if (anyvisible)
+					{
+						indexIt = true;
+						tupleIsAlive = true;
+						reltuples += 1;
+						break;
+					}
+
+					/*
+					 * Since caller should hold ShareLock or better, normally
+					 * the only way to see this is if it was inserted earlier
+					 * in our own transaction.  However, it can happen in
+					 * system catalogs, since we tend to release write lock
+					 * before commit there.  Give a warning if neither case
+					 * applies.
+					 */
+					xwait = HeapTupleHeaderGetXmin(heapTuple->t_data);
+					if (!TransactionIdIsCurrentTransactionId(xwait))
+					{
+						if (!is_system_catalog)
+							elog(WARNING, "concurrent insert in progress within table \"%s\"",
+								 RelationGetRelationName(heapRelation));
+
+						/*
+						 * If we are performing uniqueness checks, indexing
+						 * such a tuple could lead to a bogus uniqueness
+						 * failure.  In that case we wait for the inserting
+						 * transaction to finish and check again.
+						 */
+						if (checking_uniqueness)
+						{
+							/*
+							 * Must drop the lock on the buffer before we wait
+							 */
+							LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+							XactLockTableWait(xwait, heapRelation,
+											  &heapTuple->t_self,
+											  XLTW_InsertIndexUnique);
+							CHECK_FOR_INTERRUPTS();
+							goto recheck;
+						}
+					}
+					else
+					{
+						/*
+						 * For consistency with acquire_sample_rows(), count
+						 * HEAPTUPLE_INSERT_IN_PROGRESS tuples as live only
+						 * when inserted by our own transaction.
+						 */
+						reltuples += 1;
+					}
+
+					/*
+					 * We must index such tuples, since if the index build
+					 * commits then they're good.
+					 */
+					indexIt = true;
+					tupleIsAlive = true;
+					break;
+				case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+					/*
+					 * As with INSERT_IN_PROGRESS case, this is unexpected
+					 * unless it's our own deletion or a system catalog; but
+					 * in anyvisible mode, this tuple is visible.
+					 */
+					if (anyvisible)
+					{
+						indexIt = true;
+						tupleIsAlive = false;
+						reltuples += 1;
+						break;
+					}
+
+					xwait = HeapTupleHeaderGetUpdateXid(heapTuple->t_data);
+					if (!TransactionIdIsCurrentTransactionId(xwait))
+					{
+						if (!is_system_catalog)
+							elog(WARNING, "concurrent delete in progress within table \"%s\"",
+								 RelationGetRelationName(heapRelation));
+
+						/*
+						 * If we are performing uniqueness checks, assuming
+						 * the tuple is dead could lead to missing a
+						 * uniqueness violation.  In that case we wait for the
+						 * deleting transaction to finish and check again.
+						 *
+						 * Also, if it's a HOT-updated tuple, we should not
+						 * index it but rather the live tuple at the end of
+						 * the HOT-chain.  However, the deleting transaction
+						 * could abort, possibly leaving this tuple as live
+						 * after all, in which case it has to be indexed. The
+						 * only way to know what to do is to wait for the
+						 * deleting transaction to finish and check again.
+						 */
+						if (checking_uniqueness ||
+							HeapTupleIsHotUpdated(heapTuple))
+						{
+							/*
+							 * Must drop the lock on the buffer before we wait
+							 */
+							LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+							XactLockTableWait(xwait, heapRelation,
+											  &heapTuple->t_self,
+											  XLTW_InsertIndexUnique);
+							CHECK_FOR_INTERRUPTS();
+							goto recheck;
+						}
+
+						/*
+						 * Otherwise index it but don't check for uniqueness,
+						 * the same as a RECENTLY_DEAD tuple.
+						 */
+						indexIt = true;
+
+						/*
+						 * Count HEAPTUPLE_DELETE_IN_PROGRESS tuples as live,
+						 * if they were not deleted by the current
+						 * transaction.  That's what acquire_sample_rows()
+						 * does, and we want the behavior to be consistent.
+						 */
+						reltuples += 1;
+					}
+					else if (HeapTupleIsHotUpdated(heapTuple))
+					{
+						/*
+						 * It's a HOT-updated tuple deleted by our own xact.
+						 * We can assume the deletion will commit (else the
+						 * index contents don't matter), so treat the same as
+						 * RECENTLY_DEAD HOT-updated tuples.
+						 */
+						indexIt = false;
+						/* mark the index as unsafe for old snapshots */
+						indexInfo->ii_BrokenHotChain = true;
+					}
+					else
+					{
+						/*
+						 * It's a regular tuple deleted by our own xact. Index
+						 * it, but don't check for uniqueness nor count in
+						 * reltuples, the same as a RECENTLY_DEAD tuple.
+						 */
+						indexIt = true;
+					}
+					/* In any case, exclude the tuple from unique-checking */
+					tupleIsAlive = false;
+					break;
+				default:
+					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+					indexIt = tupleIsAlive = false; /* keep compiler quiet */
+					break;
+			}
+
+			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+
+			if (!indexIt)
+				continue;
+		}
+		else
+		{
+			/* heap_getnext did the time qual check */
+			tupleIsAlive = true;
+			reltuples += 1;
+		}
+
+		MemoryContextReset(econtext->ecxt_per_tuple_memory);
+
+		/* Set up for predicate or expression evaluation */
+		ExecStoreHeapTuple(heapTuple, slot, false);
+
+		/*
+		 * In a partial index, discard tuples that don't satisfy the
+		 * predicate.
+		 */
+		if (predicate != NULL)
+		{
+			if (!ExecQual(predicate, econtext))
+				continue;
+		}
+
+		/*
+		 * For the current heap tuple, extract all the attributes we use in
+		 * this index, and note which are null.  This also performs evaluation
+		 * of any expressions needed.
+		 */
+		FormIndexDatum(indexInfo,
+					   slot,
+					   estate,
+					   values,
+					   isnull);
+
+		/*
+		 * You'd think we should go ahead and build the index tuple here, but
+		 * some index AMs want to do further processing on the data first.  So
+		 * pass the values[] and isnull[] arrays, instead.
+		 */
+
+		if (HeapTupleIsHeapOnly(heapTuple))
+		{
+			/*
+			 * For a heap-only tuple, pretend its TID is that of the root. See
+			 * src/backend/access/heap/README.HOT for discussion.
+			 */
+			HeapTupleData rootTuple;
+			OffsetNumber offnum;
+
+			rootTuple = *heapTuple;
+			offnum = ItemPointerGetOffsetNumber(&heapTuple->t_self);
+
+			if (!OffsetNumberIsValid(root_offsets[offnum - 1]))
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
+										 ItemPointerGetBlockNumber(&heapTuple->t_self),
+										 offnum,
+										 RelationGetRelationName(heapRelation))));
+
+			ItemPointerSetOffsetNumber(&rootTuple.t_self,
+									   root_offsets[offnum - 1]);
+
+			/* Call the AM's callback routine to process the tuple */
+			callback(indexRelation, &rootTuple, values, isnull, tupleIsAlive,
+					 callback_state);
+		}
+		else
+		{
+			/* Call the AM's callback routine to process the tuple */
+			callback(indexRelation, heapTuple, values, isnull, tupleIsAlive,
+					 callback_state);
+		}
+	}
+
+	table_endscan(sscan);
+
+	/* we can now forget our snapshot, if set and registered by us */
+	if (need_unregister_snapshot)
+		UnregisterSnapshot(snapshot);
+
+	ExecDropSingleTupleTableSlot(slot);
+
+	FreeExecutorState(estate);
+
+	/* These may have been pointing to the now-gone estate */
+	indexInfo->ii_ExpressionsState = NIL;
+	indexInfo->ii_PredicateState = NULL;
+
+	return reltuples;
+}
+
+/*
+ * Second table scan for concurrent index build
+ *
+ * This has much code in common with heapam_index_build_range_scan, but it's
+ * different enough that it seems cleaner to have two routines, not one.
+ */
+static void
+heapam_index_validate_scan(Relation heapRelation,
+						   Relation indexRelation,
+						   IndexInfo *indexInfo,
+						   Snapshot snapshot,
+						   ValidateIndexState *state)
+{
+	TableScanDesc sscan;
+	HeapScanDesc scan;
+	HeapTuple	heapTuple;
+	Datum		values[INDEX_MAX_KEYS];
+	bool		isnull[INDEX_MAX_KEYS];
+	ExprState  *predicate;
+	TupleTableSlot *slot;
+	EState	   *estate;
+	ExprContext *econtext;
+	BlockNumber root_blkno = InvalidBlockNumber;
+	OffsetNumber root_offsets[MaxHeapTuplesPerPage];
+	bool		in_index[MaxHeapTuplesPerPage];
+
+	/* state variables for the merge */
+	ItemPointer indexcursor = NULL;
+	ItemPointerData decoded;
+	bool		tuplesort_empty = false;
+
+	/*
+	 * sanity checks
+	 */
+	Assert(OidIsValid(indexRelation->rd_rel->relam));
+
+	/*
+	 * Need an EState for evaluation of index expressions and partial-index
+	 * predicates.  Also a slot to hold the current tuple.
+	 */
+	estate = CreateExecutorState();
+	econtext = GetPerTupleExprContext(estate);
+	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
+									&TTSOpsHeapTuple);
+
+	/* Arrange for econtext's scan tuple to be the tuple under test */
+	econtext->ecxt_scantuple = slot;
+
+	/* Set up execution state for predicate, if any. */
+	predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
+
+	/*
+	 * Prepare for scan of the base relation.  We need just those tuples
+	 * satisfying the passed-in reference snapshot.  We must disable syncscan
+	 * here, because it's critical that we read from block zero forward to
+	 * match the sorted TIDs.
+	 */
+	sscan = table_beginscan_strat(heapRelation, /* relation */
+								  snapshot, /* snapshot */
+								  0,	/* number of keys */
+								  NULL, /* scan key */
+								  true, /* buffer access strategy OK */
+								  false);	/* syncscan not OK */
+	scan = (HeapScanDesc) sscan;
+
+	/*
+	 * Scan all tuples matching the snapshot.
+	 */
+	while ((heapTuple = heap_getnext(sscan, ForwardScanDirection)) != NULL)
+	{
+		ItemPointer heapcursor = &heapTuple->t_self;
+		ItemPointerData rootTuple;
+		OffsetNumber root_offnum;
+
+		CHECK_FOR_INTERRUPTS();
+
+		state->htups += 1;
+
+		/*
+		 * As commented in heapam_index_build_range_scan, we should index
+		 * heap-only tuples under the TIDs of their root tuples; so when we
+		 * advance onto a new heap page, build a map of its root item offsets.
+		 *
+		 * This complicates merging against the tuplesort output: we will
+		 * visit the live tuples in order by their offsets, but the root
+		 * offsets that we need to compare against the index contents might be
+		 * ordered differently.  So we might have to "look back" within the
+		 * tuplesort output, but only within the current page.  We handle that
+		 * by keeping a bool array in_index[] showing all the
+		 * already-passed-over tuplesort output TIDs of the current page. We
+		 * clear that array here, when advancing onto a new heap page.
+		 */
+		if (scan->rs_cblock != root_blkno)
+		{
+			Page		page = BufferGetPage(scan->rs_cbuf);
+
+			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+			heap_get_root_tuples(page, root_offsets);
+			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+
+			memset(in_index, 0, sizeof(in_index));
+
+			root_blkno = scan->rs_cblock;
+		}
+
+		/* Convert actual tuple TID to root TID */
+		rootTuple = *heapcursor;
+		root_offnum = ItemPointerGetOffsetNumber(heapcursor);
+
+		if (HeapTupleIsHeapOnly(heapTuple))
+		{
+			root_offnum = root_offsets[root_offnum - 1];
+			if (!OffsetNumberIsValid(root_offnum))
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
+										 ItemPointerGetBlockNumber(heapcursor),
+										 ItemPointerGetOffsetNumber(heapcursor),
+										 RelationGetRelationName(heapRelation))));
+			ItemPointerSetOffsetNumber(&rootTuple, root_offnum);
+		}
+
+		/*
+		 * "merge" by skipping through the index tuples until we find or pass
+		 * the current root tuple.
+		 */
+		while (!tuplesort_empty &&
+			   (!indexcursor ||
+				ItemPointerCompare(indexcursor, &rootTuple) < 0))
+		{
+			Datum		ts_val;
+			bool		ts_isnull;
+
+			if (indexcursor)
+			{
+				/*
+				 * Remember index items seen earlier on the current heap page
+				 */
+				if (ItemPointerGetBlockNumber(indexcursor) == root_blkno)
+					in_index[ItemPointerGetOffsetNumber(indexcursor) - 1] = true;
+			}
+
+			tuplesort_empty = !tuplesort_getdatum(state->tuplesort, true,
+												  &ts_val, &ts_isnull, NULL);
+			Assert(tuplesort_empty || !ts_isnull);
+			if (!tuplesort_empty)
+			{
+				itemptr_decode(&decoded, DatumGetInt64(ts_val));
+				indexcursor = &decoded;
+
+				/* If int8 is pass-by-ref, free (encoded) TID Datum memory */
+#ifndef USE_FLOAT8_BYVAL
+				pfree(DatumGetPointer(ts_val));
+#endif
+			}
+			else
+			{
+				/* Be tidy */
+				indexcursor = NULL;
+			}
+		}
+
+		/*
+		 * If the tuplesort has overshot *and* we didn't see a match earlier,
+		 * then this tuple is missing from the index, so insert it.
+		 */
+		if ((tuplesort_empty ||
+			 ItemPointerCompare(indexcursor, &rootTuple) > 0) &&
+			!in_index[root_offnum - 1])
+		{
+			MemoryContextReset(econtext->ecxt_per_tuple_memory);
+
+			/* Set up for predicate or expression evaluation */
+			ExecStoreHeapTuple(heapTuple, slot, false);
+
+			/*
+			 * In a partial index, discard tuples that don't satisfy the
+			 * predicate.
+			 */
+			if (predicate != NULL)
+			{
+				if (!ExecQual(predicate, econtext))
+					continue;
+			}
+
+			/*
+			 * For the current heap tuple, extract all the attributes we use
+			 * in this index, and note which are null.  This also performs
+			 * evaluation of any expressions needed.
+			 */
+			FormIndexDatum(indexInfo,
+						   slot,
+						   estate,
+						   values,
+						   isnull);
+
+			/*
+			 * You'd think we should go ahead and build the index tuple here,
+			 * but some index AMs want to do further processing on the data
+			 * first. So pass the values[] and isnull[] arrays, instead.
+			 */
+
+			/*
+			 * If the tuple is already committed dead, you might think we
+			 * could suppress uniqueness checking, but this is no longer true
+			 * in the presence of HOT, because the insert is actually a proxy
+			 * for a uniqueness check on the whole HOT-chain.  That is, the
+			 * tuple we have here could be dead because it was already
+			 * HOT-updated, and if so the updating transaction will not have
+			 * thought it should insert index entries.  The index AM will
+			 * check the whole HOT-chain and correctly detect a conflict if
+			 * there is one.
+			 */
+
+			index_insert(indexRelation,
+						 values,
+						 isnull,
+						 &rootTuple,
+						 heapRelation,
+						 indexInfo->ii_Unique ?
+						 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
+						 indexInfo);
+
+			state->tups_inserted += 1;
+		}
+	}
+
+	table_endscan(sscan);
+
+	ExecDropSingleTupleTableSlot(slot);
+
+	FreeExecutorState(estate);
+
+	/* These may have been pointing to the now-gone estate */
+	indexInfo->ii_ExpressionsState = NIL;
+	indexInfo->ii_PredicateState = NULL;
+}
 static const TableAmRoutine heapam_methods = {
 	.type = T_TableAmRoutine,
 
@@ -589,6 +1345,9 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_fetch_row_version = heapam_fetch_row_version,
 	.tuple_get_latest_tid = heap_get_latest_tid,
 	.tuple_satisfies_snapshot = heapam_tuple_satisfies_snapshot,
+
+	.index_build_range_scan = heapam_index_build_range_scan,
+	.index_validate_scan = heapam_index_validate_scan,
 };
 
 
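(For context: table_index_build_scan itself isn't AM code - in my tree it's
just a thin tableam.h wrapper that dispatches to the new
index_build_range_scan callback with whole-relation defaults, mirroring how
the old IndexBuildHeapScan wrapped IndexBuildHeapRangeScan.  Sketch, modulo
the exact name of the routine-table field in RelationData:)

static inline double
table_index_build_scan(Relation heap_rel, Relation index_rel,
					   IndexInfo *index_info, bool allow_sync,
					   IndexBuildCallback callback, void *callback_state,
					   TableScanDesc scan)
{
	/* scan the whole relation, without "anyvisible" semantics */
	return heap_rel->rd_tableam->index_build_range_scan(heap_rel, index_rel,
														index_info, allow_sync,
														false,
														0, InvalidBlockNumber,
														callback,
														callback_state,
														scan);
}
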
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 05a9b03aed5..dcdd142367b 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -471,9 +471,9 @@ _bt_spools_heapscan(Relation heap, Relation index, BTBuildState *buildstate,
 
 	/* Fill spool using either serial or parallel heap scan */
 	if (!buildstate->btleader)
-		reltuples = IndexBuildHeapScan(heap, index, indexInfo, true,
-									   _bt_build_callback, (void *) buildstate,
-									   NULL);
+		reltuples = table_index_build_scan(heap, index, indexInfo, true,
+										   _bt_build_callback, (void *) buildstate,
+										   NULL);
 	else
 		reltuples = _bt_parallel_heapscan(buildstate,
 										  &indexInfo->ii_BrokenHotChain);
@@ -548,7 +548,7 @@ _bt_leafbuild(BTSpool *btspool, BTSpool *btspool2)
 }
 
 /*
- * Per-tuple callback from IndexBuildHeapScan
+ * Per-tuple callback from table_index_build_scan
  */
 static void
 _bt_build_callback(Relation index,
@@ -1673,9 +1673,9 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap, &btshared->heapdesc);
-	reltuples = IndexBuildHeapScan(btspool->heap, btspool->index, indexInfo,
-								   true, _bt_build_callback,
-								   (void *) &buildstate, scan);
+	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
+									   true, _bt_build_callback,
+									   (void *) &buildstate, scan);
 
 	/*
 	 * Execute this worker's part of the sort.
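
(The parallel path above is the one case where the caller, not the table
AM, establishes the scan: the worker attaches to the leader's shared scan
state and hands the resulting TableScanDesc in, which makes
heapam_index_build_range_scan skip its own snapshot/scan setup - see the
"if (!scan)" branch.  A sketch of a worker, again with made-up names and
assuming the shared descriptor type keeps its current shape:)

static double
my_parallel_scan_and_build(Relation heap, Relation index,
						   ParallelTableScanDesc pscan,
						   MyBuildState *buildstate)
{
	IndexInfo  *indexInfo = BuildIndexInfo(index);
	TableScanDesc scan;

	/* attach to the scan the leader initialized in shared memory */
	scan = table_beginscan_parallel(heap, pscan);

	/* allow_sync must be true when passing in a parallel scan */
	return table_index_build_scan(heap, index, indexInfo, true,
								  my_build_callback, (void *) buildstate,
								  scan);
}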
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index f428a151385..390ad9ac51f 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -19,6 +19,7 @@
 #include "access/genam.h"
 #include "access/spgist_private.h"
 #include "access/spgxlog.h"
+#include "access/tableam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "catalog/index.h"
@@ -37,7 +38,7 @@ typedef struct
 } SpGistBuildState;
 
 
-/* Callback to process one heap tuple during IndexBuildHeapScan */
+/* Callback to process one heap tuple during table_index_build_scan */
 static void
 spgistBuildCallback(Relation index, HeapTuple htup, Datum *values,
 					bool *isnull, bool tupleIsAlive, void *state)
@@ -142,9 +143,9 @@ spgbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 											  "SP-GiST build temporary context",
 											  ALLOCSET_DEFAULT_SIZES);
 
-	reltuples = IndexBuildHeapScan(heap, index, indexInfo, true,
-								   spgistBuildCallback, (void *) &buildstate,
-								   NULL);
+	reltuples = table_index_build_scan(heap, index, indexInfo, true,
+									   spgistBuildCallback, (void *) &buildstate,
+									   NULL);
 
 	MemoryContextDelete(buildstate.tmpCtx);
 
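(catalog/index.c below stops having its own copies of itemptr_encode() /
itemptr_decode(); the validate scan in heapam_handler.c needs them too, so
they move to a shared header.  Since the encoding is easy to get wrong, a
tiny illustration - assuming the inlines keep their current signatures,
wherever they end up being exported:)

#include "postgres.h"
/* plus the header itemptr_encode()/itemptr_decode() move into */

static void
itemptr_encoding_example(void)
{
	ItemPointerData a;
	ItemPointerData b;
	ItemPointerData decoded;

	ItemPointerSet(&a, 41, 7);	/* (block 41, offset 7) */
	ItemPointerSet(&b, 42, 1);	/* later block, smaller offset */

	/* the block number occupies the bits above the 16-bit offset */
	Assert(itemptr_encode(&a) == (((int64) 41 << 16) | 7));	/* 2686983 */

	/* so int8 comparisons sort the same way TIDs do */
	Assert(itemptr_encode(&a) < itemptr_encode(&b));

	/* and decoding is the exact inverse */
	itemptr_decode(&decoded, itemptr_encode(&a));
	Assert(ItemPointerEquals(&a, &decoded));
}
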
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ff1a18c4d4e..f9ae483ab97 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -80,16 +80,6 @@
 /* Potentially set by pg_upgrade_support functions */
 Oid			binary_upgrade_next_index_pg_class_oid = InvalidOid;
 
-/* state info for validate_index bulkdelete callback */
-typedef struct
-{
-	Tuplesortstate *tuplesort;	/* for sorting the index TIDs */
-	/* statistics (for debug purposes only): */
-	double		htups,
-				itups,
-				tups_inserted;
-} v_i_state;
-
 /*
  * Pointer-free representation of variables used when reindexing system
  * catalogs; we use this to propagate those values to parallel workers.
@@ -130,14 +120,7 @@ static void index_update_stats(Relation rel,
 static void IndexCheckExclusion(Relation heapRelation,
 					Relation indexRelation,
 					IndexInfo *indexInfo);
-static inline int64 itemptr_encode(ItemPointer itemptr);
-static inline void itemptr_decode(ItemPointer itemptr, int64 encoded);
 static bool validate_index_callback(ItemPointer itemptr, void *opaque);
-static void validate_index_heapscan(Relation heapRelation,
-						Relation indexRelation,
-						IndexInfo *indexInfo,
-						Snapshot snapshot,
-						v_i_state *state);
 static bool ReindexIsCurrentlyProcessingIndex(Oid indexOid);
 static void SetReindexProcessing(Oid heapOid, Oid indexOid);
 static void ResetReindexProcessing(void);
@@ -2401,557 +2384,6 @@ index_build(Relation heapRelation,
 	SetUserIdAndSecContext(save_userid, save_sec_context);
 }
 
-
-/*
- * IndexBuildHeapScan - scan the heap relation to find tuples to be indexed
- *
- * This is called back from an access-method-specific index build procedure
- * after the AM has done whatever setup it needs.  The parent heap relation
- * is scanned to find tuples that should be entered into the index.  Each
- * such tuple is passed to the AM's callback routine, which does the right
- * things to add it to the new index.  After we return, the AM's index
- * build procedure does whatever cleanup it needs.
- *
- * The total count of live heap tuples is returned.  This is for updating
- * pg_class statistics.  (It's annoying not to be able to do that here, but we
- * want to merge that update with others; see index_update_stats.)  Note that
- * the index AM itself must keep track of the number of index tuples; we don't
- * do so here because the AM might reject some of the tuples for its own
- * reasons, such as being unable to store NULLs.
- *
- * A side effect is to set indexInfo->ii_BrokenHotChain to true if we detect
- * any potentially broken HOT chains.  Currently, we set this if there are
- * any RECENTLY_DEAD or DELETE_IN_PROGRESS entries in a HOT chain, without
- * trying very hard to detect whether they're really incompatible with the
- * chain tip.
- */
-double
-IndexBuildHeapScan(Relation heapRelation,
-				   Relation indexRelation,
-				   IndexInfo *indexInfo,
-				   bool allow_sync,
-				   IndexBuildCallback callback,
-				   void *callback_state,
-				   TableScanDesc scan)
-{
-	return IndexBuildHeapRangeScan(heapRelation, indexRelation,
-								   indexInfo, allow_sync,
-								   false,
-								   0, InvalidBlockNumber,
-								   callback, callback_state, scan);
-}
-
-/*
- * As above, except that instead of scanning the complete heap, only the given
- * number of blocks are scanned.  Scan to end-of-rel can be signalled by
- * passing InvalidBlockNumber as numblocks.  Note that restricting the range
- * to scan cannot be done when requesting syncscan.
- *
- * When "anyvisible" mode is requested, all tuples visible to any transaction
- * are indexed and counted as live, including those inserted or deleted by
- * transactions that are still in progress.
- */
-double
-IndexBuildHeapRangeScan(Relation heapRelation,
-						Relation indexRelation,
-						IndexInfo *indexInfo,
-						bool allow_sync,
-						bool anyvisible,
-						BlockNumber start_blockno,
-						BlockNumber numblocks,
-						IndexBuildCallback callback,
-						void *callback_state,
-						TableScanDesc scan)
-{
-	HeapScanDesc hscan;
-	bool		is_system_catalog;
-	bool		checking_uniqueness;
-	HeapTuple	heapTuple;
-	Datum		values[INDEX_MAX_KEYS];
-	bool		isnull[INDEX_MAX_KEYS];
-	double		reltuples;
-	ExprState  *predicate;
-	TupleTableSlot *slot;
-	EState	   *estate;
-	ExprContext *econtext;
-	Snapshot	snapshot;
-	bool		need_unregister_snapshot = false;
-	TransactionId OldestXmin;
-	BlockNumber root_blkno = InvalidBlockNumber;
-	OffsetNumber root_offsets[MaxHeapTuplesPerPage];
-
-	/*
-	 * sanity checks
-	 */
-	Assert(OidIsValid(indexRelation->rd_rel->relam));
-
-	/* Remember if it's a system catalog */
-	is_system_catalog = IsSystemRelation(heapRelation);
-
-	/* See whether we're verifying uniqueness/exclusion properties */
-	checking_uniqueness = (indexInfo->ii_Unique ||
-						   indexInfo->ii_ExclusionOps != NULL);
-
-	/*
-	 * "Any visible" mode is not compatible with uniqueness checks; make sure
-	 * only one of those is requested.
-	 */
-	Assert(!(anyvisible && checking_uniqueness));
-
-	/*
-	 * Need an EState for evaluation of index expressions and partial-index
-	 * predicates.  Also a slot to hold the current tuple.
-	 */
-	estate = CreateExecutorState();
-	econtext = GetPerTupleExprContext(estate);
-	slot = table_slot_create(heapRelation, NULL);
-
-	/* Arrange for econtext's scan tuple to be the tuple under test */
-	econtext->ecxt_scantuple = slot;
-
-	/* Set up execution state for predicate, if any. */
-	predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
-
-	/*
-	 * Prepare for scan of the base relation.  In a normal index build, we use
-	 * SnapshotAny because we must retrieve all tuples and do our own time
-	 * qual checks (because we have to index RECENTLY_DEAD tuples). In a
-	 * concurrent build, or during bootstrap, we take a regular MVCC snapshot
-	 * and index whatever's live according to that.
-	 */
-	OldestXmin = InvalidTransactionId;
-
-	/* okay to ignore lazy VACUUMs here */
-	if (!IsBootstrapProcessingMode() && !indexInfo->ii_Concurrent)
-		OldestXmin = GetOldestXmin(heapRelation, PROCARRAY_FLAGS_VACUUM);
-
-	if (!scan)
-	{
-		/*
-		 * Serial index build.
-		 *
-		 * Must begin our own heap scan in this case.  We may also need to
-		 * register a snapshot whose lifetime is under our direct control.
-		 */
-		if (!TransactionIdIsValid(OldestXmin))
-		{
-			snapshot = RegisterSnapshot(GetTransactionSnapshot());
-			need_unregister_snapshot = true;
-		}
-		else
-			snapshot = SnapshotAny;
-
-		scan = table_beginscan_strat(heapRelation,	/* relation */
-									 snapshot,	/* snapshot */
-									 0,	/* number of keys */
-									 NULL,	/* scan key */
-									 true,	/* buffer access strategy OK */
-									 allow_sync);	/* syncscan OK? */
-	}
-	else
-	{
-		/*
-		 * Parallel index build.
-		 *
-		 * Parallel case never registers/unregisters own snapshot.  Snapshot
-		 * is taken from parallel heap scan, and is SnapshotAny or an MVCC
-		 * snapshot, based on same criteria as serial case.
-		 */
-		Assert(!IsBootstrapProcessingMode());
-		Assert(allow_sync);
-		snapshot = scan->rs_snapshot;
-	}
-
-	hscan = (HeapScanDesc) scan;
-
-	/*
-	 * Must call GetOldestXmin() with SnapshotAny.  Should never call
-	 * GetOldestXmin() with MVCC snapshot. (It's especially worth checking
-	 * this for parallel builds, since ambuild routines that support parallel
-	 * builds must work these details out for themselves.)
-	 */
-	Assert(snapshot == SnapshotAny || IsMVCCSnapshot(snapshot));
-	Assert(snapshot == SnapshotAny ? TransactionIdIsValid(OldestXmin) :
-		   !TransactionIdIsValid(OldestXmin));
-	Assert(snapshot == SnapshotAny || !anyvisible);
-
-	/* set our scan endpoints */
-	if (!allow_sync)
-		heap_setscanlimits(scan, start_blockno, numblocks);
-	else
-	{
-		/* syncscan can only be requested on whole relation */
-		Assert(start_blockno == 0);
-		Assert(numblocks == InvalidBlockNumber);
-	}
-
-	reltuples = 0;
-
-	/*
-	 * Scan all tuples in the base relation.
-	 */
-	while ((heapTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
-	{
-		bool		tupleIsAlive;
-
-		CHECK_FOR_INTERRUPTS();
-
-		/*
-		 * When dealing with a HOT-chain of updated tuples, we want to index
-		 * the values of the live tuple (if any), but index it under the TID
-		 * of the chain's root tuple.  This approach is necessary to preserve
-		 * the HOT-chain structure in the heap. So we need to be able to find
-		 * the root item offset for every tuple that's in a HOT-chain.  When
-		 * first reaching a new page of the relation, call
-		 * heap_get_root_tuples() to build a map of root item offsets on the
-		 * page.
-		 *
-		 * It might look unsafe to use this information across buffer
-		 * lock/unlock.  However, we hold ShareLock on the table so no
-		 * ordinary insert/update/delete should occur; and we hold pin on the
-		 * buffer continuously while visiting the page, so no pruning
-		 * operation can occur either.
-		 *
-		 * Also, although our opinions about tuple liveness could change while
-		 * we scan the page (due to concurrent transaction commits/aborts),
-		 * the chain root locations won't, so this info doesn't need to be
-		 * rebuilt after waiting for another transaction.
-		 *
-		 * Note the implied assumption that there is no more than one live
-		 * tuple per HOT-chain --- else we could create more than one index
-		 * entry pointing to the same root tuple.
-		 */
-		if (hscan->rs_cblock != root_blkno)
-		{
-			Page		page = BufferGetPage(hscan->rs_cbuf);
-
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
-			heap_get_root_tuples(page, root_offsets);
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-			root_blkno = hscan->rs_cblock;
-		}
-
-		if (snapshot == SnapshotAny)
-		{
-			/* do our own time qual check */
-			bool		indexIt;
-			TransactionId xwait;
-
-	recheck:
-
-			/*
-			 * We could possibly get away with not locking the buffer here,
-			 * since caller should hold ShareLock on the relation, but let's
-			 * be conservative about it.  (This remark is still correct even
-			 * with HOT-pruning: our pin on the buffer prevents pruning.)
-			 */
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
-
-			/*
-			 * The criteria for counting a tuple as live in this block need to
-			 * match what analyze.c's acquire_sample_rows() does, otherwise
-			 * CREATE INDEX and ANALYZE may produce wildly different reltuples
-			 * values, e.g. when there are many recently-dead tuples.
-			 */
-			switch (HeapTupleSatisfiesVacuum(heapTuple, OldestXmin,
-											 hscan->rs_cbuf))
-			{
-				case HEAPTUPLE_DEAD:
-					/* Definitely dead, we can ignore it */
-					indexIt = false;
-					tupleIsAlive = false;
-					break;
-				case HEAPTUPLE_LIVE:
-					/* Normal case, index and unique-check it */
-					indexIt = true;
-					tupleIsAlive = true;
-					/* Count it as live, too */
-					reltuples += 1;
-					break;
-				case HEAPTUPLE_RECENTLY_DEAD:
-
-					/*
-					 * If tuple is recently deleted then we must index it
-					 * anyway to preserve MVCC semantics.  (Pre-existing
-					 * transactions could try to use the index after we finish
-					 * building it, and may need to see such tuples.)
-					 *
-					 * However, if it was HOT-updated then we must only index
-					 * the live tuple at the end of the HOT-chain.  Since this
-					 * breaks semantics for pre-existing snapshots, mark the
-					 * index as unusable for them.
-					 *
-					 * We don't count recently-dead tuples in reltuples, even
-					 * if we index them; see acquire_sample_rows().
-					 */
-					if (HeapTupleIsHotUpdated(heapTuple))
-					{
-						indexIt = false;
-						/* mark the index as unsafe for old snapshots */
-						indexInfo->ii_BrokenHotChain = true;
-					}
-					else
-						indexIt = true;
-					/* In any case, exclude the tuple from unique-checking */
-					tupleIsAlive = false;
-					break;
-				case HEAPTUPLE_INSERT_IN_PROGRESS:
-
-					/*
-					 * In "anyvisible" mode, this tuple is visible and we
-					 * don't need any further checks.
-					 */
-					if (anyvisible)
-					{
-						indexIt = true;
-						tupleIsAlive = true;
-						reltuples += 1;
-						break;
-					}
-
-					/*
-					 * Since caller should hold ShareLock or better, normally
-					 * the only way to see this is if it was inserted earlier
-					 * in our own transaction.  However, it can happen in
-					 * system catalogs, since we tend to release write lock
-					 * before commit there.  Give a warning if neither case
-					 * applies.
-					 */
-					xwait = HeapTupleHeaderGetXmin(heapTuple->t_data);
-					if (!TransactionIdIsCurrentTransactionId(xwait))
-					{
-						if (!is_system_catalog)
-							elog(WARNING, "concurrent insert in progress within table \"%s\"",
-								 RelationGetRelationName(heapRelation));
-
-						/*
-						 * If we are performing uniqueness checks, indexing
-						 * such a tuple could lead to a bogus uniqueness
-						 * failure.  In that case we wait for the inserting
-						 * transaction to finish and check again.
-						 */
-						if (checking_uniqueness)
-						{
-							/*
-							 * Must drop the lock on the buffer before we wait
-							 */
-							LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-							XactLockTableWait(xwait, heapRelation,
-											  &heapTuple->t_self,
-											  XLTW_InsertIndexUnique);
-							CHECK_FOR_INTERRUPTS();
-							goto recheck;
-						}
-					}
-					else
-					{
-						/*
-						 * For consistency with acquire_sample_rows(), count
-						 * HEAPTUPLE_INSERT_IN_PROGRESS tuples as live only
-						 * when inserted by our own transaction.
-						 */
-						reltuples += 1;
-					}
-
-					/*
-					 * We must index such tuples, since if the index build
-					 * commits then they're good.
-					 */
-					indexIt = true;
-					tupleIsAlive = true;
-					break;
-				case HEAPTUPLE_DELETE_IN_PROGRESS:
-
-					/*
-					 * As with INSERT_IN_PROGRESS case, this is unexpected
-					 * unless it's our own deletion or a system catalog; but
-					 * in anyvisible mode, this tuple is visible.
-					 */
-					if (anyvisible)
-					{
-						indexIt = true;
-						tupleIsAlive = false;
-						reltuples += 1;
-						break;
-					}
-
-					xwait = HeapTupleHeaderGetUpdateXid(heapTuple->t_data);
-					if (!TransactionIdIsCurrentTransactionId(xwait))
-					{
-						if (!is_system_catalog)
-							elog(WARNING, "concurrent delete in progress within table \"%s\"",
-								 RelationGetRelationName(heapRelation));
-
-						/*
-						 * If we are performing uniqueness checks, assuming
-						 * the tuple is dead could lead to missing a
-						 * uniqueness violation.  In that case we wait for the
-						 * deleting transaction to finish and check again.
-						 *
-						 * Also, if it's a HOT-updated tuple, we should not
-						 * index it but rather the live tuple at the end of
-						 * the HOT-chain.  However, the deleting transaction
-						 * could abort, possibly leaving this tuple as live
-						 * after all, in which case it has to be indexed. The
-						 * only way to know what to do is to wait for the
-						 * deleting transaction to finish and check again.
-						 */
-						if (checking_uniqueness ||
-							HeapTupleIsHotUpdated(heapTuple))
-						{
-							/*
-							 * Must drop the lock on the buffer before we wait
-							 */
-							LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-							XactLockTableWait(xwait, heapRelation,
-											  &heapTuple->t_self,
-											  XLTW_InsertIndexUnique);
-							CHECK_FOR_INTERRUPTS();
-							goto recheck;
-						}
-
-						/*
-						 * Otherwise index it but don't check for uniqueness,
-						 * the same as a RECENTLY_DEAD tuple.
-						 */
-						indexIt = true;
-
-						/*
-						 * Count HEAPTUPLE_DELETE_IN_PROGRESS tuples as live,
-						 * if they were not deleted by the current
-						 * transaction.  That's what acquire_sample_rows()
-						 * does, and we want the behavior to be consistent.
-						 */
-						reltuples += 1;
-					}
-					else if (HeapTupleIsHotUpdated(heapTuple))
-					{
-						/*
-						 * It's a HOT-updated tuple deleted by our own xact.
-						 * We can assume the deletion will commit (else the
-						 * index contents don't matter), so treat the same as
-						 * RECENTLY_DEAD HOT-updated tuples.
-						 */
-						indexIt = false;
-						/* mark the index as unsafe for old snapshots */
-						indexInfo->ii_BrokenHotChain = true;
-					}
-					else
-					{
-						/*
-						 * It's a regular tuple deleted by our own xact. Index
-						 * it, but don't check for uniqueness nor count in
-						 * reltuples, the same as a RECENTLY_DEAD tuple.
-						 */
-						indexIt = true;
-					}
-					/* In any case, exclude the tuple from unique-checking */
-					tupleIsAlive = false;
-					break;
-				default:
-					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
-					indexIt = tupleIsAlive = false; /* keep compiler quiet */
-					break;
-			}
-
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-			if (!indexIt)
-				continue;
-		}
-		else
-		{
-			/* heap_getnext did the time qual check */
-			tupleIsAlive = true;
-			reltuples += 1;
-		}
-
-		MemoryContextReset(econtext->ecxt_per_tuple_memory);
-
-		/* Set up for predicate or expression evaluation */
-		ExecStoreBufferHeapTuple(heapTuple, slot, hscan->rs_cbuf);
-
-		/*
-		 * In a partial index, discard tuples that don't satisfy the
-		 * predicate.
-		 */
-		if (predicate != NULL)
-		{
-			if (!ExecQual(predicate, econtext))
-				continue;
-		}
-
-		/*
-		 * For the current heap tuple, extract all the attributes we use in
-		 * this index, and note which are null.  This also performs evaluation
-		 * of any expressions needed.
-		 */
-		FormIndexDatum(indexInfo,
-					   slot,
-					   estate,
-					   values,
-					   isnull);
-
-		/*
-		 * You'd think we should go ahead and build the index tuple here, but
-		 * some index AMs want to do further processing on the data first.  So
-		 * pass the values[] and isnull[] arrays, instead.
-		 */
-
-		if (HeapTupleIsHeapOnly(heapTuple))
-		{
-			/*
-			 * For a heap-only tuple, pretend its TID is that of the root. See
-			 * src/backend/access/heap/README.HOT for discussion.
-			 */
-			HeapTupleData rootTuple;
-			OffsetNumber offnum;
-
-			rootTuple = *heapTuple;
-			offnum = ItemPointerGetOffsetNumber(&heapTuple->t_self);
-
-			if (!OffsetNumberIsValid(root_offsets[offnum - 1]))
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
-										 ItemPointerGetBlockNumber(&heapTuple->t_self),
-										 offnum,
-										 RelationGetRelationName(heapRelation))));
-
-			ItemPointerSetOffsetNumber(&rootTuple.t_self,
-									   root_offsets[offnum - 1]);
-
-			/* Call the AM's callback routine to process the tuple */
-			callback(indexRelation, &rootTuple, values, isnull, tupleIsAlive,
-					 callback_state);
-		}
-		else
-		{
-			/* Call the AM's callback routine to process the tuple */
-			callback(indexRelation, heapTuple, values, isnull, tupleIsAlive,
-					 callback_state);
-		}
-	}
-
-	table_endscan(scan);
-
-	/* we can now forget our snapshot, if set and registered by us */
-	if (need_unregister_snapshot)
-		UnregisterSnapshot(snapshot);
-
-	ExecDropSingleTupleTableSlot(slot);
-
-	FreeExecutorState(estate);
-
-	/* These may have been pointing to the now-gone estate */
-	indexInfo->ii_ExpressionsState = NIL;
-	indexInfo->ii_PredicateState = NULL;
-
-	return reltuples;
-}
-
-
 /*
  * IndexCheckExclusion - verify that a new exclusion constraint is satisfied
  *
@@ -3127,7 +2559,7 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 				indexRelation;
 	IndexInfo  *indexInfo;
 	IndexVacuumInfo ivinfo;
-	v_i_state	state;
+	ValidateIndexState state;
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
@@ -3188,11 +2620,11 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	/*
 	 * Now scan the heap and "merge" it with the index
 	 */
-	validate_index_heapscan(heapRelation,
-							indexRelation,
-							indexInfo,
-							snapshot,
-							&state);
+	table_index_validate_scan(heapRelation,
+							  indexRelation,
+							  indexInfo,
+							  snapshot,
+							  &state);
 
 	/* Done with tuplesort object */
 	tuplesort_end(state.tuplesort);
@@ -3212,53 +2644,13 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 	table_close(heapRelation, NoLock);
 }
 
-/*
- * itemptr_encode - Encode ItemPointer as int64/int8
- *
- * This representation must produce values encoded as int64 that sort in the
- * same order as their corresponding original TID values would (using the
- * default int8 opclass to produce a result equivalent to the default TID
- * opclass).
- *
- * As noted in validate_index(), this can be significantly faster.
- */
-static inline int64
-itemptr_encode(ItemPointer itemptr)
-{
-	BlockNumber block = ItemPointerGetBlockNumber(itemptr);
-	OffsetNumber offset = ItemPointerGetOffsetNumber(itemptr);
-	int64		encoded;
-
-	/*
-	 * Use the 16 least significant bits for the offset.  32 adjacent bits are
-	 * used for the block number.  Since remaining bits are unused, there
-	 * cannot be negative encoded values (We assume a two's complement
-	 * representation).
-	 */
-	encoded = ((uint64) block << 16) | (uint16) offset;
-
-	return encoded;
-}
-
-/*
- * itemptr_decode - Decode int64/int8 representation back to ItemPointer
- */
-static inline void
-itemptr_decode(ItemPointer itemptr, int64 encoded)
-{
-	BlockNumber block = (BlockNumber) (encoded >> 16);
-	OffsetNumber offset = (OffsetNumber) (encoded & 0xFFFF);
-
-	ItemPointerSet(itemptr, block, offset);
-}
-
 /*
  * validate_index_callback - bulkdelete callback to collect the index TIDs
  */
 static bool
 validate_index_callback(ItemPointer itemptr, void *opaque)
 {
-	v_i_state  *state = (v_i_state *) opaque;
+	ValidateIndexState *state = (ValidateIndexState *) opaque;
 	int64		encoded = itemptr_encode(itemptr);
 
 	tuplesort_putdatum(state->tuplesort, Int64GetDatum(encoded), false);
@@ -3266,245 +2658,6 @@ validate_index_callback(ItemPointer itemptr, void *opaque)
 	return false;				/* never actually delete anything */
 }
 
-/*
- * validate_index_heapscan - second table scan for concurrent index build
- *
- * This has much code in common with IndexBuildHeapScan, but it's enough
- * different that it seems cleaner to have two routines not one.
- */
-static void
-validate_index_heapscan(Relation heapRelation,
-						Relation indexRelation,
-						IndexInfo *indexInfo,
-						Snapshot snapshot,
-						v_i_state *state)
-{
-	TableScanDesc scan;
-	HeapScanDesc hscan;
-	HeapTuple	heapTuple;
-	Datum		values[INDEX_MAX_KEYS];
-	bool		isnull[INDEX_MAX_KEYS];
-	ExprState  *predicate;
-	TupleTableSlot *slot;
-	EState	   *estate;
-	ExprContext *econtext;
-	BlockNumber root_blkno = InvalidBlockNumber;
-	OffsetNumber root_offsets[MaxHeapTuplesPerPage];
-	bool		in_index[MaxHeapTuplesPerPage];
-
-	/* state variables for the merge */
-	ItemPointer indexcursor = NULL;
-	ItemPointerData decoded;
-	bool		tuplesort_empty = false;
-
-	/*
-	 * sanity checks
-	 */
-	Assert(OidIsValid(indexRelation->rd_rel->relam));
-
-	/*
-	 * Need an EState for evaluation of index expressions and partial-index
-	 * predicates.  Also a slot to hold the current tuple.
-	 */
-	estate = CreateExecutorState();
-	econtext = GetPerTupleExprContext(estate);
-	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
-									&TTSOpsHeapTuple);
-
-	/* Arrange for econtext's scan tuple to be the tuple under test */
-	econtext->ecxt_scantuple = slot;
-
-	/* Set up execution state for predicate, if any. */
-	predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
-
-	/*
-	 * Prepare for scan of the base relation.  We need just those tuples
-	 * satisfying the passed-in reference snapshot.  We must disable syncscan
-	 * here, because it's critical that we read from block zero forward to
-	 * match the sorted TIDs.
-	 */
-	scan = table_beginscan_strat(heapRelation,	/* relation */
-								 snapshot,	/* snapshot */
-								 0,	/* number of keys */
-								 NULL,	/* scan key */
-								 true,	/* buffer access strategy OK */
-								 false); /* syncscan not OK */
-	hscan = (HeapScanDesc) scan;
-
-	/*
-	 * Scan all tuples matching the snapshot.
-	 */
-	while ((heapTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
-	{
-		ItemPointer heapcursor = &heapTuple->t_self;
-		ItemPointerData rootTuple;
-		OffsetNumber root_offnum;
-
-		CHECK_FOR_INTERRUPTS();
-
-		state->htups += 1;
-
-		/*
-		 * As commented in IndexBuildHeapScan, we should index heap-only
-		 * tuples under the TIDs of their root tuples; so when we advance onto
-		 * a new heap page, build a map of root item offsets on the page.
-		 *
-		 * This complicates merging against the tuplesort output: we will
-		 * visit the live tuples in order by their offsets, but the root
-		 * offsets that we need to compare against the index contents might be
-		 * ordered differently.  So we might have to "look back" within the
-		 * tuplesort output, but only within the current page.  We handle that
-		 * by keeping a bool array in_index[] showing all the
-		 * already-passed-over tuplesort output TIDs of the current page. We
-		 * clear that array here, when advancing onto a new heap page.
-		 */
-		if (hscan->rs_cblock != root_blkno)
-		{
-			Page		page = BufferGetPage(hscan->rs_cbuf);
-
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
-			heap_get_root_tuples(page, root_offsets);
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-			memset(in_index, 0, sizeof(in_index));
-
-			root_blkno = hscan->rs_cblock;
-		}
-
-		/* Convert actual tuple TID to root TID */
-		rootTuple = *heapcursor;
-		root_offnum = ItemPointerGetOffsetNumber(heapcursor);
-
-		if (HeapTupleIsHeapOnly(heapTuple))
-		{
-			root_offnum = root_offsets[root_offnum - 1];
-			if (!OffsetNumberIsValid(root_offnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
-										 ItemPointerGetBlockNumber(heapcursor),
-										 ItemPointerGetOffsetNumber(heapcursor),
-										 RelationGetRelationName(heapRelation))));
-			ItemPointerSetOffsetNumber(&rootTuple, root_offnum);
-		}
-
-		/*
-		 * "merge" by skipping through the index tuples until we find or pass
-		 * the current root tuple.
-		 */
-		while (!tuplesort_empty &&
-			   (!indexcursor ||
-				ItemPointerCompare(indexcursor, &rootTuple) < 0))
-		{
-			Datum		ts_val;
-			bool		ts_isnull;
-
-			if (indexcursor)
-			{
-				/*
-				 * Remember index items seen earlier on the current heap page
-				 */
-				if (ItemPointerGetBlockNumber(indexcursor) == root_blkno)
-					in_index[ItemPointerGetOffsetNumber(indexcursor) - 1] = true;
-			}
-
-			tuplesort_empty = !tuplesort_getdatum(state->tuplesort, true,
-												  &ts_val, &ts_isnull, NULL);
-			Assert(tuplesort_empty || !ts_isnull);
-			if (!tuplesort_empty)
-			{
-				itemptr_decode(&decoded, DatumGetInt64(ts_val));
-				indexcursor = &decoded;
-
-				/* If int8 is pass-by-ref, free (encoded) TID Datum memory */
-#ifndef USE_FLOAT8_BYVAL
-				pfree(DatumGetPointer(ts_val));
-#endif
-			}
-			else
-			{
-				/* Be tidy */
-				indexcursor = NULL;
-			}
-		}
-
-		/*
-		 * If the tuplesort has overshot *and* we didn't see a match earlier,
-		 * then this tuple is missing from the index, so insert it.
-		 */
-		if ((tuplesort_empty ||
-			 ItemPointerCompare(indexcursor, &rootTuple) > 0) &&
-			!in_index[root_offnum - 1])
-		{
-			MemoryContextReset(econtext->ecxt_per_tuple_memory);
-
-			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
-
-			/*
-			 * In a partial index, discard tuples that don't satisfy the
-			 * predicate.
-			 */
-			if (predicate != NULL)
-			{
-				if (!ExecQual(predicate, econtext))
-					continue;
-			}
-
-			/*
-			 * For the current heap tuple, extract all the attributes we use
-			 * in this index, and note which are null.  This also performs
-			 * evaluation of any expressions needed.
-			 */
-			FormIndexDatum(indexInfo,
-						   slot,
-						   estate,
-						   values,
-						   isnull);
-
-			/*
-			 * You'd think we should go ahead and build the index tuple here,
-			 * but some index AMs want to do further processing on the data
-			 * first. So pass the values[] and isnull[] arrays, instead.
-			 */
-
-			/*
-			 * If the tuple is already committed dead, you might think we
-			 * could suppress uniqueness checking, but this is no longer true
-			 * in the presence of HOT, because the insert is actually a proxy
-			 * for a uniqueness check on the whole HOT-chain.  That is, the
-			 * tuple we have here could be dead because it was already
-			 * HOT-updated, and if so the updating transaction will not have
-			 * thought it should insert index entries.  The index AM will
-			 * check the whole HOT-chain and correctly detect a conflict if
-			 * there is one.
-			 */
-
-			index_insert(indexRelation,
-						 values,
-						 isnull,
-						 &rootTuple,
-						 heapRelation,
-						 indexInfo->ii_Unique ?
-						 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
-						 indexInfo);
-
-			state->tups_inserted += 1;
-		}
-	}
-
-	table_endscan(scan);
-
-	ExecDropSingleTupleTableSlot(slot);
-
-	FreeExecutorState(estate);
-
-	/* These may have been pointing to the now-gone estate */
-	indexInfo->ii_ExpressionsState = NIL;
-	indexInfo->ii_PredicateState = NULL;
-}
-
-
 /*
  * index_set_state_flags - adjust pg_index state flags
  *
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2c2d388dda6..bd2cdd34e08 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -16,6 +16,7 @@
 
 #include "access/relscan.h"
 #include "access/sdir.h"
+#include "catalog/index.h"
 #include "utils/guc.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
@@ -27,6 +28,7 @@ extern char *default_table_access_method;
 extern bool synchronize_seqscans;
 
 
+struct ValidateIndexState;
 struct BulkInsertStateData;
 
 /*
@@ -294,6 +296,28 @@ typedef struct TableAmRoutine
 	 */
 	void		(*finish_bulk_insert) (Relation rel, int options);
 
+
+	/* ------------------------------------------------------------------------
+	 * DDL related functionality.
+	 * ------------------------------------------------------------------------
+	 */
+
+	double		(*index_build_range_scan) (Relation heap_rel,
+										   Relation index_rel,
+										   IndexInfo *index_info,
+										   bool allow_sync,
+										   bool anyvisible,
+										   BlockNumber start_blockno,
+										   BlockNumber end_blockno,
+										   IndexBuildCallback callback,
+										   void *callback_state,
+										   TableScanDesc scan);
+	void		(*index_validate_scan) (Relation heap_rel,
+										Relation index_rel,
+										IndexInfo *index_info,
+										Snapshot snapshot,
+										struct ValidateIndexState *state);
+
 } TableAmRoutine;
 
 
@@ -706,6 +730,71 @@ table_finish_bulk_insert(Relation rel, int options)
 }
 
 
+/* ------------------------------------------------------------------------
+ * DDL related functionality.
+ * ------------------------------------------------------------------------
+ */
+
+static inline double
+table_index_build_scan(Relation heap_rel,
+					   Relation index_rel,
+					   IndexInfo *index_info,
+					   bool allow_sync,
+					   IndexBuildCallback callback,
+					   void *callback_state,
+					   TableScanDesc scan)
+{
+	return heap_rel->rd_tableam->index_build_range_scan(heap_rel,
+														index_rel,
+														index_info,
+														allow_sync,
+														false,
+														0,
+														InvalidBlockNumber,
+														callback,
+														callback_state,
+														scan);
+}
+
+static inline void
+table_index_validate_scan(Relation heap_rel,
+						  Relation index_rel,
+						  IndexInfo *index_info,
+						  Snapshot snapshot,
+						  struct ValidateIndexState *state)
+{
+	heap_rel->rd_tableam->index_validate_scan(heap_rel,
+											  index_rel,
+											  index_info,
+											  snapshot,
+											  state);
+}
+
+static inline double
+table_index_build_range_scan(Relation heap_rel,
+							 Relation index_rel,
+							 IndexInfo *index_info,
+							 bool allow_sync,
+							 bool anyvisible,
+							 BlockNumber start_blockno,
+							 BlockNumber end_blockno,
+							 IndexBuildCallback callback,
+							 void *callback_state,
+							 TableScanDesc scan)
+{
+	return heap_rel->rd_tableam->index_build_range_scan(heap_rel,
+														index_rel,
+														index_nfo,
+														allow_sync,
+														index_info,
+														start_blockno,
+														end_blockno,
+														callback,
+														callback_state,
+														scan);
+}
+
+
 /* ----------------------------------------------------------------------------
  * Functions to make modifications a bit simpler.
  * ----------------------------------------------------------------------------
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 29f7ed62379..8ac0e11f5fe 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -20,7 +20,7 @@
 
 #define DEFAULT_INDEX_TYPE	"btree"
 
-/* Typedef for callback function for IndexBuildHeapScan */
+/* Typedef for callback function for table_index_build_scan */
 typedef void (*IndexBuildCallback) (Relation index,
 									HeapTuple htup,
 									Datum *values,
@@ -37,6 +37,15 @@ typedef enum
 	INDEX_DROP_SET_DEAD
 } IndexStateFlagsAction;
 
+/* state info for validate_index bulkdelete callback */
+typedef struct ValidateIndexState
+{
+	Tuplesortstate *tuplesort;	/* for sorting the index TIDs */
+	/* statistics (for debug purposes only): */
+	double		htups,
+				itups,
+				tups_inserted;
+} ValidateIndexState;
 
 extern void index_check_primary_key(Relation heapRel,
 						IndexInfo *indexInfo,
@@ -110,25 +119,6 @@ extern void index_build(Relation heapRelation,
 			bool isreindex,
 			bool parallel);
 
-struct TableScanDescData;
-extern double IndexBuildHeapScan(Relation heapRelation,
-				   Relation indexRelation,
-				   IndexInfo *indexInfo,
-				   bool allow_sync,
-				   IndexBuildCallback callback,
-				   void *callback_state,
-				   struct TableScanDescData *scan);
-extern double IndexBuildHeapRangeScan(Relation heapRelation,
-						Relation indexRelation,
-						IndexInfo *indexInfo,
-						bool allow_sync,
-						bool anyvisible,
-						BlockNumber start_blockno,
-						BlockNumber end_blockno,
-						IndexBuildCallback callback,
-						void *callback_state,
-						struct TableScanDescData *scan);
-
 extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
 
 extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
@@ -155,4 +145,45 @@ extern void RestoreReindexState(void *reindexstate);
 
 extern void IndexSetParentIndex(Relation idx, Oid parentOid);
 
+
+/*
+ * itemptr_encode - Encode ItemPointer as int64/int8
+ *
+ * This representation must produce values encoded as int64 that sort in the
+ * same order as their corresponding original TID values would (using the
+ * default int8 opclass to produce a result equivalent to the default TID
+ * opclass).
+ *
+ * As noted in validate_index(), this can be significantly faster.
+ */
+static inline int64
+itemptr_encode(ItemPointer itemptr)
+{
+	BlockNumber block = ItemPointerGetBlockNumber(itemptr);
+	OffsetNumber offset = ItemPointerGetOffsetNumber(itemptr);
+	int64		encoded;
+
+	/*
+	 * Use the 16 least significant bits for the offset.  32 adjacent bits are
+	 * used for the block number.  Since remaining bits are unused, there
+	 * cannot be negative encoded values (We assume a two's complement
+	 * representation).
+	 */
+	encoded = ((uint64) block << 16) | (uint16) offset;
+
+	return encoded;
+}
+
+/*
+ * itemptr_decode - Decode int64/int8 representation back to ItemPointer
+ */
+static inline void
+itemptr_decode(ItemPointer itemptr, int64 encoded)
+{
+	BlockNumber block = (BlockNumber) (encoded >> 16);
+	OffsetNumber offset = (OffsetNumber) (encoded & 0xFFFF);
+
+	ItemPointerSet(itemptr, block, offset);
+}
+
 #endif							/* INDEX_H */
-- 
2.21.0.dirty
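
To make the new API shape a bit more concrete, here's a rough sketch (not
part of the patchset) of how a table AM would supply the two new DDL
callbacks. The myam_* names are made up; the signatures just follow the
TableAmRoutine members added above, and the bodies only describe what a
real AM has to do:

static double
myam_index_build_range_scan(Relation heap_rel, Relation index_rel,
							IndexInfo *index_info, bool allow_sync,
							bool anyvisible, BlockNumber start_blockno,
							BlockNumber end_blockno,
							IndexBuildCallback callback,
							void *callback_state, TableScanDesc scan)
{
	double		reltuples = 0;

	/*
	 * A real AM scans the requested block range using its own visibility
	 * rules, invokes callback() once per tuple that ought to be indexed,
	 * and counts those tuples in reltuples.
	 */

	return reltuples;
}

static void
myam_index_validate_scan(Relation heap_rel, Relation index_rel,
						 IndexInfo *index_info, Snapshot snapshot,
						 struct ValidateIndexState *state)
{
	/*
	 * A real AM merges its tuples against the sorted TIDs in
	 * state->tuplesort and inserts whatever index entries are missing,
	 * much like heapam_index_validate_scan does.
	 */
}

static const TableAmRoutine myam_methods = {
	.type = T_TableAmRoutine,
	/* scan, insert, etc. callbacks elided */
	.index_build_range_scan = myam_index_build_range_scan,
	.index_validate_scan = myam_index_validate_scan,
};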

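Relatedly, the int8 TID encoding now exposed in index.h is easy to get
wrong, so here's a small standalone program (again purely illustrative,
with the backend typedefs stubbed out as plain integers) that mirrors
itemptr_encode()/itemptr_decode() and checks the two properties the
comments rely on: the encoding is order-preserving and never negative.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* local stand-ins for the backend's typedefs */
typedef uint32_t BlockNumber;
typedef uint16_t OffsetNumber;

/* mirrors itemptr_encode(): 32 bits of block above 16 bits of offset */
static int64_t
tid_encode(BlockNumber block, OffsetNumber offset)
{
	return (int64_t) (((uint64_t) block << 16) | (uint16_t) offset);
}

/* mirrors itemptr_decode() */
static void
tid_decode(int64_t encoded, BlockNumber *block, OffsetNumber *offset)
{
	*block = (BlockNumber) (encoded >> 16);
	*offset = (OffsetNumber) (encoded & 0xFFFF);
}

int
main(void)
{
	/* (block, offset) pairs already in TID order */
	BlockNumber blocks[] = {0, 0, 1, 1, 42};
	OffsetNumber offsets[] = {1, 2, 1, 100, 7};
	int64_t		prev = -1;

	for (int i = 0; i < 5; i++)
	{
		int64_t		enc = tid_encode(blocks[i], offsets[i]);
		BlockNumber block;
		OffsetNumber offset;

		assert(enc >= 0);		/* the top 16 bits stay unused */
		assert(enc > prev);		/* int8 order matches TID order */
		tid_decode(enc, &block, &offset);
		assert(block == blocks[i] && offset == offsets[i]);
		prev = enc;
	}
	printf("itemptr encode/decode round trip and ordering OK\n");
	return 0;
}

(Compiles with any C99 compiler; nothing here touches the backend.)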
Attachment: v18-0011-tableam-relation-creation-VACUUM-FULL-CLUSTER-SE.patch (text/x-diff; charset=us-ascii)
From 409d13fbba72bbce42e5ae34527d407c02c30e53 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 19 Jan 2019 23:59:18 -0800
Subject: [PATCH v18 11/18] tableam: relation creation, VACUUM FULL/CLUSTER,
 SET TABLESPACE.

TODO: Minimize differences / update code movement due to a few later
comment changes.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/heapam_handler.c | 376 +++++++++++++++++++++++
 src/backend/bootstrap/bootparse.y        |   7 +-
 src/backend/catalog/heap.c               | 120 +++-----
 src/backend/catalog/index.c              |  11 +-
 src/backend/catalog/storage.c            |  88 ++++++
 src/backend/commands/cluster.c           | 282 +----------------
 src/backend/commands/sequence.c          |  30 +-
 src/backend/commands/tablecmds.c         | 182 +++--------
 src/backend/utils/cache/relcache.c       |  77 +++--
 src/backend/utils/sort/tuplesort.c       |   5 +-
 src/include/access/heapam.h              |  12 +
 src/include/access/rewriteheap.h         |  11 -
 src/include/access/tableam.h             |  44 +++
 src/include/catalog/heap.h               |   6 +-
 src/include/catalog/storage.h            |   3 +
 src/include/utils/relcache.h             |   3 +-
 16 files changed, 722 insertions(+), 535 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 1bf08b47250..9e67d48b6ea 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -23,16 +23,28 @@
 
 #include "access/genam.h"
 #include "access/heapam.h"
+#include "access/multixact.h"
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
 #include "catalog/index.h"
+#include "catalog/storage.h"
+#include "catalog/storage_xlog.h"
 #include "executor/executor.h"
 #include "storage/bufmgr.h"
 #include "storage/bufpage.h"
 #include "storage/lmgr.h"
+#include "storage/predicate.h"
 #include "storage/procarray.h"
+#include "storage/smgr.h"
 #include "utils/builtins.h"
+#include "utils/rel.h"
+
+
+static void
+reform_and_rewrite_tuple(HeapTuple tuple,
+						 Relation OldHeap, Relation NewHeap,
+						 Datum *values, bool *isnull, RewriteState rwstate);
 
 
 static const TableAmRoutine heapam_methods;
@@ -566,6 +579,322 @@ heapam_finish_bulk_insert(Relation relation, int options)
  * ------------------------------------------------------------------------
  */
 
+static void
+heapam_set_new_filenode(Relation rel, char persistence,
+						TransactionId *freezeXid, MultiXactId *minmulti)
+{
+	/*
+	 * Initialize to the minimum XID that could put tuples in the table. We
+	 * know that no xacts older than RecentXmin are still running, so that
+	 * will do.
+	 */
+	*freezeXid = RecentXmin;
+
+	/*
+	 * Similarly, initialize the minimum Multixact to the first value that
+	 * could possibly be stored in tuples in the table.  Running transactions
+	 * could reuse values from their local cache, so we are careful to
+	 * consider all currently running multis.
+	 *
+	 * XXX this could be refined further, but is it worth the hassle?
+	 */
+	*minmulti = GetOldestMultiXactId();
+
+	RelationCreateStorage(rel->rd_node, persistence);
+
+	/*
+	 * If required, set up an init fork for an unlogged table so that it can
+	 * be correctly reinitialized on restart.  An immediate sync is required
+	 * even if the page has been logged, because the write did not go through
+	 * shared_buffers and therefore a concurrent checkpoint may have moved the
+	 * redo pointer past our xlog record.  Recovery may as well remove it
+	 * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
+	 * record. Therefore, logging is necessary even if wal_level=minimal.
+	 */
+	if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
+	{
+		Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
+			   rel->rd_rel->relkind == RELKIND_MATVIEW ||
+			   rel->rd_rel->relkind == RELKIND_TOASTVALUE);
+		RelationOpenSmgr(rel);
+		smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
+		log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
+		smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
+	}
+}
+
+static void
+heapam_relation_nontransactional_truncate(Relation rel)
+{
+	RelationTruncate(rel, 0);
+}
+
+
+static void
+heapam_relation_copy_data(Relation rel, RelFileNode newrnode)
+{
+	SMgrRelation dstrel;
+
+	dstrel = smgropen(newrnode, rel->rd_backend);
+	RelationOpenSmgr(rel);
+
+	/*
+	 * Create and copy all forks of the relation, and schedule unlinking of
+	 * old physical files.
+	 *
+	 * NOTE: any conflict in relfilenode value will be caught in
+	 * RelationCreateStorage().
+	 */
+	RelationCreateStorage(newrnode, rel->rd_rel->relpersistence);
+
+	/* copy main fork */
+	RelationCopyStorage(rel->rd_smgr, dstrel, MAIN_FORKNUM,
+						rel->rd_rel->relpersistence);
+
+	/* copy those extra forks that exist */
+	for (ForkNumber forkNum = MAIN_FORKNUM + 1;
+		 forkNum <= MAX_FORKNUM; forkNum++)
+	{
+		if (smgrexists(rel->rd_smgr, forkNum))
+		{
+			smgrcreate(dstrel, forkNum, false);
+
+			/*
+			 * WAL log creation if the relation is persistent, or this is the
+			 * init fork of an unlogged relation.
+			 */
+			if (rel->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT ||
+				(rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
+				 forkNum == INIT_FORKNUM))
+				log_smgrcreate(&newrnode, forkNum);
+			RelationCopyStorage(rel->rd_smgr, dstrel, forkNum,
+								rel->rd_rel->relpersistence);
+		}
+	}
+
+
+	/* drop old relation, and close new one */
+	RelationDropStorage(rel);
+	smgrclose(dstrel);
+}
+
+
+static void
+heapam_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
+						bool use_sort,
+						TransactionId OldestXmin, TransactionId FreezeXid,
+						MultiXactId MultiXactCutoff,
+						double *num_tuples, double *tups_vacuumed,
+						double *tups_recently_dead)
+{
+	RewriteState rwstate;
+	IndexScanDesc indexScan;
+	TableScanDesc heapScan;
+	bool		use_wal;
+	bool		is_system_catalog;
+	Tuplesortstate *tuplesort;
+	TupleDesc	oldTupDesc = RelationGetDescr(OldHeap);
+	TupleDesc	newTupDesc = RelationGetDescr(NewHeap);
+	TupleTableSlot *slot;
+	int			natts;
+	Datum	   *values;
+	bool	   *isnull;
+	BufferHeapTupleTableSlot *hslot;
+
+	/* Remember if it's a system catalog */
+	is_system_catalog = IsSystemRelation(OldHeap);
+
+	/*
+	 * We need to log the copied data in WAL iff WAL archiving/streaming is
+	 * enabled AND it's a WAL-logged rel.
+	 */
+	use_wal = XLogIsNeeded() && RelationNeedsWAL(NewHeap);
+
+	/* use_wal off requires smgr_targblock be initially invalid */
+	Assert(RelationGetTargetBlock(NewHeap) == InvalidBlockNumber);
+
+	/* Preallocate values/isnull arrays */
+	natts = newTupDesc->natts;
+	values = (Datum *) palloc(natts * sizeof(Datum));
+	isnull = (bool *) palloc(natts * sizeof(bool));
+
+	/* Initialize the rewrite operation */
+	rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin, FreezeXid,
+								 MultiXactCutoff, use_wal);
+
+
+	/* Set up sorting if wanted */
+	if (use_sort)
+		tuplesort = tuplesort_begin_cluster(oldTupDesc, OldIndex,
+											maintenance_work_mem,
+											NULL, false);
+	else
+		tuplesort = NULL;
+
+	/*
+	 * Prepare to scan the OldHeap.  To ensure we see recently-dead tuples
+	 * that still need to be copied, we scan with SnapshotAny and use
+	 * HeapTupleSatisfiesVacuum for the visibility test.
+	 */
+	if (OldIndex != NULL && !use_sort)
+	{
+		heapScan = NULL;
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, 0, 0);
+		index_rescan(indexScan, NULL, 0, NULL, 0);
+	}
+	else
+	{
+		heapScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		indexScan = NULL;
+	}
+
+	slot = table_slot_create(OldHeap, NULL);
+	hslot = (BufferHeapTupleTableSlot *) slot;
+
+	/*
+	 * Scan through the OldHeap, either in OldIndex order or sequentially;
+	 * copy each tuple into the NewHeap, or transiently to the tuplesort
+	 * module.  Note that we don't bother sorting dead tuples (they won't get
+	 * to the new table anyway).
+	 */
+	for (;;)
+	{
+		bool		isdead;
+		TransactionId xid;
+
+		CHECK_FOR_INTERRUPTS();
+
+		if (indexScan != NULL)
+		{
+			if (!index_getnext_slot(indexScan, ForwardScanDirection, slot))
+				break;
+
+			/* Since we used no scan keys, should never need to recheck */
+			if (indexScan->xs_recheck)
+				elog(ERROR, "CLUSTER does not support lossy index conditions");
+		}
+		else
+		{
+			if (!table_scan_getnextslot(heapScan, ForwardScanDirection, slot))
+				break;
+		}
+
+		LockBuffer(hslot->buffer, BUFFER_LOCK_SHARE);
+
+		switch (HeapTupleSatisfiesVacuum(hslot->base.tuple, OldestXmin, hslot->buffer))
+		{
+			case HEAPTUPLE_DEAD:
+				/* Definitely dead */
+				isdead = true;
+				break;
+			case HEAPTUPLE_RECENTLY_DEAD:
+				*tups_recently_dead += 1;
+				/* fall through */
+			case HEAPTUPLE_LIVE:
+				/* Live or recently dead, must copy it */
+				isdead = false;
+				break;
+			case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+				/*
+				 * Since we hold exclusive lock on the relation, normally the
+				 * only way to see this is if it was inserted earlier in our
+				 * own transaction.  However, it can happen in system
+				 * catalogs, since we tend to release write lock before commit
+				 * there.  Give a warning if neither case applies; but in any
+				 * case we had better copy it.
+				 */
+				xid = HeapTupleHeaderGetXmin(hslot->base.tuple->t_data);
+				if (!is_system_catalog && !TransactionIdIsCurrentTransactionId(xid))
+					elog(WARNING, "concurrent insert in progress within table \"%s\"",
+						 RelationGetRelationName(OldHeap));
+				/* treat as live */
+				isdead = false;
+				break;
+			case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+				/*
+				 * Similar situation to INSERT_IN_PROGRESS case.
+				 */
+				xid = HeapTupleHeaderGetUpdateXid(hslot->base.tuple->t_data);
+				if (!is_system_catalog && !TransactionIdIsCurrentTransactionId(xid))
+					elog(WARNING, "concurrent delete in progress within table \"%s\"",
+						 RelationGetRelationName(OldHeap));
+				/* treat as recently dead */
+				*tups_recently_dead += 1;
+				isdead = false;
+				break;
+			default:
+				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+				isdead = false; /* keep compiler quiet */
+				break;
+		}
+
+		LockBuffer(hslot->buffer, BUFFER_LOCK_UNLOCK);
+
+		if (isdead)
+		{
+			*tups_vacuumed += 1;
+			/* heap rewrite module still needs to see it... */
+			if (rewrite_heap_dead_tuple(rwstate, ExecFetchSlotHeapTuple(slot, false, NULL)))
+			{
+				/* A previous recently-dead tuple is now known dead */
+				*tups_vacuumed += 1;
+				*tups_recently_dead -= 1;
+			}
+			continue;
+		}
+
+		*num_tuples += 1;
+		if (tuplesort != NULL)
+			tuplesort_puttupleslot(tuplesort, slot);
+		else
+			reform_and_rewrite_tuple(ExecFetchSlotHeapTuple(slot, false, NULL),
+									 OldHeap, NewHeap,
+									 values, isnull, rwstate);
+	}
+
+	if (indexScan != NULL)
+		index_endscan(indexScan);
+	if (heapScan != NULL)
+		table_endscan(heapScan);
+
+	ExecDropSingleTupleTableSlot(slot);
+
+	/*
+	 * In scan-and-sort mode, complete the sort, then read out all live tuples
+	 * from the tuplestore and write them to the new relation.
+	 */
+	if (tuplesort != NULL)
+	{
+		tuplesort_performsort(tuplesort);
+
+		for (;;)
+		{
+			HeapTuple	tuple;
+
+			CHECK_FOR_INTERRUPTS();
+
+			tuple = tuplesort_getheaptuple(tuplesort, true);
+			if (tuple == NULL)
+				break;
+
+			reform_and_rewrite_tuple(tuple,
+									 OldHeap, NewHeap,
+									 values, isnull, rwstate);
+		}
+
+		tuplesort_end(tuplesort);
+	}
+
+	/* Write out any remaining tuples, and fsync if needed */
+	end_heap_rewrite(rwstate);
+
+	/* Clean up */
+	pfree(values);
+	pfree(isnull);
+}
+
 /*
  * PBORKED: Update comment.
  *
@@ -1314,6 +1643,49 @@ heapam_index_validate_scan(Relation heapRelation,
 	indexInfo->ii_ExpressionsState = NIL;
 	indexInfo->ii_PredicateState = NULL;
 }
+
+/*
+ * Reconstruct and rewrite the given tuple
+ *
+ * We cannot simply copy the tuple as-is, for several reasons:
+ *
+ * 1. We'd like to squeeze out the values of any dropped columns, both
+ * to save space and to ensure we have no corner-case failures. (It's
+ * possible for example that the new table hasn't got a TOAST table
+ * and so is unable to store any large values of dropped cols.)
+ *
+ * 2. The tuple might not even be legal for the new table; this is
+ * currently only known to happen as an after-effect of ALTER TABLE
+ * SET WITHOUT OIDS.
+ *
+ * So, we must reconstruct the tuple from component Datums.
+ */
+static void
+reform_and_rewrite_tuple(HeapTuple tuple,
+						 Relation OldHeap, Relation NewHeap,
+						 Datum *values, bool *isnull, RewriteState rwstate)
+{
+	TupleDesc	oldTupDesc = RelationGetDescr(OldHeap);
+	TupleDesc	newTupDesc = RelationGetDescr(NewHeap);
+	HeapTuple	copiedTuple;
+	int			i;
+
+	heap_deform_tuple(tuple, oldTupDesc, values, isnull);
+
+	/* Be sure to null out any dropped columns */
+	for (i = 0; i < newTupDesc->natts; i++)
+	{
+		if (TupleDescAttr(newTupDesc, i)->attisdropped)
+			isnull[i] = true;
+	}
+
+	copiedTuple = heap_form_tuple(newTupDesc, values, isnull);
+
+	/* The heap rewrite module does the rest */
+	rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+
+	heap_freetuple(copiedTuple);
+}
 static const TableAmRoutine heapam_methods = {
 	.type = T_TableAmRoutine,
 
@@ -1346,6 +1718,10 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_get_latest_tid = heap_get_latest_tid,
 	.tuple_satisfies_snapshot = heapam_tuple_satisfies_snapshot,
 
+	.relation_set_new_filenode = heapam_set_new_filenode,
+	.relation_nontransactional_truncate = heapam_relation_nontransactional_truncate,
+	.relation_copy_data = heapam_relation_copy_data,
+	.relation_copy_for_cluster = heapam_copy_for_cluster,
 	.index_build_range_scan = heapam_index_build_range_scan,
 	.index_validate_scan = heapam_index_validate_scan,
 };
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index fef6e7c3dc4..6d7e11645d2 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -209,6 +209,9 @@ Boot_CreateStmt:
 
 					if ($4)
 					{
+						TransactionId relfrozenxid;
+						MultiXactId relminmxid;
+
 						if (boot_reldesc)
 						{
 							elog(DEBUG4, "create bootstrap: warning, open relation exists, closing first");
@@ -226,7 +229,9 @@ Boot_CreateStmt:
 												   RELPERSISTENCE_PERMANENT,
 												   shared_relation,
 												   mapped_relation,
-												   true);
+												   true,
+												   &relfrozenxid,
+												   &relminmxid);
 						elog(DEBUG4, "bootstrap relation created");
 					}
 					else
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index c7b5ff62f9f..a0e8dc0f63f 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -35,6 +35,7 @@
 #include "access/relation.h"
 #include "access/sysattr.h"
 #include "access/table.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
@@ -98,6 +99,8 @@ static void AddNewRelationTuple(Relation pg_class_desc,
 					Oid reloftype,
 					Oid relowner,
 					char relkind,
+					TransactionId relfrozenxid,
+					TransactionId relminmxid,
 					Datum relacl,
 					Datum reloptions);
 static ObjectAddress AddNewRelationType(const char *typeName,
@@ -300,7 +303,9 @@ heap_create(const char *relname,
 			char relpersistence,
 			bool shared_relation,
 			bool mapped_relation,
-			bool allow_system_table_mods)
+			bool allow_system_table_mods,
+			TransactionId *relfrozenxid,
+			MultiXactId *relminmxid)
 {
 	bool		create_storage;
 	Relation	rel;
@@ -327,6 +332,9 @@ heap_create(const char *relname,
 						get_namespace_name(relnamespace), relname),
 				 errdetail("System catalog modifications are currently disallowed.")));
 
+	*relfrozenxid = InvalidTransactionId;
+	*relminmxid = InvalidMultiXactId;
+
 	/* Handle reltablespace for specific relkinds. */
 	switch (relkind)
 	{
@@ -400,13 +408,36 @@ heap_create(const char *relname,
 	/*
 	 * Have the storage manager create the relation's disk file, if needed.
 	 *
-	 * We only create the main fork here, other forks will be created on
-	 * demand.
+	 * For relations, the callback creates both the main and the init fork;
+	 * for indexes, only the main fork is created.  The other forks will be
+	 * created on demand.
 	 */
 	if (create_storage)
 	{
 		RelationOpenSmgr(rel);
-		RelationCreateStorage(rel->rd_node, relpersistence);
+
+		switch (rel->rd_rel->relkind)
+		{
+			case RELKIND_VIEW:
+			case RELKIND_COMPOSITE_TYPE:
+			case RELKIND_FOREIGN_TABLE:
+			case RELKIND_PARTITIONED_TABLE:
+			case RELKIND_PARTITIONED_INDEX:
+				Assert(false);
+				break;
+
+			case RELKIND_INDEX:
+			case RELKIND_SEQUENCE:
+				RelationCreateStorage(rel->rd_node, relpersistence);
+				break;
+
+			case RELKIND_RELATION:
+			case RELKIND_TOASTVALUE:
+			case RELKIND_MATVIEW:
+				table_set_new_filenode(rel, relpersistence,
+									   relfrozenxid, relminmxid);
+				break;
+		}
 	}
 
 	return rel;
@@ -892,6 +923,8 @@ AddNewRelationTuple(Relation pg_class_desc,
 					Oid reloftype,
 					Oid relowner,
 					char relkind,
+					TransactionId relfrozenxid,
+					TransactionId relminmxid,
 					Datum relacl,
 					Datum reloptions)
 {
@@ -928,40 +961,8 @@ AddNewRelationTuple(Relation pg_class_desc,
 			break;
 	}
 
-	/* Initialize relfrozenxid and relminmxid */
-	if (relkind == RELKIND_RELATION ||
-		relkind == RELKIND_MATVIEW ||
-		relkind == RELKIND_TOASTVALUE)
-	{
-		/*
-		 * Initialize to the minimum XID that could put tuples in the table.
-		 * We know that no xacts older than RecentXmin are still running, so
-		 * that will do.
-		 */
-		new_rel_reltup->relfrozenxid = RecentXmin;
-
-		/*
-		 * Similarly, initialize the minimum Multixact to the first value that
-		 * could possibly be stored in tuples in the table.  Running
-		 * transactions could reuse values from their local cache, so we are
-		 * careful to consider all currently running multis.
-		 *
-		 * XXX this could be refined further, but is it worth the hassle?
-		 */
-		new_rel_reltup->relminmxid = GetOldestMultiXactId();
-	}
-	else
-	{
-		/*
-		 * Other relation types will not contain XIDs, so set relfrozenxid to
-		 * InvalidTransactionId.  (Note: a sequence does contain a tuple, but
-		 * we force its xmin to be FrozenTransactionId always; see
-		 * commands/sequence.c.)
-		 */
-		new_rel_reltup->relfrozenxid = InvalidTransactionId;
-		new_rel_reltup->relminmxid = InvalidMultiXactId;
-	}
-
+	new_rel_reltup->relfrozenxid = relfrozenxid;
+	new_rel_reltup->relminmxid = relminmxid;
 	new_rel_reltup->relowner = relowner;
 	new_rel_reltup->reltype = new_type_oid;
 	new_rel_reltup->reloftype = reloftype;
@@ -1089,6 +1090,8 @@ heap_create_with_catalog(const char *relname,
 	Oid			new_type_oid;
 	ObjectAddress new_type_addr;
 	Oid			new_array_oid = InvalidOid;
+	TransactionId relfrozenxid;
+	MultiXactId relminmxid;
 
 	pg_class_desc = table_open(RelationRelationId, RowExclusiveLock);
 
@@ -1220,7 +1223,9 @@ heap_create_with_catalog(const char *relname,
 							   relpersistence,
 							   shared_relation,
 							   mapped_relation,
-							   allow_system_table_mods);
+							   allow_system_table_mods,
+							   &relfrozenxid,
+							   &relminmxid);
 
 	Assert(relid == RelationGetRelid(new_rel_desc));
 
@@ -1319,6 +1324,8 @@ heap_create_with_catalog(const char *relname,
 						reloftypeid,
 						ownerid,
 						relkind,
+						relfrozenxid,
+						relminmxid,
 						PointerGetDatum(relacl),
 						reloptions);
 
@@ -1407,14 +1414,6 @@ heap_create_with_catalog(const char *relname,
 	if (oncommit != ONCOMMIT_NOOP)
 		register_on_commit_action(relid, oncommit);
 
-	/*
-	 * Unlogged objects need an init fork, except for partitioned tables which
-	 * have no storage at all.
-	 */
-	if (relpersistence == RELPERSISTENCE_UNLOGGED &&
-		relkind != RELKIND_PARTITIONED_TABLE)
-		heap_create_init_fork(new_rel_desc);
-
 	/*
 	 * ok, the relation has been cataloged, so close our relations and return
 	 * the OID of the newly created relation.
@@ -1425,27 +1424,6 @@ heap_create_with_catalog(const char *relname,
 	return relid;
 }
 
-/*
- * Set up an init fork for an unlogged table so that it can be correctly
- * reinitialized on restart.  An immediate sync is required even if the
- * page has been logged, because the write did not go through
- * shared_buffers and therefore a concurrent checkpoint may have moved
- * the redo pointer past our xlog record.  Recovery may as well remove it
- * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
- * record. Therefore, logging is necessary even if wal_level=minimal.
- */
-void
-heap_create_init_fork(Relation rel)
-{
-	Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
-		   rel->rd_rel->relkind == RELKIND_MATVIEW ||
-		   rel->rd_rel->relkind == RELKIND_TOASTVALUE);
-	RelationOpenSmgr(rel);
-	smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
-	log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
-	smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
-}
-
 /*
  *		RelationRemoveInheritance
  *
@@ -3176,8 +3154,8 @@ heap_truncate_one_rel(Relation rel)
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		return;
 
-	/* Truncate the actual file (and discard buffers) */
-	RelationTruncate(rel, 0);
+	/* Truncate the underlying relation */
+	table_nontransactional_truncate(rel);
 
 	/* If the relation has indexes, truncate the indexes too */
 	RelationTruncateIndexes(rel);
@@ -3188,7 +3166,7 @@ heap_truncate_one_rel(Relation rel)
 	{
 		Relation	toastrel = table_open(toastrelid, AccessExclusiveLock);
 
-		RelationTruncate(toastrel, 0);
+		table_nontransactional_truncate(toastrel);
 		RelationTruncateIndexes(toastrel);
 		/* keep the lock... */
 		table_close(toastrel, NoLock);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f9ae483ab97..b738b4b750b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -737,6 +737,8 @@ index_create(Relation heapRelation,
 	bool		concurrent = (flags & INDEX_CREATE_CONCURRENT) != 0;
 	bool		partitioned = (flags & INDEX_CREATE_PARTITIONED) != 0;
 	char		relkind;
+	TransactionId relfrozenxid;
+	MultiXactId relminmxid;
 
 	/* constraint flags can only be set when a constraint is requested */
 	Assert((constr_flags == 0) ||
@@ -897,8 +899,12 @@ index_create(Relation heapRelation,
 								relpersistence,
 								shared_relation,
 								mapped_relation,
-								allow_system_table_mods);
+								allow_system_table_mods,
+								&relfrozenxid,
+								&relminmxid);
 
+	Assert(relfrozenxid == InvalidTransactionId);
+	Assert(relminmxid == InvalidMultiXactId);
 	Assert(indexRelationId == RelationGetRelid(indexRelation));
 
 	/*
@@ -2854,8 +2860,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 		}
 
 		/* We'll build a new physical relation for the index */
-		RelationSetNewRelfilenode(iRel, persistence, InvalidTransactionId,
-								  InvalidMultiXactId);
+		RelationSetNewRelfilenode(iRel, persistence);
 
 		/* Initialize the index and rebuild */
 		/* Note: we do not need to re-establish pkey setting */
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 0302507e6ff..72242b24761 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -19,6 +19,8 @@
 
 #include "postgres.h"
 
+#include "miscadmin.h"
+
 #include "access/visibilitymap.h"
 #include "access/xact.h"
 #include "access/xlog.h"
@@ -290,6 +292,92 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
 }
 
+/*
+ * Copy a fork's data, block by block.
+ */
+void
+RelationCopyStorage(SMgrRelation src, SMgrRelation dst,
+					ForkNumber forkNum, char relpersistence)
+{
+	PGAlignedBlock buf;
+	Page		page;
+	bool		use_wal;
+	bool		copying_initfork;
+	BlockNumber nblocks;
+	BlockNumber blkno;
+
+	page = (Page) buf.data;
+
+	/*
+	 * The init fork for an unlogged relation in many respects has to be
+	 * treated the same as normal relation, changes need to be WAL logged and
+	 * it needs to be synced to disk.
+	 */
+	copying_initfork = relpersistence == RELPERSISTENCE_UNLOGGED &&
+		forkNum == INIT_FORKNUM;
+
+	/*
+	 * We need to log the copied data in WAL iff WAL archiving/streaming is
+	 * enabled AND it's a permanent relation.
+	 */
+	use_wal = XLogIsNeeded() &&
+		(relpersistence == RELPERSISTENCE_PERMANENT || copying_initfork);
+
+	nblocks = smgrnblocks(src, forkNum);
+
+	for (blkno = 0; blkno < nblocks; blkno++)
+	{
+		/* If we got a cancel signal during the copy of the data, quit */
+		CHECK_FOR_INTERRUPTS();
+
+		smgrread(src, forkNum, blkno, buf.data);
+
+		if (!PageIsVerified(page, blkno))
+			ereport(ERROR,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("invalid page in block %u of relation %s",
+							blkno,
+							relpathbackend(src->smgr_rnode.node,
+										   src->smgr_rnode.backend,
+										   forkNum))));
+
+		/*
+		 * WAL-log the copied page. Unfortunately we don't know what kind of a
+		 * page this is, so we have to log the full page including any unused
+		 * space.
+		 */
+		if (use_wal)
+			log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page, false);
+
+		PageSetChecksumInplace(page, blkno);
+
+		/*
+		 * Now write the page.  We say isTemp = true even if it's not a temp
+		 * rel, because there's no need for smgr to schedule an fsync for this
+		 * write; we'll do it ourselves below.
+		 */
+		smgrextend(dst, forkNum, blkno, buf.data, true);
+	}
+
+	/*
+	 * If the rel is WAL-logged, must fsync before commit.  We use heap_sync
+	 * to ensure that the toast table gets fsync'd too.  (For a temp or
+	 * unlogged rel we don't care since the data will be gone after a crash
+	 * anyway.)
+	 *
+	 * It's obvious that we must do this when not WAL-logging the copy. It's
+	 * less obvious that we have to do it even if we did WAL-log the copied
+	 * pages. The reason is that since we're copying outside shared buffers, a
+	 * CHECKPOINT occurring during the copy has no way to flush the previously
+	 * written data to disk (indeed it won't know the new rel even exists).  A
+	 * crash later on would replay WAL from the checkpoint, therefore it
+	 * wouldn't replay our earlier WAL entries. If we do not fsync those pages
+	 * here, they might still not be on disk when the crash occurs.
+	 */
+	if (relpersistence == RELPERSISTENCE_PERMANENT || copying_initfork)
+		smgrimmedsync(dst, forkNum);
+}
+
 /*
  *	smgrDoPendingDeletes() -- Take care of relation deletes at end of xact.
  *
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 3e2a807640f..4c672238d02 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -21,7 +21,6 @@
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/relscan.h"
-#include "access/rewriteheap.h"
 #include "access/tableam.h"
 #include "access/transam.h"
 #include "access/tuptoaster.h"
@@ -43,7 +42,6 @@
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
-#include "storage/smgr.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/inval.h"
@@ -69,14 +67,10 @@ typedef struct
 
 
 static void rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose);
-static void copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
+static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 			   bool verbose, bool *pSwapToastByContent,
 			   TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
-static void reform_and_rewrite_tuple(HeapTuple tuple,
-						 TupleDesc oldTupDesc, TupleDesc newTupDesc,
-						 Datum *values, bool *isnull,
-						 RewriteState rwstate);
 
 
 /*---------------------------------------------------------------------------
@@ -598,7 +592,7 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 							   AccessExclusiveLock);
 
 	/* Copy the heap data into the new table in the desired order */
-	copy_heap_data(OIDNewHeap, tableOid, indexOid, verbose,
+	copy_table_data(OIDNewHeap, tableOid, indexOid, verbose,
 				   &swap_toast_by_content, &frozenXid, &cutoffMulti);
 
 	/*
@@ -741,7 +735,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
 }
 
 /*
- * Do the physical copying of heap data.
+ * Do the physical copying of table data.
  *
  * There are three output parameters:
  * *pSwapToastByContent is set true if toast tables must be swapped by content.
@@ -749,7 +743,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
  * *pCutoffMulti receives the MultiXactId used as a cutoff point.
  */
 static void
-copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
+copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 			   bool *pSwapToastByContent, TransactionId *pFreezeXid,
 			   MultiXactId *pCutoffMulti)
 {
@@ -759,30 +753,18 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	Relation	relRelation;
 	HeapTuple	reltup;
 	Form_pg_class relform;
-	TupleDesc	oldTupDesc;
-	TupleDesc	newTupDesc;
-	int			natts;
-	Datum	   *values;
-	bool	   *isnull;
-	IndexScanDesc indexScan;
-	TableScanDesc tableScan;
-	HeapScanDesc heapScan;
-	bool		use_wal;
-	bool		is_system_catalog;
+	TupleDesc	oldTupDesc PG_USED_FOR_ASSERTS_ONLY;
+	TupleDesc	newTupDesc PG_USED_FOR_ASSERTS_ONLY;
 	TransactionId OldestXmin;
 	TransactionId FreezeXid;
 	MultiXactId MultiXactCutoff;
-	RewriteState rwstate;
 	bool		use_sort;
-	Tuplesortstate *tuplesort;
 	double		num_tuples = 0,
 				tups_vacuumed = 0,
 				tups_recently_dead = 0;
 	BlockNumber num_pages;
 	int			elevel = verbose ? INFO : DEBUG2;
 	PGRUsage	ru0;
-	TupleTableSlot *slot;
-	BufferHeapTupleTableSlot *hslot;
 
 	pg_rusage_init(&ru0);
 
@@ -804,11 +786,6 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	newTupDesc = RelationGetDescr(NewHeap);
 	Assert(newTupDesc->natts == oldTupDesc->natts);
 
-	/* Preallocate values/isnull arrays */
-	natts = newTupDesc->natts;
-	values = (Datum *) palloc(natts * sizeof(Datum));
-	isnull = (bool *) palloc(natts * sizeof(bool));
-
 	/*
 	 * If the OldHeap has a toast table, get lock on the toast table to keep
 	 * it from being vacuumed.  This is needed because autovacuum processes
@@ -825,15 +802,6 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	if (OldHeap->rd_rel->reltoastrelid)
 		LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
 
-	/*
-	 * We need to log the copied data in WAL iff WAL archiving/streaming is
-	 * enabled AND it's a WAL-logged rel.
-	 */
-	use_wal = XLogIsNeeded() && RelationNeedsWAL(NewHeap);
-
-	/* use_wal off requires smgr_targblock be initially invalid */
-	Assert(RelationGetTargetBlock(NewHeap) == InvalidBlockNumber);
-
 	/*
 	 * If both tables have TOAST tables, perform toast swap by content.  It is
 	 * possible that the old table has a toast table but the new one doesn't,
@@ -894,13 +862,6 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	*pFreezeXid = FreezeXid;
 	*pCutoffMulti = MultiXactCutoff;
 
-	/* Remember if it's a system catalog */
-	is_system_catalog = IsSystemRelation(OldHeap);
-
-	/* Initialize the rewrite operation */
-	rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin, FreezeXid,
-								 MultiXactCutoff, use_wal);
-
 	/*
 	 * Decide whether to use an indexscan or seqscan-and-optional-sort to scan
 	 * the OldHeap.  We know how to use a sort to duplicate the ordering of a
@@ -913,44 +874,14 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	else
 		use_sort = false;
 
-	/* Set up sorting if wanted */
-	if (use_sort)
-		tuplesort = tuplesort_begin_cluster(oldTupDesc, OldIndex,
-											maintenance_work_mem,
-											NULL, false);
-	else
-		tuplesort = NULL;
-
-	/*
-	 * Prepare to scan the OldHeap.  To ensure we see recently-dead tuples
-	 * that still need to be copied, we scan with SnapshotAny and use
-	 * HeapTupleSatisfiesVacuum for the visibility test.
-	 */
-	if (OldIndex != NULL && !use_sort)
-	{
-		tableScan = NULL;
-		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, 0, 0);
-		index_rescan(indexScan, NULL, 0, NULL, 0);
-	}
-	else
-	{
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
-		heapScan = (HeapScanDesc) tableScan;
-		indexScan = NULL;
-	}
-
-	slot = table_slot_create(OldHeap, NULL);
-	hslot = (BufferHeapTupleTableSlot *) slot;
-
 	/* Log what we're doing */
-	if (indexScan != NULL)
+	if (OldIndex != NULL && !use_sort)
 		ereport(elevel,
 				(errmsg("clustering \"%s.%s\" using index scan on \"%s\"",
 						get_namespace_name(RelationGetNamespace(OldHeap)),
 						RelationGetRelationName(OldHeap),
 						RelationGetRelationName(OldIndex))));
-	else if (tuplesort != NULL)
+	else if (use_sort)
 		ereport(elevel,
 				(errmsg("clustering \"%s.%s\" using sequential scan and sort",
 						get_namespace_name(RelationGetNamespace(OldHeap)),
@@ -962,152 +893,12 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 						RelationGetRelationName(OldHeap))));
 
 	/*
-	 * Scan through the OldHeap, either in OldIndex order or sequentially;
-	 * copy each tuple into the NewHeap, or transiently to the tuplesort
-	 * module.  Note that we don't bother sorting dead tuples (they won't get
-	 * to the new table anyway).
+	 * Hand off the actual copying to an AM-specific function; the generic
+	 * code cannot know how to deal with visibility across AMs.
 	 */
-	for (;;)
-	{
-		HeapTuple	tuple;
-		Buffer		buf;
-		bool		isdead;
-
-		CHECK_FOR_INTERRUPTS();
-
-		if (indexScan != NULL)
-		{
-			if (!index_getnext_slot(indexScan, ForwardScanDirection, slot))
-				break;
-
-			/* Since we used no scan keys, should never need to recheck */
-			if (indexScan->xs_recheck)
-				elog(ERROR, "CLUSTER does not support lossy index conditions");
-
-			tuple = hslot->base.tuple;
-			buf = hslot->buffer;
-		}
-		else
-		{
-			tuple = heap_getnext(tableScan, ForwardScanDirection);
-			if (tuple == NULL)
-				break;
-
-			buf = heapScan->rs_cbuf;
-		}
-
-		LockBuffer(buf, BUFFER_LOCK_SHARE);
-
-		switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
-		{
-			case HEAPTUPLE_DEAD:
-				/* Definitely dead */
-				isdead = true;
-				break;
-			case HEAPTUPLE_RECENTLY_DEAD:
-				tups_recently_dead += 1;
-				/* fall through */
-			case HEAPTUPLE_LIVE:
-				/* Live or recently dead, must copy it */
-				isdead = false;
-				break;
-			case HEAPTUPLE_INSERT_IN_PROGRESS:
-
-				/*
-				 * Since we hold exclusive lock on the relation, normally the
-				 * only way to see this is if it was inserted earlier in our
-				 * own transaction.  However, it can happen in system
-				 * catalogs, since we tend to release write lock before commit
-				 * there.  Give a warning if neither case applies; but in any
-				 * case we had better copy it.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
-					elog(WARNING, "concurrent insert in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as live */
-				isdead = false;
-				break;
-			case HEAPTUPLE_DELETE_IN_PROGRESS:
-
-				/*
-				 * Similar situation to INSERT_IN_PROGRESS case.
-				 */
-				if (!is_system_catalog &&
-					!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
-					elog(WARNING, "concurrent delete in progress within table \"%s\"",
-						 RelationGetRelationName(OldHeap));
-				/* treat as recently dead */
-				tups_recently_dead += 1;
-				isdead = false;
-				break;
-			default:
-				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
-				isdead = false; /* keep compiler quiet */
-				break;
-		}
-
-		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-
-		if (isdead)
-		{
-			tups_vacuumed += 1;
-			/* heap rewrite module still needs to see it... */
-			if (rewrite_heap_dead_tuple(rwstate, tuple))
-			{
-				/* A previous recently-dead tuple is now known dead */
-				tups_vacuumed += 1;
-				tups_recently_dead -= 1;
-			}
-			continue;
-		}
-
-		num_tuples += 1;
-		if (tuplesort != NULL)
-			tuplesort_putheaptuple(tuplesort, tuple);
-		else
-			reform_and_rewrite_tuple(tuple,
-									 oldTupDesc, newTupDesc,
-									 values, isnull,
-									 rwstate);
-	}
-
-	if (indexScan != NULL)
-		index_endscan(indexScan);
-	if (heapScan != NULL)
-		table_endscan(tableScan);
-	if (slot)
-		ExecDropSingleTupleTableSlot(slot);
-
-	/*
-	 * In scan-and-sort mode, complete the sort, then read out all live tuples
-	 * from the tuplestore and write them to the new relation.
-	 */
-	if (tuplesort != NULL)
-	{
-		tuplesort_performsort(tuplesort);
-
-		for (;;)
-		{
-			HeapTuple	tuple;
-
-			CHECK_FOR_INTERRUPTS();
-
-			tuple = tuplesort_getheaptuple(tuplesort, true);
-			if (tuple == NULL)
-				break;
-
-			reform_and_rewrite_tuple(tuple,
-									 oldTupDesc, newTupDesc,
-									 values, isnull,
-									 rwstate);
-		}
-
-		tuplesort_end(tuplesort);
-	}
-
-	/* Write out any remaining tuples, and fsync if needed */
-	end_heap_rewrite(rwstate);
+	table_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
+						   OldestXmin, FreezeXid, MultiXactCutoff,
+						   &num_tuples, &tups_vacuumed, &tups_recently_dead);
 
 	/* Reset rd_toastoid just to be tidy --- it shouldn't be looked at again */
 	NewHeap->rd_toastoid = InvalidOid;
@@ -1125,10 +916,6 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 					   tups_recently_dead,
 					   pg_rusage_show(&ru0))));
 
-	/* Clean up */
-	pfree(values);
-	pfree(isnull);
-
 	if (OldIndex != NULL)
 		index_close(OldIndex, NoLock);
 	table_close(OldHeap, NoLock);
@@ -1751,46 +1538,3 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 	return rvs;
 }
-
-
-/*
- * Reconstruct and rewrite the given tuple
- *
- * We cannot simply copy the tuple as-is, for several reasons:
- *
- * 1. We'd like to squeeze out the values of any dropped columns, both
- * to save space and to ensure we have no corner-case failures. (It's
- * possible for example that the new table hasn't got a TOAST table
- * and so is unable to store any large values of dropped cols.)
- *
- * 2. The tuple might not even be legal for the new table; this is
- * currently only known to happen as an after-effect of ALTER TABLE
- * SET WITHOUT OIDS (in an older version, via pg_upgrade).
- *
- * So, we must reconstruct the tuple from component Datums.
- */
-static void
-reform_and_rewrite_tuple(HeapTuple tuple,
-						 TupleDesc oldTupDesc, TupleDesc newTupDesc,
-						 Datum *values, bool *isnull,
-						 RewriteState rwstate)
-{
-	HeapTuple	copiedTuple;
-	int			i;
-
-	heap_deform_tuple(tuple, oldTupDesc, values, isnull);
-
-	/* Be sure to null out any dropped columns */
-	for (i = 0; i < newTupDesc->natts; i++)
-	{
-		if (TupleDescAttr(newTupDesc, i)->attisdropped)
-			isnull[i] = true;
-	}
-
-	copiedTuple = heap_form_tuple(newTupDesc, values, isnull);
-
-	/* The heap rewrite module does the rest */
-	rewrite_heap_tuple(rwstate, tuple, copiedTuple);
-
-	heap_freetuple(copiedTuple);
-}
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 574b46a2812..e9add1b9873 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -312,12 +312,17 @@ ResetSequence(Oid seq_relid)
 	seq->log_cnt = 0;
 
 	/*
-	 * Create a new storage file for the sequence.  We want to keep the
-	 * sequence's relfrozenxid at 0, since it won't contain any unfrozen XIDs.
-	 * Same with relminmxid, since a sequence will never contain multixacts.
+	 * Create a new storage file for the sequence.
 	 */
-	RelationSetNewRelfilenode(seq_rel, seq_rel->rd_rel->relpersistence,
-							  InvalidTransactionId, InvalidMultiXactId);
+	RelationSetNewRelfilenode(seq_rel, seq_rel->rd_rel->relpersistence);
+
+	/*
+	 * Ensure the sequence's relfrozenxid is at 0, since it won't contain any
+	 * unfrozen XIDs.  Same with relminmxid, since a sequence will never
+	 * contain multixacts.
+	 */
+	Assert(seq_rel->rd_rel->relfrozenxid == InvalidTransactionId);
+	Assert(seq_rel->rd_rel->relminmxid == InvalidMultiXactId);
 
 	/*
 	 * Insert the modified tuple into the new storage file.
@@ -482,12 +487,17 @@ AlterSequence(ParseState *pstate, AlterSeqStmt *stmt)
 
 		/*
 		 * Create a new storage file for the sequence, making the state
-		 * changes transactional.  We want to keep the sequence's relfrozenxid
-		 * at 0, since it won't contain any unfrozen XIDs.  Same with
-		 * relminmxid, since a sequence will never contain multixacts.
+		 * changes transactional.
 		 */
-		RelationSetNewRelfilenode(seqrel, seqrel->rd_rel->relpersistence,
-								  InvalidTransactionId, InvalidMultiXactId);
+		RelationSetNewRelfilenode(seqrel, seqrel->rd_rel->relpersistence);
+
+		/*
+		 * Ensure the sequence's relfrozenxid is at 0, since it won't contain
+		 * unfrozen XIDs.  Same with relminmxid, since a sequence will never
+		 * contain multixacts.
+		 */
+		Assert(seqrel->rd_rel->relfrozenxid == InvalidTransactionId);
+		Assert(seqrel->rd_rel->relminmxid == InvalidMultiXactId);
 
 		/*
 		 * Insert the modified tuple into the new storage file.
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1f5a7e93155..8419342c690 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -20,6 +20,6 @@
 #include "access/multixact.h"
 #include "access/reloptions.h"
 #include "access/relscan.h"
 #include "access/sysattr.h"
 #include "access/tableam.h"
 #include "access/tupconvert.h"
@@ -468,8 +469,7 @@ static void ATExecEnableRowSecurity(Relation rel);
 static void ATExecDisableRowSecurity(Relation rel);
 static void ATExecForceNoForceRowSecurity(Relation rel, bool force_rls);
 
-static void copy_relation_data(SMgrRelation rel, SMgrRelation dst,
-				   ForkNumber forkNum, char relpersistence);
+static void index_copy_data(Relation rel, RelFileNode newrnode);
 static const char *storage_name(char c);
 
 static void RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid,
@@ -1692,7 +1692,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 		{
 			Oid			heap_relid;
 			Oid			toast_relid;
-			MultiXactId minmulti;
 
 			/*
 			 * This effectively deletes all rows in the table, and may be done
@@ -1702,8 +1701,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 			 */
 			CheckTableForSerializableConflictIn(rel);
 
-			minmulti = GetOldestMultiXactId();
-
 			/*
 			 * Need the full transaction-safe pushups.
 			 *
@@ -1711,10 +1708,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 			 * as the relfilenode value. The old storage file is scheduled for
 			 * deletion at commit.
 			 */
-			RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
-									  RecentXmin, minmulti);
-			if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
-				heap_create_init_fork(rel);
+			RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence);
 
 			heap_relid = RelationGetRelid(rel);
 
@@ -1726,12 +1720,8 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 			{
 				Relation	toastrel = relation_open(toast_relid,
 													 AccessExclusiveLock);
-
 				RelationSetNewRelfilenode(toastrel,
-										  toastrel->rd_rel->relpersistence,
-										  RecentXmin, minmulti);
-				if (toastrel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
-					heap_create_init_fork(toastrel);
+										  toastrel->rd_rel->relpersistence);
 				table_close(toastrel, NoLock);
 			}
 
@@ -4922,13 +4912,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 
 			/* Write the tuple out to the new relation */
 			if (newrel)
-			{
-				HeapTuple	tuple;
-
-				tuple = ExecFetchSlotHeapTuple(newslot, true, NULL);
-				heap_insert(newrel, tuple, mycid, hi_options, bistate);
-				ItemPointerCopy(&tuple->t_self, &newslot->tts_tid);
-			}
+				table_insert(newrel, insertslot, mycid, hi_options, bistate);
 
 			ResetExprContext(econtext);
 
@@ -11379,11 +11363,9 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 	Oid			reltoastrelid;
 	Oid			newrelfilenode;
 	RelFileNode newrnode;
-	SMgrRelation dstrel;
 	Relation	pg_class;
 	HeapTuple	tuple;
 	Form_pg_class rd_rel;
-	ForkNumber	forkNum;
 	List	   *reltoastidxids = NIL;
 	ListCell   *lc;
 
@@ -11468,46 +11450,20 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 	newrnode = rel->rd_node;
 	newrnode.relNode = newrelfilenode;
 	newrnode.spcNode = newTableSpace;
-	dstrel = smgropen(newrnode, rel->rd_backend);
 
-	RelationOpenSmgr(rel);
-
-	/*
-	 * Create and copy all forks of the relation, and schedule unlinking of
-	 * old physical files.
-	 *
-	 * NOTE: any conflict in relfilenode value will be caught in
-	 * RelationCreateStorage().
-	 */
-	RelationCreateStorage(newrnode, rel->rd_rel->relpersistence);
-
-	/* copy main fork */
-	copy_relation_data(rel->rd_smgr, dstrel, MAIN_FORKNUM,
-					   rel->rd_rel->relpersistence);
-
-	/* copy those extra forks that exist */
-	for (forkNum = MAIN_FORKNUM + 1; forkNum <= MAX_FORKNUM; forkNum++)
+	/* hand off to the AM to create the new filenode and copy the data */
+	if (rel->rd_rel->relkind == RELKIND_INDEX)
 	{
-		if (smgrexists(rel->rd_smgr, forkNum))
-		{
-			smgrcreate(dstrel, forkNum, false);
-
-			/*
-			 * WAL log creation if the relation is persistent, or this is the
-			 * init fork of an unlogged relation.
-			 */
-			if (rel->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT ||
-				(rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
-				 forkNum == INIT_FORKNUM))
-				log_smgrcreate(&newrnode, forkNum);
-			copy_relation_data(rel->rd_smgr, dstrel, forkNum,
-							   rel->rd_rel->relpersistence);
-		}
+		index_copy_data(rel, newrnode);
+	}
+	else
+	{
+		Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
+			   rel->rd_rel->relkind == RELKIND_MATVIEW ||
+			   rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ||
+			   rel->rd_rel->relkind == RELKIND_TOASTVALUE);
+		table_relation_copy_data(rel, newrnode);
 	}
-
-	/* drop old relation, and close new one */
-	RelationDropStorage(rel);
-	smgrclose(dstrel);
 
 	/* update the pg_class row */
 	rd_rel->reltablespace = (newTableSpace == MyDatabaseTableSpace) ? InvalidOid : newTableSpace;
@@ -11769,90 +11725,52 @@ AlterTableMoveAll(AlterTableMoveAllStmt *stmt)
 	return new_tablespaceoid;
 }
 
-/*
- * Copy data, block by block
- */
 static void
-copy_relation_data(SMgrRelation src, SMgrRelation dst,
-				   ForkNumber forkNum, char relpersistence)
+index_copy_data(Relation rel, RelFileNode newrnode)
 {
-	PGAlignedBlock buf;
-	Page		page;
-	bool		use_wal;
-	bool		copying_initfork;
-	BlockNumber nblocks;
-	BlockNumber blkno;
+	SMgrRelation dstrel;
 
-	page = (Page) buf.data;
+	dstrel = smgropen(newrnode, rel->rd_backend);
+	RelationOpenSmgr(rel);
 
 	/*
-	 * The init fork for an unlogged relation in many respects has to be
-	 * treated the same as normal relation, changes need to be WAL logged and
-	 * it needs to be synced to disk.
+	 * Create and copy all forks of the relation, and schedule unlinking of
+	 * old physical files.
+	 *
+	 * NOTE: any conflict in relfilenode value will be caught in
+	 * RelationCreateStorage().
 	 */
-	copying_initfork = relpersistence == RELPERSISTENCE_UNLOGGED &&
-		forkNum == INIT_FORKNUM;
+	RelationCreateStorage(newrnode, rel->rd_rel->relpersistence);
 
-	/*
-	 * We need to log the copied data in WAL iff WAL archiving/streaming is
-	 * enabled AND it's a permanent relation.
-	 */
-	use_wal = XLogIsNeeded() &&
-		(relpersistence == RELPERSISTENCE_PERMANENT || copying_initfork);
+	/* copy main fork */
+	RelationCopyStorage(rel->rd_smgr, dstrel, MAIN_FORKNUM,
+						rel->rd_rel->relpersistence);
 
-	nblocks = smgrnblocks(src, forkNum);
-
-	for (blkno = 0; blkno < nblocks; blkno++)
+	/* copy those extra forks that exist */
+	for (ForkNumber forkNum = MAIN_FORKNUM + 1;
+		 forkNum <= MAX_FORKNUM; forkNum++)
 	{
-		/* If we got a cancel signal during the copy of the data, quit */
-		CHECK_FOR_INTERRUPTS();
+		if (smgrexists(rel->rd_smgr, forkNum))
+		{
+			smgrcreate(dstrel, forkNum, false);
 
-		smgrread(src, forkNum, blkno, buf.data);
-
-		if (!PageIsVerified(page, blkno))
-			ereport(ERROR,
-					(errcode(ERRCODE_DATA_CORRUPTED),
-					 errmsg("invalid page in block %u of relation %s",
-							blkno,
-							relpathbackend(src->smgr_rnode.node,
-										   src->smgr_rnode.backend,
-										   forkNum))));
-
-		/*
-		 * WAL-log the copied page. Unfortunately we don't know what kind of a
-		 * page this is, so we have to log the full page including any unused
-		 * space.
-		 */
-		if (use_wal)
-			log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page, false);
-
-		PageSetChecksumInplace(page, blkno);
-
-		/*
-		 * Now write the page.  We say isTemp = true even if it's not a temp
-		 * rel, because there's no need for smgr to schedule an fsync for this
-		 * write; we'll do it ourselves below.
-		 */
-		smgrextend(dst, forkNum, blkno, buf.data, true);
+			/*
+			 * WAL log creation if the relation is persistent, or this is the
+			 * init fork of an unlogged relation.
+			 */
+			if (rel->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT ||
+				(rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
+				 forkNum == INIT_FORKNUM))
+				log_smgrcreate(&newrnode, forkNum);
+			RelationCopyStorage(rel->rd_smgr, dstrel, forkNum,
+								rel->rd_rel->relpersistence);
+		}
 	}
 
-	/*
-	 * If the rel is WAL-logged, must fsync before commit.  We use heap_sync
-	 * to ensure that the toast table gets fsync'd too.  (For a temp or
-	 * unlogged rel we don't care since the data will be gone after a crash
-	 * anyway.)
-	 *
-	 * It's obvious that we must do this when not WAL-logging the copy. It's
-	 * less obvious that we have to do it even if we did WAL-log the copied
-	 * pages. The reason is that since we're copying outside shared buffers, a
-	 * CHECKPOINT occurring during the copy has no way to flush the previously
-	 * written data to disk (indeed it won't know the new rel even exists).  A
-	 * crash later on would replay WAL from the checkpoint, therefore it
-	 * wouldn't replay our earlier WAL entries. If we do not fsync those pages
-	 * here, they might still not be on disk when the crash occurs.
-	 */
-	if (relpersistence == RELPERSISTENCE_PERMANENT || copying_initfork)
-		smgrimmedsync(dst, forkNum);
+
+	/* drop old relation, and close new one */
+	RelationDropStorage(rel);
+	smgrclose(dstrel);
 }
 
 /*
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index d9ffb784843..97878ca692a 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3375,31 +3375,16 @@ RelationBuildLocalRelation(const char *relname,
  * such as TRUNCATE or rebuilding an index from scratch.
  *
  * Caller must already hold exclusive lock on the relation.
- *
- * The relation is marked with relfrozenxid = freezeXid (InvalidTransactionId
- * must be passed for indexes and sequences).  This should be a lower bound on
- * the XIDs that will be put into the new relation contents.
- *
- * The new filenode's persistence is set to the given value.  This is useful
- * for the cases that are changing the relation's persistence; other callers
- * need to pass the original relpersistence value.
  */
 void
-RelationSetNewRelfilenode(Relation relation, char persistence,
-						  TransactionId freezeXid, MultiXactId minmulti)
+RelationSetNewRelfilenode(Relation relation, char persistence)
 {
 	Oid			newrelfilenode;
-	RelFileNodeBackend newrnode;
 	Relation	pg_class;
 	HeapTuple	tuple;
 	Form_pg_class classform;
-
-	/* Indexes, sequences must have Invalid frozenxid; other rels must not */
-	Assert((relation->rd_rel->relkind == RELKIND_INDEX ||
-			relation->rd_rel->relkind == RELKIND_SEQUENCE) ?
-		   freezeXid == InvalidTransactionId :
-		   TransactionIdIsNormal(freezeXid));
-	Assert(TransactionIdIsNormal(freezeXid) == MultiXactIdIsValid(minmulti));
+	MultiXactId minmulti = InvalidMultiXactId;
+	TransactionId freezeXid = InvalidTransactionId;
 
 	/* Allocate a new relfilenode */
 	newrelfilenode = GetNewRelFileNode(relation->rd_rel->reltablespace, NULL,
@@ -3417,18 +3402,6 @@ RelationSetNewRelfilenode(Relation relation, char persistence,
 			 RelationGetRelid(relation));
 	classform = (Form_pg_class) GETSTRUCT(tuple);
 
-	/*
-	 * Create storage for the main fork of the new relfilenode.
-	 *
-	 * NOTE: any conflict in relfilenode value will be caught here, if
-	 * GetNewRelFileNode messes up for any reason.
-	 */
-	newrnode.node = relation->rd_node;
-	newrnode.node.relNode = newrelfilenode;
-	newrnode.backend = relation->rd_backend;
-	RelationCreateStorage(newrnode.node, persistence);
-	smgrclosenode(newrnode);
-
 	/*
 	 * Schedule unlinking of the old storage at transaction commit.
 	 */
@@ -3443,9 +3416,51 @@ RelationSetNewRelfilenode(Relation relation, char persistence,
 		RelationMapUpdateMap(RelationGetRelid(relation),
 							 newrelfilenode,
 							 relation->rd_rel->relisshared,
-							 false);
+							 true);
 	else
+	{
+		relation->rd_rel->relfilenode = newrelfilenode;
 		classform->relfilenode = newrelfilenode;
+	}
+
+	RelationInitPhysicalAddr(relation);
+
+	/*
+	 * Create storage for the main fork of the new relfilenode.  If it's a
+	 * table-like object, call into the table AM to do so, which'll also
+	 * create the table's init fork.
+	 *
+	 * NOTE: any conflict in relfilenode value will be caught here, if
+	 * GetNewRelFileNode messes up for any reason.
+	 */
+
+	/*
+	 * Create storage for relation.
+	 */
+	switch (relation->rd_rel->relkind)
+	{
+		/* shouldn't be called for these */
+		case RELKIND_VIEW:
+		case RELKIND_COMPOSITE_TYPE:
+		case RELKIND_FOREIGN_TABLE:
+		case RELKIND_PARTITIONED_TABLE:
+		case RELKIND_PARTITIONED_INDEX:
+			elog(ERROR, "should not have storage");
+			break;
+
+		case RELKIND_INDEX:
+		case RELKIND_SEQUENCE:
+			RelationCreateStorage(relation->rd_node, persistence);
+			RelationOpenSmgr(relation);
+			break;
+
+		case RELKIND_RELATION:
+		case RELKIND_TOASTVALUE:
+		case RELKIND_MATVIEW:
+			table_set_new_filenode(relation, persistence,
+								   &freezeXid, &minmulti);
+			break;
+	}
 
 	/* These changes are safe even for a mapped relation */
 	if (relation->rd_rel->relkind != RELKIND_SEQUENCE)
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 7b10fd2974c..60b96df8f98 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -3818,12 +3818,13 @@ comparetup_cluster(const SortTuple *a, const SortTuple *b,
 static void
 copytup_cluster(Tuplesortstate *state, SortTuple *stup, void *tup)
 {
-	HeapTuple	tuple = (HeapTuple) tup;
+	TupleTableSlot *slot = (TupleTableSlot *) tup;
+	HeapTuple	tuple;
 	Datum		original;
 	MemoryContext oldcontext = MemoryContextSwitchTo(state->tuplecontext);
 
 	/* copy the tuple into sort storage */
-	tuple = heap_copytuple(tuple);
+	tuple = ExecCopySlotHeapTuple(slot);
 	stup->tuple = (void *) tuple;
 	USEMEM(state, GetMemoryChunkSpace(tuple));
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 044d6b04e86..a0cbea9ba00 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -82,6 +82,9 @@ typedef struct IndexFetchHeapData
 	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
 } IndexFetchHeapData;
 
+/* struct definition is private to rewriteheap.c */
+typedef struct RewriteStateData *RewriteState;
+
 /* Result codes for HeapTupleSatisfiesVacuum */
 typedef enum
 {
@@ -224,4 +227,13 @@ extern bool ResolveCminCmaxDuringDecoding(struct HTAB *tuplecid_data,
 							  Buffer buffer,
 							  CommandId *cmin, CommandId *cmax);
 
+/* in heap/rewriteheap.c */
+extern RewriteState begin_heap_rewrite(Relation OldHeap, Relation NewHeap,
+				   TransactionId OldestXmin, TransactionId FreezeXid,
+				   MultiXactId MultiXactCutoff, bool use_wal);
+extern void end_heap_rewrite(RewriteState state);
+extern void rewrite_heap_tuple(RewriteState state, HeapTuple oldTuple,
+				   HeapTuple newTuple);
+extern bool rewrite_heap_dead_tuple(RewriteState state, HeapTuple oldTuple);
+
 #endif							/* HEAPAM_H */
diff --git a/src/include/access/rewriteheap.h b/src/include/access/rewriteheap.h
index 6006249d962..6c4ebccb9bd 100644
--- a/src/include/access/rewriteheap.h
+++ b/src/include/access/rewriteheap.h
@@ -18,17 +18,6 @@
 #include "storage/relfilenode.h"
 #include "utils/relcache.h"
 
-/* struct definition is private to rewriteheap.c */
-typedef struct RewriteStateData *RewriteState;
-
-extern RewriteState begin_heap_rewrite(Relation OldHeap, Relation NewHeap,
-				   TransactionId OldestXmin, TransactionId FreezeXid,
-				   MultiXactId MultiXactCutoff, bool use_wal);
-extern void end_heap_rewrite(RewriteState state);
-extern void rewrite_heap_tuple(RewriteState state, HeapTuple oldTuple,
-				   HeapTuple newTuple);
-extern bool rewrite_heap_dead_tuple(RewriteState state, HeapTuple oldTuple);
-
 /*
  * On-Disk data format for an individual logical rewrite mapping.
  */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index bd2cdd34e08..125ed1c012a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -302,6 +302,16 @@ typedef struct TableAmRoutine
 	 * ------------------------------------------------------------------------
 	 */
 
+	void		(*relation_set_new_filenode) (Relation rel,
+											  char persistence,
+											  TransactionId *freezeXid,
+											  MultiXactId *minmulti);
+	void		(*relation_nontransactional_truncate) (Relation rel);
+	void		(*relation_copy_data) (Relation rel, RelFileNode newrnode);
+	void		(*relation_copy_for_cluster) (Relation NewHeap, Relation OldHeap, Relation OldIndex,
+											  bool use_sort,
+											  TransactionId OldestXmin, TransactionId FreezeXid, MultiXactId MultiXactCutoff,
+											  double *num_tuples, double *tups_vacuumed, double *tups_recently_dead);
 	double		(*index_build_range_scan) (Relation heap_rel,
 										   Relation index_rel,
 										   IndexInfo *index_nfo,
@@ -735,6 +745,40 @@ table_finish_bulk_insert(Relation rel, int options)
  * ------------------------------------------------------------------------
  */
 
+static inline void
+table_set_new_filenode(Relation rel, char persistence,
+					   TransactionId *freezeXid, MultiXactId *minmulti)
+{
+	rel->rd_tableam->relation_set_new_filenode(rel, persistence,
+											   freezeXid, minmulti);
+}
+
+static inline void
+table_nontransactional_truncate(Relation rel)
+{
+	rel->rd_tableam->relation_nontransactional_truncate(rel);
+}
+
+static inline void
+table_relation_copy_data(Relation rel, RelFileNode newrnode)
+{
+	rel->rd_tableam->relation_copy_data(rel, newrnode);
+}
+
+
+/* XXX: Move arguments to struct? */
+static inline void
+table_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
+					   bool use_sort,
+					   TransactionId OldestXmin, TransactionId FreezeXid, MultiXactId MultiXactCutoff,
+					   double *num_tuples, double *tups_vacuumed, double *tups_recently_dead)
+{
+	OldHeap->rd_tableam->relation_copy_for_cluster(OldHeap, NewHeap, OldIndex,
+												   use_sort,
+												   OldestXmin, FreezeXid, MultiXactCutoff,
+												   num_tuples, tups_vacuumed, tups_recently_dead);
+}
+
 static inline double
 table_index_build_scan(Relation heap_rel,
 					   Relation index_rel,
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 85076d07437..f58d74edca1 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -55,7 +55,9 @@ extern Relation heap_create(const char *relname,
 			char relpersistence,
 			bool shared_relation,
 			bool mapped_relation,
-			bool allow_system_table_mods);
+			bool allow_system_table_mods,
+			TransactionId *relfrozenxid,
+			MultiXactId *relminmxid);
 
 extern Oid heap_create_with_catalog(const char *relname,
 						 Oid relnamespace,
@@ -79,8 +81,6 @@ extern Oid heap_create_with_catalog(const char *relname,
 						 Oid relrewrite,
 						 ObjectAddress *typaddress);
 
-extern void heap_create_init_fork(Relation rel);
-
 extern void heap_drop_with_catalog(Oid relid);
 
 extern void heap_truncate(List *relids);
diff --git a/src/include/catalog/storage.h b/src/include/catalog/storage.h
index 9f638be9249..882dc65c893 100644
--- a/src/include/catalog/storage.h
+++ b/src/include/catalog/storage.h
@@ -16,12 +16,15 @@
 
 #include "storage/block.h"
 #include "storage/relfilenode.h"
+#include "storage/smgr.h"
 #include "utils/relcache.h"
 
 extern void RelationCreateStorage(RelFileNode rnode, char relpersistence);
 extern void RelationDropStorage(Relation rel);
 extern void RelationPreserveStorage(RelFileNode rnode, bool atCommit);
 extern void RelationTruncate(Relation rel, BlockNumber nblocks);
+extern void RelationCopyStorage(SMgrRelation src, SMgrRelation dst,
+								ForkNumber forkNum, char relpersistence);
 
 /*
  * These functions used to be in storage/smgr/smgr.c, which explains the
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8f5bd676498..809d6aa1236 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -110,8 +110,7 @@ extern Relation RelationBuildLocalRelation(const char *relname,
 /*
  * Routine to manage assignment of new relfilenode to a relation
  */
-extern void RelationSetNewRelfilenode(Relation relation, char persistence,
-						  TransactionId freezeXid, MultiXactId minmulti);
+extern void RelationSetNewRelfilenode(Relation relation, char persistence);
 
 /*
  * Routines for flushing/rebuilding relcache entries in various scenarios
-- 
2.21.0.dirty

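To make the new division of labor concrete: under this scheme the AM, not
generic code, decides the initial relfrozenxid/relminmxid and creates any
extra forks. A minimal sketch of what the heap AM's relation_set_new_filenode
callback could look like, reconstructed from the pieces visible in these
diffs (illustrative only, not the patch's verbatim implementation):

static void
heapam_set_new_filenode(Relation rel, char persistence,
						TransactionId *freezeXid, MultiXactId *minmulti)
{
	/*
	 * Report the horizons the relcache should put into pg_class; these
	 * replace the values ExecuteTruncateGuts() previously computed itself.
	 */
	*freezeXid = RecentXmin;
	*minmulti = GetOldestMultiXactId();

	RelationCreateStorage(rel->rd_node, persistence);
	RelationOpenSmgr(rel);

	/*
	 * Unlogged relations also need an init fork (previously created via
	 * heap_create_init_fork()), so their contents can be reset after a
	 * crash.
	 */
	if (persistence == RELPERSISTENCE_UNLOGGED)
	{
		smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
		log_smgrcreate(&rel->rd_node, INIT_FORKNUM);
		smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
	}
}
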
Attachment: v18-0012-tableam-VACUUM-and-ANALYZE.patch (text/x-diff)
From 5db15d85ce237c5b288c093bcdad53082a78fab3 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 20 Jan 2019 00:02:52 -0800
Subject: [PATCH v18 12/18] tableam: VACUUM and ANALYZE.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/heapam_handler.c | 160 ++++++++++++++++++
 src/backend/commands/analyze.c           | 205 +++++------------------
 src/backend/commands/vacuum.c            |   2 +-
 src/include/access/tableam.h             |  25 +++
 4 files changed, 229 insertions(+), 163 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 9e67d48b6ea..b35ed21581e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -678,6 +678,163 @@ heapam_relation_copy_data(Relation rel, RelFileNode newrnode)
 	smgrclose(dstrel);
 }
 
+static void
+heapam_scan_analyze_next_block(TableScanDesc sscan, BlockNumber blockno, BufferAccessStrategy bstrategy)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+
+	/*
+	 * We must maintain a pin on the target page's buffer to ensure that the
+	 * maxoffset value stays good (else concurrent VACUUM might delete tuples
+	 * out from under us).  Hence, pin the page until we are done looking at
+	 * it.  We also choose to hold sharelock on the buffer throughout --- we
+	 * could release and re-acquire sharelock for each tuple, but since we
+	 * aren't doing much work per tuple, the extra lock traffic is probably
+	 * better avoided.
+	 */
+	scan->rs_cblock = blockno;
+	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, blockno,
+									   RBM_NORMAL, bstrategy);
+	scan->rs_cindex = FirstOffsetNumber;
+	LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+}
+
+static bool
+heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, double *liverows, double *deadrows, TupleTableSlot *slot)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+	Page		targpage;
+	OffsetNumber maxoffset;
+	BufferHeapTupleTableSlot *hslot;
+
+	Assert(TTS_IS_BUFFERTUPLE(slot));
+
+	hslot = (BufferHeapTupleTableSlot *) slot;
+	targpage = BufferGetPage(scan->rs_cbuf);
+	maxoffset = PageGetMaxOffsetNumber(targpage);
+
+	/* Inner loop over all tuples on the selected page */
+	for (; scan->rs_cindex <= maxoffset; scan->rs_cindex++)
+	{
+		ItemId		itemid;
+		HeapTuple	targtuple = &hslot->base.tupdata;
+		bool		sample_it = false;
+
+		itemid = PageGetItemId(targpage, scan->rs_cindex);
+
+		/*
+		 * We ignore unused and redirect line pointers.  DEAD line pointers
+		 * should be counted as dead, because we need vacuum to run to get rid
+		 * of them.  Note that this rule agrees with the way that
+		 * heap_page_prune() counts things.
+		 */
+		if (!ItemIdIsNormal(itemid))
+		{
+			if (ItemIdIsDead(itemid))
+				*deadrows += 1;
+			continue;
+		}
+
+		ItemPointerSet(&targtuple->t_self, scan->rs_cblock, scan->rs_cindex);
+
+		targtuple->t_tableOid = RelationGetRelid(scan->rs_base.rs_rd);
+		targtuple->t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
+		targtuple->t_len = ItemIdGetLength(itemid);
+
+		switch (HeapTupleSatisfiesVacuum(targtuple, OldestXmin, scan->rs_cbuf))
+		{
+			case HEAPTUPLE_LIVE:
+				sample_it = true;
+				*liverows += 1;
+				break;
+
+			case HEAPTUPLE_DEAD:
+			case HEAPTUPLE_RECENTLY_DEAD:
+				/* Count dead and recently-dead rows */
+				*deadrows += 1;
+				break;
+
+			case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+				/*
+				 * Insert-in-progress rows are not counted.  We assume that
+				 * when the inserting transaction commits or aborts, it will
+				 * send a stats message to increment the proper count.  This
+				 * works right only if that transaction ends after we finish
+				 * analyzing the table; if things happen in the other order,
+				 * its stats update will be overwritten by ours.  However, the
+				 * error will be large only if the other transaction runs long
+				 * enough to insert many tuples, so assuming it will finish
+				 * after us is the safer option.
+				 *
+				 * A special case is that the inserting transaction might be
+				 * our own.  In this case we should count and sample the row,
+				 * to accommodate users who load a table and analyze it in one
+				 * transaction.  (pgstat_report_analyze has to adjust the
+				 * numbers we send to the stats collector to make this come
+				 * out right.)
+				 */
+				if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(targtuple->t_data)))
+				{
+					sample_it = true;
+					*liverows += 1;
+				}
+				break;
+
+			case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+				/*
+				 * We count and sample delete-in-progress rows the same as
+				 * live ones, so that the stats counters come out right if the
+				 * deleting transaction commits after us, per the same
+				 * reasoning given above.
+				 *
+				 * If the delete was done by our own transaction, however, we
+				 * must count the row as dead to make pgstat_report_analyze's
+				 * stats adjustments come out right.  (Note: this works out
+				 * properly when the row was both inserted and deleted in our
+				 * xact.)
+				 *
+				 * The net effect of these choices is that we act as though an
+				 * IN_PROGRESS transaction hasn't happened yet, except if it
+				 * is our own transaction, which we assume has happened.
+				 *
+				 * This approach ensures that we behave sanely if we see both
+				 * the pre-image and post-image rows for a row being updated
+				 * by a concurrent transaction: we will sample the pre-image
+				 * but not the post-image.  We also get sane results if the
+				 * concurrent transaction never commits.
+				 */
+				if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple->t_data)))
+					*deadrows += 1;
+				else
+				{
+					sample_it = true;
+					*liverows += 1;
+				}
+				break;
+
+			default:
+				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+				break;
+		}
+
+		if (sample_it)
+		{
+			ExecStoreBufferHeapTuple(targtuple, slot, scan->rs_cbuf);
+			scan->rs_cindex++;
+
+			/* note that we leave the buffer locked here! */
+			return true;
+		}
+	}
+
+	/* Now release the lock and pin on the page */
+	UnlockReleaseBuffer(scan->rs_cbuf);
+	scan->rs_cbuf = InvalidBuffer;
+
+	return false;
+}
 
 static void
 heapam_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
@@ -1721,6 +1878,9 @@ static const TableAmRoutine heapam_methods = {
 	.relation_set_new_filenode = heapam_set_new_filenode,
 	.relation_nontransactional_truncate = heapam_relation_nontransactional_truncate,
 	.relation_copy_data = heapam_relation_copy_data,
+	.relation_vacuum = heap_vacuum_rel,
+	.scan_analyze_next_block = heapam_scan_analyze_next_block,
+	.scan_analyze_next_tuple = heapam_scan_analyze_next_tuple,
 	.relation_copy_for_cluster = heapam_copy_for_cluster,
 	.index_build_range_scan = heapam_index_build_range_scan,
 	.index_validate_scan = heapam_index_validate_scan,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index c8192353ebe..996dc500a8f 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -17,11 +17,11 @@
 #include <math.h>
 
 #include "access/genam.h"
-#include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/relation.h"
 #include "access/sysattr.h"
 #include "access/table.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/tupconvert.h"
 #include "access/tuptoaster.h"
@@ -1014,6 +1014,8 @@ acquire_sample_rows(Relation onerel, int elevel,
 	TransactionId OldestXmin;
 	BlockSamplerData bs;
 	ReservoirStateData rstate;
+	TupleTableSlot *slot;
+	TableScanDesc scan;
 
 	Assert(targrows > 0);
 
@@ -1027,193 +1029,72 @@ acquire_sample_rows(Relation onerel, int elevel,
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
 
+	scan = table_beginscan_analyze(onerel);
+	slot = table_slot_create(onerel, NULL);
+
 	/* Outer loop over blocks to sample */
 	while (BlockSampler_HasMore(&bs))
 	{
 		BlockNumber targblock = BlockSampler_Next(&bs);
-		Buffer		targbuffer;
-		Page		targpage;
-		OffsetNumber targoffset,
-					maxoffset;
 
 		vacuum_delay_point();
 
 		/*
-		 * We must maintain a pin on the target page's buffer to ensure that
-		 * the maxoffset value stays good (else concurrent VACUUM might delete
-		 * tuples out from under us).  Hence, pin the page until we are done
-		 * looking at it.  We also choose to hold sharelock on the buffer
-		 * throughout --- we could release and re-acquire sharelock for each
-		 * tuple, but since we aren't doing much work per tuple, the extra
-		 * lock traffic is probably better avoided.
+		 * XXX: we could have this function return a boolean, instead of
+		 * forcing such checks to happen in next_tuple().
 		 */
-		targbuffer = ReadBufferExtended(onerel, MAIN_FORKNUM, targblock,
-										RBM_NORMAL, vac_strategy);
-		LockBuffer(targbuffer, BUFFER_LOCK_SHARE);
-		targpage = BufferGetPage(targbuffer);
-		maxoffset = PageGetMaxOffsetNumber(targpage);
+		table_scan_analyze_next_block(scan, targblock, vac_strategy);
 
-		/* Inner loop over all tuples on the selected page */
-		for (targoffset = FirstOffsetNumber; targoffset <= maxoffset; targoffset++)
+		while (table_scan_analyze_next_tuple(scan, OldestXmin, &liverows, &deadrows, slot))
 		{
-			ItemId		itemid;
-			HeapTupleData targtuple;
-			bool		sample_it = false;
-
-			itemid = PageGetItemId(targpage, targoffset);
-
 			/*
-			 * We ignore unused and redirect line pointers.  DEAD line
-			 * pointers should be counted as dead, because we need vacuum to
-			 * run to get rid of them.  Note that this rule agrees with the
-			 * way that heap_page_prune() counts things.
+			 * The first targrows sample rows are simply copied into the
+			 * reservoir. Then we start replacing tuples in the sample
+			 * until we reach the end of the relation.  This algorithm is
+			 * from Jeff Vitter's paper (see full citation below). It
+			 * works by repeatedly computing the number of tuples to skip
+			 * before selecting a tuple, which replaces a randomly chosen
+			 * element of the reservoir (current set of tuples).  At all
+			 * times the reservoir is a true random sample of the tuples
+			 * we've passed over so far, so when we fall off the end of
+			 * the relation we're done.
 			 */
-			if (!ItemIdIsNormal(itemid))
-			{
-				if (ItemIdIsDead(itemid))
-					deadrows += 1;
-				continue;
-			}
-
-			ItemPointerSet(&targtuple.t_self, targblock, targoffset);
-
-			targtuple.t_tableOid = RelationGetRelid(onerel);
-			targtuple.t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
-			targtuple.t_len = ItemIdGetLength(itemid);
-
-			switch (HeapTupleSatisfiesVacuum(&targtuple,
-											 OldestXmin,
-											 targbuffer))
-			{
-				case HEAPTUPLE_LIVE:
-					sample_it = true;
-					liverows += 1;
-					break;
-
-				case HEAPTUPLE_DEAD:
-				case HEAPTUPLE_RECENTLY_DEAD:
-					/* Count dead and recently-dead rows */
-					deadrows += 1;
-					break;
-
-				case HEAPTUPLE_INSERT_IN_PROGRESS:
-
-					/*
-					 * Insert-in-progress rows are not counted.  We assume
-					 * that when the inserting transaction commits or aborts,
-					 * it will send a stats message to increment the proper
-					 * count.  This works right only if that transaction ends
-					 * after we finish analyzing the table; if things happen
-					 * in the other order, its stats update will be
-					 * overwritten by ours.  However, the error will be large
-					 * only if the other transaction runs long enough to
-					 * insert many tuples, so assuming it will finish after us
-					 * is the safer option.
-					 *
-					 * A special case is that the inserting transaction might
-					 * be our own.  In this case we should count and sample
-					 * the row, to accommodate users who load a table and
-					 * analyze it in one transaction.  (pgstat_report_analyze
-					 * has to adjust the numbers we send to the stats
-					 * collector to make this come out right.)
-					 */
-					if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(targtuple.t_data)))
-					{
-						sample_it = true;
-						liverows += 1;
-					}
-					break;
-
-				case HEAPTUPLE_DELETE_IN_PROGRESS:
-
-					/*
-					 * We count and sample delete-in-progress rows the same as
-					 * live ones, so that the stats counters come out right if
-					 * the deleting transaction commits after us, per the same
-					 * reasoning given above.
-					 *
-					 * If the delete was done by our own transaction, however,
-					 * we must count the row as dead to make
-					 * pgstat_report_analyze's stats adjustments come out
-					 * right.  (Note: this works out properly when the row was
-					 * both inserted and deleted in our xact.)
-					 *
-					 * The net effect of these choices is that we act as
-					 * though an IN_PROGRESS transaction hasn't happened yet,
-					 * except if it is our own transaction, which we assume
-					 * has happened.
-					 *
-					 * This approach ensures that we behave sanely if we see
-					 * both the pre-image and post-image rows for a row being
-					 * updated by a concurrent transaction: we will sample the
-					 * pre-image but not the post-image.  We also get sane
-					 * results if the concurrent transaction never commits.
-					 */
-					if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
-						deadrows += 1;
-					else
-					{
-						sample_it = true;
-						liverows += 1;
-					}
-					break;
-
-				default:
-					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
-					break;
-			}
-
-			if (sample_it)
+			if (numrows < targrows)
+				rows[numrows++] = ExecCopySlotHeapTuple(slot);
+			else
 			{
 				/*
-				 * The first targrows sample rows are simply copied into the
-				 * reservoir. Then we start replacing tuples in the sample
-				 * until we reach the end of the relation.  This algorithm is
-				 * from Jeff Vitter's paper (see full citation below). It
-				 * works by repeatedly computing the number of tuples to skip
-				 * before selecting a tuple, which replaces a randomly chosen
-				 * element of the reservoir (current set of tuples).  At all
-				 * times the reservoir is a true random sample of the tuples
-				 * we've passed over so far, so when we fall off the end of
-				 * the relation we're done.
+				 * t in Vitter's paper is the number of records already
+				 * processed.  If we need to compute a new S value, we
+				 * must use the not-yet-incremented value of samplerows as
+				 * t.
 				 */
-				if (numrows < targrows)
-					rows[numrows++] = heap_copytuple(&targtuple);
-				else
+				if (rowstoskip < 0)
+					rowstoskip = reservoir_get_next_S(&rstate, samplerows, targrows);
+
+				if (rowstoskip <= 0)
 				{
 					/*
-					 * t in Vitter's paper is the number of records already
-					 * processed.  If we need to compute a new S value, we
-					 * must use the not-yet-incremented value of samplerows as
-					 * t.
+					 * Found a suitable tuple, so save it, replacing one
+					 * old tuple at random
 					 */
-					if (rowstoskip < 0)
-						rowstoskip = reservoir_get_next_S(&rstate, samplerows, targrows);
+					int			k = (int) (targrows * sampler_random_fract(rstate.randstate));
 
-					if (rowstoskip <= 0)
-					{
-						/*
-						 * Found a suitable tuple, so save it, replacing one
-						 * old tuple at random
-						 */
-						int			k = (int) (targrows * sampler_random_fract(rstate.randstate));
-
-						Assert(k >= 0 && k < targrows);
-						heap_freetuple(rows[k]);
-						rows[k] = heap_copytuple(&targtuple);
-					}
-
-					rowstoskip -= 1;
+					Assert(k >= 0 && k < targrows);
+					heap_freetuple(rows[k]);
+					rows[k] = ExecCopySlotHeapTuple(slot);
 				}
 
-				samplerows += 1;
+				rowstoskip -= 1;
 			}
-		}
 
-		/* Now release the lock and pin on the page */
-		UnlockReleaseBuffer(targbuffer);
+			samplerows += 1;
+		}
 	}
 
+	ExecDropSingleTupleTableSlot(slot);
+	table_endscan(scan);
+
 	/*
 	 * If we didn't find as many tuples as we wanted then we're done. No sort
 	 * is needed, since they're already in order.
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 3763a8c39e0..61d6d62e6d9 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1711,7 +1711,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 		cluster_rel(relid, InvalidOid, cluster_options);
 	}
 	else
-		heap_vacuum_rel(onerel, options, params, vac_strategy);
+		table_vacuum_rel(onerel, options, params, vac_strategy);
 
 	/* Roll back any GUC changes executed by index functions */
 	AtEOXact_GUC(false, save_nestlevel);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 125ed1c012a..8df3abd90a2 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -28,6 +28,7 @@ extern char *default_table_access_method;
 extern bool synchronize_seqscans;
 
 
+struct VacuumParams;
 struct ValidateIndexState;
 struct BulkInsertStateData;
 
@@ -308,6 +309,12 @@ typedef struct TableAmRoutine
 											  MultiXactId *minmulti);
 	void		(*relation_nontransactional_truncate) (Relation rel);
 	void		(*relation_copy_data) (Relation rel, RelFileNode newrnode);
+	void		(*relation_vacuum) (Relation onerel, int options,
+									struct VacuumParams *params, BufferAccessStrategy bstrategy);
+	void		(*scan_analyze_next_block) (TableScanDesc scan, BlockNumber blockno,
+											BufferAccessStrategy bstrategy);
+	bool		(*scan_analyze_next_tuple) (TableScanDesc scan, TransactionId OldestXmin,
+											double *liverows, double *deadrows, TupleTableSlot *slot);
 	void		(*relation_copy_for_cluster) (Relation NewHeap, Relation OldHeap, Relation OldIndex,
 											  bool use_sort,
 											  TransactionId OldestXmin, TransactionId FreezeXid, MultiXactId MultiXactCutoff,
@@ -765,6 +772,24 @@ table_relation_copy_data(Relation rel, RelFileNode newrnode)
 	rel->rd_tableam->relation_copy_data(rel, newrnode);
 }
 
+static inline void
+table_vacuum_rel(Relation rel, int options,
+				 struct VacuumParams *params, BufferAccessStrategy bstrategy)
+{
+	rel->rd_tableam->relation_vacuum(rel, options, params, bstrategy);
+}
+
+static inline void
+table_scan_analyze_next_block(TableScanDesc scan, BlockNumber blockno, BufferAccessStrategy bstrategy)
+{
+	scan->rs_rd->rd_tableam->scan_analyze_next_block(scan, blockno, bstrategy);
+}
+
+static inline bool
+table_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin, double *liverows, double *deadrows, TupleTableSlot *slot)
+{
+	return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan, OldestXmin, liverows, deadrows, slot);
+}
 
 /* XXX: Move arguments to struct? */
 static inline void
-- 
2.21.0.dirty

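The reservoir logic that the big comment in acquire_sample_rows() describes
is easy to lose in the diff noise, so here it is stripped of the PostgreSQL
specifics: a self-contained sketch of plain Algorithm R. (The patch
additionally uses reservoir_get_next_S(), Vitter's Algorithm Z, to compute
skip counts instead of drawing a random number per tuple; names below are
hypothetical.)

#include <stdlib.h>

/*
 * Keep a uniform random sample of targrows items from a stream whose
 * length isn't known up front.  The first targrows items fill the
 * reservoir; afterwards the i-th item (0-based) replaces a randomly
 * chosen slot with probability targrows / (i + 1).
 */
void
reservoir_sample(const int *stream, int nstream, int *reservoir, int targrows)
{
	int			samplerows = 0;

	for (int i = 0; i < nstream; i++)
	{
		if (samplerows < targrows)
			reservoir[samplerows] = stream[i];
		else
		{
			/* pick k uniformly in [0, samplerows]; modulo bias ignored */
			int			k = rand() % (samplerows + 1);

			if (k < targrows)
				reservoir[k] = stream[i];
		}
		samplerows++;
	}
}
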
Attachment: v18-0013-tableam-planner-size-estimation.patch (text/x-diff)
From 2251e3cb1b31fe731fb5980b7bf897e81997ea10 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 20 Jan 2019 00:06:22 -0800
Subject: [PATCH v18 13/18] tableam: planner size estimation.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/heapam_handler.c | 108 +++++++++++++++++++++++
 src/backend/optimizer/util/plancat.c     |  64 +++++---------
 src/include/access/tableam.h             |  23 +++++
 src/include/optimizer/plancat.h          |   1 +
 4 files changed, 152 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index b35ed21581e..d13f934833a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -19,6 +19,8 @@
  */
 #include "postgres.h"
 
+#include <math.h>
+
 #include "miscadmin.h"
 
 #include "access/genam.h"
@@ -31,6 +33,7 @@
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
 #include "executor/executor.h"
+#include "optimizer/plancat.h"
 #include "storage/bufmgr.h"
 #include "storage/bufpage.h"
 #include "storage/bufmgr.h"
@@ -1843,6 +1846,109 @@ reform_and_rewrite_tuple(HeapTuple tuple,
 
 	heap_freetuple(copiedTuple);
 }
+
+static void
+heapam_estimate_rel_size(Relation rel, int32 *attr_widths,
+						 BlockNumber *pages, double *tuples,
+						 double *allvisfrac)
+{
+	BlockNumber curpages;
+	BlockNumber relpages;
+	double		reltuples;
+	BlockNumber relallvisible;
+	double		density;
+
+	/* it has storage, ok to call the smgr */
+	curpages = RelationGetNumberOfBlocks(rel);
+
+	/* coerce values in pg_class to more desirable types */
+	relpages = (BlockNumber) rel->rd_rel->relpages;
+	reltuples = (double) rel->rd_rel->reltuples;
+	relallvisible = (BlockNumber) rel->rd_rel->relallvisible;
+
+	/*
+	 * HACK: if the relation has never yet been vacuumed, use a minimum size
+	 * estimate of 10 pages.  The idea here is to avoid assuming a
+	 * newly-created table is really small, even if it currently is, because
+	 * that may not be true once some data gets loaded into it.  Once a vacuum
+	 * or analyze cycle has been done on it, it's more reasonable to believe
+	 * the size is somewhat stable.
+	 *
+	 * (Note that this is only an issue if the plan gets cached and used again
+	 * after the table has been filled.  What we're trying to avoid is using a
+	 * nestloop-type plan on a table that has grown substantially since the
+	 * plan was made.  Normally, autovacuum/autoanalyze will occur once enough
+	 * inserts have happened and cause cached-plan invalidation; but that
+	 * doesn't happen instantaneously, and it won't happen at all for cases
+	 * such as temporary tables.)
+	 *
+	 * We approximate "never vacuumed" by "has relpages = 0", which means this
+	 * will also fire on genuinely empty relations.  Not great, but
+	 * fortunately that's a seldom-seen case in the real world, and it
+	 * shouldn't degrade the quality of the plan too much anyway to err in
+	 * this direction.
+	 *
+	 * If the table has inheritance children, we don't apply this heuristic.
+	 * Totally empty parent tables are quite common, so we should be willing
+	 * to believe that they are empty.
+	 */
+	if (curpages < 10 &&
+		relpages == 0 &&
+		!rel->rd_rel->relhassubclass)
+		curpages = 10;
+
+	/* report estimated # pages */
+	*pages = curpages;
+	/* quick exit if rel is clearly empty */
+	if (curpages == 0)
+	{
+		*tuples = 0;
+		*allvisfrac = 0;
+		return;
+	}
+
+	/* estimate number of tuples from previous tuple density */
+	if (relpages > 0)
+		density = reltuples / (double) relpages;
+	else
+	{
+		/*
+		 * When we have no data because the relation was truncated, estimate
+		 * tuple width from attribute datatypes.  We assume here that the
+		 * pages are completely full, which is OK for tables (since they've
+		 * presumably not been VACUUMed yet) but is probably an overestimate
+		 * for indexes.  Fortunately get_relation_info() can clamp the
+		 * overestimate to the parent table's size.
+		 *
+		 * Note: this code intentionally disregards alignment considerations,
+		 * because (a) that would be gilding the lily considering how crude
+		 * the estimate is, and (b) it creates platform dependencies in the
+		 * default plans which are kind of a headache for regression testing.
+		 */
+		int32		tuple_width;
+
+		tuple_width = get_rel_data_width(rel, attr_widths);
+		tuple_width += MAXALIGN(SizeofHeapTupleHeader);
+		tuple_width += sizeof(ItemIdData);
+		/* note: integer division is intentional here */
+		density = (BLCKSZ - SizeOfPageHeaderData) / tuple_width;
+	}
+	*tuples = rint(density * (double) curpages);
+
+	/*
+	 * We use relallvisible as-is, rather than scaling it up like we do for
+	 * the pages and tuples counts, on the theory that any pages added since
+	 * the last VACUUM are most likely not marked all-visible.  But costsize.c
+	 * wants it converted to a fraction.
+	 */
+	if (relallvisible == 0 || curpages <= 0)
+		*allvisfrac = 0;
+	else if ((double) relallvisible >= curpages)
+		*allvisfrac = 1;
+	else
+		*allvisfrac = (double) relallvisible / curpages;
+}
+
 static const TableAmRoutine heapam_methods = {
 	.type = T_TableAmRoutine,
 
@@ -1884,6 +1990,8 @@ static const TableAmRoutine heapam_methods = {
 	.relation_copy_for_cluster = heapam_copy_for_cluster,
 	.index_build_range_scan = heapam_index_build_range_scan,
 	.index_validate_scan = heapam_index_validate_scan,
+
+	.relation_estimate_size = heapam_estimate_rel_size,
 };
 
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 30f4dc151bc..5ee829bb24e 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -20,6 +20,7 @@
 #include "access/genam.h"
 #include "access/htup_details.h"
 #include "access/nbtree.h"
+#include "access/tableam.h"
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "access/transam.h"
@@ -65,7 +66,6 @@ static void get_relation_foreign_keys(PlannerInfo *root, RelOptInfo *rel,
 						  Relation relation, bool inhparent);
 static bool infer_collation_opclass_match(InferenceElem *elem, Relation idxRel,
 							  List *idxExprs);
-static int32 get_rel_data_width(Relation rel, int32 *attr_widths);
 static List *get_relation_constraints(PlannerInfo *root,
 						 Oid relationObjectId, RelOptInfo *rel,
 						 bool include_notnull);
@@ -949,47 +949,24 @@ estimate_rel_size(Relation rel, int32 *attr_widths,
 	switch (rel->rd_rel->relkind)
 	{
 		case RELKIND_RELATION:
-		case RELKIND_INDEX:
 		case RELKIND_MATVIEW:
 		case RELKIND_TOASTVALUE:
+			table_estimate_size(rel, attr_widths, pages, tuples, allvisfrac);
+			break;
+
+		case RELKIND_INDEX:
+			/*
+			 * XXX: It'd probably be good to move this into a callback,
+			 * individual index types can have more precise knowledge.
+			 */
+
 			/* it has storage, ok to call the smgr */
 			curpages = RelationGetNumberOfBlocks(rel);
 
-			/*
-			 * HACK: if the relation has never yet been vacuumed, use a
-			 * minimum size estimate of 10 pages.  The idea here is to avoid
-			 * assuming a newly-created table is really small, even if it
-			 * currently is, because that may not be true once some data gets
-			 * loaded into it.  Once a vacuum or analyze cycle has been done
-			 * on it, it's more reasonable to believe the size is somewhat
-			 * stable.
-			 *
-			 * (Note that this is only an issue if the plan gets cached and
-			 * used again after the table has been filled.  What we're trying
-			 * to avoid is using a nestloop-type plan on a table that has
-			 * grown substantially since the plan was made.  Normally,
-			 * autovacuum/autoanalyze will occur once enough inserts have
-			 * happened and cause cached-plan invalidation; but that doesn't
-			 * happen instantaneously, and it won't happen at all for cases
-			 * such as temporary tables.)
-			 *
-			 * We approximate "never vacuumed" by "has relpages = 0", which
-			 * means this will also fire on genuinely empty relations.  Not
-			 * great, but fortunately that's a seldom-seen case in the real
-			 * world, and it shouldn't degrade the quality of the plan too
-			 * much anyway to err in this direction.
-			 *
-			 * There are two exceptions wherein we don't apply this heuristic.
-			 * One is if the table has inheritance children.  Totally empty
-			 * parent tables are quite common, so we should be willing to
-			 * believe that they are empty.  Also, we don't apply the 10-page
-			 * minimum to indexes.
-			 */
-			if (curpages < 10 &&
-				rel->rd_rel->relpages == 0 &&
-				!rel->rd_rel->relhassubclass &&
-				rel->rd_rel->relkind != RELKIND_INDEX)
-				curpages = 10;
+			/* coerce values in pg_class to more desirable types */
+			relpages = (BlockNumber) rel->rd_rel->relpages;
+			reltuples = (double) rel->rd_rel->reltuples;
+			relallvisible = (BlockNumber) rel->rd_rel->relallvisible;
 
 			/* report estimated # pages */
 			*pages = curpages;
@@ -1006,13 +983,12 @@ estimate_rel_size(Relation rel, int32 *attr_widths,
 			relallvisible = (BlockNumber) rel->rd_rel->relallvisible;
 
 			/*
-			 * If it's an index, discount the metapage while estimating the
-			 * number of tuples.  This is a kluge because it assumes more than
-			 * it ought to about index structure.  Currently it's OK for
-			 * btree, hash, and GIN indexes but suspect for GiST indexes.
+			 * Discount the metapage while estimating the number of tuples.
+			 * This is a kluge because it assumes more than it ought to about
+			 * index structure.  Currently it's OK for btree, hash, and GIN
+			 * indexes but suspect for GiST indexes.
 			 */
-			if (rel->rd_rel->relkind == RELKIND_INDEX &&
-				relpages > 0)
+			if (relpages > 0)
 			{
 				curpages--;
 				relpages--;
@@ -1096,7 +1072,7 @@ estimate_rel_size(Relation rel, int32 *attr_widths,
  * since they might be mostly NULLs, treating them as zero-width is not
  * necessarily the wrong thing anyway.
  */
-static int32
+int32
 get_rel_data_width(Relation rel, int32 *attr_widths)
 {
 	int32		tuple_width = 0;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8df3abd90a2..2fc144c3665 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -335,6 +335,15 @@ typedef struct TableAmRoutine
 										Snapshot snapshot,
 										struct ValidateIndexState *state);
 
+
+	/* ------------------------------------------------------------------------
+	 * Planner related functions.
+	 * ------------------------------------------------------------------------
+	 */
+
+	void		(*relation_estimate_size) (Relation rel, int32 *attr_widths,
+										   BlockNumber *pages, double *tuples, double *allvisfrac);
+
 } TableAmRoutine;
 
 
@@ -864,6 +873,20 @@ table_index_build_range_scan(Relation heap_rel,
 }
 
 
+/* ----------------------------------------------------------------------------
+ * Planner related functionality
+ * ----------------------------------------------------------------------------
+ */
+
+static inline void
+table_estimate_size(Relation rel, int32 *attr_widths,
+					BlockNumber *pages, double *tuples, double *allvisfrac)
+{
+	rel->rd_tableam->relation_estimate_size(rel, attr_widths,
+											pages, tuples, allvisfrac);
+}
+
+
 /* ----------------------------------------------------------------------------
  * Functions to make modifications a bit simpler.
  * ----------------------------------------------------------------------------
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index c337f047cb7..985e7b25005 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -33,6 +33,7 @@ extern List *infer_arbiter_indexes(PlannerInfo *root);
 extern void estimate_rel_size(Relation rel, int32 *attr_widths,
 				  BlockNumber *pages, double *tuples, double *allvisfrac);
 
+extern int32 get_rel_data_width(Relation rel, int32 *attr_widths);
 extern int32 get_relation_data_width(Oid relid, int32 *attr_widths);
 
 extern bool relation_excluded_by_constraints(PlannerInfo *root,
-- 
2.21.0.dirty

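For reference, the extrapolation in heapam_estimate_rel_size() in concrete
numbers (my example, not from the patch): suppose pg_class still records
relpages = 100, reltuples = 10000 and relallvisible = 90 from the last
VACUUM, but the relation has since grown to 120 blocks. Then:

	density     = reltuples / relpages       = 10000 / 100     = 100
	*tuples     = rint(density * curpages)   = rint(100 * 120) = 12000
	*allvisfrac = relallvisible / curpages   = 90 / 120        = 0.75

Only when relpages is 0 (never vacuumed, e.g. just truncated) does the
datatype-width fallback kick in, with density = (BLCKSZ -
SizeOfPageHeaderData) / (data width + MAXALIGN(SizeofHeapTupleHeader) +
sizeof(ItemIdData)).
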
Attachment: v18-0014-tableam-Sample-Scan-Support.patch (text/x-diff)
From 6709a066fad0d75cf8d2daec97f27878f8cd9ce4 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 20 Jan 2019 00:08:58 -0800
Subject: [PATCH v18 14/18] tableam: Sample Scan Support.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 contrib/tsm_system_rows/tsm_system_rows.c |  86 ++------
 contrib/tsm_system_time/tsm_system_time.c |  13 +-
 src/backend/access/heap/heapam_handler.c  | 218 +++++++++++++++++++
 src/backend/access/tablesample/system.c   |  11 +-
 src/backend/executor/nodeSamplescan.c     | 249 +++-------------------
 src/include/access/tableam.h              |  31 +++
 src/include/access/tsmapi.h               |   2 +-
 src/include/nodes/execnodes.h             |   4 +
 src/include/nodes/tidbitmap.h             |   2 +-
 9 files changed, 308 insertions(+), 308 deletions(-)

diff --git a/contrib/tsm_system_rows/tsm_system_rows.c b/contrib/tsm_system_rows/tsm_system_rows.c
index 1d35ea3c53a..3611d058331 100644
--- a/contrib/tsm_system_rows/tsm_system_rows.c
+++ b/contrib/tsm_system_rows/tsm_system_rows.c
@@ -28,7 +28,6 @@
 
 #include "postgres.h"
 
-#include "access/heapam.h"
 #include "access/relscan.h"
 #include "access/tsmapi.h"
 #include "catalog/pg_type.h"
@@ -46,7 +45,6 @@ typedef struct
 {
 	uint32		seed;			/* random seed */
 	int64		ntuples;		/* number of tuples to return */
-	int64		donetuples;		/* number of tuples already returned */
 	OffsetNumber lt;			/* last tuple returned from current block */
 	BlockNumber doneblocks;		/* number of already-scanned blocks */
 	BlockNumber lb;				/* last block visited */
@@ -67,11 +65,10 @@ static void system_rows_beginsamplescan(SampleScanState *node,
 							Datum *params,
 							int nparams,
 							uint32 seed);
-static BlockNumber system_rows_nextsampleblock(SampleScanState *node);
+static BlockNumber system_rows_nextsampleblock(SampleScanState *node, BlockNumber nblocks);
 static OffsetNumber system_rows_nextsampletuple(SampleScanState *node,
 							BlockNumber blockno,
 							OffsetNumber maxoffset);
-static bool SampleOffsetVisible(OffsetNumber tupoffset, HeapScanDesc scan);
 static uint32 random_relative_prime(uint32 n, SamplerRandomState randstate);
 
 
@@ -187,7 +184,6 @@ system_rows_beginsamplescan(SampleScanState *node,
 
 	sampler->seed = seed;
 	sampler->ntuples = ntuples;
-	sampler->donetuples = 0;
 	sampler->lt = InvalidOffsetNumber;
 	sampler->doneblocks = 0;
 	/* lb will be initialized during first NextSampleBlock call */
@@ -206,11 +202,9 @@ system_rows_beginsamplescan(SampleScanState *node,
  * Uses linear probing algorithm for picking next block.
  */
 static BlockNumber
-system_rows_nextsampleblock(SampleScanState *node)
+system_rows_nextsampleblock(SampleScanState *node, BlockNumber nblocks)
 {
 	SystemRowsSamplerData *sampler = (SystemRowsSamplerData *) node->tsm_state;
-	TableScanDesc scan = node->ss.ss_currentScanDesc;
-	HeapScanDesc hscan = (HeapScanDesc) scan;
 
 	/* First call within scan? */
 	if (sampler->doneblocks == 0)
@@ -222,14 +216,14 @@ system_rows_nextsampleblock(SampleScanState *node)
 			SamplerRandomState randstate;
 
 			/* If relation is empty, there's nothing to scan */
-			if (hscan->rs_nblocks == 0)
+			if (nblocks == 0)
 				return InvalidBlockNumber;
 
 			/* We only need an RNG during this setup step */
 			sampler_random_init_state(sampler->seed, randstate);
 
 			/* Compute nblocks/firstblock/step only once per query */
-			sampler->nblocks = hscan->rs_nblocks;
+			sampler->nblocks = nblocks;
 
 			/* Choose random starting block within the relation */
 			/* (Actually this is the predecessor of the first block visited) */
@@ -246,7 +240,7 @@ system_rows_nextsampleblock(SampleScanState *node)
 
 	/* If we've read all blocks or returned all needed tuples, we're done */
 	if (++sampler->doneblocks > sampler->nblocks ||
-		sampler->donetuples >= sampler->ntuples)
+		node->donetuples >= sampler->ntuples)
 		return InvalidBlockNumber;
 
 	/*
@@ -259,7 +253,7 @@ system_rows_nextsampleblock(SampleScanState *node)
 	{
 		/* Advance lb, using uint64 arithmetic to forestall overflow */
 		sampler->lb = ((uint64) sampler->lb + sampler->step) % sampler->nblocks;
-	} while (sampler->lb >= hscan->rs_nblocks);
+	} while (sampler->lb >= nblocks);
 
 	return sampler->lb;
 }
@@ -279,77 +273,27 @@ system_rows_nextsampletuple(SampleScanState *node,
 							OffsetNumber maxoffset)
 {
 	SystemRowsSamplerData *sampler = (SystemRowsSamplerData *) node->tsm_state;
-	TableScanDesc scan = node->ss.ss_currentScanDesc;
-	HeapScanDesc hscan = (HeapScanDesc) scan;
 	OffsetNumber tupoffset = sampler->lt;
 
 	/* Quit if we've returned all needed tuples */
-	if (sampler->donetuples >= sampler->ntuples)
+	if (node->donetuples >= sampler->ntuples)
 		return InvalidOffsetNumber;
 
-	/*
-	 * Because we should only count visible tuples as being returned, we need
-	 * to search for a visible tuple rather than just let the core code do it.
-	 */
+	/* Advance to next possible offset on page */
+	if (tupoffset == InvalidOffsetNumber)
+		tupoffset = FirstOffsetNumber;
+	else
+		tupoffset++;
 
-	/* We rely on the data accumulated in pagemode access */
-	Assert(scan->rs_pageatatime);
-	for (;;)
-	{
-		/* Advance to next possible offset on page */
-		if (tupoffset == InvalidOffsetNumber)
-			tupoffset = FirstOffsetNumber;
-		else
-			tupoffset++;
-
-		/* Done? */
-		if (tupoffset > maxoffset)
-		{
-			tupoffset = InvalidOffsetNumber;
-			break;
-		}
-
-		/* Found a candidate? */
-		if (SampleOffsetVisible(tupoffset, hscan))
-		{
-			sampler->donetuples++;
-			break;
-		}
-	}
+	/* Done? */
+	if (tupoffset > maxoffset)
+		tupoffset = InvalidOffsetNumber;
 
 	sampler->lt = tupoffset;
 
 	return tupoffset;
 }
 
-/*
- * Check if tuple offset is visible
- *
- * In pageatatime mode, heapgetpage() already did visibility checks,
- * so just look at the info it left in rs_vistuples[].
- */
-static bool
-SampleOffsetVisible(OffsetNumber tupoffset, HeapScanDesc scan)
-{
-	int			start = 0,
-				end = scan->rs_ntuples - 1;
-
-	while (start <= end)
-	{
-		int			mid = (start + end) / 2;
-		OffsetNumber curoffset = scan->rs_vistuples[mid];
-
-		if (tupoffset == curoffset)
-			return true;
-		else if (tupoffset < curoffset)
-			end = mid - 1;
-		else
-			start = mid + 1;
-	}
-
-	return false;
-}
-
 /*
  * Compute greatest common divisor of two uint32's.
  */
diff --git a/contrib/tsm_system_time/tsm_system_time.c b/contrib/tsm_system_time/tsm_system_time.c
index 1cc7264e084..ab9f685f0af 100644
--- a/contrib/tsm_system_time/tsm_system_time.c
+++ b/contrib/tsm_system_time/tsm_system_time.c
@@ -26,7 +26,6 @@
 
 #include <math.h>
 
-#include "access/heapam.h"
 #include "access/relscan.h"
 #include "access/tsmapi.h"
 #include "catalog/pg_type.h"
@@ -66,7 +65,7 @@ static void system_time_beginsamplescan(SampleScanState *node,
 							Datum *params,
 							int nparams,
 							uint32 seed);
-static BlockNumber system_time_nextsampleblock(SampleScanState *node);
+static BlockNumber system_time_nextsampleblock(SampleScanState *node, BlockNumber nblocks);
 static OffsetNumber system_time_nextsampletuple(SampleScanState *node,
 							BlockNumber blockno,
 							OffsetNumber maxoffset);
@@ -213,11 +212,9 @@ system_time_beginsamplescan(SampleScanState *node,
  * Uses linear probing algorithm for picking next block.
  */
 static BlockNumber
-system_time_nextsampleblock(SampleScanState *node)
+system_time_nextsampleblock(SampleScanState *node, BlockNumber nblocks)
 {
 	SystemTimeSamplerData *sampler = (SystemTimeSamplerData *) node->tsm_state;
-	TableScanDesc scan = node->ss.ss_currentScanDesc;
-	HeapScanDesc hscan = (HeapScanDesc) scan;
 	instr_time	cur_time;
 
 	/* First call within scan? */
@@ -230,14 +227,14 @@ system_time_nextsampleblock(SampleScanState *node)
 			SamplerRandomState randstate;
 
 			/* If relation is empty, there's nothing to scan */
-			if (hscan->rs_nblocks == 0)
+			if (nblocks == 0)
 				return InvalidBlockNumber;
 
 			/* We only need an RNG during this setup step */
 			sampler_random_init_state(sampler->seed, randstate);
 
 			/* Compute nblocks/firstblock/step only once per query */
-			sampler->nblocks = hscan->rs_nblocks;
+			sampler->nblocks = nblocks;
 
 			/* Choose random starting block within the relation */
 			/* (Actually this is the predecessor of the first block visited) */
@@ -273,7 +270,7 @@ system_time_nextsampleblock(SampleScanState *node)
 	{
 		/* Advance lb, using uint64 arithmetic to forestall overflow */
 		sampler->lb = ((uint64) sampler->lb + sampler->step) % sampler->nblocks;
-	} while (sampler->lb >= hscan->rs_nblocks);
+	} while (sampler->lb >= nblocks);
 
 	return sampler->lb;
 }
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d13f934833a..f71b9b2a062 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -27,6 +27,7 @@
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/tableam.h"
+#include "access/tsmapi.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
 #include "catalog/index.h"
@@ -34,6 +35,7 @@
 #include "catalog/storage_xlog.h"
 #include "executor/executor.h"
 #include "optimizer/plancat.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/bufpage.h"
 #include "storage/bufmgr.h"
@@ -1804,6 +1806,219 @@ heapam_index_validate_scan(Relation heapRelation,
 	indexInfo->ii_PredicateState = NULL;
 }
 
+/*
+ * Check visibility of the tuple.
+ */
+static bool
+SampleHeapTupleVisible(HeapScanDesc scan, Buffer buffer,
+					   HeapTuple tuple,
+					   OffsetNumber tupoffset)
+{
+	if (scan->rs_base.rs_pageatatime)
+	{
+		/*
+		 * In pageatatime mode, heapgetpage() already did visibility checks,
+		 * so just look at the info it left in rs_vistuples[].
+		 *
+		 * We use a binary search over the known-sorted array.  Note: we could
+		 * save some effort if we insisted that NextSampleTuple select tuples
+		 * in increasing order, but it's not clear that there would be enough
+		 * gain to justify the restriction.
+		 */
+		int			start = 0,
+					end = scan->rs_ntuples - 1;
+
+		while (start <= end)
+		{
+			int			mid = (start + end) / 2;
+			OffsetNumber curoffset = scan->rs_vistuples[mid];
+
+			if (tupoffset == curoffset)
+				return true;
+			else if (tupoffset < curoffset)
+				end = mid - 1;
+			else
+				start = mid + 1;
+		}
+
+		return false;
+	}
+	else
+	{
+		/* Otherwise, we have to check the tuple individually. */
+		return HeapTupleSatisfiesVisibility(tuple, scan->rs_base.rs_snapshot,
+											buffer);
+	}
+}
+
+static bool
+heapam_scan_sample_next_block(TableScanDesc sscan, struct SampleScanState *scanstate)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+	TsmRoutine *tsm = scanstate->tsmroutine;
+	BlockNumber blockno;
+
+	/* return false immediately if relation is empty */
+	if (scan->rs_nblocks == 0)
+		return false;
+
+	if (tsm->NextSampleBlock)
+	{
+		blockno = tsm->NextSampleBlock(scanstate, scan->rs_nblocks);
+		scan->rs_cblock = blockno;
+	}
+	else
+	{
+		/* scanning table sequentially */
+
+		if (scan->rs_cblock == InvalidBlockNumber)
+		{
+			Assert(!scan->rs_inited);
+			blockno = scan->rs_startblock;
+		}
+		else
+		{
+			Assert(scan->rs_inited);
+
+			blockno = scan->rs_cblock + 1;
+
+			if (blockno >= scan->rs_nblocks)
+			{
+				/* wrap to beginning of rel, might not have started at 0 */
+				blockno = 0;
+			}
+
+			/*
+			 * Report our new scan position for synchronization purposes.
+			 *
+			 * Note: we do this before checking for end of scan so that the
+			 * final state of the position hint is back at the start of the
+			 * rel.  That's not strictly necessary, but otherwise when you run
+			 * the same query multiple times the starting position would shift
+			 * a little bit backwards on every invocation, which is confusing.
+			 * We don't guarantee any specific ordering in general, though.
+			 */
+			if (scan->rs_base.rs_syncscan)
+				ss_report_location(scan->rs_base.rs_rd, blockno);
+
+			if (blockno == scan->rs_startblock)
+			{
+				blockno = InvalidBlockNumber;
+			}
+		}
+	}
+
+	if (!BlockNumberIsValid(blockno))
+	{
+		if (BufferIsValid(scan->rs_cbuf))
+			ReleaseBuffer(scan->rs_cbuf);
+		scan->rs_cbuf = InvalidBuffer;
+		scan->rs_cblock = InvalidBlockNumber;
+		scan->rs_inited = false;
+
+		return false;
+	}
+
+	heapgetpage(sscan, blockno);
+	scan->rs_inited = true;
+
+	return true;
+}
+
+static bool
+heapam_scan_sample_next_tuple(TableScanDesc sscan, struct SampleScanState *scanstate, TupleTableSlot *slot)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+	TsmRoutine *tsm = scanstate->tsmroutine;
+	BlockNumber blockno = scan->rs_cblock;
+	bool		pagemode = scan->rs_base.rs_pageatatime;
+
+	Page		page;
+	bool		all_visible;
+	OffsetNumber maxoffset;
+
+	ExecClearTuple(slot);
+
+	/*
+	 * When not using pagemode, we must lock the buffer during tuple
+	 * visibility checks.
+	 */
+	if (!pagemode)
+		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+
+	page = (Page) BufferGetPage(scan->rs_cbuf);
+	all_visible = PageIsAllVisible(page) &&
+		!scan->rs_base.rs_snapshot->takenDuringRecovery;
+	maxoffset = PageGetMaxOffsetNumber(page);
+
+	for (;;)
+	{
+		OffsetNumber tupoffset;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Ask the tablesample method which tuples to check on this page. */
+		tupoffset = tsm->NextSampleTuple(scanstate,
+										 blockno,
+										 maxoffset);
+
+		if (OffsetNumberIsValid(tupoffset))
+		{
+			ItemId		itemid;
+			bool		visible;
+			HeapTuple	tuple = &(scan->rs_ctup);
+
+			/* Skip invalid tuple pointers. */
+			itemid = PageGetItemId(page, tupoffset);
+			if (!ItemIdIsNormal(itemid))
+				continue;
+
+			tuple->t_data = (HeapTupleHeader) PageGetItem(page, itemid);
+			tuple->t_len = ItemIdGetLength(itemid);
+			ItemPointerSet(&(tuple->t_self), blockno, tupoffset);
+
+
+			if (all_visible)
+				visible = true;
+			else
+				visible = SampleHeapTupleVisible(scan, scan->rs_cbuf, tuple, tupoffset);
+
+			/* in pagemode, heapgetpage did this for us */
+			if (!pagemode)
+				CheckForSerializableConflictOut(visible, scan->rs_base.rs_rd, tuple,
+												scan->rs_cbuf, scan->rs_base.rs_snapshot);
+
+			/* Try next tuple from same page. */
+			if (!visible)
+				continue;
+
+			/* Found visible tuple, return it. */
+			if (!pagemode)
+				LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+
+			ExecStoreBufferHeapTuple(tuple, slot, scan->rs_cbuf);
+
+			/* Count successfully-fetched tuples as heap fetches */
+			pgstat_count_heap_getnext(scan->rs_base.rs_rd);
+
+			return true;
+		}
+		else
+		{
+			/*
+			 * If we get here, it means we've exhausted the items on this page
+			 * and it's time to move to the next.
+			 */
+			if (!pagemode)
+				LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+
+			break;
+		}
+	}
+
+	return false;
+}
+
 /*
  * Reconstruct and rewrite the given tuple
  *
@@ -1992,6 +2207,9 @@ static const TableAmRoutine heapam_methods = {
 	.index_validate_scan = heapam_index_validate_scan,
 
 	.relation_estimate_size = heapam_estimate_rel_size,
+
+	.scan_sample_next_block = heapam_scan_sample_next_block,
+	.scan_sample_next_tuple = heapam_scan_sample_next_tuple
 };
 
 
diff --git a/src/backend/access/tablesample/system.c b/src/backend/access/tablesample/system.c
index fe62a73341e..476411caf1a 100644
--- a/src/backend/access/tablesample/system.c
+++ b/src/backend/access/tablesample/system.c
@@ -27,7 +27,6 @@
 #include <math.h>
 
 #include "access/hash.h"
-#include "access/heapam.h"
 #include "access/relscan.h"
 #include "access/tsmapi.h"
 #include "catalog/pg_type.h"
@@ -56,7 +55,7 @@ static void system_beginsamplescan(SampleScanState *node,
 					   Datum *params,
 					   int nparams,
 					   uint32 seed);
-static BlockNumber system_nextsampleblock(SampleScanState *node);
+static BlockNumber system_nextsampleblock(SampleScanState *node, BlockNumber nblocks);
 static OffsetNumber system_nextsampletuple(SampleScanState *node,
 					   BlockNumber blockno,
 					   OffsetNumber maxoffset);
@@ -177,11 +176,9 @@ system_beginsamplescan(SampleScanState *node,
  * Select next block to sample.
  */
 static BlockNumber
-system_nextsampleblock(SampleScanState *node)
+system_nextsampleblock(SampleScanState *node, BlockNumber nblocks)
 {
 	SystemSamplerData *sampler = (SystemSamplerData *) node->tsm_state;
-	TableScanDesc scan = node->ss.ss_currentScanDesc;
-	HeapScanDesc hscan = (HeapScanDesc) scan;
 	BlockNumber nextblock = sampler->nextblock;
 	uint32		hashinput[2];
 
@@ -200,7 +197,7 @@ system_nextsampleblock(SampleScanState *node)
 	 * Loop over block numbers until finding suitable block or reaching end of
 	 * relation.
 	 */
-	for (; nextblock < hscan->rs_nblocks; nextblock++)
+	for (; nextblock < nblocks; nextblock++)
 	{
 		uint32		hash;
 
@@ -212,7 +209,7 @@ system_nextsampleblock(SampleScanState *node)
 			break;
 	}
 
-	if (nextblock < hscan->rs_nblocks)
+	if (nextblock < nblocks)
 	{
 		/* Found a suitable block; remember where we should start next time */
 		sampler->nextblock = nextblock + 1;
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 3a00d648099..ce0d3bfa572 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -15,7 +15,6 @@
 #include "postgres.h"
 
 #include "access/hash.h"
-#include "access/heapam.h"
 #include "access/relscan.h"
 #include "access/tableam.h"
 #include "access/tsmapi.h"
@@ -29,9 +28,7 @@
 
 static TupleTableSlot *SampleNext(SampleScanState *node);
 static void tablesample_init(SampleScanState *scanstate);
-static HeapTuple tablesample_getnext(SampleScanState *scanstate);
-static bool SampleTupleVisible(HeapTuple tuple, OffsetNumber tupoffset,
-				   HeapScanDesc scan);
+static TupleTableSlot *tablesample_getnext(SampleScanState *scanstate);
 
 /* ----------------------------------------------------------------
  *						Scan Support
@@ -47,10 +44,6 @@ static bool SampleTupleVisible(HeapTuple tuple, OffsetNumber tupoffset,
 static TupleTableSlot *
 SampleNext(SampleScanState *node)
 {
-	HeapTuple	tuple;
-	TupleTableSlot *slot;
-	HeapScanDesc hscan;
-
 	/*
 	 * if this is first call within a scan, initialize
 	 */
@@ -60,19 +53,7 @@ SampleNext(SampleScanState *node)
 	/*
 	 * get the next tuple, and store it in our result slot
 	 */
-	tuple = tablesample_getnext(node);
-
-	slot = node->ss.ss_ScanTupleSlot;
-	hscan = (HeapScanDesc) node->ss.ss_currentScanDesc;
-
-	if (tuple)
-		ExecStoreBufferHeapTuple(tuple, /* tuple to store */
-								 slot,	/* slot to store in */
-								 hscan->rs_cbuf); /* tuple's buffer */
-	else
-		ExecClearTuple(slot);
-
-	return slot;
+	return tablesample_getnext(node);
 }
 
 /*
@@ -237,6 +218,9 @@ ExecReScanSampleScan(SampleScanState *node)
 {
 	/* Remember we need to do BeginSampleScan again (if we did it at all) */
 	node->begun = false;
+	node->done = false;
+	node->haveblock = false;
+	node->donetuples = 0;
 
 	ExecScanReScan(&node->ss);
 }
@@ -258,6 +242,7 @@ tablesample_init(SampleScanState *scanstate)
 	int			i;
 	ListCell   *arg;
 
+	scanstate->donetuples = 0;
 	params = (Datum *) palloc(list_length(scanstate->args) * sizeof(Datum));
 
 	i = 0;
@@ -345,225 +330,49 @@ tablesample_init(SampleScanState *scanstate)
 
 /*
  * Get next tuple from TABLESAMPLE method.
- *
- * Note: an awful lot of this is copied-and-pasted from heapam.c.  It would
- * perhaps be better to refactor to share more code.
  */
-static HeapTuple
+static TupleTableSlot *
 tablesample_getnext(SampleScanState *scanstate)
 {
-	TsmRoutine *tsm = scanstate->tsmroutine;
 	TableScanDesc scan = scanstate->ss.ss_currentScanDesc;
-	HeapScanDesc hscan = (HeapScanDesc) scan;
-	HeapTuple	tuple = &(hscan->rs_ctup);
-	Snapshot	snapshot = scan->rs_snapshot;
-	bool		pagemode = scan->rs_pageatatime;
-	BlockNumber blockno;
-	Page		page;
-	bool		all_visible;
-	OffsetNumber maxoffset;
+	TupleTableSlot *slot = scanstate->ss.ss_ScanTupleSlot;
 
-	if (!hscan->rs_inited)
-	{
-		/*
-		 * return null immediately if relation is empty
-		 */
-		if (hscan->rs_nblocks == 0)
-		{
-			Assert(!BufferIsValid(hscan->rs_cbuf));
-			tuple->t_data = NULL;
-			return NULL;
-		}
-		if (tsm->NextSampleBlock)
-		{
-			blockno = tsm->NextSampleBlock(scanstate);
-			if (!BlockNumberIsValid(blockno))
-			{
-				tuple->t_data = NULL;
-				return NULL;
-			}
-		}
-		else
-			blockno = hscan->rs_startblock;
-		Assert(blockno < hscan->rs_nblocks);
-		heapgetpage(scan, blockno);
-		hscan->rs_inited = true;
-	}
-	else
-	{
-		/* continue from previously returned page/tuple */
-		blockno = hscan->rs_cblock;	/* current page */
-	}
+	ExecClearTuple(slot);
 
-	/*
-	 * When not using pagemode, we must lock the buffer during tuple
-	 * visibility checks.
-	 */
-	if (!pagemode)
-		LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
-
-	page = (Page) BufferGetPage(hscan->rs_cbuf);
-	all_visible = PageIsAllVisible(page) && !snapshot->takenDuringRecovery;
-	maxoffset = PageGetMaxOffsetNumber(page);
+	if (scanstate->done)
+		return NULL;
 
 	for (;;)
 	{
-		OffsetNumber tupoffset;
-		bool		finished;
-
-		CHECK_FOR_INTERRUPTS();
-
-		/* Ask the tablesample method which tuples to check on this page. */
-		tupoffset = tsm->NextSampleTuple(scanstate,
-										 blockno,
-										 maxoffset);
-
-		if (OffsetNumberIsValid(tupoffset))
+		if (!scanstate->haveblock)
 		{
-			ItemId		itemid;
-			bool		visible;
-
-			/* Skip invalid tuple pointers. */
-			itemid = PageGetItemId(page, tupoffset);
-			if (!ItemIdIsNormal(itemid))
-				continue;
-
-			tuple->t_data = (HeapTupleHeader) PageGetItem(page, itemid);
-			tuple->t_len = ItemIdGetLength(itemid);
-			ItemPointerSet(&(tuple->t_self), blockno, tupoffset);
-
-			if (all_visible)
-				visible = true;
-			else
-				visible = SampleTupleVisible(tuple, tupoffset, hscan);
-
-			/* in pagemode, heapgetpage did this for us */
-			if (!pagemode)
-				CheckForSerializableConflictOut(visible, scan->rs_rd, tuple,
-												hscan->rs_cbuf, snapshot);
-
-			if (visible)
+			if (!table_scan_sample_next_block(scan, scanstate))
 			{
-				/* Found visible tuple, return it. */
-				if (!pagemode)
-					LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-				break;
-			}
-			else
-			{
-				/* Try next tuple from same page. */
-				continue;
+				scanstate->haveblock = false;
+				scanstate->done = true;
+
+				/* exhausted relation */
+				return NULL;
 			}
+
+			scanstate->haveblock = true;
 		}
 
-		/*
-		 * if we get here, it means we've exhausted the items on this page and
-		 * it's time to move to the next.
-		 */
-		if (!pagemode)
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-		if (tsm->NextSampleBlock)
+		if (!table_scan_sample_next_tuple(scan, scanstate, slot))
 		{
-			blockno = tsm->NextSampleBlock(scanstate);
-			Assert(!scan->rs_syncscan);
-			finished = !BlockNumberIsValid(blockno);
-		}
-		else
-		{
-			/* Without NextSampleBlock, just do a plain forward seqscan. */
-			blockno++;
-			if (blockno >= hscan->rs_nblocks)
-				blockno = 0;
-
 			/*
-			 * Report our new scan position for synchronization purposes.
-			 *
-			 * Note: we do this before checking for end of scan so that the
-			 * final state of the position hint is back at the start of the
-			 * rel.  That's not strictly necessary, but otherwise when you run
-			 * the same query multiple times the starting position would shift
-			 * a little bit backwards on every invocation, which is confusing.
-			 * We don't guarantee any specific ordering in general, though.
+			 * If we get here, it means we've exhausted the items on this page
+			 * and it's time to move to the next.
 			 */
-			if (scan->rs_syncscan)
-				ss_report_location(scan->rs_rd, blockno);
-
-			finished = (blockno == hscan->rs_startblock);
+			scanstate->haveblock = false;
+			continue;
 		}
 
-		/*
-		 * Reached end of scan?
-		 */
-		if (finished)
-		{
-			if (BufferIsValid(hscan->rs_cbuf))
-				ReleaseBuffer(hscan->rs_cbuf);
-			hscan->rs_cbuf = InvalidBuffer;
-			hscan->rs_cblock = InvalidBlockNumber;
-			tuple->t_data = NULL;
-			hscan->rs_inited = false;
-			return NULL;
-		}
-
-		Assert(blockno < hscan->rs_nblocks);
-		heapgetpage(scan, blockno);
-
-		/* Re-establish state for new page */
-		if (!pagemode)
-			LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
-
-		page = (Page) BufferGetPage(hscan->rs_cbuf);
-		all_visible = PageIsAllVisible(page) && !snapshot->takenDuringRecovery;
-		maxoffset = PageGetMaxOffsetNumber(page);
+		/* Found visible tuple, return it. */
+		break;
 	}
 
-	/* Count successfully-fetched tuples as heap fetches */
-	pgstat_count_heap_getnext(scan->rs_rd);
+	scanstate->donetuples++;
 
-	return &(hscan->rs_ctup);
-}
-
-/*
- * Check visibility of the tuple.
- */
-static bool
-SampleTupleVisible(HeapTuple tuple, OffsetNumber tupoffset, HeapScanDesc scan)
-{
-	if (scan->rs_base.rs_pageatatime)
-	{
-		/*
-		 * In pageatatime mode, heapgetpage() already did visibility checks,
-		 * so just look at the info it left in rs_vistuples[].
-		 *
-		 * We use a binary search over the known-sorted array.  Note: we could
-		 * save some effort if we insisted that NextSampleTuple select tuples
-		 * in increasing order, but it's not clear that there would be enough
-		 * gain to justify the restriction.
-		 */
-		int			start = 0,
-					end = scan->rs_ntuples - 1;
-
-		while (start <= end)
-		{
-			int			mid = (start + end) / 2;
-			OffsetNumber curoffset = scan->rs_vistuples[mid];
-
-			if (tupoffset == curoffset)
-				return true;
-			else if (tupoffset < curoffset)
-				end = mid - 1;
-			else
-				start = mid + 1;
-		}
-
-		return false;
-	}
-	else
-	{
-		/* Otherwise, we have to check the tuple individually. */
-		return HeapTupleSatisfiesVisibility(tuple,
-											scan->rs_base.rs_snapshot,
-											scan->rs_cbuf);
-	}
+	return slot;
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fc144c3665..0c9339c676e 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -30,6 +30,7 @@ extern bool synchronize_seqscans;
 
 struct VacuumParams;
 struct ValidateIndexState;
+struct SampleScanState;
 struct BulkInsertStateData;
 
 /*
@@ -344,6 +345,18 @@ typedef struct TableAmRoutine
 	void		(*relation_estimate_size) (Relation rel, int32 *attr_widths,
 										   BlockNumber *pages, double *tuples, double *allvisfrac);
 
+
+	/* ------------------------------------------------------------------------
+	 * Executor related functions.
+	 * ------------------------------------------------------------------------
+	 */
+
+	bool		(*scan_sample_next_block) (TableScanDesc scan,
+										   struct SampleScanState *scanstate);
+	bool		(*scan_sample_next_tuple) (TableScanDesc scan,
+										   struct SampleScanState *scanstate,
+										   TupleTableSlot *slot);
+
 } TableAmRoutine;
 
 
@@ -887,6 +900,24 @@ table_estimate_size(Relation rel, int32 *attr_widths,
 }
 
 
+/* ----------------------------------------------------------------------------
+ * Executor related functionality
+ * ----------------------------------------------------------------------------
+ */
+
+static inline bool
+table_scan_sample_next_block(TableScanDesc scan, struct SampleScanState *scanstate)
+{
+	return scan->rs_rd->rd_tableam->scan_sample_next_block(scan, scanstate);
+}
+
+static inline bool
+table_scan_sample_next_tuple(TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot)
+{
+	return scan->rs_rd->rd_tableam->scan_sample_next_tuple(scan, scanstate, slot);
+}
+
+
 /* ----------------------------------------------------------------------------
  * Functions to make modifications a bit simpler.
  * ----------------------------------------------------------------------------
diff --git a/src/include/access/tsmapi.h b/src/include/access/tsmapi.h
index a5c0b4cafec..ccef65aedb4 100644
--- a/src/include/access/tsmapi.h
+++ b/src/include/access/tsmapi.h
@@ -34,7 +34,7 @@ typedef void (*BeginSampleScan_function) (SampleScanState *node,
 										  int nparams,
 										  uint32 seed);
 
-typedef BlockNumber (*NextSampleBlock_function) (SampleScanState *node);
+typedef BlockNumber (*NextSampleBlock_function) (SampleScanState *node, BlockNumber nblocks);
 
 typedef OffsetNumber (*NextSampleTuple_function) (SampleScanState *node,
 												  BlockNumber blockno,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 62eb1a06eef..22ac3b17518 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1300,6 +1300,9 @@ typedef struct SampleScanState
 	bool		use_pagemode;	/* use page-at-a-time visibility checking? */
 	bool		begun;			/* false means need to call BeginSampleScan */
 	uint32		seed;			/* random seed */
+	int64		donetuples;		/* number of tuples already returned */
+	bool		haveblock;		/* has a block for sampling been determined? */
+	bool		done;			/* exhausted all tuples? */
 } SampleScanState;
 
 /*
@@ -1528,6 +1531,7 @@ typedef struct BitmapHeapScanState
 	Buffer		pvmbuffer;
 	long		exact_pages;
 	long		lossy_pages;
+	int			return_empty_tuples;
 	TBMIterator *prefetch_iterator;
 	int			prefetch_pages;
 	int			prefetch_target;
diff --git a/src/include/nodes/tidbitmap.h b/src/include/nodes/tidbitmap.h
index 2645085b344..6a7f3054a41 100644
--- a/src/include/nodes/tidbitmap.h
+++ b/src/include/nodes/tidbitmap.h
@@ -37,7 +37,7 @@ typedef struct TBMIterator TBMIterator;
 typedef struct TBMSharedIterator TBMSharedIterator;
 
 /* Result structure for tbm_iterate */
-typedef struct
+typedef struct TBMIterateResult
 {
 	BlockNumber blockno;		/* page number containing tuples */
 	int			ntuples;		/* -1 indicates lossy result */
-- 
2.21.0.dirty
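
As an aside for anyone writing a tablesample method against this: since
NextSampleBlock now gets the relation size passed in, a TSM no longer needs
to reach into HeapScanDesc at all.  A minimal sketch, not part of the patch
(the "everyother" method and the use of tsm_state as a bare block cursor are
made up for illustration):

#include "postgres.h"
#include "access/tsmapi.h"

static BlockNumber
everyother_nextsampleblock(SampleScanState *node, BlockNumber nblocks)
{
	/* hypothetical cursor, palloc0'd by the method's BeginSampleScan */
	BlockNumber *next = (BlockNumber *) node->tsm_state;
	BlockNumber cur = *next;

	/* stop once we would run past the size the executor handed us */
	if (cur >= nblocks)
		return InvalidBlockNumber;

	*next = cur + 2;			/* sample every other block */
	return cur;
}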

v18-0015-tableam-bitmap-heap-scan.patch (text/x-diff)
From 9d42e176fafd4d83ea7d3a24226a7c7f038be167 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 20 Jan 2019 00:11:32 -0800
Subject: [PATCH v18 15/18] tableam: bitmap heap scan.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/heapam_handler.c  | 147 ++++++++++++
 src/backend/executor/nodeBitmapHeapscan.c | 266 +++++-----------------
 src/backend/optimizer/util/plancat.c      |   3 +-
 src/include/access/tableam.h              |  17 ++
 4 files changed, 223 insertions(+), 210 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index f71b9b2a062..b183b22ca16 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1806,6 +1806,151 @@ heapam_index_validate_scan(Relation heapRelation,
 	indexInfo->ii_PredicateState = NULL;
 }
 
+static bool
+heapam_scan_bitmap_pagescan(TableScanDesc sscan,
+							TBMIterateResult *tbmres)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+	BlockNumber page = tbmres->blockno;
+	Buffer		buffer;
+	Snapshot	snapshot;
+	int			ntup;
+
+	scan->rs_cindex = 0;
+	scan->rs_ntuples = 0;
+
+	/*
+	 * Ignore any claimed entries past what we think is the end of the
+	 * relation.  (This is probably not necessary given that we got at least
+	 * AccessShareLock on the table before performing any of the indexscans,
+	 * but let's be safe.)
+	 */
+	if (page >= scan->rs_nblocks)
+		return false;
+
+	scan->rs_cbuf = ReleaseAndReadBuffer(scan->rs_cbuf,
+										 scan->rs_base.rs_rd,
+										 page);
+	scan->rs_cblock = page;
+	buffer = scan->rs_cbuf;
+	snapshot = scan->rs_base.rs_snapshot;
+
+	ntup = 0;
+
+	/*
+	 * Prune and repair fragmentation for the whole page, if possible.
+	 */
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+
+	/*
+	 * We must hold share lock on the buffer content while examining tuple
+	 * visibility.  Afterwards, however, the tuples we have found to be
+	 * visible are guaranteed good as long as we hold the buffer pin.
+	 */
+	LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+	/*
+	 * We need two separate strategies for lossy and non-lossy cases.
+	 */
+	if (tbmres->ntuples >= 0)
+	{
+		/*
+		 * Bitmap is non-lossy, so we just look through the offsets listed in
+		 * tbmres; but we have to follow any HOT chain starting at each such
+		 * offset.
+		 */
+		int			curslot;
+
+		for (curslot = 0; curslot < tbmres->ntuples; curslot++)
+		{
+			OffsetNumber offnum = tbmres->offsets[curslot];
+			ItemPointerData tid;
+			HeapTupleData heapTuple;
+
+			ItemPointerSet(&tid, page, offnum);
+			if (heap_hot_search_buffer(&tid, sscan->rs_rd, buffer, snapshot,
+									   &heapTuple, NULL, true))
+				scan->rs_vistuples[ntup++] = ItemPointerGetOffsetNumber(&tid);
+		}
+	}
+	else
+	{
+		/*
+		 * Bitmap is lossy, so we must examine each item pointer on the page.
+		 * But we can ignore HOT chains, since we'll check each tuple anyway.
+		 */
+		Page		dp = (Page) BufferGetPage(buffer);
+		OffsetNumber maxoff = PageGetMaxOffsetNumber(dp);
+		OffsetNumber offnum;
+
+		for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum = OffsetNumberNext(offnum))
+		{
+			ItemId		lp;
+			HeapTupleData loctup;
+			bool		valid;
+
+			lp = PageGetItemId(dp, offnum);
+			if (!ItemIdIsNormal(lp))
+				continue;
+			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
+			loctup.t_len = ItemIdGetLength(lp);
+			loctup.t_tableOid = scan->rs_base.rs_rd->rd_id;
+			ItemPointerSet(&loctup.t_self, page, offnum);
+			valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
+			if (valid)
+			{
+				scan->rs_vistuples[ntup++] = offnum;
+				PredicateLockTuple(scan->rs_base.rs_rd, &loctup, snapshot);
+			}
+			CheckForSerializableConflictOut(valid, scan->rs_base.rs_rd, &loctup,
+											buffer, snapshot);
+		}
+	}
+
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+	Assert(ntup <= MaxHeapTuplesPerPage);
+	scan->rs_ntuples = ntup;
+
+	return ntup > 0;
+}
+
+static bool
+heapam_scan_bitmap_pagescan_next(TableScanDesc sscan, TupleTableSlot *slot)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+	OffsetNumber targoffset;
+	Page		dp;
+	ItemId		lp;
+
+	if (scan->rs_cindex < 0 || scan->rs_cindex >= scan->rs_ntuples)
+		return false;
+
+	targoffset = scan->rs_vistuples[scan->rs_cindex];
+	dp = (Page) BufferGetPage(scan->rs_cbuf);
+	lp = PageGetItemId(dp, targoffset);
+	Assert(ItemIdIsNormal(lp));
+
+	scan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
+	scan->rs_ctup.t_len = ItemIdGetLength(lp);
+	scan->rs_ctup.t_tableOid = scan->rs_base.rs_rd->rd_id;
+	ItemPointerSet(&scan->rs_ctup.t_self, scan->rs_cblock, targoffset);
+
+	pgstat_count_heap_fetch(scan->rs_base.rs_rd);
+
+	/*
+	 * Set up the result slot to point to this tuple.  Note that the slot
+	 * acquires a pin on the buffer.
+	 */
+	ExecStoreBufferHeapTuple(&scan->rs_ctup,
+							 slot,
+							 scan->rs_cbuf);
+
+	scan->rs_cindex++;
+
+	return true;
+}
+
 /*
  * Check visibility of the tuple.
  */
@@ -2208,6 +2353,8 @@ static const TableAmRoutine heapam_methods = {
 
 	.relation_estimate_size = heapam_estimate_rel_size,
 
+	.scan_bitmap_pagescan = heapam_scan_bitmap_pagescan,
+	.scan_bitmap_pagescan_next = heapam_scan_bitmap_pagescan_next,
 	.scan_sample_next_block = heapam_scan_sample_next_block,
 	.scan_sample_next_tuple = heapam_scan_sample_next_tuple
 };
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 3a82857770c..59061c746b1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -37,7 +37,6 @@
 
 #include <math.h>
 
-#include "access/heapam.h"
 #include "access/relscan.h"
 #include "access/tableam.h"
 #include "access/transam.h"
@@ -55,7 +54,6 @@
 
 
 static TupleTableSlot *BitmapHeapNext(BitmapHeapScanState *node);
-static void bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres);
 static inline void BitmapDoneInitializingSharedState(
 								  ParallelBitmapHeapState *pstate);
 static inline void BitmapAdjustPrefetchIterator(BitmapHeapScanState *node,
@@ -78,12 +76,10 @@ BitmapHeapNext(BitmapHeapScanState *node)
 {
 	ExprContext *econtext;
 	TableScanDesc scan;
-	HeapScanDesc hscan;
 	TIDBitmap  *tbm;
 	TBMIterator *tbmiterator = NULL;
 	TBMSharedIterator *shared_tbmiterator = NULL;
 	TBMIterateResult *tbmres;
-	OffsetNumber targoffset;
 	TupleTableSlot *slot;
 	ParallelBitmapHeapState *pstate = node->pstate;
 	dsa_area   *dsa = node->ss.ps.state->es_query_dsa;
@@ -94,7 +90,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
 	econtext = node->ss.ps.ps_ExprContext;
 	slot = node->ss.ss_ScanTupleSlot;
 	scan = node->ss.ss_currentScanDesc;
-	hscan = (HeapScanDesc) scan;
 	tbm = node->tbm;
 	if (pstate == NULL)
 		tbmiterator = node->tbmiterator;
@@ -194,16 +189,27 @@ BitmapHeapNext(BitmapHeapScanState *node)
 
 	for (;;)
 	{
-		Page		dp;
-		ItemId		lp;
-
 		CHECK_FOR_INTERRUPTS();
 
-		/*
-		 * Get next page of results if needed
-		 */
-		if (tbmres == NULL)
+		if (node->return_empty_tuples > 0)
 		{
+			ExecStoreAllNullTuple(slot);
+			node->return_empty_tuples--;
+		}
+		else if (tbmres)
+		{
+			if (!table_scan_bitmap_pagescan_next(scan, slot))
+			{
+				node->tbmres = tbmres = NULL;
+				continue;
+			}
+		}
+		else
+		{
+			/*
+			 * Get next page of results if needed
+			 */
+
 			if (!pstate)
 				node->tbmres = tbmres = tbm_iterate(tbmiterator);
 			else
@@ -216,18 +222,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
 
 			BitmapAdjustPrefetchIterator(node, tbmres);
 
-			/*
-			 * Ignore any claimed entries past what we think is the end of the
-			 * relation.  (This is probably not necessary given that we got at
-			 * least AccessShareLock on the table before performing any of the
-			 * indexscans, but let's be safe.)
-			 */
-			if (tbmres->blockno >= hscan->rs_nblocks)
-			{
-				node->tbmres = tbmres = NULL;
-				continue;
-			}
-
 			/*
 			 * We can skip fetching the heap page if we don't need any fields
 			 * from the heap, and the bitmap entries don't need rechecking,
@@ -243,16 +237,21 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			{
 				/*
 				 * The number of tuples on this page is put into
-				 * scan->rs_ntuples; note we don't fill scan->rs_vistuples.
+				 * node->return_empty_tuples; note we don't fill
+				 * scan->rs_vistuples.
 				 */
-				hscan->rs_ntuples = tbmres->ntuples;
+				node->return_empty_tuples = tbmres->ntuples;
 			}
 			else
 			{
 				/*
 				 * Fetch the current heap page and identify candidate tuples.
 				 */
-				bitgetpage(hscan, tbmres);
+				if (!table_scan_bitmap_pagescan(scan, tbmres))
+				{
+					/* AM doesn't think this block is valid, skip */
+					continue;
+				}
 			}
 
 			if (tbmres->ntuples >= 0)
@@ -260,51 +259,37 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			else
 				node->lossy_pages++;
 
-			/*
-			 * Set rs_cindex to first slot to examine
-			 */
-			hscan->rs_cindex = 0;
-
 			/* Adjust the prefetch target */
 			BitmapAdjustPrefetchTarget(node);
-		}
-		else
-		{
+
 			/*
-			 * Continuing in previously obtained page; advance rs_cindex
+			 * XXX: Note we do not prefetch here.
 			 */
-			hscan->rs_cindex++;
+
+			continue;
+		}
+
 
 #ifdef USE_PREFETCH
 
-			/*
-			 * Try to prefetch at least a few pages even before we get to the
-			 * second page if we don't stop reading after the first tuple.
-			 */
-			if (!pstate)
-			{
-				if (node->prefetch_target < node->prefetch_maximum)
-					node->prefetch_target++;
-			}
-			else if (pstate->prefetch_target < node->prefetch_maximum)
-			{
-				/* take spinlock while updating shared state */
-				SpinLockAcquire(&pstate->mutex);
-				if (pstate->prefetch_target < node->prefetch_maximum)
-					pstate->prefetch_target++;
-				SpinLockRelease(&pstate->mutex);
-			}
-#endif							/* USE_PREFETCH */
-		}
-
 		/*
-		 * Out of range?  If so, nothing more to look at on this page
+		 * Try to prefetch at least a few pages even before we get to the
+		 * second page if we don't stop reading after the first tuple.
 		 */
-		if (hscan->rs_cindex < 0 || hscan->rs_cindex >= hscan->rs_ntuples)
+		if (!pstate)
 		{
-			node->tbmres = tbmres = NULL;
-			continue;
+			if (node->prefetch_target < node->prefetch_maximum)
+				node->prefetch_target++;
 		}
+		else if (pstate->prefetch_target < node->prefetch_maximum)
+		{
+			/* take spinlock while updating shared state */
+			SpinLockAcquire(&pstate->mutex);
+			if (pstate->prefetch_target < node->prefetch_maximum)
+				pstate->prefetch_target++;
+			SpinLockRelease(&pstate->mutex);
+		}
+#endif							/* USE_PREFETCH */
 
 		/*
 		 * We issue prefetch requests *after* fetching the current page to try
@@ -315,52 +300,19 @@ BitmapHeapNext(BitmapHeapScanState *node)
 		 */
 		BitmapPrefetch(node, scan);
 
-		if (node->skip_fetch)
+		/*
+		 * If we are using lossy info, we have to recheck the qual conditions
+		 * at every tuple.
+		 */
+		if (tbmres->recheck)
 		{
-			/*
-			 * If we don't have to fetch the tuple, just return nulls.
-			 */
-			ExecStoreAllNullTuple(slot);
-		}
-		else
-		{
-			/*
-			 * Okay to fetch the tuple.
-			 */
-			targoffset = hscan->rs_vistuples[hscan->rs_cindex];
-			dp = (Page) BufferGetPage(hscan->rs_cbuf);
-			lp = PageGetItemId(dp, targoffset);
-			Assert(ItemIdIsNormal(lp));
-
-			hscan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
-			hscan->rs_ctup.t_len = ItemIdGetLength(lp);
-			hscan->rs_ctup.t_tableOid = scan->rs_rd->rd_id;
-			ItemPointerSet(&hscan->rs_ctup.t_self, tbmres->blockno, targoffset);
-
-			pgstat_count_heap_fetch(scan->rs_rd);
-
-			/*
-			 * Set up the result slot to point to this tuple.  Note that the
-			 * slot acquires a pin on the buffer.
-			 */
-			ExecStoreBufferHeapTuple(&hscan->rs_ctup,
-									 slot,
-									 hscan->rs_cbuf);
-
-			/*
-			 * If we are using lossy info, we have to recheck the qual
-			 * conditions at every tuple.
-			 */
-			if (tbmres->recheck)
+			econtext->ecxt_scantuple = slot;
+			if (!ExecQualAndReset(node->bitmapqualorig, econtext))
 			{
-				econtext->ecxt_scantuple = slot;
-				if (!ExecQualAndReset(node->bitmapqualorig, econtext))
-				{
-					/* Fails recheck, so drop it and loop back for another */
-					InstrCountFiltered2(node, 1);
-					ExecClearTuple(slot);
-					continue;
-				}
+				/* Fails recheck, so drop it and loop back for another */
+				InstrCountFiltered2(node, 1);
+				ExecClearTuple(slot);
+				continue;
 			}
 		}
 
@@ -374,110 +326,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
 	return ExecClearTuple(slot);
 }
 
-/*
- * bitgetpage - subroutine for BitmapHeapNext()
- *
- * This routine reads and pins the specified page of the relation, then
- * builds an array indicating which tuples on the page are both potentially
- * interesting according to the bitmap, and visible according to the snapshot.
- */
-static void
-bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres)
-{
-	BlockNumber page = tbmres->blockno;
-	Buffer		buffer;
-	Snapshot	snapshot;
-	int			ntup;
-
-	/*
-	 * Acquire pin on the target heap page, trading in any pin we held before.
-	 */
-	Assert(page < scan->rs_nblocks);
-
-	scan->rs_cbuf = ReleaseAndReadBuffer(scan->rs_cbuf,
-										 scan->rs_base.rs_rd,
-										 page);
-	buffer = scan->rs_cbuf;
-	snapshot = scan->rs_base.rs_snapshot;
-
-	ntup = 0;
-
-	/*
-	 * Prune and repair fragmentation for the whole page, if possible.
-	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
-
-	/*
-	 * We must hold share lock on the buffer content while examining tuple
-	 * visibility.  Afterwards, however, the tuples we have found to be
-	 * visible are guaranteed good as long as we hold the buffer pin.
-	 */
-	LockBuffer(buffer, BUFFER_LOCK_SHARE);
-
-	/*
-	 * We need two separate strategies for lossy and non-lossy cases.
-	 */
-	if (tbmres->ntuples >= 0)
-	{
-		/*
-		 * Bitmap is non-lossy, so we just look through the offsets listed in
-		 * tbmres; but we have to follow any HOT chain starting at each such
-		 * offset.
-		 */
-		int			curslot;
-
-		for (curslot = 0; curslot < tbmres->ntuples; curslot++)
-		{
-			OffsetNumber offnum = tbmres->offsets[curslot];
-			ItemPointerData tid;
-			HeapTupleData heapTuple;
-
-			ItemPointerSet(&tid, page, offnum);
-			if (heap_hot_search_buffer(&tid, scan->rs_base.rs_rd, buffer,
-									   snapshot, &heapTuple, NULL, true))
-				scan->rs_vistuples[ntup++] = ItemPointerGetOffsetNumber(&tid);
-		}
-	}
-	else
-	{
-		/*
-		 * Bitmap is lossy, so we must examine each item pointer on the page.
-		 * But we can ignore HOT chains, since we'll check each tuple anyway.
-		 */
-		Page		dp = (Page) BufferGetPage(buffer);
-		OffsetNumber maxoff = PageGetMaxOffsetNumber(dp);
-		OffsetNumber offnum;
-
-		for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum = OffsetNumberNext(offnum))
-		{
-			ItemId		lp;
-			HeapTupleData loctup;
-			bool		valid;
-
-			lp = PageGetItemId(dp, offnum);
-			if (!ItemIdIsNormal(lp))
-				continue;
-			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
-			loctup.t_len = ItemIdGetLength(lp);
-			loctup.t_tableOid = scan->rs_base.rs_rd->rd_id;
-			ItemPointerSet(&loctup.t_self, page, offnum);
-			valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
-			if (valid)
-			{
-				scan->rs_vistuples[ntup++] = offnum;
-				PredicateLockTuple(scan->rs_base.rs_rd, &loctup, snapshot);
-			}
-			CheckForSerializableConflictOut(valid, scan->rs_base.rs_rd,
-											&loctup, buffer, snapshot);
-		}
-	}
-
-	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-
-	Assert(ntup <= MaxHeapTuplesPerPage);
-	scan->rs_ntuples = ntup;
-}
-
 /*
  *	BitmapDoneInitializingSharedState - Shared state is initialized
  *
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 5ee829bb24e..8ee8821a3ef 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -273,7 +273,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			info->amsearchnulls = amroutine->amsearchnulls;
 			info->amcanparallel = amroutine->amcanparallel;
 			info->amhasgettuple = (amroutine->amgettuple != NULL);
-			info->amhasgetbitmap = (amroutine->amgetbitmap != NULL);
+			info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
+				relation->rd_tableam->scan_bitmap_pagescan != NULL;
 			info->amcostestimate = amroutine->amcostestimate;
 			Assert(info->amcostestimate != NULL);
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 0c9339c676e..2ed25ec748f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -351,6 +351,10 @@ typedef struct TableAmRoutine
 	 * ------------------------------------------------------------------------
 	 */
 
+	bool		(*scan_bitmap_pagescan) (TableScanDesc scan,
+										 TBMIterateResult *tbmres);
+	bool		(*scan_bitmap_pagescan_next) (TableScanDesc scan,
+											  TupleTableSlot *slot);
 	bool		(*scan_sample_next_block) (TableScanDesc scan,
 										   struct SampleScanState *scanstate);
 	bool		(*scan_sample_next_tuple) (TableScanDesc scan,
@@ -905,6 +909,19 @@ table_estimate_size(Relation rel, int32 *attr_widths,
  * ----------------------------------------------------------------------------
  */
 
+static inline bool
+table_scan_bitmap_pagescan(TableScanDesc scan,
+						   TBMIterateResult *tbmres)
+{
+	return scan->rs_rd->rd_tableam->scan_bitmap_pagescan(scan, tbmres);
+}
+
+static inline bool
+table_scan_bitmap_pagescan_next(TableScanDesc scan, TupleTableSlot *slot)
+{
+	return scan->rs_rd->rd_tableam->scan_bitmap_pagescan_next(scan, slot);
+}
+
 static inline bool
 table_scan_sample_next_block(TableScanDesc scan, struct SampleScanState *scanstate)
 {
-- 
2.21.0.dirty
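
To make the intended division of labour easier to review: after this patch
BitmapHeapNext() drives the AM through just two callbacks.  Roughly (sketch
only; prefetching, parallel mode and the skip_fetch path are elided, and
process_tuple() merely stands in for returning the slot):

	TBMIterateResult *tbmres;

	while ((tbmres = tbm_iterate(tbmiterator)) != NULL)
	{
		/* AM pins the block and collects the visible offsets */
		if (!table_scan_bitmap_pagescan(scan, tbmres))
			continue;		/* AM rejected the block, e.g. past EOF */

		/* then hands back the visible tuples one slot at a time */
		while (table_scan_bitmap_pagescan_next(scan, slot))
			process_tuple(slot);
	}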

v18-0016-WIP-Move-xid-horizon-computation-for-page-level-.patch (text/x-diff)
From 0b65e46ae5bd9c79bc45cc49fc8d9ac76ab07f1e Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 19 Dec 2018 12:32:40 -0800
Subject: [PATCH v18 16/18] WIP: Move xid horizon computation for page level
 index vacuum to primary.

During recovery we do not know which table AM to consult to compute the
xid horizon. To allow for pluggable storage, this patch therefore moves
the computation to the primary.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/hash/hash_xlog.c    | 153 +-----------------------
 src/backend/access/hash/hashinsert.c   |  18 ++-
 src/backend/access/heap/heapam.c       | 128 ++++++++++++++++++++
 src/backend/access/index/genam.c       |  36 ++++++
 src/backend/access/nbtree/nbtpage.c    |   7 ++
 src/backend/access/nbtree/nbtxlog.c    | 156 +------------------------
 src/backend/access/rmgrdesc/hashdesc.c |   5 +-
 src/backend/access/rmgrdesc/nbtdesc.c  |   3 +-
 src/include/access/genam.h             |   5 +
 src/include/access/hash_xlog.h         |   1 +
 src/include/access/heapam.h            |   4 +
 src/include/access/nbtxlog.h           |   1 +
 12 files changed, 201 insertions(+), 316 deletions(-)

diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index c6d87261579..20441e307a8 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -969,155 +969,6 @@ hash_xlog_update_meta_page(XLogReaderState *record)
 		UnlockReleaseBuffer(metabuf);
 }
 
-/*
- * Get the latestRemovedXid from the heap pages pointed at by the index
- * tuples being deleted. See also btree_xlog_delete_get_latestRemovedXid,
- * on which this function is based.
- */
-static TransactionId
-hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
-{
-	xl_hash_vacuum_one_page *xlrec;
-	OffsetNumber *unused;
-	Buffer		ibuffer,
-				hbuffer;
-	Page		ipage,
-				hpage;
-	RelFileNode rnode;
-	BlockNumber blkno;
-	ItemId		iitemid,
-				hitemid;
-	IndexTuple	itup;
-	HeapTupleHeader htuphdr;
-	BlockNumber hblkno;
-	OffsetNumber hoffnum;
-	TransactionId latestRemovedXid = InvalidTransactionId;
-	int			i;
-
-	xlrec = (xl_hash_vacuum_one_page *) XLogRecGetData(record);
-
-	/*
-	 * If there's nothing running on the standby we don't need to derive a
-	 * full latestRemovedXid value, so use a fast path out of here.  This
-	 * returns InvalidTransactionId, and so will conflict with all HS
-	 * transactions; but since we just worked out that that's zero people,
-	 * it's OK.
-	 *
-	 * XXX There is a race condition here, which is that a new backend might
-	 * start just after we look.  If so, it cannot need to conflict, but this
-	 * coding will result in throwing a conflict anyway.
-	 */
-	if (CountDBBackends(InvalidOid) == 0)
-		return latestRemovedXid;
-
-	/*
-	 * Check if WAL replay has reached a consistent database state. If not, we
-	 * must PANIC. See the definition of
-	 * btree_xlog_delete_get_latestRemovedXid for more details.
-	 */
-	if (!reachedConsistency)
-		elog(PANIC, "hash_xlog_vacuum_get_latestRemovedXid: cannot operate with inconsistent data");
-
-	/*
-	 * Get index page.  If the DB is consistent, this should not fail, nor
-	 * should any of the heap page fetches below.  If one does, we return
-	 * InvalidTransactionId to cancel all HS transactions.  That's probably
-	 * overkill, but it's safe, and certainly better than panicking here.
-	 */
-	XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
-	ibuffer = XLogReadBufferExtended(rnode, MAIN_FORKNUM, blkno, RBM_NORMAL);
-
-	if (!BufferIsValid(ibuffer))
-		return InvalidTransactionId;
-	LockBuffer(ibuffer, HASH_READ);
-	ipage = (Page) BufferGetPage(ibuffer);
-
-	/*
-	 * Loop through the deleted index items to obtain the TransactionId from
-	 * the heap items they point to.
-	 */
-	unused = (OffsetNumber *) ((char *) xlrec + SizeOfHashVacuumOnePage);
-
-	for (i = 0; i < xlrec->ntuples; i++)
-	{
-		/*
-		 * Identify the index tuple about to be deleted.
-		 */
-		iitemid = PageGetItemId(ipage, unused[i]);
-		itup = (IndexTuple) PageGetItem(ipage, iitemid);
-
-		/*
-		 * Locate the heap page that the index tuple points at
-		 */
-		hblkno = ItemPointerGetBlockNumber(&(itup->t_tid));
-		hbuffer = XLogReadBufferExtended(xlrec->hnode, MAIN_FORKNUM,
-										 hblkno, RBM_NORMAL);
-
-		if (!BufferIsValid(hbuffer))
-		{
-			UnlockReleaseBuffer(ibuffer);
-			return InvalidTransactionId;
-		}
-		LockBuffer(hbuffer, HASH_READ);
-		hpage = (Page) BufferGetPage(hbuffer);
-
-		/*
-		 * Look up the heap tuple header that the index tuple points at by
-		 * using the heap node supplied with the xlrec. We can't use
-		 * heap_fetch, since it uses ReadBuffer rather than XLogReadBuffer.
-		 * Note that we are not looking at tuple data here, just headers.
-		 */
-		hoffnum = ItemPointerGetOffsetNumber(&(itup->t_tid));
-		hitemid = PageGetItemId(hpage, hoffnum);
-
-		/*
-		 * Follow any redirections until we find something useful.
-		 */
-		while (ItemIdIsRedirected(hitemid))
-		{
-			hoffnum = ItemIdGetRedirect(hitemid);
-			hitemid = PageGetItemId(hpage, hoffnum);
-			CHECK_FOR_INTERRUPTS();
-		}
-
-		/*
-		 * If the heap item has storage, then read the header and use that to
-		 * set latestRemovedXid.
-		 *
-		 * Some LP_DEAD items may not be accessible, so we ignore them.
-		 */
-		if (ItemIdHasStorage(hitemid))
-		{
-			htuphdr = (HeapTupleHeader) PageGetItem(hpage, hitemid);
-			HeapTupleHeaderAdvanceLatestRemovedXid(htuphdr, &latestRemovedXid);
-		}
-		else if (ItemIdIsDead(hitemid))
-		{
-			/*
-			 * Conjecture: if hitemid is dead then it had xids before the xids
-			 * marked on LP_NORMAL items. So we just ignore this item and move
-			 * onto the next, for the purposes of calculating
-			 * latestRemovedxids.
-			 */
-		}
-		else
-			Assert(!ItemIdIsUsed(hitemid));
-
-		UnlockReleaseBuffer(hbuffer);
-	}
-
-	UnlockReleaseBuffer(ibuffer);
-
-	/*
-	 * If all heap tuples were LP_DEAD then we will be returning
-	 * InvalidTransactionId here, which avoids conflicts. This matches
-	 * existing logic which assumes that LP_DEAD tuples must already be older
-	 * than the latestRemovedXid on the cleanup record that set them as
-	 * LP_DEAD, hence must already have generated a conflict.
-	 */
-	return latestRemovedXid;
-}
-
 /*
  * replay delete operation in hash index to remove
  * tuples marked as DEAD during index tuple insertion.
@@ -1149,12 +1000,10 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
 	 */
 	if (InHotStandby)
 	{
-		TransactionId latestRemovedXid =
-		hash_xlog_vacuum_get_latestRemovedXid(record);
 		RelFileNode rnode;
 
 		XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
-		ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+		ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
 	}
 
 	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 970733f0cd4..a248bd0743f 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -23,8 +23,8 @@
 #include "storage/buf_internals.h"
 #include "storage/predicate.h"
 
-static void _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
-					  RelFileNode hnode);
+static void _hash_vacuum_one_page(Relation rel, Relation hrel,
+								  Buffer metabuf, Buffer buf);
 
 /*
  *	_hash_doinsert() -- Handle insertion of a single index tuple.
@@ -137,7 +137,7 @@ restart_insert:
 
 			if (IsBufferCleanupOK(buf))
 			{
-				_hash_vacuum_one_page(rel, metabuf, buf, heapRel->rd_node);
+				_hash_vacuum_one_page(rel, heapRel, metabuf, buf);
 
 				if (PageGetFreeSpace(page) >= itemsz)
 					break;		/* OK, now we have enough space */
@@ -335,8 +335,7 @@ _hash_pgaddmultitup(Relation rel, Buffer buf, IndexTuple *itups,
  */
 
 static void
-_hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
-					  RelFileNode hnode)
+_hash_vacuum_one_page(Relation rel, Relation hrel, Buffer metabuf, Buffer buf)
 {
 	OffsetNumber deletable[MaxOffsetNumber];
 	int			ndeletable = 0;
@@ -360,6 +359,12 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
 
 	if (ndeletable > 0)
 	{
+		TransactionId latestRemovedXid;
+
+		latestRemovedXid =
+			index_compute_xid_horizon_for_tuples(rel, hrel, buf,
+												 deletable, ndeletable);
+
 		/*
 		 * Write-lock the meta page so that we can decrement tuple count.
 		 */
@@ -393,7 +398,8 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
 			xl_hash_vacuum_one_page xlrec;
 			XLogRecPtr	recptr;
 
-			xlrec.hnode = hnode;
+			xlrec.latestRemovedXid = latestRemovedXid;
+			xlrec.hnode = hrel->rd_node;
 			xlrec.ntuples = ndeletable;
 
 			XLogBeginInsert();
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0321d6bab9a..e9fdb0d4cf5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6946,6 +6946,134 @@ HeapTupleHeaderAdvanceLatestRemovedXid(HeapTupleHeader tuple,
 	/* *latestRemovedXid may still be invalid at end */
 }
 
+/*
+ * Get the latestRemovedXid from the heap pages pointed at by the index
+ * tuples being deleted.
+ *
+ * This puts the work for calculating latestRemovedXid into the primary path
+ * rather than the recovery path.
+ *
+ * It's possible that this generates a fair amount of I/O, since an index
+ * block may have hundreds of tuples being deleted. To amortize that cost to
+ * some degree, this uses prefetching and combines repeat accesses to the same
+ * block.
+ */
+TransactionId
+heap_compute_xid_horizon_for_tuples(Relation rel,
+									ItemPointerData *tids,
+									int nitems)
+{
+	TransactionId latestRemovedXid = InvalidTransactionId;
+	BlockNumber hblkno;
+	Buffer		buf = InvalidBuffer;
+	Page		hpage;
+
+	/*
+	 * Sort to avoid repeated lookups for the same page, and to make it more
+	 * likely to access items in an efficient order. In particular this
+	 * ensures that if there are multiple pointers to the same page, they all
+	 * get processed while looking up and locking the page just once.
+	 */
+	qsort((void *) tids, nitems, sizeof(ItemPointerData),
+		  (int (*) (const void *, const void *)) ItemPointerCompare);
+
+	/* prefetch all pages */
+#ifdef USE_PREFETCH
+	hblkno = InvalidBlockNumber;
+	for (int i = 0; i < nitems; i++)
+	{
+		ItemPointer htid = &tids[i];
+
+		if (hblkno == InvalidBlockNumber ||
+			ItemPointerGetBlockNumber(htid) != hblkno)
+		{
+			hblkno = ItemPointerGetBlockNumber(htid);
+
+			PrefetchBuffer(rel, MAIN_FORKNUM, hblkno);
+		}
+	}
+#endif
+
+	/* Iterate over all tids, and check their horizon */
+	hblkno = InvalidBlockNumber;
+	for (int i = 0; i < nitems; i++)
+	{
+		ItemPointer htid = &tids[i];
+		ItemId		hitemid;
+		OffsetNumber hoffnum;
+
+		/*
+		 * Read heap buffer, but avoid refetching if it's the same block as
+		 * required for the last tid.
+		 */
+		if (hblkno == InvalidBlockNumber ||
+			ItemPointerGetBlockNumber(htid) != hblkno)
+		{
+			/* release old buffer */
+			if (BufferIsValid(buf))
+			{
+				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+				ReleaseBuffer(buf);
+			}
+
+			hblkno = ItemPointerGetBlockNumber(htid);
+
+			buf = ReadBuffer(rel, hblkno);
+			hpage = BufferGetPage(buf);
+
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
+		}
+
+		hoffnum = ItemPointerGetOffsetNumber(htid);
+		hitemid = PageGetItemId(hpage, hoffnum);
+
+		/*
+		 * Follow any redirections until we find something useful.
+		 */
+		while (ItemIdIsRedirected(hitemid))
+		{
+			hoffnum = ItemIdGetRedirect(hitemid);
+			hitemid = PageGetItemId(hpage, hoffnum);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		/*
+		 * If the heap item has storage, then read the header and use that to
+		 * set latestRemovedXid.
+		 *
+		 * Some LP_DEAD items may not be accessible, so we ignore them.
+		 */
+		if (ItemIdHasStorage(hitemid))
+		{
+			HeapTupleHeader htuphdr;
+
+			htuphdr = (HeapTupleHeader) PageGetItem(hpage, hitemid);
+
+			HeapTupleHeaderAdvanceLatestRemovedXid(htuphdr, &latestRemovedXid);
+		}
+		else if (ItemIdIsDead(hitemid))
+		{
+			/*
+			 * Conjecture: if hitemid is dead then it had xids before the xids
+			 * marked on LP_NORMAL items. So we just ignore this item and move
+			 * onto the next, for the purposes of calculating
+			 * latestRemovedXid.
+			 */
+		}
+		else
+			Assert(!ItemIdIsUsed(hitemid));
+
+	}
+
+	if (BufferIsValid(buf))
+	{
+		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+		ReleaseBuffer(buf);
+	}
+
+	return latestRemovedXid;
+}
+
 /*
  * Perform XLogInsert to register a heap cleanup info message. These
  * messages are sent once per VACUUM and are required because
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index d34e4ccd3d5..3dbb264e728 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -273,6 +273,42 @@ BuildIndexValueDescription(Relation indexRelation,
 	return buf.data;
 }
 
+/*
+ * Get the latestRemovedXid from the heap pages pointed at by the index
+ * tuples being deleted.
+ */
+TransactionId
+index_compute_xid_horizon_for_tuples(Relation irel,
+									 Relation hrel,
+									 Buffer ibuf,
+									 OffsetNumber *itemnos,
+									 int nitems)
+{
+	ItemPointerData *htids = (ItemPointerData *) palloc(sizeof(ItemPointerData) * nitems);
+	TransactionId latestRemovedXid = InvalidTransactionId;
+	Page		ipage = BufferGetPage(ibuf);
+	IndexTuple	itup;
+
+	/* identify what the index tuples about to be deleted point to */
+	for (int i = 0; i < nitems; i++)
+	{
+		ItemId		iitemid;
+
+		iitemid = PageGetItemId(ipage, itemnos[i]);
+		itup = (IndexTuple) PageGetItem(ipage, iitemid);
+
+		ItemPointerCopy(&itup->t_tid, &htids[i]);
+	}
+
+	/* determine the actual xid horizon */
+	latestRemovedXid =
+		heap_compute_xid_horizon_for_tuples(hrel, htids, nitems);
+
+	pfree(htids);
+
+	return latestRemovedXid;
+}
+
 
 /* ----------------------------------------------------------------
  *		heap-or-index-scan access to system catalogs
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 9c785bca95e..bb38bb4606e 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -1032,10 +1032,16 @@ _bt_delitems_delete(Relation rel, Buffer buf,
 {
 	Page		page = BufferGetPage(buf);
 	BTPageOpaque opaque;
+	TransactionId latestRemovedXid = InvalidTransactionId;
 
 	/* Shouldn't be called unless there's something to do */
 	Assert(nitems > 0);
 
+	if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
+		latestRemovedXid =
+			index_compute_xid_horizon_for_tuples(rel, heapRel, buf,
+												 itemnos, nitems);
+
 	/* No ereport(ERROR) until changes are logged */
 	START_CRIT_SECTION();
 
@@ -1065,6 +1071,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
 		XLogRecPtr	recptr;
 		xl_btree_delete xlrec_delete;
 
+		xlrec_delete.latestRemovedXid = latestRemovedXid;
 		xlrec_delete.hnode = heapRel->rd_node;
 		xlrec_delete.nitems = nitems;
 
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index b0666b42df3..9c277f5016b 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -518,159 +518,6 @@ btree_xlog_vacuum(XLogReaderState *record)
 		UnlockReleaseBuffer(buffer);
 }
 
-/*
- * Get the latestRemovedXid from the heap pages pointed at by the index
- * tuples being deleted. This puts the work for calculating latestRemovedXid
- * into the recovery path rather than the primary path.
- *
- * It's possible that this generates a fair amount of I/O, since an index
- * block may have hundreds of tuples being deleted. Repeat accesses to the
- * same heap blocks are common, though are not yet optimised.
- *
- * XXX optimise later with something like XLogPrefetchBuffer()
- */
-static TransactionId
-btree_xlog_delete_get_latestRemovedXid(XLogReaderState *record)
-{
-	xl_btree_delete *xlrec = (xl_btree_delete *) XLogRecGetData(record);
-	OffsetNumber *unused;
-	Buffer		ibuffer,
-				hbuffer;
-	Page		ipage,
-				hpage;
-	RelFileNode rnode;
-	BlockNumber blkno;
-	ItemId		iitemid,
-				hitemid;
-	IndexTuple	itup;
-	HeapTupleHeader htuphdr;
-	BlockNumber hblkno;
-	OffsetNumber hoffnum;
-	TransactionId latestRemovedXid = InvalidTransactionId;
-	int			i;
-
-	/*
-	 * If there's nothing running on the standby we don't need to derive a
-	 * full latestRemovedXid value, so use a fast path out of here.  This
-	 * returns InvalidTransactionId, and so will conflict with all HS
-	 * transactions; but since we just worked out that that's zero people,
-	 * it's OK.
-	 *
-	 * XXX There is a race condition here, which is that a new backend might
-	 * start just after we look.  If so, it cannot need to conflict, but this
-	 * coding will result in throwing a conflict anyway.
-	 */
-	if (CountDBBackends(InvalidOid) == 0)
-		return latestRemovedXid;
-
-	/*
-	 * In what follows, we have to examine the previous state of the index
-	 * page, as well as the heap page(s) it points to.  This is only valid if
-	 * WAL replay has reached a consistent database state; which means that
-	 * the preceding check is not just an optimization, but is *necessary*. We
-	 * won't have let in any user sessions before we reach consistency.
-	 */
-	if (!reachedConsistency)
-		elog(PANIC, "btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data");
-
-	/*
-	 * Get index page.  If the DB is consistent, this should not fail, nor
-	 * should any of the heap page fetches below.  If one does, we return
-	 * InvalidTransactionId to cancel all HS transactions.  That's probably
-	 * overkill, but it's safe, and certainly better than panicking here.
-	 */
-	XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
-	ibuffer = XLogReadBufferExtended(rnode, MAIN_FORKNUM, blkno, RBM_NORMAL);
-	if (!BufferIsValid(ibuffer))
-		return InvalidTransactionId;
-	LockBuffer(ibuffer, BT_READ);
-	ipage = (Page) BufferGetPage(ibuffer);
-
-	/*
-	 * Loop through the deleted index items to obtain the TransactionId from
-	 * the heap items they point to.
-	 */
-	unused = (OffsetNumber *) ((char *) xlrec + SizeOfBtreeDelete);
-
-	for (i = 0; i < xlrec->nitems; i++)
-	{
-		/*
-		 * Identify the index tuple about to be deleted
-		 */
-		iitemid = PageGetItemId(ipage, unused[i]);
-		itup = (IndexTuple) PageGetItem(ipage, iitemid);
-
-		/*
-		 * Locate the heap page that the index tuple points at
-		 */
-		hblkno = ItemPointerGetBlockNumber(&(itup->t_tid));
-		hbuffer = XLogReadBufferExtended(xlrec->hnode, MAIN_FORKNUM, hblkno, RBM_NORMAL);
-		if (!BufferIsValid(hbuffer))
-		{
-			UnlockReleaseBuffer(ibuffer);
-			return InvalidTransactionId;
-		}
-		LockBuffer(hbuffer, BT_READ);
-		hpage = (Page) BufferGetPage(hbuffer);
-
-		/*
-		 * Look up the heap tuple header that the index tuple points at by
-		 * using the heap node supplied with the xlrec. We can't use
-		 * heap_fetch, since it uses ReadBuffer rather than XLogReadBuffer.
-		 * Note that we are not looking at tuple data here, just headers.
-		 */
-		hoffnum = ItemPointerGetOffsetNumber(&(itup->t_tid));
-		hitemid = PageGetItemId(hpage, hoffnum);
-
-		/*
-		 * Follow any redirections until we find something useful.
-		 */
-		while (ItemIdIsRedirected(hitemid))
-		{
-			hoffnum = ItemIdGetRedirect(hitemid);
-			hitemid = PageGetItemId(hpage, hoffnum);
-			CHECK_FOR_INTERRUPTS();
-		}
-
-		/*
-		 * If the heap item has storage, then read the header and use that to
-		 * set latestRemovedXid.
-		 *
-		 * Some LP_DEAD items may not be accessible, so we ignore them.
-		 */
-		if (ItemIdHasStorage(hitemid))
-		{
-			htuphdr = (HeapTupleHeader) PageGetItem(hpage, hitemid);
-
-			HeapTupleHeaderAdvanceLatestRemovedXid(htuphdr, &latestRemovedXid);
-		}
-		else if (ItemIdIsDead(hitemid))
-		{
-			/*
-			 * Conjecture: if hitemid is dead then it had xids before the xids
-			 * marked on LP_NORMAL items. So we just ignore this item and move
-			 * onto the next, for the purposes of calculating
-			 * latestRemovedxids.
-			 */
-		}
-		else
-			Assert(!ItemIdIsUsed(hitemid));
-
-		UnlockReleaseBuffer(hbuffer);
-	}
-
-	UnlockReleaseBuffer(ibuffer);
-
-	/*
-	 * If all heap tuples were LP_DEAD then we will be returning
-	 * InvalidTransactionId here, which avoids conflicts. This matches
-	 * existing logic which assumes that LP_DEAD tuples must already be older
-	 * than the latestRemovedXid on the cleanup record that set them as
-	 * LP_DEAD, hence must already have generated a conflict.
-	 */
-	return latestRemovedXid;
-}
-
 static void
 btree_xlog_delete(XLogReaderState *record)
 {
@@ -693,12 +540,11 @@ btree_xlog_delete(XLogReaderState *record)
 	 */
 	if (InHotStandby)
 	{
-		TransactionId latestRemovedXid = btree_xlog_delete_get_latestRemovedXid(record);
 		RelFileNode rnode;
 
 		XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
 
-		ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+		ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
 	}
 
 	/*
diff --git a/src/backend/access/rmgrdesc/hashdesc.c b/src/backend/access/rmgrdesc/hashdesc.c
index ade1c618161..a29aa96e9ca 100644
--- a/src/backend/access/rmgrdesc/hashdesc.c
+++ b/src/backend/access/rmgrdesc/hashdesc.c
@@ -113,8 +113,9 @@ hash_desc(StringInfo buf, XLogReaderState *record)
 			{
 				xl_hash_vacuum_one_page *xlrec = (xl_hash_vacuum_one_page *) rec;
 
-				appendStringInfo(buf, "ntuples %d",
-								 xlrec->ntuples);
+				appendStringInfo(buf, "ntuples %d, latest removed xid %u",
+								 xlrec->ntuples,
+								 xlrec->latestRemovedXid);
 				break;
 			}
 	}
diff --git a/src/backend/access/rmgrdesc/nbtdesc.c b/src/backend/access/rmgrdesc/nbtdesc.c
index 8d5c6ae0ab0..64cf7ed02e4 100644
--- a/src/backend/access/rmgrdesc/nbtdesc.c
+++ b/src/backend/access/rmgrdesc/nbtdesc.c
@@ -56,7 +56,8 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			{
 				xl_btree_delete *xlrec = (xl_btree_delete *) rec;
 
-				appendStringInfo(buf, "%d items", xlrec->nitems);
+				appendStringInfo(buf, "%d items, latest removed xid %u",
+								 xlrec->nitems, xlrec->latestRemovedXid);
 				break;
 			}
 		case XLOG_BTREE_MARK_PAGE_HALFDEAD:
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1936195c535..780f3255de9 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -187,6 +187,11 @@ extern IndexScanDesc RelationGetIndexScan(Relation indexRelation,
 extern void IndexScanEnd(IndexScanDesc scan);
 extern char *BuildIndexValueDescription(Relation indexRelation,
 						   Datum *values, bool *isnull);
+extern TransactionId index_compute_xid_horizon_for_tuples(Relation irel,
+														  Relation hrel,
+														  Buffer ibuf,
+														  OffsetNumber *itemnos,
+														  int nitems);
 
 /*
  * heap-or-index access to system catalogs (in genam.c)
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 9cef1b7c25d..045e2bf58ba 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
  */
 typedef struct xl_hash_vacuum_one_page
 {
+	TransactionId latestRemovedXid;
 	RelFileNode hnode;
 	int			ntuples;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a0cbea9ba00..fefe1daea74 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -181,6 +181,10 @@ extern void simple_heap_update(Relation relation, ItemPointer otid,
 
 extern void heap_sync(Relation relation);
 
+extern TransactionId heap_compute_xid_horizon_for_tuples(Relation rel,
+														 ItemPointerData *items,
+														 int nitems);
+
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern int heap_page_prune(Relation relation, Buffer buffer,
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index a605851c981..a294bd6fef4 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -123,6 +123,7 @@ typedef struct xl_btree_split
  */
 typedef struct xl_btree_delete
 {
+	TransactionId latestRemovedXid;
 	RelFileNode hnode;			/* RelFileNode of the heap the index currently
 								 * points at */
 	int			nitems;
-- 
2.21.0.dirty

v18-0017-tableam-Add-function-to-determine-newest-xid-amo.patch (text/x-diff; charset=us-ascii)
From 2a1a1a0d67d5f4a73799c6f1e207bf5627c14017 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 19 Dec 2018 12:52:18 -0800
Subject: [PATCH v18 17/18] tableam: Add function to determine newest xid among
 tuples.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/heapam_handler.c |  1 +
 src/backend/access/index/genam.c         |  2 +-
 src/include/access/tableam.h             | 17 +++++++++++++++++
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index b183b22ca16..eba8dbb28da 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2340,6 +2340,7 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_fetch_row_version = heapam_fetch_row_version,
 	.tuple_get_latest_tid = heap_get_latest_tid,
 	.tuple_satisfies_snapshot = heapam_tuple_satisfies_snapshot,
+	.compute_xid_horizon_for_tuples = heap_compute_xid_horizon_for_tuples,
 
 	.relation_set_new_filenode = heapam_set_new_filenode,
 	.relation_nontransactional_truncate = heapam_relation_nontransactional_truncate,
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 3dbb264e728..70de8ff75f7 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -302,7 +302,7 @@ index_compute_xid_horizon_for_tuples(Relation irel,
 
 	/* determine the actual xid horizon */
 	latestRemovedXid =
-		heap_compute_xid_horizon_for_tuples(hrel, htids, nitems);
+		table_compute_xid_horizon_for_tuples(hrel, htids, nitems);
 
 	pfree(htids);
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2ed25ec748f..827fe6f35a7 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -240,6 +240,15 @@ typedef struct TableAmRoutine
 	bool		(*tuple_satisfies_snapshot) (Relation rel,
 											 TupleTableSlot *slot,
 											 Snapshot snapshot);
+	/*
+	 * Compute the newest xid among the tuples pointed to by items. This is
+	 * used to compute what snapshots to conflict with when replaying WAL
+	 * records for page-level index vacuums.
+	 */
+	TransactionId (*compute_xid_horizon_for_tuples) (Relation rel,
+													 ItemPointerData *items,
+													 int nitems);
+
 
 	/* ------------------------------------------------------------------------
 	 * Manipulations of physical tuples.
@@ -678,6 +687,14 @@ table_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot, Snapshot snap
 	return rel->rd_tableam->tuple_satisfies_snapshot(rel, slot, snapshot);
 }
 
+static inline TransactionId
+table_compute_xid_horizon_for_tuples(Relation rel,
+									 ItemPointerData *items,
+									 int nitems)
+{
+	return rel->rd_tableam->compute_xid_horizon_for_tuples(rel, items, nitems);
+}
+
 
 /* ----------------------------------------------------------------------------
  *  Functions for manipulations of physical tuples.
-- 
2.21.0.dirty

v18-0018-tableam-Fetch-tuples-for-triggers-EPQ-using-a-pr.patch (text/x-diff; charset=us-ascii)
From b26063783b40a99c4685ff386eb47cd6ec1b10b4 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 20 Jan 2019 17:28:37 -0800
Subject: [PATCH v18 18/18] tableam: Fetch tuples for triggers & EPQ using a
 proper snapshot.

This is required for zheap, where tids don't uniquely identify a
tuple, due to in-place updates.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/commands/copy.c     |  2 +
 src/backend/commands/trigger.c  | 97 ++++++++++++++++++++++++---------
 src/backend/executor/execMain.c |  2 +-
 3 files changed, 73 insertions(+), 28 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 1e7a06a72fb..373142f4308 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2501,6 +2501,8 @@ CopyFrom(CopyState cstate)
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
 	estate->es_result_relation_info = resultRelInfo;
+	estate->es_snapshot = GetActiveSnapshot();
+	estate->es_output_cid = GetCurrentCommandId(true);
 
 	ExecInitRangeTable(estate, cstate->range_table);
 
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6fbf0c2b81e..1653c37567f 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3383,8 +3383,19 @@ GetTupleForTrigger(EState *estate,
 	}
 	else
 	{
-		if (!table_fetch_row_version(relation, tid, SnapshotAny, oldslot, NULL))
-			elog(ERROR, "couldn't fetch tuple");
+		if (!table_fetch_row_version(relation, tid, estate->es_snapshot, oldslot, NULL))
+		{
+			/*
+			 * If the tuple is not visible to the current snapshot, it has to
+			 * be one that we followed via EPQ. In that case, it needs to have
+			 * been modified by an already committed transaction, otherwise
+			 * we'd not get here. So get a new snapshot, and try to fetch it
+			 * using that.
+			 * PBORKED: ZBORKED: Better approach?
+			 */
+			if (!table_fetch_row_version(relation, tid, GetLatestSnapshot(), oldslot, NULL))
+				elog(PANIC, "couldn't fetch tuple");
+		}
 	}
 
 	return true;
@@ -3612,7 +3623,9 @@ typedef struct AfterTriggerEventData *AfterTriggerEvent;
 typedef struct AfterTriggerEventData
 {
 	TriggerFlags ate_flags;		/* status bits and offset to shared data */
+	CommandId ate_cid1;
 	ItemPointerData ate_ctid1;	/* inserted, deleted, or old updated tuple */
+	CommandId ate_cid2;
 	ItemPointerData ate_ctid2;	/* new updated tuple */
 } AfterTriggerEventData;
 
@@ -3620,6 +3633,7 @@ typedef struct AfterTriggerEventData
 typedef struct AfterTriggerEventDataOneCtid
 {
 	TriggerFlags ate_flags;		/* status bits and offset to shared data */
+	CommandId ate_cid1;
 	ItemPointerData ate_ctid1;	/* inserted, deleted, or old updated tuple */
 }			AfterTriggerEventDataOneCtid;
 
@@ -4237,35 +4251,50 @@ AfterTriggerExecute(EState *estate,
 			break;
 
 		default:
-			if (ItemPointerIsValid(&(event->ate_ctid1)))
 			{
-				LocTriggerData.tg_trigslot = ExecGetTriggerOldSlot(estate, relInfo);
+				Assert(ActiveSnapshotSet());
+				if (ItemPointerIsValid(&(event->ate_ctid1)))
+				{
+					Snapshot snap = GetActiveSnapshot();
+					CommandId saved_cid = snap->curcid;
 
-				if (!table_fetch_row_version(rel, &(event->ate_ctid1), SnapshotAny, LocTriggerData.tg_trigslot, NULL))
-					elog(ERROR, "failed to fetch tuple1 for AFTER trigger");
-				LocTriggerData.tg_trigtuple =
-					ExecFetchSlotHeapTuple(LocTriggerData.tg_trigslot, false, &should_free_trig);
-			}
-			else
-			{
-				LocTriggerData.tg_trigtuple = NULL;
-			}
+					snap->curcid = event->ate_cid1;
 
-			/* don't touch ctid2 if not there */
-			if ((event->ate_flags & AFTER_TRIGGER_TUP_BITS) ==
-				AFTER_TRIGGER_2CTID &&
-				ItemPointerIsValid(&(event->ate_ctid2)))
-			{
-				LocTriggerData.tg_newslot = ExecGetTriggerNewSlot(estate, relInfo);
+					LocTriggerData.tg_trigslot = ExecGetTriggerOldSlot(estate, relInfo);
 
-				if (!table_fetch_row_version(rel, &(event->ate_ctid2), SnapshotAny, LocTriggerData.tg_newslot, NULL))
-					elog(ERROR, "failed to fetch tuple2 for AFTER trigger");
-				LocTriggerData.tg_newtuple =
-					ExecFetchSlotHeapTuple(LocTriggerData.tg_newslot, false, &should_free_new);
-			}
-			else
-			{
-				LocTriggerData.tg_newtuple = NULL;
+					if (!table_fetch_row_version(rel, &(event->ate_ctid1), snap, LocTriggerData.tg_trigslot, NULL))
+						elog(ERROR, "failed to fetch tuple1 for AFTER trigger");
+					LocTriggerData.tg_trigtuple =
+						ExecFetchSlotHeapTuple(LocTriggerData.tg_trigslot, false, &should_free_trig);
+					snap->curcid = saved_cid;
+				}
+				else
+				{
+					LocTriggerData.tg_trigtuple = NULL;
+				}
+
+				/* don't touch ctid2 if not there */
+				if ((event->ate_flags & AFTER_TRIGGER_TUP_BITS) ==
+					AFTER_TRIGGER_2CTID &&
+					ItemPointerIsValid(&(event->ate_ctid2)))
+				{
+					Snapshot snap = GetActiveSnapshot();
+					CommandId saved_cid = snap->curcid;
+
+					snap->curcid = event->ate_cid2;
+
+					LocTriggerData.tg_newslot = ExecGetTriggerNewSlot(estate, relInfo);
+
+					if (!table_fetch_row_version(rel, &(event->ate_ctid2), snap, LocTriggerData.tg_newslot, NULL))
+						elog(ERROR, "failed to fetch tuple2 for AFTER trigger");
+					LocTriggerData.tg_newtuple =
+						ExecFetchSlotHeapTuple(LocTriggerData.tg_newslot, false, &should_free_new);
+					snap->curcid = saved_cid;
+				}
+				else
+				{
+					LocTriggerData.tg_newtuple = NULL;
+				}
 			}
 	}
 
@@ -5832,14 +5861,18 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 				Assert(oldslot == NULL);
 				Assert(newslot != NULL);
 				ItemPointerCopy(&(newslot->tts_tid), &(new_event.ate_ctid1));
+				new_event.ate_cid1 = estate->es_output_cid + 1;
 				ItemPointerSetInvalid(&(new_event.ate_ctid2));
+				new_event.ate_cid2 = InvalidCommandId;
 			}
 			else
 			{
 				Assert(oldslot == NULL);
 				Assert(newslot == NULL);
 				ItemPointerSetInvalid(&(new_event.ate_ctid1));
+				new_event.ate_cid1 = InvalidCommandId;
 				ItemPointerSetInvalid(&(new_event.ate_ctid2));
+				new_event.ate_cid2 = InvalidCommandId;
 				cancel_prior_stmt_triggers(RelationGetRelid(rel),
 										   CMD_INSERT, event);
 			}
@@ -5851,14 +5884,18 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 				Assert(oldslot != NULL);
 				Assert(newslot == NULL);
 				ItemPointerCopy(&(oldslot->tts_tid), &(new_event.ate_ctid1));
+				new_event.ate_cid1 = estate->es_snapshot->curcid;
 				ItemPointerSetInvalid(&(new_event.ate_ctid2));
+				new_event.ate_cid2 = InvalidCommandId;
 			}
 			else
 			{
 				Assert(oldslot == NULL);
 				Assert(newslot == NULL);
 				ItemPointerSetInvalid(&(new_event.ate_ctid1));
+				new_event.ate_cid1 = InvalidCommandId;
 				ItemPointerSetInvalid(&(new_event.ate_ctid2));
+				new_event.ate_cid2 = InvalidCommandId;
 				cancel_prior_stmt_triggers(RelationGetRelid(rel),
 										   CMD_DELETE, event);
 			}
@@ -5870,14 +5907,18 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 				Assert(oldslot != NULL);
 				Assert(newslot != NULL);
 				ItemPointerCopy(&(oldslot->tts_tid), &(new_event.ate_ctid1));
+				new_event.ate_cid1 = estate->es_snapshot->curcid;
 				ItemPointerCopy(&(newslot->tts_tid), &(new_event.ate_ctid2));
+				new_event.ate_cid2 = estate->es_output_cid + 1;
 			}
 			else
 			{
 				Assert(oldslot == NULL);
 				Assert(newslot == NULL);
 				ItemPointerSetInvalid(&(new_event.ate_ctid1));
+				new_event.ate_cid1 = InvalidCommandId;
 				ItemPointerSetInvalid(&(new_event.ate_ctid2));
+				new_event.ate_cid2 = InvalidCommandId;
 				cancel_prior_stmt_triggers(RelationGetRelid(rel),
 										   CMD_UPDATE, event);
 			}
@@ -5887,7 +5928,9 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 			Assert(oldslot == NULL);
 			Assert(newslot == NULL);
 			ItemPointerSetInvalid(&(new_event.ate_ctid1));
+			new_event.ate_cid1 = InvalidCommandId;
 			ItemPointerSetInvalid(&(new_event.ate_ctid2));
+			new_event.ate_cid2 = InvalidCommandId;
 			break;
 		default:
 			elog(ERROR, "invalid after-trigger event code: %d", event);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e70e9f08e42..5aad8df4ca6 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2639,7 +2639,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 			{
 				/* ordinary table, fetch the tuple */
 				if (!table_fetch_row_version(erm->relation, (ItemPointer) DatumGetPointer(datum),
-											 SnapshotAny, slot, NULL))
+											 epqstate->estate->es_snapshot, slot, NULL))
 					elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck");
 			}
 		}
-- 
2.21.0.dirty

#115Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#113)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-03-09 11:03:21 +1100, Haribabu Kommi wrote:

Here I attached the rebased patches that I shared earlier. I am adding the
comments to explain the API's in the code, will share those patches later.

I've started to add those for the callbacks in the first commit. I'd
appreciate a look!

I think I'll include the docs patches, sans the callback documentation,
in the next version. I'll probably merge them into one commit, if that's
OK with you?

I observed a crash with the latest patch series in the COPY command.

Hm, which version was this? I'd at some point accidentally posted a
'tmp' commit that was just a performance hack.

Btw, your patches are always attached out of order:
/messages/by-id/CAJrrPGd+rkz54wE-oXRojg4XwC3bcF6bjjRziD+XhFup9Q3n2w@mail.gmail.com
10, 1, 3, 4, 2 ...

Greetings,

Andres Freund

#116Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andres Freund (#114)
Re: Pluggable Storage - Andres's take

On Sat, Mar 9, 2019 at 4:13 AM Andres Freund <andres@anarazel.de> wrote:

While 0001 is pretty bulky, the interesting bits concentrate on a
comparatively small area. I'd appreciate if somebody could give the
comments added in tableam.h a read (both on callbacks, and their
wrappers, as they have different audiences).

Potentially stupid question, but I'm curious about this one (couldn't find any
discussion about it):

    +/*
    + * Generic descriptor for table scans. This is the base-class for table scans,
    + * which needs to be embedded in the scans of individual AMs.
    + */
    +typedef struct TableScanDescData
    // ...
    bool rs_pageatatime; /* verify visibility page-at-a-time? */
    bool rs_allow_strat; /* allow or disallow use of access strategy */
    bool rs_allow_sync; /* allow or disallow use of syncscan */
    + * allow_{strat, sync, pagemode} specify whether a scan strategy,
    + * synchronized scans, or page mode may be used (although not every AM
    + * will support those).
    // ...
    + TableScanDesc (*scan_begin) (Relation rel,

The last comment makes me think that those flags (allow_strat / allow_sync /
pageatatime) are more AM-specific; shouldn't they live in HeapScanDescData?

#117Andres Freund
andres@anarazel.de
In reply to: Dmitry Dolgov (#116)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-03-10 05:49:26 +0100, Dmitry Dolgov wrote:

On Sat, Mar 9, 2019 at 4:13 AM Andres Freund <andres@anarazel.de> wrote:

While 0001 is pretty bulky, the interesting bits concentrate on a
comparatively small area. I'd appreciate if somebody could give the
comments added in tableam.h a read (both on callbacks, and their
wrappers, as they have different audiences).

Potentially stupid question, but I'm curious about this one (couldn't find any
discussion about it):

Not stupid...

+/*
+ * Generic descriptor for table scans. This is the base-class for table scans,
+ * which needs to be embedded in the scans of individual AMs.
+ */
+typedef struct TableScanDescData
// ...
bool rs_pageatatime; /* verify visibility page-at-a-time? */
bool rs_allow_strat; /* allow or disallow use of access strategy */
bool rs_allow_sync; /* allow or disallow use of syncscan */
+ * allow_{strat, sync, pagemode} specify whether a scan strategy,
+ * synchronized scans, or page mode may be used (although not every AM
+ * will support those).
// ...
+ TableScanDesc (*scan_begin) (Relation rel,

The last comment makes me think that those flags (allow_strat / allow_sync /
pageatatime) are more AM-specific; shouldn't they live in HeapScanDescData?

They're common enough across AMs, but more importantly calling code
currently specifies them in several places. As they're thus essentially
generic, rather than AM specific, I think it makes sense to have them in
the general scan struct.
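
To make the design point concrete, here is a minimal standalone sketch
(struct and field names follow the quoted patch; the helper and everything
else are hypothetical): generic code sets the allow_* flags through the
embedded base struct, and the AM's scan_begin is then free to honor or
ignore them.

#include <stdbool.h>
#include <stdio.h>

/* Base-class scan descriptor; the allow_* flags are generic requests. */
typedef struct TableScanDescData
{
	bool		rs_allow_strat; /* may a buffer access strategy be used? */
	bool		rs_allow_sync;	/* may synchronized scans be used? */
	bool		rs_pageatatime; /* page-at-a-time visibility checks? */
} TableScanDescData;

/* An AM-specific scan embeds the base as its first member. */
typedef struct HeapScanDescData
{
	TableScanDescData rs_base;	/* must be the first member */
	int			rs_cblock;		/* AM-private state (placeholder) */
} HeapScanDescData;

/* Generic code can set the flags without knowing the concrete AM. */
static void
generic_scan_setup(TableScanDescData *scan, bool allow_strat, bool allow_sync)
{
	scan->rs_allow_strat = allow_strat;
	scan->rs_allow_sync = allow_sync;
}

int
main(void)
{
	HeapScanDescData hscan = {{false, false, false}, 0};

	generic_scan_setup(&hscan.rs_base, true, false);
	/* The AM may honor or ignore the requests when it runs the scan. */
	printf("strat=%d sync=%d\n",
		   hscan.rs_base.rs_allow_strat, hscan.rs_base.rs_allow_sync);
	return 0;
}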

Greetings,

Andres Freund

#118Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#115)
Re: Pluggable Storage - Andres's take

On Sat, Mar 9, 2019 at 2:18 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-03-09 11:03:21 +1100, Haribabu Kommi wrote:

Here I attached the rebased patches that I shared earlier. I am adding the
comments to explain the API's in the code, will share those patches later.

I've started to add those for the callbacks in the first commit. I'd
appreciate a look!

Thanks for the updated patches.

+ /* ------------------------------------------------------------------------
+ * Callbacks for hon-modifying operations on individual tuples
+ * ------------------------------------------------------------------------

Typo in tableam.h file. hon->non

I think I'll include the docs patches, sans the callback documentation,
in the next version. I'll probably merge them into one commit, if that's
OK with you?

OK.
For easy review, I will still maintain 3 or 4 patches instead of the
current patch series.

I observed a crash with the latest patch series in the COPY command.

Hm, which version was this? I'd at some point accidentally posted a
'tmp' commit that was just a performance hack.

Yes, the version I checked did have that commit.
Maybe that is the reason for the failure.

Btw, your patches always are attached out of order:

/messages/by-id/CAJrrPGd+rkz54wE-oXRojg4XwC3bcF6bjjRziD+XhFup9Q3n2w@mail.gmail.com
10, 1, 3, 4, 2 ...

Sorry about that.
I always wondered why the attachments were ordered that way when I attached
the patch files to the mail. I thought it might be Gmail behavior, but by
experimenting I found that, when adding multiple patches, the last selected
patch is given preference and is listed as the first attachment.

I will take care that this problem doesn't happen again.

Regards,
Haribabu Kommi
Fujitsu Australia

#119Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#114)
18 attachment(s)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-03-08 19:13:10 -0800, Andres Freund wrote:

Changes:
- I've added comments to all the callbacks in the first commit / the
scan commit
- I've renamed table_gimmegimmeslot to table_slot_create
- I've made the callbacks and their wrappers more consistently named
- I've added asserts for necessary callbacks in scan commit
- Lots of smaller cleanup
- Added a commit message

While 0001 is pretty bulky, the interesting bits concentrate on a
comparatively small area. I'd appreciate if somebody could give the
comments added in tableam.h a read (both on callbacks, and their
wrappers, as they have different audiences). It'd make sense to first
read the commit message, to understand the goal (and I'd obviously also
appreciate suggestions for improvements there as well).

I'm pretty happy with the current state of the scan patch. I plan to do
two more passes through it (formatting, comment polishing, etc. I don't
know of any functional changes needed), and then commit it, lest
somebody objects.

Here's a further polished version. Pretty boring changes:
- newlines
- put tableam.h into the correct place
- a few comment improvements, including typos
- changed reorderqueue_push() to accept the slot. That avoids an
unnecessary heap_copytuple() in some cases

No meaningful changes in later patches.

Greetings,

Andres Freund

Attachments:

v19-0001-tableam-Add-and-use-scan-APIs.patch.gz (application/x-patch-gzip)
v19-0002-tableam-Only-allow-heap-in-a-number-of-contrib-m.patch.gz (application/x-patch-gzip)
v19-0003-tableam-Add-insert-delete-update-lock_tuple.patch.gz (application/x-patch-gzip)
v19-0004-tableam-Add-fetch_row_version.patch.gz (application/x-patch-gzip)
v19-0005-tableam-Add-use-tableam_fetch_follow_check.patch.gz (application/x-patch-gzip)
v19-0006-tableam-Add-table_get_latest_tid.patch.gz (application/x-patch-gzip)
v19-0007-tableam-multi_insert-and-slotify-COPY.patch.gz (application/x-patch-gzip)
v19-0008-tableam-finish_bulk_insert.patch.gz (application/x-patch-gzip)
v19-0009-tableam-slotify-CREATE-TABLE-AS-and-CREATE-MATER.patch.gz (application/x-patch-gzip)
v19-0010-tableam-index-builds.patch.gz (application/x-patch-gzip)
v19-0011-tableam-relation-creation-VACUUM-FULL-CLUSTER-SE.patch.gz (application/x-patch-gzip)
v19-0012-tableam-VACUUM-and-ANALYZE.patch.gz (application/x-patch-gzip)
v19-0013-tableam-planner-size-estimation.patch.gz (application/x-patch-gzip)
v19-0014-tableam-Sample-Scan-Support.patch.gz (application/x-patch-gzip)
v19-0015-tableam-bitmap-heap-scan.patch.gz (application/x-patch-gzip)
v19-0016-WIP-Move-xid-horizon-computation-for-page-level-.patch.gz (application/x-patch-gzip)
v19-0017-tableam-Add-function-to-determine-newest-xid-amo.patch.gz (application/x-patch-gzip)
v19-0018-tableam-Fetch-tuples-for-triggers-EPQ-using-a-pr.patch.gz (application/x-patch-gzip)
#120Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#119)
Re: Pluggable Storage - Andres's take

On 2019-03-11 12:37:46 -0700, Andres Freund wrote:

Hi,

On 2019-03-08 19:13:10 -0800, Andres Freund wrote:

Changes:
- I've added comments to all the callbacks in the first commit / the
scan commit
- I've renamed table_gimmegimmeslot to table_slot_create
- I've made the callbacks and their wrappers more consistently named
- I've added asserts for necessary callbacks in scan commit
- Lots of smaller cleanup
- Added a commit message

While 0001 is pretty bulky, the interesting bits concentrate on a
comparatively small area. I'd appreciate if somebody could give the
comments added in tableam.h a read (both on callbacks, and their
wrappers, as they have different audiences). It'd make sense to first
read the commit message, to understand the goal (and I'd obviously also
appreciate suggestions for improvements there as well).

I'm pretty happy with the current state of the scan patch. I plan to do
two more passes through it (formatting, comment polishing, etc. I don't
know of any functional changes needed), and then commit it, lest
somebody objects.

Here's a further polished version. Pretty boring changes:
- newlines
- put tableam.h into the correct place
- a few comment improvements, including typos
- changed reorderqueue_push() to accept the slot. That avoids an
unnecessary heap_copytuple() in some cases

No meaningful changes in later patches.

I pushed this. There's a failure on 32bit machines, unfortunately. The
problem comes from the ParallelTableScanDescData embedded in BTShared -
after the change the compiler can't see that that actually needs more
alignment, because ParallelTableScanDescData doesn't have any 8byte
members. That's a problem for just about all such "struct inheritance"
type tricks in postgres, but we normally just allocate them separately,
guaranteeing maxalign. Given that we here already allocate enough space
after the BTShared struct, it's probably easiest to just not embed the
struct anymore.
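
To illustrate the failure mode described above (the struct names below are
stand-ins, not the actual PostgreSQL definitions): a base struct containing
only 4-byte members may be given only 4-byte alignment, so an AM-specific
layout with 8-byte members placed in the same allocation can end up
under-aligned on a 32-bit ABI.

#include <stdio.h>
#include <stdint.h>
#include <stdalign.h>

/* Stand-in for ParallelTableScanDescData: only 4-byte members, so the
 * compiler may give it, and anything embedding it, 4-byte alignment. */
typedef struct BaseParallelScan
{
	uint32_t	phs_nblocks;
	uint32_t	phs_flags;
} BaseParallelScan;

/* Stand-in for an AM-specific descriptor with an 8-byte member. */
typedef struct HeapParallelScan
{
	BaseParallelScan base;		/* embedded "parent" struct */
	uint64_t	phs_nallocated; /* wants 8-byte alignment */
} HeapParallelScan;

int
main(void)
{
	/* If a container (like BTShared) embeds only the base, and the AM's
	 * data is laid out right after it, the compiler sizes and aligns for
	 * the base alone; on some 32-bit ABIs that is less than the 8 bytes
	 * the AM's member needs. */
	printf("alignof(base) = %zu, alignof(full) = %zu\n",
		   alignof(BaseParallelScan), alignof(HeapParallelScan));
	return 0;
}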

- Andres

#121Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#120)
Re: Pluggable Storage - Andres's take

On 2019-03-11 13:31:26 -0700, Andres Freund wrote:

On 2019-03-11 12:37:46 -0700, Andres Freund wrote:

Hi,

On 2019-03-08 19:13:10 -0800, Andres Freund wrote:

Changes:
- I've added comments to all the callbacks in the first commit / the
scan commit
- I've renamed table_gimmegimmeslot to table_slot_create
- I've made the callbacks and their wrappers more consistently named
- I've added asserts for necessary callbacks in scan commit
- Lots of smaller cleanup
- Added a commit message

While 0001 is pretty bulky, the interesting bits concentrate on a
comparatively small area. I'd appreciate if somebody could give the
comments added in tableam.h a read (both on callbacks, and their
wrappers, as they have different audiences). It'd make sense to first
read the commit message, to understand the goal (and I'd obviously also
appreciate suggestions for improvements there as well).

I'm pretty happy with the current state of the scan patch. I plan to do
two more passes through it (formatting, comment polishing, etc. I don't
know of any functional changes needed), and then commit it, lest
somebody objects.

Here's a further polished version. Pretty boring changes:
- newlines
- put tableam.h into the correct place
- a few comment improvements, including typos
- changed reorderqueue_push() to accept the slot. That avoids an
unnecessary heap_copytuple() in some cases

No meaningful changes in later patches.

I pushed this. There's a failure on 32bit machines, unfortunately. The
problem comes from the ParallelTableScanDescData embedded in BTShared -
after the change the compiler can't see that that actually needs more
alignment, because ParallelTableScanDescData doesn't have any 8byte
members. That's a problem for just about all such "struct inheritance"
type tricks in postgres, but we normally just allocate them separately,
guaranteeing maxalign. Given that we here already allocate enough space
after the BTShared struct, it's probably easiest to just not embed the
struct anymore.

I've pushed an attempt to fix this, which locally fixes 32bit
builds. It's copying the alignment logic of shm_toc_allocate, namely
using BUFFERALIGN for alignment. We should probably invent a more
appropriate define for this at some point...
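
For readers unfamiliar with it, the alignment logic being borrowed here is
PostgreSQL's usual round-up-to-a-power-of-two macro pattern; a toy sketch
follows (the header size and the alignment value are made up for
illustration).

#include <stdio.h>
#include <stddef.h>

/* The TYPEALIGN() pattern that BUFFERALIGN() is built on: round LEN up
 * to the next multiple of ALIGNVAL, which must be a power of two. */
#define TYPEALIGN(ALIGNVAL, LEN) \
	(((size_t) (LEN) + ((ALIGNVAL) - 1)) & ~((size_t) ((ALIGNVAL) - 1)))

int
main(void)
{
	size_t		header = 52;	/* hypothetical header struct size */
	size_t		offset = TYPEALIGN(8, header);	/* hypothetical alignment */

	/* The AM-specific scan data then starts at 'offset' bytes into the
	 * allocation, guaranteeing alignment for 8-byte members regardless of
	 * what the compiler inferred for the embedded struct. */
	printf("header = %zu -> aligned offset = %zu\n", header, offset);
	return 0;
}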

Greetings,

Andres Freund

#122Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Haribabu Kommi (#118)
Re: Pluggable Storage - Andres's take

Hello.

I had a look at the patch set. I cannot see the thread structure
due to the depth and cannot get the full picture of all the patches,
but I have some comments. I apologize in advance for possible
duplication with upthread comments.

0001-Reduce-the...

This doesn't apply to master.

TupleTableSlot *
ExecStoreHeapTuple(HeapTuple tuple,
                   TupleTableSlot *slot,
+                  Oid relid,
                   bool shouldFree)

The comment for ExecStoreHeapTuple is missing the description
of the "relid" parameter.

-        if (HeapTupleSatisfiesVisibility(tuple, &SnapshotDirty, hscan->rs_cbuf))
+        if (HeapTupleSatisfiesVisibility(tuple, RelationGetRelid(hscan->rs_scan.rs_rd),
+                                &SnapshotDirty, hscan->rs_cbuf))

The second parameter always seems to be
RelationGetRelid(...). Only the relid is actually required, but isn't
it better to take a Relation instead of an Oid as the second
parameter?

0005-Reorganize-am-as...

+ catalog. The contents of an table are entirely under the control of its

"of an table" => "of a table"

0006-Doc-update-of-Create-access..

+      be <type>index_am_handler</type> and for <literal>TABLE</literal>
+      access methods, it must be <type>table_am_handler</type>.
+      The C-level API that the handler function must implement varies
+      depending on the type of access method. The index access method API
+      is described in <xref linkend="index-access-methods"/> and the table access method
+      API is described in <xref linkend="table-access-methods"/>.

If table is the primary object, shouldn't table-am precede index-am?

0007-Doc-update-of-create-materi..

+ This clause specifies optional access method for the new materialize view;

"materialize view" => "materialized view"?

+ If this option is not specified, then the default table access method

I'm not sure the 'then' is needed.

+ is chosen for the new materialized view. see <xref linkend="guc-default-table-access-method"/>

"see" => "See"

0008-Doc-update-of-CREATE_TABLE..

+[ USING <replaceable class="parameter">method</replaceable> ]

*I* prefer access_method to just method.

+ If this option is not specified, then the default table access method

Same as 0007: "I'm not sure the 'then' is needed."

+ is chosen for the new table. see <xref linkend="guc-default-table-access-method"/>

Same as 0007: "see" => "See".

0009-Doc-of-CREATE-TABLE-AS

Same as 0008.

0010-Table-access-method-API-

+ Any new <literal>TABLE ACCSESS METHOD</literal> developers can refer the exisitng <literal>HEAP</literal>

+ There are differnt type of API's that are defined and those details are below.

"differnt" => "different"

+ by the AM, in case if it support parallel scan.

"support" => "supports"

+ This API to return the total size that is required for the AM to perform

Total size of what? Shared memory chunk? Or parallel scan descriptor?

+     the parallel table scan. The minimum size that is required is 
+     <structname>ParallelBlockTableScanDescData</structname>.

I don't get what "The minimum size" is telling me. Just reading
this, I would always return the minimum size...

+     This API to perform the initialization of the <literal>parallel_scan</literal>
+     that is required for the parallel scan to be performed by the AM and also return
+     the total size that is required for the AM to perform the parallel table scan.

(Note: I'm not good at English.) Similar to the above: I cannot
tell what the "size" is for.

In the code it is used as:

Size snapshot_off = rel->rd_tableam->parallelscan_initialize(rel, pscan);

(The variable name should be snapshot_offset.) It is the offset
from the beginning of the parallel scan descriptor, but it should be
described in some other way, which I'm not sure of..

Something like this?

This API initializes a parallel scan for the AM and also returns
the size of the parallel scan descriptor consumed so far.
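
A self-contained toy model may make the "consumed size" semantics clearer
(all names below are hypothetical, not the patch's API): the initialize
step fills in the AM's portion of the shared descriptor and returns the
offset just past what it consumed, which tells the caller where trailing
data such as the serialized snapshot can be stored.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical base and AM-specific parallel scan descriptors. */
typedef struct BaseScanDesc
{
	int			phs_nworkers;
} BaseScanDesc;

typedef struct AMScanDesc
{
	BaseScanDesc base;
	long		am_private;		/* AM-specific state */
} AMScanDesc;

/* Fill in the AM's part and report the first free offset after it. */
static size_t
am_parallelscan_initialize(BaseScanDesc *pscan)
{
	AMScanDesc *am = (AMScanDesc *) pscan;

	am->base.phs_nworkers = 0;
	am->am_private = 42;
	return sizeof(AMScanDesc);
}

int
main(void)
{
	const char	snapshot[] = "serialized-snapshot";
	char	   *shared = calloc(1, sizeof(AMScanDesc) + sizeof(snapshot));
	size_t		off = am_parallelscan_initialize((BaseScanDesc *) shared);

	/* The caller serializes the snapshot into the trailing space. */
	memcpy(shared + off, snapshot, sizeof(snapshot));
	printf("snapshot stored at offset %zu: %s\n", off, shared + off);
	free(shared);
	return 0;
}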

(Sorry for not finishing. Time's up today.)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#123Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#114)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Sat, Mar 9, 2019 at 2:13 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

While 0001 is pretty bulky, the interesting bits concentrate on a
comparatively small area. I'd appreciate if somebody could give the
comments added in tableam.h a read (both on callbacks, and their
wrappers, as they have different audiences). It'd make sense to first
read the commit message, to understand the goal (and I'd obviously also
appreciate suggestions for improvements there as well).

I'm pretty happy with the current state of the scan patch. I plan to do
two more passes through it (formatting, comment polishing, etc. I don't
know of any functional changes needed), and then commit it, lest
somebody objects.

I found a couple of typos in the committed patch; the attached patch fixes them.
I am not sure about one of them, please check.

And I reviewed the 0002 patch, which is pretty simple and can be
committed.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0001-table-access-methods-typos-correction.patch (application/octet-stream)
From dbe0007412dad96fed554ecacba2749b30a6b320 Mon Sep 17 00:00:00 2001
From: Hari Babu <kommi.haribabu@gmail.com>
Date: Sat, 16 Mar 2019 17:37:32 +1100
Subject: [PATCH] table access methods typos correction

---
 src/backend/access/heap/heapam_handler.c | 2 +-
 src/include/access/tableam.h             | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6a26fcef94..042502e54d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -12,7 +12,7 @@
  *
  *
  * NOTES
- *	  This files wires up the lower level heapam.c et routines with the
+ *	  This files wires up the lower level heapam.c etc routines with the
  *	  tableam abstraction.
  *
  *-------------------------------------------------------------------------
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 50b8ab9353..6d93e7ad45 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -74,7 +74,7 @@ typedef struct TableAmRoutine
 	 * synchronized scans, or page mode may be used (although not every AM
 	 * will support those).
 	 *
-	 * is_{bitmapscan, samplescan} specify whether the scan is inteded to
+	 * is_{bitmapscan, samplescan} specify whether the scan is intended to
 	 * support those types of scans.
 	 *
 	 * if temp_snap is true, the snapshot will need to be deallocated at
@@ -130,7 +130,7 @@ typedef struct TableAmRoutine
 	Size		(*parallelscan_initialize) (Relation rel, ParallelTableScanDesc pscan);
 
 	/*
-	 * Reinitilize `pscan` for a new scan. `rel` will be the same relation as
+	 * Reinitialize `pscan` for a new scan. `rel` will be the same relation as
 	 * when `pscan` was initialized by parallelscan_initialize.
 	 */
 	void		(*parallelscan_reinitialize) (Relation rel, ParallelTableScanDesc pscan);
-- 
2.20.1.windows.1

#124Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Kyotaro HORIGUCHI (#122)
5 attachment(s)
Re: Pluggable Storage - Andres's take

On Tue, Mar 12, 2019 at 7:28 PM Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello.

I had a look on the patch set. I cannot see the thread structure
due to the depth and cannot get the picture on the all patches,
but I have some comments. I apologize in advance for possible
duplicate with upthread.

Thanks for the review.

0001-Reduce-the...

This doesn't apply to master.

Yes, these patches don't apply to master.
They can only be applied to the code present in [1]https://github.com/anarazel/postgres-pluggable-storage.git.

TupleTableSlot *
ExecStoreHeapTuple(HeapTuple tuple,
                   TupleTableSlot *slot,
+                  Oid relid,
                   bool shouldFree)

The comment for ExecStoreHeapTuple is missing the description
of the "relid" parameter.

Corrected.

-        if (HeapTupleSatisfiesVisibility(tuple, &SnapshotDirty, hscan->rs_cbuf))
+        if (HeapTupleSatisfiesVisibility(tuple, RelationGetRelid(hscan->rs_scan.rs_rd),
+                                &SnapshotDirty, hscan->rs_cbuf))

The second parameter always seems to be
RelationGetRelid(...). Only the relid is actually required, but isn't
it better to take a Relation instead of an Oid as the second
parameter?

Currently the passed relid is used only by the historic MVCC
verification function. Passing the Relation pointer directly will
lessen the performance impact, as there is no need to compute the
relid.

Will update and share it.

0005-Reorganize-am-as...

+ catalog. The contents of an table are entirely under the control of its

"of an table" => "of a table"

Corrected.

0006-Doc-update-of-Create-access..

+      be <type>index_am_handler</type> and for <literal>TABLE</literal>
+      access methods, it must be <type>table_am_handler</type>.
+      The C-level API that the handler function must implement varies
+      depending on the type of access method. The index access method API
+      is described in <xref linkend="index-access-methods"/> and the table access method
+      API is described in <xref linkend="table-access-methods"/>.

If table is the primary object, shouldn't table-am precede index-am?

Changed.

0007-Doc-update-of-create-materi..

+ This clause specifies optional access method for the new materialize view;

"materialize view" => "materialized view"?

Corrected.

+ If this option is not specified, then the default table access method

I'm not sure the 'then' is needed.

+ is chosen for the new materialized view. see <xref linkend="guc-default-table-access-method"/>

"see" => "See"

0008-Doc-update-of-CREATE_TABLE..

+[ USING <replaceable class="parameter">method</replaceable> ]

*I* prefer access_method to just method.

+ If this option is not specified, then the default table access method

Same as 0007: "I'm not sure the 'then' is needed."

+ is chosen for the new table. see <xref linkend="guc-default-table-access-method"/>

Same as 0007: "see" => "See".

0009-Doc-of-CREATE-TABLE-AS

Same as 0008.

Corrected as per your suggestions.

0010-Table-access-method-API-

+ Any new <literal>TABLE ACCSESS METHOD</literal> developers can refer the exisitng <literal>HEAP</literal>

+ There are differnt type of API's that are defined and those details are below.

"differnt" => "different"

+ by the AM, in case if it support parallel scan.

"support" => "supports"

Corrected above both.

+ This API to return the total size that is required for the AM to perform

Total size of what? Shared memory chunk? Or parallel scan descriptor?

It returns the required parallel scan descriptor memory size.

+     the parallel table scan. The minimum size that is required is
+     <structname>ParallelBlockTableScanDescData</structname>.

I don't get what "The minimum size" is telling me. Just reading
this, I would always return the minimum size...

+     This API to perform the initialization of the <literal>parallel_scan</literal>
+     that is required for the parallel scan to be performed by the AM and also return
+     the total size that is required for the AM to perform the parallel table scan.

(Note: I'm not good at English..) Similar to the above. I cannot
read what the "size" is for.

In the code it is used as:

Size snapshot_off = rel->rd_tableam->parallelscan_initialize(rel, pscan);

(The variable name should be snapshot_offset.) It is the offset
from the beginning of the parallel scan descriptor, but it should be
described in some other way, which I'm not sure of..

Something like this?

This API initializes a parallel scan for the AM and also returns
the size of the parallel scan descriptor consumed so far.

I updated the docs around those APIs to make them easier to understand.
Can you please check whether that helps?

Updated patches are attached.

[1]: https://github.com/anarazel/postgres-pluggable-storage.git

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0004-Doc-updates-for-pluggable-table-access-method-syntax.patch
From 95fcb24134d1adcda8f0e56bed96a41af89aa146 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 15:59:48 +1100
Subject: [PATCH 4/5] Doc updates for pluggable table access method syntax

Default_table_access_method GUC

CREATE ACCESS METHOD ... TYPE TABLE ..

CREATE MATERIALIZED VIEW ... USING heap2 ...

CREATE TABLE ... USING heap2 ...

CREATE TABLE AS ... USING heap2 ...
---
 doc/src/sgml/catalogs.sgml                    |  4 ++--
 doc/src/sgml/config.sgml                      | 24 +++++++++++++++++++
 doc/src/sgml/ref/create_access_method.sgml    | 14 +++++++----
 .../sgml/ref/create_materialized_view.sgml    | 13 ++++++++++
 doc/src/sgml/ref/create_table.sgml            | 17 ++++++++++++-
 doc/src/sgml/ref/create_table_as.sgml         | 13 ++++++++++
 6 files changed, 77 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 0fd792ff1a..21deba139c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -587,8 +587,8 @@
    The catalog <structname>pg_am</structname> stores information about
    relation access methods.  There is one row for each access method supported
    by the system.
-   Currently, only indexes have access methods.  The requirements for index
-   access methods are discussed in detail in <xref linkend="indexam"/>.
+   Currently, only tables, indexes, and materialized views have access methods.
+   The requirements for access methods are discussed in detail in <xref linkend="am"/>.
   </para>
 
   <table>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6d42b7afe7..17a8871c51 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7214,6 +7214,30 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-default-table-access-method" xreflabel="default_table_access_method">
+      <term><varname>default_table_access_method</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>default_table_access_method</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        This parameter specifies the default table access method to use when creating
+        tables or materialized views if the <command>CREATE</command> command does
+        not explicitly specify an access method.
+       </para>
+
+       <para>
+        The value is either the name of a table access method, or an empty string
+        to specify using the default table access method of the current database.
+        If the value does not match the name of any existing table access method,
+        <productname>PostgreSQL</productname> will automatically use the default
+        table access method of the current database.
+       </para>
+
+      </listitem>
+     </varlistentry>
+     
      <varlistentry id="guc-default-tablespace" xreflabel="default_tablespace">
       <term><varname>default_tablespace</varname> (<type>string</type>)
       <indexterm>
diff --git a/doc/src/sgml/ref/create_access_method.sgml b/doc/src/sgml/ref/create_access_method.sgml
index 851c5e63be..c37491a713 100644
--- a/doc/src/sgml/ref/create_access_method.sgml
+++ b/doc/src/sgml/ref/create_access_method.sgml
@@ -61,7 +61,8 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
     <listitem>
      <para>
       This clause specifies the type of access method to define.
-      Only <literal>INDEX</literal> is supported at present.
+      Only <literal>TABLE</literal> and <literal>INDEX</literal>
+      are supported at present.
      </para>
     </listitem>
    </varlistentry>
@@ -75,10 +76,13 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
       that represents the access method.  The handler function must be
       declared to take a single argument of type <type>internal</type>,
       and its return type depends on the type of access method;
-      for <literal>INDEX</literal> access methods, it must
-      be <type>index_am_handler</type>.  The C-level API that the handler
-      function must implement varies depending on the type of access method.
-      The index access method API is described in <xref linkend="indexam"/>.
+      for <literal>TABLE</literal> access methods, it must
+      be <type>table_am_handler</type> and for <literal>INDEX</literal>
+      access methods, it must be <type>index_am_handler</type>.
+      The C-level API that the handler function must implement varies
+      depending on the type of access method. The table access method API
+      is described in <xref linkend="table-access-methods"/> and the index access method
+      API is described in <xref linkend="index-access-methods"/>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/doc/src/sgml/ref/create_materialized_view.sgml b/doc/src/sgml/ref/create_materialized_view.sgml
index 7f31ab4d26..24c4e200b0 100644
--- a/doc/src/sgml/ref/create_materialized_view.sgml
+++ b/doc/src/sgml/ref/create_materialized_view.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
+    [ USING <replaceable class="parameter">method</replaceable> ]
     [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
     AS <replaceable>query</replaceable>
@@ -85,6 +86,18 @@ CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies an optional access method for the new materialized view; the method
+      must be of type <literal>TABLE</literal>. See <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, the default table access method is chosen for
+      the new materialized view. See <xref linkend="guc-default-table-access-method"/> for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 22dbc07b23..0457d70857 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -29,6 +29,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 ] )
 [ INHERITS ( <replaceable>parent_table</replaceable> [, ... ] ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -40,6 +41,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -51,6 +53,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ] { FOR VALUES <replaceable class="parameter">partition_bound_spec</replaceable> | DEFAULT }
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -954,7 +957,7 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
 
      <para>
       The access method must support <literal>amgettuple</literal> (see <xref
-      linkend="indexam"/>); at present this means <acronym>GIN</acronym>
+      linkend="index-access-methods"/>); at present this means <acronym>GIN</acronym>
       cannot be used.  Although it's allowed, there is little point in using
       B-tree or hash indexes with an exclusion constraint, because this
       does nothing that an ordinary unique constraint doesn't do better.
@@ -1137,6 +1140,18 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies an optional access method for the new table; the method must be
+      of type <literal>TABLE</literal>. See <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, the default table access method is chosen
+      for the new table. See <xref linkend="guc-default-table-access-method"/> for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 679e8f521e..c49a755e73 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
+    [ USING <replaceable class="parameter">method</replaceable> ]
     [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
     [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -120,6 +121,18 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies an optional access method for the new table; the method must be
+      of type <literal>TABLE</literal>. See <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, the default table access method is chosen
+      for the new table. See <xref linkend="guc-default-table-access-method"/> for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
-- 
2.20.1.windows.1

0003-Rename-indexam.sgml-to-am.sgml.patch
From 94bc3b94f87ca3c7add0b3c81f7aa0a0103ca70d Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 16:13:56 +1100
Subject: [PATCH 3/5] Rename indexam.sgml to am.sgml

Reorganize the am chapter to cover both table and index access methods.

There is not much table access method info yet.
---
 doc/src/sgml/{indexam.sgml => am.sgml} | 55 +++++++++++++++++---------
 doc/src/sgml/filelist.sgml             |  2 +-
 doc/src/sgml/postgres.sgml             |  2 +-
 doc/src/sgml/xindex.sgml               |  2 +-
 4 files changed, 40 insertions(+), 21 deletions(-)
 rename doc/src/sgml/{indexam.sgml => am.sgml} (97%)

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/am.sgml
similarity index 97%
rename from doc/src/sgml/indexam.sgml
rename to doc/src/sgml/am.sgml
index 05102724ea..8d9edff622 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/am.sgml
@@ -1,16 +1,34 @@
-<!-- doc/src/sgml/indexam.sgml -->
+<!-- doc/src/sgml/am.sgml -->
 
-<chapter id="indexam">
- <title>Index Access Method Interface Definition</title>
+<chapter id="am">
+ <title>Access Method Interface Definition</title>
 
   <para>
    This chapter defines the interface between the core
-   <productname>PostgreSQL</productname> system and <firstterm>index access
-   methods</firstterm>, which manage individual index types.  The core system
-   knows nothing about indexes beyond what is specified here, so it is
-   possible to develop entirely new index types by writing add-on code.
+   <productname>PostgreSQL</productname> system and <firstterm>access
+   methods</firstterm>, which manage individual <literal>INDEX</literal>
+   and <literal>TABLE</literal> types.  The core system knows nothing
+   about these access methods beyond what is specified here, so it is
+   possible to develop entirely new access method types by writing add-on code.
   </para>
 
+ <sect1 id="table-access-methods">
+  <title>Overview of Table access methods</title>
+
+  <para>
+   Tables are the primary data store in <productname>PostgreSQL</productname>.
+   Each table is stored as its own physical <firstterm>relation</firstterm>
+   and so is described by an entry in the <structname>pg_class</structname>
+   catalog. The table contents are entirely under the control of its
+   access method. (All the access methods furthermore use the standard page
+   layout described in <xref linkend="storage-page-layout"/>.)
+  </para>
+
+ </sect1>
+ 
+ <sect1 id="index-access-methods">
+  <title>Overview of Index access methods</title>
+
   <para>
    All indexes in <productname>PostgreSQL</productname> are what are known
    technically as <firstterm>secondary indexes</firstterm>; that is, the index is
@@ -43,7 +61,7 @@
    are reclaimed.
   </para>
 
- <sect1 id="index-api">
+ <sect2 id="index-api">
   <title>Basic API Structure for Indexes</title>
 
   <para>
@@ -217,9 +235,9 @@ typedef struct IndexAmRoutine
    conditions.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-functions">
+ <sect2 id="index-functions">
   <title>Index Access Method Functions</title>
 
   <para>
@@ -710,9 +728,9 @@ amparallelrescan (IndexScanDesc scan);
    the beginning.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-scanning">
+ <sect2 id="index-scanning">
   <title>Index Scanning</title>
 
   <para>
@@ -865,9 +883,9 @@ amparallelrescan (IndexScanDesc scan);
    if its internal implementation is unsuited to one API or the other.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-locking">
+ <sect2 id="index-locking">
   <title>Index Locking Considerations</title>
 
   <para>
@@ -979,9 +997,9 @@ amparallelrescan (IndexScanDesc scan);
    reduce the frequency of such transaction cancellations.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-unique-checks">
+ <sect2 id="index-unique-checks">
   <title>Index Uniqueness Checks</title>
 
   <para>
@@ -1128,9 +1146,9 @@ amparallelrescan (IndexScanDesc scan);
     </itemizedlist>
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-cost-estimation">
+ <sect2 id="index-cost-estimation">
   <title>Index Cost Estimation Functions</title>
 
   <para>
@@ -1377,5 +1395,6 @@ cost_qual_eval(&amp;index_qual_cost, path-&gt;indexquals, root);
    Examples of cost estimator functions can be found in
    <filename>src/backend/utils/adt/selfuncs.c</filename>.
   </para>
+ </sect2>
  </sect1>
 </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index a03ea1427b..52a5efca94 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -89,7 +89,7 @@
 <!ENTITY gin        SYSTEM "gin.sgml">
 <!ENTITY brin       SYSTEM "brin.sgml">
 <!ENTITY planstats    SYSTEM "planstats.sgml">
-<!ENTITY indexam    SYSTEM "indexam.sgml">
+<!ENTITY am         SYSTEM "am.sgml">
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 96d196d229..9dce0c5f81 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -250,7 +250,7 @@
   &tablesample-method;
   &custom-scan;
   &geqo;
-  &indexam;
+  &am;
   &generic-wal;
   &btree;
   &gist;
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index 9446f8b836..4fa821160c 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -36,7 +36,7 @@
    described in <classname>pg_am</classname>.  It is possible to add a
    new index access method by writing the necessary code and
    then creating an entry in <classname>pg_am</classname> &mdash; but that is
-   beyond the scope of this chapter (see <xref linkend="indexam"/>).
+   beyond the scope of this chapter (see <xref linkend="am"/>).
   </para>
 
   <para>
-- 
2.20.1.windows.1

0002-Removal-of-scan_update_snapshot-callback.patch
From 8c098ad66e26635cb937fcb8bd4c1947c258db81 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Tue, 22 Jan 2019 11:29:20 +1100
Subject: [PATCH 2/5] Removal of scan_update_snapshot callback

The snapshot is available in the TableScanDesc structure
itself, so it can be accessed there directly; there is
no need for a callback.
---
 src/backend/access/heap/heapam.c         | 18 ------------------
 src/backend/access/heap/heapam_handler.c |  1 -
 src/backend/access/table/tableam.c       | 13 +++++++++++++
 src/include/access/heapam.h              |  1 -
 src/include/access/tableam.h             | 16 ++++++----------
 5 files changed, 19 insertions(+), 30 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c1e4d07864..16a3a378eb 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1252,24 +1252,6 @@ heap_endscan(TableScanDesc sscan)
 	pfree(scan);
 }
 
-/* ----------------
- *		heap_update_snapshot
- *
- *		Update snapshot info in heap scan descriptor.
- * ----------------
- */
-void
-heap_update_snapshot(TableScanDesc sscan, Snapshot snapshot)
-{
-	HeapScanDesc scan = (HeapScanDesc) sscan;
-
-	Assert(IsMVCCSnapshot(snapshot));
-
-	RegisterSnapshot(snapshot);
-	scan->rs_scan.rs_snapshot = snapshot;
-	scan->rs_scan.rs_temp_snap = true;
-}
-
 /* ----------------
  *		heap_getnext	- retrieve next tuple in scan
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index f2719bb017..eab6a107a6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2307,7 +2307,6 @@ static const TableAmRoutine heapam_methods = {
 	.scan_begin = heap_beginscan,
 	.scan_end = heap_endscan,
 	.scan_rescan = heap_rescan,
-	.scan_update_snapshot = heap_update_snapshot,
 	.scan_getnextslot = heap_getnextslot,
 
 	.parallelscan_estimate = table_block_parallelscan_estimate,
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 43e5444dcb..9b17cf1cd9 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -395,3 +395,16 @@ table_block_parallelscan_nextpage(Relation rel, ParallelBlockTableScanDesc pbsca
 
 	return page;
 }
+
+/*
+ * Update snapshot info in table scan descriptor.
+ */
+void
+table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot)
+{
+	Assert(IsMVCCSnapshot(snapshot));
+
+	RegisterSnapshot(snapshot);
+	scan->rs_snapshot = snapshot;
+	scan->rs_temp_snap = true;
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a290e4f053..0ae7923c95 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -182,7 +182,6 @@ extern void simple_heap_update(Relation relation, ItemPointer otid,
 				   HeapTuple tup);
 
 extern void heap_sync(Relation relation);
-extern void heap_update_snapshot(TableScanDesc scan, Snapshot snapshot);
 
 extern TransactionId heap_compute_xid_horizon_for_tuples(Relation rel,
 														 ItemPointerData *items,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index d0240d46f7..054f2102c8 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -90,7 +90,6 @@ typedef struct TableAmRoutine
 	void		(*scan_end) (TableScanDesc scan);
 	void		(*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key, bool set_params,
 								bool allow_strat, bool allow_sync, bool allow_pagemode);
-	void		(*scan_update_snapshot) (TableScanDesc scan, Snapshot snapshot);
 	TupleTableSlot *(*scan_getnextslot) (TableScanDesc scan,
 										 ScanDirection direction, TupleTableSlot *slot);
 
@@ -390,15 +389,6 @@ table_rescan_set_params(TableScanDesc scan, struct ScanKeyData *key,
 										 allow_pagemode);
 }
 
-/*
- * Update snapshot info in heap scan descriptor.
- */
-static inline void
-table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot)
-{
-	scan->rs_rd->rd_tableam->scan_update_snapshot(scan, snapshot);
-}
-
 static inline TupleTableSlot *
 table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot)
 {
@@ -800,6 +790,12 @@ extern BlockNumber table_block_parallelscan_nextpage(Relation rel, ParallelBlock
 extern void table_block_parallelscan_startblock_init(Relation rel, ParallelBlockTableScanDesc pbscan);
 
 
+/* ----------------------------------------------------------------------------
+ * Helper function to update the snapshot of the scan descriptor
+ * ----------------------------------------------------------------------------
+ */
+extern void table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot);
+
 /* ----------------------------------------------------------------------------
  * Functions in tableamapi.c
  * ----------------------------------------------------------------------------
-- 
2.20.1.windows.1
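
A hypothetical caller-side sketch of the now-extern helper: since it
registers the snapshot and marks it for release at scan end, a parallel
worker adopting the snapshot shipped from the leader can simply do

    table_scan_update_snapshot(scan, snapshot);

without any AM-specific code.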

0005-Table-access-method-API-explanation.patch
From 8f5117806ab74241821ad33e2e8df9f90f2f6ffc Mon Sep 17 00:00:00 2001
From: Kommi <haribabuk@fast.au.fujitsu.com>
Date: Mon, 11 Mar 2019 15:44:44 +1100
Subject: [PATCH 5/5] Table access method API explanation

All the table access method APIs and their details are explained.
---
 doc/src/sgml/am.sgml | 579 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 574 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/am.sgml b/doc/src/sgml/am.sgml
index 8d9edff622..85a94aefca 100644
--- a/doc/src/sgml/am.sgml
+++ b/doc/src/sgml/am.sgml
@@ -18,14 +18,583 @@
   <para>
   Tables are the primary data store in <productname>PostgreSQL</productname>.
   Each table is stored as its own physical <firstterm>relation</firstterm>
-   and so is described by an entry in the <structname>pg_class</structname>
-   catalog. The table contents are entirely under the control of its
-   access method. (All the access methods furthermore use the standard page
-   layout described in <xref linkend="storage-page-layout"/>.)
+   and is described by an entry in the <structname>pg_class</structname>
+   catalog. A table's content is entirely controlled by its access method, although
+   all access methods use the same standard page layout described in <xref linkend="storage-page-layout"/>.
   </para>
 
+  <sect2 id="table-access-methods-api">
+   <title>Table access method API</title>
+
+   <para>
+    Each table access method is described by a row in the
+    <link linkend="catalog-pg-am"><structname>pg_am</structname></link> system
+    catalog. The <structname>pg_am</structname> entry specifies the <firstterm>type</firstterm>
+    of the access method and a <firstterm>handler function</firstterm> for the
+    access method. These entries can be created and deleted using the <xref linkend="sql-create-access-method"/>
+    and <xref linkend="sql-drop-access-method"/> SQL commands.
+   </para>
+
+   <para>
+    A table access method handler function must be declared to accept a
+    single argument of type <type>internal</type> and to return the
+    pseudo-type <type>table_am_handler</type>.  The argument is a dummy value that
+    simply serves to prevent handler functions from being called directly from
+    SQL commands.  The result of the function must be a palloc'd struct of
+    type <structname>TableAmRoutine</structname>, which contains everything
+    that the core code needs to know to make use of the table access method.
+    The <structname>TableAmRoutine</structname> struct, also called the access
+    method's <firstterm>API struct</firstterm>, includes fields specifying assorted
+    fixed properties of the access method, such as whether it can support
+    bitmap scans.  More importantly, it contains pointers to support
+    functions for the access method, which do all of the real work to access
+    tables.  These support functions are plain C functions and are not
+    visible or callable at the SQL level.  The support functions are described
+    in <structname>TableAmRoutine</structname> structure. For more details, please
+    refer the file <filename>src/include/access/tableam.h</filename>.
+   </para>
+
+   <para>
+    Developers of a new <literal>TABLE ACCESS METHOD</literal> can refer to the existing
+    <literal>HEAP</literal> implementation in <filename>src/backend/access/heap/heapam_handler.c</filename>
+    for the details of how the API is implemented for the heap access method.
+   </para>
+
+   <para>
+    The different types of APIs that are defined are described below.
+   </para>
+
+   <sect3 id="slot-implementation-function">
+    <title>Slot implementation functions</title>
+
+   <para>
+<programlisting>
+const TupleTableSlotOps *(*slot_callbacks) (Relation rel);
+</programlisting>
+
+    This API returns the slot implementation that is specific to the AM.
+    The following predefined slot implementations are available:
+    <literal>TTSOpsVirtual</literal>, <literal>TTSOpsHeapTuple</literal>,
+    <literal>TTSOpsMinimalTuple</literal> and <literal>TTSOpsBufferHeapTuple</literal>.
+    An AM implementation can use any one of them. For more details of these slot-
+    specific implementations, refer to <filename>src/include/executor/tuptable.h</filename>.
+   </para>
+   </sect3>
+
+   <sect3 id="table-scan-functions">
+    <title>Table scan functions</title>
+
+    <para>
+     The following APIs are used for scanning a table.
+    </para>
+
+    <para>
+<programlisting>
+TableScanDesc (*scan_begin) (Relation rel,
+                             Snapshot snapshot,
+                             int nkeys, struct ScanKeyData *key,
+                             ParallelTableScanDesc pscan,
+                             bool allow_strat,
+                             bool allow_sync,
+                             bool allow_pagemode,
+                             bool is_bitmapscan,
+                             bool is_samplescan,
+                             bool temp_snap);
+</programlisting>
+
+     This API starts a scan of the relation pointed to by <literal>rel</literal> and returns a
+     <structname>TableScanDesc</structname>, which is typically embedded in a larger AM-specific
+     struct. <literal>nkeys</literal> indicates that results need to be filtered based on <literal>key</literal>.
+     <literal>pscan</literal> can be used by the AM in case it supports parallel scans.
+     <literal>allow_strat</literal>, <literal>allow_sync</literal> and <literal>allow_pagemode</literal>
+     specify whether the scan may use a buffer access strategy, synchronized scans, or
+     page-mode scans (an AM is not required to support these). <literal>is_bitmapscan</literal>
+     and <literal>is_samplescan</literal> specify whether the scan is performed on behalf of a
+     bitmap scan or a sample scan. <literal>temp_snap</literal> indicates that the provided snapshot
+     was allocated temporarily and needs to be freed at scan end.
+    </para>
+
+    <para>
+<programlisting>
+void        (*scan_end) (TableScanDesc scan);
+</programlisting>
+
+     This API ends a scan started by <literal>scan_begin</literal>,
+     releasing its resources. <structfield>TableScanDesc.rs_snapshot</structfield>
+     needs to be unregistered, and it can be deallocated based on <structfield>TableScanDesc.temp_snap</structfield>.
+    </para>
+
+    <para>
+<programlisting>
+void        (*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key, bool set_params,
+                            bool allow_strat, bool allow_sync, bool allow_pagemode);
+</programlisting>
+
+     This API restarts a relation scan already started by the
+     API <literal>scan_begin</literal>. If <literal>set_params</literal> is
+     true, the provided options are applied to the scan.
+    </para>
+
+    <para>
+<programlisting>
+TupleTableSlot *(*scan_getnextslot) (TableScanDesc scan,
+                                     ScanDirection direction, TupleTableSlot *slot);
+</programlisting>
+
+     This API returns the next qualifying tuple from the scan started by the API
+     <literal>scan_begin</literal>, storing it in <literal>slot</literal>.
+    </para>
+
+   </sect3>
+
+   <sect3 id="parallel-table-scan-function">
+    <title>Parallel table scan functions</title>
+
+    <para>
+     The following APIs are used to perform a parallel table scan.
+    </para>
+
+    <para>
+<programlisting>
+Size        (*parallelscan_estimate) (Relation rel);
+</programlisting>
+
+     This API returns the total size that is required for the AM to perform
+     the parallel table scan. The required size must include the <structname>ParallelTableScanDesc</structname>,
+     which is typically embedded in an AM-specific struct.
+    </para>
+
+    <para>
+<programlisting>
+Size        (*parallelscan_initialize) (Relation rel, ParallelTableScanDesc pscan);
+</programlisting>
+
+     This API performs the initialization of the <literal>pscan</literal> structure
+     that is required for the parallel scan to be performed by the AM, and returns
+     the size of the parallel scan descriptor consumed so far (an offset from its start).
+    </para>
+
+    <para>
+<programlisting>
+void        (*parallelscan_reinitialize) (Relation rel, ParallelTableScanDesc pscan);
+</programlisting>
+
+     This API reinitializes the parallel scan structure pointed to by <literal>pscan</literal>
+     for the same relation.
+    </para>
+
+   </sect3>
+
+   <sect3 id="index-scan-functions">
+    <title>Index scan functions</title>
+
+    <para>
+<programlisting>
+struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+</programlisting>
+
+     This API prepares for fetching tuples from the relation, as needed when fetching
+     them via an index scan. It returns an allocated and initialized <structname>IndexFetchTableData</structname>
+     structure, which is typically embedded in an AM-specific struct.
+    </para>
+
+    <para>
+<programlisting>
+void        (*index_fetch_reset) (struct IndexFetchTableData *data);
+</programlisting>
+
+     This API resets the index fetch state; typically it releases the AM-specific resources
+     held by the <structname>IndexFetchTableData</structname> of an index scan.
+    </para>
+
+    <para>
+<programlisting>
+void        (*index_fetch_end) (struct IndexFetchTableData *data);
+</programlisting>
+
+     This API releases the AM-specific resources held by the <structname>IndexFetchTableData</structname>
+     and frees the <structname>IndexFetchTableData</structname> itself.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*index_fetch_tuple) (struct IndexFetchTableData *scan,
+                                  ItemPointer tid,
+                                  Snapshot snapshot,
+                                  TupleTableSlot *slot,
+                                  bool *call_again, bool *all_dead);
+</programlisting>
+
+     This API fetches the tuple pointed to by <literal>tid</literal> and stores it in the
+     <literal>slot</literal> after performing a visibility check according to the provided <literal>snapshot</literal>.
+     It returns true when the tuple is found, false otherwise. <literal>call_again</literal> is false when the API
+     is called for the first time for a given <literal>tid</literal>; if there is a potential match in
+     another tuple version, <literal>call_again</literal> must be set to true to tell the caller to call the
+     API again to fetch that tuple. <literal>all_dead</literal> needs to be set to true when the tuple is not
+     visible in any version.
+    </para>
+
+    <para>
+<programlisting>
+TransactionId (*compute_xid_horizon_for_tuples) (Relation rel,
+                                                 ItemPointerData *items,
+                                                 int nitems);
+</programlisting>
+
+     This API returns the newest xid among the tuples identified by <literal>items</literal>. It is used
+     to compute which snapshots to conflict with when replaying WAL records
+     for page-level index vacuums.
+    </para>
+
+   </sect3>
+
+   <sect3 id="non-modifying-tuple-functions">
+    <title>Non-modifying tuple functions</title>
+
+    <para>
+<programlisting>
+bool        (*tuple_satisfies_snapshot) (Relation rel,
+                                         TupleTableSlot *slot,
+                                         Snapshot snapshot);
+</programlisting>
+
+     This API checks the visibility of the tuple present in the <literal>slot</literal>
+     against the provided snapshot, returning true if the tuple is visible and false otherwise.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*tuple_fetch_row_version) (Relation rel,
+                                        ItemPointer tid,
+                                        Snapshot snapshot,
+                                        TupleTableSlot *slot,
+                                        Relation stats_relation);
+</programlisting>
+
+     This API fetches the tuple specified by the ItemPointer <literal>tid</literal>
+     and stores it in the slot. For example, in the heap AM update chains are created
+     whenever a tuple is updated, so the function should fetch the latest row version.
+    </para>
+
+    <para>
+<programlisting>
+void        (*tuple_get_latest_tid) (Relation rel,
+                                     Snapshot snapshot,
+                                     ItemPointer tid);
+</programlisting>
+
+     This API gets the TID of the latest version of the tuple identified by the specified
+     ItemPointer. For example, in the heap AM update chains are created whenever
+     a tuple is updated; this API finds the latest ItemPointer in the chain.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*tuple_fetch_follow) (struct IndexFetchTableData *scan,
+                                   ItemPointer tid,
+                                   Snapshot snapshot,
+                                   TupleTableSlot *slot,
+                                   bool *call_again, bool *all_dead);
+</programlisting>
+
+     This API fetches the tuple pointed to by the ItemPointer, based on the
+     <structname>IndexFetchTableData</structname>, stores it in the specified slot, and updates the
+     <literal>call_again</literal> and <literal>all_dead</literal> flags. It is called from index scans.
+    </para>
+
+   </sect3>
+
+   <sect3 id="manipulation-of-physical-tuples-functions">
+    <title>Physical tuple manipulation functions</title>
+
+    <para>
+<programlisting>
+void        (*tuple_insert) (Relation rel, TupleTableSlot *slot, CommandId cid,
+                             int options, struct BulkInsertStateData *bistate);
+</programlisting>
+
+     This API inserts the tuple contained in the provided slot into the relation
+     and updates the tuple's identifier (<literal>ItemPointerData</literal>)
+     in the slot, using the <structname>BulkInsertStateData</structname> if available.
+    </para>
+
+    <para>
+<programlisting>
+void        (*tuple_insert_speculative) (Relation rel,
+                                         TupleTableSlot *slot,
+                                         CommandId cid,
+                                         int options,
+                                         struct BulkInsertStateData *bistate,
+                                         uint32 specToken);
+</programlisting>
+
+     This API is similar to the <literal>tuple_insert</literal> API, but it inserts the tuple
+     with additional information that is necessary for speculative insertion; the insertion is
+     confirmed later, based on whether the tuple was successfully inserted into the index.
+    </para>
+
+    <para>
+<programlisting>
+void        (*tuple_complete_speculative) (Relation rel,
+                                           TupleTableSlot *slot,
+                                           uint32 specToken,
+                                           bool succeeded);
+</programlisting>
+
+     This API completes a speculative insertion of a tuple started by <literal>tuple_insert_speculative</literal>;
+     it is invoked after the index insert finishes, with <literal>succeeded</literal> indicating whether that succeeded.
+    </para>
+
+    <para>
+<programlisting>
+HTSU_Result (*tuple_delete) (Relation rel,
+                             ItemPointer tid,
+                             CommandId cid,
+                             Snapshot snapshot,
+                             Snapshot crosscheck,
+                             bool wait,
+                             HeapUpdateFailureData *hufd,
+                             bool changingPart);
+</programlisting>
+
+     This API deletes the tuple of the relation pointed to by the ItemPointer and returns the
+     result of the operation. In case of failure, it updates <literal>hufd</literal>.
+    </para>
+
+    <para>
+<programlisting>
+HTSU_Result (*tuple_update) (Relation rel,
+                             ItemPointer otid,
+                             TupleTableSlot *slot,
+                             CommandId cid,
+                             Snapshot snapshot,
+                             Snapshot crosscheck,
+                             bool wait,
+                             HeapUpdateFailureData *hufd,
+                             LockTupleMode *lockmode,
+                             bool *update_indexes);
+</programlisting>
+
+     This API updates the tuple identified by <literal>otid</literal> with the new tuple in the slot, returns
+     the result of the operation, and sets the flag indicating whether the indexes need to be updated.
+     In case of failure, it should update <literal>hufd</literal>.
+    </para>
+
+    <para>
+<programlisting>
+void        (*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
+                             CommandId cid, int options, struct BulkInsertStateData *bistate);
+</programlisting>
+
+     This API inserts multiple tuples into the relation at once, for faster data loading,
+     using the <structname>BulkInsertStateData</structname> if available.
+    </para>
+
+    <para>
+<programlisting>
+HTSU_Result (*tuple_lock) (Relation rel,
+                           ItemPointer tid,
+                           Snapshot snapshot,
+                           TupleTableSlot *slot,
+                           CommandId cid,
+                           LockTupleMode mode,
+                           LockWaitPolicy wait_policy,
+                           uint8 flags,
+                           HeapUpdateFailureData *hufd);
+</programlisting>
+
+     This API locks the newest version of the tuple pointed to by the ItemPointer <literal>tid</literal>
+     and returns the result of the operation. In case of failure, it updates <literal>hufd</literal>.
+    </para>
+
+    <para>
+<programlisting>
+void        (*finish_bulk_insert) (Relation rel, int options);
+</programlisting>
+
+     This API performs the operations necessary to complete insertions made
+     via <literal>tuple_insert</literal> and <literal>multi_insert</literal> with a
+     BulkInsertState specified. It may, for example, be used to flush the relation
+     when inserting with WAL skipped, or it may be a no-op.
+    </para>
+
+   </sect3>
+
+   <sect3 id="ddl-related-functions">
+    <title>DDL-related functions</title>
+
+    <para>
+<programlisting>
+void        (*relation_set_new_filenode) (Relation rel,
+                                          char persistence,
+                                          TransactionId *freezeXid,
+                                          MultiXactId *minmulti);
+</programlisting>
+
+     This API creates the storage that is necessary to store the tuples of the relation,
+     and also reports the minimum XID with which tuples can be inserted. For example, the heap AM
+     creates the relfilenode that is necessary to store the heap tuples.
+    </para>
+
+    <para>
+<programlisting>
+void        (*relation_nontransactional_truncate) (Relation rel);
+</programlisting>
+
+     This API truncates the specified relation; the operation is non-transactional and cannot be rolled back.
+    </para>
+
+    <para>
+<programlisting>
+void        (*relation_copy_data) (Relation rel, RelFileNode newrnode);
+</programlisting>
+
+     This API copies the relation's data from the existing filenode to the new filenode
+     specified by <literal>newrnode</literal> and removes the existing filenode.
+    </para>
+
+    <para>
+<programlisting>
+void        (*relation_vacuum) (Relation onerel, int options,
+                                struct VacuumParams *params, BufferAccessStrategy bstrategy);
+</programlisting>
+
+     This API performs vacuuming of the relation based on the specified params.
+     It gathers all the dead tuples of the relation and cleans them up, including
+     the corresponding index entries.
+    </para>
+
+    <para>
+<programlisting>
+void        (*scan_analyze_next_block) (TableScanDesc scan, BlockNumber blockno,
+                                        BufferAccessStrategy bstrategy);
+</programlisting>
+
+     This API prepares the specified relation block for tuple analysis. The information
+     gathered by the analysis is used by the planner to optimize query plans for this relation.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*scan_analyze_next_tuple) (TableScanDesc scan, TransactionId OldestXmin,
+                                        double *liverows, double *deadrows, TupleTableSlot *slot);
+</programlisting>
+
+     This API gets the next visible tuple from the block being analyzed, judging visibility
+     against <literal>OldestXmin</literal>, and updates the counts of live and dead rows encountered.
+    </para>
+
+    <para>
+<programlisting>
+void        (*relation_copy_for_cluster) (Relation NewHeap, Relation OldHeap, Relation OldIndex,
+                                          bool use_sort,
+                                          TransactionId OldestXmin, TransactionId FreezeXid, MultiXactId MultiXactCutoff,
+                                          double *num_tuples, double *tups_vacuumed, double *tups_recently_dead);
+</programlisting>
+
+     This API copies the content of a relation, ordered either by scanning the specified index or by
+     sorting explicitly. It also removes dead tuples.
+    </para>
+
+    <para>
+<programlisting>
+double      (*index_build_range_scan) (Relation heap_rel,
+                                       Relation index_rel,
+                                       IndexInfo *index_info,
+                                       bool allow_sync,
+                                       bool anyvisible,
+                                       BlockNumber start_blockno,
+                                       BlockNumber end_blockno,
+                                       IndexBuildCallback callback,
+                                       void *callback_state,
+                                       TableScanDesc scan);
+</programlisting>
+
+     This API scans the specified block range of the given relation and inserts the entries
+     into the specified index using the provided callback function.
+    </para>
+
+    <para>
+<programlisting>
+void        (*index_validate_scan) (Relation heap_rel,
+                                    Relation index_rel,
+                                    IndexInfo *index_info,
+                                    Snapshot snapshot,
+                                    struct ValidateIndexState *state);
+</programlisting>
+
+     This API scans the table according to the given snapshot and inserts tuples
+     satisfying the snapshot into the specified index, provided their TIDs are
+     also present in the <structname>ValidateIndexState</structname> struct;
+     this API is used as the last phase of a concurrent index build.
+    </para>
+
+   </sect3>
+
+   <sect3 id="planner-functions">
+    <title>Planner functions</title>
+
+    <para>
+<programlisting>
+void        (*relation_estimate_size) (Relation rel, int32 *attr_widths,
+                                       BlockNumber *pages, double *tuples, double *allvisfrac);
+</programlisting>
+
+     This API estimates the current size of the relation, returning the number of
+     pages, the number of tuples, the fraction of all-visible pages, etc.
+    </para>
+
+   </sect3>
+
+   <sect3 id="executor-functions">
+    <title>Executor functions</title>
+
+    <para>
+<programlisting>
+bool        (*scan_bitmap_pagescan) (TableScanDesc scan,
+                                     TBMIterateResult *tbmres);
+</programlisting>
+
+     This API scans the relation block specified in the scan descriptor to collect the
+     visible tuples requested by <structname>tbmres</structname>.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*scan_bitmap_pagescan_next) (TableScanDesc scan,
+                                          TupleTableSlot *slot);
+</programlisting>
+
+     This API gets the next tuple from the set of tuples collected from the current page
+     and stores it in the provided slot; it returns false when there are no more tuples.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*scan_sample_next_block) (TableScanDesc scan,
+                                       struct SampleScanState *scanstate);
+</programlisting>
+
+     This API selects the next block of the relation, either using the given sampling method or
+     sequentially, and records its information in the scan descriptor.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*scan_sample_next_tuple) (TableScanDesc scan,
+                                       struct SampleScanState *scanstate,
+                                       TupleTableSlot *slot);
+</programlisting>
+
+     This API gets the next tuple to sample from the current block, chosen by
+     <literal>scan_sample_next_block</literal>, according to the sampling method,
+     or else the next visible tuple of that block.
+    </para>
+
+  </sect3>
+  </sect2>
  </sect1>
- 
+
  <sect1 id="index-access-methods">
   <title>Overview of Index access methods</title>
 
-- 
2.20.1.windows.1
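
To make the handler-function description in 0005 concrete, a minimal
skeleton of a table AM module might look as follows (a sketch only: the
myam_* names are invented, the callback list is abbreviated, and, like
heapam_handler.c above, it returns a pointer to a static struct):

    #include "postgres.h"

    #include "access/tableam.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    static const TupleTableSlotOps *
    myam_slot_callbacks(Relation rel)
    {
        /* a simple AM could reuse one of the predefined slot types */
        return &TTSOpsVirtual;
    }

    static const TableAmRoutine myam_methods = {
        .slot_callbacks = myam_slot_callbacks,
        /* ... scan_begin, scan_end, tuple_insert, etc. ... */
    };

    PG_FUNCTION_INFO_V1(myam_handler);

    Datum
    myam_handler(PG_FUNCTION_ARGS)
    {
        PG_RETURN_POINTER(&myam_methods);
    }

The handler would then be registered with
CREATE ACCESS METHOD myam TYPE TABLE HANDLER myam_handler.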

0001-Reduce-the-use-of-HeapTuple-t_tableOid.patch
From c00cc3e8f199d3c908f13fff13d2b518137d7bab Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 16 Jan 2019 18:43:47 +1100
Subject: [PATCH 1/5] Reduce the use of HeapTuple t_tableOid

t_tableOid is still used in triggers and where the HeapTuple
is generated and passed from slots. Those remaining uses need
to be replaced once the table OID is stored as a separate
variable/parameter.
---
 contrib/hstore/hstore_io.c                    |  2 --
 contrib/pg_visibility/pg_visibility.c         |  1 -
 contrib/pgstattuple/pgstatapprox.c            |  1 -
 contrib/pgstattuple/pgstattuple.c             |  3 +-
 contrib/postgres_fdw/postgres_fdw.c           | 11 ++++--
 src/backend/access/common/heaptuple.c         |  7 ----
 src/backend/access/heap/heapam.c              | 35 +++++--------------
 src/backend/access/heap/heapam_handler.c      | 27 +++++++-------
 src/backend/access/heap/heapam_visibility.c   | 20 ++++-------
 src/backend/access/heap/pruneheap.c           |  2 --
 src/backend/access/heap/tuptoaster.c          |  3 --
 src/backend/access/heap/vacuumlazy.c          |  2 --
 src/backend/access/index/genam.c              |  1 -
 src/backend/catalog/indexing.c                |  2 +-
 src/backend/commands/analyze.c                |  2 +-
 src/backend/commands/functioncmds.c           |  3 +-
 src/backend/commands/schemacmds.c             |  1 -
 src/backend/commands/trigger.c                | 21 +++++------
 src/backend/executor/execExprInterp.c         |  1 -
 src/backend/executor/execTuples.c             | 30 ++++++++++------
 src/backend/executor/execUtils.c              |  2 --
 src/backend/executor/nodeAgg.c                |  3 +-
 src/backend/executor/nodeGather.c             |  1 +
 src/backend/executor/nodeGatherMerge.c        |  1 +
 src/backend/executor/nodeIndexonlyscan.c      |  4 +--
 src/backend/executor/nodeIndexscan.c          |  3 +-
 src/backend/executor/nodeModifyTable.c        |  6 +---
 src/backend/executor/nodeSetOp.c              |  1 +
 src/backend/executor/spi.c                    |  1 -
 src/backend/executor/tqueue.c                 |  1 -
 src/backend/replication/logical/decode.c      |  9 -----
 .../replication/logical/reorderbuffer.c       |  4 +--
 src/backend/utils/adt/expandedrecord.c        |  1 -
 src/backend/utils/adt/jsonfuncs.c             |  2 --
 src/backend/utils/adt/rowtypes.c              | 10 ------
 src/backend/utils/cache/catcache.c            |  1 -
 src/backend/utils/sort/tuplesort.c            |  7 ++--
 src/include/access/heapam.h                   |  3 +-
 src/include/executor/tuptable.h               |  5 ++-
 src/pl/plpgsql/src/pl_exec.c                  |  2 --
 src/test/regress/regress.c                    |  1 -
 41 files changed, 93 insertions(+), 150 deletions(-)

diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 745497c76f..05244e77ef 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -845,7 +845,6 @@ hstore_from_record(PG_FUNCTION_ARGS)
 		/* Build a temporary HeapTuple control structure */
 		tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 		ItemPointerSetInvalid(&(tuple.t_self));
-		tuple.t_tableOid = InvalidOid;
 		tuple.t_data = rec;
 
 		values = (Datum *) palloc(ncolumns * sizeof(Datum));
@@ -998,7 +997,6 @@ hstore_populate_record(PG_FUNCTION_ARGS)
 		/* Build a temporary HeapTuple control structure */
 		tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 		ItemPointerSetInvalid(&(tuple.t_self));
-		tuple.t_tableOid = InvalidOid;
 		tuple.t_data = rec;
 	}
 
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index c9166730fe..503f00408c 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -657,7 +657,6 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 			ItemPointerSet(&(tuple.t_self), blkno, offnum);
 			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 			tuple.t_len = ItemIdGetLength(itemid);
-			tuple.t_tableOid = relid;
 
 			/*
 			 * If we're checking whether the page is all-visible, we expect
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index ed62aef766..17879f115a 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -155,7 +155,6 @@ statapprox_heap(Relation rel, output_type *stat)
 
 			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 			tuple.t_len = ItemIdGetLength(itemid);
-			tuple.t_tableOid = RelationGetRelid(rel);
 
 			/*
 			 * We follow VACUUM's lead in counting INSERT_IN_PROGRESS tuples
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 9bcb640884..a0e7abe748 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -344,7 +344,8 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 		/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
 		LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-		if (HeapTupleSatisfiesVisibility(tuple, &SnapshotDirty, hscan->rs_cbuf))
+		if (HeapTupleSatisfiesVisibility(tuple, RelationGetRelid(hscan->rs_scan.rs_rd),
+								&SnapshotDirty, hscan->rs_cbuf))
 		{
 			stat.tuple_len += tuple->t_len;
 			stat.tuple_count++;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2f387fac42..bbe7f3010f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1471,6 +1471,8 @@ postgresIterateForeignScan(ForeignScanState *node)
 	 */
 	ExecStoreHeapTuple(fsstate->tuples[fsstate->next_tuple++],
 					   slot,
+					   fsstate->rel ?
+							   RelationGetRelid(fsstate->rel) : InvalidOid,
 					   false);
 
 	return slot;
@@ -3511,7 +3513,8 @@ store_returning_result(PgFdwModifyState *fmstate,
 		 * The returning slot will not necessarily be suitable to store
 		 * heaptuples directly, so allow for conversion.
 		 */
-		ExecForceStoreHeapTuple(newtup, slot);
+		ExecForceStoreHeapTuple(newtup, slot,
+					fmstate->rel ? RelationGetRelid(fmstate->rel) : InvalidOid);
 		ExecMaterializeSlot(slot);
 		pfree(newtup);
 	}
@@ -3785,7 +3788,11 @@ get_returning_data(ForeignScanState *node)
 												dmstate->retrieved_attrs,
 												node,
 												dmstate->temp_cxt);
-			ExecStoreHeapTuple(newtup, slot, false);
+			ExecStoreHeapTuple(newtup,
+								slot,
+								dmstate->rel ?
+								RelationGetRelid(dmstate->rel) : InvalidOid,
+								false);
 		}
 		PG_CATCH();
 		{
diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index 783b04a3cb..9e4c3c6cef 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -687,7 +687,6 @@ heap_copytuple(HeapTuple tuple)
 	newTuple = (HeapTuple) palloc(HEAPTUPLESIZE + tuple->t_len);
 	newTuple->t_len = tuple->t_len;
 	newTuple->t_self = tuple->t_self;
-	newTuple->t_tableOid = tuple->t_tableOid;
 	newTuple->t_data = (HeapTupleHeader) ((char *) newTuple + HEAPTUPLESIZE);
 	memcpy((char *) newTuple->t_data, (char *) tuple->t_data, tuple->t_len);
 	return newTuple;
@@ -713,7 +712,6 @@ heap_copytuple_with_tuple(HeapTuple src, HeapTuple dest)
 
 	dest->t_len = src->t_len;
 	dest->t_self = src->t_self;
-	dest->t_tableOid = src->t_tableOid;
 	dest->t_data = (HeapTupleHeader) palloc(src->t_len);
 	memcpy((char *) dest->t_data, (char *) src->t_data, src->t_len);
 }
@@ -848,7 +846,6 @@ expand_tuple(HeapTuple *targetHeapTuple,
 			= targetTHeader
 			= (HeapTupleHeader) ((char *) *targetHeapTuple + HEAPTUPLESIZE);
 		(*targetHeapTuple)->t_len = len;
-		(*targetHeapTuple)->t_tableOid = sourceTuple->t_tableOid;
 		(*targetHeapTuple)->t_self = sourceTuple->t_self;
 
 		targetTHeader->t_infomask = sourceTHeader->t_infomask;
@@ -1076,7 +1073,6 @@ heap_form_tuple(TupleDesc tupleDescriptor,
 	 */
 	tuple->t_len = len;
 	ItemPointerSetInvalid(&(tuple->t_self));
-	tuple->t_tableOid = InvalidOid;
 
 	HeapTupleHeaderSetDatumLength(td, len);
 	HeapTupleHeaderSetTypeId(td, tupleDescriptor->tdtypeid);
@@ -1160,7 +1156,6 @@ heap_modify_tuple(HeapTuple tuple,
 	 */
 	newTuple->t_data->t_ctid = tuple->t_data->t_ctid;
 	newTuple->t_self = tuple->t_self;
-	newTuple->t_tableOid = tuple->t_tableOid;
 
 	return newTuple;
 }
@@ -1223,7 +1218,6 @@ heap_modify_tuple_by_cols(HeapTuple tuple,
 	 */
 	newTuple->t_data->t_ctid = tuple->t_data->t_ctid;
 	newTuple->t_self = tuple->t_self;
-	newTuple->t_tableOid = tuple->t_tableOid;
 
 	return newTuple;
 }
@@ -1463,7 +1457,6 @@ heap_tuple_from_minimal_tuple(MinimalTuple mtup)
 	result = (HeapTuple) palloc(HEAPTUPLESIZE + len);
 	result->t_len = len;
 	ItemPointerSetInvalid(&(result->t_self));
-	result->t_tableOid = InvalidOid;
 	result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
 	memcpy((char *) result->t_data + MINIMAL_TUPLE_OFFSET, mtup, mtup->t_len);
 	memset(result->t_data, 0, offsetof(HeapTupleHeaderData, t_infomask2));
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2a88982576..c1e4d07864 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -414,7 +414,6 @@ heapgetpage(TableScanDesc sscan, BlockNumber page)
 			HeapTupleData loctup;
 			bool		valid;
 
-			loctup.t_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
 			loctup.t_len = ItemIdGetLength(lpp);
 			ItemPointerSet(&(loctup.t_self), page, lineoff);
@@ -422,7 +421,8 @@ heapgetpage(TableScanDesc sscan, BlockNumber page)
 			if (all_visible)
 				valid = true;
 			else
-				valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
+				valid = HeapTupleSatisfiesVisibility(&loctup, RelationGetRelid(scan->rs_scan.rs_rd),
+									snapshot, buffer);
 
 			CheckForSerializableConflictOut(valid, scan->rs_scan.rs_rd, &loctup,
 											buffer, snapshot);
@@ -640,7 +640,7 @@ heapgettup(HeapScanDesc scan,
 				/*
 				 * if current tuple qualifies, return it.
 				 */
-				valid = HeapTupleSatisfiesVisibility(tuple,
+				valid = HeapTupleSatisfiesVisibility(tuple, RelationGetRelid(scan->rs_scan.rs_rd),
 													 snapshot,
 													 scan->rs_cbuf);
 
@@ -1156,9 +1156,6 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	if (!is_bitmapscan && snapshot)
 		PredicateLockRelation(relation, snapshot);
 
-	/* we only need to set this up once */
-	scan->rs_ctup.t_tableOid = RelationGetRelid(relation);
-
 	/*
 	 * we do this here instead of in initscan() because heap_rescan also calls
 	 * initscan() and we don't want to allocate memory again
@@ -1383,6 +1380,7 @@ heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *s
 
 	pgstat_count_heap_getnext(scan->rs_scan.rs_rd);
 
+	slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 	return ExecStoreBufferHeapTuple(&scan->rs_ctup, slot,
 									scan->rs_cbuf);
 }
@@ -1486,12 +1484,11 @@ heap_fetch(Relation relation,
 	ItemPointerCopy(tid, &(tuple->t_self));
 	tuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tuple->t_len = ItemIdGetLength(lp);
-	tuple->t_tableOid = RelationGetRelid(relation);
 
 	/*
 	 * check tuple visibility, then release lock
 	 */
-	valid = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
+	valid = HeapTupleSatisfiesVisibility(tuple, RelationGetRelid(relation), snapshot, buffer);
 
 	if (valid)
 		PredicateLockTuple(relation, tuple, snapshot);
@@ -1596,7 +1593,6 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 
 		heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
 		heapTuple->t_len = ItemIdGetLength(lp);
-		heapTuple->t_tableOid = RelationGetRelid(relation);
 		ItemPointerSetOffsetNumber(&heapTuple->t_self, offnum);
 
 		/*
@@ -1633,7 +1629,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 			ItemPointerSet(&(heapTuple->t_self), BufferGetBlockNumber(buffer), offnum);
 
 			/* If it's visible per the snapshot, we must return it */
-			valid = HeapTupleSatisfiesVisibility(heapTuple, snapshot, buffer);
+			valid = HeapTupleSatisfiesVisibility(heapTuple, RelationGetRelid(relation), snapshot, buffer);
 			CheckForSerializableConflictOut(valid, relation, heapTuple,
 											buffer, snapshot);
 			/* reset to original, non-redirected, tid */
@@ -1790,7 +1786,6 @@ heap_get_latest_tid(Relation relation,
 		tp.t_self = ctid;
 		tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 		tp.t_len = ItemIdGetLength(lp);
-		tp.t_tableOid = RelationGetRelid(relation);
 
 		/*
 		 * After following a t_ctid link, we might arrive at an unrelated
@@ -1807,7 +1802,7 @@ heap_get_latest_tid(Relation relation,
 		 * Check tuple visibility; if visible, set it as the new result
 		 * candidate.
 		 */
-		valid = HeapTupleSatisfiesVisibility(&tp, snapshot, buffer);
+		valid = HeapTupleSatisfiesVisibility(&tp, RelationGetRelid(relation), snapshot, buffer);
 		CheckForSerializableConflictOut(valid, relation, &tp, buffer, snapshot);
 		if (valid)
 			*tid = ctid;
@@ -2157,7 +2152,6 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 
 	HeapTupleHeaderSetCmin(tup->t_data, cid);
 	HeapTupleHeaderSetXmax(tup->t_data, 0); /* for cleanliness */
-	tup->t_tableOid = RelationGetRelid(relation);
 
 	/*
 	 * If the new tuple is too big for storage or contains already toasted
@@ -2215,9 +2209,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	{
 		heaptuples[i] = heap_prepare_insert(relation, ExecFetchSlotHeapTuple(slots[i], true, NULL),
 											xid, cid, options);
-
-		if (slots[i]->tts_tableOid != InvalidOid)
-			heaptuples[i]->t_tableOid = slots[i]->tts_tableOid;
 	}
 
 	/*
@@ -2607,7 +2598,6 @@ heap_delete(Relation relation, ItemPointer tid,
 	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
 	Assert(ItemIdIsNormal(lp));
 
-	tp.t_tableOid = RelationGetRelid(relation);
 	tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tp.t_len = ItemIdGetLength(lp);
 	tp.t_self = *tid;
@@ -2724,7 +2714,7 @@ l1:
 	if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated)
 	{
 		/* Perform additional check for transaction-snapshot mode RI updates */
-		if (!HeapTupleSatisfiesVisibility(&tp, crosscheck, buffer))
+		if (!HeapTupleSatisfiesVisibility(&tp, RelationGetRelid(relation), crosscheck, buffer))
 			result = HeapTupleUpdated;
 	}
 
@@ -3126,14 +3116,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	 * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
 	 * properly.
 	 */
-	oldtup.t_tableOid = RelationGetRelid(relation);
 	oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	oldtup.t_len = ItemIdGetLength(lp);
 	oldtup.t_self = *otid;
 
-	/* the new tuple is ready, except for this: */
-	newtup->t_tableOid = RelationGetRelid(relation);
-
 	/* Determine columns modified by the update. */
 	modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
 												  &oldtup, newtup);
@@ -3364,7 +3350,7 @@ l2:
 	if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated)
 	{
 		/* Perform additional check for transaction-snapshot mode RI updates */
-		if (!HeapTupleSatisfiesVisibility(&oldtup, crosscheck, buffer))
+		if (!HeapTupleSatisfiesVisibility(&oldtup, RelationGetRelid(relation), crosscheck, buffer))
 			result = HeapTupleUpdated;
 	}
 
@@ -4118,7 +4104,6 @@ heap_lock_tuple(Relation relation, ItemPointer tid,
 
 	tuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tuple->t_len = ItemIdGetLength(lp);
-	tuple->t_tableOid = RelationGetRelid(relation);
 	tuple->t_self = *tid;
 
 l3:
@@ -5672,7 +5657,6 @@ heap_abort_speculative(Relation relation, HeapTuple tuple)
 	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
 	Assert(ItemIdIsNormal(lp));
 
-	tp.t_tableOid = RelationGetRelid(relation);
 	tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tp.t_len = ItemIdGetLength(lp);
 	tp.t_self = *tid;
@@ -7504,7 +7488,6 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
 	HeapTupleHeader hdr = tup->t_data;
 
 	Assert(ItemPointerIsValid(&tup->t_self));
-	Assert(tup->t_tableOid != InvalidOid);
 
 	xlrec.top_xid = GetTopTransactionId();
 	xlrec.target_node = relation->rd_node;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index fec649b842..f2719bb017 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -118,8 +118,6 @@ heapam_heap_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
 
 	/* Update the tuple with table oid */
 	slot->tts_tableOid = RelationGetRelid(relation);
-	if (slot->tts_tableOid != InvalidOid)
-		tuple->t_tableOid = slot->tts_tableOid;
 
 	/* Perform the insertion, and copy the resulting ItemPointer */
 	heap_insert(relation, tuple, cid, options, bistate);
@@ -138,8 +136,6 @@ heapam_heap_insert_speculative(Relation relation, TupleTableSlot *slot, CommandI
 
 	/* Update the tuple with table oid */
 	slot->tts_tableOid = RelationGetRelid(relation);
-	if (slot->tts_tableOid != InvalidOid)
-		tuple->t_tableOid = slot->tts_tableOid;
 
 	HeapTupleHeaderSetSpeculativeToken(tuple->t_data, specToken);
 
@@ -566,7 +562,9 @@ heapam_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot, Snapshot sna
 	 * Caller should be holding pin, but not lock.
 	 */
 	LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
-	res = HeapTupleSatisfiesVisibility(bslot->base.tuple, snapshot,
+	res = HeapTupleSatisfiesVisibility(bslot->base.tuple,
+									   RelationGetRelid(rel),
+									   snapshot,
 									   bslot->buffer);
 	LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
 
@@ -732,7 +730,6 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 
 		ItemPointerSet(&targtuple->t_self, scan->rs_cblock, scan->rs_cindex);
 
-		targtuple->t_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 		targtuple->t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
 		targtuple->t_len = ItemIdGetLength(itemid);
 
@@ -817,6 +814,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 		if (sample_it)
 		{
 			ExecStoreBufferHeapTuple(targtuple, slot, scan->rs_cbuf);
+			slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 			scan->rs_cindex++;
 
 			/* note that we leave the buffer locked here! */
@@ -1477,7 +1475,7 @@ heapam_index_build_range_scan(Relation heapRelation,
 		MemoryContextReset(econtext->ecxt_per_tuple_memory);
 
 		/* Set up for predicate or expression evaluation */
-		ExecStoreHeapTuple(heapTuple, slot, false);
+		ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(sscan->rs_rd), false);
 
 		/*
 		 * In a partial index, discard tuples that don't satisfy the
@@ -1731,7 +1729,7 @@ heapam_index_validate_scan(Relation heapRelation,
 			MemoryContextReset(econtext->ecxt_per_tuple_memory);
 
 			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
+			ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(sscan->rs_rd), false);
 
 			/*
 			 * In a partial index, discard tuples that don't satisfy the
@@ -1884,9 +1882,9 @@ heapam_scan_bitmap_pagescan(TableScanDesc sscan,
 				continue;
 			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
 			loctup.t_len = ItemIdGetLength(lp);
-			loctup.t_tableOid = scan->rs_scan.rs_rd->rd_id;
 			ItemPointerSet(&loctup.t_self, page, offnum);
-			valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
+			valid = HeapTupleSatisfiesVisibility(&loctup,
+					RelationGetRelid(scan->rs_scan.rs_rd), snapshot, buffer);
 			if (valid)
 			{
 				scan->rs_vistuples[ntup++] = offnum;
@@ -1923,7 +1921,6 @@ heapam_scan_bitmap_pagescan_next(TableScanDesc sscan, TupleTableSlot *slot)
 
 	scan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
 	scan->rs_ctup.t_len = ItemIdGetLength(lp);
-	scan->rs_ctup.t_tableOid = scan->rs_scan.rs_rd->rd_id;
 	ItemPointerSet(&scan->rs_ctup.t_self, scan->rs_cblock, targoffset);
 
 	pgstat_count_heap_fetch(scan->rs_scan.rs_rd);
@@ -1935,6 +1932,7 @@ heapam_scan_bitmap_pagescan_next(TableScanDesc sscan, TupleTableSlot *slot)
 	ExecStoreBufferHeapTuple(&scan->rs_ctup,
 							 slot,
 							 scan->rs_cbuf);
+	slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 
 	scan->rs_cindex++;
 
@@ -1981,8 +1979,10 @@ SampleHeapTupleVisible(HeapScanDesc scan, Buffer buffer,
 	else
 	{
 		/* Otherwise, we have to check the tuple individually. */
-		return HeapTupleSatisfiesVisibility(tuple, scan->rs_scan.rs_snapshot,
-											buffer);
+		return HeapTupleSatisfiesVisibility(tuple,
+				RelationGetRelid(scan->rs_scan.rs_rd),
+				scan->rs_scan.rs_snapshot,
+				buffer);
 	}
 }
 
@@ -2131,6 +2131,7 @@ heapam_scan_sample_next_tuple(TableScanDesc sscan, struct SampleScanState *scans
 				LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 
 			ExecStoreBufferHeapTuple(tuple, slot, scan->rs_cbuf);
+			slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
 
 			/* Count successfully-fetched tuples as heap fetches */
 			pgstat_count_heap_getnext(scan->rs_scan.rs_rd);
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 5e8fdacb95..af63038d4a 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -173,7 +173,6 @@ HeapTupleSatisfiesSelf(HeapTuple htup, Snapshot snapshot, Buffer buffer)
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -366,7 +365,6 @@ HeapTupleSatisfiesToast(HeapTuple htup, Snapshot snapshot,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -460,7 +458,6 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -750,7 +747,6 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	snapshot->xmin = snapshot->xmax = InvalidTransactionId;
 	snapshot->speculativeToken = 0;
@@ -967,7 +963,6 @@ HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
@@ -1168,7 +1163,6 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	/*
 	 * Has inserting transaction committed?
@@ -1425,7 +1419,6 @@ HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin)
 	HeapTupleHeader tuple = htup->t_data;
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	/*
 	 * If the inserting transaction is marked invalid, then it aborted, and
@@ -1541,7 +1534,7 @@ TransactionIdInArray(TransactionId xid, TransactionId *xip, Size num)
  * complicated than when dealing "only" with the present.
  */
 static bool
-HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
+HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Oid relid, Snapshot snapshot,
 							   Buffer buffer)
 {
 	HeapTupleHeader tuple = htup->t_data;
@@ -1549,7 +1542,6 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 	TransactionId xmax = HeapTupleHeaderGetRawXmax(tuple);
 
 	Assert(ItemPointerIsValid(&htup->t_self));
-	Assert(htup->t_tableOid != InvalidOid);
 
 	/* inserting transaction aborted */
 	if (HeapTupleHeaderXminInvalid(tuple))
@@ -1570,7 +1562,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 		 * values externally.
 		 */
 		resolved = ResolveCminCmaxDuringDecoding(HistoricSnapshotGetTupleCids(), snapshot,
-												 htup, buffer,
+												 htup, relid, buffer,
 												 &cmin, &cmax);
 
 		if (!resolved)
@@ -1641,7 +1633,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 
 		/* Lookup actual cmin/cmax values */
 		resolved = ResolveCminCmaxDuringDecoding(HistoricSnapshotGetTupleCids(), snapshot,
-												 htup, buffer,
+												 htup, relid, buffer,
 												 &cmin, &cmax);
 
 		if (!resolved)
@@ -1689,8 +1681,10 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
  *	if so, the indicated buffer is marked dirty.
  */
 bool
-HeapTupleSatisfiesVisibility(HeapTuple tup, Snapshot snapshot, Buffer buffer)
+HeapTupleSatisfiesVisibility(HeapTuple tup, Oid relid, Snapshot snapshot, Buffer buffer)
 {
+	Assert(relid != InvalidOid);
+
 	switch (snapshot->snapshot_type)
 	{
 		case SNAPSHOT_MVCC:
@@ -1709,7 +1703,7 @@ HeapTupleSatisfiesVisibility(HeapTuple tup, Snapshot snapshot, Buffer buffer)
 			return HeapTupleSatisfiesDirty(tup, snapshot, buffer);
 			break;
 		case SNAPSHOT_HISTORIC_MVCC:
-			return HeapTupleSatisfiesHistoricMVCC(tup, snapshot, buffer);
+			return HeapTupleSatisfiesHistoricMVCC(tup, relid, snapshot, buffer);
 			break;
 		case SNAPSHOT_NON_VACUUMABLE:
 			return HeapTupleSatisfiesNonVacuumable(tup, snapshot, buffer);
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a3e51922d8..e09a9d7340 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -366,8 +366,6 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 				i;
 	HeapTupleData tup;
 
-	tup.t_tableOid = RelationGetRelid(relation);
-
 	rootlp = PageGetItemId(dp, rootoffnum);
 
 	/*
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 7ea964c493..257ae9761b 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1022,7 +1022,6 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 		result_tuple = (HeapTuple) palloc0(HEAPTUPLESIZE + new_tuple_len);
 		result_tuple->t_len = new_tuple_len;
 		result_tuple->t_self = newtup->t_self;
-		result_tuple->t_tableOid = newtup->t_tableOid;
 		new_data = (HeapTupleHeader) ((char *) result_tuple + HEAPTUPLESIZE);
 		result_tuple->t_data = new_data;
 
@@ -1123,7 +1122,6 @@ toast_flatten_tuple(HeapTuple tup, TupleDesc tupleDesc)
 	 * a syscache entry.
 	 */
 	new_tuple->t_self = tup->t_self;
-	new_tuple->t_tableOid = tup->t_tableOid;
 
 	new_tuple->t_data->t_choice = tup->t_data->t_choice;
 	new_tuple->t_data->t_ctid = tup->t_data->t_ctid;
@@ -1194,7 +1192,6 @@ toast_flatten_tuple_to_datum(HeapTupleHeader tup,
 	/* Build a temporary HeapTuple control structure */
 	tmptup.t_len = tup_len;
 	ItemPointerSetInvalid(&(tmptup.t_self));
-	tmptup.t_tableOid = InvalidOid;
 	tmptup.t_data = tup;
 
 	/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9416c31889..2056dde239 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1009,7 +1009,6 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 			tuple.t_len = ItemIdGetLength(itemid);
-			tuple.t_tableOid = RelationGetRelid(onerel);
 
 			tupgone = false;
 
@@ -2244,7 +2243,6 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 
 		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
 		tuple.t_len = ItemIdGetLength(itemid);
-		tuple.t_tableOid = RelationGetRelid(rel);
 
 		switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
 		{
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index f4a527b126..250c746971 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -510,7 +510,6 @@ systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
 	result = table_tuple_satisfies_snapshot(sysscan->heap_rel,
 											sysscan->slot,
 											freshsnap);
-
 	return result;
 }
 
diff --git a/src/backend/catalog/indexing.c b/src/backend/catalog/indexing.c
index 0c994122d8..7d443f8fe7 100644
--- a/src/backend/catalog/indexing.c
+++ b/src/backend/catalog/indexing.c
@@ -99,7 +99,7 @@ CatalogIndexInsert(CatalogIndexState indstate, HeapTuple heapTuple)
 	/* Need a slot to hold the tuple being examined */
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
 									&TTSOpsHeapTuple);
-	ExecStoreHeapTuple(heapTuple, slot, false);
+	ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(heapRelation), false);
 
 	/*
 	 * for each index, form and insert the index tuple
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index fb4384d556..a71d7b658e 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -760,7 +760,7 @@ compute_index_stats(Relation onerel, double totalrows,
 			ResetExprContext(econtext);
 
 			/* Set up for predicate or expression evaluation */
-			ExecStoreHeapTuple(heapTuple, slot, false);
+			ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(onerel), false);
 
 			/* If index is partial, check predicate */
 			if (predicate != NULL)
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index 4f62e48d98..0f3e52802b 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -2441,10 +2441,9 @@ ExecuteCallStmt(CallStmt *stmt, ParamListInfo params, bool atomic, DestReceiver
 
 		rettupdata.t_len = HeapTupleHeaderGetDatumLength(td);
 		ItemPointerSetInvalid(&(rettupdata.t_self));
-		rettupdata.t_tableOid = InvalidOid;
 		rettupdata.t_data = td;
 
-		slot = ExecStoreHeapTuple(&rettupdata, tstate->slot, false);
+		slot = ExecStoreHeapTuple(&rettupdata, tstate->slot, InvalidOid, false);
 		tstate->dest->receiveSlot(slot, tstate->dest);
 
 		end_tup_output(tstate);
diff --git a/src/backend/commands/schemacmds.c b/src/backend/commands/schemacmds.c
index 6cf94a3140..4492ae2b0e 100644
--- a/src/backend/commands/schemacmds.c
+++ b/src/backend/commands/schemacmds.c
@@ -355,7 +355,6 @@ AlterSchemaOwner_internal(HeapTuple tup, Relation rel, Oid newOwnerId)
 {
 	Form_pg_namespace nspForm;
 
-	Assert(tup->t_tableOid == NamespaceRelationId);
 	Assert(RelationGetRelid(rel) == NamespaceRelationId);
 
 	nspForm = (Form_pg_namespace) GETSTRUCT(tup);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 1653c37567..bae5de9764 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2567,7 +2567,7 @@ ExecBRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 		}
 		else if (newtuple != oldtuple)
 		{
-			ExecForceStoreHeapTuple(newtuple, slot);
+			ExecForceStoreHeapTuple(newtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 			if (should_free)
 				heap_freetuple(oldtuple);
@@ -2649,7 +2649,8 @@ ExecIRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 		}
 		else if (newtuple != oldtuple)
 		{
-			ExecForceStoreHeapTuple(newtuple, slot);
+			ExecForceStoreHeapTuple(newtuple, LocTriggerData.tg_trigslot,
+									RelationGetRelid(relinfo->ri_RelationDesc));
 
 			if (should_free)
 				heap_freetuple(oldtuple);
@@ -2778,7 +2779,7 @@ ExecBRDeleteTriggers(EState *estate, EPQState *epqstate,
 	else
 	{
 		trigtuple = fdw_trigtuple;
-		ExecForceStoreHeapTuple(trigtuple, slot);
+		ExecForceStoreHeapTuple(trigtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
 	}
 
 	LocTriggerData.type = T_TriggerData;
@@ -2850,7 +2851,7 @@ ExecARDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
 							   slot,
 							   NULL);
 		else
-			ExecForceStoreHeapTuple(fdw_trigtuple, slot);
+			ExecForceStoreHeapTuple(fdw_trigtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_DELETE,
 							  true, slot, NULL, NIL, NULL,
@@ -2879,7 +2880,7 @@ ExecIRDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
 	LocTriggerData.tg_oldtable = NULL;
 	LocTriggerData.tg_newtable = NULL;
 
-	ExecForceStoreHeapTuple(trigtuple, slot);
+	ExecForceStoreHeapTuple(trigtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 	for (i = 0; i < trigdesc->numtriggers; i++)
 	{
@@ -3038,7 +3039,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 	}
 	else
 	{
-		ExecForceStoreHeapTuple(fdw_trigtuple, oldslot);
+		ExecForceStoreHeapTuple(fdw_trigtuple, oldslot, RelationGetRelid(relinfo->ri_RelationDesc));
 		trigtuple = fdw_trigtuple;
 	}
 
@@ -3088,7 +3089,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 		}
 		else if (newtuple != oldtuple)
 		{
-			ExecForceStoreHeapTuple(newtuple, newslot);
+			ExecForceStoreHeapTuple(newtuple, newslot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 			if (should_free_new)
 				heap_freetuple(oldtuple);
@@ -3136,7 +3137,7 @@ ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 							   oldslot,
 							   NULL);
 		else if (fdw_trigtuple != NULL)
-			ExecForceStoreHeapTuple(fdw_trigtuple, oldslot);
+			ExecForceStoreHeapTuple(fdw_trigtuple, oldslot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_UPDATE,
 							  true, oldslot, newslot, recheckIndexes,
@@ -3164,7 +3165,7 @@ ExecIRUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 	LocTriggerData.tg_oldtable = NULL;
 	LocTriggerData.tg_newtable = NULL;
 
-	ExecForceStoreHeapTuple(trigtuple, oldslot);
+	ExecForceStoreHeapTuple(trigtuple, oldslot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 	for (i = 0; i < trigdesc->numtriggers; i++)
 	{
@@ -3200,7 +3201,7 @@ ExecIRUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 		}
 		else if (newtuple != oldtuple)
 		{
-			ExecForceStoreHeapTuple(newtuple, newslot);
+			ExecForceStoreHeapTuple(newtuple, newslot, RelationGetRelid(relinfo->ri_RelationDesc));
 
 			if (should_free)
 				heap_freetuple(oldtuple);
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index a018925d4e..11aee64d19 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3016,7 +3016,6 @@ ExecEvalFieldStoreDeForm(ExprState *state, ExprEvalStep *op, ExprContext *econte
 		tuphdr = DatumGetHeapTupleHeader(tupDatum);
 		tmptup.t_len = HeapTupleHeaderGetDatumLength(tuphdr);
 		ItemPointerSetInvalid(&(tmptup.t_self));
-		tmptup.t_tableOid = InvalidOid;
 		tmptup.t_data = tuphdr;
 
 		heap_deform_tuple(&tmptup, tupDesc,
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 41fa374b6f..72b9fab42c 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -389,7 +389,7 @@ tts_heap_copyslot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
 	tuple = ExecCopySlotHeapTuple(srcslot);
 	MemoryContextSwitchTo(oldcontext);
 
-	ExecStoreHeapTuple(tuple, dstslot, true);
+	ExecStoreHeapTuple(tuple, dstslot, srcslot->tts_tableOid, true);
 }
 
 static HeapTuple
@@ -1102,6 +1102,7 @@ MakeTupleTableSlot(TupleDesc tupleDesc,
 	slot->tts_tupleDescriptor = tupleDesc;
 	slot->tts_mcxt = CurrentMemoryContext;
 	slot->tts_nvalid = 0;
+	slot->tts_tableOid = InvalidOid;
 
 	if (tupleDesc != NULL)
 	{
@@ -1293,6 +1294,7 @@ ExecSetSlotDescriptor(TupleTableSlot *slot, /* slot to change */
  *
  *		tuple:	tuple to store
  *		slot:	TTSOpsHeapTuple type slot to store it in
+ *		relid:	OID of the relation the tuple belongs to, or InvalidOid
  *		shouldFree: true if ExecClearTuple should pfree() the tuple
  *					when done with it
  *
@@ -1314,6 +1316,7 @@ ExecSetSlotDescriptor(TupleTableSlot *slot, /* slot to change */
 TupleTableSlot *
 ExecStoreHeapTuple(HeapTuple tuple,
 				   TupleTableSlot *slot,
+				   Oid relid,
 				   bool shouldFree)
 {
 	/*
@@ -1327,7 +1330,7 @@ ExecStoreHeapTuple(HeapTuple tuple,
 		elog(ERROR, "trying to store a heap tuple into wrong type of slot");
 	tts_heap_store_tuple(slot, tuple, shouldFree);
 
-	slot->tts_tableOid = tuple->t_tableOid;
+	slot->tts_tableOid = relid;
 
 	return slot;
 }
@@ -1349,6 +1352,8 @@ ExecStoreHeapTuple(HeapTuple tuple,
  *
  * If the target slot is not guaranteed to be TTSOpsBufferHeapTuple type slot,
  * use the, more expensive, ExecForceStoreHeapTuple().
+ *
+ * NOTE: tts_tableOid is deliberately not taken from the tuple's t_tableOid; callers must set it on the slot themselves.
  * --------------------------------
  */
 TupleTableSlot *
@@ -1368,8 +1373,6 @@ ExecStoreBufferHeapTuple(HeapTuple tuple,
 		elog(ERROR, "trying to store an on-disk heap tuple into wrong type of slot");
 	tts_buffer_heap_store_tuple(slot, tuple, buffer, false);
 
-	slot->tts_tableOid = tuple->t_tableOid;
-
 	return slot;
 }
 
@@ -1394,8 +1397,6 @@ ExecStorePinnedBufferHeapTuple(HeapTuple tuple,
 		elog(ERROR, "trying to store an on-disk heap tuple into wrong type of slot");
 	tts_buffer_heap_store_tuple(slot, tuple, buffer, true);
 
-	slot->tts_tableOid = tuple->t_tableOid;
-
 	return slot;
 }
 
@@ -1430,11 +1431,12 @@ ExecStoreMinimalTuple(MinimalTuple mtup,
  */
 void
 ExecForceStoreHeapTuple(HeapTuple tuple,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						Oid relid)
 {
 	if (TTS_IS_HEAPTUPLE(slot))
 	{
-		ExecStoreHeapTuple(tuple, slot, false);
+		ExecStoreHeapTuple(tuple, slot, relid, false);
 	}
 	else if (TTS_IS_BUFFERTUPLE(slot))
 	{
@@ -1447,6 +1449,7 @@ ExecForceStoreHeapTuple(HeapTuple tuple,
 		oldContext = MemoryContextSwitchTo(slot->tts_mcxt);
 		bslot->base.tuple = heap_copytuple(tuple);
 		MemoryContextSwitchTo(oldContext);
+		slot->tts_tableOid = relid;
 	}
 	else
 	{
@@ -1454,6 +1457,7 @@ ExecForceStoreHeapTuple(HeapTuple tuple,
 		heap_deform_tuple(tuple, slot->tts_tupleDescriptor,
 						  slot->tts_values, slot->tts_isnull);
 		ExecStoreVirtualTuple(slot);
+		slot->tts_tableOid = relid;
 	}
 }
 
@@ -1590,6 +1594,8 @@ ExecStoreHeapTupleDatum(Datum data, TupleTableSlot *slot)
 HeapTuple
 ExecFetchSlotHeapTuple(TupleTableSlot *slot, bool materialize, bool *shouldFree)
 {
+	HeapTuple htup;
+
 	/*
 	 * sanity checks
 	 */
@@ -1604,14 +1610,18 @@ ExecFetchSlotHeapTuple(TupleTableSlot *slot, bool materialize, bool *shouldFree)
 	{
 		if (shouldFree)
 			*shouldFree = true;
-		return slot->tts_ops->copy_heap_tuple(slot);
+		htup = slot->tts_ops->copy_heap_tuple(slot);
 	}
 	else
 	{
 		if (shouldFree)
 			*shouldFree = false;
-		return slot->tts_ops->get_heap_tuple(slot);
+		htup = slot->tts_ops->get_heap_tuple(slot);
 	}
+
+	htup->t_tableOid = slot->tts_tableOid;
+
+	return htup;
 }
 
 /* --------------------------------
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 7f33fe933b..adbbe17bde 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1010,7 +1010,6 @@ GetAttributeByName(HeapTupleHeader tuple, const char *attname, bool *isNull)
 	 */
 	tmptup.t_len = HeapTupleHeaderGetDatumLength(tuple);
 	ItemPointerSetInvalid(&(tmptup.t_self));
-	tmptup.t_tableOid = InvalidOid;
 	tmptup.t_data = tuple;
 
 	result = heap_getattr(&tmptup,
@@ -1058,7 +1057,6 @@ GetAttributeByNum(HeapTupleHeader tuple,
 	 */
 	tmptup.t_len = HeapTupleHeaderGetDatumLength(tuple);
 	ItemPointerSetInvalid(&(tmptup.t_self));
-	tmptup.t_tableOid = InvalidOid;
 	tmptup.t_data = tuple;
 
 	result = heap_getattr(&tmptup,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index bae7989a42..07479bc5c4 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -1806,7 +1806,8 @@ agg_retrieve_direct(AggState *aggstate)
 				 * cleared from the slot.
 				 */
 				ExecForceStoreHeapTuple(aggstate->grp_firstTuple,
-								   firstSlot);
+								   firstSlot,
+								   InvalidOid);
 				aggstate->grp_firstTuple = NULL;	/* don't keep two pointers */
 
 				/* set up for first advance_aggregates call */
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 69d5a1f239..b467db19fb 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -280,6 +280,7 @@ gather_getnext(GatherState *gatherstate)
 			{
 				ExecStoreHeapTuple(tup, /* tuple to store */
 								   fslot,	/* slot to store the tuple */
+								   InvalidOid,
 								   true);	/* pfree tuple when done with it */
 				return fslot;
 			}
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 4de1d2b484..a04da7bbeb 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -703,6 +703,7 @@ gather_merge_readnext(GatherMergeState *gm_state, int reader, bool nowait)
 	ExecStoreHeapTuple(tup,			/* tuple to store */
 					   gm_state->gm_slots[reader],	/* slot in which to store
 													 * the tuple */
+					   InvalidOid,
 					   true);		/* pfree tuple when done with it */
 
 	return true;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 85c5a1fb79..383c9a8e22 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -206,8 +206,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 			 */
 			Assert(slot->tts_tupleDescriptor->natts ==
 				   scandesc->xs_hitupdesc->natts);
-			ExecForceStoreHeapTuple(scandesc->xs_hitup, slot);
-			slot->tts_tableOid = RelationGetRelid(scandesc->heapRelation);
+			ExecForceStoreHeapTuple(scandesc->xs_hitup, slot,
+					RelationGetRelid(scandesc->heapRelation));
 		}
 		else if (scandesc->xs_itup)
 			StoreIndexTuple(slot, scandesc->xs_itup, scandesc->xs_itupdesc);
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 84e8e872ee..a5401e9c02 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -246,8 +246,7 @@ IndexNextWithReorder(IndexScanState *node)
 				tuple = reorderqueue_pop(node);
 
 				/* Pass 'true', as the tuple in the queue is a palloc'd copy */
-				slot->tts_tableOid = RelationGetRelid(scandesc->heapRelation);
-				ExecStoreHeapTuple(tuple, slot, true);
+				ExecStoreHeapTuple(tuple, slot, RelationGetRelid(scandesc->heapRelation), true);
 				return slot;
 			}
 		}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index f4f95428af..fc6b53a148 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -844,7 +844,7 @@ ldelete:;
 			slot = ExecGetReturningSlot(estate, resultRelInfo);
 			if (oldtuple != NULL)
 			{
-				ExecForceStoreHeapTuple(oldtuple, slot);
+				ExecForceStoreHeapTuple(oldtuple, slot, RelationGetRelid(resultRelationDesc));
 			}
 			else
 			{
@@ -2017,10 +2017,6 @@ ExecModifyTable(PlanState *pstate)
 					oldtupdata.t_len =
 						HeapTupleHeaderGetDatumLength(oldtupdata.t_data);
 					ItemPointerSetInvalid(&(oldtupdata.t_self));
-					/* Historically, view triggers see invalid t_tableOid. */
-					oldtupdata.t_tableOid =
-						(relkind == RELKIND_VIEW) ? InvalidOid :
-						RelationGetRelid(resultRelInfo->ri_RelationDesc);
 
 					oldtuple = &oldtupdata;
 				}
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 26aeaee083..8432c54ddb 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -270,6 +270,7 @@ setop_retrieve_direct(SetOpState *setopstate)
 		 */
 		ExecStoreHeapTuple(setopstate->grp_firstTuple,
 						   resultTupleSlot,
+						   InvalidOid,
 						   true);
 		setopstate->grp_firstTuple = NULL;	/* don't keep two pointers */
 
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 70c03e0f60..c44147c618 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -870,7 +870,6 @@ SPI_modifytuple(Relation rel, HeapTuple tuple, int natts, int *attnum,
 		 */
 		mtuple->t_data->t_ctid = tuple->t_data->t_ctid;
 		mtuple->t_self = tuple->t_self;
-		mtuple->t_tableOid = tuple->t_tableOid;
 	}
 	else
 	{
diff --git a/src/backend/executor/tqueue.c b/src/backend/executor/tqueue.c
index 6e2eaa5dcf..3ebf1e347e 100644
--- a/src/backend/executor/tqueue.c
+++ b/src/backend/executor/tqueue.c
@@ -208,7 +208,6 @@ TupleQueueReaderNext(TupleQueueReader *reader, bool nowait, bool *done)
 	 * (which had better be sufficiently aligned).
 	 */
 	ItemPointerSetInvalid(&htup.t_self);
-	htup.t_tableOid = InvalidOid;
 	htup.t_len = nbytes;
 	htup.t_data = data;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index eec3a22842..f26dac0f70 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -940,12 +940,6 @@ DecodeMultiInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			/* not a disk based tuple */
 			ItemPointerSetInvalid(&tuple->tuple.t_self);
 
-			/*
-			 * We can only figure this out after reassembling the
-			 * transactions.
-			 */
-			tuple->tuple.t_tableOid = InvalidOid;
-
 			tuple->tuple.t_len = datalen + SizeofHeapTupleHeader;
 
 			memset(header, 0, SizeofHeapTupleHeader);
@@ -1033,9 +1027,6 @@ DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tuple)
 	/* not a disk based tuple */
 	ItemPointerSetInvalid(&tuple->tuple.t_self);
 
-	/* we can only figure this out after reassembling the transactions */
-	tuple->tuple.t_tableOid = InvalidOid;
-
 	/* data is not stored aligned, copy to aligned storage */
 	memcpy((char *) &xlhdr,
 		   data,
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 2b486b5e9f..e95a5bbb3d 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3486,7 +3486,7 @@ UpdateLogicalMappings(HTAB *tuplecid_data, Oid relid, Snapshot snapshot)
 bool
 ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data,
 							  Snapshot snapshot,
-							  HeapTuple htup, Buffer buffer,
+							  HeapTuple htup, Oid relid, Buffer buffer,
 							  CommandId *cmin, CommandId *cmax)
 {
 	ReorderBufferTupleCidKey key;
@@ -3528,7 +3528,7 @@ restart:
 	 */
 	if (ent == NULL && !updated_mapping)
 	{
-		UpdateLogicalMappings(tuplecid_data, htup->t_tableOid, snapshot);
+		UpdateLogicalMappings(tuplecid_data, relid, snapshot);
 		/* now check but don't update for a mapping again */
 		updated_mapping = true;
 		goto restart;
diff --git a/src/backend/utils/adt/expandedrecord.c b/src/backend/utils/adt/expandedrecord.c
index 9971abd71f..a49cf9b467 100644
--- a/src/backend/utils/adt/expandedrecord.c
+++ b/src/backend/utils/adt/expandedrecord.c
@@ -610,7 +610,6 @@ make_expanded_record_from_datum(Datum recorddatum, MemoryContext parentcontext)
 
 	tmptup.t_len = HeapTupleHeaderGetDatumLength(tuphdr);
 	ItemPointerSetInvalid(&(tmptup.t_self));
-	tmptup.t_tableOid = InvalidOid;
 	tmptup.t_data = tuphdr;
 
 	oldcxt = MemoryContextSwitchTo(objcxt);
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index dd88c09e6d..314777909d 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -3147,7 +3147,6 @@ populate_record(TupleDesc tupdesc,
 		/* Build a temporary HeapTuple control structure */
 		tuple.t_len = HeapTupleHeaderGetDatumLength(defaultval);
 		ItemPointerSetInvalid(&(tuple.t_self));
-		tuple.t_tableOid = InvalidOid;
 		tuple.t_data = defaultval;
 
 		/* Break down the tuple into fields */
@@ -3546,7 +3545,6 @@ populate_recordset_record(PopulateRecordsetState *state, JsObject *obj)
 	/* ok, save into tuplestore */
 	tuple.t_len = HeapTupleHeaderGetDatumLength(tuphead);
 	ItemPointerSetInvalid(&(tuple.t_self));
-	tuple.t_tableOid = InvalidOid;
 	tuple.t_data = tuphead;
 
 	tuplestore_puttuple(state->tuple_store, &tuple);
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index 5bbf568610..7f1adce08a 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -324,7 +324,6 @@ record_out(PG_FUNCTION_ARGS)
 	/* Build a temporary HeapTuple control structure */
 	tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 	ItemPointerSetInvalid(&(tuple.t_self));
-	tuple.t_tableOid = InvalidOid;
 	tuple.t_data = rec;
 
 	/*
@@ -671,7 +670,6 @@ record_send(PG_FUNCTION_ARGS)
 	/* Build a temporary HeapTuple control structure */
 	tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 	ItemPointerSetInvalid(&(tuple.t_self));
-	tuple.t_tableOid = InvalidOid;
 	tuple.t_data = rec;
 
 	/*
@@ -821,11 +819,9 @@ record_cmp(FunctionCallInfo fcinfo)
 	/* Build temporary HeapTuple control structures */
 	tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
 	ItemPointerSetInvalid(&(tuple1.t_self));
-	tuple1.t_tableOid = InvalidOid;
 	tuple1.t_data = record1;
 	tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
 	ItemPointerSetInvalid(&(tuple2.t_self));
-	tuple2.t_tableOid = InvalidOid;
 	tuple2.t_data = record2;
 
 	/*
@@ -1063,11 +1059,9 @@ record_eq(PG_FUNCTION_ARGS)
 	/* Build temporary HeapTuple control structures */
 	tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
 	ItemPointerSetInvalid(&(tuple1.t_self));
-	tuple1.t_tableOid = InvalidOid;
 	tuple1.t_data = record1;
 	tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
 	ItemPointerSetInvalid(&(tuple2.t_self));
-	tuple2.t_tableOid = InvalidOid;
 	tuple2.t_data = record2;
 
 	/*
@@ -1326,11 +1320,9 @@ record_image_cmp(FunctionCallInfo fcinfo)
 	/* Build temporary HeapTuple control structures */
 	tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
 	ItemPointerSetInvalid(&(tuple1.t_self));
-	tuple1.t_tableOid = InvalidOid;
 	tuple1.t_data = record1;
 	tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
 	ItemPointerSetInvalid(&(tuple2.t_self));
-	tuple2.t_tableOid = InvalidOid;
 	tuple2.t_data = record2;
 
 	/*
@@ -1570,11 +1562,9 @@ record_image_eq(PG_FUNCTION_ARGS)
 	/* Build temporary HeapTuple control structures */
 	tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
 	ItemPointerSetInvalid(&(tuple1.t_self));
-	tuple1.t_tableOid = InvalidOid;
 	tuple1.t_data = record1;
 	tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
 	ItemPointerSetInvalid(&(tuple2.t_self));
-	tuple2.t_tableOid = InvalidOid;
 	tuple2.t_data = record2;
 
 	/*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 78dd5714fa..3d93f2b84b 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1832,7 +1832,6 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 								MAXIMUM_ALIGNOF + dtp->t_len);
 		ct->tuple.t_len = dtp->t_len;
 		ct->tuple.t_self = dtp->t_self;
-		ct->tuple.t_tableOid = dtp->t_tableOid;
 		ct->tuple.t_data = (HeapTupleHeader)
 			MAXALIGN(((char *) ct) + sizeof(CatCTup));
 		/* copy tuple contents */
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 60b96df8f9..a3ed15214a 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -3792,11 +3792,11 @@ comparetup_cluster(const SortTuple *a, const SortTuple *b,
 
 		ecxt_scantuple = GetPerTupleExprContext(state->estate)->ecxt_scantuple;
 
-		ExecStoreHeapTuple(ltup, ecxt_scantuple, false);
+		ExecStoreHeapTuple(ltup, ecxt_scantuple, InvalidOid, false);
 		FormIndexDatum(state->indexInfo, ecxt_scantuple, state->estate,
 					   l_index_values, l_index_isnull);
 
-		ExecStoreHeapTuple(rtup, ecxt_scantuple, false);
+		ExecStoreHeapTuple(rtup, ecxt_scantuple, InvalidOid, false);
 		FormIndexDatum(state->indexInfo, ecxt_scantuple, state->estate,
 					   r_index_values, r_index_isnull);
 
@@ -3926,8 +3926,7 @@ readtup_cluster(Tuplesortstate *state, SortTuple *stup,
 	tuple->t_len = t_len;
 	LogicalTapeReadExact(state->tapeset, tapenum,
 						 &tuple->t_self, sizeof(ItemPointerData));
-	/* We don't currently bother to reconstruct t_tableOid */
-	tuple->t_tableOid = InvalidOid;
+
 	/* Read in the tuple body */
 	LogicalTapeReadExact(state->tapeset, tapenum,
 						 tuple->t_data, tuple->t_len);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8401dac483..a290e4f053 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -211,7 +211,7 @@ extern void heap_vacuum_rel(Relation onerel, int options,
 				struct VacuumParams *params, BufferAccessStrategy bstrategy);
 
 /* in heap/heapam_visibility.c */
-extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
+extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Oid relid, Snapshot snapshot,
 										 Buffer buffer);
 extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTuple stup, CommandId curcid,
 						 Buffer buffer);
@@ -231,6 +231,7 @@ struct HTAB;
 extern bool ResolveCminCmaxDuringDecoding(struct HTAB *tuplecid_data,
 							  Snapshot snapshot,
 							  HeapTuple htup,
+							  Oid relid,
 							  Buffer buffer,
 							  CommandId *cmin, CommandId *cmax);
 
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index b0561ebe29..cb09ccb859 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -304,9 +304,8 @@ extern TupleTableSlot *MakeSingleTupleTableSlot(TupleDesc tupdesc,
 extern void ExecDropSingleTupleTableSlot(TupleTableSlot *slot);
 extern void ExecSetSlotDescriptor(TupleTableSlot *slot, TupleDesc tupdesc);
 extern TupleTableSlot *ExecStoreHeapTuple(HeapTuple tuple,
-				   TupleTableSlot *slot,
-				   bool shouldFree);
-extern void ExecForceStoreHeapTuple(HeapTuple tuple, TupleTableSlot *slot);
+				   TupleTableSlot *slot, Oid relid, bool shouldFree);
+extern void ExecForceStoreHeapTuple(HeapTuple tuple, TupleTableSlot *slot, Oid relid);
 extern TupleTableSlot *ExecStoreBufferHeapTuple(HeapTuple tuple,
 						 TupleTableSlot *slot,
 						 Buffer buffer);
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index a5aafa8c09..45ebd5814d 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -7235,7 +7235,6 @@ deconstruct_composite_datum(Datum value, HeapTupleData *tmptup)
 	/* Build a temporary HeapTuple control structure */
 	tmptup->t_len = HeapTupleHeaderGetDatumLength(td);
 	ItemPointerSetInvalid(&(tmptup->t_self));
-	tmptup->t_tableOid = InvalidOid;
 	tmptup->t_data = td;
 
 	/* Extract rowtype info and find a tupdesc */
@@ -7404,7 +7403,6 @@ exec_move_row_from_datum(PLpgSQL_execstate *estate,
 		/* Build a temporary HeapTuple control structure */
 		tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
 		ItemPointerSetInvalid(&(tmptup.t_self));
-		tmptup.t_tableOid = InvalidOid;
 		tmptup.t_data = td;
 
 		/* Extract rowtype info */
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index ad3e803899..9e60e88242 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -528,7 +528,6 @@ make_tuple_indirect(PG_FUNCTION_ARGS)
 	/* Build a temporary HeapTuple control structure */
 	tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 	ItemPointerSetInvalid(&(tuple.t_self));
-	tuple.t_tableOid = InvalidOid;
 	tuple.t_data = rec;
 
 	values = (Datum *) palloc(ncolumns * sizeof(Datum));
-- 
2.20.1.windows.1

#125Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#123)
Re: Pluggable Storage - Andres's take

On Sat, Mar 16, 2019 at 5:43 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Sat, Mar 9, 2019 at 2:13 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

While 0001 is pretty bulky, the interesting bits concentrate on a
comparatively small area. I'd appreciate if somebody could give the
comments added in tableam.h a read (both on callbacks, and their
wrappers, as they have different audiences). It'd make sense to first
read the commit message, to understand the goal (and I'd obviously also
appreciate suggestions for improvements there as well).

I'm pretty happy with the current state of the scan patch. I plan to do
two more passes through it (formatting, comment polishing, etc. I don't
know of any functional changes needed), and then commit it, lest
somebody objects.

I found a couple of typos in the committed patch; the attached patch fixes them.
I am not sure about one of the typos, so please check that one.

I also reviewed the 0002 patch, which is pretty simple and can be
committed.

As you are modifying the 0003 patch for the modify APIs, I went and reviewed
the existing patch and found a couple of corrections that are needed, in case
you have not taken care of them already.

+ /* Update the tuple with table oid */
+ slot->tts_tableOid = RelationGetRelid(relation);
+ if (slot->tts_tableOid != InvalidOid)
+ tuple->t_tableOid = slot->tts_tableOid;

Setting slot->tts_tableOid is not required in this function, and the check
right after the assignment can never fail. The above code is present in both
heapam_heap_insert and heapam_heap_insert_speculative.
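
For illustration, the simplification being suggested here would collapse the
branch into a direct assignment, roughly like this (a sketch of the idea, not
the committed code):

    /* relid of a real relation is always valid, so no check is needed */
    slot->tts_tableOid = RelationGetRelid(relation);
    tuple->t_tableOid = slot->tts_tableOid;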

+ slot->tts_tableOid = RelationGetRelid(relation);

In heapam_heap_update, I don't think there is a need to update
slot->tts_tableOid.

+ default:
+ elog(ERROR, "unrecognized heap_update status: %u", result);

heap_update --> table_update?

+ default:
+ elog(ERROR, "unrecognized heap_delete status: %u", result);

same as above?
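
In other words, the suggestion is just to make the message name the
tableam-level operation; hypothetical wording:

    default:
        elog(ERROR, "unrecognized table_update status: %u", result);

and correspondingly "unrecognized table_delete status" in the delete path.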

+ /*hari FIXME*/
+ /*Assert(result != HeapTupleUpdated && hufd.traversed);*/

The commented-out code in both the ExecDelete and ExecUpdate functions should be removed.

+ /**/
+ if (epqreturnslot)
+ {
+ *epqreturnslot = epqslot;
+ return NULL;
+ }

Was a comment update missed here?

Regards,
Haribabu Kommi
Fujitsu Australia

#126Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Haribabu Kommi (#125)
1 attachment(s)
Re: Pluggable Storage - Andres's take

Hi,

The psql \dA command currently doesn't show the type for access methods of
type 'Table'.

postgres=# \dA heap
List of access methods
Name | Type
------+-------
heap |
(1 row)

Attached is a simple patch that fixes the problem; the output then looks as follows.

postgres=# \dA heap
List of access methods
Name | Type
------+-------
heap | Table
(1 row)

The attached patch directly modifies the query that is sent to the server.
Servers older than version 12 don't have access methods of type 'Table', but
the same query still works there, thanks to the added CASE branch:

SELECT amname AS "Name",
CASE amtype WHEN 'i' THEN 'Index' WHEN 't' THEN 'Table' END AS
"Type"
FROM pg_catalog.pg_am ...

Does anyone feel that this requires a separate query for servers < 12?
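
If a separate query were preferred, describe.c could branch on the server
version the way other \d commands do; a minimal sketch, assuming the existing
printfPQExpBuffer() call and the usual pset.sversion convention:

    if (pset.sversion >= 120000)
        printfPQExpBuffer(&buf,
                          "SELECT amname AS \"%s\",\n"
                          "  CASE amtype"
                          " WHEN 'i' THEN '%s'"
                          " WHEN 't' THEN '%s'"
                          " END AS \"%s\"",
                          gettext_noop("Name"),
                          gettext_noop("Index"),
                          gettext_noop("Table"),
                          gettext_noop("Type"));
    else
        /* pre-12: only index AMs exist, keep the 'i'-only CASE */
        printfPQExpBuffer(&buf,
                          "SELECT amname AS \"%s\",\n"
                          "  CASE amtype"
                          " WHEN 'i' THEN '%s'"
                          " END AS \"%s\"",
                          gettext_noop("Name"),
                          gettext_noop("Index"),
                          gettext_noop("Type"));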

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0001-dA-to-show-Table-type-access-method.patch (application/octet-stream)
From 2d2255f5597c6be2b841b8b11515e1a4dbc5f730 Mon Sep 17 00:00:00 2001
From: Hari Babu <kommi.haribabu@gmail.com>
Date: Thu, 21 Mar 2019 16:06:55 +1100
Subject: [PATCH] \dA to show Table type access method

---
 src/bin/psql/describe.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 779e48437c..cf302241fc 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -169,9 +169,11 @@ describeAccessMethods(const char *pattern, bool verbose)
 					  "SELECT amname AS \"%s\",\n"
 					  "  CASE amtype"
 					  " WHEN 'i' THEN '%s'"
+					  " WHEN 't' THEN '%s'"
 					  " END AS \"%s\"",
 					  gettext_noop("Name"),
 					  gettext_noop("Index"),
+					  gettext_noop("Table"),
 					  gettext_noop("Type"));
 
 	if (verbose)
-- 
2.20.1.windows.1

#127Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#125)
2 attachment(s)
Re: Pluggable Storage - Andres's take

Hi,

Attached is a version of just the first patch. I'm still updating it,
but it's getting closer to commit:

- There were no tests exercising EPQ interactions with DELETE, and only an
accidental test for EPQ in UPDATE with a concurrent DELETE. I've added
tests, and plan to commit those ahead of the big change.

- I was pretty unhappy about how the EPQ integration looked before; I've
changed that now (a sketch of the resulting flow follows the pending-work
list below).

I still wonder if we should restore EvalPlanQualFetch and move the
table_lock_tuple() calls in ExecDelete/Update into it. But it seems
like that wouldn't gain much, because there's custom surrounding code,
and it's not that much code.

- I changed heapam_tuple_lock to return *WouldBlock rather than just
the last result. I think that's one of the reasons Haribabu had
neutered a few asserts.

- I moved comments from heapam.h to tableam.h where appropriate

- I updated the name of HeapUpdateFailureData to TM_FailureData and of
HTSU_Result to TM_Result; TM_Result's members now properly distinguish
updates from deletes instead of lumping both kinds of modification
together (see the sketch after this list).

- I separated the HEAP_INSERT_ flags into TABLE_INSERT_* and HEAP_INSERT_*,
with the latter being a copy of the former plus the sole addition of
_SPECULATIVE (also sketched after this list). table_insert_speculative
callers now don't specify that flag anymore.
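
To make the renames concrete, the result enum ends up shaped roughly like
this (a sketch; treat member names and comments as illustrative rather than
verbatim):

    typedef enum TM_Result
    {
        TM_Ok,              /* operation succeeded / is possible */
        TM_Invisible,       /* tuple is not visible to the snapshot */
        TM_SelfModified,    /* modified by the current command/transaction */
        TM_Updated,         /* concurrently updated */
        TM_Deleted,         /* concurrently deleted */
        TM_BeingModified,   /* modification by another backend in progress */
        TM_WouldBlock       /* lock couldn't be acquired without waiting */
    } TM_Result;

And the flag split could look roughly like this (again a sketch, with
made-up values):

    /* tableam.h */
    #define TABLE_INSERT_SKIP_WAL       0x0001
    #define TABLE_INSERT_SKIP_FSM       0x0002
    #define TABLE_INSERT_FROZEN         0x0004
    #define TABLE_INSERT_NO_LOGICAL     0x0008

    /* heapam.h: reuse the generic flags, add the one heap-only flag */
    #define HEAP_INSERT_SKIP_WAL        TABLE_INSERT_SKIP_WAL
    #define HEAP_INSERT_SKIP_FSM        TABLE_INSERT_SKIP_FSM
    #define HEAP_INSERT_FROZEN          TABLE_INSERT_FROZEN
    #define HEAP_INSERT_NO_LOGICAL      TABLE_INSERT_NO_LOGICAL
    #define HEAP_INSERT_SPECULATIVE     0x0010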

Pending work:
- Wondering if table_insert/delete/update should rather be
table_tuple_insert etc. Would be a bit more consistent with the
callback names, but a bigger departure from existing code.

- I'm not yet happy with the TableTupleDeleted computation in heapam.c;
I want to revise that further

- formatting

- commit message

- a few comments need a bit of polishing (ExecCheckTIDVisible, heapam_tuple_lock)

- Rename TableTupleMayBeModified to TableTupleOk, but also probably do a s/TableTuple/TableMod/

- I'll probably move TUPLE_LOCK_FLAG_LOCK_* into tableam.h

- two more passes through the patch
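
As promised above, here is roughly how the reworked EPQ integration in
ExecDelete looks; a simplified sketch under the renames listed earlier, not
the patch verbatim:

    result = table_lock_tuple(resultRelationDesc, tupleid,
                              estate->es_snapshot, slot,
                              estate->es_output_cid,
                              LockTupleExclusive, LockWaitBlock,
                              TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
                              &tmfd);
    switch (result)
    {
        case TM_Ok:
            if (tmfd.traversed)
            {
                /*
                 * The lock chased the update chain to a newer, visible
                 * version: re-evaluate the plan quals against it via
                 * EvalPlanQual() and only proceed if they still pass.
                 */
            }
            break;

        case TM_SelfModified:
            /* already modified by this command; error or silently skip */
            break;

        case TM_Deleted:
            /* concurrently deleted; nothing left to do */
            break;

        default:
            elog(ERROR, "unexpected table_lock_tuple status: %u", result);
            break;
    }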

On 2019-03-21 15:07:04 +1100, Haribabu Kommi wrote:

As you are modifying the 0003 patch for the modify APIs, I went and reviewed
the existing patch and found a couple of corrections that are needed, in case
you have not taken care of them already.

Some of them I have already taken care of...

+ /* Update the tuple with table oid */
+ slot->tts_tableOid = RelationGetRelid(relation);
+ if (slot->tts_tableOid != InvalidOid)
+ tuple->t_tableOid = slot->tts_tableOid;

Setting slot->tts_tableOid is not required in this function, and the check
right after the assignment can never fail. The above code is present in both
heapam_heap_insert and heapam_heap_insert_speculative.

I'm not following? Those functions are independent?

+ slot->tts_tableOid = RelationGetRelid(relation);

In heapam_heap_update, I don't think there is a need to update
slot->tts_tableOid.

Why?

+ default:
+ elog(ERROR, "unrecognized heap_update status: %u", result);

heap_update --> table_update?

+ default:
+ elog(ERROR, "unrecognized heap_delete status: %u", result);

same as above?

I've fixed that in a number of places.

+ /*hari FIXME*/
+ /*Assert(result != HeapTupleUpdated && hufd.traversed);*/

The commented-out code in both the ExecDelete and ExecUpdate functions should be removed.

I don't think that's the right fix. I've refactored that code
significantly now, and restored the assert in an, IMO, correct version.

+ /**/
+ if (epqreturnslot)
+ {
+ *epqreturnslot = epqslot;
+ return NULL;
+ }

Was a comment update missed here?

Well, you'd deleted a comment around there ;). I've added something back
now...

Greetings,

Andres Freund

Attachments:

v20-0001-Expand-EPQ-tests-for-UPDATEs-and-DELETEs.patch (text/x-diff; charset=us-ascii)
From 3e5458614abf572bd6afc83ce203b4fec8f18363 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 20 Mar 2019 11:44:01 -0700
Subject: [PATCH v20 1/2] Expand EPQ tests for UPDATEs and DELETEs

Previously there was basically no coverage for UPDATEs encountering
deleted rows, and no coverage for DELETE having to perform EPQ. That's
problematic for an upcoming commit in which EPQ is taught to integrate
with tableams.

Author: Andres Freund
---
 .../isolation/expected/eval-plan-qual.out     | 218 +++++++++++++++++-
 src/test/isolation/specs/eval-plan-qual.spec  |  33 ++-
 2 files changed, 241 insertions(+), 10 deletions(-)

diff --git a/src/test/isolation/expected/eval-plan-qual.out b/src/test/isolation/expected/eval-plan-qual.out
index bbbb62ef4b1..bab01e0788a 100644
--- a/src/test/isolation/expected/eval-plan-qual.out
+++ b/src/test/isolation/expected/eval-plan-qual.out
@@ -1,10 +1,16 @@
 Parsed test spec with 3 sessions
 
 starting permutation: wx1 wx2 c1 c2 read
-step wx1: UPDATE accounts SET balance = balance - 200 WHERE accountid = 'checking';
-step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking'; <waiting ...>
+step wx1: UPDATE accounts SET balance = balance - 200 WHERE accountid = 'checking' RETURNING balance;
+balance        
+
+400            
+step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance; <waiting ...>
 step c1: COMMIT;
 step wx2: <... completed>
+balance        
+
+850            
 step c2: COMMIT;
 step read: SELECT * FROM accounts ORDER BY accountid;
 accountid      balance        
@@ -13,10 +19,15 @@ checking       850
 savings        600            
 
 starting permutation: wy1 wy2 c1 c2 read
-step wy1: UPDATE accounts SET balance = balance + 500 WHERE accountid = 'checking';
-step wy2: UPDATE accounts SET balance = balance + 1000 WHERE accountid = 'checking' AND balance < 1000; <waiting ...>
+step wy1: UPDATE accounts SET balance = balance + 500 WHERE accountid = 'checking' RETURNING balance;
+balance        
+
+1100           
+step wy2: UPDATE accounts SET balance = balance + 1000 WHERE accountid = 'checking' AND balance < 1000  RETURNING balance; <waiting ...>
 step c1: COMMIT;
 step wy2: <... completed>
+balance        
+
 step c2: COMMIT;
 step read: SELECT * FROM accounts ORDER BY accountid;
 accountid      balance        
@@ -24,6 +35,195 @@ accountid      balance
 checking       1100           
 savings        600            
 
+starting permutation: wx1 wx2 r1 c2 read
+step wx1: UPDATE accounts SET balance = balance - 200 WHERE accountid = 'checking' RETURNING balance;
+balance        
+
+400            
+step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance; <waiting ...>
+step r1: ROLLBACK;
+step wx2: <... completed>
+balance        
+
+1050           
+step c2: COMMIT;
+step read: SELECT * FROM accounts ORDER BY accountid;
+accountid      balance        
+
+checking       1050           
+savings        600            
+
+starting permutation: wy1 wy2 r1 c2 read
+step wy1: UPDATE accounts SET balance = balance + 500 WHERE accountid = 'checking' RETURNING balance;
+balance        
+
+1100           
+step wy2: UPDATE accounts SET balance = balance + 1000 WHERE accountid = 'checking' AND balance < 1000  RETURNING balance; <waiting ...>
+step r1: ROLLBACK;
+step wy2: <... completed>
+balance        
+
+1600           
+step c2: COMMIT;
+step read: SELECT * FROM accounts ORDER BY accountid;
+accountid      balance        
+
+checking       1600           
+savings        600            
+
+starting permutation: wx1 d1 wx2 c1 c2 read
+step wx1: UPDATE accounts SET balance = balance - 200 WHERE accountid = 'checking' RETURNING balance;
+balance        
+
+400            
+step d1: DELETE FROM accounts WHERE accountid = 'checking' AND balance < 1500 RETURNING balance;
+balance        
+
+400            
+step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance; <waiting ...>
+step c1: COMMIT;
+step wx2: <... completed>
+balance        
+
+step c2: COMMIT;
+step read: SELECT * FROM accounts ORDER BY accountid;
+accountid      balance        
+
+savings        600            
+
+starting permutation: wx2 d1 c2 c1 read
+step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance;
+balance        
+
+1050           
+step d1: DELETE FROM accounts WHERE accountid = 'checking' AND balance < 1500 RETURNING balance; <waiting ...>
+step c2: COMMIT;
+step d1: <... completed>
+balance        
+
+1050           
+step c1: COMMIT;
+step read: SELECT * FROM accounts ORDER BY accountid;
+accountid      balance        
+
+savings        600            
+
+starting permutation: wx2 wx2 d1 c2 c1 read
+step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance;
+balance        
+
+1050           
+step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance;
+balance        
+
+1500           
+step d1: DELETE FROM accounts WHERE accountid = 'checking' AND balance < 1500 RETURNING balance; <waiting ...>
+step c2: COMMIT;
+step d1: <... completed>
+balance        
+
+step c1: COMMIT;
+step read: SELECT * FROM accounts ORDER BY accountid;
+accountid      balance        
+
+checking       1500           
+savings        600            
+
+starting permutation: wx2 d2 d1 c2 c1 read
+step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance;
+balance        
+
+1050           
+step d2: DELETE FROM accounts WHERE accountid = 'checking';
+step d1: DELETE FROM accounts WHERE accountid = 'checking' AND balance < 1500 RETURNING balance; <waiting ...>
+step c2: COMMIT;
+step d1: <... completed>
+balance        
+
+step c1: COMMIT;
+step read: SELECT * FROM accounts ORDER BY accountid;
+accountid      balance        
+
+savings        600            
+
+starting permutation: wx1 d1 wx2 r1 c2 read
+step wx1: UPDATE accounts SET balance = balance - 200 WHERE accountid = 'checking' RETURNING balance;
+balance        
+
+400            
+step d1: DELETE FROM accounts WHERE accountid = 'checking' AND balance < 1500 RETURNING balance;
+balance        
+
+400            
+step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance; <waiting ...>
+step r1: ROLLBACK;
+step wx2: <... completed>
+balance        
+
+1050           
+step c2: COMMIT;
+step read: SELECT * FROM accounts ORDER BY accountid;
+accountid      balance        
+
+checking       1050           
+savings        600            
+
+starting permutation: wx2 d1 r2 c1 read
+step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance;
+balance        
+
+1050           
+step d1: DELETE FROM accounts WHERE accountid = 'checking' AND balance < 1500 RETURNING balance; <waiting ...>
+step r2: ROLLBACK;
+step d1: <... completed>
+balance        
+
+600            
+step c1: COMMIT;
+step read: SELECT * FROM accounts ORDER BY accountid;
+accountid      balance        
+
+savings        600            
+
+starting permutation: wx2 wx2 d1 r2 c1 read
+step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance;
+balance        
+
+1050           
+step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance;
+balance        
+
+1500           
+step d1: DELETE FROM accounts WHERE accountid = 'checking' AND balance < 1500 RETURNING balance; <waiting ...>
+step r2: ROLLBACK;
+step d1: <... completed>
+balance        
+
+600            
+step c1: COMMIT;
+step read: SELECT * FROM accounts ORDER BY accountid;
+accountid      balance        
+
+savings        600            
+
+starting permutation: wx2 d2 d1 r2 c1 read
+step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance;
+balance        
+
+1050           
+step d2: DELETE FROM accounts WHERE accountid = 'checking';
+step d1: DELETE FROM accounts WHERE accountid = 'checking' AND balance < 1500 RETURNING balance; <waiting ...>
+step r2: ROLLBACK;
+step d1: <... completed>
+balance        
+
+600            
+step c1: COMMIT;
+step read: SELECT * FROM accounts ORDER BY accountid;
+accountid      balance        
+
+savings        600            
+
 starting permutation: upsert1 upsert2 c1 c2 read
 step upsert1: 
 	WITH upsert AS
@@ -106,7 +306,10 @@ a              b              c
 step c2: COMMIT;
 
 starting permutation: wx2 partiallock c2 c1 read
-step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking';
+step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance;
+balance        
+
+1050           
 step partiallock: 
 	SELECT * FROM accounts a1, accounts a2
 	  WHERE a1.accountid = a2.accountid
@@ -126,7 +329,10 @@ checking       1050
 savings        600            
 
 starting permutation: wx2 lockwithvalues c2 c1 read
-step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking';
+step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance;
+balance        
+
+1050           
 step lockwithvalues: 
 	SELECT * FROM accounts a1, (values('checking'),('savings')) v(id)
 	  WHERE a1.accountid = v.id
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index 2e1b5095e8e..c22bf65f42f 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -42,9 +42,16 @@ teardown
 session "s1"
 setup		{ BEGIN ISOLATION LEVEL READ COMMITTED; }
 # wx1 then wx2 checks the basic case of re-fetching up-to-date values
-step "wx1"	{ UPDATE accounts SET balance = balance - 200 WHERE accountid = 'checking'; }
+step "wx1"	{ UPDATE accounts SET balance = balance - 200 WHERE accountid = 'checking' RETURNING balance; }
 # wy1 then wy2 checks the case where quals pass then fail
-step "wy1"	{ UPDATE accounts SET balance = balance + 500 WHERE accountid = 'checking'; }
+step "wy1"	{ UPDATE accounts SET balance = balance + 500 WHERE accountid = 'checking' RETURNING balance; }
+
+# d1 then wx1 checks that update can deal with the updated row vanishing
+# wx2 then d1 checks that the delete affects the updated row
+# wx2, wx2 then d1 checks that the delete checks the quals correctly (balance too high)
+# wx2, d2, then d1 checks that delete handles a vanishing row correctly
+step "d1"	{ DELETE FROM accounts WHERE accountid = 'checking' AND balance < 1500 RETURNING balance; }
+
 # upsert tests are to check writable-CTE cases
 step "upsert1"	{
 	WITH upsert AS
@@ -64,6 +71,7 @@ step "readp1"	{ SELECT tableoid::regclass, ctid, * FROM p WHERE b IN (0, 1) AND
 step "writep1"	{ UPDATE p SET b = -1 WHERE a = 1 AND b = 1 AND c = 0; }
 step "writep2"	{ UPDATE p SET b = -b WHERE a = 1 AND c = 0; }
 step "c1"	{ COMMIT; }
+step "r1"	{ ROLLBACK; }
 
 # these tests are meant to exercise EvalPlanQualFetchRowMarks,
 # ie, handling non-locked tables in an EvalPlanQual recheck
@@ -128,8 +136,10 @@ step "selectresultforupdate"	{
 
 session "s2"
 setup		{ BEGIN ISOLATION LEVEL READ COMMITTED; }
-step "wx2"	{ UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking'; }
-step "wy2"	{ UPDATE accounts SET balance = balance + 1000 WHERE accountid = 'checking' AND balance < 1000; }
+step "wx2"	{ UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance; }
+step "wy2"	{ UPDATE accounts SET balance = balance + 1000 WHERE accountid = 'checking' AND balance < 1000  RETURNING balance; }
+step "d2"	{ DELETE FROM accounts WHERE accountid = 'checking'; }
+
 step "upsert2"	{
 	WITH upsert AS
 	  (UPDATE accounts SET balance = balance + 1234
@@ -161,6 +171,7 @@ step "updateforcip3"	{
 step "wrtwcte"	{ UPDATE table_a SET value = 'tableAValue2' WHERE id = 1; }
 step "wrjt"	{ UPDATE jointest SET data = 42 WHERE id = 7; }
 step "c2"	{ COMMIT; }
+step "r2"	{ ROLLBACK; }
 
 session "s3"
 setup		{ BEGIN ISOLATION LEVEL READ COMMITTED; }
@@ -192,8 +203,22 @@ step "multireadwcte"	{
 
 teardown	{ COMMIT; }
 
+# test that normal update follows update chains, and reverifies quals
 permutation "wx1" "wx2" "c1" "c2" "read"
 permutation "wy1" "wy2" "c1" "c2" "read"
+permutation "wx1" "wx2" "r1" "c2" "read"
+permutation "wy1" "wy2" "r1" "c2" "read"
+
+# test that deletes follow chains, and reverify quals
+permutation "wx1" "d1" "wx2" "c1" "c2" "read"
+permutation "wx2" "d1" "c2" "c1" "read"
+permutation "wx2" "wx2" "d1" "c2" "c1" "read"
+permutation "wx2" "d2" "d1" "c2" "c1" "read"
+permutation "wx1" "d1" "wx2" "r1" "c2" "read"
+permutation "wx2" "d1" "r2" "c1" "read"
+permutation "wx2" "wx2" "d1" "r2" "c1" "read"
+permutation "wx2" "d2" "d1" "r2" "c1" "read"
+
 permutation "upsert1" "upsert2" "c1" "c2" "read"
 permutation "readp1" "writep1" "readp2" "c1" "c2"
 permutation "writep2" "returningp1" "c1" "c2"
-- 
2.21.0.dirty

v20-0002-tableam-Add-insert-delete-update-lock_tuple.patch (text/x-diff)
From f72cc54402dbf051c52657e843c9525c1aea972c Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 7 Mar 2019 16:23:34 -0800
Subject: [PATCH v20 2/2] tableam: Add insert, delete, update, lock_tuple.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 contrib/pgrowlocks/pgrowlocks.c               |   6 +-
 src/backend/access/heap/heapam.c              | 395 +++++++---------
 src/backend/access/heap/heapam_handler.c      | 324 +++++++++++++
 src/backend/access/heap/heapam_visibility.c   | 105 +++--
 src/backend/access/heap/tuptoaster.c          |   2 +-
 src/backend/access/table/tableam.c            | 113 +++++
 src/backend/access/table/tableamapi.c         |  13 +
 src/backend/commands/copy.c                   |   3 +-
 src/backend/commands/trigger.c                | 109 ++---
 src/backend/executor/execIndexing.c           |   4 +-
 src/backend/executor/execMain.c               | 289 +-----------
 src/backend/executor/execReplication.c        | 137 +++---
 src/backend/executor/nodeLockRows.c           | 137 ++----
 src/backend/executor/nodeModifyTable.c        | 424 ++++++++++--------
 src/backend/executor/nodeTidscan.c            |   2 +-
 src/include/access/heapam.h                   |  58 +--
 src/include/access/tableam.h                  | 347 ++++++++++++++
 src/include/executor/executor.h               |  12 +-
 src/include/nodes/lockoptions.h               |   5 +
 src/include/utils/snapshot.h                  |  13 -
 .../expected/partition-key-update-1.out       |   2 +-
 src/tools/pgindent/typedefs.list              |   4 +-
 22 files changed, 1465 insertions(+), 1039 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index 2d2a6cf1533..54514309091 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -146,7 +146,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 	/* scan the relation */
 	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
-		HTSU_Result htsu;
+		TM_Result	htsu;
 		TransactionId xmax;
 		uint16		infomask;
 
@@ -160,9 +160,9 @@ pgrowlocks(PG_FUNCTION_ARGS)
 		infomask = tuple->t_data->t_infomask;
 
 		/*
-		 * A tuple is locked if HTSU returns BeingUpdated.
+		 * A tuple is locked if HTSU returns BeingModified.
 		 */
-		if (htsu == HeapTupleBeingUpdated)
+		if (htsu == TableTupleBeingModified)
 		{
 			char	  **values;
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3c8a5da0bc8..698faa4a83e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -86,7 +86,7 @@ static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 						  LockTupleMode mode, bool is_update,
 						  TransactionId *result_xmax, uint16 *result_infomask,
 						  uint16 *result_infomask2);
-static HTSU_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple,
+static TM_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple,
 						ItemPointer ctid, TransactionId xid,
 						LockTupleMode mode);
 static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
@@ -1389,7 +1389,6 @@ heap_fetch(Relation relation,
 		   Snapshot snapshot,
 		   HeapTuple tuple,
 		   Buffer *userbuf,
-		   bool keep_buf,
 		   Relation stats_relation)
 {
 	ItemPointer tid = &(tuple->t_self);
@@ -1419,13 +1418,8 @@ heap_fetch(Relation relation,
 	if (offnum < FirstOffsetNumber || offnum > PageGetMaxOffsetNumber(page))
 	{
 		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-		if (keep_buf)
-			*userbuf = buffer;
-		else
-		{
-			ReleaseBuffer(buffer);
-			*userbuf = InvalidBuffer;
-		}
+		ReleaseBuffer(buffer);
+		*userbuf = InvalidBuffer;
 		tuple->t_data = NULL;
 		return false;
 	}
@@ -1441,13 +1435,8 @@ heap_fetch(Relation relation,
 	if (!ItemIdIsNormal(lp))
 	{
 		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-		if (keep_buf)
-			*userbuf = buffer;
-		else
-		{
-			ReleaseBuffer(buffer);
-			*userbuf = InvalidBuffer;
-		}
+		ReleaseBuffer(buffer);
+		*userbuf = InvalidBuffer;
 		tuple->t_data = NULL;
 		return false;
 	}
@@ -1486,14 +1475,9 @@ heap_fetch(Relation relation,
 		return true;
 	}
 
-	/* Tuple failed time qual, but maybe caller wants to see it anyway. */
-	if (keep_buf)
-		*userbuf = buffer;
-	else
-	{
-		ReleaseBuffer(buffer);
-		*userbuf = InvalidBuffer;
-	}
+	/* Tuple failed time qual */
+	ReleaseBuffer(buffer);
+	*userbuf = InvalidBuffer;
 
 	return false;
 }
@@ -1886,40 +1870,12 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
  * The new tuple is stamped with current transaction ID and the specified
  * command ID.
  *
- * If the HEAP_INSERT_SKIP_WAL option is specified, the new tuple is not
- * logged in WAL, even for a non-temp relation.  Safe usage of this behavior
- * requires that we arrange that all new tuples go into new pages not
- * containing any tuples from other transactions, and that the relation gets
- * fsync'd before commit.  (See also heap_sync() comments)
+ * See table_insert for comments about most of the input flags, except that
+ * this routine directly takes a tuple rather than a slot.
  *
- * The HEAP_INSERT_SKIP_FSM option is passed directly to
- * RelationGetBufferForTuple, which see for more info.
- *
- * HEAP_INSERT_FROZEN should only be specified for inserts into
- * relfilenodes created during the current subtransaction and when
- * there are no prior snapshots or pre-existing portals open.
- * This causes rows to be frozen, which is an MVCC violation and
- * requires explicit options chosen by user.
- *
- * HEAP_INSERT_SPECULATIVE is used on so-called "speculative insertions",
- * which can be backed out afterwards without aborting the whole transaction.
- * Other sessions can wait for the speculative insertion to be confirmed,
- * turning it into a regular tuple, or aborted, as if it never existed.
- * Speculatively inserted tuples behave as "value locks" of short duration,
- * used to implement INSERT .. ON CONFLICT.
- *
- * HEAP_INSERT_NO_LOGICAL force-disables the emitting of logical decoding
- * information for the tuple. This should solely be used during table rewrites
- * where RelationIsLogicallyLogged(relation) is not yet accurate for the new
- * relation.
- *
- * Note that most of these options will be applied when inserting into the
- * heap's TOAST table, too, if the tuple requires any out-of-line data.  Only
- * HEAP_INSERT_SPECULATIVE is explicitly ignored, as the toast data does not
- * partake in speculative insertion.
- *
- * The BulkInsertState object (if any; bistate can be NULL for default
- * behavior) is also just passed through to RelationGetBufferForTuple.
+ * There are corresponding HEAP_INSERT_ options for all the TABLE_INSERT_
+ * options, and additionally there is HEAP_INSERT_SPECULATIVE, which is used
+ * to implement table_insert_speculative().
  *
  * On return the header fields of *tup are updated to match the stored tuple;
  * in particular tup->t_self receives the actual TID where the tuple was
@@ -2489,36 +2445,20 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
 /*
  *	heap_delete - delete a tuple
  *
- * NB: do not call this directly unless you are prepared to deal with
- * concurrent-update conditions.  Use simple_heap_delete instead.
+ * See table_delete() for an explanation of the parameters.
  *
- *	relation - table to be modified (caller must hold suitable lock)
- *	tid - TID of tuple to be deleted
- *	cid - delete command ID (used for visibility test, and stored into
- *		cmax if successful)
- *	crosscheck - if not InvalidSnapshot, also check tuple against this
- *	wait - true if should wait for any conflicting update to commit/abort
- *	hufd - output parameter, filled in failure cases (see below)
- *	changingPart - true iff the tuple is being moved to another partition
- *		table due to an update of the partition key. Otherwise, false.
  *
- * Normal, successful return value is HeapTupleMayBeUpdated, which
- * actually means we did delete it.  Failure return codes are
- * HeapTupleSelfUpdated, HeapTupleUpdated, or HeapTupleBeingUpdated
- * (the last only possible if wait == false).
- *
- * In the failure cases, the routine fills *hufd with the tuple's t_ctid,
+ * In the failure cases, the routine fills *tmfd with the tuple's t_ctid,
  * t_xmax (resolving a possible MultiXact, if necessary), and t_cmax
- * (the last only for HeapTupleSelfUpdated, since we
+ * (the last only for TableTupleSelfModified, since we
  * cannot obtain cmax from a combocid generated by another transaction).
- * See comments for struct HeapUpdateFailureData for additional info.
  */
-HTSU_Result
+TM_Result
 heap_delete(Relation relation, ItemPointer tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			HeapUpdateFailureData *hufd, bool changingPart)
+			TM_FailureData *tmfd, bool changingPart)
 {
-	HTSU_Result result;
+	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
 	ItemId		lp;
 	HeapTupleData tp;
@@ -2586,14 +2526,14 @@ heap_delete(Relation relation, ItemPointer tid,
 l1:
 	result = HeapTupleSatisfiesUpdate(&tp, cid, buffer);
 
-	if (result == HeapTupleInvisible)
+	if (result == TableTupleInvisible)
 	{
 		UnlockReleaseBuffer(buffer);
 		ereport(ERROR,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("attempted to delete invisible tuple")));
 	}
-	else if (result == HeapTupleBeingUpdated && wait)
+	else if (result == TableTupleBeingModified && wait)
 	{
 		TransactionId xwait;
 		uint16		infomask;
@@ -2687,35 +2627,38 @@ l1:
 		if ((tp.t_data->t_infomask & HEAP_XMAX_INVALID) ||
 			HEAP_XMAX_IS_LOCKED_ONLY(tp.t_data->t_infomask) ||
 			HeapTupleHeaderIsOnlyLocked(tp.t_data))
-			result = HeapTupleMayBeUpdated;
+			result = TableTupleMayBeModified;
 		else
-			result = HeapTupleUpdated;
+			result = TableTupleUpdated;
 	}
 
-	if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated)
+	if (crosscheck != InvalidSnapshot && result == TableTupleMayBeModified)
 	{
 		/* Perform additional check for transaction-snapshot mode RI updates */
 		if (!HeapTupleSatisfiesVisibility(&tp, crosscheck, buffer))
-			result = HeapTupleUpdated;
+			result = TableTupleUpdated;
 	}
 
-	if (result != HeapTupleMayBeUpdated)
+	if (result != TableTupleMayBeModified)
 	{
-		Assert(result == HeapTupleSelfUpdated ||
-			   result == HeapTupleUpdated ||
-			   result == HeapTupleBeingUpdated);
+		Assert(result == TableTupleSelfModified ||
+			   result == TableTupleUpdated ||
+			   result == TableTupleDeleted ||
+			   result == TableTupleBeingModified);
 		Assert(!(tp.t_data->t_infomask & HEAP_XMAX_INVALID));
-		hufd->ctid = tp.t_data->t_ctid;
-		hufd->xmax = HeapTupleHeaderGetUpdateXid(tp.t_data);
-		if (result == HeapTupleSelfUpdated)
-			hufd->cmax = HeapTupleHeaderGetCmax(tp.t_data);
+		tmfd->ctid = tp.t_data->t_ctid;
+		tmfd->xmax = HeapTupleHeaderGetUpdateXid(tp.t_data);
+		if (result == TableTupleSelfModified)
+			tmfd->cmax = HeapTupleHeaderGetCmax(tp.t_data);
 		else
-			hufd->cmax = InvalidCommandId;
+			tmfd->cmax = InvalidCommandId;
 		UnlockReleaseBuffer(buffer);
 		if (have_tuple_lock)
 			UnlockTupleTuplock(relation, &(tp.t_self), LockTupleExclusive);
 		if (vmbuffer != InvalidBuffer)
 			ReleaseBuffer(vmbuffer);
+		if (result == TableTupleUpdated && ItemPointerEquals(tid, &tmfd->ctid))
+			result = TableTupleDeleted;
 		return result;
 	}
 
@@ -2896,7 +2839,7 @@ l1:
 	if (old_key_tuple != NULL && old_key_copied)
 		heap_freetuple(old_key_tuple);
 
-	return HeapTupleMayBeUpdated;
+	return TableTupleMayBeModified;
 }
 
 /*
@@ -2910,28 +2853,32 @@ l1:
 void
 simple_heap_delete(Relation relation, ItemPointer tid)
 {
-	HTSU_Result result;
-	HeapUpdateFailureData hufd;
+	TM_Result	result;
+	TM_FailureData tmfd;
 
 	result = heap_delete(relation, tid,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &hufd, false /* changingPart */ );
+						 &tmfd, false /* changingPart */ );
 	switch (result)
 	{
-		case HeapTupleSelfUpdated:
+		case TableTupleSelfModified:
 			/* Tuple was already updated in current command? */
 			elog(ERROR, "tuple already updated by self");
 			break;
 
-		case HeapTupleMayBeUpdated:
+		case TableTupleMayBeModified:
 			/* done successfully */
 			break;
 
-		case HeapTupleUpdated:
+		case TableTupleUpdated:
 			elog(ERROR, "tuple concurrently updated");
 			break;
 
+		case TableTupleDeleted:
+			elog(ERROR, "tuple concurrently deleted");
+			break;
+
 		default:
 			elog(ERROR, "unrecognized heap_delete status: %u", result);
 			break;
@@ -2941,42 +2888,19 @@ simple_heap_delete(Relation relation, ItemPointer tid)
 /*
  *	heap_update - replace a tuple
  *
- * NB: do not call this directly unless you are prepared to deal with
- * concurrent-update conditions.  Use simple_heap_update instead.
+ * See table_update() for an explanation of the parameters.
  *
- *	relation - table to be modified (caller must hold suitable lock)
- *	otid - TID of old tuple to be replaced
- *	newtup - newly constructed tuple data to store
- *	cid - update command ID (used for visibility test, and stored into
- *		cmax/cmin if successful)
- *	crosscheck - if not InvalidSnapshot, also check old tuple against this
- *	wait - true if should wait for any conflicting update to commit/abort
- *	hufd - output parameter, filled in failure cases (see below)
- *	lockmode - output parameter, filled with lock mode acquired on tuple
- *
- * Normal, successful return value is HeapTupleMayBeUpdated, which
- * actually means we *did* update it.  Failure return codes are
- * HeapTupleSelfUpdated, HeapTupleUpdated, or HeapTupleBeingUpdated
- * (the last only possible if wait == false).
- *
- * On success, the header fields of *newtup are updated to match the new
- * stored tuple; in particular, newtup->t_self is set to the TID where the
- * new tuple was inserted, and its HEAP_ONLY_TUPLE flag is set iff a HOT
- * update was done.  However, any TOAST changes in the new tuple's
- * data are not reflected into *newtup.
- *
- * In the failure cases, the routine fills *hufd with the tuple's t_ctid,
- * t_xmax (resolving a possible MultiXact, if necessary), and t_cmax
- * (the last only for HeapTupleSelfUpdated, since we
- * cannot obtain cmax from a combocid generated by another transaction).
- * See comments for struct HeapUpdateFailureData for additional info.
+ * In the failure cases, the routine fills *tmfd with the tuple's t_ctid,
+ * t_xmax (resolving a possible MultiXact, if necessary), and t_cmax (the last
+ * only for TableTupleSelfModified, since we cannot obtain cmax from a
+ * combocid generated by another transaction).
  */
-HTSU_Result
+TM_Result
 heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			HeapUpdateFailureData *hufd, LockTupleMode *lockmode)
+			TM_FailureData *tmfd, LockTupleMode *lockmode)
 {
-	HTSU_Result result;
+	TM_Result	result;
 	TransactionId xid = GetCurrentTransactionId();
 	Bitmapset  *hot_attrs;
 	Bitmapset  *key_attrs;
@@ -3150,16 +3074,16 @@ l2:
 	result = HeapTupleSatisfiesUpdate(&oldtup, cid, buffer);
 
 	/* see below about the "no wait" case */
-	Assert(result != HeapTupleBeingUpdated || wait);
+	Assert(result != TableTupleBeingModified || wait);
 
-	if (result == HeapTupleInvisible)
+	if (result == TableTupleInvisible)
 	{
 		UnlockReleaseBuffer(buffer);
 		ereport(ERROR,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("attempted to update invisible tuple")));
 	}
-	else if (result == HeapTupleBeingUpdated && wait)
+	else if (result == TableTupleBeingModified && wait)
 	{
 		TransactionId xwait;
 		uint16		infomask;
@@ -3250,7 +3174,7 @@ l2:
 			 * MultiXact. In that case, we need to check whether it committed
 			 * or aborted. If it aborted we are safe to update it again;
 			 * otherwise there is an update conflict, and we have to return
-			 * HeapTupleUpdated below.
+			 * TableTupleUpdated below.
 			 *
 			 * In the LockTupleExclusive case, we still need to preserve the
 			 * surviving members: those would include the tuple locks we had
@@ -3322,28 +3246,29 @@ l2:
 				can_continue = true;
 		}
 
-		result = can_continue ? HeapTupleMayBeUpdated : HeapTupleUpdated;
+		result = can_continue ? TableTupleMayBeModified : TableTupleUpdated;
 	}
 
-	if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated)
+	if (crosscheck != InvalidSnapshot && result == TableTupleMayBeModified)
 	{
 		/* Perform additional check for transaction-snapshot mode RI updates */
 		if (!HeapTupleSatisfiesVisibility(&oldtup, crosscheck, buffer))
-			result = HeapTupleUpdated;
+			result = TableTupleUpdated;
 	}
 
-	if (result != HeapTupleMayBeUpdated)
+	if (result != TableTupleMayBeModified)
 	{
-		Assert(result == HeapTupleSelfUpdated ||
-			   result == HeapTupleUpdated ||
-			   result == HeapTupleBeingUpdated);
+		Assert(result == TableTupleSelfModified ||
+			   result == TableTupleUpdated ||
+			   result == TableTupleDeleted ||
+			   result == TableTupleBeingModified);
 		Assert(!(oldtup.t_data->t_infomask & HEAP_XMAX_INVALID));
-		hufd->ctid = oldtup.t_data->t_ctid;
-		hufd->xmax = HeapTupleHeaderGetUpdateXid(oldtup.t_data);
-		if (result == HeapTupleSelfUpdated)
-			hufd->cmax = HeapTupleHeaderGetCmax(oldtup.t_data);
+		tmfd->ctid = oldtup.t_data->t_ctid;
+		tmfd->xmax = HeapTupleHeaderGetUpdateXid(oldtup.t_data);
+		if (result == TableTupleSelfModified)
+			tmfd->cmax = HeapTupleHeaderGetCmax(oldtup.t_data);
 		else
-			hufd->cmax = InvalidCommandId;
+			tmfd->cmax = InvalidCommandId;
 		UnlockReleaseBuffer(buffer);
 		if (have_tuple_lock)
 			UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
@@ -3354,6 +3279,9 @@ l2:
 		bms_free(id_attrs);
 		bms_free(modified_attrs);
 		bms_free(interesting_attrs);
+		// FIXME, this needs to be implemented above
+		if (result == TableTupleUpdated && ItemPointerEquals(otid, &tmfd->ctid))
+			result = TableTupleDeleted;
 		return result;
 	}
 
@@ -3828,7 +3756,7 @@ l2:
 	bms_free(modified_attrs);
 	bms_free(interesting_attrs);
 
-	return HeapTupleMayBeUpdated;
+	return TableTupleMayBeModified;
 }
 
 /*
@@ -3948,29 +3876,33 @@ HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
 void
 simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup)
 {
-	HTSU_Result result;
-	HeapUpdateFailureData hufd;
+	TM_Result	result;
+	TM_FailureData tmfd;
 	LockTupleMode lockmode;
 
 	result = heap_update(relation, otid, tup,
 						 GetCurrentCommandId(true), InvalidSnapshot,
 						 true /* wait for commit */ ,
-						 &hufd, &lockmode);
+						 &tmfd, &lockmode);
 	switch (result)
 	{
-		case HeapTupleSelfUpdated:
+		case TableTupleSelfModified:
 			/* Tuple was already updated in current command? */
 			elog(ERROR, "tuple already updated by self");
 			break;
 
-		case HeapTupleMayBeUpdated:
+		case TableTupleMayBeModified:
 			/* done successfully */
 			break;
 
-		case HeapTupleUpdated:
+		case TableTupleUpdated:
 			elog(ERROR, "tuple concurrently updated");
 			break;
 
+		case TableTupleDeleted:
+			elog(ERROR, "tuple concurrently deleted");
+			break;
+
 		default:
 			elog(ERROR, "unrecognized heap_update status: %u", result);
 			break;
@@ -4005,7 +3937,7 @@ get_mxact_status_for_lock(LockTupleMode mode, bool is_update)
  *
  * Input parameters:
  *	relation: relation containing tuple (caller must hold suitable lock)
- *	tuple->t_self: TID of tuple to lock (rest of struct need not be valid)
+ *	tid: TID of tuple to lock
  *	cid: current command ID (used for visibility test, and stored into
  *		tuple's cmax if lock is successful)
  *	mode: indicates if shared or exclusive tuple lock is desired
@@ -4016,32 +3948,26 @@ get_mxact_status_for_lock(LockTupleMode mode, bool is_update)
  * Output parameters:
  *	*tuple: all fields filled in
  *	*buffer: set to buffer holding tuple (pinned but not locked at exit)
- *	*hufd: filled in failure cases (see below)
+ *	*tmfd: filled in failure cases (see below)
  *
- * Function result may be:
- *	HeapTupleMayBeUpdated: lock was successfully acquired
- *	HeapTupleInvisible: lock failed because tuple was never visible to us
- *	HeapTupleSelfUpdated: lock failed because tuple updated by self
- *	HeapTupleUpdated: lock failed because tuple updated by other xact
- *	HeapTupleWouldBlock: lock couldn't be acquired and wait_policy is skip
+ * Function results are the same as table_lock_tuple().
  *
- * In the failure cases other than HeapTupleInvisible, the routine fills
- * *hufd with the tuple's t_ctid, t_xmax (resolving a possible MultiXact,
- * if necessary), and t_cmax (the last only for HeapTupleSelfUpdated,
+ * In the failure cases other than TableTupleInvisible, the routine fills
+ * *tmfd with the tuple's t_ctid, t_xmax (resolving a possible MultiXact,
+ * if necessary), and t_cmax (the last only for TableTupleSelfModified,
  * since we cannot obtain cmax from a combocid generated by another
  * transaction).
- * See comments for struct HeapUpdateFailureData for additional info.
+ * See comments for struct TM_FailureData for additional info.
  *
  * See README.tuplock for a thorough explanation of this mechanism.
  */
-HTSU_Result
-heap_lock_tuple(Relation relation, HeapTuple tuple,
+TM_Result
+heap_lock_tuple(Relation relation, ItemPointer tid,
 				CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
 				bool follow_updates,
-				Buffer *buffer, HeapUpdateFailureData *hufd)
+				HeapTuple tuple, Buffer *buffer, TM_FailureData *tmfd)
 {
-	HTSU_Result result;
-	ItemPointer tid = &(tuple->t_self);
+	TM_Result	result;
 	ItemId		lp;
 	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
@@ -4076,11 +4002,12 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
 	tuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
 	tuple->t_len = ItemIdGetLength(lp);
 	tuple->t_tableOid = RelationGetRelid(relation);
+	tuple->t_self = *tid;
 
 l3:
 	result = HeapTupleSatisfiesUpdate(tuple, cid, *buffer);
 
-	if (result == HeapTupleInvisible)
+	if (result == TableTupleInvisible)
 	{
 		/*
 		 * This is possible, but only when locking a tuple for ON CONFLICT
@@ -4088,10 +4015,10 @@ l3:
 		 * order to give that case the opportunity to throw a more specific
 		 * error.
 		 */
-		result = HeapTupleInvisible;
+		result = TableTupleInvisible;
 		goto out_locked;
 	}
-	else if (result == HeapTupleBeingUpdated || result == HeapTupleUpdated)
+	else if (result == TableTupleBeingModified || result == TableTupleUpdated || result == TableTupleDeleted)
 	{
 		TransactionId xwait;
 		uint16		infomask;
@@ -4147,7 +4074,7 @@ l3:
 					if (TUPLOCK_from_mxstatus(members[i].status) >= mode)
 					{
 						pfree(members);
-						result = HeapTupleMayBeUpdated;
+						result = TableTupleMayBeModified;
 						goto out_unlocked;
 					}
 				}
@@ -4163,20 +4090,20 @@ l3:
 						Assert(HEAP_XMAX_IS_KEYSHR_LOCKED(infomask) ||
 							   HEAP_XMAX_IS_SHR_LOCKED(infomask) ||
 							   HEAP_XMAX_IS_EXCL_LOCKED(infomask));
-						result = HeapTupleMayBeUpdated;
+						result = TableTupleMayBeModified;
 						goto out_unlocked;
 					case LockTupleShare:
 						if (HEAP_XMAX_IS_SHR_LOCKED(infomask) ||
 							HEAP_XMAX_IS_EXCL_LOCKED(infomask))
 						{
-							result = HeapTupleMayBeUpdated;
+							result = TableTupleMayBeModified;
 							goto out_unlocked;
 						}
 						break;
 					case LockTupleNoKeyExclusive:
 						if (HEAP_XMAX_IS_EXCL_LOCKED(infomask))
 						{
-							result = HeapTupleMayBeUpdated;
+							result = TableTupleMayBeModified;
 							goto out_unlocked;
 						}
 						break;
@@ -4184,7 +4111,7 @@ l3:
 						if (HEAP_XMAX_IS_EXCL_LOCKED(infomask) &&
 							infomask2 & HEAP_KEYS_UPDATED)
 						{
-							result = HeapTupleMayBeUpdated;
+							result = TableTupleMayBeModified;
 							goto out_unlocked;
 						}
 						break;
@@ -4233,12 +4160,12 @@ l3:
 				 */
 				if (follow_updates && updated)
 				{
-					HTSU_Result res;
+					TM_Result	res;
 
 					res = heap_lock_updated_tuple(relation, tuple, &t_ctid,
 												  GetCurrentTransactionId(),
 												  mode);
-					if (res != HeapTupleMayBeUpdated)
+					if (res != TableTupleMayBeModified)
 					{
 						result = res;
 						/* recovery code expects to have buffer lock held */
@@ -4371,7 +4298,7 @@ l3:
 		 * or we must wait for the locking transaction or multixact; so below
 		 * we ensure that we grab buffer lock after the sleep.
 		 */
-		if (require_sleep && result == HeapTupleUpdated)
+		if (require_sleep && (result == TableTupleUpdated || result == TableTupleDeleted))
 		{
 			LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
 			goto failed;
@@ -4394,7 +4321,7 @@ l3:
 				 * This can only happen if wait_policy is Skip and the lock
 				 * couldn't be obtained.
 				 */
-				result = HeapTupleWouldBlock;
+				result = TableTupleWouldBlock;
 				/* recovery code expects to have buffer lock held */
 				LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
 				goto failed;
@@ -4420,7 +4347,7 @@ l3:
 														status, infomask, relation,
 														NULL))
 						{
-							result = HeapTupleWouldBlock;
+							result = TableTupleWouldBlock;
 							/* recovery code expects to have buffer lock held */
 							LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
 							goto failed;
@@ -4460,7 +4387,7 @@ l3:
 					case LockWaitSkip:
 						if (!ConditionalXactLockTableWait(xwait))
 						{
-							result = HeapTupleWouldBlock;
+							result = TableTupleWouldBlock;
 							/* recovery code expects to have buffer lock held */
 							LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
 							goto failed;
@@ -4479,12 +4406,12 @@ l3:
 			/* if there are updates, follow the update chain */
 			if (follow_updates && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
 			{
-				HTSU_Result res;
+				TM_Result	res;
 
 				res = heap_lock_updated_tuple(relation, tuple, &t_ctid,
 											  GetCurrentTransactionId(),
 											  mode);
-				if (res != HeapTupleMayBeUpdated)
+				if (res != TableTupleMayBeModified)
 				{
 					result = res;
 					/* recovery code expects to have buffer lock held */
@@ -4530,23 +4457,25 @@ l3:
 			(tuple->t_data->t_infomask & HEAP_XMAX_INVALID) ||
 			HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_data->t_infomask) ||
 			HeapTupleHeaderIsOnlyLocked(tuple->t_data))
-			result = HeapTupleMayBeUpdated;
+			result = TableTupleMayBeModified;
+		else if (ItemPointerEquals(&tuple->t_self, &tuple->t_data->t_ctid))
+			result = TableTupleDeleted;
 		else
-			result = HeapTupleUpdated;
+			result = TableTupleUpdated;
 	}
 
 failed:
-	if (result != HeapTupleMayBeUpdated)
+	if (result != TableTupleMayBeModified)
 	{
-		Assert(result == HeapTupleSelfUpdated || result == HeapTupleUpdated ||
-			   result == HeapTupleWouldBlock);
+		Assert(result == TableTupleSelfModified || result == TableTupleUpdated ||
+			   result == TableTupleWouldBlock || result == TableTupleDeleted);
 		Assert(!(tuple->t_data->t_infomask & HEAP_XMAX_INVALID));
-		hufd->ctid = tuple->t_data->t_ctid;
-		hufd->xmax = HeapTupleHeaderGetUpdateXid(tuple->t_data);
-		if (result == HeapTupleSelfUpdated)
-			hufd->cmax = HeapTupleHeaderGetCmax(tuple->t_data);
+		tmfd->ctid = tuple->t_data->t_ctid;
+		tmfd->xmax = HeapTupleHeaderGetUpdateXid(tuple->t_data);
+		if (result == TableTupleSelfModified)
+			tmfd->cmax = HeapTupleHeaderGetCmax(tuple->t_data);
 		else
-			hufd->cmax = InvalidCommandId;
+			tmfd->cmax = InvalidCommandId;
 		goto out_locked;
 	}
 
@@ -4664,7 +4593,7 @@ failed:
 
 	END_CRIT_SECTION();
 
-	result = HeapTupleMayBeUpdated;
+	result = TableTupleMayBeModified;
 
 out_locked:
 	LockBuffer(*buffer, BUFFER_LOCK_UNLOCK);
@@ -5022,8 +4951,8 @@ l5:
  * with the given xid, does the current transaction need to wait, fail, or can
  * it continue if it wanted to acquire a lock of the given mode?  "needwait"
  * is set to true if waiting is necessary; if it can continue, then
- * HeapTupleMayBeUpdated is returned.  If the lock is already held by the
- * current transaction, return HeapTupleSelfUpdated.  In case of a conflict
+ * TableTupleMayBeModified is returned.  If the lock is already held by the
+ * current transaction, return TableTupleSelfModified.  In case of a conflict
  * with another transaction, a different HeapTupleSatisfiesUpdate return code
  * is returned.
  *
@@ -5031,7 +4960,7 @@ l5:
  * lock held by a single Xid, i.e. not a real MultiXactId; we express it this
  * way for simplicity of API.
  */
-static HTSU_Result
+static TM_Result
 test_lockmode_for_conflict(MultiXactStatus status, TransactionId xid,
 						   LockTupleMode mode, bool *needwait)
 {
@@ -5052,7 +4981,7 @@ test_lockmode_for_conflict(MultiXactStatus status, TransactionId xid,
 		 * very rare but can happen if multiple transactions are trying to
 		 * lock an ancient version of the same tuple.
 		 */
-		return HeapTupleSelfUpdated;
+		return TableTupleSelfModified;
 	}
 	else if (TransactionIdIsInProgress(xid))
 	{
@@ -5072,10 +5001,10 @@ test_lockmode_for_conflict(MultiXactStatus status, TransactionId xid,
 		 * If we set needwait above, then this value doesn't matter;
 		 * otherwise, this value signals to caller that it's okay to proceed.
 		 */
-		return HeapTupleMayBeUpdated;
+		return TableTupleMayBeModified;
 	}
 	else if (TransactionIdDidAbort(xid))
-		return HeapTupleMayBeUpdated;
+		return TableTupleMayBeModified;
 	else if (TransactionIdDidCommit(xid))
 	{
 		/*
@@ -5094,18 +5023,18 @@ test_lockmode_for_conflict(MultiXactStatus status, TransactionId xid,
 		 * always be checked.
 		 */
 		if (!ISUPDATE_from_mxstatus(status))
-			return HeapTupleMayBeUpdated;
+			return TableTupleMayBeModified;
 
 		if (DoLockModesConflict(LOCKMODE_from_mxstatus(status),
 								LOCKMODE_from_mxstatus(wantedstatus)))
 			/* bummer */
-			return HeapTupleUpdated;
+			return TableTupleUpdated;
 
-		return HeapTupleMayBeUpdated;
+		return TableTupleMayBeModified;
 	}
 
 	/* Not in progress, not aborted, not committed -- must have crashed */
-	return HeapTupleMayBeUpdated;
+	return TableTupleMayBeModified;
 }
 
 
@@ -5116,11 +5045,11 @@ test_lockmode_for_conflict(MultiXactStatus status, TransactionId xid,
  * xid with the given mode; if this tuple is updated, recurse to lock the new
  * version as well.
  */
-static HTSU_Result
+static TM_Result
 heap_lock_updated_tuple_rec(Relation rel, ItemPointer tid, TransactionId xid,
 							LockTupleMode mode)
 {
-	HTSU_Result result;
+	TM_Result	result;
 	ItemPointerData tupid;
 	HeapTupleData mytup;
 	Buffer		buf;
@@ -5145,7 +5074,7 @@ heap_lock_updated_tuple_rec(Relation rel, ItemPointer tid, TransactionId xid,
 		block = ItemPointerGetBlockNumber(&tupid);
 		ItemPointerCopy(&tupid, &(mytup.t_self));
 
-		if (!heap_fetch(rel, SnapshotAny, &mytup, &buf, false, NULL))
+		if (!heap_fetch(rel, SnapshotAny, &mytup, &buf, NULL))
 		{
 			/*
 			 * if we fail to find the updated version of the tuple, it's
@@ -5154,7 +5083,7 @@ heap_lock_updated_tuple_rec(Relation rel, ItemPointer tid, TransactionId xid,
 			 * chain, and there's no further tuple to lock: return success to
 			 * caller.
 			 */
-			result = HeapTupleMayBeUpdated;
+			result = TableTupleMayBeModified;
 			goto out_unlocked;
 		}
 
@@ -5203,7 +5132,7 @@ l4:
 			!TransactionIdEquals(HeapTupleHeaderGetXmin(mytup.t_data),
 								 priorXmax))
 		{
-			result = HeapTupleMayBeUpdated;
+			result = TableTupleMayBeModified;
 			goto out_locked;
 		}
 
@@ -5214,7 +5143,7 @@ l4:
 		 */
 		if (TransactionIdDidAbort(HeapTupleHeaderGetXmin(mytup.t_data)))
 		{
-			result = HeapTupleMayBeUpdated;
+			result = TableTupleMayBeModified;
 			goto out_locked;
 		}
 
@@ -5269,7 +5198,7 @@ l4:
 					 * this tuple and continue locking the next version in the
 					 * update chain.
 					 */
-					if (result == HeapTupleSelfUpdated)
+					if (result == TableTupleSelfModified)
 					{
 						pfree(members);
 						goto next;
@@ -5284,7 +5213,7 @@ l4:
 						pfree(members);
 						goto l4;
 					}
-					if (result != HeapTupleMayBeUpdated)
+					if (result != TableTupleMayBeModified)
 					{
 						pfree(members);
 						goto out_locked;
@@ -5345,7 +5274,7 @@ l4:
 				 * either.  We just need to skip this tuple and continue
 				 * locking the next version in the update chain.
 				 */
-				if (result == HeapTupleSelfUpdated)
+				if (result == TableTupleSelfModified)
 					goto next;
 
 				if (needwait)
@@ -5355,7 +5284,7 @@ l4:
 									  XLTW_LockUpdated);
 					goto l4;
 				}
-				if (result != HeapTupleMayBeUpdated)
+				if (result != TableTupleMayBeModified)
 				{
 					goto out_locked;
 				}
@@ -5415,7 +5344,7 @@ next:
 			ItemPointerEquals(&mytup.t_self, &mytup.t_data->t_ctid) ||
 			HeapTupleHeaderIsOnlyLocked(mytup.t_data))
 		{
-			result = HeapTupleMayBeUpdated;
+			result = TableTupleMayBeModified;
 			goto out_locked;
 		}
 
@@ -5425,9 +5354,14 @@ next:
 		UnlockReleaseBuffer(buf);
 	}
 
-	result = HeapTupleMayBeUpdated;
+	result = TableTupleMayBeModified;
 
 out_locked:
+
+	// FIXME
+	if (result == TableTupleUpdated && ItemPointerEquals(&mytup.t_self, &mytup.t_data->t_ctid))
+		result = TableTupleDeleted;
+
 	UnlockReleaseBuffer(buf);
 
 out_unlocked:
@@ -5459,7 +5393,7 @@ out_unlocked:
  * transaction cannot be using repeatable read or serializable isolation
  * levels, because that would lead to a serializability failure.
  */
-static HTSU_Result
+static TM_Result
 heap_lock_updated_tuple(Relation rel, HeapTuple tuple, ItemPointer ctid,
 						TransactionId xid, LockTupleMode mode)
 {
@@ -5485,7 +5419,7 @@ heap_lock_updated_tuple(Relation rel, HeapTuple tuple, ItemPointer ctid,
 	}
 
 	/* nothing to lock */
-	return HeapTupleMayBeUpdated;
+	return TableTupleMayBeModified;
 }
 
 /*
@@ -5505,7 +5439,7 @@ heap_lock_updated_tuple(Relation rel, HeapTuple tuple, ItemPointer ctid,
  * An explicit confirmation WAL record also makes logical decoding simpler.
  */
 void
-heap_finish_speculative(Relation relation, HeapTuple tuple)
+heap_finish_speculative(Relation relation, ItemPointer tid)
 {
 	Buffer		buffer;
 	Page		page;
@@ -5513,11 +5447,11 @@ heap_finish_speculative(Relation relation, HeapTuple tuple)
 	ItemId		lp = NULL;
 	HeapTupleHeader htup;
 
-	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
+	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 	page = (Page) BufferGetPage(buffer);
 
-	offnum = ItemPointerGetOffsetNumber(&(tuple->t_self));
+	offnum = ItemPointerGetOffsetNumber(tid);
 	if (PageGetMaxOffsetNumber(page) >= offnum)
 		lp = PageGetItemId(page, offnum);
 
@@ -5533,7 +5467,7 @@ heap_finish_speculative(Relation relation, HeapTuple tuple)
 	/* NO EREPORT(ERROR) from here till changes are logged */
 	START_CRIT_SECTION();
 
-	Assert(HeapTupleHeaderIsSpeculative(tuple->t_data));
+	Assert(HeapTupleHeaderIsSpeculative(htup));
 
 	MarkBufferDirty(buffer);
 
@@ -5541,7 +5475,7 @@ heap_finish_speculative(Relation relation, HeapTuple tuple)
 	 * Replace the speculative insertion token with a real t_ctid, pointing to
 	 * itself like it does on regular tuples.
 	 */
-	htup->t_ctid = tuple->t_self;
+	htup->t_ctid = *tid;
 
 	/* XLOG stuff */
 	if (RelationNeedsWAL(relation))
@@ -5549,7 +5483,7 @@ heap_finish_speculative(Relation relation, HeapTuple tuple)
 		xl_heap_confirm xlrec;
 		XLogRecPtr	recptr;
 
-		xlrec.offnum = ItemPointerGetOffsetNumber(&tuple->t_self);
+		xlrec.offnum = ItemPointerGetOffsetNumber(tid);
 
 		XLogBeginInsert();
 
@@ -5596,10 +5530,9 @@ heap_finish_speculative(Relation relation, HeapTuple tuple)
  * confirmation records.
  */
 void
-heap_abort_speculative(Relation relation, HeapTuple tuple)
+heap_abort_speculative(Relation relation, ItemPointer tid)
 {
 	TransactionId xid = GetCurrentTransactionId();
-	ItemPointer tid = &(tuple->t_self);
 	ItemId		lp;
 	HeapTupleData tp;
 	Page		page;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6a26fcef94c..0ec2f69b20e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -21,7 +21,9 @@
 
 #include "access/heapam.h"
 #include "access/tableam.h"
+#include "access/xact.h"
 #include "storage/bufmgr.h"
+#include "storage/lmgr.h"
 #include "utils/builtins.h"
 
 
@@ -169,6 +171,321 @@ heapam_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot,
 }
 
 
+/* ----------------------------------------------------------------------------
+ *  Functions for manipulations of physical tuples for heap AM.
+ * ----------------------------------------------------------------------------
+ */
+
+static void
+heapam_tuple_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
+					int options, BulkInsertState bistate)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+
+	/* Update the tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(relation);
+	if (slot->tts_tableOid != InvalidOid)
+		tuple->t_tableOid = slot->tts_tableOid;
+
+	/* Perform the insertion, and copy the resulting ItemPointer */
+	heap_insert(relation, tuple, cid, options, bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+static void
+heapam_tuple_insert_speculative(Relation relation, TupleTableSlot *slot, CommandId cid,
+								int options, BulkInsertState bistate, uint32 specToken)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+
+	/* Update the tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(relation);
+	if (slot->tts_tableOid != InvalidOid)
+		tuple->t_tableOid = slot->tts_tableOid;
+
+	HeapTupleHeaderSetSpeculativeToken(tuple->t_data, specToken);
+	options |= HEAP_INSERT_SPECULATIVE;
+
+	/* Perform the insertion, and copy the resulting ItemPointer */
+	heap_insert(relation, tuple, cid, options, bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+static void
+heapam_tuple_complete_speculative(Relation relation, TupleTableSlot *slot, uint32 specToken,
+								  bool succeeded)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+
+	/* adjust the tuple's state accordingly */
+	if (succeeded)
+		heap_finish_speculative(relation, &slot->tts_tid);
+	else
+		heap_abort_speculative(relation, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+static TM_Result
+heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
+					Snapshot snapshot, Snapshot crosscheck, bool wait,
+					TM_FailureData *tmfd, bool changingPart)
+{
+	/*
+	 * Deleting of index tuples is currently handled at VACUUM time. If the
+	 * storage AM cleans up dead tuples by itself, this would be the place
+	 * to also trigger deletion of the corresponding index tuples.
+	 */
+	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
+}
+
+
+static TM_Result
+heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
+					CommandId cid, Snapshot snapshot, Snapshot crosscheck,
+					bool wait, TM_FailureData *tmfd,
+					LockTupleMode *lockmode, bool *update_indexes)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+	TM_Result	result;
+
+	/* Update the tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(relation);
+	if (slot->tts_tableOid != InvalidOid)
+		tuple->t_tableOid = slot->tts_tableOid;
+
+	result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
+						 tmfd, lockmode);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	/*
+	 * Decide whether new index entries are needed for the tuple
+	 *
+	 * Note: heap_update returns the tid (location) of the new tuple in the
+	 * t_self field.
+	 *
+	 * If it's a HOT update, we mustn't insert new index entries.
+	 */
+	*update_indexes = result == TableTupleMayBeModified &&
+		!HeapTupleIsHeapOnly(tuple);
+
+	if (shouldFree)
+		pfree(tuple);
+
+	return result;
+}
+
+static TM_Result
+heapam_tuple_lock(Relation relation, ItemPointer tid, Snapshot snapshot,
+				  TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
+				  LockWaitPolicy wait_policy, uint8 flags,
+				  TM_FailureData *tmfd)
+{
+	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+	TM_Result	result;
+	Buffer		buffer;
+	HeapTuple	tuple = &bslot->base.tupdata;
+	bool		follow_updates;
+
+	follow_updates = (flags & TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS) != 0;
+	tmfd->traversed = false;
+
+	Assert(TTS_IS_BUFFERTUPLE(slot));
+
+retry:
+	result = heap_lock_tuple(relation, tid, cid, mode, wait_policy,
+							 follow_updates, tuple, &buffer, tmfd);
+
+	if (result == TableTupleUpdated &&
+		(flags & TUPLE_LOCK_FLAG_FIND_LAST_VERSION))
+	{
+		ReleaseBuffer(buffer);
+		/* Should not encounter speculative tuple on recheck */
+		Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
+
+		if (!ItemPointerEquals(&tmfd->ctid, &tuple->t_self))
+		{
+			SnapshotData SnapshotDirty;
+			TransactionId priorXmax;
+
+			/* it was updated, so look at the updated version */
+			*tid = tmfd->ctid;
+			/* updated row should have xmin matching this xmax */
+			priorXmax = tmfd->xmax;
+
+			/*
+			 * fetch target tuple
+			 *
+			 * Loop here to deal with updated or busy tuples
+			 */
+			InitDirtySnapshot(SnapshotDirty);
+			for (;;)
+			{
+				if (ItemPointerIndicatesMovedPartitions(tid))
+					ereport(ERROR,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
+
+				tuple->t_self = *tid;
+				if (heap_fetch(relation, &SnapshotDirty, tuple, &buffer, NULL))
+				{
+					/*
+					 * If xmin isn't what we're expecting, the slot must have
+					 * been recycled and reused for an unrelated tuple.  This
+					 * implies that the latest version of the row was deleted,
+					 * so we need do nothing.  (Should be safe to examine xmin
+					 * without getting buffer's content lock.  We assume
+					 * reading a TransactionId to be atomic, and Xmin never
+					 * changes in an existing tuple, except to invalid or
+					 * frozen, and neither of those can match priorXmax.)
+					 */
+					if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple->t_data),
+											 priorXmax))
+					{
+						ReleaseBuffer(buffer);
+						return TableTupleDeleted;
+					}
+
+					/* otherwise xmin should not be dirty... */
+					if (TransactionIdIsValid(SnapshotDirty.xmin))
+						elog(ERROR, "t_xmin is uncommitted in tuple to be updated");
+
+					/*
+					 * If tuple is being updated by other transaction then we
+					 * have to wait for its commit/abort, or die trying.
+					 */
+					if (TransactionIdIsValid(SnapshotDirty.xmax))
+					{
+						ReleaseBuffer(buffer);
+						switch (wait_policy)
+						{
+							case LockWaitBlock:
+								XactLockTableWait(SnapshotDirty.xmax,
+												  relation, &tuple->t_self,
+												  XLTW_FetchUpdated);
+								break;
+							case LockWaitSkip:
+								if (!ConditionalXactLockTableWait(SnapshotDirty.xmax))
+									/* skip instead of waiting */
+									return TableTupleWouldBlock;
+								break;
+							case LockWaitError:
+								if (!ConditionalXactLockTableWait(SnapshotDirty.xmax))
+									ereport(ERROR,
+											(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
+											 errmsg("could not obtain lock on row in relation \"%s\"",
+													RelationGetRelationName(relation))));
+								break;
+						}
+						continue;	/* loop back to repeat heap_fetch */
+					}
+
+					/*
+					 * If tuple was inserted by our own transaction, we have
+					 * to check cmin against es_output_cid: cmin >= current
+					 * CID means our command cannot see the tuple, so we
+					 * should ignore it. Otherwise heap_lock_tuple() will
+					 * throw an error, and so would any later attempt to
+					 * update or delete the tuple.  (We need not check cmax
+					 * because HeapTupleSatisfiesDirty will consider a tuple
+					 * deleted by our transaction dead, regardless of cmax.)
+					 * We just checked that priorXmax == xmin, so we can test
+					 * that variable instead of doing HeapTupleHeaderGetXmin
+					 * again.
+					 */
+					if (TransactionIdIsCurrentTransactionId(priorXmax) &&
+						HeapTupleHeaderGetCmin(tuple->t_data) >= cid)
+					{
+						ReleaseBuffer(buffer);
+						return result;
+					}
+
+					tmfd->traversed = true;
+					*tid = tuple->t_data->t_ctid;
+					ReleaseBuffer(buffer);
+					goto retry;
+				}
+
+				/*
+				 * If the referenced slot was actually empty, the latest
+				 * version of the row must have been deleted, so we need do
+				 * nothing.
+				 */
+				if (tuple->t_data == NULL)
+				{
+					return TableTupleDeleted;
+				}
+
+				/*
+				 * As above, if xmin isn't what we're expecting, do nothing.
+				 */
+				if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple->t_data),
+										 priorXmax))
+				{
+					if (BufferIsValid(buffer))
+						ReleaseBuffer(buffer);
+					return TableTupleDeleted;
+				}
+
+				/*
+				 * If we get here, the tuple was found but failed
+				 * SnapshotDirty. Assuming the xmin is either a committed xact
+				 * or our own xact (as it certainly should be if we're trying
+				 * to modify the tuple), this must mean that the row was
+				 * updated or deleted by either a committed xact or our own
+				 * xact.  If it was deleted, we can ignore it; if it was
+				 * updated then chain up to the next version and repeat the
+				 * whole process.
+				 *
+				 * As above, it should be safe to examine xmax and t_ctid
+				 * without the buffer content lock, because they can't be
+				 * changing.
+				 */
+				if (ItemPointerEquals(&tuple->t_self, &tuple->t_data->t_ctid))
+				{
+					/* deleted, so forget about it */
+					if (BufferIsValid(buffer))
+						ReleaseBuffer(buffer);
+					return TableTupleDeleted;
+				}
+
+				/* updated, so look at the updated row */
+				*tid = tuple->t_data->t_ctid;
+				/* updated row should have xmin matching this xmax */
+				priorXmax = HeapTupleHeaderGetUpdateXid(tuple->t_data);
+				if (BufferIsValid(buffer))
+					ReleaseBuffer(buffer);
+				/* loop back to fetch next in chain */
+			}
+		}
+		else
+		{
+			/* tuple was deleted, so give up */
+			return TableTupleDeleted;
+		}
+	}
+
+	slot->tts_tableOid = RelationGetRelid(relation);
+	if (slot->tts_tableOid != InvalidOid)
+		tuple->t_tableOid = slot->tts_tableOid;
+	/* store in slot, transferring existing pin */
+	ExecStorePinnedBufferHeapTuple(tuple, slot, buffer);
+
+	return result;
+}
+
+
 /* ------------------------------------------------------------------------
  * Definition of the heap table access method.
  * ------------------------------------------------------------------------
@@ -193,6 +510,13 @@ static const TableAmRoutine heapam_methods = {
 	.index_fetch_end = heapam_index_fetch_end,
 	.index_fetch_tuple = heapam_index_fetch_tuple,
 
+	.tuple_insert = heapam_tuple_insert,
+	.tuple_insert_speculative = heapam_tuple_insert_speculative,
+	.tuple_complete_speculative = heapam_tuple_complete_speculative,
+	.tuple_delete = heapam_tuple_delete,
+	.tuple_update = heapam_tuple_update,
+	.tuple_lock = heapam_tuple_lock,
+
 	.tuple_satisfies_snapshot = heapam_tuple_satisfies_snapshot,
 };
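
To illustrate the callback surface (this example is mine, not part of the
patch, and every "myam_" name is made up): a second AM would fill in the
same members of its own TableAmRoutine, and GetTableAmRoutine() - see the
tableamapi.c hunk below - asserts that all of them are provided:

static const TableAmRoutine myam_methods = {
	/* scan / index-fetch callbacks elided */

	.tuple_insert = myam_tuple_insert,
	.tuple_insert_speculative = myam_tuple_insert_speculative,
	.tuple_complete_speculative = myam_tuple_complete_speculative,
	.tuple_delete = myam_tuple_delete,
	.tuple_update = myam_tuple_update,
	.tuple_lock = myam_tuple_lock,

	.tuple_satisfies_snapshot = myam_tuple_satisfies_snapshot,
};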
 
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 6cb38f80c68..b9d0475cde1 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -67,6 +67,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/subtrans.h"
+#include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
@@ -433,24 +434,24 @@ HeapTupleSatisfiesToast(HeapTuple htup, Snapshot snapshot,
  *
  *	The possible return codes are:
  *
- *	HeapTupleInvisible: the tuple didn't exist at all when the scan started,
+ *	TableTupleInvisible: the tuple didn't exist at all when the scan started,
  *	e.g. it was created by a later CommandId.
  *
- *	HeapTupleMayBeUpdated: The tuple is valid and visible, so it may be
+ *	TableTupleMayBeModified: The tuple is valid and visible, so it may be
  *	updated.
  *
- *	HeapTupleSelfUpdated: The tuple was updated by the current transaction,
+ *	TableTupleSelfModified: The tuple was updated by the current transaction,
  *	after the current scan started.
  *
- *	HeapTupleUpdated: The tuple was updated by a committed transaction.
+ *	TableTupleUpdated: The tuple was updated by a committed transaction.
  *
- *	HeapTupleBeingUpdated: The tuple is being updated by an in-progress
+ *	TableTupleBeingModified: The tuple is being updated by an in-progress
  *	transaction other than the current transaction.  (Note: this includes
  *	the case where the tuple is share-locked by a MultiXact, even if the
  *	MultiXact includes the current transaction.  Callers that want to
  *	distinguish that case must test for it themselves.)
  */
-HTSU_Result
+TM_Result
 HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 						 Buffer buffer)
 {
@@ -462,7 +463,7 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
 		if (HeapTupleHeaderXminInvalid(tuple))
-			return HeapTupleInvisible;
+			return TableTupleInvisible;
 
 		/* Used by pre-9.0 binary upgrades */
 		if (tuple->t_infomask & HEAP_MOVED_OFF)
@@ -470,14 +471,14 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 			TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
 
 			if (TransactionIdIsCurrentTransactionId(xvac))
-				return HeapTupleInvisible;
+				return TableTupleInvisible;
 			if (!TransactionIdIsInProgress(xvac))
 			{
 				if (TransactionIdDidCommit(xvac))
 				{
 					SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
 								InvalidTransactionId);
-					return HeapTupleInvisible;
+					return TableTupleInvisible;
 				}
 				SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
 							InvalidTransactionId);
@@ -491,7 +492,7 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 			if (!TransactionIdIsCurrentTransactionId(xvac))
 			{
 				if (TransactionIdIsInProgress(xvac))
-					return HeapTupleInvisible;
+					return TableTupleInvisible;
 				if (TransactionIdDidCommit(xvac))
 					SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
 								InvalidTransactionId);
@@ -499,17 +500,17 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 				{
 					SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
 								InvalidTransactionId);
-					return HeapTupleInvisible;
+					return TableTupleInvisible;
 				}
 			}
 		}
 		else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmin(tuple)))
 		{
 			if (HeapTupleHeaderGetCmin(tuple) >= curcid)
-				return HeapTupleInvisible;	/* inserted after scan started */
+				return TableTupleInvisible;	/* inserted after scan started */
 
 			if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid */
-				return HeapTupleMayBeUpdated;
+				return TableTupleMayBeModified;
 
 			if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
 			{
@@ -527,9 +528,9 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 				if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
 				{
 					if (MultiXactIdIsRunning(xmax, true))
-						return HeapTupleBeingUpdated;
+						return TableTupleBeingModified;
 					else
-						return HeapTupleMayBeUpdated;
+						return TableTupleMayBeModified;
 				}
 
 				/*
@@ -538,8 +539,8 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 				 * locked/updated.
 				 */
 				if (!TransactionIdIsInProgress(xmax))
-					return HeapTupleMayBeUpdated;
-				return HeapTupleBeingUpdated;
+					return TableTupleMayBeModified;
+				return TableTupleBeingModified;
 			}
 
 			if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
@@ -556,16 +557,16 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 				{
 					if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple),
 											 false))
-						return HeapTupleBeingUpdated;
-					return HeapTupleMayBeUpdated;
+						return TableTupleBeingModified;
+					return TableTupleMayBeModified;
 				}
 				else
 				{
 					if (HeapTupleHeaderGetCmax(tuple) >= curcid)
-						return HeapTupleSelfUpdated;	/* updated after scan
+						return TableTupleSelfModified;	/* updated after scan
 														 * started */
 					else
-						return HeapTupleInvisible;	/* updated before scan
+						return TableTupleInvisible;	/* updated before scan
 													 * started */
 				}
 			}
@@ -575,16 +576,16 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 				/* deleting subtransaction must have aborted */
 				SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
 							InvalidTransactionId);
-				return HeapTupleMayBeUpdated;
+				return TableTupleMayBeModified;
 			}
 
 			if (HeapTupleHeaderGetCmax(tuple) >= curcid)
-				return HeapTupleSelfUpdated;	/* updated after scan started */
+				return TableTupleSelfModified;	/* updated after scan started */
 			else
-				return HeapTupleInvisible;	/* updated before scan started */
+				return TableTupleInvisible;	/* updated before scan started */
 		}
 		else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
-			return HeapTupleInvisible;
+			return TableTupleInvisible;
 		else if (TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple)))
 			SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
 						HeapTupleHeaderGetRawXmin(tuple));
@@ -593,20 +594,23 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 			/* it must have aborted or crashed */
 			SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
 						InvalidTransactionId);
-			return HeapTupleInvisible;
+			return TableTupleInvisible;
 		}
 	}
 
 	/* by here, the inserting transaction has committed */
 
 	if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid or aborted */
-		return HeapTupleMayBeUpdated;
+		return TableTupleMayBeModified;
 
 	if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
 	{
 		if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
-			return HeapTupleMayBeUpdated;
-		return HeapTupleUpdated;	/* updated by other */
+			return TableTupleMayBeModified;
+		if (ItemPointerEquals(&htup->t_self, &tuple->t_ctid))
+			return TableTupleDeleted;	/* deleted by other */
+		else
+			return TableTupleUpdated;	/* updated by other */
 	}
 
 	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
@@ -614,22 +618,22 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 		TransactionId xmax;
 
 		if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
-			return HeapTupleMayBeUpdated;
+			return TableTupleMayBeModified;
 
 		if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
 		{
 			if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), true))
-				return HeapTupleBeingUpdated;
+				return TableTupleBeingModified;
 
 			SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
-			return HeapTupleMayBeUpdated;
+			return TableTupleMayBeModified;
 		}
 
 		xmax = HeapTupleGetUpdateXid(tuple);
 		if (!TransactionIdIsValid(xmax))
 		{
 			if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
-				return HeapTupleBeingUpdated;
+				return TableTupleBeingModified;
 		}
 
 		/* not LOCKED_ONLY, so it has to have an xmax */
@@ -638,16 +642,21 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 		if (TransactionIdIsCurrentTransactionId(xmax))
 		{
 			if (HeapTupleHeaderGetCmax(tuple) >= curcid)
-				return HeapTupleSelfUpdated;	/* updated after scan started */
+				return TableTupleSelfModified;	/* updated after scan started */
 			else
-				return HeapTupleInvisible;	/* updated before scan started */
+				return TableTupleInvisible;	/* updated before scan started */
 		}
 
 		if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
-			return HeapTupleBeingUpdated;
+			return TableTupleBeingModified;
 
 		if (TransactionIdDidCommit(xmax))
-			return HeapTupleUpdated;
+		{
+			if (ItemPointerEquals(&htup->t_self, &tuple->t_ctid))
+				return TableTupleDeleted;
+			else
+				return TableTupleUpdated;
+		}
 
 		/*
 		 * By here, the update in the Xmax is either aborted or crashed, but
@@ -662,34 +671,34 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 			 */
 			SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
 						InvalidTransactionId);
-			return HeapTupleMayBeUpdated;
+			return TableTupleMayBeModified;
 		}
 		else
 		{
 			/* There are lockers running */
-			return HeapTupleBeingUpdated;
+			return TableTupleBeingModified;
 		}
 	}
 
 	if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmax(tuple)))
 	{
 		if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
-			return HeapTupleBeingUpdated;
+			return TableTupleBeingModified;
 		if (HeapTupleHeaderGetCmax(tuple) >= curcid)
-			return HeapTupleSelfUpdated;	/* updated after scan started */
+			return TableTupleSelfModified;	/* updated after scan started */
 		else
-			return HeapTupleInvisible;	/* updated before scan started */
+			return TableTupleInvisible;	/* updated before scan started */
 	}
 
 	if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuple)))
-		return HeapTupleBeingUpdated;
+		return TableTupleBeingModified;
 
 	if (!TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuple)))
 	{
 		/* it must have aborted or crashed */
 		SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
 					InvalidTransactionId);
-		return HeapTupleMayBeUpdated;
+		return TableTupleMayBeModified;
 	}
 
 	/* xmax transaction committed */
@@ -698,12 +707,16 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 	{
 		SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
 					InvalidTransactionId);
-		return HeapTupleMayBeUpdated;
+		return TableTupleMayBeModified;
 	}
 
 	SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
 				HeapTupleHeaderGetRawXmax(tuple));
-	return HeapTupleUpdated;	/* updated by other */
+
+	if (ItemPointerEquals(&htup->t_self, &tuple->t_ctid))
+		return TableTupleDeleted;	/* deleted by other */
+	else
+		return TableTupleUpdated;	/* updated by other */
 }
 
 /*
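
The Deleted/Updated split above is what lets callers stop comparing t_self
against ctid themselves. As a sketch of the resulting caller pattern (my
illustration only; the real callers are in the executor hunks below):

	switch (HeapTupleSatisfiesUpdate(htup, curcid, buffer))
	{
		case TableTupleMayBeModified:
			/* visible and unmodified, ok to update/delete */
			break;
		case TableTupleUpdated:
			/* concurrently updated: follow the chain, EvalPlanQual etc. */
			break;
		case TableTupleDeleted:
			/* concurrently deleted: nothing left to process */
			break;
		default:
			/* Invisible / SelfModified / BeingModified handled elsewhere */
			break;
	}
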
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index cd921a46005..a40cfcf1954 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1763,7 +1763,7 @@ toast_delete_datum(Relation rel, Datum value, bool is_speculative)
 		 * Have a chunk, delete it
 		 */
 		if (is_speculative)
-			heap_abort_speculative(toastrel, toasttup);
+			heap_abort_speculative(toastrel, &toasttup->t_self);
 		else
 			simple_heap_delete(toastrel, &toasttup->t_self);
 	}
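
(Implied by the hunk above, plus an assumption about the matching header
change: heap_abort_speculative() now takes the TID rather than a HeapTuple,
i.e. roughly

extern void heap_abort_speculative(Relation relation, ItemPointer tid);

so callers like toast_delete_datum no longer need a full tuple in hand.)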
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 628d930c130..540c1f99766 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -176,6 +176,119 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc parallel_scan)
 }
 
 
+/* ----------------------------------------------------------------------------
+ * Functions to make modifications a bit simpler.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * simple_table_insert - insert a tuple
+ *
+ * Currently, this routine differs from table_insert only in supplying a
+ * default command ID and not allowing access to the speedup options.
+ */
+void
+simple_table_insert(Relation rel, TupleTableSlot *slot)
+{
+	table_insert(rel, slot, GetCurrentCommandId(true), 0, NULL);
+}
+
+/*
+ * simple_table_delete - delete a tuple
+ *
+ * This routine may be used to delete a tuple when concurrent updates of
+ * the target tuple are not expected (for example, because we have a lock
+ * on the relation associated with the tuple).  Any failure is reported
+ * via ereport().
+ */
+void
+simple_table_delete(Relation rel, ItemPointer tid, Snapshot snapshot)
+{
+	TM_Result	result;
+	TM_FailureData tmfd;
+
+	result = table_delete(rel, tid,
+						  GetCurrentCommandId(true),
+						  snapshot, InvalidSnapshot,
+						  true /* wait for commit */ ,
+						  &tmfd, false /* changingPart */ );
+
+	switch (result)
+	{
+		case TableTupleSelfModified:
+			/* Tuple was already updated in current command? */
+			elog(ERROR, "tuple already updated by self");
+			break;
+
+		case TableTupleMayBeModified:
+			/* done successfully */
+			break;
+
+		case TableTupleUpdated:
+			elog(ERROR, "tuple concurrently updated");
+			break;
+
+		case TableTupleDeleted:
+			elog(ERROR, "tuple concurrently deleted");
+			break;
+
+		default:
+			elog(ERROR, "unrecognized table_delete status: %u", result);
+			break;
+	}
+}
+
+/*
+ * simple_table_update - replace a tuple
+ *
+ * This routine may be used to update a tuple when concurrent updates of
+ * the target tuple are not expected (for example, because we have a lock
+ * on the relation associated with the tuple).  Any failure is reported
+ * via ereport().
+ */
+void
+simple_table_update(Relation rel, ItemPointer otid,
+					TupleTableSlot *slot,
+					Snapshot snapshot,
+					bool *update_indexes)
+{
+	TM_Result	result;
+	TM_FailureData tmfd;
+	LockTupleMode lockmode;
+
+	result = table_update(rel, otid, slot,
+						  GetCurrentCommandId(true),
+						  snapshot, InvalidSnapshot,
+						  true /* wait for commit */ ,
+						  &tmfd, &lockmode, update_indexes);
+
+	switch (result)
+	{
+		case TableTupleSelfModified:
+			/* Tuple was already updated in current command? */
+			elog(ERROR, "tuple already updated by self");
+			break;
+
+		case TableTupleMayBeModified:
+			/* done successfully */
+			break;
+
+		case TableTupleUpdated:
+			elog(ERROR, "tuple concurrently updated");
+			break;
+
+		case TableTupleDeleted:
+			elog(ERROR, "tuple concurrently deleted");
+			break;
+
+		default:
+			elog(ERROR, "unrecognized table_update status: %u", result);
+			break;
+	}
+}
+
+
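
For illustration (my sketch, not from the patch): a caller previously using
the simple_heap_* routines converts like this, relying on the AM to fill
slot->tts_tid and to report whether new index entries are needed:

	/* insert; AM sets slot->tts_tid for subsequent index insertion */
	simple_table_insert(rel, slot);

	/* update; update_indexes replaces the !HeapTupleIsHeapOnly() test */
	simple_table_update(rel, otid, slot, snapshot, &update_indexes);
	if (update_indexes)
		recheckIndexes = ExecInsertIndexTuples(slot, estate,
											   false, NULL, NIL);

	/* delete */
	simple_table_delete(rel, tid, snapshot);

The execReplication.c hunks below follow exactly this pattern.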
 /* ----------------------------------------------------------------------------
  * Helper functions to implement parallel scans for block oriented AMs.
  * ----------------------------------------------------------------------------
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 3d3b82e1e58..c8592060112 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -64,6 +64,19 @@ GetTableAmRoutine(Oid amhandler)
 
 	Assert(routine->tuple_satisfies_snapshot != NULL);
 
+	Assert(routine->tuple_insert != NULL);
+
+	/*
+	 * Could be made optional, but that would require throwing an error
+	 * during parse analysis.
+	 */
+	Assert(routine->tuple_insert_speculative != NULL);
+	Assert(routine->tuple_complete_speculative != NULL);
+
+	Assert(routine->tuple_delete != NULL);
+	Assert(routine->tuple_update != NULL);
+	Assert(routine->tuple_lock != NULL);
+
 	return routine;
 }
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 218a6e01cbb..705df8900ba 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3007,7 +3007,6 @@ CopyFrom(CopyState cstate)
 					/* And create index entries for it */
 					if (resultRelInfo->ri_NumIndices > 0)
 						recheckIndexes = ExecInsertIndexTuples(slot,
-															   &(tuple->t_self),
 															   estate,
 															   false,
 															   NULL,
@@ -3151,7 +3150,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 			cstate->cur_lineno = firstBufferedLineNo + i;
 			ExecStoreHeapTuple(bufferedTuples[i], myslot, false);
 			recheckIndexes =
-				ExecInsertIndexTuples(myslot, &(bufferedTuples[i]->t_self),
+				ExecInsertIndexTuples(myslot,
 									  estate, false, NULL, NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 myslot,
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 71098896947..2221188ea47 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -15,6 +15,7 @@
 
 #include "access/genam.h"
 #include "access/heapam.h"
+#include "access/tableam.h"
 #include "access/sysattr.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
@@ -3285,19 +3286,11 @@ GetTupleForTrigger(EState *estate,
 				   TupleTableSlot **newSlot)
 {
 	Relation	relation = relinfo->ri_RelationDesc;
-	HeapTuple	tuple;
-	Buffer		buffer;
-	BufferHeapTupleTableSlot *boldslot;
-
-	Assert(TTS_IS_BUFFERTUPLE(oldslot));
-	ExecClearTuple(oldslot);
-	boldslot = (BufferHeapTupleTableSlot *) oldslot;
-	tuple = &boldslot->base.tupdata;
 
 	if (newSlot != NULL)
 	{
-		HTSU_Result test;
-		HeapUpdateFailureData hufd;
+		TM_Result	test;
+		TM_FailureData tmfd;
 
 		*newSlot = NULL;
 
@@ -3307,15 +3300,15 @@ GetTupleForTrigger(EState *estate,
 		/*
 		 * lock tuple for update
 		 */
-ltrmark:;
-		tuple->t_self = *tid;
-		test = heap_lock_tuple(relation, tuple,
-							   estate->es_output_cid,
-							   lockmode, LockWaitBlock,
-							   false, &buffer, &hufd);
+		test = table_lock_tuple(relation, tid, estate->es_snapshot, oldslot,
+								estate->es_output_cid,
+								lockmode, LockWaitBlock,
+								IsolationUsesXactSnapshot() ? 0 : TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
+								&tmfd);
+
 		switch (test)
 		{
-			case HeapTupleSelfUpdated:
+			case TableTupleSelfModified:
 
 				/*
 				 * The target tuple was already updated or deleted by the
@@ -3325,73 +3318,59 @@ ltrmark:;
 				 * enumerated in ExecUpdate and ExecDelete in
 				 * nodeModifyTable.c.
 				 */
-				if (hufd.cmax != estate->es_output_cid)
+				if (tmfd.cmax != estate->es_output_cid)
 					ereport(ERROR,
 							(errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
 							 errmsg("tuple to be updated was already modified by an operation triggered by the current command"),
 							 errhint("Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows.")));
 
 				/* treat it as deleted; do not process */
-				ReleaseBuffer(buffer);
 				return false;
 
-			case HeapTupleMayBeUpdated:
-				ExecStorePinnedBufferHeapTuple(tuple, oldslot, buffer);
-
-				break;
-
-			case HeapTupleUpdated:
-				ReleaseBuffer(buffer);
-				if (IsolationUsesXactSnapshot())
-					ereport(ERROR,
-							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-							 errmsg("could not serialize access due to concurrent update")));
-				if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
-					ereport(ERROR,
-							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-							 errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
-
-				if (!ItemPointerEquals(&hufd.ctid, &tuple->t_self))
+			case TableTupleMayBeModified:
+				if (tmfd.traversed)
 				{
-					/* it was updated, so look at the updated version */
 					TupleTableSlot *epqslot;
 
 					epqslot = EvalPlanQual(estate,
 										   epqstate,
 										   relation,
 										   relinfo->ri_RangeTableIndex,
-										   lockmode,
-										   &hufd.ctid,
-										   hufd.xmax);
-					if (!TupIsNull(epqslot))
-					{
-						*tid = hufd.ctid;
+										   oldslot);
 
-						*newSlot = epqslot;
+					/*
+					 * If PlanQual failed for the updated tuple, we must
+					 * not process this tuple!
+					 */
+					if (TupIsNull(epqslot))
+						return false;
 
-						/*
-						 * EvalPlanQual already locked the tuple, but we
-						 * re-call heap_lock_tuple anyway as an easy way of
-						 * re-fetching the correct tuple.  Speed is hardly a
-						 * criterion in this path anyhow.
-						 */
-						goto ltrmark;
-					}
+					*newSlot = epqslot;
 				}
+				break;
 
-				/*
-				 * if tuple was deleted or PlanQual failed for updated tuple -
-				 * we must not process this tuple!
-				 */
+			case TableTupleUpdated:
+				if (IsolationUsesXactSnapshot())
+					ereport(ERROR,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("could not serialize access due to concurrent update")));
+				elog(ERROR, "unexpected table_lock_tuple status: %u", test);
+				break;
+
+			case TableTupleDeleted:
+				if (IsolationUsesXactSnapshot())
+					ereport(ERROR,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("could not serialize access due to concurrent delete")));
+				/* tuple was deleted */
 				return false;
 
-			case HeapTupleInvisible:
+			case TableTupleInvisible:
 				elog(ERROR, "attempted to lock invisible tuple");
 				break;
 
 			default:
-				ReleaseBuffer(buffer);
-				elog(ERROR, "unrecognized heap_lock_tuple status: %u", test);
+				elog(ERROR, "unrecognized table_lock_tuple status: %u", test);
 				return false;	/* keep compiler quiet */
 		}
 	}
@@ -3399,6 +3378,14 @@ ltrmark:;
 	{
 		Page		page;
 		ItemId		lp;
+		Buffer		buffer;
+		BufferHeapTupleTableSlot *boldslot;
+		HeapTuple tuple;
+
+		Assert(TTS_IS_BUFFERTUPLE(oldslot));
+		ExecClearTuple(oldslot);
+		boldslot = (BufferHeapTupleTableSlot *) oldslot;
+		tuple = &boldslot->base.tupdata;
 
 		buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
 
@@ -4286,7 +4273,7 @@ AfterTriggerExecute(EState *estate,
 				LocTriggerData.tg_trigslot = ExecGetTriggerOldSlot(estate, relInfo);
 
 				ItemPointerCopy(&(event->ate_ctid1), &(tuple1.t_self));
-				if (!heap_fetch(rel, SnapshotAny, &tuple1, &buffer, false, NULL))
+				if (!heap_fetch(rel, SnapshotAny, &tuple1, &buffer, NULL))
 					elog(ERROR, "failed to fetch tuple1 for AFTER trigger");
 				ExecStorePinnedBufferHeapTuple(&tuple1,
 											   LocTriggerData.tg_trigslot,
@@ -4310,7 +4297,7 @@ AfterTriggerExecute(EState *estate,
 				LocTriggerData.tg_newslot = ExecGetTriggerNewSlot(estate, relInfo);
 
 				ItemPointerCopy(&(event->ate_ctid2), &(tuple2.t_self));
-				if (!heap_fetch(rel, SnapshotAny, &tuple2, &buffer, false, NULL))
+				if (!heap_fetch(rel, SnapshotAny, &tuple2, &buffer, NULL))
 					elog(ERROR, "failed to fetch tuple2 for AFTER trigger");
 				ExecStorePinnedBufferHeapTuple(&tuple2,
 											   LocTriggerData.tg_newslot,
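
(The heap_fetch() calls above reflect the removal of its keep_buf
parameter; assuming heapam.h is adjusted to match, the signature is now
roughly

extern bool heap_fetch(Relation relation, Snapshot snapshot,
					   HeapTuple tuple, Buffer *userbuf,
					   Relation stats_relation);

i.e. only the boolean between the buffer and the stats relation is gone.)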
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index e67dd6750c6..3b602bb8baf 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -271,12 +271,12 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  */
 List *
 ExecInsertIndexTuples(TupleTableSlot *slot,
-					  ItemPointer tupleid,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
 					  List *arbiterIndexes)
 {
+	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
 	ResultRelInfo *resultRelInfo;
 	int			i;
@@ -288,6 +288,8 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	Datum		values[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
 
+	Assert(ItemPointerIsValid(tupleid));
+
 	/*
 	 * Get information from the result relation info structure.
 	 */
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 63a34760eec..3a8e852b49d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2417,27 +2417,29 @@ ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist)
 
 
 /*
- * Check a modified tuple to see if we want to process its updated version
- * under READ COMMITTED rules.
+ * Check the updated version of a tuple to see if we want to process it under
+ * READ COMMITTED rules.
  *
  *	estate - outer executor state data
  *	epqstate - state for EvalPlanQual rechecking
  *	relation - table containing tuple
  *	rti - rangetable index of table containing tuple
- *	lockmode - requested tuple lock mode
- *	*tid - t_ctid from the outdated tuple (ie, next updated version)
- *	priorXmax - t_xmax from the outdated tuple
+ *	inputslot - tuple for processing - this can be the slot from
+ *		EvalPlanQualSlot(), for increased efficiency.
  *
- * *tid is also an output parameter: it's modified to hold the TID of the
- * latest version of the tuple (note this may be changed even on failure)
+ * This tests whether the tuple in inputslot still matches the relevant
+ * quals.  For that result to be useful, the input tuple typically has to
+ * be the last row version (otherwise the result isn't meaningful) and
+ * locked (otherwise the result might be out of date).  That's typically
+ * achieved by using table_lock_tuple() with the
+ * TUPLE_LOCK_FLAG_FIND_LAST_VERSION flag.
  *
  * Returns a slot containing the new candidate update/delete tuple, or
  * NULL if we determine we shouldn't process the row.
  */
 TupleTableSlot *
 EvalPlanQual(EState *estate, EPQState *epqstate,
-			 Relation relation, Index rti, LockTupleMode lockmode,
-			 ItemPointer tid, TransactionId priorXmax)
+			 Relation relation, Index rti, TupleTableSlot *inputslot)
 {
 	TupleTableSlot *slot;
 	TupleTableSlot *testslot;
@@ -2450,19 +2452,12 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
 	EvalPlanQualBegin(epqstate, estate);
 
 	/*
-	 * Get and lock the updated version of the row; if fail, return NULL.
+	 * Callers will often use the EvalPlanQualSlot() slot to store the
+	 * tuple, to avoid an unnecessary copy.
 	 */
 	testslot = EvalPlanQualSlot(epqstate, relation, rti);
-	if (!EvalPlanQualFetch(estate, relation, lockmode, LockWaitBlock,
-						   tid, priorXmax,
-						   testslot))
-		return NULL;
-
-	/*
-	 * For UPDATE/DELETE we have to return tid of actual row we're executing
-	 * PQ for.
-	 */
-	*tid = testslot->tts_tid;
+	if (testslot != inputslot)
+		ExecCopySlot(testslot, inputslot);
 
 	/*
 	 * Fetch any non-locked source rows
@@ -2494,258 +2489,6 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
 	return slot;
 }
 
-/*
- * Fetch a copy of the newest version of an outdated tuple
- *
- *	estate - executor state data
- *	relation - table containing tuple
- *	lockmode - requested tuple lock mode
- *	wait_policy - requested lock wait policy
- *	*tid - t_ctid from the outdated tuple (ie, next updated version)
- *	priorXmax - t_xmax from the outdated tuple
- *	slot - slot to store newest tuple version
- *
- * Returns true, with slot containing the newest tuple version, or false if we
- * find that there is no newest version (ie, the row was deleted not updated).
- * We also return false if the tuple is locked and the wait policy is to skip
- * such tuples.
- *
- * If successful, we have locked the newest tuple version, so caller does not
- * need to worry about it changing anymore.
- */
-bool
-EvalPlanQualFetch(EState *estate, Relation relation, LockTupleMode lockmode,
-				  LockWaitPolicy wait_policy,
-				  ItemPointer tid, TransactionId priorXmax,
-				  TupleTableSlot *slot)
-{
-	HeapTupleData tuple;
-	SnapshotData SnapshotDirty;
-
-	/*
-	 * fetch target tuple
-	 *
-	 * Loop here to deal with updated or busy tuples
-	 */
-	InitDirtySnapshot(SnapshotDirty);
-	tuple.t_self = *tid;
-	for (;;)
-	{
-		Buffer		buffer;
-
-		if (heap_fetch(relation, &SnapshotDirty, &tuple, &buffer, true, NULL))
-		{
-			HTSU_Result test;
-			HeapUpdateFailureData hufd;
-
-			/*
-			 * If xmin isn't what we're expecting, the slot must have been
-			 * recycled and reused for an unrelated tuple.  This implies that
-			 * the latest version of the row was deleted, so we need do
-			 * nothing.  (Should be safe to examine xmin without getting
-			 * buffer's content lock.  We assume reading a TransactionId to be
-			 * atomic, and Xmin never changes in an existing tuple, except to
-			 * invalid or frozen, and neither of those can match priorXmax.)
-			 */
-			if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
-									 priorXmax))
-			{
-				ReleaseBuffer(buffer);
-				return false;
-			}
-
-			/* otherwise xmin should not be dirty... */
-			if (TransactionIdIsValid(SnapshotDirty.xmin))
-				elog(ERROR, "t_xmin is uncommitted in tuple to be updated");
-
-			/*
-			 * If tuple is being updated by other transaction then we have to
-			 * wait for its commit/abort, or die trying.
-			 */
-			if (TransactionIdIsValid(SnapshotDirty.xmax))
-			{
-				ReleaseBuffer(buffer);
-				switch (wait_policy)
-				{
-					case LockWaitBlock:
-						XactLockTableWait(SnapshotDirty.xmax,
-										  relation, &tuple.t_self,
-										  XLTW_FetchUpdated);
-						break;
-					case LockWaitSkip:
-						if (!ConditionalXactLockTableWait(SnapshotDirty.xmax))
-							return false;	/* skip instead of waiting */
-						break;
-					case LockWaitError:
-						if (!ConditionalXactLockTableWait(SnapshotDirty.xmax))
-							ereport(ERROR,
-									(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
-									 errmsg("could not obtain lock on row in relation \"%s\"",
-											RelationGetRelationName(relation))));
-						break;
-				}
-				continue;		/* loop back to repeat heap_fetch */
-			}
-
-			/*
-			 * If tuple was inserted by our own transaction, we have to check
-			 * cmin against es_output_cid: cmin >= current CID means our
-			 * command cannot see the tuple, so we should ignore it. Otherwise
-			 * heap_lock_tuple() will throw an error, and so would any later
-			 * attempt to update or delete the tuple.  (We need not check cmax
-			 * because HeapTupleSatisfiesDirty will consider a tuple deleted
-			 * by our transaction dead, regardless of cmax.) We just checked
-			 * that priorXmax == xmin, so we can test that variable instead of
-			 * doing HeapTupleHeaderGetXmin again.
-			 */
-			if (TransactionIdIsCurrentTransactionId(priorXmax) &&
-				HeapTupleHeaderGetCmin(tuple.t_data) >= estate->es_output_cid)
-			{
-				ReleaseBuffer(buffer);
-				return false;
-			}
-
-			/*
-			 * This is a live tuple, so now try to lock it.
-			 */
-			test = heap_lock_tuple(relation, &tuple,
-								   estate->es_output_cid,
-								   lockmode, wait_policy,
-								   false, &buffer, &hufd);
-			/* We now have two pins on the buffer, get rid of one */
-			ReleaseBuffer(buffer);
-
-			switch (test)
-			{
-				case HeapTupleSelfUpdated:
-
-					/*
-					 * The target tuple was already updated or deleted by the
-					 * current command, or by a later command in the current
-					 * transaction.  We *must* ignore the tuple in the former
-					 * case, so as to avoid the "Halloween problem" of
-					 * repeated update attempts.  In the latter case it might
-					 * be sensible to fetch the updated tuple instead, but
-					 * doing so would require changing heap_update and
-					 * heap_delete to not complain about updating "invisible"
-					 * tuples, which seems pretty scary (heap_lock_tuple will
-					 * not complain, but few callers expect
-					 * HeapTupleInvisible, and we're not one of them).  So for
-					 * now, treat the tuple as deleted and do not process.
-					 */
-					ReleaseBuffer(buffer);
-					return false;
-
-				case HeapTupleMayBeUpdated:
-					/* successfully locked */
-					break;
-
-				case HeapTupleUpdated:
-					ReleaseBuffer(buffer);
-					if (IsolationUsesXactSnapshot())
-						ereport(ERROR,
-								(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-								 errmsg("could not serialize access due to concurrent update")));
-					if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
-						ereport(ERROR,
-								(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-								 errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
-
-					/* Should not encounter speculative tuple on recheck */
-					Assert(!HeapTupleHeaderIsSpeculative(tuple.t_data));
-					if (!ItemPointerEquals(&hufd.ctid, &tuple.t_self))
-					{
-						/* it was updated, so look at the updated version */
-						tuple.t_self = hufd.ctid;
-						/* updated row should have xmin matching this xmax */
-						priorXmax = hufd.xmax;
-						continue;
-					}
-					/* tuple was deleted, so give up */
-					return false;
-
-				case HeapTupleWouldBlock:
-					ReleaseBuffer(buffer);
-					return false;
-
-				case HeapTupleInvisible:
-					elog(ERROR, "attempted to lock invisible tuple");
-					break;
-
-				default:
-					ReleaseBuffer(buffer);
-					elog(ERROR, "unrecognized heap_lock_tuple status: %u",
-						 test);
-					return false;	/* keep compiler quiet */
-			}
-
-			/*
-			 * We got tuple - store it for use by the recheck query.
-			 */
-			ExecStorePinnedBufferHeapTuple(&tuple, slot, buffer);
-			ExecMaterializeSlot(slot);
-			break;
-		}
-
-		/*
-		 * If the referenced slot was actually empty, the latest version of
-		 * the row must have been deleted, so we need do nothing.
-		 */
-		if (tuple.t_data == NULL)
-		{
-			ReleaseBuffer(buffer);
-			return false;
-		}
-
-		/*
-		 * As above, if xmin isn't what we're expecting, do nothing.
-		 */
-		if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
-								 priorXmax))
-		{
-			ReleaseBuffer(buffer);
-			return false;
-		}
-
-		/*
-		 * If we get here, the tuple was found but failed SnapshotDirty.
-		 * Assuming the xmin is either a committed xact or our own xact (as it
-		 * certainly should be if we're trying to modify the tuple), this must
-		 * mean that the row was updated or deleted by either a committed xact
-		 * or our own xact.  If it was deleted, we can ignore it; if it was
-		 * updated then chain up to the next version and repeat the whole
-		 * process.
-		 *
-		 * As above, it should be safe to examine xmax and t_ctid without the
-		 * buffer content lock, because they can't be changing.
-		 */
-
-		/* check whether next version would be in a different partition */
-		if (HeapTupleHeaderIndicatesMovedPartitions(tuple.t_data))
-			ereport(ERROR,
-					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-					 errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
-
-		/* check whether tuple has been deleted */
-		if (ItemPointerEquals(&tuple.t_self, &tuple.t_data->t_ctid))
-		{
-			/* deleted, so forget about it */
-			ReleaseBuffer(buffer);
-			return false;
-		}
-
-		/* updated, so look at the updated row */
-		tuple.t_self = tuple.t_data->t_ctid;
-		/* updated row should have xmin matching this xmax */
-		priorXmax = HeapTupleHeaderGetUpdateXid(tuple.t_data);
-		ReleaseBuffer(buffer);
-		/* loop back to fetch next in chain */
-	}
-
-	/* signal success */
-	return true;
-}
-
 /*
  * EvalPlanQualInit -- initialize during creation of a plan state node
  * that might need to invoke EPQ processing.
@@ -2911,7 +2654,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 
 				tuple.t_self = *((ItemPointer) DatumGetPointer(datum));
 				if (!heap_fetch(erm->relation, SnapshotAny, &tuple, &buffer,
-								false, NULL))
+								NULL))
 					elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck");
 
 				/* successful, store tuple */
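
To make the new division of labor concrete, here's a sketch of the
lock-then-recheck flow a caller now uses (my illustration; the
GetTupleForTrigger() hunk above is the real example):

	test = table_lock_tuple(relation, tid, estate->es_snapshot, slot,
							estate->es_output_cid, lockmode,
							LockWaitBlock,
							TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
							&tmfd);
	if (test == TableTupleMayBeModified && tmfd.traversed)
	{
		/* we locked a newer version; recheck the quals against it */
		TupleTableSlot *epqslot = EvalPlanQual(estate, epqstate,
											   relation, rti, slot);

		if (TupIsNull(epqslot))
			return false;	/* row no longer passes the quals */
	}

That is, chasing the update chain, the buffer handling and the locking all
live inside the AM now, and EvalPlanQual() only re-evaluates the quals.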
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 95dfc4987de..c8bdc224803 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -15,7 +15,6 @@
 #include "postgres.h"
 
 #include "access/genam.h"
-#include "access/heapam.h"
 #include "access/relscan.h"
 #include "access/tableam.h"
 #include "access/transam.h"
@@ -166,35 +165,28 @@ retry:
 	/* Found tuple, try to lock it in the lockmode. */
 	if (found)
 	{
-		Buffer		buf;
-		HeapUpdateFailureData hufd;
-		HTSU_Result res;
-		HeapTupleData locktup;
-		HeapTupleTableSlot *hslot = (HeapTupleTableSlot *)outslot;
-
-		/* Only a heap tuple has item pointers. */
-		Assert(TTS_IS_HEAPTUPLE(outslot) || TTS_IS_BUFFERTUPLE(outslot));
-		ItemPointerCopy(&hslot->tuple->t_self, &locktup.t_self);
+		TM_FailureData tmfd;
+		TM_Result res;
 
 		PushActiveSnapshot(GetLatestSnapshot());
 
-		res = heap_lock_tuple(rel, &locktup, GetCurrentCommandId(false),
-							  lockmode,
-							  LockWaitBlock,
-							  false /* don't follow updates */ ,
-							  &buf, &hufd);
-		/* the tuple slot already has the buffer pinned */
-		ReleaseBuffer(buf);
+		res = table_lock_tuple(rel, &(outslot->tts_tid), GetLatestSnapshot(),
+							   outslot,
+							   GetCurrentCommandId(false),
+							   lockmode,
+							   LockWaitBlock,
+							   0 /* don't follow updates */ ,
+							   &tmfd);
 
 		PopActiveSnapshot();
 
 		switch (res)
 		{
-			case HeapTupleMayBeUpdated:
+			case TableTupleMayBeModified:
 				break;
-			case HeapTupleUpdated:
+			case TableTupleUpdated:
 				/* XXX: Improve handling here */
-				if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
+				if (ItemPointerIndicatesMovedPartitions(&tmfd.ctid))
 					ereport(LOG,
 							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 							 errmsg("tuple to be locked was already moved to another partition due to concurrent update, retrying")));
@@ -203,11 +195,17 @@ retry:
 							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 							 errmsg("concurrent update, retrying")));
 				goto retry;
-			case HeapTupleInvisible:
+			case TableTupleDeleted:
+				/* XXX: Improve handling here */
+				ereport(LOG,
+						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+						 errmsg("concurrent delete, retrying")));
+				goto retry;
+			case TableTupleInvisible:
 				elog(ERROR, "attempted to lock invisible tuple");
 				break;
 			default:
-				elog(ERROR, "unexpected heap_lock_tuple status: %u", res);
+				elog(ERROR, "unexpected table_lock_tuple status: %u", res);
 				break;
 		}
 	}
@@ -330,35 +328,28 @@ retry:
 	/* Found tuple, try to lock it in the lockmode. */
 	if (found)
 	{
-		Buffer		buf;
-		HeapUpdateFailureData hufd;
-		HTSU_Result res;
-		HeapTupleData locktup;
-		HeapTupleTableSlot *hslot = (HeapTupleTableSlot *)outslot;
-
-		/* Only a heap tuple has item pointers. */
-		Assert(TTS_IS_HEAPTUPLE(outslot) || TTS_IS_BUFFERTUPLE(outslot));
-		ItemPointerCopy(&hslot->tuple->t_self, &locktup.t_self);
+		TM_FailureData tmfd;
+		TM_Result res;
 
 		PushActiveSnapshot(GetLatestSnapshot());
 
-		res = heap_lock_tuple(rel, &locktup, GetCurrentCommandId(false),
-							  lockmode,
-							  LockWaitBlock,
-							  false /* don't follow updates */ ,
-							  &buf, &hufd);
-		/* the tuple slot already has the buffer pinned */
-		ReleaseBuffer(buf);
+		res = table_lock_tuple(rel, &(outslot->tts_tid), GetLatestSnapshot(),
+							   outslot,
+							   GetCurrentCommandId(false),
+							   lockmode,
+							   LockWaitBlock,
+							   0 /* don't follow updates */ ,
+							   &tmfd);
 
 		PopActiveSnapshot();
 
 		switch (res)
 		{
-			case HeapTupleMayBeUpdated:
+			case TableTupleMayBeModified:
 				break;
-			case HeapTupleUpdated:
+			case TableTupleUpdated:
 				/* XXX: Improve handling here */
-				if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
+				if (ItemPointerIndicatesMovedPartitions(&tmfd.ctid))
 					ereport(LOG,
 							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 							 errmsg("tuple to be locked was already moved to another partition due to concurrent update, retrying")));
@@ -367,11 +358,17 @@ retry:
 							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 							 errmsg("concurrent update, retrying")));
 				goto retry;
-			case HeapTupleInvisible:
+			case TableTupleDeleted:
+				/* XXX: Improve handling here */
+				ereport(LOG,
+						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+						 errmsg("concurrent delete, retrying")));
+				goto retry;
+			case TableTupleInvisible:
 				elog(ERROR, "attempted to lock invisible tuple");
 				break;
 			default:
-				elog(ERROR, "unexpected heap_lock_tuple status: %u", res);
+				elog(ERROR, "unexpected table_lock_tuple status: %u", res);
 				break;
 		}
 	}
@@ -392,7 +389,6 @@ void
 ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	HeapTuple	tuple;
 	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
@@ -419,16 +415,11 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		if (resultRelInfo->ri_PartitionCheck)
 			ExecPartitionCheck(resultRelInfo, slot, estate, true);
 
-		/* Materialize slot into a tuple that we can scribble upon. */
-		tuple = ExecFetchSlotHeapTuple(slot, true, NULL);
-
 		/* OK, store the tuple and create index entries for it */
-		simple_heap_insert(rel, tuple);
-		ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+		simple_table_insert(resultRelInfo->ri_RelationDesc, slot);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
-												   estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -456,13 +447,9 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	HeapTuple	tuple;
 	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
-	HeapTupleTableSlot *hsearchslot = (HeapTupleTableSlot *)searchslot;
-
-	/* We expect the searchslot to contain a heap tuple. */
-	Assert(TTS_IS_HEAPTUPLE(searchslot) || TTS_IS_BUFFERTUPLE(searchslot));
+	ItemPointer tid = &(searchslot->tts_tid);
 
 	/* For now we support only tables. */
 	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
@@ -474,14 +461,14 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		resultRelInfo->ri_TrigDesc->trig_update_before_row)
 	{
 		if (!ExecBRUpdateTriggers(estate, epqstate, resultRelInfo,
-								  &hsearchslot->tuple->t_self,
-								  NULL, slot))
+								  tid, NULL, slot))
 			skip_tuple = true;		/* "do nothing" */
 	}
 
 	if (!skip_tuple)
 	{
 		List	   *recheckIndexes = NIL;
+		bool		update_indexes;
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -489,23 +476,16 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		if (resultRelInfo->ri_PartitionCheck)
 			ExecPartitionCheck(resultRelInfo, slot, estate, true);
 
-		/* Materialize slot into a tuple that we can scribble upon. */
-		tuple = ExecFetchSlotHeapTuple(slot, true, NULL);
+		simple_table_update(rel, tid, slot,
+							estate->es_snapshot, &update_indexes);
 
-		/* OK, update the tuple and index entries for it */
-		simple_heap_update(rel, &hsearchslot->tuple->t_self, tuple);
-		ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
-
-		if (resultRelInfo->ri_NumIndices > 0 &&
-			!HeapTupleIsHeapOnly(tuple))
-			recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
-												   estate, false, NULL,
+		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
+			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
 		ExecARUpdateTriggers(estate, resultRelInfo,
-							 &(tuple->t_self),
-							 NULL, slot,
+							 tid, NULL, slot,
 							 recheckIndexes, NULL);
 
 		list_free(recheckIndexes);
@@ -525,11 +505,7 @@ ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
 	bool		skip_tuple = false;
 	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
-	HeapTupleTableSlot *hsearchslot = (HeapTupleTableSlot *)searchslot;
-
-	/* For now we support only tables and heap tuples. */
-	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
-	Assert(TTS_IS_HEAPTUPLE(searchslot) || TTS_IS_BUFFERTUPLE(searchslot));
+	ItemPointer tid = &searchslot->tts_tid;
 
 	CheckCmdReplicaIdentity(rel, CMD_DELETE);
 
@@ -538,23 +514,18 @@ ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
 		resultRelInfo->ri_TrigDesc->trig_delete_before_row)
 	{
 		skip_tuple = !ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
-										   &hsearchslot->tuple->t_self,
-										   NULL, NULL);
+										   tid, NULL, NULL);
 
 	}
 
 	if (!skip_tuple)
 	{
-		List	   *recheckIndexes = NIL;
-
 		/* OK, delete the tuple */
-		simple_heap_delete(rel, &hsearchslot->tuple->t_self);
+		simple_table_delete(rel, tid, estate->es_snapshot);
 
 		/* AFTER ROW DELETE Triggers */
 		ExecARDeleteTriggers(estate, resultRelInfo,
-							 &hsearchslot->tuple->t_self, NULL, NULL);
-
-		list_free(recheckIndexes);
+							 tid, NULL, NULL);
 	}
 }
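
A note on the apply-side pattern above (my summary): both lookup paths now
treat a concurrent update and a concurrent delete identically - log and
retry the whole lookup, because the TID in outslot may be stale afterwards:

	if (res == TableTupleUpdated || res == TableTupleDeleted)
		goto retry;

A smarter strategy is left for later, hence the XXX comments.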
 
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 76f0f9d66e5..cfa258fed96 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -23,6 +23,7 @@
 
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/tableam.h"
 #include "access/xact.h"
 #include "executor/executor.h"
 #include "executor/nodeLockRows.h"
@@ -82,11 +83,11 @@ lnext:
 		ExecRowMark *erm = aerm->rowmark;
 		Datum		datum;
 		bool		isNull;
-		HeapTupleData tuple;
-		Buffer		buffer;
-		HeapUpdateFailureData hufd;
+		ItemPointerData tid;
+		TM_FailureData tmfd;
 		LockTupleMode lockmode;
-		HTSU_Result test;
+		int			lockflags = 0;
+		TM_Result	test;
 		TupleTableSlot *markSlot;
 
 		/* clear any leftover test tuple for this rel */
@@ -112,6 +113,7 @@ lnext:
 				/* this child is inactive right now */
 				erm->ermActive = false;
 				ItemPointerSetInvalid(&(erm->curCtid));
+				ExecClearTuple(markSlot);
 				continue;
 			}
 		}
@@ -160,8 +162,8 @@ lnext:
 			continue;
 		}
 
-		/* okay, try to lock the tuple */
-		tuple.t_self = *((ItemPointer) DatumGetPointer(datum));
+		/* okay, try to lock (and fetch) the tuple */
+		tid = *((ItemPointer) DatumGetPointer(datum));
 		switch (erm->markType)
 		{
 			case ROW_MARK_EXCLUSIVE:
@@ -182,18 +184,23 @@ lnext:
 				break;
 		}
 
-		test = heap_lock_tuple(erm->relation, &tuple,
-							   estate->es_output_cid,
-							   lockmode, erm->waitPolicy, true,
-							   &buffer, &hufd);
-		ReleaseBuffer(buffer);
+		lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+		if (!IsolationUsesXactSnapshot())
+			lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+		test = table_lock_tuple(erm->relation, &tid, estate->es_snapshot,
+								markSlot, estate->es_output_cid,
+								lockmode, erm->waitPolicy,
+								lockflags,
+								&tmfd);
+
 		switch (test)
 		{
-			case HeapTupleWouldBlock:
+			case TableTupleWouldBlock:
 				/* couldn't lock tuple in SKIP LOCKED mode */
 				goto lnext;
 
-			case HeapTupleSelfUpdated:
+			case TableTupleSelfModified:
 
 				/*
 				 * The target tuple was already updated or deleted by the
@@ -204,65 +211,48 @@ lnext:
 				 * to fetch the updated tuple instead, but doing so would
 				 * require changing heap_update and heap_delete to not
 				 * complain about updating "invisible" tuples, which seems
-				 * pretty scary (heap_lock_tuple will not complain, but few
-				 * callers expect HeapTupleInvisible, and we're not one of
+				 * pretty scary (table_lock_tuple will not complain, but few
+				 * callers expect TableTupleInvisible, and we're not one of
 				 * them).  So for now, treat the tuple as deleted and do not
 				 * process.
 				 */
 				goto lnext;
 
-			case HeapTupleMayBeUpdated:
-				/* got the lock successfully */
+			case TableTupleMayBeModified:
+				/*
+				 * Got the lock successfully; the locked tuple is saved in
+				 * markSlot for EvalPlanQual testing below, if needed.
+				 */
+				if (tmfd.traversed)
+					epq_needed = true;
 				break;
 
-			case HeapTupleUpdated:
+			case TableTupleUpdated:
 				if (IsolationUsesXactSnapshot())
 					ereport(ERROR,
 							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 							 errmsg("could not serialize access due to concurrent update")));
-				if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
-					ereport(ERROR,
-							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-							 errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
-
-				if (ItemPointerEquals(&hufd.ctid, &tuple.t_self))
-				{
-					/* Tuple was deleted, so don't return it */
-					goto lnext;
-				}
-
-				/* updated, so fetch and lock the updated version */
-				if (!EvalPlanQualFetch(estate, erm->relation,
-									   lockmode, erm->waitPolicy,
-									   &hufd.ctid, hufd.xmax,
-									   markSlot))
-				{
-					/*
-					 * Tuple was deleted; or it's locked and we're under SKIP
-					 * LOCKED policy, so don't return it
-					 */
-					goto lnext;
-				}
-				/* remember the actually locked tuple's TID */
-				tuple.t_self = markSlot->tts_tid;
-
-				/* Remember we need to do EPQ testing */
-				epq_needed = true;
-
-				/* Continue loop until we have all target tuples */
+				elog(ERROR, "unexpected table_lock_tuple status: %u", test);
 				break;
 
-			case HeapTupleInvisible:
+			case TableTupleDeleted:
+				if (IsolationUsesXactSnapshot())
+					ereport(ERROR,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("could not serialize access due to concurrent delete")));
+				/* tuple was deleted so don't return it */
+				goto lnext;
+
+			case TableTupleInvisible:
 				elog(ERROR, "attempted to lock invisible tuple");
 				break;
 
 			default:
-				elog(ERROR, "unrecognized heap_lock_tuple status: %u",
-					 test);
+				elog(ERROR, "unrecognized table_lock_tuple status: %u", test);
 		}
 
 		/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
-		erm->curCtid = tuple.t_self;
+		erm->curCtid = tid;
 	}
 
 	/*
@@ -270,49 +260,6 @@ lnext:
 	 */
 	if (epq_needed)
 	{
-		/*
-		 * Fetch a copy of any rows that were successfully locked without any
-		 * update having occurred.  (We do this in a separate pass so as to
-		 * avoid overhead in the common case where there are no concurrent
-		 * updates.)  Make sure any inactive child rels have NULL test tuples
-		 * in EPQ.
-		 */
-		foreach(lc, node->lr_arowMarks)
-		{
-			ExecAuxRowMark *aerm = (ExecAuxRowMark *) lfirst(lc);
-			ExecRowMark *erm = aerm->rowmark;
-			TupleTableSlot *markSlot;
-			HeapTupleData tuple;
-			Buffer buffer;
-
-			markSlot = EvalPlanQualSlot(&node->lr_epqstate, erm->relation, erm->rti);
-
-			/* skip non-active child tables, but clear their test tuples */
-			if (!erm->ermActive)
-			{
-				Assert(erm->rti != erm->prti);	/* check it's child table */
-				ExecClearTuple(markSlot);
-				continue;
-			}
-
-			/* was tuple updated and fetched above? */
-			if (!TupIsNull(markSlot))
-				continue;
-
-			/* foreign tables should have been fetched above */
-			Assert(erm->relation->rd_rel->relkind != RELKIND_FOREIGN_TABLE);
-			Assert(ItemPointerIsValid(&(erm->curCtid)));
-
-			/* okay, fetch the tuple */
-			tuple.t_self = erm->curCtid;
-			if (!heap_fetch(erm->relation, SnapshotAny, &tuple, &buffer,
-							false, NULL))
-				elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck");
-			ExecStorePinnedBufferHeapTuple(&tuple, markSlot, buffer);
-			ExecMaterializeSlot(markSlot);
-			/* successful, use tuple in slot */
-		}
-
 		/*
 		 * Now fetch any non-locked source rows --- the EPQ logic knows how to
 		 * do that.
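
As an aside, the lockflags computation above encodes the isolation-level
split in one place (my summary of the behavior those flags request):

	lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
	if (!IsolationUsesXactSnapshot())
		lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;

Under READ COMMITTED the AM follows the update chain to the latest version
and sets tmfd->traversed, which triggers the EPQ pass; under REPEATABLE
READ / SERIALIZABLE it does not, and the TableTupleUpdated /
TableTupleDeleted cases raise serialization failures instead.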
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index fa92db130bb..c106d437a7b 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -190,31 +190,33 @@ ExecProcessReturning(ResultRelInfo *resultRelInfo,
  */
 static void
 ExecCheckHeapTupleVisible(EState *estate,
-						  HeapTuple tuple,
-						  Buffer buffer)
+						  Relation rel,
+						  TupleTableSlot *slot)
 {
 	if (!IsolationUsesXactSnapshot())
 		return;
 
-	/*
-	 * We need buffer pin and lock to call HeapTupleSatisfiesVisibility.
-	 * Caller should be holding pin, but not lock.
-	 */
-	LockBuffer(buffer, BUFFER_LOCK_SHARE);
-	if (!HeapTupleSatisfiesVisibility(tuple, estate->es_snapshot, buffer))
+	if (!table_tuple_satisfies_snapshot(rel, slot, estate->es_snapshot))
 	{
+		Datum		xminDatum;
+		TransactionId xmin;
+		bool		isnull;
+
+		xminDatum = slot_getsysattr(slot, MinTransactionIdAttributeNumber, &isnull);
+		Assert(!isnull);
+		xmin = DatumGetTransactionId(xminDatum);
+
 		/*
 		 * We should not raise a serialization failure if the conflict is
 		 * against a tuple inserted by our own transaction, even if it's not
 		 * visible to our snapshot.  (This would happen, for example, if
 		 * conflicting keys are proposed for insertion in a single command.)
 		 */
-		if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
+		if (!TransactionIdIsCurrentTransactionId(xmin))
 			ereport(ERROR,
 					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 					 errmsg("could not serialize access due to concurrent update")));
 	}
-	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 }
 
 /*
@@ -223,7 +225,8 @@ ExecCheckHeapTupleVisible(EState *estate,
 static void
 ExecCheckTIDVisible(EState *estate,
 					ResultRelInfo *relinfo,
-					ItemPointer tid)
+					ItemPointer tid,
+					TupleTableSlot *tempSlot)
 {
 	Relation	rel = relinfo->ri_RelationDesc;
 	Buffer		buffer;
@@ -234,10 +237,10 @@ ExecCheckTIDVisible(EState *estate,
 		return;
 
 	tuple.t_self = *tid;
-	if (!heap_fetch(rel, SnapshotAny, &tuple, &buffer, false, NULL))
+	if (!heap_fetch(rel, SnapshotAny, &tuple, &buffer, NULL))
 		elog(ERROR, "failed to fetch conflicting tuple for ON CONFLICT");
-	ExecCheckHeapTupleVisible(estate, &tuple, buffer);
-	ReleaseBuffer(buffer);
+	ExecStorePinnedBufferHeapTuple(&tuple, tempSlot, buffer);
+	ExecCheckHeapTupleVisible(estate, rel, tempSlot);
 }
 
 /* ----------------------------------------------------------------
@@ -319,7 +322,6 @@ ExecInsert(ModifyTableState *mtstate,
 	else
 	{
 		WCOKind		wco_kind;
-		HeapTuple	inserttuple;
 
 		/*
 		 * Constraints might reference the tableoid column, so (re-)initialize
@@ -417,16 +419,19 @@ ExecInsert(ModifyTableState *mtstate,
 					 * In case of ON CONFLICT DO NOTHING, do nothing. However,
 					 * verify that the tuple is visible to the executor's MVCC
 					 * snapshot at higher isolation levels.
+					 *
+					 * FIXME: Either comment or replace usage of
+					 * ExecGetReturningSlot(). Need a slot that's compatible
+					 * with the resultRelInfo table.
 					 */
 					Assert(onconflict == ONCONFLICT_NOTHING);
-					ExecCheckTIDVisible(estate, resultRelInfo, &conflictTid);
+					ExecCheckTIDVisible(estate, resultRelInfo, &conflictTid,
+										ExecGetReturningSlot(estate, resultRelInfo));
 					InstrCountTuples2(&mtstate->ps, 1);
 					return NULL;
 				}
 			}
 
-			inserttuple = ExecFetchSlotHeapTuple(slot, true, NULL);
-
 			/*
 			 * Before we start insertion proper, acquire our "speculative
 			 * insertion lock".  Others can use that to wait for us to decide
@@ -434,26 +439,22 @@ ExecInsert(ModifyTableState *mtstate,
 			 * waiting for the whole transaction to complete.
 			 */
 			specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
-			HeapTupleHeaderSetSpeculativeToken(inserttuple->t_data, specToken);
 
 			/* insert the tuple, with the speculative token */
-			heap_insert(resultRelationDesc, inserttuple,
-						estate->es_output_cid,
-						HEAP_INSERT_SPECULATIVE,
-						NULL);
-			slot->tts_tableOid = RelationGetRelid(resultRelationDesc);
-			ItemPointerCopy(&inserttuple->t_self, &slot->tts_tid);
+			table_insert_speculative(resultRelationDesc, slot,
+									 estate->es_output_cid,
+									 0,
+									 NULL,
+									 specToken);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, &(inserttuple->t_self),
+			recheckIndexes = ExecInsertIndexTuples(slot,
 												   estate, true, &specConflict,
 												   arbiterIndexes);
 
 			/* adjust the tuple's state accordingly */
-			if (!specConflict)
-				heap_finish_speculative(resultRelationDesc, inserttuple);
-			else
-				heap_abort_speculative(resultRelationDesc, inserttuple);
+			table_complete_speculative(resultRelationDesc, slot,
+									   specToken, specConflict);
 
 			/*
 			 * Wake up anyone waiting for our decision.  They will re-check
@@ -479,23 +480,14 @@ ExecInsert(ModifyTableState *mtstate,
 		}
 		else
 		{
-			/*
-			 * insert the tuple normally.
-			 *
-			 * Note: heap_insert returns the tid (location) of the new tuple
-			 * in the t_self field.
-			 */
-			inserttuple = ExecFetchSlotHeapTuple(slot, true, NULL);
-			heap_insert(resultRelationDesc, inserttuple,
-						estate->es_output_cid,
-						0, NULL);
-			slot->tts_tableOid = RelationGetRelid(resultRelationDesc);
-			ItemPointerCopy(&inserttuple->t_self, &slot->tts_tid);
+			/* insert the tuple normally */
+			table_insert(resultRelationDesc, slot,
+						 estate->es_output_cid,
+						 0, NULL);
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, &(inserttuple->t_self),
-													   estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -594,8 +586,8 @@ ExecDelete(ModifyTableState *mtstate,
 {
 	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
-	HTSU_Result result;
-	HeapUpdateFailureData hufd;
+	TM_Result result;
+	TM_FailureData tmfd;
 	TupleTableSlot *slot = NULL;
 	TransitionCaptureState *ar_delete_trig_tcs;
 
@@ -671,15 +663,17 @@ ExecDelete(ModifyTableState *mtstate,
 		 * mode transactions.
 		 */
 ldelete:;
-		result = heap_delete(resultRelationDesc, tupleid,
-							 estate->es_output_cid,
-							 estate->es_crosscheck_snapshot,
-							 true /* wait for commit */ ,
-							 &hufd,
-							 changingPart);
+		result = table_delete(resultRelationDesc, tupleid,
+							  estate->es_output_cid,
+							  estate->es_snapshot,
+							  estate->es_crosscheck_snapshot,
+							  true /* wait for commit */ ,
+							  &tmfd,
+							  changingPart);
+
 		switch (result)
 		{
-			case HeapTupleSelfUpdated:
+			case TableTupleSelfModified:
 
 				/*
 				 * The target tuple was already updated or deleted by the
@@ -705,7 +699,7 @@ ldelete:;
 				 * can re-execute the DELETE and then return NULL to cancel
 				 * the outer delete.
 				 */
-				if (hufd.cmax != estate->es_output_cid)
+				if (tmfd.cmax != estate->es_output_cid)
 					ereport(ERROR,
 							(errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
 							 errmsg("tuple to be updated was already modified by an operation triggered by the current command"),
@@ -714,52 +708,97 @@ ldelete:;
 				/* Else, already deleted by self; nothing to do */
 				return NULL;
 
-			case HeapTupleMayBeUpdated:
+			case TableTupleMayBeModified:
 				break;
 
-			case HeapTupleUpdated:
+			case TableTupleUpdated:
+				{
+					TupleTableSlot *inputslot;
+					TupleTableSlot *epqslot;
+
+					if (IsolationUsesXactSnapshot())
+						ereport(ERROR,
+								(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+								 errmsg("could not serialize access due to concurrent update")));
+
+					/*
+					 * Already know that we're going to need to do EPQ, so
+					 * fetch tuple directly into the right slot.
+					 */
+					EvalPlanQualBegin(epqstate, estate);
+					inputslot = EvalPlanQualSlot(epqstate, resultRelationDesc,
+												 resultRelInfo->ri_RangeTableIndex);
+
+					result = table_lock_tuple(resultRelationDesc, tupleid,
+											  estate->es_snapshot,
+											  inputslot, estate->es_output_cid,
+											  LockTupleExclusive, LockWaitBlock,
+											  TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
+											  &tmfd);
+
+					switch (result)
+					{
+						case TableTupleMayBeModified:
+							Assert(tmfd.traversed);
+							epqslot = EvalPlanQual(estate,
+												   epqstate,
+												   resultRelationDesc,
+												   resultRelInfo->ri_RangeTableIndex,
+												   inputslot);
+							if (TupIsNull(epqslot))
+								/* Tuple not passing quals anymore, exiting... */
+								return NULL;
+
+							/*
+							 * If requested, skip delete and pass back the updated
+							 * row.
+							 */
+							if (epqreturnslot)
+							{
+								*epqreturnslot = epqslot;
+								return NULL;
+							}
+							else
+								goto ldelete;
+
+						case TableTupleDeleted:
+							/* tuple already deleted; nothing to do */
+							return NULL;
+
+						default:
+							/*
+							 * TableTupleInvisible should be impossible
+							 * because we're waiting for updated row versions,
+							 * and would already have errored out if the first
+							 * version is invisible.
+							 *
+							 * TableTupleSelfModified should be impossible, as
+							 * we should otherwise have hit the
+							 * TableTupleSelfModified case in response to
+							 * table_delete above.
+							 *
+							 * TableTupleUpdated should be impossible, because
+							 * we're locking the latest version via
+							 * TUPLE_LOCK_FLAG_FIND_LAST_VERSION.
+							 */
+							elog(ERROR, "unexpected table_lock_tuple status: %u", result);
+							return NULL;
+					}
+
+					Assert(false);
+					break;
+				}
+
+			case TableTupleDeleted:
 				if (IsolationUsesXactSnapshot())
 					ereport(ERROR,
 							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-							 errmsg("could not serialize access due to concurrent update")));
-				if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
-					ereport(ERROR,
-							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-							 errmsg("tuple to be deleted was already moved to another partition due to concurrent update")));
-
-				if (!ItemPointerEquals(tupleid, &hufd.ctid))
-				{
-					TupleTableSlot *my_epqslot;
-
-					my_epqslot = EvalPlanQual(estate,
-											  epqstate,
-											  resultRelationDesc,
-											  resultRelInfo->ri_RangeTableIndex,
-											  LockTupleExclusive,
-											  &hufd.ctid,
-											  hufd.xmax);
-					if (!TupIsNull(my_epqslot))
-					{
-						*tupleid = hufd.ctid;
-
-						/*
-						 * If requested, skip delete and pass back the updated
-						 * row.
-						 */
-						if (epqreturnslot)
-						{
-							*epqreturnslot = my_epqslot;
-							return NULL;
-						}
-						else
-							goto ldelete;
-					}
-				}
+							 errmsg("could not serialize access due to concurrent delete")));
 				/* tuple already deleted; nothing to do */
 				return NULL;
 
 			default:
-				elog(ERROR, "unrecognized heap_delete status: %u", result);
+				elog(ERROR, "unrecognized table_delete status: %u", result);
 				return NULL;
 		}
 
@@ -842,7 +881,7 @@ ldelete:;
 
 				deltuple->t_self = *tupleid;
 				if (!heap_fetch(resultRelationDesc, SnapshotAny,
-								deltuple, &buffer, false, NULL))
+								deltuple, &buffer, NULL))
 					elog(ERROR, "failed to fetch deleted tuple for DELETE RETURNING");
 
 				ExecStorePinnedBufferHeapTuple(deltuple, slot, buffer);
@@ -897,11 +936,10 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	HeapTuple	updatetuple;
 	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
-	HTSU_Result result;
-	HeapUpdateFailureData hufd;
+	TM_Result	result;
+	TM_FailureData tmfd;
 	List	   *recheckIndexes = NIL;
 	TupleConversionMap *saved_tcs_map = NULL;
 
@@ -925,7 +963,7 @@ ExecUpdate(ModifyTableState *mtstate,
 	{
 		if (!ExecBRUpdateTriggers(estate, epqstate, resultRelInfo,
 								  tupleid, oldtuple, slot))
-			return NULL;        /* "do nothing" */
+			return NULL;		/* "do nothing" */
 	}
 
 	/* INSTEAD OF ROW UPDATE Triggers */
@@ -934,7 +972,7 @@ ExecUpdate(ModifyTableState *mtstate,
 	{
 		if (!ExecIRUpdateTriggers(estate, resultRelInfo,
 								  oldtuple, slot))
-			return NULL;        /* "do nothing" */
+			return NULL;		/* "do nothing" */
 	}
 	else if (resultRelInfo->ri_FdwRoutine)
 	{
@@ -960,6 +998,7 @@ ExecUpdate(ModifyTableState *mtstate,
 	{
 		LockTupleMode lockmode;
 		bool		partition_constraint_failed;
+		bool		update_indexes;
 
 		/*
 		 * Constraints might reference the tableoid column, so (re-)initialize
@@ -973,11 +1012,14 @@ ExecUpdate(ModifyTableState *mtstate,
 		 * If we generate a new candidate tuple after EvalPlanQual testing, we
 		 * must loop back here and recheck any RLS policies and constraints.
 		 * (We don't need to redo triggers, however.  If there are any BEFORE
-		 * triggers then trigger.c will have done heap_lock_tuple to lock the
+		 * triggers then trigger.c will have done table_lock_tuple to lock the
 		 * correct tuple, so there's no need to do them again.)
 		 */
 lreplace:;
 
+		/* ensure slot is independent, consider e.g. EPQ */
+		ExecMaterializeSlot(slot);
+
 		/*
 		 * If partition constraint fails, this row might get moved to another
 		 * partition, in which case we should check the RLS CHECK policy just
@@ -1145,18 +1187,16 @@ lreplace:;
 		 * needed for referential integrity updates in transaction-snapshot
 		 * mode transactions.
 		 */
-		updatetuple = ExecFetchSlotHeapTuple(slot, true, NULL);
-		result = heap_update(resultRelationDesc, tupleid,
-							 updatetuple,
-							 estate->es_output_cid,
-							 estate->es_crosscheck_snapshot,
-							 true /* wait for commit */ ,
-							 &hufd, &lockmode);
-		ItemPointerCopy(&updatetuple->t_self, &slot->tts_tid);
+		result = table_update(resultRelationDesc, tupleid, slot,
+							  estate->es_output_cid,
+							  estate->es_snapshot,
+							  estate->es_crosscheck_snapshot,
+							  true /* wait for commit */ ,
+							  &tmfd, &lockmode, &update_indexes);
 
 		switch (result)
 		{
-			case HeapTupleSelfUpdated:
+			case TableTupleSelfModified:
 
 				/*
 				 * The target tuple was already updated or deleted by the
@@ -1181,7 +1221,7 @@ lreplace:;
 				 * can re-execute the UPDATE (assuming it can figure out how)
 				 * and then return NULL to cancel the outer update.
 				 */
-				if (hufd.cmax != estate->es_output_cid)
+				if (tmfd.cmax != estate->es_output_cid)
 					ereport(ERROR,
 							(errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
 							 errmsg("tuple to be updated was already modified by an operation triggered by the current command"),
@@ -1190,64 +1230,80 @@ lreplace:;
 				/* Else, already updated by self; nothing to do */
 				return NULL;
 
-			case HeapTupleMayBeUpdated:
+			case TableTupleMayBeModified:
 				break;
 
-			case HeapTupleUpdated:
+			case TableTupleUpdated:
+				{
+					TupleTableSlot *inputslot;
+					TupleTableSlot *epqslot;
+
+					if (IsolationUsesXactSnapshot())
+						ereport(ERROR,
+								(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+								 errmsg("could not serialize access due to concurrent update")));
+
+					/*
+					 * Already know that we're going to need to do EPQ, so
+					 * fetch tuple directly into the right slot.
+					 */
+					EvalPlanQualBegin(epqstate, estate);
+					inputslot = EvalPlanQualSlot(epqstate, resultRelationDesc,
+												 resultRelInfo->ri_RangeTableIndex);
+
+					result = table_lock_tuple(resultRelationDesc, tupleid,
+											  estate->es_snapshot,
+											  inputslot, estate->es_output_cid,
+											  lockmode, LockWaitBlock,
+											  TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
+											  &tmfd);
+
+					switch (result)
+					{
+						case TableTupleMayBeModified:
+							Assert(tmfd.traversed);
+
+							epqslot = EvalPlanQual(estate,
+												   epqstate,
+												   resultRelationDesc,
+												   resultRelInfo->ri_RangeTableIndex,
+												   inputslot);
+							if (TupIsNull(epqslot))
+								/* Tuple not passing quals anymore, exiting... */
+								return NULL;
+
+							slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+							goto lreplace;
+
+						case TableTupleDeleted:
+							/* tuple already deleted; nothing to do */
+							return NULL;
+
+						default:
+							/* see table_lock_tuple call in ExecDelete() */
+							elog(ERROR, "unexpected table_lock_tuple status: %u", result);
+							return NULL;
+					}
+				}
+
+				break;
+
+			case TableTupleDeleted:
 				if (IsolationUsesXactSnapshot())
 					ereport(ERROR,
 							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-							 errmsg("could not serialize access due to concurrent update")));
-				if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
-					ereport(ERROR,
-							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-							 errmsg("tuple to be updated was already moved to another partition due to concurrent update")));
-
-				if (!ItemPointerEquals(tupleid, &hufd.ctid))
-				{
-					TupleTableSlot *epqslot;
-
-					epqslot = EvalPlanQual(estate,
-										   epqstate,
-										   resultRelationDesc,
-										   resultRelInfo->ri_RangeTableIndex,
-										   lockmode,
-										   &hufd.ctid,
-										   hufd.xmax);
-					if (!TupIsNull(epqslot))
-					{
-						*tupleid = hufd.ctid;
-						slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
-						goto lreplace;
-					}
-				}
+							 errmsg("could not serialize access due to concurrent delete")));
 				/* tuple already deleted; nothing to do */
 				return NULL;
 
 			default:
-				elog(ERROR, "unrecognized heap_update status: %u", result);
+				elog(ERROR, "unrecognized table_update status: %u", result);
 				return NULL;
 		}
 
-		/*
-		 * Note: instead of having to update the old index tuples associated
-		 * with the heap tuple, all we do is form and insert new index tuples.
-		 * This is because UPDATEs are actually DELETEs and INSERTs, and index
-		 * tuple deletion is done later by VACUUM (see notes in ExecDelete).
-		 * All we do here is insert new index tuples.  -cim 9/27/89
-		 */
-
-		/*
-		 * insert index entries for tuple
-		 *
-		 * Note: heap_update returns the tid (location) of the new tuple in
-		 * the t_self field.
-		 *
-		 * If it's a HOT update, we mustn't insert new index entries.
-		 */
-		if (resultRelInfo->ri_NumIndices > 0 && !HeapTupleIsHeapOnly(updatetuple))
-			recheckIndexes = ExecInsertIndexTuples(slot, &(updatetuple->t_self),
-												   estate, false, NULL, NIL);
+		/* insert index entries for tuple if necessary */
+		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
+			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1306,11 +1362,12 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	Relation	relation = resultRelInfo->ri_RelationDesc;
 	ExprState  *onConflictSetWhere = resultRelInfo->ri_onConflict->oc_WhereClause;
 	TupleTableSlot *existing = resultRelInfo->ri_onConflict->oc_Existing;
-	HeapTupleData tuple;
-	HeapUpdateFailureData hufd;
+	TM_FailureData tmfd;
 	LockTupleMode lockmode;
-	HTSU_Result test;
-	Buffer		buffer;
+	TM_Result	test;
+	Datum		xminDatum;
+	TransactionId xmin;
+	bool		isnull;
 
 	/* Determine lock mode to use */
 	lockmode = ExecUpdateLockMode(estate, resultRelInfo);
@@ -1321,17 +1378,18 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 * previous conclusion that the tuple is conclusively committed is not
 	 * true anymore.
 	 */
-	tuple.t_self = *conflictTid;
-	test = heap_lock_tuple(relation, &tuple, estate->es_output_cid,
-						   lockmode, LockWaitBlock, false, &buffer,
-						   &hufd);
+	test = table_lock_tuple(relation, conflictTid,
+							estate->es_snapshot,
+							existing, estate->es_output_cid,
+							lockmode, LockWaitBlock, 0,
+							&tmfd);
 	switch (test)
 	{
-		case HeapTupleMayBeUpdated:
+		case TableTupleMayBeModified:
 			/* success! */
 			break;
 
-		case HeapTupleInvisible:
+		case TableTupleInvisible:
 
 			/*
 			 * This can occur when a just inserted tuple is updated again in
@@ -1339,7 +1397,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 			 * conflicting key values are inserted.
 			 *
 			 * This is somewhat similar to the ExecUpdate()
-			 * HeapTupleSelfUpdated case.  We do not want to proceed because
+			 * TableTupleSelfModified case.  We do not want to proceed because
 			 * it would lead to the same row being updated a second time in
 			 * some unspecified order, and in contrast to plain UPDATEs
 			 * there's no historical behavior to break.
@@ -1349,7 +1407,13 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 			 * that for SQL MERGE, an exception must be raised in the event of
 			 * an attempt to update the same row twice.
 			 */
-			if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple.t_data)))
+			xminDatum = slot_getsysattr(existing,
+										MinTransactionIdAttributeNumber,
+										&isnull);
+			Assert(!isnull);
+			xmin = DatumGetTransactionId(xminDatum);
+
+			if (TransactionIdIsCurrentTransactionId(xmin))
 				ereport(ERROR,
 						(errcode(ERRCODE_CARDINALITY_VIOLATION),
 						 errmsg("ON CONFLICT DO UPDATE command cannot affect row a second time"),
@@ -1359,7 +1423,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 			elog(ERROR, "attempted to lock invisible tuple");
 			break;
 
-		case HeapTupleSelfUpdated:
+		case TableTupleSelfModified:
 
 			/*
 			 * This state should never be reached. As a dirty snapshot is used
@@ -1369,7 +1433,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 			elog(ERROR, "unexpected self-updated tuple");
 			break;
 
-		case HeapTupleUpdated:
+		case TableTupleUpdated:
 			if (IsolationUsesXactSnapshot())
 				ereport(ERROR,
 						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
@@ -1381,7 +1445,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 			 * be lock is moved to another partition due to concurrent update
 			 * of the partition key.
 			 */
-			Assert(!ItemPointerIndicatesMovedPartitions(&hufd.ctid));
+			Assert(!ItemPointerIndicatesMovedPartitions(&tmfd.ctid));
 
 			/*
 			 * Tell caller to try again from the very start.
@@ -1390,11 +1454,20 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 			 * loop here, as the new version of the row might not conflict
 			 * anymore, or the conflicting tuple has actually been deleted.
 			 */
-			ReleaseBuffer(buffer);
+			ExecClearTuple(existing);
+			return false;
+
+		case TableTupleDeleted:
+			if (IsolationUsesXactSnapshot())
+				ereport(ERROR,
+						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+						 errmsg("could not serialize access due to concurrent delete")));
+
+			ExecClearTuple(existing);
 			return false;
 
 		default:
-			elog(ERROR, "unrecognized heap_lock_tuple status: %u", test);
+			elog(ERROR, "unrecognized table_lock_tuple status: %u", test);
 	}
 
 	/* Success, the tuple is locked. */
@@ -1412,10 +1485,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 * snapshot.  This is in line with the way UPDATE deals with newer tuple
 	 * versions.
 	 */
-	ExecCheckHeapTupleVisible(estate, &tuple, buffer);
-
-	/* Store target's existing tuple in the state's dedicated slot */
-	ExecStorePinnedBufferHeapTuple(&tuple, existing, buffer);
+	ExecCheckHeapTupleVisible(estate, relation, existing);
 
 	/*
 	 * Make tuple and any needed join variables available to ExecQual and
@@ -1462,7 +1532,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 
 	/*
 	 * Note that it is possible that the target tuple has been modified in
-	 * this session, after the above heap_lock_tuple. We choose to not error
+	 * this session, after the above table_lock_tuple. We choose to not error
 	 * out in that case, in line with ExecUpdate's treatment of similar cases.
 	 * This can happen if an UPDATE is triggered from within ExecQual(),
 	 * ExecWithCheckOptions() or ExecProject() above, e.g. by selecting from a
@@ -1470,7 +1540,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, &tuple.t_self, NULL,
+	*returning = ExecUpdate(mtstate, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 08872ef9b4f..0e6a0748c8c 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -376,7 +376,7 @@ TidNext(TidScanState *node)
 		if (node->tss_isCurrentOf)
 			heap_get_latest_tid(heapRelation, snapshot, &tuple->t_self);
 
-		if (heap_fetch(heapRelation, snapshot, tuple, &buffer, false, NULL))
+		if (heap_fetch(heapRelation, snapshot, tuple, &buffer, NULL))
 		{
 			/*
 			 * Store the scanned tuple in the scan tuple slot of the scan
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index eb9e160bfd9..505fce96b0e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -19,6 +19,7 @@
 #include "access/sdir.h"
 #include "access/skey.h"
 #include "access/table.h"		/* for backward compatibility */
+#include "access/tableam.h"
 #include "nodes/lockoptions.h"
 #include "nodes/primnodes.h"
 #include "storage/bufpage.h"
@@ -28,39 +29,16 @@
 
 
 /* "options" flag bits for heap_insert */
-#define HEAP_INSERT_SKIP_WAL	0x0001
-#define HEAP_INSERT_SKIP_FSM	0x0002
-#define HEAP_INSERT_FROZEN		0x0004
-#define HEAP_INSERT_SPECULATIVE 0x0008
-#define HEAP_INSERT_NO_LOGICAL	0x0010
+#define HEAP_INSERT_SKIP_WAL	TABLE_INSERT_SKIP_WAL
+#define HEAP_INSERT_SKIP_FSM	TABLE_INSERT_SKIP_FSM
+#define HEAP_INSERT_FROZEN		TABLE_INSERT_FROZEN
+#define HEAP_INSERT_NO_LOGICAL	TABLE_INSERT_NO_LOGICAL
+#define HEAP_INSERT_SPECULATIVE 0x0010
 
 typedef struct BulkInsertStateData *BulkInsertState;
 
 #define MaxLockTupleMode	LockTupleExclusive
 
-/*
- * When heap_update, heap_delete, or heap_lock_tuple fail because the target
- * tuple is already outdated, they fill in this struct to provide information
- * to the caller about what happened.
- * ctid is the target's ctid link: it is the same as the target's TID if the
- * target was deleted, or the location of the replacement tuple if the target
- * was updated.
- * xmax is the outdating transaction's XID.  If the caller wants to visit the
- * replacement tuple, it must check that this matches before believing the
- * replacement is really a match.
- * cmax is the outdating command's CID, but only when the failure code is
- * HeapTupleSelfUpdated (i.e., something in the current transaction outdated
- * the tuple); otherwise cmax is zero.  (We make this restriction because
- * HeapTupleHeaderGetCmax doesn't work for tuples outdated in other
- * transactions.)
- */
-typedef struct HeapUpdateFailureData
-{
-	ItemPointerData ctid;
-	TransactionId xmax;
-	CommandId	cmax;
-} HeapUpdateFailureData;
-
 /*
  * Descriptor for heap table scans.
  */
@@ -150,8 +128,7 @@ extern bool heap_getnextslot(TableScanDesc sscan,
 				 ScanDirection direction, struct TupleTableSlot *slot);
 
 extern bool heap_fetch(Relation relation, Snapshot snapshot,
-		   HeapTuple tuple, Buffer *userbuf, bool keep_buf,
-		   Relation stats_relation);
+		   HeapTuple tuple, Buffer *userbuf, Relation stats_relation);
 extern bool heap_hot_search_buffer(ItemPointer tid, Relation relation,
 					   Buffer buffer, Snapshot snapshot, HeapTuple heapTuple,
 					   bool *all_dead, bool first_call);
@@ -170,19 +147,20 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 			int options, BulkInsertState bistate);
 extern void heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 				  CommandId cid, int options, BulkInsertState bistate);
-extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
+extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			HeapUpdateFailureData *hufd, bool changingPart);
-extern void heap_finish_speculative(Relation relation, HeapTuple tuple);
-extern void heap_abort_speculative(Relation relation, HeapTuple tuple);
-extern HTSU_Result heap_update(Relation relation, ItemPointer otid,
+			struct TM_FailureData *tmfd, bool changingPart);
+extern void heap_finish_speculative(Relation relation, ItemPointer tid);
+extern void heap_abort_speculative(Relation relation, ItemPointer tid);
+extern TM_Result heap_update(Relation relation, ItemPointer otid,
 			HeapTuple newtup,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			HeapUpdateFailureData *hufd, LockTupleMode *lockmode);
-extern HTSU_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
+			struct TM_FailureData *tmfd, LockTupleMode *lockmode);
+extern TM_Result heap_lock_tuple(Relation relation, ItemPointer tid,
 				CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
-				bool follow_update,
-				Buffer *buffer, HeapUpdateFailureData *hufd);
+				bool follow_update, HeapTuple tuple,
+				Buffer *buffer, struct TM_FailureData *tmfd);
+
 extern void heap_inplace_update(Relation relation, HeapTuple tuple);
 extern bool heap_freeze_tuple(HeapTupleHeader tuple,
 				  TransactionId relfrozenxid, TransactionId relminmxid,
@@ -223,7 +201,7 @@ extern void heap_vacuum_rel(Relation onerel,
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
 							 Buffer buffer);
-extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTuple stup, CommandId curcid,
+extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple stup, CommandId curcid,
 						 Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple stup, TransactionId OldestXmin,
 						 Buffer buffer);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 50b8ab93539..b257e9a2aa5 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -27,6 +27,73 @@ extern char *default_table_access_method;
 extern bool synchronize_seqscans;
 
 
+struct BulkInsertStateData;
+
+
+/*
+ * Result codes for table_{update,delete,lock}_tuple, and for visibility
+ * routines inside table AMs.
+ */
+typedef enum TM_Result
+{
+	/* Signals that the action succeeded (i.e. update/delete performed) */
+	TableTupleMayBeModified,
+
+	/* The affected tuple wasn't visible to the relevant snapshot */
+	TableTupleInvisible,
+
+	/* The affected tuple was already modified by the calling backend */
+	TableTupleSelfModified,
+
+	/* The affected tuple was updated by another transaction */
+	TableTupleUpdated,
+
+	/* The affected tuple was deleted by another transaction */
+	TableTupleDeleted,
+
+	/*
+	 * The affected tuple is currently being modified by another session. This
+	 * will only be returned if (update/delete/lock)_tuple are instructed not
+	 * to wait.
+	 */
+	TableTupleBeingModified,
+
+	/* lock couldn't be acquired, action skipped. Only used by lock_tuple */
+	TableTupleWouldBlock
+} TM_Result;
+
+
+/*
+ * When table_update, table_delete, or table_lock_tuple fail because the target
+ * tuple is already outdated, they fill in this struct to provide information
+ * to the caller about what happened.
+ * ctid is the target's ctid link: it is the same as the target's TID if the
+ * target was deleted, or the location of the replacement tuple if the target
+ * was updated.
+ * xmax is the outdating transaction's XID.  If the caller wants to visit the
+ * replacement tuple, it must check that this matches before believing the
+ * replacement is really a match.
+ * cmax is the outdating command's CID, but only when the failure code is
+ * TableTupleSelfModified (i.e., something in the current transaction outdated
+ * the tuple); otherwise cmax is zero.  (We make this restriction because
+ * HeapTupleHeaderGetCmax doesn't work for tuples outdated in other
+ * transactions.)
+ */
+typedef struct TM_FailureData
+{
+	ItemPointerData ctid;
+	TransactionId xmax;
+	CommandId	cmax;
+	bool		traversed;
+} TM_FailureData;
+
+/* "options" flag bits for heap_insert */
+#define TABLE_INSERT_SKIP_WAL		0x0001
+#define TABLE_INSERT_SKIP_FSM		0x0002
+#define TABLE_INSERT_FROZEN			0x0004
+#define TABLE_INSERT_NO_LOGICAL		0x0008
+
+
 /*
  * API struct for a table AM.  Note this must be allocated in a
  * server-lifetime manner, typically as a static const struct, which then gets
@@ -200,6 +267,62 @@ typedef struct TableAmRoutine
 											 TupleTableSlot *slot,
 											 Snapshot snapshot);
 
+	/* ------------------------------------------------------------------------
+	 * Manipulations of physical tuples.
+	 * ------------------------------------------------------------------------
+	 */
+
+	/* see table_insert() for reference about parameters */
+	void		(*tuple_insert) (Relation rel, TupleTableSlot *slot, CommandId cid,
+								 int options, struct BulkInsertStateData *bistate);
+
+	/* see table_insert_speculative() for reference about parameters */
+	void		(*tuple_insert_speculative) (Relation rel,
+											 TupleTableSlot *slot,
+											 CommandId cid,
+											 int options,
+											 struct BulkInsertStateData *bistate,
+											 uint32 specToken);
+
+	/* see table_complete_speculative() for reference about parameters */
+	void		(*tuple_complete_speculative) (Relation rel,
+											   TupleTableSlot *slot,
+											   uint32 specToken,
+											   bool succeeded);
+
+	/* see table_delete() for reference about parameters */
+	TM_Result	(*tuple_delete) (Relation rel,
+								 ItemPointer tid,
+								 CommandId cid,
+								 Snapshot snapshot,
+								 Snapshot crosscheck,
+								 bool wait,
+								 TM_FailureData *tmfd,
+								 bool changingPart);
+
+	/* see table_update() for reference about parameters */
+	TM_Result	(*tuple_update) (Relation rel,
+								 ItemPointer otid,
+								 TupleTableSlot *slot,
+								 CommandId cid,
+								 Snapshot snapshot,
+								 Snapshot crosscheck,
+								 bool wait,
+								 TM_FailureData *tmfd,
+								 LockTupleMode *lockmode,
+								 bool *update_indexes);
+
+	/* see table_lock_tuple() for reference about parameters */
+	TM_Result	(*tuple_lock) (Relation rel,
+							   ItemPointer tid,
+							   Snapshot snapshot,
+							   TupleTableSlot *slot,
+							   CommandId cid,
+							   LockTupleMode mode,
+							   LockWaitPolicy wait_policy,
+							   uint8 flags,
+							   TM_FailureData *tmfd);
+
 } TableAmRoutine;
 
 
@@ -487,6 +610,230 @@ table_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot, Snapshot snap
 }
 
 
+/* ----------------------------------------------------------------------------
+ *  Functions for manipulations of physical tuples.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * Insert a tuple from a slot into a table, via its AM routine.
+ *
+ * The options bitmask allows the caller to specify options that change the
+ * behaviour of the AM. Options might be ignored by AMs that do not support
+ * them.
+ *
+ * If the TABLE_INSERT_SKIP_WAL option is specified, the new tuple will not
+ * necessarily be logged to WAL, even for a non-temp relation. It is the AM's
+ * choice whether this optimization is supported.
+ *
+ * If the TABLE_INSERT_SKIP_FSM option is specified, AMs are free to not reuse
+ * free space in the relation. This can save some cycles when we know the
+ * relation is new and doesn't contain useful amounts of free space.  It's
+ * commonly passed directly to RelationGetBufferForTuple, see there for more info.
+ *
+ * TABLE_INSERT_FROZEN should only be specified for inserts into
+ * relfilenodes created during the current subtransaction and when
+ * there are no prior snapshots or pre-existing portals open.
+ * This causes rows to be frozen, which is an MVCC violation and
+ * requires explicit options chosen by the user.
+ *
+ * TABLE_INSERT_NO_LOGICAL force-disables the emitting of logical decoding
+ * information for the tuple. This should solely be used during table rewrites
+ * where RelationIsLogicallyLogged(relation) is not yet accurate for the new
+ * relation.
+ *
+ * Note that most of these options will be applied when inserting into the
+ * heap's TOAST table, too, if the tuple requires any out-of-line data
+ *
+ *
+ * The BulkInsertState object (if any; bistate can be NULL for default
+ * behavior) is also just passed through to RelationGetBufferForTuple.
+ *
+ * On return the slot's tts_tid and tts_tableOid are updated to reflect the
+ * insertion. But note that any toasting of fields within the slot is NOT
+ * reflected in the slot's contents.
+ */
+static inline void
+table_insert(Relation rel, TupleTableSlot *slot, CommandId cid,
+			 int options, struct BulkInsertStateData *bistate)
+{
+	rel->rd_tableam->tuple_insert(rel, slot, cid, options,
+								  bistate);
+}
+
+/*
+ * Perform a "speculative insertion". These can be backed out afterwards
+ * without aborting the whole transaction.  Other sessions can wait for the
+ * speculative insertion to be confirmed, turning it into a regular tuple, or
+ * aborted, as if it never existed.  Speculatively inserted tuples behave as
+ * "value locks" of short duration, used to implement INSERT .. ON CONFLICT.
+ *
+ * A transaction having performed a speculative insertion has to either abort,
+ * or finish the speculative insertion with
+ * table_complete_speculative(succeeded = ...).
+ */
+static inline void
+table_insert_speculative(Relation rel, TupleTableSlot *slot, CommandId cid,
+						 int options, struct BulkInsertStateData *bistate, uint32 specToken)
+{
+	rel->rd_tableam->tuple_insert_speculative(rel, slot, cid, options,
+											  bistate, specToken);
+}
+
+/*
+ * Complete "speculative insertion" started in the same transaction. If
+ * succeeded is true, the tuple is fully inserted, if false, it's removed.
+ */
+static inline void
+table_complete_speculative(Relation rel, TupleTableSlot *slot, uint32 specToken,
+						   bool succeeded)
+{
+	rel->rd_tableam->tuple_complete_speculative(rel, slot, specToken,
+												succeeded);
+}
+
+/*
+ * Delete a tuple.
+ *
+ * NB: do not call this directly unless prepared to deal with
+ * concurrent-update conditions.  Use simple_table_delete instead.
+ *
+ * Input parameters:
+ *	relation - table to be modified (caller must hold suitable lock)
+ *	tid - TID of tuple to be deleted
+ *	cid - delete command ID (used for visibility test, and stored into
+ *		cmax if successful)
+ *	crosscheck - if not InvalidSnapshot, also check tuple against this
+ *	wait - true if should wait for any conflicting update to commit/abort
+ *	changingPart - true iff the tuple is being moved to another partition
+ *		table due to an update of the partition key. Otherwise, false.
+ * Output parameters:
+ *	tmfd - filled in failure cases (see below)
+ *
+ * Normal, successful return value is TableTupleMayBeModified, which
+ * actually means we did delete it.  Failure return codes are
+ * TableTupleSelfModified, TableTupleUpdated, TableTupleDeleted, or
+ * TableTupleBeingModified (the last only possible if wait == false).
+ *
+ * In the failure cases, the routine fills *tmfd with the tuple's t_ctid,
+ * t_xmax, and, if possible, t_cmax.  See comments for
+ * struct TM_FailureData for additional info.
+ */
+static inline TM_Result
+table_delete(Relation rel, ItemPointer tid, CommandId cid,
+			 Snapshot snapshot, Snapshot crosscheck, bool wait,
+			 TM_FailureData *tmfd, bool changingPart)
+{
+	return rel->rd_tableam->tuple_delete(rel, tid, cid,
+										 snapshot, crosscheck,
+										 wait, tmfd, changingPart);
+}
+
+/*
+ * Update a tuple.
+ *
+ * NB: do not call this directly unless you are prepared to deal with
+ * concurrent-update conditions.  Use simple_table_update instead.
+ *
+ * Input parameters:
+ *	relation - table to be modified (caller must hold suitable lock)
+ *	otid - TID of old tuple to be replaced
+ *	slot - newly constructed tuple data to store
+ *	cid - update command ID (used for visibility test, and stored into
+ *		cmax/cmin if successful)
+ *	crosscheck - if not InvalidSnapshot, also check old tuple against this
+ *	wait - true if should wait for any conflicting update to commit/abort
+ * Output parameters:
+ *	tmfd - filled in failure cases (see below)
+ *	lockmode - filled with lock mode acquired on tuple
+ *	update_indexes - in success cases this is set to true if new index entries
+ *		are required for this tuple
+ *
+ * Normal, successful return value is TableTupleMayBeModified, which
+ * actually means we *did* update it.  Failure return codes are
+ * TableTupleSelfModified, TableTupleUpdated, TableTupleDeleted, or
+ * TableTupleBeingModified (the last only possible if wait == false).
+ *
+ * On success, the slot's tts_tid is set to the TID where the new tuple was
+ * stored, and *update_indexes is set to true iff the AM requires new index
+ * entries for the tuple (e.g. because the update could not be done HOT).
+ * However, any TOAST changes in the new tuple's data are not reflected into
+ * the slot.
+ *
+ * In the failure cases, the routine fills *tmfd with the tuple's t_ctid,
+ * t_xmax, and, if possible, t_cmax.  See comments for struct TM_FailureData
+ * for additional info.
+ */
+static inline TM_Result
+table_update(Relation rel, ItemPointer otid, TupleTableSlot *slot,
+			 CommandId cid, Snapshot snapshot, Snapshot crosscheck, bool wait,
+			 TM_FailureData *tmfd, LockTupleMode *lockmode,
+			 bool *update_indexes)
+{
+	return rel->rd_tableam->tuple_update(rel, otid, slot,
+										 cid, snapshot, crosscheck,
+										 wait, tmfd,
+										 lockmode, update_indexes);
+}
+
+/*
+ * Lock a tuple in the specified mode.
+ *
+ * Input parameters:
+ *	relation: relation containing tuple (caller must hold suitable lock)
+ *	tid: TID of tuple to lock
+ *	snapshot: snapshot to use for visibility determinations
+ *	cid: current command ID (used for visibility test, and stored into
+ *		tuple's cmax if lock is successful)
+ *	mode: lock mode desired
+ *	wait_policy: what to do if tuple lock is not available
+ *	flags:
+ *		If TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS, follow the update chain to
+ *		also lock descendant tuples if lock modes don't conflict.
+ *		If TUPLE_LOCK_FLAG_FIND_LAST_VERSION, follow the update chain and lock
+ *		the latest version.
+ *
+ * Output parameters:
+ *	*slot: contains the target tuple
+ *	*tmfd: filled in failure cases (see below)
+ *
+ * Function result may be:
+ *	TableTupleMayBeModified: lock was successfully acquired
+ *	TableTupleInvisible: lock failed because tuple was never visible to us
+ *	TableTupleSelfModified: lock failed because tuple updated by self
+ *	TableTupleUpdated: lock failed because tuple updated by other xact
+ *	TableTupleDeleted: lock failed because tuple deleted by other xact
+ *	TableTupleWouldBlock: lock couldn't be acquired and wait_policy is skip
+ *
+ * In the failure cases other than TableTupleInvisible, the routine fills
+ * *tmfd with the tuple's t_ctid, t_xmax, and, if possible, t_cmax.  See
+ * comments for struct TM_FailureData for additional info.
+ */
+static inline TM_Result
+table_lock_tuple(Relation rel, ItemPointer tid, Snapshot snapshot,
+				 TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
+				 LockWaitPolicy wait_policy, uint8 flags,
+				 TM_FailureData *tmfd)
+{
+	return rel->rd_tableam->tuple_lock(rel, tid, snapshot, slot,
+									   cid, mode, wait_policy,
+									   flags, tmfd);
+}
+
+
+/* ----------------------------------------------------------------------------
+ * Functions to make modifications a bit simpler.
+ * ----------------------------------------------------------------------------
+ */
+
+extern void simple_table_insert(Relation rel, TupleTableSlot *slot);
+extern void simple_table_delete(Relation rel, ItemPointer tid,
+					Snapshot snapshot);
+extern void simple_table_update(Relation rel, ItemPointer otid,
+					TupleTableSlot *slot, Snapshot snapshot,
+					bool *update_indexes);
+
+
 /* ----------------------------------------------------------------------------
  * Helper functions to implement parallel scans for block oriented AMs.
  * ----------------------------------------------------------------------------
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9003f2ce583..ceacd1c6370 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -195,12 +195,7 @@ extern LockTupleMode ExecUpdateLockMode(EState *estate, ResultRelInfo *relinfo);
 extern ExecRowMark *ExecFindRowMark(EState *estate, Index rti, bool missing_ok);
 extern ExecAuxRowMark *ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist);
 extern TupleTableSlot *EvalPlanQual(EState *estate, EPQState *epqstate,
-			 Relation relation, Index rti, LockTupleMode lockmode,
-			 ItemPointer tid, TransactionId priorXmax);
-extern bool EvalPlanQualFetch(EState *estate, Relation relation,
-				  LockTupleMode lockmode, LockWaitPolicy wait_policy,
-				  ItemPointer tid, TransactionId priorXmax,
-				  TupleTableSlot *slot);
+			 Relation relation, Index rti, TupleTableSlot *testslot);
 extern void EvalPlanQualInit(EPQState *epqstate, EState *estate,
 				 Plan *subplan, List *auxrowmarks, int epqParam);
 extern void EvalPlanQualSetPlan(EPQState *epqstate,
@@ -569,9 +564,8 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, ItemPointer tupleid,
-					  EState *estate, bool noDupErr, bool *specConflict,
-					  List *arbiterIndexes);
+extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+					  bool *specConflict, List *arbiterIndexes);
 extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
 						  ItemPointer conflictTid, List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
diff --git a/src/include/nodes/lockoptions.h b/src/include/nodes/lockoptions.h
index 8e8ccff43ca..d6b1160ab4b 100644
--- a/src/include/nodes/lockoptions.h
+++ b/src/include/nodes/lockoptions.h
@@ -58,4 +58,9 @@ typedef enum LockTupleMode
 	LockTupleExclusive
 } LockTupleMode;
 
+/* Follow tuples whose update is in progress if lock modes don't conflict */
+#define TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS	(1 << 0)
+/* Follow update chain and lock latest version of tuple */
+#define TUPLE_LOCK_FLAG_FIND_LAST_VERSION		(1 << 1)
+
 #endif							/* LOCKOPTIONS_H */
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index e7ea5cf7b56..7bf7cad5727 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -184,17 +184,4 @@ typedef struct SnapshotData
 	XLogRecPtr	lsn;			/* position in the WAL stream when taken */
 } SnapshotData;
 
-/*
- * Result codes for HeapTupleSatisfiesUpdate.
- */
-typedef enum
-{
-	HeapTupleMayBeUpdated,
-	HeapTupleInvisible,
-	HeapTupleSelfUpdated,
-	HeapTupleUpdated,
-	HeapTupleBeingUpdated,
-	HeapTupleWouldBlock			/* can be returned by heap_tuple_lock */
-} HTSU_Result;
-
 #endif							/* SNAPSHOT_H */
diff --git a/src/test/isolation/expected/partition-key-update-1.out b/src/test/isolation/expected/partition-key-update-1.out
index 37fe6a7b277..a632d7f7bad 100644
--- a/src/test/isolation/expected/partition-key-update-1.out
+++ b/src/test/isolation/expected/partition-key-update-1.out
@@ -15,7 +15,7 @@ step s1u: UPDATE foo SET a=2 WHERE a=1;
 step s2d: DELETE FROM foo WHERE a=1; <waiting ...>
 step s1c: COMMIT;
 step s2d: <... completed>
-error in steps s1c s2d: ERROR:  tuple to be deleted was already moved to another partition due to concurrent update
+error in steps s1c s2d: ERROR:  tuple to be locked was already moved to another partition due to concurrent update
 step s2c: COMMIT;
 
 starting permutation: s1b s2b s2d s1u s2c s1c
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b301bce4b1b..0015fc0ead9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -943,7 +943,6 @@ HSParser
 HSpool
 HStore
 HTAB
-HTSU_Result
 HTSV_Result
 HV
 Hash
@@ -982,7 +981,6 @@ HeapTupleData
 HeapTupleFields
 HeapTupleHeader
 HeapTupleHeaderData
-HeapUpdateFailureData
 HistControl
 HotStandbyState
 I32
@@ -2282,6 +2280,8 @@ TBMSharedIteratorState
 TBMStatus
 TBlockState
 TIDBitmap
+TM_FailureData
+TM_Result
 TOKEN_DEFAULT_DACL
 TOKEN_INFORMATION_CLASS
 TOKEN_PRIVILEGES
-- 
2.21.0.dirty
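
To make the new API concrete, here is a minimal sketch of an AM wiring up
the modification callbacks added in the tableam.h hunk above. The
TableAmRoutine fields and callback signatures come from the patch itself;
the myam_* names, the chosen includes, and the (essentially empty) callback
bodies are hypothetical, shown only for illustration:

#include "postgres.h"

#include "access/tableam.h"
#include "executor/tuptable.h"
#include "utils/rel.h"

/* sketch: an insert callback matching the tuple_insert signature */
static void
myam_tuple_insert(Relation rel, TupleTableSlot *slot, CommandId cid,
				  int options, struct BulkInsertStateData *bistate)
{
	/* AM-specific storage work goes here; it must also fill slot->tts_tid */
	slot->tts_tableOid = RelationGetRelid(rel);
}

/* sketch: a delete callback matching the tuple_delete signature */
static TM_Result
myam_tuple_delete(Relation rel, ItemPointer tid, CommandId cid,
				  Snapshot snapshot, Snapshot crosscheck, bool wait,
				  TM_FailureData *tmfd, bool changingPart)
{
	/* AM-specific visibility checking and deletion goes here */
	return TableTupleMayBeModified;
}

static const TableAmRoutine myam_methods = {
	.tuple_insert = myam_tuple_insert,
	.tuple_delete = myam_tuple_delete,
	/* tuple_insert_speculative, tuple_update, tuple_lock etc. omitted */
};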

#128Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#127)
Re: Pluggable Storage - Andres's take

On Fri, Mar 22, 2019 at 5:16 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

Attached is a version of just the first patch. I'm still updating it,
but it's getting closer to commit:

- There were no tests testing EPQ interactions with DELETE, and only an
accidental test for EPQ in UPDATE with a concurrent DELETE. I've added
tests. Plan to commit that ahead of the big change.

- I was pretty unhappy about how the EPQ integration looked before; I've
changed that now.

I still wonder if we should restore EvalPlanQualFetch and move the
table_lock_tuple() calls in ExecDelete/Update into it. But it seems
like it'd not gain that much, because there's custom surrounding code,
and it's not that much code.

- I changed heapam_tuple_lock to return *WouldBlock rather than just
the last result. I think that's one of the reasons Haribabu had
neutered a few asserts.

- I moved comments from heapam.h to tableam.h where appropriate

- I updated the name of HeapUpdateFailureData to TM_FailureData, and
HTSU_Result to TM_Result; TM_Result's members now properly distinguish
between update and modification (delete & update).

- I separated the HEAP_INSERT_ flags into TABLE_INSERT_* and
HEAP_INSERT_ with the latter being a copy of TABLE_INSERT_ with the
sole addition of _SPECULATIVE. table_insert_speculative callers now
don't specify that anymore.

Pending work:
- Wondering if table_insert/delete/update should rather be
table_tuple_insert etc. Would be a bit more consistent with the
callback names, but a bigger departure from existing code.

- I'm not yet happy with TableTupleDeleted computation in heapam.c, I
want to revise that further

- formatting

- commit message

- a few comments need a bit of polishing (ExecCheckTIDVisible,
heapam_tuple_lock)

- Rename TableTupleMayBeModified to TableTupleOk, but also probably a
s/TableTuple/TableMod/

- I'll probably move TUPLE_LOCK_FLAG_LOCK_* into tableam.h

- two more passes through the patch

Thanks for the corrections.

On 2019-03-21 15:07:04 +1100, Haribabu Kommi wrote:

As you are modifying the 0003 patch for the modify APIs, I went and
reviewed the existing patch and found a couple of corrections that are
needed, in case you have not taken care of them already.

Some I have...

+ /* Update the tuple with table oid */
+ slot->tts_tableOid = RelationGetRelid(relation);
+ if (slot->tts_tableOid != InvalidOid)
+ tuple->t_tableOid = slot->tts_tableOid;

Setting slot->tts_tableOid is not required in this function; the check
happens right after it is set. The above code is present in both
heapam_heap_insert and heapam_heap_insert_speculative.

I'm not following? Those functions are independent?

In those functions, slot->tts_tableOid is set and then checked for
validity in the next statement. Callers of table_insert should have
already set it, so is setting the value and then checking it on the next
line really required? The value cannot be InvalidOid.

+ slot->tts_tableOid = RelationGetRelid(relation);

In heapam_heap_update, I don't think there is a need to update
slot->tts_tableOid.

Why?

slot->tts_tableOid should have been updated before the call to
heap_update; is setting it again after heap_update required?

I also observed slot->tts_tableOid being set after the table_insert_XXX
calls in the ExecInsert function.

Is this to make sure that the AM hasn't modified that value?

+ default:
+ elog(ERROR, "unrecognized heap_update status: %u", result);

heap_update --> table_update?

+ default:
+ elog(ERROR, "unrecognized heap_delete status: %u", result);

same as above?

I've fixed that in a number of places.

+ /*hari FIXME*/
+ /*Assert(result != HeapTupleUpdated && hufd.traversed);*/

Removing the commented-out code in both the ExecDelete and ExecUpdate functions.

I don't think that's the right fix. I've refactored that code
significantly now, and restored the assert in an, IMO, correct version.

OK.

+ /**/
+ if (epqreturnslot)
+ {
+ *epqreturnslot = epqslot;
+ return NULL;
+ }

comment update missed?

Well, you'd deleted a comment around there ;). I've added something back
now...

This is not the only problem I could have introduced; all the comments
listed were introduced by me ;).

Regards,
Haribabu Kommi
Fujitsu Australia

#129Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#127)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-03-21 11:15:57 -0700, Andres Freund wrote:

Pending work:
- Wondering if table_insert/delete/update should rather be
table_tuple_insert etc. Would be a bit more consistent with the
callback names, but a bigger departure from existing code.

I've left this as is.

- I'm not yet happy with TableTupleDeleted computation in heapam.c, I
want to revise that further

I changed that. Found a bunch of untested paths, I've pushed tests for
those already.

- formatting

Done that.

- commit message

Done that.

- a few comments need a bit of polishing (ExecCheckTIDVisible, heapam_tuple_lock)

Done that.

- Rename TableTupleMayBeModified to TableTupleOk, but also probably a s/TableTuple/TableMod/

It's now TM_*.

/*
* Result codes for table_{update,delete,lock}_tuple, and for visibility
* routines inside table AMs.
*/
typedef enum TM_Result
{
/*
* Signals that the action succeeded (i.e. update/delete performed, lock
* was acquired)
*/
TM_Ok,

/* The affected tuple wasn't visible to the relevant snapshot */
TM_Invisible,

/* The affected tuple was already modified by the calling backend */
TM_SelfModified,

/*
* The affected tuple was updated by another transaction. This includes
* the case where tuple was moved to another partition.
*/
TM_Updated,

/* The affected tuple was deleted by another transaction */
TM_Deleted,

/*
* The affected tuple is currently being modified by another session. This
* will only be returned if (update/delete/lock)_tuple are instructed not
* to wait.
*/
TM_BeingModified,

/* lock couldn't be acquired, action skipped. Only used by lock_tuple */
TM_WouldBlock
} TM_Result;

- I'll probably move TUPLE_LOCK_FLAG_LOCK_* into tableam.h

Done.

- two more passes through the patch

One of them completed. Which is good, because there was a subtle bug in
heapam_tuple_lock (*tid was adjusted to be the followup tuple after the
heap_fetch(), before going to heap_lock_tuple - but that's wrong, it
should only be adjusted when heap_fetch()ing the next version).

Greetings,

Andres Freund

#130Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#127)
17 attachment(s)
Re: Pluggable Storage - Andres's take

Hi,

(sorry, I somehow miskeyed, and sent a partial version of this email
before it was ready)

On 2019-03-21 11:15:57 -0700, Andres Freund wrote:

Pending work:
- Wondering if table_insert/delete/update should rather be
table_tuple_insert etc. Would be a bit more consistent with the
callback names, but a bigger departure from existing code.

I've left this as is.

- I'm not yet happy with TableTupleDeleted computation in heapam.c, I
want to revise that further

I changed that. Found a bunch of untested paths, I've pushed tests for
those already.

- formatting

Done that.

- commit message

Done that.

- a few comments need a bit of polishing (ExecCheckTIDVisible, heapam_tuple_lock)

Done that.

- Rename TableTupleMayBeModified to TableTupleOk, but also probably a s/TableTuple/TableMod/

It's now TM_*.

/*
* Result codes for table_{update,delete,lock}_tuple, and for visibility
* routines inside table AMs.
*/
typedef enum TM_Result
{
/*
* Signals that the action succeeded (i.e. update/delete performed, lock
* was acquired)
*/
TM_Ok,

/* The affected tuple wasn't visible to the relevant snapshot */
TM_Invisible,

/* The affected tuple was already modified by the calling backend */
TM_SelfModified,

/*
* The affected tuple was updated by another transaction. This includes
* the case where tuple was moved to another partition.
*/
TM_Updated,

/* The affected tuple was deleted by another transaction */
TM_Deleted,

/*
* The affected tuple is currently being modified by another session. This
* will only be returned if (update/delete/lock)_tuple are instructed not
* to wait.
*/
TM_BeingModified,

/* lock couldn't be acquired, action skipped. Only used by lock_tuple */
TM_WouldBlock
} TM_Result;
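
For illustration, a caller-side sketch of how these renamed codes get
handled, modeled on the ExecDelete() switch from the patch upthread. The
wrapper function and the elided EPQ/serialization-failure handling are
assumptions, not part of the patch:

/*
 * Sketch only: react to table_delete()'s result codes.  Modeled on
 * ExecDelete(); the real code retries via EvalPlanQual() where noted.
 */
static void
delete_tuple_example(Relation rel, ItemPointer tid, EState *estate)
{
	TM_FailureData tmfd;
	TM_Result	result;

	result = table_delete(rel, tid, estate->es_output_cid,
						  estate->es_snapshot,
						  estate->es_crosscheck_snapshot,
						  true /* wait for commit */ ,
						  &tmfd, false /* changingPart */ );

	switch (result)
	{
		case TM_Ok:
			/* the delete was performed */
			break;

		case TM_SelfModified:
			/* already deleted by the current command; nothing to do */
			break;

		case TM_Updated:
		case TM_Deleted:
			/* concurrent modification: re-check via EPQ or error out */
			break;

		default:
			elog(ERROR, "unrecognized table_delete status: %u", result);
	}
}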

- I'll probably move TUPLE_LOCK_FLAG_LOCK_* into tableam.h

Done.

- two more passes through the patch

One of them completed. Which is good, because there was a subtle bug in
heapam_tuple_lock (*tid was adjusted to be the followup tuple after the
heap_fetch(), before going to heap_lock_tuple - but that's wrong, it
should only be adjusted when heap_fetch()ing the next version).

I'm pretty happy with that last version (of the first patch). I'm
planning to do one more pass, and then push.

There are no meaningful changes to later patches in the series besides
the followup changes required by the changes in the first patch.

Greetings,

Andres Freund

Attachments:

v21-0001-tableam-Add-tuple_-insert-delete-update-lock-and.patch.gz (application/x-patch-gzip)
v21-0002-tableam-Add-fetch_row_version.patch.gzapplication/x-patch-gzipDownload
v21-0003-tableam-Add-use-tableam_fetch_follow_check.patch.gzapplication/x-patch-gzipDownload
v21-0004-tableam-Add-table_get_latest_tid.patch.gzapplication/x-patch-gzipDownload
v21-0005-tableam-multi_insert-and-slotify-COPY.patch.gzapplication/x-patch-gzipDownload
v21-0006-tableam-finish_bulk_insert.patch.gzapplication/x-patch-gzipDownload
v21-0007-tableam-slotify-CREATE-TABLE-AS-and-CREATE-MATER.patch.gzapplication/x-patch-gzipDownload
v21-0008-tableam-index-builds.patch.gzapplication/x-patch-gzipDownload
v21-0009-tableam-relation-creation-VACUUM-FULL-CLUSTER-SE.patch.gzapplication/x-patch-gzipDownload
v21-0010-tableam-VACUUM-and-ANALYZE.patch.gzapplication/x-patch-gzipDownload
v21-0011-tableam-planner-size-estimation.patch.gzapplication/x-patch-gzipDownload
v21-0012-tableam-Sample-Scan-Support.patch.gzapplication/x-patch-gzipDownload
v21-0013-tableam-bitmap-heap-scan.patch.gzapplication/x-patch-gzipDownload
v21-0014-tableam-Only-allow-heap-in-a-number-of-contrib-m.patch.gzapplication/x-patch-gzipDownload
v21-0015-WIP-Move-xid-horizon-computation-for-page-level-.patch.gzapplication/x-patch-gzipDownload
v21-0016-tableam-Add-function-to-determine-newest-xid-amo.patch.gzapplication/x-patch-gzipDownload
v21-0017-tableam-Fetch-tuples-for-triggers-EPQ-using-a-pr.patch.gzapplication/x-patch-gzipDownload
#131Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#130)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-03-23 20:16:30 -0700, Andres Freund wrote:

I'm pretty happy with that last version (of the first patch). I'm
planning to do one more pass, and then push.

And done, after a bunch of mostly cosmetic changes (renaming
ExecCheckHeapTupleVisible to ExecCheckTupleVisible, removing an
unnecessary change in heap_lock_tuple parameters, a bunch of comments,
stuff like that). Let's see what the buildfarm says.

Luckily, the remaining commits are all a good bit smaller.

Greetings,

Andres Freund

#132Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#96)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-02-22 14:52:08 -0500, Robert Haas wrote:

On Fri, Feb 22, 2019 at 11:19 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

Thanks for the review. Attached v2.

Thanks. I took this, combined it with Andres's
v12-0040-WIP-Move-xid-horizon-computation-for-page-level-.patch, did
some polishing of the code and comments, and pgindented. Here's what
I ended up with; see what you think.

I pushed this after some fairly minor changes, directly including the
patch to route the horizon computation through tableam. The only real
change is that I removed the table relfilenode from the nbtree/hash
deletion WAL record - it was only required to access the heap without
accessing the catalog and was unused now. Also added a WAL version
bump.

It seems possible that some other AM might want to generalize the
prefetch logic from heapam.c, but I think it's fair to defer that until
such an AM wants to do so.
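
For reference, "routing the horizon computation through tableam" means
an index AM now asks the table AM for the horizon instead of reading
heap pages itself. A hedged sketch of such a call follows; the function
name is assumed from the PG12-era API and may differ in later releases:

#include "postgres.h"
#include "access/tableam.h"

/*
 * Hedged sketch: an index AM determining the latest xid among a batch
 * of to-be-deleted tuples via the table AM, rather than accessing the
 * heap (and hence the catalog) directly.
 */
static TransactionId
horizon_for_deleted_items(Relation heapRel, ItemPointerData *tids, int nitems)
{
	return table_compute_xid_horizon_for_tuples(heapRel, tids, nitems);
}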

Greetings,

Andres Freund

#133Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#132)
2 attachment(s)
Re: Pluggable Storage - Andres's take

On Wed, Mar 27, 2019 at 11:17 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-02-22 14:52:08 -0500, Robert Haas wrote:

On Fri, Feb 22, 2019 at 11:19 AM Amit Khandekar <amitdkhan.pg@gmail.com>

wrote:

Thanks for the review. Attached v2.

Thanks. I took this, combined it with Andres's
v12-0040-WIP-Move-xid-horizon-computation-for-page-level-.patch, did
some polishing of the code and comments, and pgindented. Here's what
I ended up with; see what you think.

I pushed this after some fairly minor changes, directly including the
patch to route the horizon computation through tableam. The only real
change is that I removed the table relfilenode from the nbtree/hash
deletion WAL record - it was only required to access the heap without
accessing the catalog and was unused now. Also added a WAL version
bump.

It seems possible that some other AM might want to generalize the
prefetch logic from heapam.c, but I think it's fair to defer that until
such an AM wants to do so.

As I see that you are fixing some typos in the code that has been
committed, I just want to share some more corrections that I found in
the patches committed so far.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0001-dA-to-show-Table-type-access-method.patchapplication/octet-stream; name=0001-dA-to-show-Table-type-access-method.patchDownload
From 3c32fc2cdc294f183f384810942003a4f97c4622 Mon Sep 17 00:00:00 2001
From: Hari Babu <kommi.haribabu@gmail.com>
Date: Mon, 25 Mar 2019 14:48:02 +1100
Subject: [PATCH 1/2] \dA to show Table type access method

---
 src/bin/psql/describe.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 8129e3ccbf..2f8a4d752a 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -169,9 +169,11 @@ describeAccessMethods(const char *pattern, bool verbose)
 					  "SELECT amname AS \"%s\",\n"
 					  "  CASE amtype"
 					  " WHEN 'i' THEN '%s'"
+					  " WHEN 't' THEN '%s'"
 					  " END AS \"%s\"",
 					  gettext_noop("Name"),
 					  gettext_noop("Index"),
+					  gettext_noop("Table"),
 					  gettext_noop("Type"));
 
 	if (verbose)
-- 
2.20.1.windows.1

0002-Typos-and-comemnt-corrections.patchapplication/octet-stream; name=0002-Typos-and-comemnt-corrections.patchDownload
From c2011ede1ea17868b445cbe8a7fc7401c75c98e8 Mon Sep 17 00:00:00 2001
From: Hari Babu <kommi.haribabu@gmail.com>
Date: Mon, 25 Mar 2019 14:48:11 +1100
Subject: [PATCH 2/2] Typos and comemnt corrections

---
 src/backend/access/heap/heapam_handler.c |  2 +-
 src/include/access/tableam.h             | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 581a6bd9d1..4185d61d69 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -12,7 +12,7 @@
  *
  *
  * NOTES
- *	  This files wires up the lower level heapam.c et routines with the
+ *	  This files wires up the lower level heapam.c etc routines with the
  *	  tableam abstraction.
  *
  *-------------------------------------------------------------------------
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7101d46c02..c571f8a899 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -164,7 +164,7 @@ typedef struct TableAmRoutine
 	 * synchronized scans, or page mode may be used (although not every AM
 	 * will support those).
 	 *
-	 * is_{bitmapscan, samplescan} specify whether the scan is inteded to
+	 * is_{bitmapscan, samplescan} specify whether the scan is intended to
 	 * support those types of scans.
 	 *
 	 * if temp_snap is true, the snapshot will need to be deallocated at
@@ -220,7 +220,7 @@ typedef struct TableAmRoutine
 	Size		(*parallelscan_initialize) (Relation rel, ParallelTableScanDesc pscan);
 
 	/*
-	 * Reinitilize `pscan` for a new scan. `rel` will be the same relation as
+	 * Reinitialize `pscan` for a new scan. `rel` will be the same relation as
 	 * when `pscan` was initialized by parallelscan_initialize.
 	 */
 	void		(*parallelscan_reinitialize) (Relation rel, ParallelTableScanDesc pscan);
@@ -913,7 +913,7 @@ table_delete(Relation rel, ItemPointer tid, CommandId cid,
  * Input parameters:
  *	relation - table to be modified (caller must hold suitable lock)
  *	otid - TID of old tuple to be replaced
- *	newtup - newly constructed tuple data to store
+ *	slot - newly constructed tuple data to store
  *	cid - update command ID (used for visibility test, and stored into
  *		cmax/cmin if successful)
  *	crosscheck - if not InvalidSnapshot, also check old tuple against this
@@ -929,8 +929,8 @@ table_delete(Relation rel, ItemPointer tid, CommandId cid,
  * TM_SelfModified, TM_Updated, or TM_BeingModified
  * (the last only possible if wait == false).
  *
- * On success, the header fields of *newtup are updated to match the new
- * stored tuple; in particular, newtup->t_self is set to the TID where the
+ * On success, the slot's tts_tid and tts_tableOid are updated to match the new
+ * stored tuple; in particular, slot->tts_tid is set to the TID where the
  * new tuple was inserted, and its HEAP_ONLY_TUPLE flag is set iff a HOT
  * update was done.  However, any TOAST changes in the new tuple's
  * data are not reflected into *newtup.
@@ -965,7 +965,7 @@ table_update(Relation rel, ItemPointer otid, TupleTableSlot *slot,
  *	flags:
  *		If TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS, follow the update chain to
  *		also lock descendant tuples if lock modes don't conflict.
- *		If TUPLE_LOCK_FLAG_FIND_LAST_VERSION, update chain and lock lastest
+ *		If TUPLE_LOCK_FLAG_FIND_LAST_VERSION, update chain and lock latest
  *		version.
  *
  * Output parameters:
-- 
2.20.1.windows.1
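
Taking the corrected table_update() comments above together, a hedged
sketch of a caller follows - the parameter list is inferred from the
hunk headers above plus the PG12-era API, so treat it as approximate
rather than a verbatim API reference:

#include "postgres.h"
#include "access/tableam.h"

/*
 * Hedged sketch of a table_update() caller, per the corrected comments
 * above: on success the slot's tts_tid identifies the newly stored
 * tuple version.  The argument list is assumed/simplified and may not
 * match the committed signature exactly.
 */
static bool
update_row_sketch(Relation rel, ItemPointer otid, TupleTableSlot *slot,
				  CommandId cid, Snapshot snapshot)
{
	TM_FailureData tmfd;
	LockTupleMode lockmode;
	bool		update_indexes;
	TM_Result	result;

	result = table_update(rel, otid, slot, cid, snapshot,
						  InvalidSnapshot,	/* no crosscheck snapshot */
						  true,				/* wait for concurrent updaters */
						  &tmfd, &lockmode, &update_indexes);

	if (result == TM_Ok && update_indexes)
	{
		/*
		 * slot->tts_tid now points at the new tuple version; the caller
		 * (not the AM) is responsible for inserting new index entries.
		 */
	}
	return result == TM_Ok;
}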

#134Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#133)
Re: Pluggable Storage - Andres's take

On 2019-03-29 18:38:46 +1100, Haribabu Kommi wrote:

As I see that you are fixing some typos in the code that has been
committed, I just want to share some more corrections that I found in
the patches committed so far.

Pushed both, thanks!

#135Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#124)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-03-16 23:21:31 +1100, Haribabu Kommi wrote:

updated patches are attached.

Now that nearly all of the tableam patches are committed (with the
exception of the copy.c changes which are for bad reasons discussed at
[1]) I'm looking at the docs changes.

What made you rename indexam.sgml to am.sgml, instead of creating a
separate tableam.sgml? Seems easier to just have a separate file?

I'm currently not planning to include the function-by-function API
reference you have in your patchset, as I think it's more reasonable to
centralize all of it in tableam.h. I think I've included most of the
information there - could you check whether you agree?

[1]: /messages/by-id/CAKJS1f98Fa+QRTGKwqbtz0M=Cy1EHYR8Q-W08cpA78tOy4euKQ@mail.gmail.com

Greetings,

Andres Freund

#136Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#135)
3 attachment(s)
Re: Pluggable Storage - Andres's take

On Tue, Apr 2, 2019 at 10:18 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-03-16 23:21:31 +1100, Haribabu Kommi wrote:

updated patches are attached.

Now that nearly all of the tableam patches are committed (with the
exception of the copy.c changes which are for bad reasons discussed at
[1]) I'm looking at the docs changes.

Thanks.

What made you rename indexam.sgml to am.sgml, instead of creating a
separate tableam.sgml? Seems easier to just have a separate file?

No specific reason; I just thought of putting all the access methods in
one file. I can change it to tableam.sgml.

I'm currently not planning to include the function-by-function API
reference you have in your patchset, as I think it's more reasonable to
centralize all of it in tableam.h. I think I've included most of the
information there - could you check whether you agree?

I checked, and all the comments and explanations provided in tableam.h
are good enough to understand. I also updated the docs section to add
some more details from the tableam.h comments.

I can understand your point about avoiding a function-by-function API
reference, as the user can check the code comments directly. Still, I
feel some people may refer to the docs for API changes. I am fine with
removing it, based on your opinion.

Added current set of doc patches for your reference.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

0001-Rename-indexam.sgml-to-am.sgml.patchapplication/octet-stream; name=0001-Rename-indexam.sgml-to-am.sgml.patchDownload
From 26349564d145dc6e6f6321e77533108ebe64b0bb Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 16:13:56 +1100
Subject: [PATCH 1/3] Rename indexam.sgml to am.sgml

Reorganize am as both table and index

There is not much table access methods info.
---
 doc/src/sgml/{indexam.sgml => am.sgml} | 55 +++++++++++++++++---------
 doc/src/sgml/filelist.sgml             |  2 +-
 doc/src/sgml/postgres.sgml             |  2 +-
 doc/src/sgml/xindex.sgml               |  2 +-
 4 files changed, 40 insertions(+), 21 deletions(-)
 rename doc/src/sgml/{indexam.sgml => am.sgml} (97%)

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/am.sgml
similarity index 97%
rename from doc/src/sgml/indexam.sgml
rename to doc/src/sgml/am.sgml
index b56d3b3daa..b2a97f20aa 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/am.sgml
@@ -1,16 +1,34 @@
-<!-- doc/src/sgml/indexam.sgml -->
+<!-- doc/src/sgml/am.sgml -->
 
-<chapter id="indexam">
- <title>Index Access Method Interface Definition</title>
+<chapter id="am">
+ <title>Access Method Interface Definition</title>
 
   <para>
    This chapter defines the interface between the core
-   <productname>PostgreSQL</productname> system and <firstterm>index access
-   methods</firstterm>, which manage individual index types.  The core system
-   knows nothing about indexes beyond what is specified here, so it is
-   possible to develop entirely new index types by writing add-on code.
+   <productname>PostgreSQL</productname> system and <firstterm>access
+   methods</firstterm>, which manage individual <literal>INDEX</literal>
+   and <literal>TABLE</literal> types.  The core system knows nothing
+   about these access methods beyond what is specified here, so it is
+   possible to develop entirely new access method types by writing add-on code.
   </para>
 
+ <sect1 id="table-access-methods">
+  <title>Overview of Table access methods </title>
+
+  <para>
+   All Tables in <productname>PostgreSQL</productname> are the primary
+   data store. Each table is stored as its own physical <firstterm>relation</firstterm>
+   and so is described by an entry in the <structname>pg_class</structname>
+   catalog. The table contents are entirely under the control of its
+   access method. (All the access methods furthermore use the standard page
+   layout described in <xref linkend="storage-page-layout"/>.)
+  </para>
+
+ </sect1>
+ 
+ <sect1 id="index-access-methods">
+  <title>Overview of Index access methods</title>
+
   <para>
    All indexes in <productname>PostgreSQL</productname> are what are known
    technically as <firstterm>secondary indexes</firstterm>; that is, the index is
@@ -43,7 +61,7 @@
    are reclaimed.
   </para>
 
- <sect1 id="index-api">
+ <sect2 id="index-api">
   <title>Basic API Structure for Indexes</title>
 
   <para>
@@ -217,9 +235,9 @@ typedef struct IndexAmRoutine
    conditions.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-functions">
+ <sect2 id="index-functions">
   <title>Index Access Method Functions</title>
 
   <para>
@@ -710,9 +728,9 @@ amparallelrescan (IndexScanDesc scan);
    the beginning.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-scanning">
+ <sect2 id="index-scanning">
   <title>Index Scanning</title>
 
   <para>
@@ -865,9 +883,9 @@ amparallelrescan (IndexScanDesc scan);
    if its internal implementation is unsuited to one API or the other.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-locking">
+ <sect2 id="index-locking">
   <title>Index Locking Considerations</title>
 
   <para>
@@ -979,9 +997,9 @@ amparallelrescan (IndexScanDesc scan);
    reduce the frequency of such transaction cancellations.
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-unique-checks">
+ <sect2 id="index-unique-checks">
   <title>Index Uniqueness Checks</title>
 
   <para>
@@ -1128,9 +1146,9 @@ amparallelrescan (IndexScanDesc scan);
     </itemizedlist>
   </para>
 
- </sect1>
+ </sect2>
 
- <sect1 id="index-cost-estimation">
+ <sect2 id="index-cost-estimation">
   <title>Index Cost Estimation Functions</title>
 
   <para>
@@ -1377,5 +1395,6 @@ cost_qual_eval(&amp;index_qual_cost, path-&gt;indexquals, root);
    Examples of cost estimator functions can be found in
    <filename>src/backend/utils/adt/selfuncs.c</filename>.
   </para>
+ </sect2>
  </sect1>
 </chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index a03ea1427b..52a5efca94 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -89,7 +89,7 @@
 <!ENTITY gin        SYSTEM "gin.sgml">
 <!ENTITY brin       SYSTEM "brin.sgml">
 <!ENTITY planstats    SYSTEM "planstats.sgml">
-<!ENTITY indexam    SYSTEM "indexam.sgml">
+<!ENTITY am         SYSTEM "am.sgml">
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 96d196d229..9dce0c5f81 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -250,7 +250,7 @@
   &tablesample-method;
   &custom-scan;
   &geqo;
-  &indexam;
+  &am;
   &generic-wal;
   &btree;
   &gist;
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index 9446f8b836..4fa821160c 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -36,7 +36,7 @@
    described in <classname>pg_am</classname>.  It is possible to add a
    new index access method by writing the necessary code and
    then creating an entry in <classname>pg_am</classname> &mdash; but that is
-   beyond the scope of this chapter (see <xref linkend="indexam"/>).
+   beyond the scope of this chapter (see <xref linkend="am"/>).
   </para>
 
   <para>
-- 
2.20.1.windows.1

0002-Doc-updates-for-pluggable-table-access-method-syntax.patchapplication/octet-stream; name=0002-Doc-updates-for-pluggable-table-access-method-syntax.patchDownload
From 4874207136d593fbccd08a5eca918562b9def6ba Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 7 Feb 2019 15:59:48 +1100
Subject: [PATCH 2/3] Doc updates for pluggable table access method syntax

Default_table_access_method GUC

CREATE ACCESS METHOD ... TYPE TABLE ..

CREATE MATERIALIZED VIEW ... USING heap2 ...

CREATE TABLE ... USING heap2 ...

CREATE TABLE AS ... USING heap2 ...
---
 doc/src/sgml/catalogs.sgml                    |  4 ++--
 doc/src/sgml/config.sgml                      | 24 +++++++++++++++++++
 doc/src/sgml/ref/create_access_method.sgml    | 14 +++++++----
 .../sgml/ref/create_materialized_view.sgml    | 13 ++++++++++
 doc/src/sgml/ref/create_table.sgml            | 17 ++++++++++++-
 doc/src/sgml/ref/create_table_as.sgml         | 13 ++++++++++
 6 files changed, 77 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f4aabf5dc7..fa6541d99b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -587,8 +587,8 @@
    The catalog <structname>pg_am</structname> stores information about
    relation access methods.  There is one row for each access method supported
    by the system.
-   Currently, only indexes have access methods.  The requirements for index
-   access methods are discussed in detail in <xref linkend="indexam"/>.
+   Currently, only tables, index and materialized views have access methods.
+   The requirements for access methods are discussed in detail in <xref linkend="am"/>.
   </para>
 
   <table>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d383de2512..e03da43b7d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7231,6 +7231,30 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-default-table-access-method" xreflabel="default_table_access_method">
+      <term><varname>default_table_access_method</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>default_table_access_method</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        This parameter specifies the default table access method to use when creating
+        tables or materialized views if the <command>CREATE</command> command does
+        not explicitly specify an access method.
+       </para>
+
+       <para>
+        The value is either the name of a table access method, or an empty string
+        to specify using the default table access method of the current database.
+        If the value does not match the name of any existing table access method,
+        <productname>PostgreSQL</productname> will automatically use the default
+        table access method of the current database.
+       </para>
+
+      </listitem>
+     </varlistentry>
+     
      <varlistentry id="guc-default-tablespace" xreflabel="default_tablespace">
       <term><varname>default_tablespace</varname> (<type>string</type>)
       <indexterm>
diff --git a/doc/src/sgml/ref/create_access_method.sgml b/doc/src/sgml/ref/create_access_method.sgml
index 851c5e63be..c37491a713 100644
--- a/doc/src/sgml/ref/create_access_method.sgml
+++ b/doc/src/sgml/ref/create_access_method.sgml
@@ -61,7 +61,8 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
     <listitem>
      <para>
       This clause specifies the type of access method to define.
-      Only <literal>INDEX</literal> is supported at present.
+      Only <literal>TABLE</literal> and <literal>INDEX</literal>
+      are supported at present.
      </para>
     </listitem>
    </varlistentry>
@@ -75,10 +76,13 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
       that represents the access method.  The handler function must be
       declared to take a single argument of type <type>internal</type>,
       and its return type depends on the type of access method;
-      for <literal>INDEX</literal> access methods, it must
-      be <type>index_am_handler</type>.  The C-level API that the handler
-      function must implement varies depending on the type of access method.
-      The index access method API is described in <xref linkend="indexam"/>.
+      for <literal>TABLE</literal> access methods, it must
+      be <type>table_am_handler</type> and for <literal>INDEX</literal>
+      access methods, it must be <type>index_am_handler</type>.
+      The C-level API that the handler function must implement varies
+      depending on the type of access method. The table access method API
+      is described in <xref linkend="table-access-methods"/> and the index access method
+      API is described in <xref linkend="index-access-methods"/>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/doc/src/sgml/ref/create_materialized_view.sgml b/doc/src/sgml/ref/create_materialized_view.sgml
index 7f31ab4d26..24c4e200b0 100644
--- a/doc/src/sgml/ref/create_materialized_view.sgml
+++ b/doc/src/sgml/ref/create_materialized_view.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
+    [ USING <replaceable class="parameter">method</replaceable> ]
     [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
     AS <replaceable>query</replaceable>
@@ -85,6 +86,18 @@ CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies optional method for the new materialized view; The method should
+      be of type <literal>TABLE</literal>. See <xref linkend="table-access-methods"/> for more information. 
+      If this option is not specified, the default table access method is chosen for
+      the new materialized view. See <xref linkend="guc-default-table-access-method"/> for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 0fcbc660b3..a98655189e 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -29,6 +29,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 ] )
 [ INHERITS ( <replaceable>parent_table</replaceable> [, ... ] ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -40,6 +41,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -51,6 +53,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ] { FOR VALUES <replaceable class="parameter">partition_bound_spec</replaceable> | DEFAULT }
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -987,7 +990,7 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
 
      <para>
       The access method must support <literal>amgettuple</literal> (see <xref
-      linkend="indexam"/>); at present this means <acronym>GIN</acronym>
+      linkend="index-access-methods"/>); at present this means <acronym>GIN</acronym>
       cannot be used.  Although it's allowed, there is little point in using
       B-tree or hash indexes with an exclusion constraint, because this
       does nothing that an ordinary unique constraint doesn't do better.
@@ -1170,6 +1173,18 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies optional method for the new table; The method should be
+      of type <literal>TABLE</literal>. See <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, the default table access method is chosen
+      for the new table. See <xref linkend="guc-default-table-access-method"/> for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 679e8f521e..c49a755e73 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
+    [ USING <replaceable class="parameter">method</replaceable> ]
     [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
     [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -120,6 +121,18 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies the optional access method for the new table; the method must be
+      of type <literal>TABLE</literal>. See <xref linkend="table-access-methods"/> for more information.
+      If this option is not specified, the default table access method is chosen
+      for the new table. See <xref linkend="guc-default-table-access-method"/> for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
-- 
2.20.1.windows.1

Attachment: 0003-Table-access-method-API-explanation.patch (application/octet-stream)
From fd18641adfe19ecf12a4f6a17a07bd5ffb272d48 Mon Sep 17 00:00:00 2001
From: Kommi <haribabuk@fast.au.fujitsu.com>
Date: Mon, 11 Mar 2019 15:44:44 +1100
Subject: [PATCH 3/3] Table access method API explanation

All the table access method APIs and their details are explained.
---
 doc/src/sgml/am.sgml | 794 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 789 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/am.sgml b/doc/src/sgml/am.sgml
index b2a97f20aa..455e7a10c6 100644
--- a/doc/src/sgml/am.sgml
+++ b/doc/src/sgml/am.sgml
@@ -18,14 +18,798 @@
   <para>
    All Tables in <productname>PostgreSQL</productname> are the primary
    data store. Each table is stored as its own physical <firstterm>relation</firstterm>
-   and so is described by an entry in the <structname>pg_class</structname>
-   catalog. The table contents are entirely under the control of its
-   access method. (All the access methods furthermore use the standard page
-   layout described in <xref linkend="storage-page-layout"/>.)
+   and is described by an entry in the <structname>pg_class</structname>
+   catalog. A table's content is entirely controlled by its access method, although
+   all access methods use the same standard page layout described in <xref linkend="storage-page-layout"/>.
   </para>
 
- </sect1>
+  <sect2 id="table-access-methods-api">
+   <title>Table access method API</title>
+
+   <para>
+    Each table access method is described by a row in the
+    <link linkend="catalog-pg-am"><structname>pg_am</structname></link> system
+    catalog. The <structname>pg_am</structname> entry specifies the <firstterm>type</firstterm>
+    of the access method and a <firstterm>handler function</firstterm> for the
+    access method. These entries can be created and deleted using the <xref linkend="sql-create-access-method"/>
+    and <xref linkend="sql-drop-access-method"/> SQL commands.
+   </para>
+
+   <para>
+    A table access method handler function must be declared to accept a
+    single argument of type <type>internal</type> and to return the
+    pseudo-type <type>table_am_handler</type>.  The argument is a dummy value that
+    simply serves to prevent handler functions from being called directly from
+    SQL commands.  The result of the function must be a palloc'd struct of
+    type <structname>TableAmRoutine</structname>, which contains everything
+    that the core code needs to know to make use of the table access method.
+    The <structname>TableAmRoutine</structname> struct, also called the access
+    method's <firstterm>API struct</firstterm>, includes fields specifying assorted
+    fixed properties of the access method, such as whether it can support
+    bitmap scans.  More importantly, it contains pointers to support
+    functions for the access method, which do all of the real work to access
+    tables.  These support functions are plain C functions and are not
+    visible or callable at the SQL level.  The support functions are described
+    in the <structname>TableAmRoutine</structname> structure. For more details, please
+    refer to the file <filename>src/include/access/tableam.h</filename>.
+   </para>
+
+   <para>
+    Developers of a new <literal>TABLE</literal> access method can refer to the existing
+    <literal>HEAP</literal> implementation in <filename>src/backend/access/heap/heapam_handler.c</filename>
+    for the details of how the HEAP access method is implemented.
+   </para>
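+   <para>
+    As an illustration only (the function name <function>mem_tableam_handler</function>
+    and the callback names are hypothetical, not part of this patch), a minimal
+    handler function could look like the following sketch:
+<programlisting>
+PG_FUNCTION_INFO_V1(mem_tableam_handler);
+
+Datum
+mem_tableam_handler(PG_FUNCTION_ARGS)
+{
+    /* makeNode() returns a palloc'd, zeroed node with the tag already set */
+    TableAmRoutine *amroutine = makeNode(TableAmRoutine);
+
+    /* fill in the callbacks described in the following sections */
+    amroutine->slot_callbacks = mem_slot_callbacks;
+    amroutine->scan_begin = mem_scan_begin;
+    amroutine->scan_end = mem_scan_end;
+    /* ... remaining callbacks ... */
+
+    PG_RETURN_POINTER(amroutine);
+}
+</programlisting>
+   </para>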
+
+   <para>
+    The different types of callbacks that make up the API are described below.
+   </para>
+
+   <sect3 id="slot-implementation-function">
+    <title>Slot implementation functions</title>
+
+   <para>
+<programlisting>
+const TupleTableSlotOps *(*slot_callbacks) (Relation rel);
+</programlisting>
+
+    This callback returns the slot implementation that is specific to the AM.
+    The following predefined slot implementations are available:
+    <literal>TTSOpsVirtual</literal>, <literal>TTSOpsHeapTuple</literal>,
+    <literal>TTSOpsMinimalTuple</literal> and <literal>TTSOpsBufferHeapTuple</literal>.
+    An AM implementation can use any one of them. For more details of these slot
+    implementations, refer to <filename>src/include/executor/tuptable.h</filename>.
+   </para>
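+   <para>
+    As a hypothetical sketch, an AM that materializes all tuples into
+    datum/isnull arrays could simply return the built-in virtual slot
+    implementation (<function>mem_slot_callbacks</function> is an illustrative
+    name, not part of this patch):
+<programlisting>
+static const TupleTableSlotOps *
+mem_slot_callbacks(Relation rel)
+{
+    /* this AM has no backing buffer pages, so virtual slots suffice */
+    return &amp;TTSOpsVirtual;
+}
+</programlisting>
+   </para>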
+   </sect3>
+
+   <sect3 id="table-scan-functions">
+    <title>Table scan functions</title>
+
+    <para>
+     The following callbacks are used to scan a table.
+    </para>
+
+    <para>
+<programlisting>
+TableScanDesc (*scan_begin) (Relation rel,
+                             Snapshot snapshot,
+                             int nkeys, struct ScanKeyData *key,
+                             ParallelTableScanDesc pscan,
+                             bool allow_strat,
+                             bool allow_sync,
+                             bool allow_pagemode,
+                             bool is_bitmapscan,
+                             bool is_samplescan,
+                             bool temp_snap);
+</programlisting>
+
+     This callback starts a scan of the relation <literal>rel</literal> and returns a
+     <structname>TableScanDesc</structname>, which is typically embedded in a larger,
+     AM-specific struct.
+
+     <literal>nkeys</literal> indicates that the results need to be filtered based on <literal>key</literal>.
+     <literal>pscan</literal> can be used by the AM if it supports parallel scans.
+     The parameters <literal>allow_strat</literal>, <literal>allow_sync</literal> and <literal>allow_pagemode</literal>
+     specify whether the scan may use a buffer access strategy, synchronized scans, or
+     page-mode scans (an AM is not required to support any of these).
+
+     The parameters <literal>is_bitmapscan</literal> and <literal>is_samplescan</literal> specify
+     whether the scan is intended to support a bitmap scan or a sample scan.
+
+     <literal>temp_snap</literal> indicates that the provided snapshot was allocated temporarily
+     and needs to be freed at the end of the scan.
+    </para>
+
+    <para>
+<programlisting>
+void        (*scan_end) (TableScanDesc scan);
+</programlisting>
+
+     This callback ends a scan that was started by <literal>scan_begin</literal>,
+     releasing its resources. <structfield>TableScanDesc.rs_snapshot</structfield>
+     needs to be unregistered, and it can be deallocated based on <structfield>TableScanDesc.temp_snap</structfield>.
+    </para>
+
+    <para>
+<programlisting>
+void        (*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key, bool set_params,
+                            bool allow_strat, bool allow_sync, bool allow_pagemode);
+</programlisting>
+
+     This callback restarts a relation scan that was previously started by
+     <literal>scan_begin</literal>. If <literal>set_params</literal> is
+     true, the provided options are applied to the scan.
+    </para>
+
+    <para>
+<programlisting>
+TupleTableSlot *(*scan_getnextslot) (TableScanDesc scan,
+                                     ScanDirection direction, TupleTableSlot *slot);
+</programlisting>
+
+     This callback returns the next satisfying tuple from the scan started by
+     <literal>scan_begin</literal>, storing it in <literal>slot</literal>.
+    </para>
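+    <para>
+     As an illustrative sketch (assuming the <function>table_*</function>
+     wrapper functions from <filename>tableam.h</filename>), core code drives
+     these callbacks roughly as follows:
+<programlisting>
+TableScanDesc scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+TupleTableSlot *slot = table_slot_create(rel, NULL);
+
+while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
+{
+    /* process the tuple stored in the slot */
+}
+
+ExecDropSingleTupleTableSlot(slot);
+table_endscan(scan);
+</programlisting>
+    </para>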
+
+   </sect3>
+
+   <sect3 id="parallel-table-scan-function">
+    <title>Parallel table scan functions</title>
+
+    <para>
+     The following callbacks are used to perform parallel table scans.
+    </para>
+
+    <para>
+<programlisting>
+Size        (*parallelscan_estimate) (Relation rel);
+</programlisting>
+
+     This callback returns the total size of the shared memory that the AM requires to
+     perform a parallel table scan. The required size must include the <structname>ParallelTableScanDesc</structname>,
+     which is typically embedded in an AM-specific struct.
+    </para>
+
+    <para>
+<programlisting>
+Size        (*parallelscan_initialize) (Relation rel, ParallelTableScanDesc pscan);
+</programlisting>
+
+     This callback initializes the <literal>pscan</literal> structure as required
+     for the AM to perform a parallel scan, and returns the size that was
+     estimated by <literal>parallelscan_estimate</literal>.
+    </para>
+
+    <para>
+<programlisting>
+void        (*parallelscan_reinitialize) (Relation rel, ParallelTableScanDesc pscan);
+</programlisting>
+
+     This callback reinitializes the parallel scan structure pointed to by <literal>pscan</literal>
+     for a rescan of the same relation.
+    </para>
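+    <para>
+     A hypothetical sketch of how an AM might embed the shared descriptor
+     (the struct and function names are illustrative only):
+<programlisting>
+typedef struct MemParallelScanDescData
+{
+    ParallelTableScanDescData base;     /* shared AM-independent state, first */
+    pg_atomic_uint64 next_item;         /* AM-specific shared scan position */
+} MemParallelScanDescData;
+
+static Size
+mem_parallelscan_estimate(Relation rel)
+{
+    return sizeof(MemParallelScanDescData);
+}
+</programlisting>
+    </para>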
+
+   </sect3>
+
+   <sect3 id="index-scan-functions">
+    <title>Index scan functions</title>
+
+    <para>
+<programlisting>
+struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+</programlisting>
+
+     This callback prepares to fetch tuples from the relation, as needed when fetching
+     tuples from an index scan. It returns an allocated and initialized <structname>IndexFetchTableData</structname>
+     structure, which is typically embedded in an AM-specific struct.
+    </para>
+
+    <para>
+<programlisting>
+void        (*index_fetch_reset) (struct IndexFetchTableData *data);
+</programlisting>
+
+     This callback resets the index fetch; typically it releases the AM-specific resources
+     held by the <structname>IndexFetchTableData</structname> of an index scan.
+    </para>
+
+    <para>
+<programlisting>
+void        (*index_fetch_end) (struct IndexFetchTableData *data);
+</programlisting>
+
+     This callback releases the AM-specific resources held by the <structname>IndexFetchTableData</structname>
+     and frees the memory of the <structname>IndexFetchTableData</structname> itself.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*index_fetch_tuple) (struct IndexFetchTableData *scan,
+                                  ItemPointer tid,
+                                  Snapshot snapshot,
+                                  TupleTableSlot *slot,
+                                  bool *call_again, bool *all_dead);
+</programlisting>
+
+     This callback fetches the tuple pointed to by <literal>tid</literal>, performs a visibility
+     check according to the provided <literal>snapshot</literal>, and stores the tuple in
+     <literal>slot</literal>. Returns true when a tuple was found, false otherwise.
+
+     <literal>*call_again</literal> is false when the callback is called for the first time with a given <literal>tid</literal>.
+     If there is a potential match in another tuple version, <literal>*call_again</literal> must be
+     set to true, indicating that the caller should call this function again for the same <literal>tid</literal>.
+
+     If <literal>all_dead</literal> is not NULL, <literal>*all_dead</literal> should be set to true iff it
+     is guaranteed that no backend needs to see that tuple. Index AMs can use that to avoid returning
+     that tid in future searches.
+    </para>
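+    <para>
+     As a hypothetical sketch, an AM would typically embed
+     <structname>IndexFetchTableData</structname> in its own fetch state
+     (the names are illustrative only):
+<programlisting>
+typedef struct MemIndexFetchData
+{
+    IndexFetchTableData xs_base;    /* AM-independent part, must be first */
+    /* AM-specific state, e.g. a pinned buffer, would follow here */
+} MemIndexFetchData;
+
+static IndexFetchTableData *
+mem_index_fetch_begin(Relation rel)
+{
+    MemIndexFetchData *fetch = palloc0(sizeof(MemIndexFetchData));
+
+    fetch->xs_base.rel = rel;
+    return &amp;fetch->xs_base;
+}
+</programlisting>
+    </para>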
+
+   </sect3>
+
+   <sect3 id="non-modifying-tuple-functions">
+    <title>Non modifying tuple functions</title>
+
+    <para>
+<programlisting>
+bool        (*tuple_fetch_row_version) (Relation rel,
+                                        ItemPointer tid,
+                                        Snapshot snapshot,
+                                        TupleTableSlot *slot);
+</programlisting>
+
+     This callback fetches the row version specified by the ItemPointer <literal>tid</literal>
+     and stores it in the <literal>slot</literal> after performing a visibility test according to the
+     <literal>snapshot</literal>. Returns true if a tuple was found and passed the visibility test,
+     otherwise false.
+
+     For example, in the heap AM update chains are created whenever a tuple is updated,
+     so this callback fetches the particular row version identified by <literal>tid</literal>.
+    </para>
+
+    <para>
+<programlisting>
+void        (*tuple_get_latest_tid) (Relation rel,
+                                     Snapshot snapshot,
+                                     ItemPointer tid);
+</programlisting>
+
+     This callback obtains the latest version of the tuple identified by the specified ItemPointer <literal>tid</literal>.
+     For example, in the heap AM update chains are created whenever a tuple is updated;
+     this callback follows such a chain to find the latest ItemPointer.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*tuple_satisfies_snapshot) (Relation rel,
+                                         TupleTableSlot *slot,
+                                         Snapshot snapshot);
+</programlisting>
+
+     This callback checks the visibility of the tuple present in the <literal>slot</literal>
+     against the provided <literal>snapshot</literal> and returns true if the tuple is visible,
+     otherwise false. Some AMs might modify the data underlying the tuple as a side effect;
+     if so, they ought to mark the relevant buffer dirty.
+    </para>
+    
+    <para>
+<programlisting>
+bool        (*tuple_fetch_follow) (struct IndexFetchTableData *scan,
+                                   ItemPointer tid,
+                                   Snapshot snapshot,
+                                   TupleTableSlot *slot,
+                                   bool *call_again, bool *all_dead);
+</programlisting>
+
+     This callback fetches the tuple pointed to by the ItemPointer, based on the
+     <structname>IndexFetchTableData</structname>, stores it in the specified slot, and updates the
+     <literal>call_again</literal> and <literal>all_dead</literal> flags. It is called as part of an index scan.
+    </para>
+
+    <para>
+<programlisting>
+TransactionId (*compute_xid_horizon_for_tuples) (Relation rel,
+                                                 ItemPointerData *items,
+                                                 int nitems);
+</programlisting>
+
+     This callback obtains the newest xid among the tuples provided in <literal>items</literal>. It is used
+     to compute which snapshots to conflict with when replaying WAL records
+     for page-level index vacuums.
+    </para>
+    
+   </sect3>
+
+   <sect3 id="manipulation-of-physical-tuples-functions">
+    <title>Manipulation of physical tuples functions</title>
+
+    <para>
+<programlisting>
+void        (*tuple_insert) (Relation rel, TupleTableSlot *slot, CommandId cid,
+                             int options, struct BulkInsertStateData *bistate);
+</programlisting>
+
+     This callback inserts the tuple contained in <literal>slot</literal> into the relation.
+
+     The <literal>cid</literal> is the command identifier of the command that is
+     performing the insertion.
+
+     The <literal>options</literal> bitmask allows specifying options that change
+     the behaviour of the AM. Several options might be ignored by AMs
+     not supporting them.
+
+     If the <literal>TABLE_INSERT_SKIP_WAL</literal> option is specified, the new
+     tuple will not necessarily be logged to WAL, even for a non-temp relation. It is
+     the AM's choice whether this optimization is supported.
+     If the <literal>TABLE_INSERT_SKIP_FSM</literal> option is specified, AMs are
+     free to not reuse free space in the relation. This can save some cycles when
+     we know the relation is new and doesn't contain useful amounts of free space.
+     It's commonly passed directly to RelationGetBufferForTuple, see there for more info.
+     The <literal>TABLE_INSERT_FROZEN</literal> option can only be specified for
+     inserts into relfilenodes created during the current subtransaction and when
+     there are no prior snapshots or pre-existing portals open. It causes rows to
+     be frozen, which is an MVCC violation and requires explicit options chosen by the user.
+     The <literal>TABLE_INSERT_NO_LOGICAL</literal> option force-disables the emitting
+     of logical decoding information for the tuple. It should solely be used during
+     table rewrites where RelationIsLogicallyLogged(relation) is not yet accurate for
+     the new relation.
+
+     The <literal>BulkInsertState</literal> object (if any; <literal>bistate</literal> can be NULL for default
+     behavior) is also just passed through to RelationGetBufferForTuple.
+
+     On return the slot's tts_tid and tts_tableOid are updated to reflect the
+     insertion.
+    </para>
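+    <para>
+     A caller-side sketch (illustrative only, assuming <literal>rel</literal>
+     and <literal>slot</literal> are already set up) of invoking this callback
+     through the relation's routine table:
+<programlisting>
+/* insert the tuple contained in "slot" with default options */
+rel->rd_tableam->tuple_insert(rel, slot,
+                              GetCurrentCommandId(true),
+                              0,        /* options bitmask */
+                              NULL);    /* no BulkInsertState */
+</programlisting>
+    </para>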
+
+    <para>
+<programlisting>
+void        (*tuple_insert_speculative) (Relation rel,
+                                         TupleTableSlot *slot,
+                                         CommandId cid,
+                                         int options,
+                                         struct BulkInsertStateData *bistate,
+                                         uint32 specToken);
+</programlisting>
+
+     This callback is similar to <literal>tuple_insert</literal>, but it performs a
+     "speculative insertion", which can be backed out afterwards without aborting
+     the whole transaction.
+
+     Other sessions can wait for the speculative insertion to be confirmed, turning it
+     into a regular tuple, or aborted, as if it never existed.  Speculatively inserted
+     tuples behave as "value locks" of short duration, used to implement
+     <command>INSERT .. ON CONFLICT</command>.
+
+     A transaction having performed a speculative insertion has to either abort, or finish
+     the speculative insertion with <function>table_complete_speculative()</function>.
+    </para>
+
+    <para>
+<programlisting>
+void        (*tuple_complete_speculative) (Relation rel,
+                                           TupleTableSlot *slot,
+                                           uint32 specToken,
+                                           bool succeeded);
+</programlisting>
+
+     This callback completes the speculative insertion of a tuple started in the same transaction
+     by <literal>tuple_insert_speculative</literal>; it is invoked after finishing the index insert.
+     If <literal>succeeded</literal> is true, the tuple is fully inserted; if false, it should be
+     removed.
+    </para>
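+    <para>
+     The expected calling pattern, as a sketch in pseudo-C (assuming a
+     previously generated <literal>specToken</literal> and a
+     <literal>conflict_detected</literal> flag set by the index insertions):
+<programlisting>
+rel->rd_tableam->tuple_insert_speculative(rel, slot, cid, 0, NULL, specToken);
+
+/* ... attempt the index insertions, detecting any conflict ... */
+
+rel->rd_tableam->tuple_complete_speculative(rel, slot, specToken,
+                                            !conflict_detected);
+</programlisting>
+    </para>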
+
+    <para>
+<programlisting>
+TM_Result (*tuple_delete) (Relation rel,
+                           ItemPointer tid,
+                           CommandId cid,
+                           Snapshot snapshot,
+                           Snapshot crosscheck,
+                           bool wait,
+                           TM_FailureData *tmfd,
+                           bool changingPart);
+</programlisting>
+
+     This callback deletes the tuple of the relation pointed to by the ItemPointer <literal>tid</literal>
+     and returns the result of the operation.
+
+     The <literal>cid</literal> is a command identifier, used together with the snapshot
+     <literal>snapshot</literal> for the visibility test that identifies the tuple.
+     If <literal>crosscheck</literal> is not null, the tuple's visibility is additionally
+     verified against it.
+     If <literal>wait</literal> is true, the process waits for any conflicting transaction
+     to either commit or roll back.
+
+     The <literal>tmfd</literal> output parameter should be set with the proper details when
+     the tuple delete operation fails; for the data that needs to be filled in on failure,
+     refer to <structname>TM_FailureData</structname>.
+     The <literal>changingPart</literal> is true iff the tuple is being moved to another partition
+     due to an update of the partition key; otherwise, false.
+    </para>
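+    <para>
+     A caller typically switches on the returned <type>TM_Result</type>; a
+     sketch, assuming <literal>tid</literal>, <literal>cid</literal>,
+     <literal>snapshot</literal> and <literal>tmfd</literal> are already set up:
+<programlisting>
+TM_Result result = rel->rd_tableam->tuple_delete(rel, tid, cid,
+                                                 snapshot, InvalidSnapshot,
+                                                 true, &amp;tmfd, false);
+
+switch (result)
+{
+    case TM_Ok:
+        break;              /* deleted successfully */
+    case TM_SelfModified:
+    case TM_Updated:
+    case TM_Deleted:
+        /* handle concurrent modification, using the details in tmfd */
+        break;
+    default:
+        elog(ERROR, "unexpected tuple_delete result");
+}
+</programlisting>
+    </para>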
+
+    <para>
+<programlisting>
+TM_Result (*tuple_update) (Relation rel,
+                           ItemPointer otid,
+                           TupleTableSlot *slot,
+                           CommandId cid,
+                           Snapshot snapshot,
+                           Snapshot crosscheck,
+                           bool wait,
+                           TM_FailureData *tmfd,
+                           LockTupleMode *lockmode,
+                           bool *update_indexes);
+</programlisting>
+
+     This callback updates the tuple of the relation pointed to by the ItemPointer <literal>otid</literal>
+     with the new tuple from <literal>slot</literal> and returns the result of the operation.
+
+     The <literal>cid</literal> is a command identifier, used together with the snapshot
+     <literal>snapshot</literal> for the visibility test that identifies the tuple.
+     If <literal>crosscheck</literal> is not null, the tuple's visibility is additionally
+     verified against it.
+     If <literal>wait</literal> is true, the process waits for any conflicting transaction
+     to either commit or roll back.
+
+     The following three parameters must be output by the callback.
+
+     The <literal>tmfd</literal> should be set with the proper details when the tuple update
+     operation fails; for the data that needs to be filled in on failure, refer to <structname>TM_FailureData</structname>.
+     The <literal>lockmode</literal> is filled with the lock mode acquired on the tuple.
+     The <literal>update_indexes</literal> is true if new index entries are required for this tuple;
+     otherwise false.
+
+     On return the slot's tts_tid and tts_tableOid are updated to reflect the update. In particular,
+     slot->tts_tid is set to the TID where the new tuple was inserted, and its HEAP_ONLY_TUPLE flag
+     is set iff a HOT update was done.
+    </para>
+
+    <para>
+<programlisting>
+void        (*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
+                             CommandId cid, int options, struct BulkInsertStateData *bistate);
+</programlisting>
+
+     This callback inserts multiple tuples from <literal>slots</literal> into the relation
+     in one operation, for faster data insertion. Refer to <function>tuple_insert</function> for
+     details about the parameters and return values.
+    </para>
+
+    <para>
+<programlisting>
+TM_Result (*tuple_lock) (Relation rel,
+                         ItemPointer tid,
+                         Snapshot snapshot,
+                         TupleTableSlot *slot,
+                         CommandId cid,
+                         LockTupleMode mode,
+                         LockWaitPolicy wait_policy,
+                         uint8 flags,
+                         TM_FailureData *tmfd);
+</programlisting>
+
+     This callback locks the tuple pointed to by the ItemPointer <literal>tid</literal> with the specified mode
+     and returns the result of the operation.
+
+     The <literal>cid</literal> is a command identifier, used together with the snapshot
+     <literal>snapshot</literal> for the visibility test that identifies the tuple.
+     The <literal>mode</literal> is the desired lock mode.
+     The <literal>wait_policy</literal> indicates what to do if the tuple lock is not immediately available.
+     The <literal>flags</literal> allows specifying options, such as <literal>TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS</literal>
+     to follow the update chain and also lock descendant tuples if the lock modes don't conflict, or
+     <literal>TUPLE_LOCK_FLAG_FIND_LAST_VERSION</literal> to follow the update chain and lock the latest version.
+
+     The following two parameters must be output by the callback.
+
+     The <literal>tmfd</literal> should be set with the proper details when the tuple lock operation fails.
+     The <literal>slot</literal> contains the locked target tuple.
+    </para>
+
+    <para>
+<programlisting>
+void        (*finish_bulk_insert) (Relation rel, int options);
+</programlisting>
+
+     This callback performs the operations necessary to complete insertions made
+     via <literal>tuple_insert</literal> and <literal>multi_insert</literal> with a
+     BulkInsertState specified. It may, for example, be used to flush the relation
+     after inserting with WAL logging skipped, or it may be a no-op.
+    </para>
+
+   </sect3>
+
+   <sect3 id="ddl-related-functions">
+    <title>DDL related functions</title>
+
+    <para>
+<programlisting>
+void        (*relation_set_new_filenode) (Relation rel,
+                                          char persistence,
+                                          TransactionId *freezeXid,
+                                          MultiXactId *minmulti);
+</programlisting>
+
+     This callback creates the storage file for the relation <literal>rel</literal>, with the persistence
+     set to <literal>persistence</literal>, as necessary to store the tuples of the relation.
+     <literal>freezeXid</literal> and <literal>minmulti</literal> are set to the xid / multixact
+     horizon for the table that pg_class.{relfrozenxid, relminmxid} have to be set to. For example,
+     the heap AM creates the relfilenode that is necessary to store the heap tuples.
+    </para>
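+    <para>
+     A minimal sketch, assuming a hypothetical AM that stores its data in a
+     conventional relfilenode and has no xid/multixact horizons of its own:
+<programlisting>
+static void
+mem_relation_set_new_filenode(Relation rel, char persistence,
+                              TransactionId *freezeXid,
+                              MultiXactId *minmulti)
+{
+    /* this AM does not use xid/multixact horizons */
+    *freezeXid = InvalidTransactionId;
+    *minmulti = InvalidMultiXactId;
+
+    RelationCreateStorage(rel->rd_node, persistence);
+}
+</programlisting>
+    </para>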
+
+    <para>
+<programlisting>
+void        (*relation_nontransactional_truncate) (Relation rel);
+</programlisting>
+
+     This callback removes all table contents from <literal>rel</literal> in a non-transactional
+     manner. Non-transactional means that there is no need to support rollbacks. This is
+     commonly used only to perform truncations for relfilenodes created in the current transaction.
+     This operation is not reversible.
+    </para>
+
+    <para>
+<programlisting>
+void        (*relation_copy_data) (Relation rel, RelFileNode newrnode);
+</programlisting>
+
+     This callback copies the data of the relation <literal>rel</literal> from its existing filenode
+     to the new filenode <literal>newrnode</literal> and removes the existing filenode.
+    </para>
+
+    <para>
+<programlisting>
+void        (*relation_copy_for_cluster) (Relation NewHeap, Relation OldHeap, Relation OldIndex,
+                                          bool use_sort,
+                                          TransactionId OldestXmin, TransactionId FreezeXid, MultiXactId MultiXactCutoff,
+                                          double *num_tuples, double *tups_vacuumed, double *tups_recently_dead);
+</programlisting>
+
+     This callback copies the data from <literal>OldHeap</literal> into <literal>NewHeap</literal>,
+     as part of a <command>CLUSTER</command> or <command>VACUUM FULL</command>.
+
+     If <literal>use_sort</literal> is true, the table contents are sorted as appropriate for
+     <literal>OldIndex</literal>; if <literal>use_sort</literal> is false and <literal>OldIndex</literal>
+     is not InvalidOid, the data is copied in that index's order; if <literal>use_sort</literal>
+     is false and <literal>OldIndex</literal> is InvalidOid, no sorting is performed.
+
+     <literal>OldestXmin</literal>, <literal>FreezeXid</literal> and <literal>MultiXactCutoff</literal>
+     can be used to clean up the dead tuples of the table.
+
+     On successful completion, <literal>num_tuples</literal>, <literal>tups_vacuumed</literal> and <literal>tups_recently_dead</literal>
+     will contain statistics computed while copying the relation. Not all of them might make sense for every AM.
+    </para>
+    
+    <para>
+<programlisting>
+void        (*relation_vacuum) (Relation onerel, int options,
+                                struct VacuumParams *params, BufferAccessStrategy bstrategy);
+</programlisting>
+
+     This callback performs <command>VACUUM</command> on the relation, based on the
+     specified <literal>params</literal>. The <command>VACUUM</command> can be
+     triggered by a user or by <literal>autovacuum</literal>. The specific actions
+     performed will depend heavily on the individual AM; for example, the heap AM
+     gathers all the dead tuples of the relation and cleans them up, including
+     the indexes.
+
+     Note that neither <command>VACUUM FULL</command> (and <command>CLUSTER</command>), nor
+     <command>ANALYZE</command> go through this routine, even if (in the latter case) it is part of
+     the same <command>VACUUM</command> command.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*scan_analyze_next_block) (TableScanDesc scan, BlockNumber blockno,
+                                        BufferAccessStrategy bstrategy);
+</programlisting>
+
+     This callback prepares the block <literal>blockno</literal> of the table scan <literal>scan</literal>
+     for analyzing. The scan needs to have been started with <function>table_beginscan_analyze()</function>.
+     Note that this routine might acquire resources like locks that are held until
+     <function>table_scan_analyze_next_tuple()</function> returns false.
+
+     Returns false if the block is unsuitable for sampling, true otherwise.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*scan_analyze_next_tuple) (TableScanDesc scan, TransactionId OldestXmin,
+                                        double *liverows, double *deadrows, TupleTableSlot *slot);
+</programlisting>
+
+     This callback iterates over the tuples in the block selected with <function>table_scan_analyze_next_block()</function>.
+     If a tuple that is suitable for sampling is found, it returns true and stores the tuple in <literal>slot</literal>;
+     otherwise it returns false.
+
+     <literal>liverows</literal> and <literal>deadrows</literal> are incremented according to the
+     tuples encountered.
+    </para>
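+    <para>
+     A sketch of how <command>ANALYZE</command> drives these two callbacks,
+     assuming the <function>table_*</function> wrappers from
+     <filename>tableam.h</filename> (illustrative only):
+<programlisting>
+if (table_scan_analyze_next_block(scan, blockno, bstrategy))
+{
+    while (table_scan_analyze_next_tuple(scan, OldestXmin,
+                                         &amp;liverows, &amp;deadrows, slot))
+    {
+        /* consider the sampled tuple stored in the slot */
+    }
+}
+</programlisting>
+    </para>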
+    
+    <para>
+<programlisting>
+double      (*index_build_range_scan) (Relation heap_rel,
+                                       Relation index_rel,
+                                       IndexInfo *index_info,
+                                       bool allow_sync,
+                                       bool anyvisible,
+                                       BlockNumber start_blockno,
+                                       BlockNumber end_blockno,
+                                       IndexBuildCallback callback,
+                                       void *callback_state,
+                                       TableScanDesc scan);
+</programlisting>
+
+     This callback scans the table to find the tuples to be indexed in the specified
+     range of blocks of the given relation and inserts them into the specified index using the
+     provided callback function.
+
+     This is called back from an access-method-specific index build procedure
+     after the AM has done whatever setup it needs.  The parent heap relation
+     <literal>heap_rel</literal> is scanned to find tuples that should be entered
+     into the index <literal>index_rel</literal>.  Each such tuple is passed to
+     the AM's callback routine <literal>callback</literal>, which does the right
+     things to add it to the new index.  After we return, the AM's index build
+     procedure does whatever cleanup it needs.
+
+     The <literal>callback_state</literal> needs to be passed to
+     <literal>callback</literal> when it is invoked from the AM-specific function.
+
+     The <literal>allow_sync</literal> specifies whether the scan on the relation should follow the
+     <literal>synchronize_seqscans</literal> configuration parameter.
+
+     The <literal>index_info</literal> can be used to pass back some information
+     related to the AM. For example, the heap AM sets <structfield>indexInfo->ii_BrokenHotChain</structfield>
+     to true if any potentially broken HOT chains are detected.  Currently, it is set
+     if there are any RECENTLY_DEAD or DELETE_IN_PROGRESS entries in a HOT chain,
+     without trying very hard to detect whether they're really incompatible with
+     the chain tip. This needs to be generalized for other AMs later.
+
+     When <literal>anyvisible</literal> mode is requested, all tuples visible to
+     any transaction are indexed and counted as live, including those inserted
+     or deleted by transactions that are still in progress.
+
+     The <literal>start_blockno</literal> and <literal>end_blockno</literal> are
+     used to specify the range of blocks that need to be scanned on the relation.
+
+     Upon successful execution, the total count of live tuples is returned. This
+     is used for updating pg_class statistics.
+    </para>
+
+    <para>
+<programlisting>
+void        (*index_validate_scan) (Relation heap_rel,
+                                    Relation index_rel,
+                                    IndexInfo *index_info,
+                                    Snapshot snapshot,
+                                    struct ValidateIndexState *state);
+</programlisting>
+
+     This callback performs the second table scan of a concurrent index build.
+
+     The table <literal>heap_rel</literal> is scanned to find tuples and insert
+     them into <literal>index_rel</literal> according to the given snapshot <literal>snapshot</literal>,
+     verifying their ItemPointerData against the provided <structname>ValidateIndexState</structname> struct;
+     this callback is used as the last phase of a concurrent index build.
+    </para>
+    
+   </sect3>
+
+   <sect3 id="planner-functions">
+    <title>Planner functions</title>
+
+    <para>
+<programlisting>
+void        (*relation_estimate_size) (Relation rel, int32 *attr_widths,
+                                       BlockNumber *pages, double *tuples, double *allvisfrac);
+</programlisting>
+
+     This callback estimates the current size of the relation <literal>rel</literal>,
+     returning the number of <literal>pages</literal> and <literal>tuples</literal> of the relation.
+
+     If <literal>attr_widths</literal> isn't NULL, it may contain previously cached relation
+     attribute widths; the AM must fill it in if the attribute widths need to be computed for
+     estimation purposes.
+
+     It also returns in <literal>allvisfrac</literal> the fraction of pages that are marked
+     all-visible in the visibility map.
+    </para>
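+    <para>
+     A hypothetical sketch for an AM that keeps no statistics and reports a
+     crude, purely illustrative estimate:
+<programlisting>
+static void
+mem_relation_estimate_size(Relation rel, int32 *attr_widths,
+                           BlockNumber *pages, double *tuples,
+                           double *allvisfrac)
+{
+    *pages = RelationGetNumberOfBlocks(rel);
+    *tuples = *pages * 100;     /* illustrative guess of tuples per block */
+    *allvisfrac = 0;            /* no visibility map in this sketch */
+}
+</programlisting>
+    </para>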
+
+   </sect3>
+
+   <sect3 id="executor-functions">
+    <title>Executor functions</title>
+
+    <para>
+<programlisting>
+bool        (*scan_bitmap_next_block) (TableScanDesc scan,
+                                       TBMIterateResult *tbmres);
+</programlisting>
+
+     This callback prepares to fetch / check / return tuples of the block
+     <structfield>blockno</structfield> of the <literal>tbmres</literal> structure,
+     as part of a bitmap table scan. The <literal>scan</literal> was started via
+     <function>table_beginscan_bm()</function>.
+     Returns false if there are no tuples to be found on the page, true otherwise.
+
+     If <structfield>tbmres->ntuples</structfield> is -1, this is a lossy scan
+     and all visible tuples on the page have to be returned; otherwise the tuples at the
+     offsets in <structfield>tbmres->offsets</structfield> need to be returned.
+
+     This is an optional callback, but either both <function>scan_bitmap_next_block</function>
+     and <function>scan_bitmap_next_tuple</function> need to exist, or neither.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*scan_bitmap_next_tuple) (TableScanDesc scan,
+                                       TupleTableSlot *slot);
+</programlisting>
+
+     This callback fetches the next tuple of a bitmap table scan from the set of tuples
+     of the page specified in the scan descriptor, stores it in <literal>slot</literal>,
+     and returns true if a visible tuple was found; it returns false when there are no
+     more tuples.
+
+     For some AMs it will make more sense to do all the work referencing <literal>tbmres</literal>
+     contents in <function>scan_bitmap_next_block</function>, for others it might be
+     better to defer more work to this callback.
+
+     This is an optional callback, but either both <function>scan_bitmap_next_block</function>
+     and <function>scan_bitmap_next_tuple</function> need to exist, or neither.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*scan_sample_next_block) (TableScanDesc scan,
+                                       struct SampleScanState *scanstate);
+</programlisting>
+
+     This callback prepares to fetch tuples from the next block in a sample scan.
+     Returns false if the sample scan is finished, true otherwise. The <literal>scan</literal>
+     was started via <function>table_beginscan_sampling()</function>.
+
+     Typically this will first determine the target block by calling
+     <structfield>scanstate->tsmroutine->NextSampleBlock</structfield> if it is not NULL,
+     or alternatively perform a sequential scan over all blocks.  The determined
+     block is then typically read and pinned.
+
+     As the TsmRoutine interface is block-based, the blocks need to be passed
+     to <function>NextSampleBlock()</function> to return the sampled block.
+
+     Note that it's not acceptable to hold deadlock-prone resources such as lwlocks
+     until <function>scan_sample_next_tuple()</function> has exhausted the tuples on the
+     block - the tuple is likely to be returned to an upper query node, and the
+     next call could be a long while off. Holding buffer pins etc. is obviously OK.
+
+     Currently it is required to implement this callback, as there's no
+     alternative way (contrary e.g. to bitmap scans) to implement sample
+     scans. If it is infeasible to implement, the AM may raise an error.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*scan_sample_next_tuple) (TableScanDesc scan,
+                                       struct SampleScanState *scanstate,
+                                       TupleTableSlot *slot);
+</programlisting>
+
+     This callback determines the next tuple from the selected block using the TsmRoutine's
+     <function>NextSampleTuple()</function> callback and stores it in <literal>slot</literal>.
+
+     It needs to perform visibility checks according to the snapshot and return only
+     valid tuples.
+
+     The <literal>TsmRoutine</literal> interface assumes that there's a maximum offset
+     on a given page, so if that doesn't apply to an AM, it needs to emulate that
+     assumption somehow.
+    </para>
+
+  </sect3>
+  </sect2>
+ </sect1>
+
  <sect1 id="index-access-methods">
   <title>Overview of Index access methods</title>
 
-- 
2.20.1.windows.1

#137 Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#136)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-04-02 11:39:57 +1100, Haribabu Kommi wrote:

What made you rename indexam.sgml to am.sgml, instead of creating a
separate tableam.sgml? Seems easier to just have a separate file?

No specific reason, I just thought of adding all the access methods under
one file.
I can change it to tableam.sgml.

I'd rather keep it separate. It seems likely that both table and indexam
docs will grow further over time, and they're not that closely
related. Additionally not moving sect1->sect2 etc will keep links stable
(which we could also achieve with different sect1 names, I realize
that).

I can understand your point of avoiding function-by-function API reference,
as the user can check directly the code comments. Still I feel some people
may refer to the doc for API changes. I am fine to remove based on your
opinion.

I think having to keep both tableam.h and the sgml file current is
too much overhead - and anybody that's going to create a new tableam is
going to be able to look into the source anyway.

Greetings,

Andres Freund

#138 Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Andres Freund (#137)
2 attachment(s)
Re: Pluggable Storage - Andres's take

On Tue, Apr 2, 2019 at 11:53 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-04-02 11:39:57 +1100, Haribabu Kommi wrote:

What made you rename indexam.sgml to am.sgml, instead of creating a
separate tableam.sgml? Seems easier to just have a separate file?

No specific reason, I just thought of adding all the access methods under
one file.
I can change it to tableam.sgml.

I'd rather keep it separate. It seems likely that both table and indexam
docs will grow further over time, and they're not that closely
related. Additionally not moving sect1->sect2 etc will keep links stable
(which we could also achieve with different sect1 names, I realize
that).

OK.

I can understand your point of avoiding function-by-function API reference,
as the user can check directly the code comments. Still I feel some people
may refer to the doc for API changes. I am fine to remove based on your
opinion.

I think having to keep both tableam.h and the sgml file current is
too much overhead - and anybody that's going to create a new tableam is
going to be able to look into the source anyway.

Here I attached updated patches as per the discussion.
Is the description of table access methods enough? Or do you want me to
add some more details?

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:

Attachment: 0002-Doc-updates-for-pluggable-table-access-method-syntax.patch (application/octet-stream)
From 0093ac615d5100022f6f17e0ab8259a9c839c978 Mon Sep 17 00:00:00 2001
From: Hari Babu <kommi.haribabu@gmail.com>
Date: Tue, 2 Apr 2019 16:00:52 +1100
Subject: [PATCH 2/2] Doc updates for pluggable table access method syntax

Default_table_access_method GUC

CREATE ACCESS METHOD ... TYPE TABLE ..

CREATE MATERIALIZED VIEW ... USING heap2 ...

CREATE TABLE ... USING heap2 ...

CREATE TABLE AS ... USING heap2 ...
---
 doc/src/sgml/config.sgml                      | 24 +++++++++++++++++++
 doc/src/sgml/ref/create_access_method.sgml    | 14 +++++++----
 .../sgml/ref/create_materialized_view.sgml    | 13 ++++++++++
 doc/src/sgml/ref/create_table.sgml            | 15 ++++++++++++
 doc/src/sgml/ref/create_table_as.sgml         | 13 ++++++++++
 5 files changed, 74 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d383de2512..e03da43b7d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7231,6 +7231,30 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-default-table-access-method" xreflabel="default_table_access_method">
+      <term><varname>default_table_access_method</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>default_table_access_method</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        This parameter specifies the default table access method to use when creating
+        tables or materialized views if the <command>CREATE</command> command does
+        not explicitly specify an access method.
+       </para>
+
+       <para>
+        The value is either the name of a table access method, or an empty string
+        to specify using the default table access method of the current database.
+        If the value does not match the name of any existing table access method,
+        <productname>PostgreSQL</productname> will automatically use the default
+        table access method of the current database.
+       </para>
+
+      </listitem>
+     </varlistentry>
+     
      <varlistentry id="guc-default-tablespace" xreflabel="default_tablespace">
       <term><varname>default_tablespace</varname> (<type>string</type>)
       <indexterm>
diff --git a/doc/src/sgml/ref/create_access_method.sgml b/doc/src/sgml/ref/create_access_method.sgml
index 851c5e63be..dae43dbaed 100644
--- a/doc/src/sgml/ref/create_access_method.sgml
+++ b/doc/src/sgml/ref/create_access_method.sgml
@@ -61,7 +61,8 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
     <listitem>
      <para>
       This clause specifies the type of access method to define.
-      Only <literal>INDEX</literal> is supported at present.
+      Only <literal>TABLE</literal> and <literal>INDEX</literal>
+      are supported at present.
      </para>
     </listitem>
    </varlistentry>
@@ -75,10 +76,13 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
       that represents the access method.  The handler function must be
       declared to take a single argument of type <type>internal</type>,
       and its return type depends on the type of access method;
-      for <literal>INDEX</literal> access methods, it must
-      be <type>index_am_handler</type>.  The C-level API that the handler
-      function must implement varies depending on the type of access method.
-      The index access method API is described in <xref linkend="indexam"/>.
+      for <literal>TABLE</literal> access methods, it must
+      be <type>table_am_handler</type> and for <literal>INDEX</literal>
+      access methods, it must be <type>index_am_handler</type>.
+      The C-level API that the handler function must implement varies
+      depending on the type of access method. The table access method API
+      is described in <xref linkend="tableam"/> and the index access method
+      API is described in <xref linkend="indexam"/>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/doc/src/sgml/ref/create_materialized_view.sgml b/doc/src/sgml/ref/create_materialized_view.sgml
index 7f31ab4d26..754c4194f1 100644
--- a/doc/src/sgml/ref/create_materialized_view.sgml
+++ b/doc/src/sgml/ref/create_materialized_view.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
+    [ USING <replaceable class="parameter">method</replaceable> ]
     [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
     AS <replaceable>query</replaceable>
@@ -85,6 +86,18 @@ CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies the optional access method for the new materialized view; the method
+      must be of type <literal>TABLE</literal>. See <xref linkend="tableam"/> for more information.
+      If this option is not specified, the default table access method is chosen for
+      the new materialized view. See <xref linkend="guc-default-table-access-method"/> for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 0fcbc660b3..dcad41b8dd 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -29,6 +29,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 ] )
 [ INHERITS ( <replaceable>parent_table</replaceable> [, ... ] ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -40,6 +41,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -51,6 +53,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ] { FOR VALUES <replaceable class="parameter">partition_bound_spec</replaceable> | DEFAULT }
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -1170,6 +1173,18 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies the optional access method for the new table; the method must be
+      of type <literal>TABLE</literal>. See <xref linkend="tableam"/> for more information.
+      If this option is not specified, the default table access method is chosen
+      for the new table. See <xref linkend="guc-default-table-access-method"/> for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 679e8f521e..f807de2c00 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
+    [ USING <replaceable class="parameter">method</replaceable> ]
     [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
     [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -120,6 +121,18 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies the optional access method for the new table; the method must be
+      of type <literal>TABLE</literal>. See <xref linkend="tableam"/> for more information.
+      If this option is not specified, the default table access method is chosen
+      for the new table. See <xref linkend="guc-default-table-access-method"/> for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+   
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
-- 
2.20.1.windows.1

Attachment: 0001-tableam-doc-update-of-table-access-methods.patch (application/octet-stream)
From a72cfcd523887f1220473231d7982928acc23684 Mon Sep 17 00:00:00 2001
From: Hari Babu <kommi.haribabu@gmail.com>
Date: Tue, 2 Apr 2019 15:41:17 +1100
Subject: [PATCH 1/2] tableam : doc update of table access methods

Providing basic explanation of table access methods
including their structure details and reference heap
implementation files.
---
 doc/src/sgml/catalogs.sgml |  5 ++--
 doc/src/sgml/filelist.sgml |  1 +
 doc/src/sgml/postgres.sgml |  1 +
 doc/src/sgml/tableam.sgml  | 56 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 61 insertions(+), 2 deletions(-)
 create mode 100644 doc/src/sgml/tableam.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f4aabf5dc7..200708e121 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -587,8 +587,9 @@
    The catalog <structname>pg_am</structname> stores information about
    relation access methods.  There is one row for each access method supported
    by the system.
-   Currently, only indexes have access methods.  The requirements for index
-   access methods are discussed in detail in <xref linkend="indexam"/>.
+   Currently, only tables and indexes have access methods. The requirements for table
+   access methods are discussed in detail in <xref linkend="tableam"/> and the
+   requirements for index access methods are discussed in detail in <xref linkend="indexam"/>.
   </para>
 
   <table>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index a03ea1427b..7e37042a55 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -89,6 +89,7 @@
 <!ENTITY gin        SYSTEM "gin.sgml">
 <!ENTITY brin       SYSTEM "brin.sgml">
 <!ENTITY planstats    SYSTEM "planstats.sgml">
+<!ENTITY tableam    SYSTEM "tableam.sgml">
 <!ENTITY indexam    SYSTEM "indexam.sgml">
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 96d196d229..3e115f1c76 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -250,6 +250,7 @@
   &tablesample-method;
   &custom-scan;
   &geqo;
+  &tableam;
   &indexam;
   &generic-wal;
   &btree;
diff --git a/doc/src/sgml/tableam.sgml b/doc/src/sgml/tableam.sgml
new file mode 100644
index 0000000000..9eca52ee70
--- /dev/null
+++ b/doc/src/sgml/tableam.sgml
@@ -0,0 +1,56 @@
+<!-- doc/src/sgml/tableam.sgml -->
+
+<chapter id="tableam">
+ <title>Table Access Method Interface Definition</title>
+ 
+  <para>
+   This chapter defines the interface between the core <productname>PostgreSQL</productname>
+   system and <firstterm>access methods</firstterm>, which manage <literal>TABLE</literal>
+   types. The core system knows nothing about these access methods beyond
+   what is specified here, so it is possible to develop entirely new access
+   method types by writing add-on code.
+  </para>
+  
+  <para>
+   All tables in the <productname>PostgreSQL</productname> system are the primary
+   data store. Each table is stored as its own physical <firstterm>relation</firstterm>
+   and so is described by an entry in the <structname>pg_class</structname>
+   catalog. A table's content is entirely controlled by its access method.
+   (All the access methods furthermore use the standard page layout described
+   in <xref linkend="storage-page-layout"/>.)
+  </para>
+  
+  <para>
+   Each table access method is described by a row in the
+   <link linkend="catalog-pg-am"><structname>pg_am</structname></link> system
+   catalog. The <structname>pg_am</structname> entry specifies a <firstterm>type</firstterm>
+   of the access method and a <firstterm>handler function</firstterm> for the
+   access method. These entries can be created and deleted using the <xref linkend="sql-create-access-method"/>
+   and <xref linkend="sql-drop-access-method"/> SQL commands.
+  </para>
+
+  <para>
+   A table access method handler function must be declared to accept a single
+   argument of type <type>internal</type> and to return the pseudo-type
+   <type>table_am_handler</type>.  The argument is a dummy value that simply
+   serves to prevent handler functions from being called directly from SQL commands.
+   The result of the function must be a palloc'd struct of type <structname>TableAmRoutine</structname>,
+   which contains everything that the core code needs to know to make use of
+   the table access method. The <structname>TableAmRoutine</structname> struct,
+   also called the access method's <firstterm>API struct</firstterm>, includes
+   fields specifying assorted fixed properties of the access method, such as
+   whether it can support bitmap scans.  More importantly, it contains pointers
+   to support functions for the access method, which do all of the real work to
+   access tables. These support functions are plain C functions and are not
+   visible or callable at the SQL level.  The support functions are described
+   in the <structname>TableAmRoutine</structname> structure. For more details, please
+   refer to the file <filename>src/include/access/tableam.h</filename>.
+  </para>
+
+  <para>
+   Developers of any new <literal>TABLE</literal> access method can refer to the
+   existing <literal>HEAP</literal> implementation in <filename>src/backend/access/heap/heapam_handler.c</filename>
+   for more details of how it is implemented for the <literal>HEAP</literal> access method.
+  </para>
+
+</chapter>
-- 
2.20.1.windows.1

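For concreteness, the DDL surface documented by the patches above comes down
to the following usage (a sketch; my_tableam_handler is a hypothetical handler
function, heap being the only table access method shipped in core):

-- Register a table access method; my_tableam_handler is a hypothetical
-- C function returning the table_am_handler pseudo-type.
CREATE ACCESS METHOD mytable TYPE TABLE HANDLER my_tableam_handler;

-- Choose the access method per table with USING ...
CREATE TABLE t1 (a int, b text) USING mytable;

-- ... or omit USING and get the default_table_access_method.
CREATE TABLE t2 (a int, b text);
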
#139Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#138)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-04-02 17:11:07 +1100, Haribabu Kommi wrote:

+     <varlistentry id="guc-default-table-access-method" xreflabel="default_table_access_method">
+      <term><varname>default_table_access_method</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>default_table_access_method</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        The value is either the name of a table access method, or an empty string
+        to specify using the default table access method of the current database.
+        If the value does not match the name of any existing table access method,
+        <productname>PostgreSQL</productname> will automatically use the default
+        table access method of the current database.
+       </para>

Hm, this doesn't strike me as right (there's no such thing as "default
table access method of the current database"). You just get an error in
that case. I think we should simply not allow setting to "" - what's the
point in that?

Greetings,

Andres Freund

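To illustrate the semantics being argued here (a sketch, assuming the
behavior Andres describes, i.e. no per-database fallback exists):

SET default_table_access_method = 'heap';  -- subsequent CREATE TABLE ...
CREATE TABLE t (a int);                    -- ... without USING gets heap

-- An empty string has nothing to fall back to, so a subsequent CREATE
-- TABLE without USING can only fail; hence the suggestion to reject
-- the setting itself.
SET default_table_access_method = '';
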
#140Andres Freund
andres@anarazel.de
In reply to: Haribabu Kommi (#138)
1 attachment(s)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-04-02 17:11:07 +1100, Haribabu Kommi wrote:

From a72cfcd523887f1220473231d7982928acc23684 Mon Sep 17 00:00:00 2001
From: Hari Babu <kommi.haribabu@gmail.com>
Date: Tue, 2 Apr 2019 15:41:17 +1100
Subject: [PATCH 1/2] tableam : doc update of table access methods

Providing basic explanation of table access methods
including their structure details and reference heap
implementation files.
---
doc/src/sgml/catalogs.sgml | 5 ++--
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/postgres.sgml | 1 +
doc/src/sgml/tableam.sgml | 56 ++++++++++++++++++++++++++++++++++++++
4 files changed, 61 insertions(+), 2 deletions(-)
create mode 100644 doc/src/sgml/tableam.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f4aabf5dc7..200708e121 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -587,8 +587,9 @@
The catalog <structname>pg_am</structname> stores information about
relation access methods.  There is one row for each access method supported
by the system.
-   Currently, only indexes have access methods.  The requirements for index
-   access methods are discussed in detail in <xref linkend="indexam"/>.
+   Currently, only table and indexes have access methods. The requirements for table
+   access methods are discussed in detail in <xref linkend="tableam"/> and the
+   requirements for index access methods are discussed in detail in <xref linkend="indexam"/>.
</para>

I also adapted pg_am.amtype.

diff --git a/doc/src/sgml/tableam.sgml b/doc/src/sgml/tableam.sgml
new file mode 100644
index 0000000000..9eca52ee70
--- /dev/null
+++ b/doc/src/sgml/tableam.sgml
@@ -0,0 +1,56 @@
+<!-- doc/src/sgml/tableam.sgml -->
+
+<chapter id="tableam">
+ <title>Table Access Method Interface Definition</title>
+ 
+  <para>
+   This chapter defines the interface between the core <productname>PostgreSQL</productname>
+   system and <firstterm>access methods</firstterm>, which manage <literal>TABLE</literal>
+   types. The core system knows nothing about these access methods beyond
+   what is specified here, so it is possible to develop entirely new access
+   method types by writing add-on code.
+  </para>
+  
+  <para>
+   All tables in the <productname>PostgreSQL</productname> system are the primary
+   data store. Each table is stored as its own physical <firstterm>relation</firstterm>
+   and so is described by an entry in the <structname>pg_class</structname>
+   catalog. A table's content is entirely controlled by its access method.
+   (All the access methods furthermore use the standard page layout described
+   in <xref linkend="storage-page-layout"/>.)
+  </para>

I don't think there's actually any sort of dependency on the page
layout. It's entirely conceivable to write an AM that doesn't use
postgres' shared buffers.

+  <para>
+   A table access method handler function must be declared to accept a single
+   argument of type <type>internal</type> and to return the pseudo-type
+   <type>table_am_handler</type>.  The argument is a dummy value that simply
+   serves to prevent handler functions from being called directly from SQL commands.
+   The result of the function must be a palloc'd struct of type <structname>TableAmRoutine</structname>,
+   which contains everything that the core code needs to know to make use of
+   the table access method.

That's not been correct for a while...

The <structname>TableAmRoutine</structname> struct,
+   also called the access method's <firstterm>API struct</firstterm>, includes
+   fields specifying assorted fixed properties of the access method, such as
+   whether it can support bitmap scans.  More importantly, it contains pointers
+   to support functions for the access method, which do all of the real work to
+   access tables. These support functions are plain C functions and are not
+   visible or callable at the SQL level.  The support functions are described
+   in the <structname>TableAmRoutine</structname> structure. For more details, please
+   refer to the file <filename>src/include/access/tableam.h</filename>.
+  </para>

This seems to not have been adapted after copying it from indexam?

I'm still working on this (in particular I think storage.sgml and
probably some other places need updates to make clear they apply to
heap not generally; I think there needs to be some references to generic
WAL records in tableam.sgml, ...), but I got to run a few errands.

One thing I want to call out is that I made the reference to
src/include/access/tableam.h a link to gitweb. I think that makes it
much more useful to the casual reader. But it also means that, barring
some infrastructure / procedure we don't have, the link will just
continue to point to master. I'm not particularly concerned about that,
but it seems worth pointing out, given that we've only a single link to
gitweb in the sgml docs so far.

Greetings,

Andres Freund

Attachments:

0001-tableam-docs.patch (text/x-diff; charset=us-ascii)
From 33826b211e3f2d5fe76024e2938ff2eb6aeb00da Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 2 Apr 2019 14:55:30 -0700
Subject: [PATCH] tableam: docs

---
 doc/src/sgml/catalogs.sgml                    |  9 +-
 doc/src/sgml/config.sgml                      | 17 ++++
 doc/src/sgml/filelist.sgml                    |  1 +
 doc/src/sgml/postgres.sgml                    |  1 +
 doc/src/sgml/ref/create_access_method.sgml    | 14 ++--
 .../sgml/ref/create_materialized_view.sgml    | 15 ++++
 doc/src/sgml/ref/create_table.sgml            | 19 ++++-
 doc/src/sgml/ref/create_table_as.sgml         | 15 ++++
 doc/src/sgml/ref/select_into.sgml             |  8 ++
 doc/src/sgml/tableam.sgml                     | 83 +++++++++++++++++++
 src/include/access/tableam.h                  |  3 +
 11 files changed, 175 insertions(+), 10 deletions(-)
 create mode 100644 doc/src/sgml/tableam.sgml

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f4aabf5dc7f..0e38382f319 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -587,8 +587,9 @@
    The catalog <structname>pg_am</structname> stores information about
    relation access methods.  There is one row for each access method supported
    by the system.
-   Currently, only indexes have access methods.  The requirements for index
-   access methods are discussed in detail in <xref linkend="indexam"/>.
+   Currently, only table and indexes have access methods. The requirements for table
+   and index access methods are discussed in detail in <xref linkend="tableam"/> and
+   <xref linkend="indexam"/> respectively.
   </para>
 
   <table>
@@ -634,8 +635,8 @@
       <entry><type>char</type></entry>
       <entry></entry>
       <entry>
-       Currently always <literal>i</literal> to indicate an index access
-       method; other values may be allowed in future
+       <literal>t</literal> = table (including materialized views),
+       <literal>i</literal> = index.
       </entry>
      </row>
     </tbody>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2166b99fc4e..84c507512f1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7266,6 +7266,23 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-default-table-access-method" xreflabel="default_table_access_method">
+      <term><varname>default_table_access_method</varname> (<type>string</type>)
+      <indexterm>
+       <primary><varname>default_table_access_method</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        This parameter specifies the default table access method to use when
+        creating tables or materialized views if the <command>CREATE</command>
+        command does not explicitly specify an access method, or when
+        <command>SELECT ... INTO</command> is used, which does not allow to
+        specify a table access method.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-default-tablespace" xreflabel="default_tablespace">
       <term><varname>default_tablespace</varname> (<type>string</type>)
       <indexterm>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index a03ea1427b9..7e37042a55e 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -89,6 +89,7 @@
 <!ENTITY gin        SYSTEM "gin.sgml">
 <!ENTITY brin       SYSTEM "brin.sgml">
 <!ENTITY planstats    SYSTEM "planstats.sgml">
+<!ENTITY tableam    SYSTEM "tableam.sgml">
 <!ENTITY indexam    SYSTEM "indexam.sgml">
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 96d196d2293..3e115f1c76c 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -250,6 +250,7 @@
   &tablesample-method;
   &custom-scan;
   &geqo;
+  &tableam;
   &indexam;
   &generic-wal;
   &btree;
diff --git a/doc/src/sgml/ref/create_access_method.sgml b/doc/src/sgml/ref/create_access_method.sgml
index 851c5e63beb..dae43dbaed5 100644
--- a/doc/src/sgml/ref/create_access_method.sgml
+++ b/doc/src/sgml/ref/create_access_method.sgml
@@ -61,7 +61,8 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
     <listitem>
      <para>
       This clause specifies the type of access method to define.
-      Only <literal>INDEX</literal> is supported at present.
+      Only <literal>TABLE</literal> and <literal>INDEX</literal>
+      are supported at present.
      </para>
     </listitem>
    </varlistentry>
@@ -75,10 +76,13 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
       that represents the access method.  The handler function must be
       declared to take a single argument of type <type>internal</type>,
       and its return type depends on the type of access method;
-      for <literal>INDEX</literal> access methods, it must
-      be <type>index_am_handler</type>.  The C-level API that the handler
-      function must implement varies depending on the type of access method.
-      The index access method API is described in <xref linkend="indexam"/>.
+      for <literal>TABLE</literal> access methods, it must
+      be <type>table_am_handler</type> and for <literal>INDEX</literal>
+      access methods, it must be <type>index_am_handler</type>.
+      The C-level API that the handler function must implement varies
+      depending on the type of access method. The table access method API
+      is described in <xref linkend="tableam"/> and the index access method
+      API is described in <xref linkend="indexam"/>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/doc/src/sgml/ref/create_materialized_view.sgml b/doc/src/sgml/ref/create_materialized_view.sgml
index 7f31ab4d26d..2824248130c 100644
--- a/doc/src/sgml/ref/create_materialized_view.sgml
+++ b/doc/src/sgml/ref/create_materialized_view.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
+    [ USING <replaceable class="parameter">method</replaceable> ]
     [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
     AS <replaceable>query</replaceable>
@@ -85,6 +86,20 @@ CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies the optional method for the new materialized view; the
+      method should be an access method of type <literal>TABLE</literal>. See
+      <xref linkend="tableam"/> for more information.  If this option is not
+      specified, the default table access method is chosen for the new
+      materialized view. See <xref linkend="guc-default-table-access-method"/>
+      for more information.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 0fcbc660b31..e3e377a46fc 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -29,6 +29,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 ] )
 [ INHERITS ( <replaceable>parent_table</replaceable> [, ... ] ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -40,6 +41,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -51,6 +53,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     [, ... ]
 ) ] { FOR VALUES <replaceable class="parameter">partition_bound_spec</replaceable> | DEFAULT }
 [ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -1170,6 +1173,20 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     </listitem>
    </varlistentry>
 
+   <varlistentry id="sql-createtable-method">
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies the optional method for the new table; the method
+      should be an access method of type <literal>TABLE</literal>. See <xref
+      linkend="tableam"/> for more information.  If this option is not
+      specified, the default table access method is chosen for the new
+      table. See <xref linkend="guc-default-table-access-method"/> for more
+      information.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
@@ -1243,7 +1260,7 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     </listitem>
    </varlistentry>
 
-   <varlistentry>
+   <varlistentry id="sql-createtable-tablespace">
     <term><literal>TABLESPACE <replaceable class="parameter">tablespace_name</replaceable></literal></term>
     <listitem>
      <para>
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 679e8f521ed..385fcf4ad1e 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
+    [ USING <replaceable class="parameter">method</replaceable> ]
     [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITHOUT OIDS ]
     [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -120,6 +121,20 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+    <listitem>
+     <para>
+      This clause specifies the optional method for the new table; the method
+      should be an access method of type <literal>TABLE</literal>. See <xref
+      linkend="tableam"/> for more information.  If this option is not
+      specified, then the default table access method is chosen for the new
+      table. See <xref linkend="guc-default-table-access-method"/> for more
+      information.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
     <listitem>
diff --git a/doc/src/sgml/ref/select_into.sgml b/doc/src/sgml/ref/select_into.sgml
index 462e3723819..73dd93e9a39 100644
--- a/doc/src/sgml/ref/select_into.sgml
+++ b/doc/src/sgml/ref/select_into.sgml
@@ -104,6 +104,14 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
    <command>CREATE TABLE AS</command> offers a superset of the
    functionality provided by <command>SELECT INTO</command>.
   </para>
+
+  <para>
+   In contrast to <command>CREATE TABLE AS</command> <command>SELECT
+   INTO</command> does not allow to specify properties like a table's access
+   method with <xref linkend="sql-createtable-method" /> or the table's
+   tablespace with <xref linkend="sql-createtable-tablespace" />. Use <xref
+   linkend="sql-createtableas"/> if necessary.
+  </para>
  </refsect1>
 
  <refsect1>
diff --git a/doc/src/sgml/tableam.sgml b/doc/src/sgml/tableam.sgml
new file mode 100644
index 00000000000..fb11df6c744
--- /dev/null
+++ b/doc/src/sgml/tableam.sgml
@@ -0,0 +1,83 @@
+<!-- doc/src/sgml/tableam.sgml -->
+
+<chapter id="tableam">
+ <title>Table Access Method Interface Definition</title>
+
+  <para>
+   This chapter defines the interface between the core
+   <productname>PostgreSQL</productname> system and <firstterm>table access
+   methods</firstterm>, which manage the storage for tables. The core system
+   knows little about these access methods beyond what is specified here, so
+   it is possible to develop entirely new access method types by writing
+   add-on code.
+  </para>
+
+  <para>
+   Each table access method is described by a row in the <link
+   linkend="catalog-pg-am"><structname>pg_am</structname></link> system
+   catalog. The <structname>pg_am</structname> entry specifies a
+   <firstterm>type</firstterm> of the access method and a <firstterm>handler
+   function</firstterm> for the access method. These entries can be created
+   and deleted using the <xref linkend="sql-create-access-method"/> and <xref
+   linkend="sql-drop-access-method"/> SQL commands.
+  </para>
+
+  <para>
+   A table access method handler function must be declared to accept a single
+   argument of type <type>internal</type> and to return the pseudo-type
+   <type>table_am_handler</type>.  The argument is a dummy value that simply
+   serves to prevent handler functions from being called directly from SQL commands.
+
+   The result of the function must be a pointer to a struct of type
+   <structname>TableAmRoutine</structname>, which contains everything that the
+   core code needs to know to make use of the table access method. The return
+   value needs to be of server lifetime, which is typically achieved by
+   defining it as a <literal>static const</literal> variable in global
+   scope. The <structname>TableAmRoutine</structname> struct, also called the
+   access method's <firstterm>API struct</firstterm>, defines the behavior of
+   the access method using callbacks. These callbacks are pointers to plain C
+   functions and are not visible or callable at the SQL level. All the
+   callbacks and their behavior is defined in the
+   <structname>TableAmRoutine</structname> structure (with comments inside the
+   struct defining the requirements for callbacks). Most callbacks have
+   wrapper functions, which are documented for the point of view of a user,
+   rather than an implementor, of the table access method.  For details,
+   please refer to the <ulink url="https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/access/tableam.h;hb=HEAD">
+   <filename>src/include/access/tableam.h</filename></ulink> file.
+  </para>
+
+  <para>
+   To implement a access method, an implementor will typically need to
+   implement a AM specific type of tuple table slot (see
+   <ulink url="https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/executor/tuptable.h;hb=HEAD">
+   <filename>src/include/executor/tuptable.h</filename></ulink>) which allows
+   code outside the access method to hold references to tuples of the AM, and
+   to access the columns of the tuple.
+  </para>
+
+  <para>
+   Currently the design of the actual storage an AM implements is fairly
+   unconstrained. It is possible to use postgres' shared buffer cache, but not
+   required. One fairly large constraint is that currently, if the AM wants to
+   support modifications and/or indexes, it is necessary that each tuple has a
+   tuple identifier (<acronym>TID</acronym>) consisting of a block number and
+   an item number within that block (see also <xref
+   linkend="storage-page-layout"/>).  It is not strictly necessary that the
+   sub-parts of <acronym>TIDs</acronym> have the same meaning they e.g. have
+   for <literal>heap</literal>, but if bitmap scan support is desired (it is
+   optional), the block number needs to provide locality.
+  </para>
+
+  <para>
+   For crash safety an AM can use postgres' <xref linkend="wal"/>, or a custom
+   approach can be implemented.
+  </para>
+
+  <para>
+   Any developer of a new <literal>table access method</literal> can refer to
+   the existing <literal>heap</literal> implementation present in
+   <filename>src/backend/access/heap/heapam_handler.c</filename> for more details of
+   how it is implemented.
+  </para>
+
+</chapter>
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 4b760c2cd75..42e2ba68bf9 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -9,6 +9,9 @@
  *
  * src/include/access/tableam.h
  *
+ * NOTES
+ *		See tableam.sgml for higher level documentation.
+ *
  *-------------------------------------------------------------------------
  */
 #ifndef TABLEAM_H
-- 
2.21.0.dirty

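Pulling the handler-function paragraphs of the patch above together, the
expected shape of an AM's entry point is roughly the following (a sketch; the
mytable name, the handler function name, and the elided callbacks are
placeholders -- see tableam.h for the full callback list):

#include "postgres.h"
#include "fmgr.h"
#include "access/tableam.h"

PG_MODULE_MAGIC;

/* static storage provides the required server lifetime */
static const TableAmRoutine mytable_methods = {
	.type = T_TableAmRoutine,
	/* ... scan, fetch, modify and DDL callbacks elided ... */
};

PG_FUNCTION_INFO_V1(my_tableam_handler);

Datum
my_tableam_handler(PG_FUNCTION_ARGS)
{
	PG_RETURN_POINTER(&mytable_methods);
}

This is the same pattern heapam_handler.c uses for the builtin heap AM.
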
#141Justin Pryzby
pryzby@telsasoft.com
In reply to: Andres Freund (#140)
1 attachment(s)
Re: Pluggable Storage - Andres's take

I reviewed new docs for $SUBJECT.

Find attached proposed changes.

There's one XXX item where I'm unsure what it's intended to say.

Justin

Attachments:

v1-0001-Fine-tune-documentation-for-tableam.patch (text/x-diff; charset=us-ascii)
From a3d290bf67af2a34e44cd6c160daf552b56a13b5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 4 Apr 2019 00:48:09 -0500
Subject: [PATCH v1] Fine tune documentation for tableam

Added at commit b73c3a11963c8bb783993cfffabb09f558f86e37
---
 doc/src/sgml/catalogs.sgml        |  2 +-
 doc/src/sgml/config.sgml          |  4 ++--
 doc/src/sgml/ref/select_into.sgml |  6 +++---
 doc/src/sgml/storage.sgml         | 17 ++++++++-------
 doc/src/sgml/tableam.sgml         | 44 ++++++++++++++++++++-------------------
 5 files changed, 38 insertions(+), 35 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 58c8c96..40ddec4 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -587,7 +587,7 @@
    The catalog <structname>pg_am</structname> stores information about
    relation access methods.  There is one row for each access method supported
    by the system.
-   Currently, only table and indexes have access methods. The requirements for table
+   Currently, only tables and indexes have access methods. The requirements for table
    and index access methods are discussed in detail in <xref linkend="tableam"/> and
    <xref linkend="indexam"/> respectively.
   </para>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4a9a1e8..90b478d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7306,8 +7306,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         This parameter specifies the default table access method to use when
         creating tables or materialized views if the <command>CREATE</command>
         command does not explicitly specify an access method, or when
-        <command>SELECT ... INTO</command> is used, which does not allow to
-        specify a table access method. The default is <literal>heap</literal>.
+        <command>SELECT ... INTO</command> is used, which does not allow
+        specification of a table access method. The default is <literal>heap</literal>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/select_into.sgml b/doc/src/sgml/ref/select_into.sgml
index 17bed24..1443d79 100644
--- a/doc/src/sgml/ref/select_into.sgml
+++ b/doc/src/sgml/ref/select_into.sgml
@@ -106,11 +106,11 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
   </para>
 
   <para>
-   In contrast to <command>CREATE TABLE AS</command> <command>SELECT
-   INTO</command> does not allow to specify properties like a table's access
+   In contrast to <command>CREATE TABLE AS</command>, <command>SELECT
+   INTO</command> does not allow specification of properties like a table's access
    method with <xref linkend="sql-createtable-method" /> or the table's
    tablespace with <xref linkend="sql-createtable-tablespace" />. Use <xref
-   linkend="sql-createtableas"/> if necessary.  Therefore the default table
+   linkend="sql-createtableas"/> if necessary.  Therefore, the default table
    access method is chosen for the new table. See <xref
    linkend="guc-default-table-access-method"/> for more information.
   </para>
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 62333e3..5dfca1b 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -189,11 +189,11 @@ there.
 </para>
 
 <para>
- Note that the following sections describe the way the builtin
+ Note that the following sections describe the behavior of the builtin
  <literal>heap</literal> <link linkend="tableam">table access method</link>,
- and the builtin <link linkend="indexam">index access methods</link> work. Due
- to the extensible nature of <productname>PostgreSQL</productname> other types
- of access method might work similar or not.
+ and the builtin <link linkend="indexam">index access methods</link>. Due
+ to the extensible nature of <productname>PostgreSQL</productname>, other
+ access methods might work differently.
 </para>
 
 <para>
@@ -703,11 +703,12 @@ erased (they will be recreated automatically as needed).
 This section provides an overview of the page format used within
 <productname>PostgreSQL</productname> tables and indexes.<footnote>
   <para>
-    Actually, neither table nor index access methods need not use this page
-    format.  All the existing index methods do use this basic format, but the
+    Actually, use of this page format is not required for either table or index
+    access methods.
+    The <literal>heap</literal> table access method always uses this format.
+    All the existing index methods also use the basic format, but the
     data kept on index metapages usually doesn't follow the item layout
-    rules. The <literal>heap</literal> table access method also always uses
-    this format.
+    rules.
   </para>
 </footnote>
 Sequences and <acronym>TOAST</acronym> tables are formatted just like a regular table.
diff --git a/doc/src/sgml/tableam.sgml b/doc/src/sgml/tableam.sgml
index 8d9bfd8..0a89935 100644
--- a/doc/src/sgml/tableam.sgml
+++ b/doc/src/sgml/tableam.sgml
@@ -48,54 +48,56 @@
   callbacks and their behavior is defined in the
   <structname>TableAmRoutine</structname> structure (with comments inside the
   struct defining the requirements for callbacks). Most callbacks have
-  wrapper functions, which are documented for the point of view of a user,
-  rather than an implementor, of the table access method.  For details,
+  wrapper functions, which are documented from the point of view of a user
+  (rather than an implementor) of the table access method.  For details,
   please refer to the <ulink url="https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/access/tableam.h;hb=HEAD">
   <filename>src/include/access/tableam.h</filename></ulink> file.
  </para>
 
  <para>
-  To implement a access method, an implementor will typically need to
-  implement a AM specific type of tuple table slot (see
+  To implement an access method, an implementor will typically need to
+  implement an AM-specific type of tuple table slot (see
   <ulink url="https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/executor/tuptable.h;hb=HEAD">
-   <filename>src/include/executor/tuptable.h</filename></ulink>) which allows
+   <filename>src/include/executor/tuptable.h</filename></ulink>), which allows
    code outside the access method to hold references to tuples of the AM, and
    to access the columns of the tuple.
  </para>
 
  <para>
-  Currently the the way an AM actually stores data is fairly
-  unconstrained. It is e.g. possible to use postgres' shared buffer cache,
-  but not required. In case shared buffers are used, it likely makes to
-  postgres' standard page layout described in <xref
+  Currently, the way an AM actually stores data is fairly
+  unconstrained. For example, it's possible but not required to use postgres'
+  shared buffer cache.  In case it is used, it likely makes
+XXX something missing here ?
+  to postgres' standard page layout described in <xref
   linkend="storage-page-layout"/>.
  </para>
 
  <para>
   One fairly large constraint of the table access method API is that,
   currently, if the AM wants to support modifications and/or indexes, it is
-  necessary that each tuple has a tuple identifier (<acronym>TID</acronym>)
+  necessary for each tuple to have a tuple identifier (<acronym>TID</acronym>)
   consisting of a block number and an item number (see also <xref
   linkend="storage-page-layout"/>).  It is not strictly necessary that the
-  sub-parts of <acronym>TIDs</acronym> have the same meaning they e.g. have
+  sub-parts of <acronym>TIDs</acronym> have the same meaning as used
   for <literal>heap</literal>, but if bitmap scan support is desired (it is
   optional), the block number needs to provide locality.
  </para>
 
  <para>
-  For crash safety an AM can use postgres' <link
-  linkend="wal"><acronym>WAL</acronym></link>, or a custom approach can be
-  implemented.  If <acronym>WAL</acronym> is chosen, either <link
-  linkend="generic-wal">Generic WAL Records</link> can be used &mdash; which
-  implies higher WAL volume but is easy, or a new type of
-  <acronym>WAL</acronym> records can be implemented &mdash; but that
-  currently requires modifications of core code (namely modifying
+  For crash safety, an AM can use postgres' <link
+  linkend="wal"><acronym>WAL</acronym></link>, or a custom implementation.
+  If <acronym>WAL</acronym> is chosen, either <link
+  linkend="generic-wal">Generic WAL Records</link> can be used,
+  or a new type of <acronym>WAL</acronym> records can be implemented.
+  Generic WAL Records are easy, but imply higher WAL volume.
+  Implementation of a new type of WAL record
+  currently requires modifications to core code (specifically,
   <filename>src/include/access/rmgrlist.h</filename>).
  </para>
 
  <para>
   To implement transactional support in a manner that allows different table
-  access methods be accessed within a single transaction, it likely is
+  access methods be accessed within a single transaction, it's likely
   necessary to closely integrate with the machinery in
   <filename>src/backend/access/transam/xlog.c</filename>.
  </para>
@@ -103,8 +105,8 @@
  <para>
   Any developer of a new <literal>table access method</literal> can refer to
   the existing <literal>heap</literal> implementation present in
-  <filename>src/backend/access/heap/heapam_handler.c</filename> for more details of
-  how it is implemented.
+  <filename>src/backend/access/heap/heapam_handler.c</filename> for details of
+  its implementation.
  </para>
 
 </chapter>
-- 
2.1.4

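As a reference point for the WAL paragraph reworked above, the "easy but
higher WAL volume" generic-WAL route comes down to this pattern (a sketch
assuming a buffer-based AM; mytable_page_update is a made-up call site):

#include "postgres.h"
#include "access/generic_xlog.h"

/* caller holds an exclusive lock on buf */
static void
mytable_page_update(Relation rel, Buffer buf)
{
	GenericXLogState *state = GenericXLogStart(rel);
	Page		page = GenericXLogRegisterBuffer(state, buf, 0);

	/* ... modify the returned page image, not the buffer directly ... */

	GenericXLogFinish(state);	/* computes the delta and emits the record */
}

A custom rmgr avoids that overhead but, as the docs note, currently requires
modifying src/include/access/rmgrlist.h.
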
#142Andres Freund
andres@anarazel.de
In reply to: Justin Pryzby (#141)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-04-04 00:51:38 -0500, Justin Pryzby wrote:

I reviewed new docs for $SUBJECT.

Find attached proposed changes.

There's one XXX item I'm unsure what it's intended to say.

Thanks! I applied most of these, and filled in the XXX. I didn't like
the s/allow to specify properties/allow specification of properties/,
so I left those out. But I could be convinced otherwise...

Greetings,

Andres Freund

#143Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Andres Freund (#142)
3 attachment(s)
Re: Pluggable Storage - Andres's take

I wrote a little toy implementation that just returns constant data to
play with this a little. Looks good overall.

There were a bunch of typos in the comments in tableam.h, see attached.
Some of the comments could use more copy-editing and clarification, I
think, but I stuck to fixing just typos and such for now.

index_update_stats() calls RelationGetNumberOfBlocks(<table>). If the AM
doesn't use normal data files, that won't work. I bumped into that with
my toy implementation, which wouldn't need to create any data files if
it weren't for this.

The comments for relation_set_new_relfilenode() callback say that the AM
can set *freezeXid and *minmulti to invalid. But when I did that, VACUUM
hits this assertion:

TRAP: FailedAssertion("!(((classForm->relfrozenxid) >= ((TransactionId)
3)))", File: "vacuum.c", Line: 1323)

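For reference, the callback shape in question looks roughly like this (a
sketch with an abridged parameter list; the toyam_ prefix matches the toy AM
attached below):

static void
toyam_relation_set_new_relfilenode(Relation rel,
								   char persistence,
								   TransactionId *freezeXid,
								   MultiXactId *minmulti)
{
	/* what the comments appear to permit ... */
	*freezeXid = InvalidTransactionId;
	*minmulti = InvalidMultiXactId;

	/* ... but vacuum.c then trips over pg_class.relfrozenxid */
}
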
There's a little bug in the index-only scan executor node, where it mixes up
the slots to hold a tuple from the index, and from the table. That
doesn't cause any ill effects if the AM uses TTSOpsHeapTuple, but with
my toy AM, which uses a virtual slot, it caused warnings like this from
index-only scans:

WARNING: problem in alloc set ExecutorState: detected write past chunk
end in block 0x56419b0f88e8, chunk 0x56419b0f8f90

Attached is a patch with the toy implementation I used to test this.
I'm not suggesting we should commit that - although feel free to do that
if you think it's useful - but it shows how I bumped into these issues.
The second patch fixes the index-only-scan slot confusion (untested,
except with my toy AM).

- Heikki

Attachments:

0001-Add-a-toy-table-AM-implementation-to-play-with.patch (text/x-patch)
From 97e0eea6a3fb123845ac5650f1aaa1802bf56694 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 8 Apr 2019 15:16:53 +0300
Subject: [PATCH 1/3] Add a toy table AM implementation to play with.

It returns a constant data set. No insert/update/delete. But you can
create indexes.
---
 src/test/modules/toytable/Makefile            |  25 +
 .../modules/toytable/expected/toytable.out    |  41 ++
 src/test/modules/toytable/sql/toytable.sql    |  17 +
 src/test/modules/toytable/toytable--1.0.sql   |  12 +
 src/test/modules/toytable/toytable.control    |   4 +
 src/test/modules/toytable/toytableam.c        | 612 ++++++++++++++++++
 6 files changed, 711 insertions(+)
 create mode 100644 src/test/modules/toytable/Makefile
 create mode 100644 src/test/modules/toytable/expected/toytable.out
 create mode 100644 src/test/modules/toytable/sql/toytable.sql
 create mode 100644 src/test/modules/toytable/toytable--1.0.sql
 create mode 100644 src/test/modules/toytable/toytable.control
 create mode 100644 src/test/modules/toytable/toytableam.c

diff --git a/src/test/modules/toytable/Makefile b/src/test/modules/toytable/Makefile
new file mode 100644
index 00000000000..142ef2d23e6
--- /dev/null
+++ b/src/test/modules/toytable/Makefile
@@ -0,0 +1,25 @@
+# src/test/modules/toytable/Makefile
+
+MODULE_big = toytable
+OBJS = toytableam.o $(WIN32RES)
+PGFILEDESC = "A dummy implementation of the table AM API"
+
+EXTENSION = toytable
+DATA = toytable--1.0.sql
+
+REGRESS = toytable
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/toytable
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+OBJS = toytableam.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/test/modules/toytable/expected/toytable.out b/src/test/modules/toytable/expected/toytable.out
new file mode 100644
index 00000000000..3e8598e284c
--- /dev/null
+++ b/src/test/modules/toytable/expected/toytable.out
@@ -0,0 +1,41 @@
+CREATE EXTENSION toytable;
+create table toytab (i int4, j int4, k int4) using toytable;
+select * from toytab;
+ i  | j  | k  
+----+----+----
+  1 |  1 |  1
+  2 |  2 |  2
+  3 |  3 |  3
+  4 |  4 |  4
+  5 |  5 |  5
+  6 |  6 |  6
+  7 |  7 |  7
+  8 |  8 |  8
+  9 |  9 |  9
+ 10 | 10 | 10
+(10 rows)
+
+create index toyidx on toytab(i);
+-- test index scan
+set enable_seqscan=off;
+set enable_indexscan=on;
+select i, j from toytab where i = 4;
+ i | j 
+---+---
+ 4 | 4
+(1 row)
+
+-- index only scan
+explain (costs off) select i from toytab where i = 4;
+               QUERY PLAN               
+----------------------------------------
+ Index Only Scan using toyidx on toytab
+   Index Cond: (i = 4)
+(2 rows)
+
+select i from toytab where i = 4 ;
+ i 
+---
+ 4
+(1 row)
+
diff --git a/src/test/modules/toytable/sql/toytable.sql b/src/test/modules/toytable/sql/toytable.sql
new file mode 100644
index 00000000000..8d9bac41bbf
--- /dev/null
+++ b/src/test/modules/toytable/sql/toytable.sql
@@ -0,0 +1,17 @@
+CREATE EXTENSION toytable;
+
+create table toytab (i int4, j int4, k int4) using toytable;
+
+select * from toytab;
+
+create index toyidx on toytab(i);
+
+-- test index scan
+set enable_seqscan=off;
+set enable_indexscan=on;
+
+select i, j from toytab where i = 4;
+
+-- index only scan
+explain (costs off) select i from toytab where i = 4;
+select i from toytab where i = 4 ;
diff --git a/src/test/modules/toytable/toytable--1.0.sql b/src/test/modules/toytable/toytable--1.0.sql
new file mode 100644
index 00000000000..52085d27f4a
--- /dev/null
+++ b/src/test/modules/toytable/toytable--1.0.sql
@@ -0,0 +1,12 @@
+/* src/test/modules/toytable/toytable--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION toytable" to load this file. \quit
+
+CREATE FUNCTION toytableam_handler(internal)
+RETURNS pg_catalog.table_am_handler STRICT
+AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE ACCESS METHOD toytable TYPE TABLE HANDLER toytableam_handler;
+
+
diff --git a/src/test/modules/toytable/toytable.control b/src/test/modules/toytable/toytable.control
new file mode 100644
index 00000000000..8f613e58d6e
--- /dev/null
+++ b/src/test/modules/toytable/toytable.control
@@ -0,0 +1,4 @@
+comment = 'Dummy implementation of table AM api'
+default_version = '1.0'
+module_pathname = '$libdir/toytable'
+relocatable = true
diff --git a/src/test/modules/toytable/toytableam.c b/src/test/modules/toytable/toytableam.c
new file mode 100644
index 00000000000..30b0e74e7f6
--- /dev/null
+++ b/src/test/modules/toytable/toytableam.c
@@ -0,0 +1,612 @@
+/*-------------------------------------------------------------------------
+ *
+ * toytableam.c
+ *	  a toy table access method implementation
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/test/modules/toytable/toytableam.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "miscadmin.h"
+
+#include "access/multixact.h"
+#include "access/relscan.h"
+#include "access/tableam.h"
+#include "catalog/catalog.h"
+#include "catalog/storage.h"
+#include "catalog/index.h"
+#include "catalog/pg_type.h"
+#include "executor/executor.h"
+#include "utils/builtins.h"
+#include "utils/rel.h"
+#include "storage/bufmgr.h"
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(toytableam_handler);
+
+typedef struct
+{
+	TableScanDescData scan;
+
+	int			tupidx;
+} ToyScanDescData;
+typedef ToyScanDescData *ToyScanDesc;
+
+static const TupleTableSlotOps *
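+/*
+ * This AM keeps no tuple storage of its own, so plain virtual slots
+ * suffice: each scan materializes its constant data straight into the
+ * slot's datum/isnull arrays.
+ */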
+toyam_slot_callbacks(Relation relation)
+{
+	return &TTSOpsVirtual;
+}
+
+static TableScanDesc toyam_scan_begin(Relation rel,
+							 Snapshot snapshot,
+							 int nkeys, struct ScanKeyData *key,
+							 ParallelTableScanDesc pscan,
+							 bool allow_strat,
+							 bool allow_sync,
+							 bool allow_pagemode,
+							 bool is_bitmapscan,
+							 bool is_samplescan,
+							 bool temp_snap)
+{
+	ToyScanDesc tscan;
+
+	tscan = palloc0(sizeof(ToyScanDescData));
+	tscan->scan.rs_rd = rel;
+	tscan->scan.rs_snapshot = snapshot;
+	tscan->scan.rs_nkeys = nkeys;
+	tscan->scan.rs_bitmapscan = is_bitmapscan;
+	tscan->scan.rs_samplescan = is_samplescan;
+	tscan->scan.rs_allow_strat = allow_strat;
+	tscan->scan.rs_allow_sync = allow_sync;
+	tscan->scan.rs_temp_snap = temp_snap;
+	tscan->scan.rs_parallel = pscan;
+
+	tscan->tupidx = 0;
+
+	return &tscan->scan;
+}
+
+static void
+toyam_scan_end(TableScanDesc scan)
+{
+	pfree(scan);
+}
+
+static void
+toyam_scan_rescan(TableScanDesc scan, struct ScanKeyData *key,
+				  bool set_params, bool allow_strat,
+				  bool allow_sync, bool allow_pagemode)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static bool
+toyam_scan_getnextslot(TableScanDesc scan,
+					   ScanDirection direction,
+					   TupleTableSlot *slot)
+{
+	ToyScanDesc tscan = (ToyScanDesc) scan;
+
+	slot->tts_nvalid = 0;
+	slot->tts_flags |= TTS_FLAG_EMPTY;
+
+	/*
+	 * Return a constant 10 rows. Every int4 attribute gets
+	 * a running count, everything else is NULL.
+	 */
+	if (tscan->tupidx < 10)
+	{
+		TupleDesc desc = RelationGetDescr(tscan->scan.rs_rd);
+
+		tscan->tupidx++;
+
+		for (AttrNumber attno = 1; attno <= desc->natts; attno++)
+		{
+			Form_pg_attribute att = &desc->attrs[attno - 1];
+			Datum		d;
+			bool		isnull;
+
+			if (att->atttypid == INT4OID)
+			{
+				d = Int32GetDatum(tscan->tupidx);
+				isnull = false;
+			}
+			else
+			{
+				d = (Datum) 0;
+				isnull = true;
+			}
+
+			slot->tts_values[attno - 1] = d;
+			slot->tts_isnull[attno - 1] = isnull;
+		}
+
+		ItemPointerSet(&slot->tts_tid, 1, tscan->tupidx);
+		slot->tts_nvalid = slot->tts_tupleDescriptor->natts;
+		slot->tts_flags &= ~TTS_FLAG_EMPTY;
+
+		return true;
+	}
+	else
+		return false;
+}
+
+static Size
+toyam_parallelscan_estimate(Relation rel)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static Size
+toyam_parallelscan_initialize(Relation rel,
+							  ParallelTableScanDesc pscan)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_parallelscan_reinitialize(Relation rel,
+								ParallelTableScanDesc pscan)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static struct IndexFetchTableData *
+toyam_index_fetch_begin(Relation rel)
+{
+	IndexFetchTableData *tfetch = palloc0(sizeof(IndexFetchTableData));
+
+	tfetch->rel = rel;
+
+	return tfetch;
+}
+
+static void
+toyam_index_fetch_reset(struct IndexFetchTableData *data)
+{
+}
+
+static void
+toyam_index_fetch_end(struct IndexFetchTableData *data)
+{
+	pfree(data);
+}
+
+static bool
+toyam_index_fetch_tuple(struct IndexFetchTableData *scan,
+						ItemPointer tid,
+						Snapshot snapshot,
+						TupleTableSlot *slot,
+						bool *call_again, bool *all_dead)
+{
+	TupleDesc desc = RelationGetDescr(scan->rel);
+	int			tupidx;
+
+	if (ItemPointerGetBlockNumber(tid) != 1)
+		return false;
+
+	tupidx = ItemPointerGetOffsetNumber(tid);
+	if (tupidx < 1 || tupidx > 10)
+		return false;
+
+	slot->tts_nvalid = 0;
+	slot->tts_flags |= TTS_FLAG_EMPTY;
+
+	/* Return same data as toyam_scan_getnextslot does */
+	for (AttrNumber attno = 1; attno <= desc->natts; attno++)
+	{
+		Form_pg_attribute att = &desc->attrs[attno - 1];
+		Datum		d;
+		bool		isnull;
+
+		if (att->atttypid == INT4OID)
+		{
+			d = Int32GetDatum(tupidx);
+			isnull = false;
+		}
+		else
+		{
+			d = (Datum) 0;
+			isnull = true;
+		}
+
+		slot->tts_values[attno - 1] = d;
+		slot->tts_isnull[attno - 1] = isnull;
+	}
+
+	ItemPointerSet(&slot->tts_tid, 1, tupidx);
+	slot->tts_nvalid = slot->tts_tupleDescriptor->natts;
+	slot->tts_flags &= ~TTS_FLAG_EMPTY;
+
+	return true;
+}
+
+static bool
+toyam_tuple_fetch_row_version(Relation rel,
+							  ItemPointer tid,
+							  Snapshot snapshot,
+							  TupleTableSlot *slot)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_tuple_get_latest_tid(Relation rel,
+						   Snapshot snapshot,
+						   ItemPointer tid)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static bool
+toyam_tuple_satisfies_snapshot(Relation rel,
+							   TupleTableSlot *slot,
+							   Snapshot snapshot)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static TransactionId
+toyam_compute_xid_horizon_for_tuples(Relation rel,
+									 ItemPointerData *items,
+									 int nitems)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_tuple_insert(Relation rel, TupleTableSlot *slot,
+				   CommandId cid, int options,
+				   struct BulkInsertStateData *bistate)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_tuple_insert_speculative(Relation rel,
+							   TupleTableSlot *slot,
+							   CommandId cid,
+							   int options,
+							   struct BulkInsertStateData *bistate,
+							   uint32 specToken)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_tuple_complete_speculative(Relation rel,
+								 TupleTableSlot *slot,
+								 uint32 specToken,
+								 bool succeeded)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static TM_Result
+toyam_tuple_delete(Relation rel,
+				   ItemPointer tid,
+				   CommandId cid,
+				   Snapshot snapshot,
+				   Snapshot crosscheck,
+				   bool wait,
+				   TM_FailureData *tmfd,
+				   bool changingPart)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
+				   CommandId cid, int options, struct BulkInsertStateData *bistate)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static TM_Result
+toyam_tuple_update(Relation rel,
+				   ItemPointer otid,
+				   TupleTableSlot *slot,
+				   CommandId cid,
+				   Snapshot snapshot,
+				   Snapshot crosscheck,
+				   bool wait,
+				   TM_FailureData *tmfd,
+				   LockTupleMode *lockmode,
+				   bool *update_indexes)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static TM_Result
+toyam_tuple_lock(Relation rel,
+				 ItemPointer tid,
+				 Snapshot snapshot,
+				 TupleTableSlot *slot,
+				 CommandId cid,
+				 LockTupleMode mode,
+				 LockWaitPolicy wait_policy,
+				 uint8 flags,
+				 TM_FailureData *tmfd)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_finish_bulk_insert(Relation rel, int options)
+{
+	return;
+}
+
+static void
+toyam_relation_set_new_filenode(Relation rel,
+								char persistence,
+								TransactionId *freezeXid,
+								MultiXactId *minmulti)
+{
+	*freezeXid = InvalidTransactionId;
+	*minmulti = InvalidMultiXactId;
+
+	/*
+	 * FIXME: We don't need this for anything. But index build calls
+	 * RelationGetNumberOfBlocks, from index_update_stats(), and that
+	 * fails if the underlying file doesn't exist.
+	 */
+	RelationCreateStorage(rel->rd_node, persistence);
+}
+
+static void
+toyam_relation_nontransactional_truncate(Relation rel)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_relation_copy_data(Relation rel, RelFileNode newrnode)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_relation_copy_for_cluster(Relation NewHeap,
+								Relation OldHeap,
+								Relation OldIndex,
+								bool use_sort,
+								TransactionId OldestXmin,
+								TransactionId FreezeXid,
+								MultiXactId MultiXactCutoff,
+								double *num_tuples,
+								double *tups_vacuumed,
+								double *tups_recently_dead)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_relation_vacuum(Relation onerel,
+					  struct VacuumParams *params,
+					  BufferAccessStrategy bstrategy)
+{
+	/* we've got nothing to do */
+}
+
+static bool
+toyam_scan_analyze_next_block(TableScanDesc scan,
+							  BlockNumber blockno,
+							  BufferAccessStrategy bstrategy)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static bool
+toyam_scan_analyze_next_tuple(TableScanDesc scan,
+							  TransactionId OldestXmin,
+							  double *liverows,
+							  double *deadrows,
+							  TupleTableSlot *slot)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static double
+toyam_index_build_range_scan(Relation heap_rel,
+							 Relation index_rel,
+							 struct IndexInfo *index_info,
+							 bool allow_sync,
+							 bool anyvisible,
+							 bool progress,
+							 BlockNumber start_blockno,
+							 BlockNumber end_blockno,
+							 IndexBuildCallback callback,
+							 void *callback_state,
+							 TableScanDesc scan)
+{
+	TupleTableSlot *slot;
+	EState     *estate;
+
+	estate = CreateExecutorState();
+	slot = table_slot_create(heap_rel, NULL);
+
+	if (!scan)
+		scan = toyam_scan_begin(heap_rel,
+								SnapshotAny,
+								0, NULL,
+								NULL,
+								false,
+								false,
+								false,
+								false,
+								false,
+								false);
+
+	while (toyam_scan_getnextslot(scan, ForwardScanDirection, slot))
+	{
+		Datum           values[INDEX_MAX_KEYS];
+		bool            isnull[INDEX_MAX_KEYS];
+		HeapTuple		heapTuple;
+
+		FormIndexDatum(index_info, slot, estate, values, isnull);
+
+		/* Call the AM's callback routine to process the tuple */
+		heapTuple = ExecCopySlotHeapTuple(slot);
+		heapTuple->t_self = slot->tts_tid;
+		callback(heap_rel, heapTuple, values, isnull, true,
+				 callback_state);
+		pfree(heapTuple);
+	}
+
+	toyam_scan_end(scan);
+	ExecDropSingleTupleTableSlot(slot);
+	FreeExecutorState(estate);
+
+	return 10;
+}
+
+static void
+toyam_index_validate_scan(Relation heap_rel,
+						  Relation index_rel,
+						  struct IndexInfo *index_info,
+						  Snapshot snapshot,
+						  struct ValidateIndexState *state)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_relation_estimate_size(Relation rel, int32 *attr_widths,
+							 BlockNumber *pages, double *tuples,
+							 double *allvisfrac)
+{
+	*pages = 1;
+	*tuples = 1;
+	*allvisfrac = 1.0;
+}
+
+static bool
+toyam_scan_sample_next_block(TableScanDesc scan,
+							 struct SampleScanState *scanstate)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static bool
+toyam_scan_sample_next_tuple(TableScanDesc scan,
+					   struct SampleScanState *scanstate,
+					   TupleTableSlot *slot)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static const TableAmRoutine toyam_methods = {
+	.type = T_TableAmRoutine,
+
+	.slot_callbacks = toyam_slot_callbacks,
+
+	.scan_begin = toyam_scan_begin,
+	.scan_end = toyam_scan_end,
+	.scan_rescan = toyam_scan_rescan,
+	.scan_getnextslot = toyam_scan_getnextslot,
+
+	.parallelscan_estimate = toyam_parallelscan_estimate,
+	.parallelscan_initialize = toyam_parallelscan_initialize,
+	.parallelscan_reinitialize = toyam_parallelscan_reinitialize,
+
+	.index_fetch_begin = toyam_index_fetch_begin,
+	.index_fetch_reset = toyam_index_fetch_reset,
+	.index_fetch_end = toyam_index_fetch_end,
+	.index_fetch_tuple = toyam_index_fetch_tuple,
+
+	.tuple_fetch_row_version = toyam_tuple_fetch_row_version,
+	.tuple_get_latest_tid = toyam_tuple_get_latest_tid,
+	.tuple_satisfies_snapshot = toyam_tuple_satisfies_snapshot,
+	.compute_xid_horizon_for_tuples = toyam_compute_xid_horizon_for_tuples,
+
+	.tuple_insert = toyam_tuple_insert,
+	.tuple_insert_speculative = toyam_tuple_insert_speculative,
+	.tuple_complete_speculative = toyam_tuple_complete_speculative,
+	.multi_insert = toyam_multi_insert,
+	.tuple_delete = toyam_tuple_delete,
+	.tuple_update = toyam_tuple_update,
+	.tuple_lock = toyam_tuple_lock,
+	.finish_bulk_insert = toyam_finish_bulk_insert,
+
+	.relation_set_new_filenode = toyam_relation_set_new_filenode,
+	.relation_nontransactional_truncate = toyam_relation_nontransactional_truncate,
+	.relation_copy_data = toyam_relation_copy_data,
+	.relation_copy_for_cluster = toyam_relation_copy_for_cluster,
+	.relation_vacuum = toyam_relation_vacuum,
+
+	.scan_analyze_next_block = toyam_scan_analyze_next_block,
+	.scan_analyze_next_tuple = toyam_scan_analyze_next_tuple,
+	.index_build_range_scan = toyam_index_build_range_scan,
+	.index_validate_scan = toyam_index_validate_scan,
+
+	.relation_estimate_size = toyam_relation_estimate_size,
+
+	.scan_bitmap_next_block = NULL,
+	.scan_bitmap_next_tuple = NULL,
+	.scan_sample_next_block = toyam_scan_sample_next_block,
+	.scan_sample_next_tuple = toyam_scan_sample_next_tuple,
+};
+
+Datum
+toytableam_handler(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_POINTER(&toyam_methods);
+}
-- 
2.20.1

0002-Fix-confusion-on-different-kinds-of-slots-in-IndexOn.patch
From b329e4345731cd84708e5efcc51e3d5298c27bb2 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 8 Apr 2019 15:18:19 +0300
Subject: [PATCH 2/3] Fix confusion on different kinds of slots in
 IndexOnlyScans.

We used the same slot to store a tuple from the index and to store a
tuple from the table. That's not OK. It worked with the heap, because
heapam_getnextslot() stores a HeapTuple to the slot, and doesn't care how
large the tts_values/nulls arrays are. But when I played with a toy table
AM implementation that used a virtual tuple, it caused memory overruns.
---
 src/backend/executor/nodeIndexonlyscan.c | 16 +++++++++++++---
 src/include/nodes/execnodes.h            |  1 +
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 7711728495c..5833d683b38 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -166,10 +166,10 @@ IndexOnlyNext(IndexOnlyScanState *node)
 			 * Rats, we have to visit the heap to check visibility.
 			 */
 			InstrCountTuples2(node, 1);
-			if (!index_fetch_heap(scandesc, slot))
+			if (!index_fetch_heap(scandesc, node->ioss_TableSlot))
 				continue;		/* no visible tuple, try next index entry */
 
-			ExecClearTuple(slot);
+			ExecClearTuple(node->ioss_TableSlot);
 
 			/*
 			 * Only MVCC snapshots are supported here, so there should be no
@@ -528,7 +528,17 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	 */
 	tupDesc = ExecTypeFromTL(node->indextlist);
 	ExecInitScanTupleSlot(estate, &indexstate->ss, tupDesc,
-						  table_slot_callbacks(currentRelation));
+						  &TTSOpsVirtual);
+
+	/*
+	 * We need another slot, in a format that's suitable for the table AM,
+	 * for when we need to fetch a tuple from the table for rechecking
+	 * visibility.
+	 */
+	indexstate->ioss_TableSlot =
+		ExecAllocTableSlot(&estate->es_tupleTable,
+						   RelationGetDescr(currentRelation),
+						   table_slot_callbacks(currentRelation));
 
 	/*
 	 * Initialize result type and projection info.  The node's targetlist will
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a5e4b7ef2e0..108dee61e24 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1424,6 +1424,7 @@ typedef struct IndexOnlyScanState
 	struct IndexScanDescData *ioss_ScanDesc;
 	Buffer		ioss_VMBuffer;
 	Size		ioss_PscanLen;
+	TupleTableSlot *ioss_TableSlot;
 } IndexOnlyScanState;
 
 /* ----------------
-- 
2.20.1

0003-Fix-typos-and-grammar-in-tableam.h-comments.patch
From 213e33f92532201d0d278394cac7ffcaf0dccafa Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 8 Apr 2019 15:28:00 +0300
Subject: [PATCH 3/3] Fix typos and grammar in tableam.h comments.

---
 src/include/access/tableam.h | 119 +++++++++++++++++------------------
 1 file changed, 59 insertions(+), 60 deletions(-)

diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 51398f35c01..ab80919f8d0 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -26,6 +26,7 @@
 
 #define DEFAULT_TABLE_ACCESS_METHOD	"heap"
 
+/* GUCs */
 extern char *default_table_access_method;
 extern bool synchronize_seqscans;
 
@@ -40,7 +41,7 @@ struct ValidateIndexState;
 
 
 /*
- * Result codes for table_{update,delete,lock}_tuple, and for visibility
+ * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
  */
 typedef enum TM_Result
@@ -68,8 +69,8 @@ typedef enum TM_Result
 
 	/*
 	 * The affected tuple is currently being modified by another session. This
-	 * will only be returned if (update/delete/lock)_tuple are instructed not
-	 * to wait.
+	 * will only be returned if table_(update/delete/lock_tuple) are instructed
+	 * not to wait.
 	 */
 	TM_BeingModified,
 
@@ -82,12 +83,15 @@ typedef enum TM_Result
  * When table_update, table_delete, or table_lock_tuple fail because the target
  * tuple is already outdated, they fill in this struct to provide information
  * to the caller about what happened.
+ *
  * ctid is the target's ctid link: it is the same as the target's TID if the
  * target was deleted, or the location of the replacement tuple if the target
  * was updated.
+ *
  * xmax is the outdating transaction's XID.  If the caller wants to visit the
  * replacement tuple, it must check that this matches before believing the
  * replacement is really a match.
+ *
  * cmax is the outdating command's CID, but only when the failure code is
  * TM_SelfModified (i.e., something in the current transaction outdated the
  * tuple); otherwise cmax is zero.  (We make this restriction because
@@ -108,10 +112,10 @@ typedef struct TM_FailureData
 #define TABLE_INSERT_FROZEN			0x0004
 #define TABLE_INSERT_NO_LOGICAL		0x0008
 
-/* flag bits fortable_lock_tuple */
+/* flag bits for table_lock_tuple */
 /* Follow tuples whose update is in progress if lock modes don't conflict  */
 #define TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS	(1 << 0)
-/* Follow update chain and lock lastest version of tuple */
+/* Follow update chain and lock latest version of tuple */
 #define TUPLE_LOCK_FLAG_FIND_LAST_VERSION		(1 << 1)
 
 
@@ -128,8 +132,8 @@ typedef void (*IndexBuildCallback) (Relation index,
  * server-lifetime manner, typically as a static const struct, which then gets
  * returned by FormData_pg_am.amhandler.
  *
- * I most cases it's not appropriate to directly call the callbacks directly,
- * instead use the table_* wrapper functions.
+ * In most cases it's not appropriate to call the callbacks directly, use the
+ * table_* wrapper functions instead.
  *
  * GetTableAmRoutine() asserts that required callbacks are filled in, remember
  * to update when adding a callback.
@@ -194,7 +198,7 @@ typedef struct TableAmRoutine
 	void		(*scan_end) (TableScanDesc scan);
 
 	/*
-	 * Restart relation scan.  If set_params is set to true, allow{strat,
+	 * Restart relation scan.  If set_params is set to true, allow_{strat,
 	 * sync, pagemode} (see scan_begin) changes should be taken into account.
 	 */
 	void		(*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key,
@@ -222,7 +226,7 @@ typedef struct TableAmRoutine
 
 	/*
 	 * Initialize ParallelTableScanDesc for a parallel scan of this relation.
-	 * pscan will be sized according to parallelscan_estimate() for the same
+	 * `pscan` will be sized according to parallelscan_estimate() for the same
 	 * relation.
 	 */
 	Size		(*parallelscan_initialize) (Relation rel,
@@ -243,7 +247,7 @@ typedef struct TableAmRoutine
 
 	/*
 	 * Prepare to fetch tuples from the relation, as needed when fetching
-	 * tuples for an index scan.  The callback has to return a
+	 * tuples for an index scan.  The callback has to return an
 	 * IndexFetchTableData, which the AM will typically embed in a larger
 	 * structure with additional information.
 	 *
@@ -268,16 +272,16 @@ typedef struct TableAmRoutine
 	 * test, return true, false otherwise.
 	 *
 	 * Note that AMs that do not necessarily update indexes when indexed
-	 * columns do not change, need to return the current/correct version of a
-	 * tuple as appropriate, even if the tid points to an older version of the
-	 * tuple.
+	 * columns do not change, need to return the current/correct version of
+	 * the tuple that is visible to the snapshot, even if the tid points to an
+	 * older version of the tuple.
 	 *
 	 * *call_again is false on the first call to index_fetch_tuple for a tid.
 	 * If there potentially is another tuple matching the tid, *call_again
 	 * needs be set to true by index_fetch_tuple, signalling to the caller
 	 * that index_fetch_tuple should be called again for the same tid.
 	 *
-	 * *all_dead, if all_dead is not NULL, should be set to true if by
+	 * *all_dead, if all_dead is not NULL, should be set to true by
 	 * index_fetch_tuple iff it is guaranteed that no backend needs to see
 	 * that tuple. Index AMs can use that do avoid returning that tid in
 	 * future searches.
@@ -288,14 +292,14 @@ typedef struct TableAmRoutine
 									  TupleTableSlot *slot,
 									  bool *call_again, bool *all_dead);
 
+
 	/* ------------------------------------------------------------------------
 	 * Callbacks for non-modifying operations on individual tuples
 	 * ------------------------------------------------------------------------
 	 */
 
-
 	/*
-	 * Fetch tuple at `tid` into `slot, after doing a visibility test
+	 * Fetch tuple at `tid` into `slot`, after doing a visibility test
 	 * according to `snapshot`. If a tuple was found and passed the visibility
 	 * test, returns true, false otherwise.
 	 */
@@ -390,13 +394,13 @@ typedef struct TableAmRoutine
 	/*
 	 * Perform operations necessary to complete insertions made via
 	 * tuple_insert and multi_insert with a BulkInsertState specified. This
-	 * e.g. may e.g. used to flush the relation when inserting with
-	 * TABLE_INSERT_SKIP_WAL specified.
+	 * may for example be used to flush the relation, when the
+	 * TABLE_INSERT_SKIP_WAL option was used.
 	 *
 	 * Typically callers of tuple_insert and multi_insert will just pass all
-	 * the flags the apply to them, and each AM has to decide which of them
+	 * the flags that apply to them, and each AM has to decide which of them
 	 * make sense for it, and then only take actions in finish_bulk_insert
-	 * that make sense for a specific AM.
+	 * for those flags, and ignore others.
 	 *
 	 * Optional callback.
 	 */
@@ -412,10 +416,10 @@ typedef struct TableAmRoutine
 	 * This callback needs to create a new relation filenode for `rel`, with
 	 * appropriate durability behaviour for `persistence`.
 	 *
-	 * On output *freezeXid, *minmulti should be set to the values appropriate
-	 * for pg_class.{relfrozenxid, relminmxid} have to be set to. For AMs that
-	 * don't need those fields to be filled they can be set to
-	 * InvalidTransactionId, InvalidMultiXactId respectively.
+	 * On output *freezeXid, *minmulti must be set to the values appropriate
+	 * for pg_class.{relfrozenxid, relminmxid}. For AMs that don't need those
+	 * fields to be filled they can be set to InvalidTransactionId and
+	 * InvalidMultiXactId, respectively.
 	 *
 	 * See also table_relation_set_new_filenode().
 	 */
@@ -463,8 +467,8 @@ typedef struct TableAmRoutine
 	 * locked with a ShareUpdateExclusive lock.
 	 *
 	 * Note that neither VACUUM FULL (and CLUSTER), nor ANALYZE go through
-	 * this routine, even if (in the latter case), part of the same VACUUM
-	 * command.
+	 * this routine, even if (for ANALYZE) it is part of the same
+	 * VACUUM command.
 	 *
 	 * There probably, in the future, needs to be a separate callback to
 	 * integrate with autovacuum's scheduling.
@@ -487,8 +491,8 @@ typedef struct TableAmRoutine
 	 * sampling, e.g. because it's a metapage that could never contain tuples.
 	 *
 	 * XXX: This obviously is primarily suited for block-based AMs. It's not
-	 * clear what a good interface for non block based AMs would be, so don't
-	 * try to invent one yet.
+	 * clear what a good interface for non block based AMs would be, so there
+	 * isn't one yet.
 	 */
 	bool		(*scan_analyze_next_block) (TableScanDesc scan,
 											BlockNumber blockno,
@@ -537,7 +541,7 @@ typedef struct TableAmRoutine
 	/*
 	 * See table_relation_estimate_size().
 	 *
-	 * While block oriented, it shouldn't be too hard to for an AM that
+	 * While block oriented, it shouldn't be too hard for an AM that
 	 * doesn't internally use blocks to convert into a usable representation.
 	 */
 	void		(*relation_estimate_size) (Relation rel, int32 *attr_widths,
@@ -553,7 +557,7 @@ typedef struct TableAmRoutine
 	/*
 	 * Prepare to fetch / check / return tuples from `tbmres->blockno` as part
 	 * of a bitmap table scan. `scan` was started via table_beginscan_bm().
-	 * Return false if there's no tuples to be found on the page, true
+	 * Return false if there are no tuples to be found on the page, true
 	 * otherwise.
 	 *
 	 * This will typically read and pin the target block, and do the necessary
@@ -617,8 +621,8 @@ typedef struct TableAmRoutine
 	 * Note that it's not acceptable to hold deadlock prone resources such as
 	 * lwlocks until scan_sample_next_tuple() has exhausted the tuples on the
 	 * block - the tuple is likely to be returned to an upper query node, and
-	 * the next call could be off a long while. Holding buffer pins etc is
-	 * obviously OK.
+	 * the next call could be off a long while. Holding buffer pins and such
+	 * is obviously OK.
 	 *
 	 * Currently it is required to implement this interface, as there's no
 	 * alternative way (contrary e.g. to bitmap scans) to implement sample
@@ -707,7 +711,6 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 									   false, false, false);
 }
 
-
 /*
  * table_beginscan_bm is an alternative entry point for setting up a
  * TableScanDesc for a bitmap heap scan.  Although that scan technology is
@@ -762,7 +765,6 @@ table_endscan(TableScanDesc scan)
 	scan->rs_rd->rd_tableam->scan_end(scan);
 }
 
-
 /*
  * Restart a relation scan.
  */
@@ -795,7 +797,6 @@ table_rescan_set_params(TableScanDesc scan, struct ScanKeyData *key,
  */
 extern void table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot);
 
-
 /*
  * Return next tuple from `scan`, store in slot.
  */
@@ -833,7 +834,7 @@ extern void table_parallelscan_initialize(Relation rel,
  * table_parallelscan_initialize(), for the same relation. The initialization
  * does not need to have happened in this backend.
  *
- * Caller must hold a suitable lock on the correct relation.
+ * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation rel,
 						 ParallelTableScanDesc pscan);
@@ -904,7 +905,7 @@ table_index_fetch_end(struct IndexFetchTableData *scan)
  * The difference between this function and table_fetch_row_version is that
  * this function returns the currently visible version of a row if the AM
  * supports storing multiple row versions reachable via a single index entry
- * (like heap's HOT). Whereas table_fetch_row_version only evaluates the the
+ * (like heap's HOT). Whereas table_fetch_row_version only evaluates the
  * tuple exactly at `tid`. Outside of index entry ->table tuple lookups,
  * table_fetch_row_version is what's usually needed.
  */
@@ -940,7 +941,7 @@ extern bool table_index_fetch_tuple_check(Relation rel,
 
 
 /*
- * Fetch tuple at `tid` into `slot, after doing a visibility test according to
+ * Fetch tuple at `tid` into `slot`, after doing a visibility test according to
  * `snapshot`. If a tuple was found and passed the visibility test, returns
  * true, false otherwise.
  *
@@ -1009,8 +1010,8 @@ table_compute_xid_horizon_for_tuples(Relation rel,
  * behaviour of the AM. Several options might be ignored by AMs not supporting
  * them.
  *
- * If the TABLE_INSERT_SKIP_WAL option is specified, the new tuple will not
- * necessarily logged to WAL, even for a non-temp relation. It is the AMs
+ * If the TABLE_INSERT_SKIP_WAL option is specified, the new tuple doesn't
+ * need to be logged to WAL, even for a non-temp relation. It is the AMs
  * choice whether this optimization is supported.
  *
  * If the TABLE_INSERT_SKIP_FSM option is specified, AMs are free to not reuse
@@ -1030,7 +1031,7 @@ table_compute_xid_horizon_for_tuples(Relation rel,
  * relation.
  *
  * Note that most of these options will be applied when inserting into the
- * heap's TOAST table, too, if the tuple requires any out-of-line data
+ * heap's TOAST table, too, if the tuple requires any out-of-line data.
  *
  *
  * The BulkInsertState object (if any; bistate can be NULL for default
@@ -1082,7 +1083,7 @@ table_complete_speculative(Relation rel, TupleTableSlot *slot,
 }
 
 /*
- * Insert multiple tuple into a table.
+ * Insert multiple tuples into a table.
  *
  * This is like table_insert(), but inserts multiple tuples in one
  * operation. That's often faster than calling table_insert() in a loop,
@@ -1121,10 +1122,9 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
  *	changingPart - true iff the tuple is being moved to another partition
  *		table due to an update of the partition key. Otherwise, false.
  *
- * Normal, successful return value is TM_Ok, which
- * actually means we did delete it.  Failure return codes are
- * TM_SelfModified, TM_Updated, or TM_BeingModified
- * (the last only possible if wait == false).
+ * Normal, successful return value is TM_Ok, which means we did actually
+ * delete it.  Failure return codes are TM_SelfModified, TM_Updated, and
+ * TM_BeingModified (the last only possible if wait == false).
  *
  * In the failure cases, the routine fills *tmfd with the tuple's t_ctid,
  * t_xmax, and, if possible, and, if possible, t_cmax.  See comments for
@@ -1160,10 +1160,9 @@ table_delete(Relation rel, ItemPointer tid, CommandId cid,
  *  update_indexes - in success cases this is set to true if new index entries
  *		are required for this tuple
  *
- * Normal, successful return value is TM_Ok, which
- * actually means we *did* update it.  Failure return codes are
- * TM_SelfModified, TM_Updated, or TM_BeingModified
- * (the last only possible if wait == false).
+ * Normal, successful return value is TM_Ok, which means we did actually
+ * update it.  Failure return codes are TM_SelfModified, TM_Updated, and
+ * TM_BeingModified (the last only possible if wait == false).
  *
  * On success, the slot's tts_tid and tts_tableOid are updated to match the new
  * stored tuple; in particular, slot->tts_tid is set to the TID where the
@@ -1201,8 +1200,8 @@ table_update(Relation rel, ItemPointer otid, TupleTableSlot *slot,
  *	flags:
  *		If TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS, follow the update chain to
  *		also lock descendant tuples if lock modes don't conflict.
- *		If TUPLE_LOCK_FLAG_FIND_LAST_VERSION, update chain and lock latest
- *		version.
+ *		If TUPLE_LOCK_FLAG_FIND_LAST_VERSION, follow the update chain and lock
+ *		latest version.
  *
  * Output parameters:
  *	*slot: contains the target tuple
@@ -1303,7 +1302,7 @@ table_relation_copy_data(Relation rel, RelFileNode newrnode)
  * is copied in that index's order; if use_sort is false and OidIndex is
  * InvalidOid, no sorting is performed.
  *
- * OldestXmin, FreezeXid, MultiXactCutoff need to currently valid values for
+ * OldestXmin, FreezeXid, MultiXactCutoff must be currently valid values for
  * the table.
  *
  * *num_tuples, *tups_vacuumed, *tups_recently_dead will contain statistics
@@ -1329,15 +1328,15 @@ table_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 }
 
 /*
- * Perform VACUUM on the relation. The VACUUM can be user triggered or by
+ * Perform VACUUM on the relation. The VACUUM can be user-triggered or by
  * autovacuum. The specific actions performed by the AM will depend heavily on
  * the individual AM.
 
  * On entry a transaction needs to already been established, and the
- * transaction is locked with a ShareUpdateExclusive lock.
+ * table is locked with a ShareUpdateExclusive lock.
  *
  * Note that neither VACUUM FULL (and CLUSTER), nor ANALYZE go through this
- * routine, even if (in the latter case), part of the same VACUUM command.
+ * routine, even if (for ANALYZE) it is part of the same VACUUM command.
  */
 static inline void
 table_relation_vacuum(Relation rel, struct VacuumParams *params,
@@ -1363,7 +1362,7 @@ table_scan_analyze_next_block(TableScanDesc scan, BlockNumber blockno,
 }
 
 /*
- * Iterate over tuples tuples in the block selected with
+ * Iterate over tuples in the block selected with
  * table_scan_analyze_next_block() (which needs to have returned true, and
  * this routine may not have returned false for the same block before). If a
  * tuple that's suitable for sampling is found, true is returned and a tuple
@@ -1383,7 +1382,7 @@ table_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
 }
 
 /*
- * table_index_build_range_scan - scan the table to find tuples to be indexed
+ * table_index_build_scan - scan the table to find tuples to be indexed
  *
  * This is called back from an access-method-specific index build procedure
  * after the AM has done whatever setup it needs.  The parent heap relation
@@ -1515,8 +1514,8 @@ table_relation_estimate_size(Relation rel, int32 *attr_widths,
 /*
  * Prepare to fetch / check / return tuples from `tbmres->blockno` as part of
  * a bitmap table scan. `scan` needs to have been started via
- * table_beginscan_bm(). Returns false if there's no tuples to be found on the
- * page, true otherwise.
+ * table_beginscan_bm(). Returns false if there are no tuples to be found on
+ * the page, true otherwise.
  *
  * Note, this is an optionally implemented function, therefore should only be
  * used after verifying the presence (at plan time or such).
-- 
2.20.1

#144Fabrízio de Royes Mello
fabriziomello@gmail.com
In reply to: Heikki Linnakangas (#143)
Re: Pluggable Storage - Andres's take

On Mon, Apr 8, 2019 at 9:34 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

I wrote a little toy implementation that just returns constant data to
play with this a little. Looks good overall.

There were a bunch of typos in the comments in tableam.h, see attached.
Some of the comments could use more copy-editing and clarification, I
think, but I stuck to fixing just typos and such for now.

index_update_stats() calls RelationGetNumberOfBlocks(<table>). If the AM
doesn't use normal data files, that won't work. I bumped into that with
my toy implementation, which wouldn't need to create any data files, if
it wasn't for this.

The comments for relation_set_new_relfilenode() callback say that the AM
can set *freezeXid and *minmulti to invalid. But when I did that, VACUUM
hits this assertion:

TRAP: FailedAssertion("!(((classForm->relfrozenxid) >= ((TransactionId)
3)))", File: "vacuum.c", Line: 1323)

There's a little bug in index-only scan executor node, where it mixes up
the slots to hold a tuple from the index, and from the table. That
doesn't cause any ill effects if the AM uses TTSOpsHeapTuple, but with
my toy AM, which uses a virtual slot, it caused warnings like this from
index-only scans:

WARNING: problem in alloc set ExecutorState: detected write past chunk
end in block 0x56419b0f88e8, chunk 0x56419b0f8f90

Attached is a patch with the toy implementation I used to test this.
I'm not suggesting we should commit that - although feel free to do that
if you think it's useful - but it shows how I bumped into these issues.
The second patch fixes the index-only-scan slot confusion (untested,
except with my toy AM).

Awesome... it built and ran the tests cleanly, but I got an assertion
failure running VACUUM:

fabrizio=# vacuum toytab ;
TRAP: FailedAssertion("!(((classForm->relfrozenxid) >= ((TransactionId)
3)))", File: "vacuum.c", Line: 1323)
psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: 2019-04-08
12:29:16.204 -03 [20844] LOG: server process (PID 24457) was terminated by
signal 6: Aborted
2019-04-08 12:29:16.204 -03 [20844] DETAIL: Failed process was running:
vacuum toytab ;
2019-04-08 12:29:16.204 -03 [20844] LOG: terminating any other active
server processes
2019-04-08 12:29:16.205 -03 [24458] WARNING: terminating connection
because of crash of another server process

And the backtrace is:

(gdb) bt
#0 0x00007f813779f428 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007f81377a102a in __GI_abort () at abort.c:89
#2 0x0000000000ec0de9 in ExceptionalCondition (conditionName=0x10e3bb8
"!(((classForm->relfrozenxid) >= ((TransactionId) 3)))",
errorType=0x10e33f3 "FailedAssertion", fileName=0x10e345a "vacuum.c",
lineNumber=1323) at assert.c:54
#3 0x0000000000893646 in vac_update_datfrozenxid () at vacuum.c:1323
#4 0x000000000089127a in vacuum (relations=0x26c4390,
params=0x7ffeb1a3fb30, bstrategy=0x26c4218, isTopLevel=true) at vacuum.c:452
#5 0x00000000008906ae in ExecVacuum (pstate=0x26145b8, vacstmt=0x25f46f0,
isTopLevel=true) at vacuum.c:196
#6 0x0000000000c3a883 in standard_ProcessUtility (pstmt=0x25f4a50,
queryString=0x25f3be8 "vacuum toytab ;", context=PROCESS_UTILITY_TOPLEVEL,
params=0x0, queryEnv=0x0, dest=0x25f4b48, completionTag=0x7ffeb1a3ffb0 "")
at utility.c:670
#7 0x0000000000c3977a in ProcessUtility (pstmt=0x25f4a50,
queryString=0x25f3be8 "vacuum toytab ;", context=PROCESS_UTILITY_TOPLEVEL,
params=0x0, queryEnv=0x0, dest=0x25f4b48, completionTag=0x7ffeb1a3ffb0 "")
at utility.c:360
#8 0x0000000000c3793e in PortalRunUtility (portal=0x265ba28,
pstmt=0x25f4a50, isTopLevel=true, setHoldSnapshot=false, dest=0x25f4b48,
completionTag=0x7ffeb1a3ffb0 "") at pquery.c:1175
#9 0x0000000000c37d7f in PortalRunMulti (portal=0x265ba28,
isTopLevel=true, setHoldSnapshot=false, dest=0x25f4b48, altdest=0x25f4b48,
completionTag=0x7ffeb1a3ffb0 "") at pquery.c:1321
#10 0x0000000000c36899 in PortalRun (portal=0x265ba28,
count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x25f4b48,
altdest=0x25f4b48, completionTag=0x7ffeb1a3ffb0 "") at pquery.c:796
#11 0x0000000000c2a40e in exec_simple_query (query_string=0x25f3be8 "vacuum
toytab ;") at postgres.c:1215
#12 0x0000000000c332a3 in PostgresMain (argc=1, argv=0x261fe68,
dbname=0x261fca8 "fabrizio", username=0x261fc80 "fabrizio") at
postgres.c:4249
#13 0x0000000000b051fc in BackendRun (port=0x2616d20) at postmaster.c:4429
#14 0x0000000000b042c3 in BackendStartup (port=0x2616d20) at
postmaster.c:4120
#15 0x0000000000afc70a in ServerLoop () at postmaster.c:1703
#16 0x0000000000afb94e in PostmasterMain (argc=3, argv=0x25ed850) at
postmaster.c:1376
#17 0x0000000000977de8 in main (argc=3, argv=0x25ed850) at main.c:228

Isn't it better to raise an exception, as you did in the other functions?

static void
toyam_relation_vacuum(Relation onerel,
					  struct VacuumParams *params,
					  BufferAccessStrategy bstrategy)
{
	ereport(ERROR,
			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
			 errmsg("function %s not implemented yet", __func__)));
}

Regards,

--
Fabrízio de Royes Mello
Timbira - http://www.timbira.com.br/
PostgreSQL: Consulting, Development, 24x7 Support and Training

#145Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#143)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-04-08 15:34:46 +0300, Heikki Linnakangas wrote:

There were a bunch of typos in the comments in tableam.h, see attached. Some
of the comments could use more copy-editing and clarification, I think, but
I stuck to fixing just typos and such for now.

I pushed these after adding three boring changes by pgindent. Thanks for
those!

I'd greatly welcome more feedback on the comments - I've been pretty
deep in this for so long that I don't see all of the issues anymore. And
a mild dyslexia doesn't help...

index_update_stats() calls RelationGetNumberOfBlocks(<table>). If the AM
doesn't use normal data files, that won't work. I bumped into that with my
toy implementation, which wouldn't need to create any data files, if it
wasn't for this.

Hm. That should be fixed. I've been burning the candle at both ends for
too long, so I'll not get to it today. But I think we should fix it
soon. I'll create an open item for it.
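
To illustrate one possible shape of a fix (a sketch only; the
relation_size_in_blocks callback name is invented for illustration),
index_update_stats() could ask the table AM for the size instead of
going straight to smgr:

static BlockNumber
table_relation_nblocks(Relation rel)
{
	/*
	 * Sketch under the above assumption: AMs that don't store their data
	 * in regular files supply their own block count, while block-based
	 * AMs fall back to the existing smgr-based answer.
	 */
	if (rel->rd_tableam->relation_size_in_blocks != NULL)
		return rel->rd_tableam->relation_size_in_blocks(rel);

	return RelationGetNumberOfBlocks(rel);
}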

The comments for relation_set_new_relfilenode() callback say that the AM can
set *freezeXid and *minmulti to invalid. But when I did that, VACUUM hits
this assertion:

TRAP: FailedAssertion("!(((classForm->relfrozenxid) >= ((TransactionId)
3)))", File: "vacuum.c", Line: 1323)

Hm. That needs to be fixed - IIRC it previously worked, because zheap
doesn't have relfrozenxid either. Probably broke it when trying to
winnow down the tableam patches. I'm planning to rebase zheap onto the
newest version soon, so I'll re-encounter this.
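
A minimal sketch of the guard that seems to be missing, assuming the fix
is for vac_update_datfrozenxid()'s pg_class loop to tolerate such AMs
instead of asserting:

	/*
	 * Sketch only: skip relations whose AM returned InvalidTransactionId
	 * from relation_set_new_filenode(), i.e. doesn't track freeze info.
	 */
	if (!TransactionIdIsValid(classForm->relfrozenxid))
		continue;

	Assert(TransactionIdIsNormal(classForm->relfrozenxid));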

There's a little bug in index-only scan executor node, where it mixes up the
slots to hold a tuple from the index, and from the table. That doesn't cause
any ill effects if the AM uses TTSOpsHeapTuple, but with my toy AM, which
uses a virtual slot, it caused warnings like this from index-only scans:

Hm. That's another one that I think I had fixed previously :(, and then
concluded that it's not actually necessary for some reason. Your fix
looks correct to me. Do you want to commit it? Otherwise I'll look at
it after rebasing zheap, and checking it with that.

Attached is a patch with the toy implementation I used to test this. I'm not
suggesting we should commit that - although feel free to do that if you
think it's useful - but it shows how I bumped into these issues.

Hm, probably not a bad idea to include something like it. It seems like
we kinda would need non-stub implementations of more functions for it to
test much / and to serve as an example. I'm mildly inclined to just do
it via zheap / externally, but I'm not quite sure that's good enough.

+static Size
+toyam_parallelscan_estimate(Relation rel)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}

The other stubbed functions seem like we should require them, but I
wonder if we should make the parallel stuff optional?

Greetings,

Andres Freund

#146Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Andres Freund (#145)
4 attachment(s)
Re: Pluggable Storage - Andres's take

On 08/04/2019 20:37, Andres Freund wrote:

On 2019-04-08 15:34:46 +0300, Heikki Linnakangas wrote:

There's a little bug in index-only scan executor node, where it mixes up the
slots to hold a tuple from the index, and from the table. That doesn't cause
any ill effects if the AM uses TTSOpsHeapTuple, but with my toy AM, which
uses a virtual slot, it caused warnings like this from index-only scans:

Hm. That's another one that I think I had fixed previously :(, and then
concluded that it's not actually necessary for some reason. Your fix
looks correct to me. Do you want to commit it? Otherwise I'll look at
it after rebasing zheap, and checking it with that.

I found another slot type confusion bug, while playing with zedstore. In
an Index Scan, if you have an ORDER BY key that needs to be rechecked,
so that it uses the reorder queue, then it will sometimes use the
reorder queue slot, and sometimes the table AM's slot, for the scan
slot. If they're not of the same type, you get an assertion failure:

TRAP: FailedAssertion("!(op->d.fetch.kind == slot->tts_ops)", File:
"execExprInterp.c", Line: 1905)

Attached is a test for this, again using the toy table AM, extended to
be able to test this. And a fix.
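
The general idea of the fix, as a rough sketch (see the attached patch
for the real thing; the exact ExecForceStoreHeapTuple() call here is
illustrative):

	/*
	 * In IndexNextWithReorder(): force the heap tuple popped from the
	 * reorder queue into the scan slot, instead of switching between the
	 * queue's slot and the table AM's slot mid-scan. shouldFree is true
	 * because the queued tuple was palloc'd.
	 */
	tuple = reorderqueue_pop(node);
	ExecForceStoreHeapTuple(tuple, node->ss.ss_ScanTupleSlot, true);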

Attached is a patch with the toy implementation I used to test this. I'm not
suggesting we should commit that - although feel free to do that if you
think it's useful - but it shows how I bumped into these issues.

Hm, probably not a bad idea to include something like it. It seems like
we kinda would need non-stub implementation of more functions for it to
test much / and to serve as an example. I'm mildy inclined to just do
it via zheap / externally, but I'm not quite sure that's good enough.

Works for me.

+static Size
+toyam_parallelscan_estimate(Relation rel)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}

The other stubbed functions seem like we should require them, but I
wonder if we should make the parallel stuff optional?

Yeah, that would be good. I would assume it to be optional.
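
For illustration, a sketch of what "optional" could look like, mirroring
the NULL convention already used for scan_bitmap_next_block/_tuple (the
helper name is invented):

static inline bool
table_supports_parallel_scans(Relation rel)
{
	/*
	 * Sketch only: GetTableAmRoutine() would stop asserting that the
	 * parallelscan_* callbacks are set, and the planner would test this
	 * before generating a parallel path for the relation.
	 */
	return rel->rd_tableam->parallelscan_estimate != NULL &&
		rel->rd_tableam->parallelscan_initialize != NULL &&
		rel->rd_tableam->parallelscan_reinitialize != NULL;
}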

- Heikki

Attachments:

0001-Fix-confusion-on-different-kinds-of-slots-in-IndexOn.patch
From e8854c876927b32e21f485337dd2335f4bfebd32 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 8 Apr 2019 15:18:19 +0300
Subject: [PATCH 1/4] Fix confusion on different kinds of slots in
 IndexOnlyScans.

We used the same slot to store a tuple from the index and to store a
tuple from the table. That's not OK. It worked with the heap, because
heapam_getnextslot() stores a HeapTuple to the slot, and doesn't care how
large the tts_values/nulls arrays are. But when I played with a toy table
AM implementation that used a virtual tuple, it caused memory overruns.
---
 src/backend/executor/nodeIndexonlyscan.c | 16 +++++++++++++---
 src/include/nodes/execnodes.h            |  1 +
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 7711728495c..5833d683b38 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -166,10 +166,10 @@ IndexOnlyNext(IndexOnlyScanState *node)
 			 * Rats, we have to visit the heap to check visibility.
 			 */
 			InstrCountTuples2(node, 1);
-			if (!index_fetch_heap(scandesc, slot))
+			if (!index_fetch_heap(scandesc, node->ioss_TableSlot))
 				continue;		/* no visible tuple, try next index entry */
 
-			ExecClearTuple(slot);
+			ExecClearTuple(node->ioss_TableSlot);
 
 			/*
 			 * Only MVCC snapshots are supported here, so there should be no
@@ -528,7 +528,17 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	 */
 	tupDesc = ExecTypeFromTL(node->indextlist);
 	ExecInitScanTupleSlot(estate, &indexstate->ss, tupDesc,
-						  table_slot_callbacks(currentRelation));
+						  &TTSOpsVirtual);
+
+	/*
+	 * We need another slot, in a format that's suitable for the table AM,
+	 * for when we need to fetch a tuple from the table for rechecking
+	 * visibility.
+	 */
+	indexstate->ioss_TableSlot =
+		ExecAllocTableSlot(&estate->es_tupleTable,
+						   RelationGetDescr(currentRelation),
+						   table_slot_callbacks(currentRelation));
 
 	/*
 	 * Initialize result type and projection info.  The node's targetlist will
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a5e4b7ef2e0..108dee61e24 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1424,6 +1424,7 @@ typedef struct IndexOnlyScanState
 	struct IndexScanDescData *ioss_ScanDesc;
 	Buffer		ioss_VMBuffer;
 	Size		ioss_PscanLen;
+	TupleTableSlot *ioss_TableSlot;
 } IndexOnlyScanState;
 
 /* ----------------
-- 
2.20.1

0002-Add-a-toy-table-AM-implementation-to-play-with.patch
From 309a773e09aa3d1256a431f2030f0f5819e6b32d Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 8 Apr 2019 15:16:53 +0300
Subject: [PATCH 2/4] Add a toy table AM implementation to play with.

It returns a constant data set. No insert/update/delete. But you can
create indexes.
---
 src/test/modules/toytable/Makefile            |  25 +
 .../modules/toytable/expected/toytable.out    |  41 ++
 src/test/modules/toytable/sql/toytable.sql    |  17 +
 src/test/modules/toytable/toytable--1.0.sql   |  12 +
 src/test/modules/toytable/toytable.control    |   4 +
 src/test/modules/toytable/toytableam.c        | 630 ++++++++++++++++++
 6 files changed, 729 insertions(+)
 create mode 100644 src/test/modules/toytable/Makefile
 create mode 100644 src/test/modules/toytable/expected/toytable.out
 create mode 100644 src/test/modules/toytable/sql/toytable.sql
 create mode 100644 src/test/modules/toytable/toytable--1.0.sql
 create mode 100644 src/test/modules/toytable/toytable.control
 create mode 100644 src/test/modules/toytable/toytableam.c

diff --git a/src/test/modules/toytable/Makefile b/src/test/modules/toytable/Makefile
new file mode 100644
index 00000000000..142ef2d23e6
--- /dev/null
+++ b/src/test/modules/toytable/Makefile
@@ -0,0 +1,25 @@
+# src/test/modules/toytable/Makefile
+
+MODULE_big = toytable
+OBJS = toytableam.o $(WIN32RES)
+PGFILEDESC = "A dummy implementation of the table AM API"
+
+EXTENSION = toytable
+DATA = toytable--1.0.sql
+
+REGRESS = toytable
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/toytable
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/test/modules/toytable/expected/toytable.out b/src/test/modules/toytable/expected/toytable.out
new file mode 100644
index 00000000000..3e8598e284c
--- /dev/null
+++ b/src/test/modules/toytable/expected/toytable.out
@@ -0,0 +1,41 @@
+CREATE EXTENSION toytable;
+create table toytab (i int4, j int4, k int4) using toytable;
+select * from toytab;
+ i  | j  | k  
+----+----+----
+  1 |  1 |  1
+  2 |  2 |  2
+  3 |  3 |  3
+  4 |  4 |  4
+  5 |  5 |  5
+  6 |  6 |  6
+  7 |  7 |  7
+  8 |  8 |  8
+  9 |  9 |  9
+ 10 | 10 | 10
+(10 rows)
+
+create index toyidx on toytab(i);
+-- test index scan
+set enable_seqscan=off;
+set enable_indexscan=on;
+select i, j from toytab where i = 4;
+ i | j 
+---+---
+ 4 | 4
+(1 row)
+
+-- index only scan
+explain (costs off) select i from toytab where i = 4;
+               QUERY PLAN               
+----------------------------------------
+ Index Only Scan using toyidx on toytab
+   Index Cond: (i = 4)
+(2 rows)
+
+select i from toytab where i = 4 ;
+ i 
+---
+ 4
+(1 row)
+
diff --git a/src/test/modules/toytable/sql/toytable.sql b/src/test/modules/toytable/sql/toytable.sql
new file mode 100644
index 00000000000..8d9bac41bbf
--- /dev/null
+++ b/src/test/modules/toytable/sql/toytable.sql
@@ -0,0 +1,17 @@
+CREATE EXTENSION toytable;
+
+create table toytab (i int4, j int4, k int4) using toytable;
+
+select * from toytab;
+
+create index toyidx on toytab(i);
+
+-- test index scan
+set enable_seqscan=off;
+set enable_indexscan=on;
+
+select i, j from toytab where i = 4;
+
+-- index only scan
+explain (costs off) select i from toytab where i = 4;
+select i from toytab where i = 4 ;
diff --git a/src/test/modules/toytable/toytable--1.0.sql b/src/test/modules/toytable/toytable--1.0.sql
new file mode 100644
index 00000000000..52085d27f4a
--- /dev/null
+++ b/src/test/modules/toytable/toytable--1.0.sql
@@ -0,0 +1,12 @@
+/* src/test/modules/toytable/toytable--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION toytable" to load this file. \quit
+
+CREATE FUNCTION toytableam_handler(internal)
+RETURNS pg_catalog.table_am_handler STRICT
+AS 'MODULE_PATHNAME' LANGUAGE C;
+
+CREATE ACCESS METHOD toytable TYPE TABLE HANDLER toytableam_handler;
+
+
diff --git a/src/test/modules/toytable/toytable.control b/src/test/modules/toytable/toytable.control
new file mode 100644
index 00000000000..8f613e58d6e
--- /dev/null
+++ b/src/test/modules/toytable/toytable.control
@@ -0,0 +1,4 @@
+comment = 'Dummy implementation of table AM api'
+default_version = '1.0'
+module_pathname = '$libdir/toytable'
+relocatable = true
diff --git a/src/test/modules/toytable/toytableam.c b/src/test/modules/toytable/toytableam.c
new file mode 100644
index 00000000000..4cb2b5d75db
--- /dev/null
+++ b/src/test/modules/toytable/toytableam.c
@@ -0,0 +1,630 @@
+/*-------------------------------------------------------------------------
+ *
+ * toytableam.c
+ *	  a toy table access method implementation
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/test/modules/toytable/toytableam.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "miscadmin.h"
+
+#include "access/multixact.h"
+#include "access/relscan.h"
+#include "access/tableam.h"
+#include "catalog/catalog.h"
+#include "catalog/storage.h"
+#include "catalog/index.h"
+#include "catalog/pg_type.h"
+#include "executor/executor.h"
+#include "utils/builtins.h"
+#include "utils/rel.h"
+#include "storage/bufmgr.h"
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(toytableam_handler);
+
+typedef struct
+{
+	TableScanDescData scan;
+
+	int			tupidx;
+} ToyScanDescData;
+typedef ToyScanDescData *ToyScanDesc;
+
+static const TupleTableSlotOps *
+toyam_slot_callbacks(Relation relation)
+{
+	return &TTSOpsVirtual;
+}
+
+static TableScanDesc toyam_scan_begin(Relation rel,
+							 Snapshot snapshot,
+							 int nkeys, struct ScanKeyData *key,
+							 ParallelTableScanDesc pscan,
+							 bool allow_strat,
+							 bool allow_sync,
+							 bool allow_pagemode,
+							 bool is_bitmapscan,
+							 bool is_samplescan,
+							 bool temp_snap)
+{
+	ToyScanDesc tscan;
+
+	tscan = palloc0(sizeof(ToyScanDescData));
+	tscan->scan.rs_rd = rel;
+	tscan->scan.rs_snapshot = snapshot;
+	tscan->scan.rs_nkeys = nkeys;
+	tscan->scan.rs_bitmapscan = is_bitmapscan;
+	tscan->scan.rs_samplescan = is_samplescan;
+	tscan->scan.rs_allow_strat = allow_strat;
+	tscan->scan.rs_allow_sync = allow_sync;
+	tscan->scan.rs_temp_snap = temp_snap;
+	tscan->scan.rs_parallel = pscan;
+
+	tscan->tupidx = 0;
+
+	return &tscan->scan;
+}
+
+static void
+toyam_scan_end(TableScanDesc scan)
+{
+	pfree(scan);
+}
+
+static void
+toyam_scan_rescan(TableScanDesc scan, struct ScanKeyData *key,
+				  bool set_params, bool allow_strat,
+				  bool allow_sync, bool allow_pagemode)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static Datum
+invent_toy_data(Oid atttypid, int idx, bool *isnull_p)
+{
+	Datum		d;
+	bool		isnull;
+
+	switch (atttypid)
+	{
+		case INT4OID:
+			d = Int32GetDatum(idx);
+			isnull = false;
+			break;
+
+		case BOXOID:
+			d = DirectFunctionCall1(box_in,
+									CStringGetDatum(psprintf("(%d,%d),(%d,%d)",
+															 idx - 10, idx - 10, idx+10, idx+10)));
+			isnull = false;
+			break;
+
+		case POLYGONOID:
+			d = DirectFunctionCall1(poly_in,
+									CStringGetDatum(psprintf("%d,%d,%d,%d", 0,0, idx, idx)));
+			isnull = false;
+			break;
+
+		default:
+			d = (Datum) 0;
+			isnull = true;
+			break;
+	}
+
+	*isnull_p = isnull;
+	return d;
+}
+
+static bool
+toyam_scan_getnextslot(TableScanDesc scan,
+					   ScanDirection direction,
+					   TupleTableSlot *slot)
+{
+	ToyScanDesc tscan = (ToyScanDesc) scan;
+
+	slot->tts_nvalid = 0;
+	slot->tts_flags |= TTS_FLAG_EMPTY;
+
+	/*
+	 * Return a constant 10 rows. Every int4, box, or polygon attribute
+	 * gets data derived from a running count, everything else is NULL.
+	 */
+	if (tscan->tupidx < 10)
+	{
+		TupleDesc desc = RelationGetDescr(tscan->scan.rs_rd);
+
+		tscan->tupidx++;
+
+		for (AttrNumber attno = 1; attno <= desc->natts; attno++)
+		{
+			Form_pg_attribute att = &desc->attrs[attno - 1];
+			Datum		d;
+			bool		isnull;
+
+			d = invent_toy_data(att->atttypid, tscan->tupidx, &isnull);
+
+			slot->tts_values[attno - 1] = d;
+			slot->tts_isnull[attno - 1] = isnull;
+		}
+
+		ItemPointerSet(&slot->tts_tid, 1, tscan->tupidx);
+		slot->tts_nvalid = slot->tts_tupleDescriptor->natts;
+		slot->tts_flags &= ~TTS_FLAG_EMPTY;
+
+		return true;
+	}
+	else
+		return false;
+}
+
+static Size
+toyam_parallelscan_estimate(Relation rel)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static Size
+toyam_parallelscan_initialize(Relation rel,
+							  ParallelTableScanDesc pscan)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_parallelscan_reinitialize(Relation rel,
+								ParallelTableScanDesc pscan)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static struct IndexFetchTableData *
+toyam_index_fetch_begin(Relation rel)
+{
+	IndexFetchTableData *tfetch = palloc0(sizeof(IndexFetchTableData));
+
+	tfetch->rel = rel;
+
+	return tfetch;
+}
+
+static void
+toyam_index_fetch_reset(struct IndexFetchTableData *data)
+{
+}
+
+static void
+toyam_index_fetch_end(struct IndexFetchTableData *data)
+{
+	pfree(data);
+}
+
+static bool
+toyam_index_fetch_tuple(struct IndexFetchTableData *scan,
+						ItemPointer tid,
+						Snapshot snapshot,
+						TupleTableSlot *slot,
+						bool *call_again, bool *all_dead)
+{
+	TupleDesc desc = RelationGetDescr(scan->rel);
+	int			tupidx;
+
+	if (ItemPointerGetBlockNumber(tid) != 1)
+		return false;
+
+	tupidx = ItemPointerGetOffsetNumber(tid);
+	if (tupidx < 1 || tupidx > 10)
+		return false;
+
+	slot->tts_nvalid = 0;
+	slot->tts_flags |= TTS_FLAG_EMPTY;
+
+	/* Return same data as toyam_scan_getnextslot does */
+	for (AttrNumber attno = 1; attno <= desc->natts; attno++)
+	{
+		Form_pg_attribute att = &desc->attrs[attno - 1];
+		Datum		d;
+		bool		isnull;
+
+		d = invent_toy_data(att->atttypid, tupidx, &isnull);
+
+		slot->tts_values[attno - 1] = d;
+		slot->tts_isnull[attno - 1] = isnull;
+	}
+
+	ItemPointerSet(&slot->tts_tid, 1, tupidx);
+	slot->tts_nvalid = slot->tts_tupleDescriptor->natts;
+	slot->tts_flags &= ~TTS_FLAG_EMPTY;
+
+	return true;
+}
+
+static bool
+toyam_tuple_fetch_row_version(Relation rel,
+							  ItemPointer tid,
+							  Snapshot snapshot,
+							  TupleTableSlot *slot)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_tuple_get_latest_tid(Relation rel,
+						   Snapshot snapshot,
+						   ItemPointer tid)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static bool
+toyam_tuple_satisfies_snapshot(Relation rel,
+							   TupleTableSlot *slot,
+							   Snapshot snapshot)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static TransactionId
+toyam_compute_xid_horizon_for_tuples(Relation rel,
+									 ItemPointerData *items,
+									 int nitems)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_tuple_insert(Relation rel, TupleTableSlot *slot,
+				   CommandId cid, int options,
+				   struct BulkInsertStateData *bistate)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_tuple_insert_speculative(Relation rel,
+							   TupleTableSlot *slot,
+							   CommandId cid,
+							   int options,
+							   struct BulkInsertStateData *bistate,
+							   uint32 specToken)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_tuple_complete_speculative(Relation rel,
+								 TupleTableSlot *slot,
+								 uint32 specToken,
+								 bool succeeded)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static TM_Result
+toyam_tuple_delete(Relation rel,
+				   ItemPointer tid,
+				   CommandId cid,
+				   Snapshot snapshot,
+				   Snapshot crosscheck,
+				   bool wait,
+				   TM_FailureData *tmfd,
+				   bool changingPart)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
+				   CommandId cid, int options, struct BulkInsertStateData *bistate)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static TM_Result
+toyam_tuple_update(Relation rel,
+				   ItemPointer otid,
+				   TupleTableSlot *slot,
+				   CommandId cid,
+				   Snapshot snapshot,
+				   Snapshot crosscheck,
+				   bool wait,
+				   TM_FailureData *tmfd,
+				   LockTupleMode *lockmode,
+				   bool *update_indexes)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static TM_Result
+toyam_tuple_lock(Relation rel,
+				 ItemPointer tid,
+				 Snapshot snapshot,
+				 TupleTableSlot *slot,
+				 CommandId cid,
+				 LockTupleMode mode,
+				 LockWaitPolicy wait_policy,
+				 uint8 flags,
+				 TM_FailureData *tmfd)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_finish_bulk_insert(Relation rel, int options)
+{
+	return;
+}
+
+static void
+toyam_relation_set_new_filenode(Relation rel,
+								char persistence,
+								TransactionId *freezeXid,
+								MultiXactId *minmulti)
+{
+	*freezeXid = InvalidTransactionId;
+	*minmulti = InvalidMultiXactId;
+
+	/*
+	 * FIXME: We don't need this for anything. But index build calls
+	 * RelationGetNumberOfBlocks, from index_update_stats(), and that
+	 * fails if the underlying file doesn't exist.
+	 */
+	RelationCreateStorage(rel->rd_node, persistence);
+}
+
+static void
+toyam_relation_nontransactional_truncate(Relation rel)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_relation_copy_data(Relation rel, RelFileNode newrnode)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_relation_copy_for_cluster(Relation NewHeap,
+								Relation OldHeap,
+								Relation OldIndex,
+								bool use_sort,
+								TransactionId OldestXmin,
+								TransactionId FreezeXid,
+								MultiXactId MultiXactCutoff,
+								double *num_tuples,
+								double *tups_vacuumed,
+								double *tups_recently_dead)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_relation_vacuum(Relation onerel,
+					  struct VacuumParams *params,
+					  BufferAccessStrategy bstrategy)
+{
+	/* we've got nothing to do */
+}
+
+static bool
+toyam_scan_analyze_next_block(TableScanDesc scan,
+							  BlockNumber blockno,
+							  BufferAccessStrategy bstrategy)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static bool
+toyam_scan_analyze_next_tuple(TableScanDesc scan,
+							  TransactionId OldestXmin,
+							  double *liverows,
+							  double *deadrows,
+							  TupleTableSlot *slot)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static double
+toyam_index_build_range_scan(Relation heap_rel,
+							 Relation index_rel,
+							 struct IndexInfo *index_nfo,
+							 bool allow_sync,
+							 bool anyvisible,
+							 bool progress,
+							 BlockNumber start_blockno,
+							 BlockNumber end_blockno,
+							 IndexBuildCallback callback,
+							 void *callback_state,
+							 TableScanDesc scan)
+{
+	TupleTableSlot *slot;
+	EState     *estate;
+
+	estate = CreateExecutorState();
+	slot = table_slot_create(heap_rel, NULL);
+
+	if (!scan)
+		scan = toyam_scan_begin(heap_rel,
+								SnapshotAny,
+								0, NULL,
+								NULL,
+								false,
+								false,
+								false,
+								false,
+								false,
+								false);
+
+	while (toyam_scan_getnextslot(scan, ForwardScanDirection, slot))
+	{
+		Datum           values[INDEX_MAX_KEYS];
+		bool            isnull[INDEX_MAX_KEYS];
+		HeapTuple		heapTuple;
+
+		FormIndexDatum(index_nfo, slot, estate, values, isnull);
+
+		/* Call the AM's callback routine to process the tuple */
+		heapTuple = ExecCopySlotHeapTuple(slot);
+		heapTuple->t_self = slot->tts_tid;
+		callback(index_rel, heapTuple, values, isnull, true,
+				 callback_state);
+		pfree(heapTuple);
+	}
+
+	toyam_scan_end(scan);
+	ExecDropSingleTupleTableSlot(slot);
+	FreeExecutorState(estate);
+
+	return 10;
+}
+
+static void
+toyam_index_validate_scan(Relation heap_rel,
+						  Relation index_rel,
+						  struct IndexInfo *index_info,
+						  Snapshot snapshot,
+						  struct ValidateIndexState *state)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static void
+toyam_relation_estimate_size(Relation rel, int32 *attr_widths,
+							 BlockNumber *pages, double *tuples,
+							 double *allvisfrac)
+{
+	*pages = 1;
+	*tuples = 1;
+	*allvisfrac = 1.0;
+}
+
+static bool
+toyam_scan_sample_next_block(TableScanDesc scan,
+							 struct SampleScanState *scanstate)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static bool
+toyam_scan_sample_next_tuple(TableScanDesc scan,
+					   struct SampleScanState *scanstate,
+					   TupleTableSlot *slot)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function %s not implemented yet", __func__)));
+}
+
+static const TableAmRoutine toyam_methods = {
+	.type = T_TableAmRoutine,
+
+	.slot_callbacks = toyam_slot_callbacks,
+
+	.scan_begin = toyam_scan_begin,
+	.scan_end = toyam_scan_end,
+	.scan_rescan = toyam_scan_rescan,
+	.scan_getnextslot = toyam_scan_getnextslot,
+
+	.parallelscan_estimate = toyam_parallelscan_estimate,
+	.parallelscan_initialize = toyam_parallelscan_initialize,
+	.parallelscan_reinitialize = toyam_parallelscan_reinitialize,
+
+	.index_fetch_begin = toyam_index_fetch_begin,
+	.index_fetch_reset = toyam_index_fetch_reset,
+	.index_fetch_end = toyam_index_fetch_end,
+	.index_fetch_tuple = toyam_index_fetch_tuple,
+
+	.tuple_fetch_row_version = toyam_tuple_fetch_row_version,
+	.tuple_get_latest_tid = toyam_tuple_get_latest_tid,
+	.tuple_satisfies_snapshot = toyam_tuple_satisfies_snapshot,
+	.compute_xid_horizon_for_tuples = toyam_compute_xid_horizon_for_tuples,
+
+	.tuple_insert = toyam_tuple_insert,
+	.tuple_insert_speculative = toyam_tuple_insert_speculative,
+	.tuple_complete_speculative = toyam_tuple_complete_speculative,
+	.multi_insert = toyam_multi_insert,
+	.tuple_delete = toyam_tuple_delete,
+	.tuple_update = toyam_tuple_update,
+	.tuple_lock = toyam_tuple_lock,
+	.finish_bulk_insert = toyam_finish_bulk_insert,
+
+	.relation_set_new_filenode = toyam_relation_set_new_filenode,
+	.relation_nontransactional_truncate = toyam_relation_nontransactional_truncate,
+	.relation_copy_data = toyam_relation_copy_data,
+	.relation_copy_for_cluster = toyam_relation_copy_for_cluster,
+	.relation_vacuum = toyam_relation_vacuum,
+
+	.scan_analyze_next_block = toyam_scan_analyze_next_block,
+	.scan_analyze_next_tuple = toyam_scan_analyze_next_tuple,
+	.index_build_range_scan = toyam_index_build_range_scan,
+	.index_validate_scan = toyam_index_validate_scan,
+
+	.relation_estimate_size = toyam_relation_estimate_size,
+
+	.scan_bitmap_next_block = NULL,
+	.scan_bitmap_next_tuple = NULL,
+	.scan_sample_next_block = toyam_scan_sample_next_block,
+	.scan_sample_next_tuple = toyam_scan_sample_next_tuple,
+};
+
+Datum
+toytableam_handler(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_POINTER(&toyam_methods);
+}
-- 
2.20.1

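For reference, a handler like the above gets wired up from SQL via
CREATE ACCESS METHOD. A sketch of what the extension script presumably
looks like - reconstructed from the toytable tests below, not the
actual toytable--1.0.sql:

CREATE FUNCTION toytableam_handler(internal)
RETURNS table_am_handler
AS 'MODULE_PATHNAME'
LANGUAGE C;

CREATE ACCESS METHOD toytable TYPE TABLE HANDLER toytableam_handler;

-- after which tables can be created with it:
CREATE TABLE toy (id int4, p polygon) USING toytable;
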
0003-Add-test-for-bug-with-index-reorder-queue-slot-type-.patch (text/x-patch)
From 56bd50a63afc4c2c3f31f26a0411bf22c54bbb6f Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 9 Apr 2019 15:36:59 +0300
Subject: [PATCH 3/4] Add test for bug with index reorder queue slot type
 confusion.

This currently fails an assertion.
---
 src/test/modules/toytable/Makefile            |  2 +-
 .../toytable/expected/index-reorder-slot.out  | 25 ++++++++++++
 .../toytable/sql/index-reorder-slot.sql       | 15 +++++++
 src/test/modules/toytable/toytableam.c        | 40 ++++++++++++++++---
 4 files changed, 76 insertions(+), 6 deletions(-)
 create mode 100644 src/test/modules/toytable/expected/index-reorder-slot.out
 create mode 100644 src/test/modules/toytable/sql/index-reorder-slot.sql

diff --git a/src/test/modules/toytable/Makefile b/src/test/modules/toytable/Makefile
index 142ef2d23e6..a763ddacc89 100644
--- a/src/test/modules/toytable/Makefile
+++ b/src/test/modules/toytable/Makefile
@@ -7,7 +7,7 @@ PGFILEDESC = "A dummy implementation of the table AM API"
 EXTENSION = toytable
 DATA = toytable--1.0.sql
 
-REGRESS = toytable
+REGRESS = toytable index-reorder-slot
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/src/test/modules/toytable/expected/index-reorder-slot.out b/src/test/modules/toytable/expected/index-reorder-slot.out
new file mode 100644
index 00000000000..52a1ccec850
--- /dev/null
+++ b/src/test/modules/toytable/expected/index-reorder-slot.out
@@ -0,0 +1,25 @@
+create extension toytable;
+ERROR:  extension "toytable" already exists
+set toytable.num_tuples = 10000;
+-- This is a variant of the test in 'polygon' regression test, but using
+-- the toy table. It used to fail an assertion, because index scan would
+-- sometimes use the reorderqueue's slot, which is a heaptuple slot, and
+-- sometimes the table's scan slot, which is a virtual slot with toytable
+CREATE TABLE quad_poly_toy (id int, p polygon) USING toytable;
+CREATE INDEX quad_poly_toy_idx ON quad_poly_toy USING spgist(p);
+set enable_seqscan=off;
+SELECT * FROM quad_poly_toy ORDER BY p <-> point '123,456' LIMIT 10;
+ id  |           p           
+-----+-----------------------
+ 284 | ((284,284),(294,294))
+ 283 | ((283,283),(293,293))
+ 287 | ((287,287),(297,297))
+ 286 | ((286,286),(296,296))
+ 280 | ((280,280),(290,290))
+ 289 | ((289,289),(299,299))
+ 288 | ((288,288),(298,298))
+ 285 | ((285,285),(295,295))
+ 281 | ((281,281),(291,291))
+ 282 | ((282,282),(292,292))
+(10 rows)
+
diff --git a/src/test/modules/toytable/sql/index-reorder-slot.sql b/src/test/modules/toytable/sql/index-reorder-slot.sql
new file mode 100644
index 00000000000..7154639aa27
--- /dev/null
+++ b/src/test/modules/toytable/sql/index-reorder-slot.sql
@@ -0,0 +1,15 @@
+create extension toytable;
+
+set toytable.num_tuples = 10000;
+
+-- This is a variant of the test in 'polygon' regression test, but using
+-- the toy table. It used to fail an assertion, because index scan would
+-- sometimes use the reorderqueue's slot, which is a heaptuple slot, and
+-- sometimes the table's scan slot, which is a virtual slot with toytable
+CREATE TABLE quad_poly_toy (id int, p polygon) USING toytable;
+
+CREATE INDEX quad_poly_toy_idx ON quad_poly_toy USING spgist(p);
+
+set enable_seqscan=off;
+
+SELECT * FROM quad_poly_toy ORDER BY p <-> point '123,456' LIMIT 10;
diff --git a/src/test/modules/toytable/toytableam.c b/src/test/modules/toytable/toytableam.c
index 4cb2b5d75db..35ce6ce4709 100644
--- a/src/test/modules/toytable/toytableam.c
+++ b/src/test/modules/toytable/toytableam.c
@@ -30,10 +30,39 @@
 #include "utils/rel.h"
 #include "storage/bufmgr.h"
 
+/*
+ * Number of rows the fake table data will include.
+ * Can be set with the toytable.num_tuples GUC.
+ */
+int num_toy_tuples;
+
 PG_MODULE_MAGIC;
 
 PG_FUNCTION_INFO_V1(toytableam_handler);
 
+void _PG_init(void);
+
+
+/*
+ * Module Load Callback
+ */
+void
+_PG_init(void)
+{
+	/* Define custom GUC variables */
+	DefineCustomIntVariable("toytable.num_tuples",
+							"Number of tuples in the fake table data",
+							NULL,
+							&num_toy_tuples,
+							10,
+							0, INT_MAX / 1000,
+							PGC_USERSET,
+							0,
+							NULL,
+							NULL,
+							NULL);
+}
+
 typedef struct
 {
 	TableScanDescData scan;
@@ -109,13 +138,14 @@ invent_toy_data(Oid atttypid, int idx, bool *isnull_p)
 		case BOXOID:
 			d = DirectFunctionCall1(box_in,
 									CStringGetDatum(psprintf("(%d,%d),(%d,%d)",
-															 idx - 10, idx - 10, idx+10, idx+10)));
+															 idx, idx, idx+10, idx+10)));
 			isnull = false;
 			break;
 
 		case POLYGONOID:
 			d = DirectFunctionCall1(poly_in,
-									CStringGetDatum(psprintf("%d,%d,%d,%d", 0,0, idx, idx)));
+									CStringGetDatum(psprintf("%d,%d,%d,%d",
+															 idx,idx, idx+10, idx+10)));
 			isnull = false;
 			break;
 
@@ -143,7 +173,7 @@ toyam_scan_getnextslot(TableScanDesc scan,
 	 * Return a constant 10 rows. Every int4 attribute gets
 	 * a running count, everything else is NULL.
 	 */
-	if (tscan->tupidx < 10)
+	if (tscan->tupidx < num_toy_tuples)
 	{
 		TupleDesc desc = RelationGetDescr(tscan->scan.rs_rd);
 
@@ -232,7 +262,7 @@ toyam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		return false;
 
 	tupidx = ItemPointerGetOffsetNumber(tid);
-	if (tupidx < 1 || tupidx > 10)
+	if (tupidx < 1 || tupidx > num_toy_tuples)
 		return false;
 
 	slot->tts_nvalid = 0;
@@ -527,7 +557,7 @@ toyam_index_build_range_scan(Relation heap_rel,
 	ExecDropSingleTupleTableSlot(slot);
 	FreeExecutorState(estate);
 
-	return 10;
+	return num_toy_tuples;
 }
 
 static void
-- 
2.20.1

0004-Fix-confusion-on-type-of-slot-used-for-Index-Scan-s-.patch (text/x-patch)
From 41ee331aeb7e81d367f6dcf6796170e896b71d6d Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 9 Apr 2019 15:37:29 +0300
Subject: [PATCH 4/4] Fix confusion on type of slot used for Index Scan's
 projections.

If there are ORDER BY keys, so that the index scan node needs to use the
reorder queue, then it will sometimes use the underlying table's slot
directly, and sometimes the reorder queue's slot, for evaluating
projections and quals. If they are not both of the same type, then we
must not set the 'scanopsfixed' flag.
---
 src/backend/executor/nodeIndexscan.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 399ac0109c3..21744b219c9 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -901,6 +901,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 {
 	IndexScanState *indexstate;
 	Relation	currentRelation;
+	const TupleTableSlotOps *table_slot_ops;
 	LOCKMODE	lockmode;
 
 	/*
@@ -927,11 +928,19 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	indexstate->ss.ss_currentScanDesc = NULL;	/* no heap scan here */
 
 	/*
-	 * get the scan type from the relation descriptor.
+	 * Initialize the scan slot.
+	 *
+	 * With the reorder queue, we will sometimes use the reorderqueue's slot,
+	 * which uses heap ops, and sometimes the table AM's slot directly.  We
+	 * have to set scanopsfixed to false, unless the table AM also uses heap
+	 * ops.
 	 */
+	table_slot_ops = table_slot_callbacks(currentRelation);
 	ExecInitScanTupleSlot(estate, &indexstate->ss,
 						  RelationGetDescr(currentRelation),
-						  table_slot_callbacks(currentRelation));
+						  table_slot_ops);
+	if (node->indexorderby && table_slot_ops != &TTSOpsHeapTuple)
+		indexstate->ss.ps.scanopsfixed = false;
 
 	/*
 	 * Initialize result type and projection.
-- 
2.20.1

#147Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Andres Freund (#145)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On 08/04/2019 20:37, Andres Freund wrote:

Hi,

On 2019-04-08 15:34:46 +0300, Heikki Linnakangas wrote:

There were a bunch of typos in the comments in tableam.h, see attached. Some
of the comments could use more copy-editing and clarification, I think, but
I stuck to fixing just typos and such for now.

I pushed these after adding three boring changes by pgindent. Thanks for
those!

I'd greatly welcome more feedback on the comments - I've been pretty
deep in this for so long that I don't see all of the issues anymore. And
a mild dyslexia doesn't help...

Here is another iteration on the comments. The patch is a mix of
copy-editing and questions. The questions are marked with "HEIKKI:". I
can continue the copy-editing, if you can reply to the questions,
clarifying the intention on some parts of the API. (Or feel free to pick
and push any of these fixes immediately, if you prefer.)

- Heikki

Attachments:

tableam-h-rewording-and-questions.patch (text/x-patch)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f7f726b5aec..bbcab9ce31a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3638,7 +3638,7 @@ static struct config_string ConfigureNamesString[] =
 		{"default_table_access_method", PGC_USERSET, CLIENT_CONN_STATEMENT,
 			gettext_noop("Sets the default table access method for new tables."),
 			NULL,
-			GUC_IS_NAME
+			GUC_NOT_IN_SAMPLE | GUC_IS_NAME
 		},
 		&default_table_access_method,
 		DEFAULT_TABLE_ACCESS_METHOD,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 82de4cdcf2c..8aeeba38ca2 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -57,6 +57,8 @@ typedef struct TableScanDescData *TableScanDesc;
  * pointer to this structure.  The information here must be sufficient to
  * properly initialize each new TableScanDesc as workers join the scan, and it
  * must act as information on what to scan for those workers.
+ *
+ * This is stored in dynamic shared memory, so you can't use pointers here.
  */
 typedef struct ParallelTableScanDescData
 {
@@ -64,6 +66,11 @@ typedef struct ParallelTableScanDescData
 	bool		phs_syncscan;	/* report location to syncscan logic? */
 	bool		phs_snapshot_any;	/* SnapshotAny, not phs_snapshot_data? */
 	Size		phs_snapshot_off;	/* data for snapshot */
+
+	/*
+	 * Table AM specific data follows. After the table AM specific data
+	 * comes the snapshot data, at 'phs_snapshot_off'.
+	 */
 } ParallelTableScanDescData;
 typedef struct ParallelTableScanDescData *ParallelTableScanDesc;
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 6fbfcb96c98..d4709563e7e 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -91,8 +91,9 @@ typedef enum TM_Result
  * xmax is the outdating transaction's XID.  If the caller wants to visit the
  * replacement tuple, it must check that this matches before believing the
  * replacement is really a match.
+ * HEIKKI: matches what? xmin, but that's specific to the heapam.
  *
- * cmax is the outdating command's CID, but only when the failure code is
+ * cmax is the outdating command's CID. Only set when the failure code is
  * TM_SelfModified (i.e., something in the current transaction outdated the
  * tuple); otherwise cmax is zero.  (We make this restriction because
  * HeapTupleHeaderGetCmax doesn't work for tuples outdated in other
@@ -133,7 +134,11 @@ typedef void (*IndexBuildCallback) (Relation index,
  * returned by FormData_pg_am.amhandler.
  *
  * In most cases it's not appropriate to call the callbacks directly, use the
- * table_* wrapper functions instead.
+ * table_* wrapper functions instead. The descriptions of the callbacks here
+ * are written from the AM implementor's point of view. For more information
+ * on how to call them, see the wrapper functions. (If you update the comments
+ * on either a callback or a wrapper function, remember to also update the other
+ * one!)
  *
  * GetTableAmRoutine() asserts that required callbacks are filled in, remember
  * to update when adding a callback.
@@ -179,6 +184,12 @@ typedef struct TableAmRoutine
 	 *
 	 * if temp_snap is true, the snapshot will need to be deallocated at
 	 * scan_end.
+	 *
+	 * HEIKKI: table_scan_update_snapshot() changes the snapshot. That's
+	 * a bit surprising for the AM, no? Can it be called when a scan is
+	 * already in progress?
+	 *
+	 * HEIKKI: A flags bitmask argument would be more readable than 6 booleans
 	 */
 	TableScanDesc (*scan_begin) (Relation rel,
 								 Snapshot snapshot,
@@ -194,6 +205,9 @@ typedef struct TableAmRoutine
 	/*
 	 * Release resources and deallocate scan. If TableScanDesc.temp_snap,
 	 * TableScanDesc.rs_snapshot needs to be unregistered.
+	 *
+	 * HEIKKI: I find this 'temp_snap' thing pretty weird. Can't the caller handle
+	 * deregistering it?
 	 */
 	void		(*scan_end) (TableScanDesc scan);
 
@@ -221,6 +235,11 @@ typedef struct TableAmRoutine
 	/*
 	 * Estimate the size of shared memory needed for a parallel scan of this
 	 * relation. The snapshot does not need to be accounted for.
+	 *
+	 * HEIKKI: If this returns X, then the parallelscan_initialize() call
+	 * mustn't use more than X. So this is not just for optimization purposes,
+	 * for example. Not sure how to phrase that, but could use some
+	 * clarification.
 	 */
 	Size		(*parallelscan_estimate) (Relation rel);
 
@@ -228,6 +247,8 @@ typedef struct TableAmRoutine
 	 * Initialize ParallelTableScanDesc for a parallel scan of this relation.
 	 * `pscan` will be sized according to parallelscan_estimate() for the same
 	 * relation.
+	 *
+	 * HEIKKI: What does this return?
 	 */
 	Size		(*parallelscan_initialize) (Relation rel,
 											ParallelTableScanDesc pscan);
@@ -246,18 +267,22 @@ typedef struct TableAmRoutine
 	 */
 
 	/*
-	 * Prepare to fetch tuples from the relation, as needed when fetching
-	 * tuples for an index scan.  The callback has to return an
-	 * IndexFetchTableData, which the AM will typically embed in a larger
-	 * structure with additional information.
+	 * Prepare to fetch tuples from the relation, for an index scan.  The
+	 * callback has to return an IndexFetchTableData, which the AM will
+	 * typically embed in a larger structure with additional information.
 	 *
-	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
+	 * After this, the caller will call index_fetch_tuple(), as many times as
+	 * needed, to fetch the tuples.
 	 */
 	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
 	 * resources held in IndexFetchTableData.
+	 *
+	 * HEIKKI: Is this called between every call to index_fetch_tuple()?
+	 * Between every call to index_fetch_tuple(), except when call_again is
+	 * set? Can it be a no-op?
 	 */
 	void		(*index_fetch_reset) (struct IndexFetchTableData *data);
 
@@ -272,19 +297,22 @@ typedef struct TableAmRoutine
 	 * test, return true, false otherwise.
 	 *
 	 * Note that AMs that do not necessarily update indexes when indexed
-	 * columns do not change, need to return the current/correct version of
+	 * columns don't change, need to return the current/correct version of
 	 * the tuple that is visible to the snapshot, even if the tid points to an
 	 * older version of the tuple.
 	 *
 	 * *call_again is false on the first call to index_fetch_tuple for a tid.
-	 * If there potentially is another tuple matching the tid, *call_again
-	 * needs be set to true by index_fetch_tuple, signalling to the caller
+	 * If there potentially is another tuple matching the tid, the callback
+	 * needs to set *call_again to true, signalling to the caller
 	 * that index_fetch_tuple should be called again for the same tid.
 	 *
 	 * *all_dead, if all_dead is not NULL, should be set to true by
 	 * index_fetch_tuple iff it is guaranteed that no backend needs to see
-	 * that tuple. Index AMs can use that do avoid returning that tid in
+	 * that tuple. Index AMs can use that to avoid returning that tid in
 	 * future searches.
+	 *
+	 * HEIKKI: Should the snapshot be given in index_fetch_begin()? Can it
+	 * differ between calls?
 	 */
 	bool		(*index_fetch_tuple) (struct IndexFetchTableData *scan,
 									  ItemPointer tid,
@@ -302,6 +330,8 @@ typedef struct TableAmRoutine
 	 * Fetch tuple at `tid` into `slot`, after doing a visibility test
 	 * according to `snapshot`. If a tuple was found and passed the visibility
 	 * test, returns true, false otherwise.
+	 *
+	 * HEIKKI: explain how this differs from index_fetch_tuple.
 	 */
 	bool		(*tuple_fetch_row_version) (Relation rel,
 											ItemPointer tid,
@@ -311,14 +341,17 @@ typedef struct TableAmRoutine
 	/*
 	 * Return the latest version of the tuple at `tid`, by updating `tid` to
 	 * point at the newest version.
+	 *
+	 * HEIKKI: the latest version visible to the snapshot?
 	 */
 	void		(*tuple_get_latest_tid) (Relation rel,
 										 Snapshot snapshot,
 										 ItemPointer tid);
 
 	/*
-	 * Does the tuple in `slot` satisfy `snapshot`?  The slot needs to be of
-	 * the appropriate type for the AM.
+	 * Does the tuple in `slot` satisfy `snapshot`?
+	 *
+	 * The AM may modify the data underlying the tuple as a side-effect.
 	 */
 	bool		(*tuple_satisfies_snapshot) (Relation rel,
 											 TupleTableSlot *slot,
@@ -413,8 +446,8 @@ typedef struct TableAmRoutine
 	 */
 
 	/*
-	 * This callback needs to create a new relation filenode for `rel`, with
-	 * appropriate durability behaviour for `persistence`.
+	 * Create a new relation filenode for `rel`, with appropriate durability
+	 * behaviour for `persistence`.
 	 *
 	 * On output *freezeXid, *minmulti must be set to the values appropriate
 	 * for pg_class.{relfrozenxid, relminmxid}. For AMs that don't need those
@@ -429,24 +462,40 @@ typedef struct TableAmRoutine
 											  MultiXactId *minmulti);
 
 	/*
-	 * This callback needs to remove all contents from `rel`'s current
-	 * relfilenode. No provisions for transactional behaviour need to be made.
-	 * Often this can be implemented by truncating the underlying storage to
-	 * its minimal size.
-	 *
-	 * See also table_relation_nontransactional_truncate().
+	 * Remove all rows from `rel`'s current relfilenode. No provisions for
+	 * transactional behaviour need to be made. Often this can be implemented
+	 * by truncating the underlying storage to its minimal size.
 	 */
 	void		(*relation_nontransactional_truncate) (Relation rel);
 
 	/*
-	 * See table_relation_copy_data().
+	 * Copy data from `rel` into the new relfilenode `newrnode`. The new
+	 * relfilenode might not have storage associated with it before this
+	 * callback is called. This is used for low level operations like
+	 * changing a relation's tablespace.
 	 *
 	 * This can typically be implemented by directly copying the underlying
 	 * storage, unless it contains references to the tablespace internally.
 	 */
 	void		(*relation_copy_data) (Relation rel, RelFileNode newrnode);
 
-	/* See table_relation_copy_for_cluster() */
+	/*
+	 * Copy all data from `OldHeap` into `NewHeap`, as part of a CLUSTER or
+	 * VACUUM FULL.
+	 *
+	 * If `OldIndex` is valid, the data should be ordered according to the
+	 * given index. If `use_sort` is false, the data should be fetched from the
+	 * index, otherwise it should be fetched from the old table and sorted.
+	 *
+	 * OldestXmin, FreezeXid, MultiXactCutoff are currently valid values for
+	 * the table.
+	 * HEIKKI: What does "currently valid" mean? Valid for the old table?
+	 *
+	 * The callback should set *num_tuples, *tups_vacuumed, *tups_recently_dead
+	 * to statistics computed while copying for the relation. Not all might make
+	 * sense for every AM.
+	 * HEIKKI: What to do for the ones that don't make sense? Set to 0?
+	 */
 	void		(*relation_copy_for_cluster) (Relation NewHeap,
 											  Relation OldHeap,
 											  Relation OldIndex,
@@ -466,9 +515,8 @@ typedef struct TableAmRoutine
 	 * On entry a transaction is already established, and the relation is
 	 * locked with a ShareUpdateExclusive lock.
 	 *
-	 * Note that neither VACUUM FULL (and CLUSTER), nor ANALYZE go through
-	 * this routine, even if (for ANALYZE) it is part of the same VACUUM
-	 * command.
+	 * Note that VACUUM FULL (or CLUSTER) does not use this callback.
+	 * Neither does ANALYZE, even if it is part of the same VACUUM command.
 	 *
 	 * There probably, in the future, needs to be a separate callback to
 	 * integrate with autovacuum's scheduling.
@@ -479,13 +527,13 @@ typedef struct TableAmRoutine
 
 	/*
 	 * Prepare to analyze block `blockno` of `scan`. The scan has been started
-	 * with table_beginscan_analyze().  See also
-	 * table_scan_analyze_next_block().
+	 * with table_beginscan_analyze().
 	 *
 	 * The callback may acquire resources like locks that are held until
-	 * table_scan_analyze_next_tuple() returns false. It e.g. can make sense
+	 * table_scan_analyze_next_tuple() returns false. For example, it can make sense
 	 * to hold a lock until all tuples on a block have been analyzed by
 	 * scan_analyze_next_tuple.
+	 * HEIKKI: Hold a lock on what? A lwlock on the page?
 	 *
 	 * The callback can return false if the block is not suitable for
 	 * sampling, e.g. because it's a metapage that could never contain tuples.
@@ -589,8 +637,8 @@ typedef struct TableAmRoutine
 										   struct TBMIterateResult *tbmres);
 
 	/*
-	 * Fetch the next tuple of a bitmap table scan into `slot` and return true
-	 * if a visible tuple was found, false otherwise.
+	 * Fetch the next visible tuple of a bitmap table scan into `slot`. If a
+	 * tuple was found, returns true, false otherwise.
 	 *
 	 * For some AMs it will make more sense to do all the work referencing
 	 * `tbmres` contents in scan_bitmap_next_block, for others it might be
@@ -618,6 +666,8 @@ typedef struct TableAmRoutine
 	 * internally needs to perform mapping between the internal and a block
 	 * based representation.
 	 *
+	 * HEIKKI: What TsmRoutine? Where is that?
+	 *
 	 * Note that it's not acceptable to hold deadlock prone resources such as
 	 * lwlocks until scan_sample_next_tuple() has exhausted the tuples on the
 	 * block - the tuple is likely to be returned to an upper query node, and
@@ -632,9 +682,11 @@ typedef struct TableAmRoutine
 										   struct SampleScanState *scanstate);
 
 	/*
-	 * This callback, only called after scan_sample_next_block has returned
-	 * true, should determine the next tuple to be returned from the selected
-	 * block using the TsmRoutine's NextSampleTuple() callback.
+	 * Return the next tuple in a sample scan.
+	 *
+	 * This callback will only be called after scan_sample_next_block has
+	 * returned true. It should determine the next tuple to be returned from
+	 * the selected block using the TsmRoutine's NextSampleTuple() callback.
 	 *
 	 * The callback needs to perform visibility checks, and only return
 	 * visible tuples. That obviously can mean calling NextSampletuple()
@@ -657,15 +709,15 @@ typedef struct TableAmRoutine
  */
 
 /*
- * Returns slot callbacks suitable for holding tuples of the appropriate type
+ * Return slot callbacks suitable for holding tuples of the appropriate type
  * for the relation.  Works for tables, views, foreign tables and partitioned
  * tables.
  */
 extern const TupleTableSlotOps *table_slot_callbacks(Relation rel);
 
 /*
- * Returns slot using the callbacks returned by table_slot_callbacks(), and
- * registers it on *reglist.
+ * Return a slot using the callbacks returned by table_slot_callbacks(), and
+ * register it on *reglist.
  */
 extern TupleTableSlot *table_slot_create(Relation rel, List **reglist);
 
@@ -676,8 +728,8 @@ extern TupleTableSlot *table_slot_create(Relation rel, List **reglist);
  */
 
 /*
- * Start a scan of `rel`. Returned tuples pass a visibility test of
- * `snapshot`, and if nkeys != 0, the results are filtered by those scan keys.
+ * Start a scan of `rel`. Returned tuples are visible according to `snapshot`,
+ * and if nkeys != 0, the results are filtered by those scan keys.
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
@@ -688,18 +740,24 @@ table_beginscan(Relation rel, Snapshot snapshot,
 }
 
 /*
- * Like table_beginscan(), but for scanning catalog. It'll automatically use a
- * snapshot appropriate for scanning catalog relations.
+ * Like table_beginscan(), but for scanning a system catalog. It will
+ * automatically use a snapshot appropriate for scanning catalog relations.
  */
 extern TableScanDesc table_beginscan_catalog(Relation rel, int nkeys,
 						struct ScanKeyData *key);
 
 /*
  * Like table_beginscan(), but table_beginscan_strat() offers an extended API
- * that lets the caller control whether a nondefault buffer access strategy
- * can be used, and whether syncscan can be chosen (possibly resulting in the
- * scan not starting from block zero).  Both of these default to true with
- * plain table_beginscan.
+ * that lets the caller use a non-default buffer access strategy, or
+ * specify that a synchronized scan can be used (possibly resulting in the
+ * scan not starting from block zero).  Both of these default to true, as
+ * with plain table_beginscan.
+ *
+ * HEIKKI: I'm a bit confused by 'allow_strat'. What is the non-default
+ * strategy that will get used if you pass allow_strat=true? Perhaps the flag
+ * should be called "use_bulkread_strategy"? Or it should be of type
+ * BufferAccessStrategyType, or the caller should create a strategy with
+ * GetAccessStrategy() and pass that.
  */
 static inline TableScanDesc
 table_beginscan_strat(Relation rel, Snapshot snapshot,
@@ -712,10 +770,10 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 }
 
 /*
- * table_beginscan_bm is an alternative entry point for setting up a
- * TableScanDesc for a bitmap heap scan.  Although that scan technology is
- * really quite unlike a standard seqscan, there is just enough commonality to
- * make it worth using the same data structure.
+ * table_beginscan_bm() is an alternative entry point for setting up a
+ * TableScanDesc for a bitmap heap scan.  Although a bitmap scan is
+ * really quite unlike a standard seqscan, there is just enough commonality
+ * that it makes sense to use a TableScanDesc for both.
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
@@ -726,11 +784,13 @@ table_beginscan_bm(Relation rel, Snapshot snapshot,
 }
 
 /*
- * table_beginscan_sampling is an alternative entry point for setting up a
+ * table_beginscan_sampling() is an alternative entry point for setting up a
  * TableScanDesc for a TABLESAMPLE scan.  As with bitmap scans, it's worth
  * using the same data structure although the behavior is rather different.
  * In addition to the options offered by table_beginscan_strat, this call
  * also allows control of whether page-mode visibility checking is used.
+ *
+ * HEIKKI: What is 'pagemode'?
  */
 static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
@@ -973,9 +1033,6 @@ table_get_latest_tid(Relation rel, Snapshot snapshot, ItemPointer tid)
  *
  * This assumes the slot's tuple is valid, and of the appropriate type for the
  * AM.
- *
- * Some AMs might modify the data underlying the tuple as a side-effect. If so
- * they ought to mark the relevant buffer dirty.
  */
 static inline bool
 table_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot,
@@ -1004,31 +1061,33 @@ table_compute_xid_horizon_for_tuples(Relation rel,
  */
 
 /*
- * Insert a tuple from a slot into table AM routine.
+ * Insert a tuple from a slot into the table.
  *
- * The options bitmask allows to specify options that allow to change the
- * behaviour of the AM. Several options might be ignored by AMs not supporting
- * them.
+ * The options bitmask allows changing the behaviour of the AM. The AM may
+ * ignore options that it does not support.
  *
  * If the TABLE_INSERT_SKIP_WAL option is specified, the new tuple doesn't
  * need to be logged to WAL, even for a non-temp relation. It is the AMs
  * choice whether this optimization is supported.
  *
- * If the TABLE_INSERT_SKIP_FSM option is specified, AMs are free to not reuse
- * free space in the relation. This can save some cycles when we know the
+ * The TABLE_INSERT_SKIP_FSM option is a hint that the AM should not bother
+ * to try reusing free space. This can save some cycles when we know the
  * relation is new and doesn't contain useful amounts of free space.  It's
  * commonly passed directly to RelationGetBufferForTuple; see there for more info.
  *
- * TABLE_INSERT_FROZEN should only be specified for inserts into
- * relfilenodes created during the current subtransaction and when
- * there are no prior snapshots or pre-existing portals open.
- * This causes rows to be frozen, which is an MVCC violation and
- * requires explicit options chosen by user.
+ * TABLE_INSERT_FROZEN means that the AM may skip normal MVCC rules for the
+ * tuples, so that they become immediately visible to everyone. That is a
+ * violation of the normal MVCC rules, and requires specific action from the
+ * user, currently only used for the COPY FREEZE option. Even then, it should
+ * only be specified for inserts into relfilenodes created during the current
+ * subtransaction and when there are no prior snapshots or pre-existing portals
+ * open.
  *
  * TABLE_INSERT_NO_LOGICAL force-disables the emitting of logical decoding
  * information for the tuple. This should solely be used during table rewrites
  * where RelationIsLogicallyLogged(relation) is not yet accurate for the new
  * relation.
+ * HEIKKI: Is this optional, too? Can the AM ignore it?
  *
  * Note that most of these options will be applied when inserting into the
  * heap's TOAST table, too, if the tuple requires any out-of-line data.
@@ -1041,6 +1100,8 @@ table_compute_xid_horizon_for_tuples(Relation rel,
  * On return the slot's tts_tid and tts_tableOid are updated to reflect the
  * insertion. But note that any toasting of fields within the slot is NOT
  * reflected in the slots contents.
+ *
+ * HEIKKI: I think GetBulkInsertState() should be an AM-specific callback.
  */
 static inline void
 table_insert(Relation rel, TupleTableSlot *slot, CommandId cid,
@@ -1089,9 +1150,8 @@ table_complete_speculative(Relation rel, TupleTableSlot *slot,
  * operation. That's often faster than calling table_insert() in a loop,
  * because e.g. the AM can reduce WAL logging and page locking overhead.
  *
- * Except for taking `nslots` tuples as input, as an array of TupleTableSlots
- * in `slots`, the parameters for table_multi_insert() are the same as for
- * table_insert().
+ * The parameters are the same as for table_insert(), except for taking
+ * an array of slots.
  *
  * Note: this leaks memory into the current memory context. You can create a
  * temporary context before calling this, if that's a problem.
@@ -1115,20 +1175,25 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
  *	tid - TID of tuple to be deleted
  *	cid - delete command ID (used for visibility test, and stored into
  *		cmax if successful)
+ * HEIKKI: description for 'snapshot' parameter is missing
  *	crosscheck - if not InvalidSnapshot, also check tuple against this
  *	wait - true if should wait for any conflicting update to commit/abort
- * Output parameters:
- *	tmfd - filled in failure cases (see below)
  *	changingPart - true iff the tuple is being moved to another partition
  *		table due to an update of the partition key. Otherwise, false.
+ * Output parameters:
+ *	tmfd - filled in failure cases (see below)
+ *
+ * HEIKKI: What's the AM supposed to do differently if 'changingPart' is set?
+ * (I know, it's supposed to set the t_ctid to the magic "moved partition"
+ * value. Explain that)
  *
  * Normal, successful return value is TM_Ok, which means we did actually
- * delete it.  Failure return codes are TM_SelfModified, TM_Updated, and
- * TM_BeingModified (the last only possible if wait == false).
+ * delete it.  Failure return codes are TM_SelfModified, TM_Updated,
+ * TM_Deleted, and TM_BeingModified (the last only possible if wait == false).
  *
  * In the failure cases, the routine fills *tmfd with the tuple's t_ctid,
- * t_xmax, and, if possible, and, if possible, t_cmax.  See comments for
- * struct TM_FailureData for additional info.
+ * t_xmax, and, if possible, t_cmax.  See comments for struct TM_FailureData
+ * for additional info.
  */
 static inline TM_Result
 table_delete(Relation rel, ItemPointer tid, CommandId cid,
@@ -1161,8 +1226,8 @@ table_delete(Relation rel, ItemPointer tid, CommandId cid,
  *		are required for this tuple
  *
  * Normal, successful return value is TM_Ok, which means we did actually
- * update it.  Failure return codes are TM_SelfModified, TM_Updated, and
- * TM_BeingModified (the last only possible if wait == false).
+ * update it.  Failure return codes are TM_SelfModified, TM_Updated,
+ * TM_Deleted, and TM_BeingModified (the last only possible if wait == false).
  *
  * On success, the slot's tts_tid and tts_tableOid are updated to match the new
  * stored tuple; in particular, slot->tts_tid is set to the TID where the
@@ -1170,6 +1235,9 @@ table_delete(Relation rel, ItemPointer tid, CommandId cid,
  * update was done.  However, any TOAST changes in the new tuple's
  * data are not reflected into *newtup.
  *
+ * HEIKKI: There is no 'newtup'.
+ * HEIKKI: HEAP_ONLY_TUPLE is AM-specific; do the callers peek into that, currently?
+ *
  * In the failure cases, the routine fills *tmfd with the tuple's t_ctid,
  * t_xmax, and, if possible, t_cmax.  See comments for struct TM_FailureData
  * for additional info.
@@ -1233,8 +1301,8 @@ table_lock_tuple(Relation rel, ItemPointer tid, Snapshot snapshot,
 /*
  * Perform operations necessary to complete insertions made via
  * tuple_insert and multi_insert with a BulkInsertState specified. This
- * e.g. may e.g. used to flush the relation when inserting with
- * TABLE_INSERT_SKIP_WAL specified.
+ * may e.g. be used to flush the relation when the TABLE_INSERT_SKIP_WAL
+ * option was used.
  */
 static inline void
 table_finish_bulk_insert(Relation rel, int options)
@@ -1257,8 +1325,8 @@ table_finish_bulk_insert(Relation rel, int options)
  * This is used both during relation creation and various DDL operations to
  * create a new relfilenode that can be filled from scratch.
  *
- * *freezeXid, *minmulti are set to the xid / multixact horizon for the table
- * that pg_class.{relfrozenxid, relminmxid} have to be set to.
+ * The function sets *freezeXid, *minmulti to the xid / multixact horizon
+ * values that the table's pg_class.{relfrozenxid, relminmxid} have to be set to.
  */
 static inline void
 table_relation_set_new_filenode(Relation rel, char persistence,
@@ -1271,7 +1339,7 @@ table_relation_set_new_filenode(Relation rel, char persistence,
 
 /*
  * Remove all table contents from `rel`, in a non-transactional manner.
- * Non-transactional meaning that there's no need to support rollbacks. This
+ * Non-transactional, meaning that this cannot be rolled back. This is
  * commonly only used to perform truncations for relfilenodes created in the
  * current transaction.
  */
@@ -1294,20 +1362,19 @@ table_relation_copy_data(Relation rel, RelFileNode newrnode)
 }
 
 /*
- * Copy data from `OldHeap` into `NewHeap`, as part of a CLUSTER or VACUUM
- * FULL.
+ * Copy all data from `OldHeap` into `NewHeap`, as part of a CLUSTER or
+ * VACUUM FULL.
  *
- * If `use_sort` is true, the table contents are sorted appropriate for
- * `OldIndex`; if use_sort is false and OldIndex is not InvalidOid, the data
- * is copied in that index's order; if use_sort is false and OidIndex is
- * InvalidOid, no sorting is performed.
+ * If `OldIndex` is valid, the data should be ordered according to the
+ * given index. If `use_sort` is false, the data should be fetched from the
+ * index, otherwise it should be fetched from the old table and sorted.
  *
  * OldestXmin, FreezeXid, MultiXactCutoff must be currently valid values for
  * the table.
  *
- * *num_tuples, *tups_vacuumed, *tups_recently_dead will contain statistics
- * computed while copying for the relation. Not all might make sense for every
- * AM.
+ * The function sets *num_tuples, *tups_vacuumed, *tups_recently_dead with
+ * statistics computed while copying for the relation. Not all might make sense
+ * for every AM.
  */
 static inline void
 table_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
@@ -1369,7 +1436,7 @@ table_scan_analyze_next_block(TableScanDesc scan, BlockNumber blockno,
  * is stored in `slot`.
  *
  * *liverows and *deadrows are incremented according to the encountered
- * tuples.
+ * tuples, if the AM has the concept of live and dead tuples.
  */
 static inline bool
 table_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
@@ -1384,34 +1451,36 @@ table_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
 /*
  * table_index_build_scan - scan the table to find tuples to be indexed
  *
- * This is called back from an access-method-specific index build procedure
- * after the AM has done whatever setup it needs.  The parent heap relation
- * is scanned to find tuples that should be entered into the index.  Each
- * such tuple is passed to the AM's callback routine, which does the right
- * things to add it to the new index.  After we return, the AM's index
- * build procedure does whatever cleanup it needs.
+ * This is called by the index-AM-specific index build procedure, after the
+ * index AM has done whatever setup it needs.  This function scans the parent
+ * table, to find tuples that should be entered into the index.  Each such tuple
+ * is passed to the index AM's callback routine, which does the right things to
+ * add it to the new index.  After this returns, the index AM's index build
+ * procedure does whatever cleanup it needs.
  *
  * The total count of live tuples is returned.  This is for updating pg_class
  * statistics.  (It's annoying not to be able to do that here, but we want to
  * merge that update with others; see index_update_stats.)  Note that the
  * index AM itself must keep track of the number of index tuples; we don't do
- * so here because the AM might reject some of the tuples for its own reasons,
+ * so here because the index AM might reject some of the tuples for its own reasons,
  * such as being unable to store NULLs.
  *
- * If 'progress', the PROGRESS_SCAN_BLOCKS_TOTAL counter is updated when
+ * If 'progress' is true, the PROGRESS_SCAN_BLOCKS_TOTAL counter is updated when
  * starting the scan, and PROGRESS_SCAN_BLOCKS_DONE is updated as we go along.
  *
- * A side effect is to set indexInfo->ii_BrokenHotChain to true if we detect
+ * A side effect is to set index_info->ii_BrokenHotChain to true if we detect
  * any potentially broken HOT chains.  Currently, we set this if there are any
  * RECENTLY_DEAD or DELETE_IN_PROGRESS entries in a HOT chain, without trying
  * very hard to detect whether they're really incompatible with the chain tip.
  * This only really makes sense for heap AM, it might need to be generalized
  * for other AMs later.
+ *
+ * HEIKKI: What does 'allow_sync' do?
  */
 static inline double
 table_index_build_scan(Relation heap_rel,
 					   Relation index_rel,
-					   struct IndexInfo *index_nfo,
+					   struct IndexInfo *index_info,
 					   bool allow_sync,
 					   bool progress,
 					   IndexBuildCallback callback,
@@ -1420,7 +1489,7 @@ table_index_build_scan(Relation heap_rel,
 {
 	return heap_rel->rd_tableam->index_build_range_scan(heap_rel,
 														index_rel,
-														index_nfo,
+														index_info,
 														allow_sync,
 														false,
 														progress,
@@ -1432,12 +1501,12 @@ table_index_build_scan(Relation heap_rel,
 }
 
 /*
- * As table_index_build_scan(), except that instead of scanning the complete
- * table, only the given number of blocks are scanned.  Scan to end-of-rel can
+ * As table_index_build_scan(), but instead of scanning the complete
+ * table, only the given block range is scanned.  Scan to end-of-rel can
  * be signalled by passing InvalidBlockNumber as numblocks.  Note that
  * restricting the range to scan cannot be done when requesting syncscan.
  *
- * When "anyvisible" mode is requested, all tuples visible to any transaction
+ * When 'anyvisible' mode is requested, all tuples visible to any transaction
  * are indexed and counted as live, including those inserted or deleted by
  * transactions that are still in progress.
  */
#148Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#147)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-04-11 14:52:40 +0300, Heikki Linnakangas wrote:

Here is another iteration on the comments. The patch is a mix of
copy-editing and questions. The questions are marked with "HEIKKI:". I can
continue the copy-editing, if you can reply to the questions, clarifying the
intention on some parts of the API. (Or feel free to pick and push any of
these fixes immediately, if you prefer.)

Thanks!

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f7f726b5aec..bbcab9ce31a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3638,7 +3638,7 @@ static struct config_string ConfigureNamesString[] =
{"default_table_access_method", PGC_USERSET, CLIENT_CONN_STATEMENT,
gettext_noop("Sets the default table access method for new tables."),
NULL,
-			GUC_IS_NAME
+			GUC_NOT_IN_SAMPLE | GUC_IS_NAME
},
&default_table_access_method,
DEFAULT_TABLE_ACCESS_METHOD,

Hm, I think we should rather add it to sample. That's an oversight, not
intentional.
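
I.e. postgresql.conf.sample would presumably gain a line like this
(guessing at the exact comment wording):

#default_table_access_method = 'heap'	# default table access method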

index 6fbfcb96c98..d4709563e7e 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -91,8 +91,9 @@ typedef enum TM_Result
* xmax is the outdating transaction's XID.  If the caller wants to visit the
* replacement tuple, it must check that this matches before believing the
* replacement is really a match.
+ * HEIKKI: matches what? xmin, but that's specific to the heapam.

It's basically just the old comment moved. I wonder if we can just get
rid of that field - because the logic to follow update chains correctly
is now inside the lock tuple callback. And as you say - it's not clear
what callers can do with it for the purpose of following chains. The
counter-argument is that having it makes it a lot less annoying to adapt
external code that wants to port over with a minimal set of changes, and
is only really interested in supporting heap for now.
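
For the heap, the check the old comment alludes to is presumably the
xmin/xmax one done when following the update chain - something like
this heap-specific sketch ('nexttup' being the fetched newer version):

/*
 * Verify the chain wasn't broken: the next version's xmin has to
 * match the xmax we saw when the old version was outdated.
 */
if (!TransactionIdEquals(HeapTupleHeaderGetXmin(nexttup->t_data),
						 tmfd.xmax))
	return false;			/* chain broken, not our replacement */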

* GetTableAmRoutine() asserts that required callbacks are filled in, remember
* to update when adding a callback.
@@ -179,6 +184,12 @@ typedef struct TableAmRoutine
*
* if temp_snap is true, the snapshot will need to be deallocated at
* scan_end.
+	 *
+	 * HEIKKI: table_scan_update_snapshot() changes the snapshot. That's
+	 * a bit surprising for the AM, no? Can it be called when a scan is
+	 * already in progress?

Yea, it can be called when the scan is in-progress. I think we probably
should just fix calling code to not need that - it's imo weird that
nodeBitmapHeapscan.c doesn't just delay starting the scan till it has
the snapshot. This isn't new code, but it's now going to be exposed to
more AMs, so I think there's a good argument to fix it now.

Robert: You committed that addition, in

commit f35742ccb7aa53ee3ed8416bbb378b0c3eeb6bb9
Author: Robert Haas <rhaas@postgresql.org>
Date: 2017-03-08 12:05:43 -0500

Support parallel bitmap heap scans.

do you remember why that's done?

+ * HEIKKI: A flags bitmask argument would be more readable than 6 booleans
*/
TableScanDesc (*scan_begin) (Relation rel,
Snapshot snapshot,

I honestly don't have strong feelings about it. Not sure that I buy that
bitmasks would be much more readable - but perhaps we could just use the
struct trickery we started to use in

commit f831d4accda00b9144bc647ede2e2f848b59f39d
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: 2019-02-01 11:29:42 -0300

Add ArchiveOpts to pass options to ArchiveEntry

@@ -194,6 +205,9 @@ typedef struct TableAmRoutine
/*
* Release resources and deallocate scan. If TableScanDesc.temp_snap,
* TableScanDesc.rs_snapshot needs to be unregistered.
+	 *
+	 * HEIKKI: I find this 'temp_snap' thing pretty weird. Can't the caller handle
+	 * deregistering it?
*/
void		(*scan_end) (TableScanDesc scan);

It's old logic, just newly wrapped. I think there's some argument that
some of this should be moved to tableam.c rather than the individual
AMs.

@@ -221,6 +235,11 @@ typedef struct TableAmRoutine
/*
* Estimate the size of shared memory needed for a parallel scan of this
* relation. The snapshot does not need to be accounted for.
+	 *
+	 * HEIKKI: If this returns X, then the parallelscan_initialize() call
+	 * mustn't use more than X. So this is not just for optimization purposes,
+	 * for example. Not sure how to phrase that, but could use some
+	 * clarification.
*/
Size		(*parallelscan_estimate) (Relation rel);

Hm. I thought I'd done that by adding the note that
parallelscan_initialize() gets memory sized by parallelscan_estimate().

/*
* Reset index fetch. Typically this will release cross index fetch
* resources held in IndexFetchTableData.
+	 *
+	 * HEIKKI: Is this called between every call to index_fetch_tuple()?
+	 * Between every call to index_fetch_tuple(), except when call_again is
+	 * set? Can it be a no-op?
*/
void		(*index_fetch_reset) (struct IndexFetchTableData *data);

It's basically just to release resources eagerly. I'll add a note.

@@ -272,19 +297,22 @@ typedef struct TableAmRoutine
* test, return true, false otherwise.
*
* Note that AMs that do not necessarily update indexes when indexed
-	 * columns do not change, need to return the current/correct version of
+	 * columns don't change, need to return the current/correct version of
* the tuple that is visible to the snapshot, even if the tid points to an
* older version of the tuple.
* *call_again is false on the first call to index_fetch_tuple for a tid.
-	 * If there potentially is another tuple matching the tid, *call_again
-	 * needs be set to true by index_fetch_tuple, signalling to the caller
+	 * If there potentially is another tuple matching the tid, the callback
+	 * needs to set *call_again to true, signalling to the caller
* that index_fetch_tuple should be called again for the same tid.
*
* *all_dead, if all_dead is not NULL, should be set to true by
* index_fetch_tuple iff it is guaranteed that no backend needs to see
-	 * that tuple. Index AMs can use that do avoid returning that tid in
+	 * that tuple. Index AMs can use that to avoid returning that tid in
* future searches.
+	 *
+	 * HEIKKI: Should the snapshot be given in index_fetch_begin()? Can it
+	 * differ between calls?
*/
bool		(*index_fetch_tuple) (struct IndexFetchTableData *scan,
ItemPointer tid,

Hm. It could very well differ between calls. E.g. _bt_check_unique()
could benefit from that (although it currently uses the
table_index_fetch_tuple_check() wrapper), as it does one lookup with
SnapshotDirty, and then the next with SnapshotSelf.
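
As a sketch of that pattern (not the actual _bt_check_unique() code,
which currently goes through the table_index_fetch_tuple_check()
wrapper):

IndexFetchTableData *fetch = table_index_fetch_begin(heapRel);
SnapshotData SnapshotDirty;
bool        call_again = false;
bool        all_dead = false;
bool        found;

InitDirtySnapshot(SnapshotDirty);

/* first probe: is there an in-progress / committed conflicting tuple? */
found = table_index_fetch_tuple(fetch, &tid, &SnapshotDirty, slot,
                                &call_again, &all_dead);

/* second probe for the same tid, with a different snapshot */
call_again = false;
found = table_index_fetch_tuple(fetch, &tid, SnapshotSelf, slot,
                                &call_again, &all_dead);

table_index_fetch_end(fetch);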

@@ -302,6 +330,8 @@ typedef struct TableAmRoutine
* Fetch tuple at `tid` into `slot`, after doing a visibility test
* according to `snapshot`. If a tuple was found and passed the visibility
* test, returns true, false otherwise.
+	 *
+	 * HEIKKI: explain how this differs from index_fetch_tuple.
*/
bool		(*tuple_fetch_row_version) (Relation rel,
ItemPointer tid,

Currently the wrapper has:

* See table_index_fetch_tuple's comment about what the difference between
* these functions is. This function is the correct one to use outside of
* index entry->table tuple lookups.

referencing

* The difference between this function and table_fetch_row_version is that
* this function returns the currently visible version of a row if the AM
* supports storing multiple row versions reachable via a single index entry
* (like heap's HOT). Whereas table_fetch_row_version only evaluates the
* tuple exactly at `tid`. Outside of index entry ->table tuple lookups,
* table_fetch_row_version is what's usually needed.

should we just duplicate that?

@@ -311,14 +341,17 @@ typedef struct TableAmRoutine
/*
* Return the latest version of the tuple at `tid`, by updating `tid` to
* point at the newest version.
+	 *
+	 * HEIKKI: the latest version visible to the snapshot?
*/
void		(*tuple_get_latest_tid) (Relation rel,
Snapshot snapshot,
ItemPointer tid);

It's such a bad interface :(. I'd love to just remove it. Based on

/messages/by-id/17ef5a8a-71cb-5cbf-1762-dbb71626f84e@dream.email.ne.jp

I think we can basically just remove currtid_byreloid/byrelname. I've
not sufficiently thought about TidNext() yet.

/*
-	 * Does the tuple in `slot` satisfy `snapshot`?  The slot needs to be of
-	 * the appropriate type for the AM.
+	 * Does the tuple in `slot` satisfy `snapshot`?
+	 *
+	 * The AM may modify the data underlying the tuple as a side-effect.
*/
bool		(*tuple_satisfies_snapshot) (Relation rel,
TupleTableSlot *slot,

Hm, this obviously should be moved here from the wrapper. But I now
wonder if we can't phrase this better. Might try to come up with
something.

+	/*
+	 * Copy all data from `OldHeap` into `NewHeap`, as part of a CLUSTER or
+	 * VACUUM FULL.
+	 *
+	 * If `OldIndex` is valid, the data should be ordered according to the
+	 * given index. If `use_sort` is false, the data should be fetched from the
+	 * index, otherwise it should be fetched from the old table and sorted.
+	 *
+	 * OldestXmin, FreezeXid, MultiXactCutoff are currently valid values for
+	 * the table.
+	 * HEIKKI: What does "currently valid" mean? Valid for the old table?

They are system-wide values, basically. Not sure into how much detail
about that to go here?

+	 * The callback should set *num_tuples, *tups_vacuumed, *tups_recently_dead
+	 * to statistics computed while copying for the relation. Not all might make
+	 * sense for every AM.
+	 * HEIKKI: What to do for the ones that don't make sense? Set to 0?
+	 */

I don't see much of an alternative, yea. I suspect we're going to have
to expand vacuum's reporting once we have a better grasp about what
other AMs want / need.

/*
* Prepare to analyze block `blockno` of `scan`. The scan has been started
-	 * with table_beginscan_analyze().  See also
-	 * table_scan_analyze_next_block().
+	 * with table_beginscan_analyze().
*
* The callback may acquire resources like locks that are held until
-	 * table_scan_analyze_next_tuple() returns false. It e.g. can make sense
+	 * table_scan_analyze_next_tuple() returns false. For example, it can make sense
* to hold a lock until all tuples on a block have been analyzed by
* scan_analyze_next_tuple.
+	 * HEIKKI: Hold a lock on what? A lwlock on the page?

Yea, that's what heapam does. I'm not particularly happy with this, but
I'm not sure how to do better. I expect that we'll have to revise this
to be more general at some not too far away point.

@@ -618,6 +666,8 @@ typedef struct TableAmRoutine
* internally needs to perform mapping between the internal and a block
* based representation.
*
+	 * HEIKKI: What TsmRoutine? Where is that?

I'm not sure what you mean. The SampleScanState has its associated
tablesample routine. Would saying something like "will call the
NextSampleBlock() callback for the TsmRoutine associated with the
SampleScanState" be better?

/*
* Like table_beginscan(), but table_beginscan_strat() offers an extended API
- * that lets the caller control whether a nondefault buffer access strategy
- * can be used, and whether syncscan can be chosen (possibly resulting in the
- * scan not starting from block zero).  Both of these default to true with
- * plain table_beginscan.
+ * that lets the caller use a non-default buffer access strategy, or
+ * specify that a synchronized scan can be used (possibly resulting in the
+ * scan not starting from block zero).  Both of these default to true, as
+ * with plain table_beginscan.
+ *
+ * HEIKKI: I'm a bit confused by 'allow_strat'. What is the non-default
+ * strategy that will get used if you pass allow_strat=true? Perhaps the flag
+ * should be called "use_bulkread_strategy"? Or it should be of type
+ * BufferAccessStrategyType, or the caller should create a strategy with
+ * GetAccessStrategy() and pass that.
*/

That's really just a tableam port of the pre-existing heapam interface.
I don't like the API very much, but there were only so many things that
were realistic to change during this project (I think; there were
obviously lots of judgement calls). I don't think there's much reason
to defend the current status - and I'm happy to collaborate on fixing
that. But I think it's out of scope for 12.

/*
- * table_beginscan_sampling is an alternative entry point for setting up a
+ * table_beginscan_sampling() is an alternative entry point for setting up a
* TableScanDesc for a TABLESAMPLE scan.  As with bitmap scans, it's worth
* using the same data structure although the behavior is rather different.
* In addition to the options offered by table_beginscan_strat, this call
* also allows control of whether page-mode visibility checking is used.
+ *
+ * HEIKKI: What is 'pagemode'?
*/

That's a good question. My not defining it is pretty much a cop-out,
because there previously wasn't any explanation, and I wasn't sure there
*is* a meaningful definition. It's basically an
efficiency hack inside heapam.c, but it's currently externally
determined, e.g. in bernoulli.c (code from 11):

* Use bulkread, since we're scanning all pages. But pagemode visibility
* checking is a win only at larger sampling fractions. The 25% cutoff
* here is based on very limited experimentation.
*/
node->use_bulkread = true;
node->use_pagemode = (percent >= 25);

If you have a suggestion how to either get rid of it, or how to properly
phrase this...

* TABLE_INSERT_NO_LOGICAL force-disables the emitting of logical decoding
* information for the tuple. This should solely be used during table rewrites
* where RelationIsLogicallyLogged(relation) is not yet accurate for the new
* relation.
+ * HEIKKI: Is this optional, too? Can the AM ignore it?

Hm. Currently logical decoding isn't really extensible automatically to
an AM (it works via WAL and WAL isn't extensible) - so it'll currently
not mean anything to non-heap AMs (or AMs that patch/are part of core).

* Note that most of these options will be applied when inserting into the
* heap's TOAST table, too, if the tuple requires any out-of-line data.
@@ -1041,6 +1100,8 @@ table_compute_xid_horizon_for_tuples(Relation rel,
* On return the slot's tts_tid and tts_tableOid are updated to reflect the
* insertion. But note that any toasting of fields within the slot is NOT
* reflected in the slots contents.
+ *
+ * HEIKKI: I think GetBulkInsertState() should be an AM-specific callback.
*/

I agree. There was some of that in an earlier version of the patch, but
the interface wasn't yet right. I think there are a lot of such things
that just need to be added incrementally.
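
Something like the following, purely as a sketch (the callback names
are invented here):

/* hypothetical callbacks replacing the heap-specific GetBulkInsertState() */
void       *(*bistate_alloc) (Relation rel);
void        (*bistate_free) (Relation rel, void *bistate);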

@@ -1170,6 +1235,9 @@ table_delete(Relation rel, ItemPointer tid, CommandId cid,
* update was done.  However, any TOAST changes in the new tuple's
* data are not reflected into *newtup.
*
+ * HEIKKI: There is no 'newtup'.
+ * HEIKKI: HEAP_ONLY_TUPLE is AM-specific; do the callers peek into that, currently?

No, callers currently don't. The callback does, and sets
*update_indexes accordingly.

- * A side effect is to set indexInfo->ii_BrokenHotChain to true if we detect
+ * A side effect is to set index_info->ii_BrokenHotChain to true if we detect
* any potentially broken HOT chains.  Currently, we set this if there are any
* RECENTLY_DEAD or DELETE_IN_PROGRESS entries in a HOT chain, without trying
* very hard to detect whether they're really incompatible with the chain tip.
* This only really makes sense for heap AM, it might need to be generalized
* for other AMs later.
+ *
+ * HEIKKI: What does 'allow_sync' do?

Heh, I'm going to be responsible for everything that was previously
undocumented, aren't I ;). I guess we should say something vague like
"When allow_sync is set to true, an AM may use scans synchronized with
other backends, if that makes sense. For some AMs that determines
whether tuples are going to be returned in TID order".
It's vague, but I'm not sure we can do better.

Thanks!

Greetings,

Andres Freund

#149Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#148)
Re: Pluggable Storage - Andres's take

On Thu, Apr 11, 2019 at 12:49 PM Andres Freund <andres@anarazel.de> wrote:

@@ -179,6 +184,12 @@ typedef struct TableAmRoutine
*
* if temp_snap is true, the snapshot will need to be deallocated at
* scan_end.
+      *
+      * HEIKKI: table_scan_update_snapshot() changes the snapshot. That's
+      * a bit surprising for the AM, no? Can it be called when a scan is
+      * already in progress?

Yea, it can be called when the scan is in-progress. I think we probably
should just fix calling code to not need that - it's imo weird that
nodeBitmapHeapscan.c doesn't just delay starting the scan till it has
the snapshot. This isn't new code, but it's now going to be exposed to
more AMs, so I think there's a good argument to fix it now.

Robert: You committed that addition, in

commit f35742ccb7aa53ee3ed8416bbb378b0c3eeb6bb9
Author: Robert Haas <rhaas@postgresql.org>
Date: 2017-03-08 12:05:43 -0500

Support parallel bitmap heap scans.

do you remember why that's done?

I don't think there was any brilliant idea behind it. Delaying the
scan start until it has the snapshot seems like a good idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#150Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#148)
Re: Pluggable Storage - Andres's take

Andres Freund <andres@anarazel.de> writes:

On 2019-04-11 14:52:40 +0300, Heikki Linnakangas wrote:

+ * HEIKKI: A flags bitmask argument would be more readable than 6 booleans

I honestly don't have strong feelings about it. Not sure that I buy that
bitmasks would be much more readable

Sure they would be --- how's FLAG_FOR_FOO | FLAG_FOR_BAR not
better than unlabeled "true" and "false"?

- but perhaps we could just use the
struct trickery we started to use in

I find that rather ugly really. If we're doing something other than a
dozen-or-so booleans, maybe it's the only viable option. But for cases
where a flags argument will serve, that's our longstanding practice and
I don't see a reason to deviate.

regards, tom lane

#151Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#143)
1 attachment(s)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-04-08 15:34:46 +0300, Heikki Linnakangas wrote:

The comments for relation_set_new_relfilenode() callback say that the AM can
set *freezeXid and *minmulti to invalid. But when I did that, VACUUM hits
this assertion:

TRAP: FailedAssertion("!(((classForm->relfrozenxid) >= ((TransactionId)
3)))", File: "vacuum.c", Line: 1323)

Hm, that necessary change unfortunately escaped into the zheap tree
(which indeed doesn't set relfrozenxid). That's why I'd not noticed
this. How about something like the attached?

I found a related problem in VACUUM FULL / CLUSTER while working on the
above, not fixed in the attached yet. Namely even if a relation doesn't
yet have a valid relfrozenxid/relminmxid before a VACUUM FULL / CLUSTER,
we'll set one after that. That's not great.

I suspect the easiest fix would be to make the relevant
relation_copy_for_cluster() FreezeXid, MultiXactCutoff arguments into
pointers, and allow the AM to reset them to an invalid value if that's
the appropriate one.
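
I.e. something like this (a sketch of just the signature change):

void        (*relation_copy_for_cluster) (Relation OldHeap,
                                          Relation NewHeap,
                                          Relation OldIndex,
                                          bool use_sort,
                                          TransactionId OldestXmin,
                                          TransactionId *xid_cutoff,    /* in/out */
                                          MultiXactId *multi_cutoff,    /* in/out */
                                          double *num_tuples,
                                          double *tups_vacuumed,
                                          double *tups_recently_dead);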

It'd probably be better if we just moved the entire xid limit
computation into the AM, but I'm worried that we actually need to move
it *further up* instead - independent of this change. I don't think it's
quite right to allow a table with a toast table to be independently
VACUUM FULL/CLUSTERed from the toast table. GetOldestXmin() can go
backwards for a myriad of reasons (or limited by
old_snapshot_threshold), and I'm fairly certain that e.g. VACUUM FULLing
the toast table, setting a lower old_snapshot_threshold, and VACUUM
FULLing the main table would result in failures.

I think we need to fix this for 12, rather than wait for 13. Does
anybody disagree?

Greetings,

Andres Freund

Attachments:

0001-Allow-pg_class-xid-multixid-horizons-to-not-be-set.patch (text/x-diff)
From 5c84256ea5e41055b0cb9e0dc121a4daaca43336 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 18 Apr 2019 13:20:31 -0700
Subject: [PATCH] Allow pg_class xid & multixid horizons to not be set.

This allows table AMs that don't need these horizons. This was already
documented in the tableam relation_set_new_filenode callback, but an
assert prevented it from actually working (the zheap tree contained the
necessary change itself).

Reported-By: Heikki Linnakangas
Author: Andres Freund
Discussion: https://postgr.es/m/9a7fb9cc-2419-5db7-8840-ddc10c93f122@iki.fi
---
 src/backend/access/heap/vacuumlazy.c |  4 +++
 src/backend/commands/vacuum.c        | 53 ++++++++++++++++++++--------
 src/backend/postmaster/autovacuum.c  |  4 +--
 3 files changed, 45 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8dc76fa8583..9364cd4c33f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -213,6 +213,10 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 	Assert(params != NULL);
 	Assert(params->index_cleanup != VACOPT_TERNARY_DEFAULT);
 
+	/* not every AM requires these to be valid, but heap does */
+	Assert(TransactionIdIsNormal(onerel->rd_rel->relfrozenxid));
+	Assert(MultiXactIdIsValid(onerel->rd_rel->relminmxid));
+
 	/* measure elapsed time iff autovacuum logging requires it */
 	if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
 	{
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 1a7291d94bc..94fb6f26063 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1313,36 +1313,61 @@ vac_update_datfrozenxid(void)
 
 		/*
 		 * Only consider relations able to hold unfrozen XIDs (anything else
-		 * should have InvalidTransactionId in relfrozenxid anyway.)
+		 * should have InvalidTransactionId in relfrozenxid anyway).
 		 */
 		if (classForm->relkind != RELKIND_RELATION &&
 			classForm->relkind != RELKIND_MATVIEW &&
 			classForm->relkind != RELKIND_TOASTVALUE)
+		{
+			Assert(!TransactionIdIsValid(classForm->relfrozenxid));
+			Assert(!MultiXactIdIsValid(classForm->relminmxid));
 			continue;
-
-		Assert(TransactionIdIsNormal(classForm->relfrozenxid));
-		Assert(MultiXactIdIsValid(classForm->relminmxid));
+		}
 
 		/*
+		 * Some table AMs might not need per-relation xid / multixid
+		 * horizons. It therefore seems reasonable to allow relfrozenxid and
+		 * relminmxid to not be set (i.e. set to their respective Invalid*Id)
+		 * independently. Thus validate and compute horizon for each only if
+		 * set.
+		 *
 		 * If things are working properly, no relation should have a
 		 * relfrozenxid or relminmxid that is "in the future".  However, such
 		 * cases have been known to arise due to bugs in pg_upgrade.  If we
 		 * see any entries that are "in the future", chicken out and don't do
-		 * anything.  This ensures we won't truncate clog before those
-		 * relations have been scanned and cleaned up.
+		 * anything.  This ensures we won't truncate clog & multixact SLRUs
+		 * before those relations have been scanned and cleaned up.
 		 */
-		if (TransactionIdPrecedes(lastSaneFrozenXid, classForm->relfrozenxid) ||
-			MultiXactIdPrecedes(lastSaneMinMulti, classForm->relminmxid))
+
+		if (TransactionIdIsValid(classForm->relfrozenxid))
 		{
-			bogus = true;
-			break;
+			Assert(TransactionIdIsNormal(classForm->relfrozenxid));
+
+			/* check for values in the future */
+			if (TransactionIdPrecedes(lastSaneFrozenXid, classForm->relfrozenxid))
+			{
+				bogus = true;
+				break;
+			}
+
+			/* determine new horizon */
+			if (TransactionIdPrecedes(classForm->relfrozenxid, newFrozenXid))
+				newFrozenXid = classForm->relfrozenxid;
 		}
 
-		if (TransactionIdPrecedes(classForm->relfrozenxid, newFrozenXid))
-			newFrozenXid = classForm->relfrozenxid;
+		if (MultiXactIdIsValid(classForm->relminmxid))
+		{
+			/* check for values in the future */
+			if (MultiXactIdPrecedes(lastSaneMinMulti, classForm->relminmxid))
+			{
+				bogus = true;
+				break;
+			}
 
-		if (MultiXactIdPrecedes(classForm->relminmxid, newMinMulti))
-			newMinMulti = classForm->relminmxid;
+			/* determine new horizon */
+			if (MultiXactIdPrecedes(classForm->relminmxid, newMinMulti))
+				newMinMulti = classForm->relminmxid;
+		}
 	}
 
 	/* we're done with pg_class */
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 0976029e737..53c91d92778 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -3033,8 +3033,8 @@ relation_needs_vacanalyze(Oid relid,
 		multiForceLimit = recentMulti - multixact_freeze_max_age;
 		if (multiForceLimit < FirstMultiXactId)
 			multiForceLimit -= FirstMultiXactId;
-		force_vacuum = MultiXactIdPrecedes(classForm->relminmxid,
-										   multiForceLimit);
+		force_vacuum = MultiXactIdIsValid(classForm->relminmxid) &&
+			MultiXactIdPrecedes(classForm->relminmxid, multiForceLimit);
 	}
 	*wraparound = force_vacuum;
 
-- 
2.21.0.dirty

#152Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#143)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-04-08 15:34:46 +0300, Heikki Linnakangas wrote:

index_update_stats() calls RelationGetNumberOfBlocks(<table>). If the AM
doesn't use normal data files, that won't work. I bumped into that with my
toy implementation, which wouldn't need to create any data files, if it
wasn't for this.

There are a few more of these:

1) index_update_stats(), computing pg_class.relpages

Feels like the number of both heap and index blocks should be
computed by the index build and stored in IndexInfo. That'd also get
a bit closer towards allowing indexams not going through smgr (useful
e.g. for memory only ones).

2) commands/analyze.c, computing pg_class.relpages

This should imo be moved to the tableam callback. It's currently done
a bit weirdly imo, with fdws computing relpages in the callback, but
then also returning the acquirefunc. Seems like it should entirely be
computed as part of calling acquirefunc.

3) nodeTidscan, skipping over too large tids
I think this should just be moved into the AMs, there's no need to
have this in nodeTidscan.c

4) freespace.c, used for the new small-rels-have-no-fsm paths.
That's being revised currently anyway. But I'm not particularly
concerned even if it stays as is - freespace use is optional
anyway. And I can't quite see an AM that doesn't want to use
postgres' storage mechanism wanting to use freespace.c

Therefore I'm inclined not to touch this, independent of fixing the
others.

I think none of these are critical issues for tableam, but we should fix
them.

I'm not sure about doing so for v12 though. 1) and 3) are fairly
trivial, but 2) would involve changing the FDW interface, by changing
the AnalyzeForeignTable, AcquireSampleRowsFunc signatures. But OTOH,
we're not even in beta1.

Comments?

Greetings,

Andres Freund

#153Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#152)
Re: Pluggable Storage - Andres's take

Andres Freund <andres@anarazel.de> writes:

... I think none of these are critical issues for tableam, but we should fix
them.

I'm not sure about doing so for v12 though. 1) and 3) are fairly
trivial, but 2) would involve changing the FDW interface, by changing
the AnalyzeForeignTable, AcquireSampleRowsFunc signatures. But OTOH,
we're not even in beta1.

Probably better to fix those API issues now rather than later.

regards, tom lane

#154Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#153)
Re: Pluggable Storage - Andres's take

On Tue, Apr 23, 2019 at 6:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

... I think none of these are critical issues for tableam, but we should fix
them.

I'm not sure about doing so for v12 though. 1) and 3) are fairly
trivial, but 2) would involve changing the FDW interface, by changing
the AnalyzeForeignTable, AcquireSampleRowsFunc signatures. But OTOH,
we're not even in beta1.

Probably better to fix those API issues now rather than later.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#155Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#152)
1 attachment(s)
Re: Pluggable Storage - Andres's take

Hi Heikki, Ashwin, Tom,

On 2019-04-23 15:52:01 -0700, Andres Freund wrote:

On 2019-04-08 15:34:46 +0300, Heikki Linnakangas wrote:

index_update_stats() calls RelationGetNumberOfBlocks(<table>). If the AM
doesn't use normal data files, that won't work. I bumped into that with my
toy implementation, which wouldn't need to create any data files, if it
wasn't for this.

There are a few more of these:

I'm not sure about doing so for v12 though. 1) and 3) are fairly
trivial, but 2) would involve changing the FDW interface, by changing
the AnalyzeForeignTable, AcquireSampleRowsFunc signatures. But OTOH,
we're not even in beta1.

Hm. I think some of those changes would be a bit bigger than I initially
thought. Attached is a more minimal fix that'd route
RelationGetNumberOfBlocksForFork() through tableam if necessary. I
think it's definitely the right answer for 1), probably the pragmatic
answer to 2), but certainly not for 3).

I've for now made the AM return the size in bytes, and then convert that
into blocks in RelationGetNumberOfBlocksForFork(). Most postgres callers
are going to continue to want it internally as pages (otherwise there's
going to be way too much churn, without a benefit I can see). So I
think that's OK.

There's also a somewhat weird bit of returning the total relation size
for InvalidForkNumber - it's pretty likely that other AMs wouldn't use
postgres' current forks, but have equivalent concepts. And without that
there'd be no way to get that size. I'm not sure I like this, input
welcome. But it seems good to offer the ability to get the entire size
somehow.

Btw, isn't RelationGetNumberOfBlocksForFork() currently weirdly placed?
I don't see why bufmgr.c would be appropriate? Although I don't think
it's particularly clear where it'd best reside - I'd tentatively say
storage.c.

Heikki, Ashwin, your inputs would be appreciated here, in particular the
tid fetch bit below.

The attached patch isn't intended to be applied as-is, just basis for
discussion.

1) index_update_stats(), computing pg_class.relpages

Feels like the number of both heap and index blocks should be
computed by the index build and stored in IndexInfo. That'd also get
a bit closer towards allowing indexams not going through smgr (useful
e.g. for memory only ones).

Due to parallel index builds that'd actually be hard. Given the number
of places wanting to compute relpages for pg_class I think the above
patch routing RelationGetNumberOfBlocksForFork() through tableam is the
right fix.

2) commands/analyze.c, computing pg_class.relpages

This should imo be moved to the tableam callback. It's currently done
a bit weirdly imo, with fdws computing relpages the callback, but
then also returning the acquirefunc. Seems like it should entirely be
computed as part of calling acquirefunc.

Here I'm not sure routing RelationGetNumberOfBlocksForFork() through
tableam wouldn't be the right minimal approach too. It has the
disadvantage of implying certain values for the
RelationGetNumberOfBlocksForFork(MAIN) return value. The alternative
would be to return the desired sampling range in
table_beginscan_analyze() - but that'd require some duplication because
currently that just uses the generic scan_begin() callback.

I suspect - as previously mentioned - that we're going to have to extend
statistics collection beyond the current approach at some point, but I
don't think that's now. At least to me it's not clear how to best
represent the stats, and how to best use them, if the underlying storage
is fundamentally not block based. Nor how we'd avoid code duplication...

3) nodeTidscan, skipping over too large tids
I think this should just be moved into the AMs, there's no need to
have this in nodeTidscan.c

I think here it's *not* actually correct at all to use the relation
size. It's currently doing:

/*
* We silently discard any TIDs that are out of range at the time of scan
* start. (Since we hold at least AccessShareLock on the table, it won't
* be possible for someone to truncate away the blocks we intend to
* visit.)
*/
nblocks = RelationGetNumberOfBlocks(tidstate->ss.ss_currentRelation);

which is fine (except for a certain abstraction leakage) for an AM like
heap or zheap, but I suspect strongly that that's not ok for Ashwin &
Heikki's approach where tid isn't tied to physical representation.

The obvious answer would be to just move that check into the
table_fetch_row_version implementation (currently just calling
heap_fetch()) - but that doesn't seem OK from a performance POV, because
we'd then determine the relation size once for each tid, rather than
once per tidscan. And it'd also check in cases where we know the tid is
supposed to be valid (e.g. fetching trigger tuples and such).

The proper fix seems to be to introduce a new scan variant
(e.g. table_beginscan_tid()), and then have table_fetch_row_version take
a scan as a parameter. But it seems we'd have to introduce that as a
separate tableam callback, because we'd not want to incur the overhead
of creating an additional scan / RelationGetNumberOfBlocks() checks for
triggers et al.
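
To sketch what I mean (names invented here):

/* hypothetical additional callbacks */
TableScanDesc (*scan_begin_tid) (Relation rel, Snapshot snapshot);
bool        (*tuple_tid_valid) (TableScanDesc scan, ItemPointer tid);

/* nodeTidscan.c would create the scan once: */
scan = table_beginscan_tid(rel, snapshot);
/* ... and then, per tid, the AM can use state cached in the scan: */
if (table_tuple_tid_valid(scan, &tid))
    found = table_fetch_row_version(scan, &tid, snapshot, slot);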

Greetings,

Andres Freund

Attachments:

tableam-size.diff (text/x-diff)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6584a9cb8da..132e9466450 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1962,6 +1962,31 @@ heapam_scan_get_blocks_done(HeapScanDesc hscan)
 }
 
 
+/* ------------------------------------------------------------------------
+ * Miscellaneous callbacks for the heap AM
+ * ------------------------------------------------------------------------
+ */
+
+static uint64
+heapam_relation_size(Relation rel, ForkNumber forkNumber)
+{
+	uint64		nblocks = 0;
+
+	/* Open it at the smgr level if not already done */
+	RelationOpenSmgr(rel);
+
+	/* InvalidForkNumber indicates the size of all forks */
+	if (forkNumber == InvalidForkNumber)
+	{
+		for (int i = 0; i < MAX_FORKNUM; i++)
+			nblocks += smgrnblocks(rel->rd_smgr, i);
+	}
+	else
+		nblocks = smgrnblocks(rel->rd_smgr, forkNumber);
+
+	return nblocks * BLCKSZ;
+}
+
 
 /* ------------------------------------------------------------------------
  * Planner related callbacks for the heap AM
@@ -2543,6 +2568,8 @@ static const TableAmRoutine heapam_methods = {
 	.index_build_range_scan = heapam_index_build_range_scan,
 	.index_validate_scan = heapam_index_validate_scan,
 
+	.relation_size = heapam_relation_size,
+
 	.relation_estimate_size = heapam_estimate_rel_size,
 
 	.scan_bitmap_next_block = heapam_scan_bitmap_next_block,
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index bfd713f3af1..2b632e002c4 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -86,6 +86,9 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->scan_analyze_next_tuple != NULL);
 	Assert(routine->index_build_range_scan != NULL);
 	Assert(routine->index_validate_scan != NULL);
+
+	Assert(routine->relation_size != NULL);
+
 	Assert(routine->relation_estimate_size != NULL);
 
 	/* optional, but one callback implies presence of hte other */
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 887023fc8a5..10ef0de78b4 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -33,6 +33,7 @@
 #include <sys/file.h>
 #include <unistd.h>
 
+#include "access/tableam.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
 #include "catalog/storage.h"
@@ -2789,14 +2790,50 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln)
 /*
  * RelationGetNumberOfBlocksInFork
  *		Determines the current number of pages in the specified relation fork.
+ *
+ * Note that the accuracy of the result will depend on the details of the
+ * relation's storage. For all builtin AMs it'll be accurate, but for external
+ * AMs it might not be perfectly accurate.
  */
 BlockNumber
 RelationGetNumberOfBlocksInFork(Relation relation, ForkNumber forkNum)
 {
-	/* Open it at the smgr level if not already done */
-	RelationOpenSmgr(relation);
+	switch (relation->rd_rel->relkind)
+	{
+		case RELKIND_SEQUENCE:
+		case RELKIND_INDEX:
+		case RELKIND_PARTITIONED_INDEX:
+			/* Open it at the smgr level if not already done */
+			RelationOpenSmgr(relation);
 
-	return smgrnblocks(relation->rd_smgr, forkNum);
+			return smgrnblocks(relation->rd_smgr, forkNum);
+
+		case RELKIND_RELATION:
+		case RELKIND_TOASTVALUE:
+		case RELKIND_MATVIEW:
+			{
+				/*
+				 * Not every table AM uses BLCKSZ wide fixed size
+				 * blocks. Therefore tableam returns the size in bytes - but
+				 * for the purpose of this routine, we want the number of
+				 * blocks. Therefore divide, rounding up.
+				 */
+				uint64 szbytes;
+
+				szbytes = table_relation_size(relation, forkNum);
+
+				return (szbytes + (BLCKSZ - 1)) / BLCKSZ;
+			}
+		case RELKIND_VIEW:
+		case RELKIND_COMPOSITE_TYPE:
+		case RELKIND_FOREIGN_TABLE:
+		case RELKIND_PARTITIONED_TABLE:
+		default:
+			Assert(false);
+			break;
+	}
+
+	return 0; /* satisfy compiler */
 }
 
 /*
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index c018a44267a..00d105af9e5 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -533,6 +533,22 @@ typedef struct TableAmRoutine
 										struct ValidateIndexState *state);
 
 
+	/* ------------------------------------------------------------------------
+	 * Miscellaneous functions.
+	 * ------------------------------------------------------------------------
+	 */
+
+	/*
+	 * See table_relation_size().
+	 *
+	 * Note that currently a few use the MAIN_FORKNUM size to vet the validity
+	 * of tids (e.g. nodeTidscan.c), and others use it to figure out the
+	 * range of potentially interesting blocks (brin, analyze). The
+	 * abstraction around this will need to be improved in the near future.
+	 */
+	uint64		(*relation_size) (Relation rel, ForkNumber forkNumber);
+
+
 	/* ------------------------------------------------------------------------
 	 * Planner related functions.
 	 * ------------------------------------------------------------------------
@@ -543,6 +559,10 @@ typedef struct TableAmRoutine
 	 *
 	 * While block oriented, it shouldn't be too hard for an AM that doesn't
	 * internally use blocks to convert into a usable representation.
+	 *
+	 * This differs from the relation_size callback by returning size
+	 * estimates (both relation size and tuple count) for planning purposes,
	 * rather than returning a currently correct value.
 	 */
 	void		(*relation_estimate_size) (Relation rel, int32 *attr_widths,
 										   BlockNumber *pages, double *tuples,
@@ -1492,6 +1512,26 @@ table_index_validate_scan(Relation heap_rel,
 }
 
 
+/* ----------------------------------------------------------------------------
+ * Miscellaneous functionality
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * Return the current size of `rel` in bytes. If `forkNumber` is
+ * InvalidForkNumber return the relation's overall size, otherwise the size
+ * for the indicated fork.
+ *
+ * Note that the overall size might not be the equivalent of the sum of sizes
+ * for the individual forks for some AMs, e.g. because the AMs storage does
+ * not neatly map onto the builtin types of forks.
+ */
+static inline uint64
+table_relation_size(Relation rel, ForkNumber forkNumber)
+{
+	return rel->rd_tableam->relation_size(rel, forkNumber);
+}
+
 /* ----------------------------------------------------------------------------
  * Planner related functionality
  * ----------------------------------------------------------------------------
#156Ashwin Agrawal
aagrawal@pivotal.io
In reply to: Andres Freund (#155)
Re: Pluggable Storage - Andres's take

On Thu, Apr 25, 2019 at 3:43 PM Andres Freund <andres@anarazel.de> wrote:

Hm. I think some of those changes would be a bit bigger than I initially
though. Attached is a more minimal fix that'd route
RelationGetNumberOfBlocksForFork() through tableam if necessary. I
think it's definitely the right answer for 1), probably the pragmatic
answer to 2), but certainly not for 3).

I've for now made the AM return the size in bytes, and then convert that
into blocks in RelationGetNumberOfBlocksForFork(). Most postgres callers
are going to continue to want it internally as pages (otherwise there's
going to be way too much churn, without a benefit I can see). So I
think that's OK.

I will provide my inputs; Heikki, please correct me or add yours.

I am not sure how much gain this practically provides, if the rest of
the system continues to use the returned value in terms of blocks. I
understand that being block based (and not just block based, but all
blocks of a relation storing data as full tuples) is ingrained in the
system. So yes, breaking out of it is a much larger change, and not
just limited to the table AM API.

I feel most of the issues discussed here should be faced by zheap as
well, as not all blocks/pages contain data; TPD pages, for example,
should be excluded from sampling and TID scans, etc.

There's also a somewhat weird bit of returning the total relation size
for InvalidForkNumber - it's pretty likely that other AMs wouldn't use
postgres' current forks, but have equivalent concepts. And without that
there'd be no way to get that size. I'm not sure I like this, input
welcome. But it seems good to offer the ability to get the entire size
somehow.

Yes, I do think we should have a mechanism to get the total size as
well as the size for a specific purpose. Zedstore currently doesn't use
forks. Just a thought: instead of calling the argument forknum, call it
something like data vs. meta-data, or main-data vs. auxiliary-data
size. Though I don't know if a usage exists where one wishes to get the
size of just some non-MAIN fork for heap/zheap; those pieces of code
shouldn't be in generic areas, but in AM-specific code only.

2) commands/analyze.c, computing pg_class.relpages

This should imo be moved to the tableam callback. It's currently done
a bit weirdly imo, with fdws computing relpages in the callback, but
then also returning the acquirefunc. Seems like it should entirely be
computed as part of calling acquirefunc.

Here I'm not sure routing RelationGetNumberOfBlocksForFork() through
tableam wouldn't be the right minimal approach too. It has the
disadvantage of implying certain values for the
RelationGetNumberOfBlocksForFork(MAIN) return value. The alternative
would be to return the desire sampling range in
table_beginscan_analyze() - but that'd require some duplication because
currently that just uses the generic scan_begin() callback.

Yes, just routing the relation size via the AM layer, still using its
return value in terms of blocks, and performing block-based sampling on
it doesn't feel like it resolves the issue. Maybe we need to delegate
sampling completely to the AM layer. Code duplication can be avoided by
similar AMs (heap and zheap) possibly using some common utility
functions to achieve the intended result.

I suspect - as previously mentioned - that we're going to have to extend
statistics collection beyond the current approach at some point, but I
don't think that's now. At least to me it's not clear how to best
represent the stats, and how to best use them, if the underlying storage
is fundamentally not block based. Nor how we'd avoid code duplication...

Yes, we will have to give more thought to this.

3) nodeTidscan, skipping over too large tids
I think this should just be moved into the AMs, there's no need to
have this in nodeTidscan.c

I think here it's *not* actually correct at all to use the relation
size. It's currently doing:

/*
* We silently discard any TIDs that are out of range at the time of scan
* start. (Since we hold at least AccessShareLock on the table, it won't
* be possible for someone to truncate away the blocks we intend to
* visit.)
*/
nblocks = RelationGetNumberOfBlocks(tidstate->ss.ss_currentRelation);

which is fine (except for a certain abstraction leakage) for an AM like
heap or zheap, but I suspect strongly that that's not ok for Ashwin &
Heikki's approach where tid isn't tied to physical representation.

Agree, it's not nice to have that optimization performed based on the
number of blocks in the generic layer. I feel it's not efficient for
zheap either, due to TPD pages as mentioned above, as the number of
blocks returned will be higher compared to the actual data blocks.

The obvious answer would be to just move that check into the
table_fetch_row_version implementation (currently just calling
heap_fetch()) - but that doesn't seem OK from a performance POV, because
we'd then determine the relation size once for each tid, rather than
once per tidscan. And it'd also check in cases where we know the tid is
supposed to be valid (e.g. fetching trigger tuples and such).

Agree, checking the relation size per tuple is not a viable solution.

The proper fix seems to be to introduce a new scan variant
(e.g. table_beginscan_tid()), and then have table_fetch_row_version take
a scan as a parameter. But it seems we'd have to introduce that as a
separate tableam callback, because we'd not want to incur the overhead
of creating an additional scan / RelationGetNumberOfBlocks() checks for
triggers et al.

Thinking out loud here, we can possibly tackle this in multiple ways.
First, the above-mentioned check seems to me more of an optimization
than functionally needed; correct me if I'm wrong. If that's true, we
can check with the AM whether it wishes to apply that relation-size
based optimization or not. For Zedstore, instead of performing this
optimization, we could directly call fetch-row-version, and zedstore
can quickly bail out based on the TID passed to it, as its meta page
has the highest allocated TID value. With concurrent inserts, though,
it may perform more work.

Another alternative could be, instead of getting the relation size, to
add a callback to get the highest TID value from the AM. heap and zheap
can return a TID built from the highest block number and the max TID
that a block can have. Zedstore can return the highest TID it has
assigned so far. Then either use the TID to perform the check, or
extract the block number from the TID and use that instead. That would
at least work for the AMs we know of so far, and it's hard to imagine,
for AMs that don't exist yet, how this would be used.
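
A sketch of that second idea (the callback name and shape are made up):

/* hypothetical: the AM reports the highest tid that could currently exist */
void        (*relation_get_highest_tid) (Relation rel, ItemPointer highest);

/* heap/zheap could derive it from the relation size: */
ItemPointerSet(highest,
               RelationGetNumberOfBlocks(rel) - 1,
               MaxOffsetNumber);

nodeTidscan.c (or the AM itself) would then discard tids greater than
*highest, instead of comparing raw block numbers.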

Irrespective of how we solve this problem, ctids are displayed and
need to be specified in (block, offset) fashion for tid scans :-)

#157Rafia Sabih
rafia.pghackers@gmail.com
In reply to: Heikki Linnakangas (#146)
Re: Pluggable Storage - Andres's take

On Tue, 9 Apr 2019 at 15:17, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 08/04/2019 20:37, Andres Freund wrote:

On 2019-04-08 15:34:46 +0300, Heikki Linnakangas wrote:

There's a little bug in index-only scan executor node, where it mixes up the
slots to hold a tuple from the index, and from the table. That doesn't cause
any ill effects if the AM uses TTSOpsHeapTuple, but with my toy AM, which
uses a virtual slot, it caused warnings like this from index-only scans:

Hm. That's another one that I think I had fixed previously :(, and then
concluded that it's not actually necessary for some reason. Your fix
looks correct to me. Do you want to commit it? Otherwise I'll look at
it after rebasing zheap, and checking it with that.

I found another slot type confusion bug, while playing with zedstore. In
an Index Scan, if you have an ORDER BY key that needs to be rechecked,
so that it uses the reorder queue, then it will sometimes use the
reorder queue slot, and sometimes the table AM's slot, for the scan
slot. If they're not of the same type, you get an assertion:

TRAP: FailedAssertion("!(op->d.fetch.kind == slot->tts_ops)", File:
"execExprInterp.c", Line: 1905)

Attached is a test for this, again using the toy table AM, extended to
be able to test this. And a fix.

Attached is a patch with the toy implementation I used to test this. I'm not
suggesting we should commit that - although feel free to do that if you
think it's useful - but it shows how I bumped into these issues.

Hm, probably not a bad idea to include something like it. It seems like
we kinda would need non-stub implementation of more functions for it to
test much / and to serve as an example. I'm mildly inclined to just do
it via zheap / externally, but I'm not quite sure that's good enough.

Works for me.

+static Size
+toyam_parallelscan_estimate(Relation rel)
+{
+    ereport(ERROR,
+                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                     errmsg("function %s not implemented yet", __func__)));
+}

The other stubbed functions seem like we should require them, but I
wonder if we should make the parallel stuff optional?

Yeah, that would be good. I would assume it to be optional.

I was trying the toyam patch and on make check it failed with
segmentation fault at

static void
toyam_relation_set_new_filenode(Relation rel,
char persistence,
TransactionId *freezeXid,
MultiXactId *minmulti)
{
*freezeXid = InvalidTransactionId;

Basically, on running create table t (i int, j int) using toytable,
leads to this segmentation fault.

Am I missing something here?

--
Regards,
Rafia Sabih

#158Andres Freund
andres@anarazel.de
In reply to: Rafia Sabih (#157)
Re: Pluggable Storage - Andres's take

Hi,

On May 6, 2019 3:40:55 AM PDT, Rafia Sabih <rafia.pghackers@gmail.com> wrote:

On Tue, 9 Apr 2019 at 15:17, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 08/04/2019 20:37, Andres Freund wrote:

On 2019-04-08 15:34:46 +0300, Heikki Linnakangas wrote:

There's a little bug in index-only scan executor node, where it mixes
up the slots to hold a tuple from the index, and from the table. That
doesn't cause any ill effects if the AM uses TTSOpsHeapTuple, but with
my toy AM, which uses a virtual slot, it caused warnings like this
from index-only scans:

Hm. That's another one that I think I had fixed previously :(, and then
concluded that it's not actually necessary for some reason. Your fix
looks correct to me. Do you want to commit it? Otherwise I'll look at
it after rebasing zheap, and checking it with that.

I found another slot type confusion bug, while playing with zedstore.
In an Index Scan, if you have an ORDER BY key that needs to be
rechecked, so that it uses the reorder queue, then it will sometimes
use the reorder queue slot, and sometimes the table AM's slot, for the
scan slot. If they're not of the same type, you get an assertion:

TRAP: FailedAssertion("!(op->d.fetch.kind == slot->tts_ops)", File:
"execExprInterp.c", Line: 1905)

Attached is a test for this, again using the toy table AM, extended to
be able to test this. And a fix.

Attached is a patch with the toy implementation I used to test this.
I'm not suggesting we should commit that - although feel free to do
that if you think it's useful - but it shows how I bumped into these
issues.

Hm, probably not a bad idea to include something like it. It seems like
we kinda would need non-stub implementation of more functions for it to
test much / and to serve as an example. I'm mildly inclined to just do
it via zheap / externally, but I'm not quite sure that's good enough.

Works for me.

+static Size
+toyam_parallelscan_estimate(Relation rel)
+{
+    ereport(ERROR,
+                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                     errmsg("function %s not implemented yet", __func__)));
+}

The other stubbed functions seem like we should require them, but I
wonder if we should make the parallel stuff optional?

Yeah, that would be good. I would assume it to be optional.

I was trying the toyam patch and on make check it failed with
segmentation fault at

static void
toyam_relation_set_new_filenode(Relation rel,
char persistence,
TransactionId *freezeXid,
MultiXactId *minmulti)
{
*freezeXid = InvalidTransactionId;

Basically, on running create table t (i int, j int) using toytable,
leads to this segmentation fault.

Am I missing something here?

I assume you got compiler warnings compiling it? The API for some callbacks changed a bit.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#159Ashwin Agrawal
aagrawal@pivotal.io
In reply to: Andres Freund (#158)
1 attachment(s)
Re: Pluggable Storage - Andres's take

On Mon, May 6, 2019 at 7:14 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On May 6, 2019 3:40:55 AM PDT, Rafia Sabih <rafia.pghackers@gmail.com> wrote:

I was trying the toyam patch and on make check it failed with
segmentation fault at

static void
toyam_relation_set_new_filenode(Relation rel,
char persistence,
TransactionId *freezeXid,
MultiXactId *minmulti)
{
*freezeXid = InvalidTransactionId;

Basically, on running create table t (i int, j int) using toytable,
leads to this segmentation fault.

Am I missing something here?

I assume you got compiler warnings compiling it? The API for some callbacks changed a bit.

The attached patch gets the toy table AM implementation to match the
latest master API. The patch builds on top of the patch from Heikki in
[1]. It compiles and works, but the test still continues to fail with a
WARNING for the issue mentioned in [1].

Noticed a typo in the recently added comment for relation_set_new_filenode().

* Note that only the subset of the relcache filled by
* RelationBuildLocalRelation() can be relied upon and that the
relation's
* catalog entries either will either not yet exist (new
relation), or
* will still reference the old relfilenode.

seems should be

* Note that only the subset of the relcache filled by
* RelationBuildLocalRelation() can be relied upon and that the
relation's
* catalog entries will either not yet exist (new relation), or
still
* reference the old relfilenode.

Also wish to point out: while working on Zedstore, we realized that the
TupleDesc from the Relation object can be trusted at the AM layer for
the scan_begin() API. For the ALTER TABLE rewrite case
(ATRewriteTables()), the catalog is updated first, and hence the
relation object passed to the AM layer reflects the new TupleDesc. For
heapam that's fine, as it doesn't use the TupleDesc today during scans
in the AM layer for scan_getnextslot(). Hence, the only TupleDesc that
can be trusted and matches the on-disk layout of the tuple for scans is
the one from the TupleTableSlot. Which is a little unfortunate, as the
TupleTableSlot is only available in scan_getnextslot(), and not in
scan_begin(). This means that if an AM wishes to do some initialization
based on the TupleDesc for scans, that can't be done in scan_begin()
and has to be delayed until it has access to a TupleTableSlot. We
should at least add a comment for scan_begin() to strongly clarify not
to trust the Relation object's TupleDesc. Or maybe another alternative
would be to have a separate API for the rewrite case.

[1] /messages/by-id/9a7fb9cc-2419-5db7-8840-ddc10c93f122@iki.fi
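
To illustrate the pitfall (myam_* is a made-up AM):

static bool
myam_scan_getnextslot(TableScanDesc scan, ScanDirection direction,
                      TupleTableSlot *slot)
{
    /*
     * During ATRewriteTables() the relcache already shows the *new*
     * descriptor while old-format tuples are still being read, so rely
     * on the slot's descriptor, not RelationGetDescr(scan->rs_rd).
     */
    TupleDesc   tupdesc = slot->tts_tupleDescriptor;

    /* ... decode the next on-disk tuple according to tupdesc ... */
    (void) tupdesc;
    return false;               /* stub */
}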

Attachments:

Adjust-toy-table-AM-implementation-to-latest-APIs.patch (application/x-patch)
diff --git a/src/test/modules/toytable/toytableam.c b/src/test/modules/toytable/toytableam.c
index 4cb2b5d75db..9f1ab534822 100644
--- a/src/test/modules/toytable/toytableam.c
+++ b/src/test/modules/toytable/toytableam.c
@@ -398,6 +398,7 @@ toyam_finish_bulk_insert(Relation rel, int options)
 
 static void
 toyam_relation_set_new_filenode(Relation rel,
+								const RelFileNode *newrnode,
 								char persistence,
 								TransactionId *freezeXid,
 								MultiXactId *minmulti)
@@ -410,7 +411,7 @@ toyam_relation_set_new_filenode(Relation rel,
 	 * RelationGetNumberOfBlocks, from index_update_stats(), and that
 	 * fails if the underlying file doesn't exist.
 	 */
-	RelationCreateStorage(rel->rd_node, persistence);
+	RelationCreateStorage(*newrnode, persistence);
 }
 
 static void
@@ -422,7 +423,7 @@ toyam_relation_nontransactional_truncate(Relation rel)
 }
 
 static void
-toyam_relation_copy_data(Relation rel, RelFileNode newrnode)
+toyam_relation_copy_data(Relation rel, const RelFileNode *newrnode)
 {
 	ereport(ERROR,
 			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -435,8 +436,8 @@ toyam_relation_copy_for_cluster(Relation NewHeap,
 								Relation OldIndex,
 								bool use_sort,
 								TransactionId OldestXmin,
-								TransactionId FreezeXid,
-								MultiXactId MultiXactCutoff,
+								TransactionId *xid_cutoff,
+								MultiXactId *multi_cutoff,
 								double *num_tuples,
 								double *tups_vacuumed,
 								double *tups_recently_dead)
#160Rafia Sabih
rafia.pghackers@gmail.com
In reply to: Andres Freund (#158)
Re: Pluggable Storage - Andres's take

On Mon, 6 May 2019 at 16:14, Andres Freund <andres@anarazel.de> wrote:

Hi,

On May 6, 2019 3:40:55 AM PDT, Rafia Sabih <rafia.pghackers@gmail.com> wrote:

On Tue, 9 Apr 2019 at 15:17, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 08/04/2019 20:37, Andres Freund wrote:

On 2019-04-08 15:34:46 +0300, Heikki Linnakangas wrote:

There's a little bug in index-only scan executor node, where it mixes
up the slots to hold a tuple from the index, and from the table. That
doesn't cause any ill effects if the AM uses TTSOpsHeapTuple, but with
my toy AM, which uses a virtual slot, it caused warnings like this
from index-only scans:

Hm. That's another one that I think I had fixed previously :(, and then
concluded that it's not actually necessary for some reason. Your fix
looks correct to me. Do you want to commit it? Otherwise I'll look at
it after rebasing zheap, and checking it with that.

I found another slot type confusion bug, while playing with zedstore.
In an Index Scan, if you have an ORDER BY key that needs to be
rechecked, so that it uses the reorder queue, then it will sometimes
use the reorder queue slot, and sometimes the table AM's slot, for the
scan slot. If they're not of the same type, you get an assertion:

TRAP: FailedAssertion("!(op->d.fetch.kind == slot->tts_ops)", File:
"execExprInterp.c", Line: 1905)

Attached is a test for this, again using the toy table AM, extended to
be able to test this. And a fix.

Attached is a patch with the toy implementation I used to test this.
I'm not suggesting we should commit that - although feel free to do
that if you think it's useful - but it shows how I bumped into these
issues.

Hm, probably not a bad idea to include something like it. It seems like
we kinda would need non-stub implementation of more functions for it to
test much / and to serve as an example. I'm mildly inclined to just do
it via zheap / externally, but I'm not quite sure that's good enough.

Works for me.

+static Size
+toyam_parallelscan_estimate(Relation rel)
+{
+    ereport(ERROR,
+                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                     errmsg("function %s not implemented yet", __func__)));
+}

The other stubbed functions seem like we should require them, but I
wonder if we should make the parallel stuff optional?

Yeah, that would be good. I would assume it to be optional.

I was trying the toyam patch and on make check it failed with
segmentation fault at

static void
toyam_relation_set_new_filenode(Relation rel,
char persistence,
TransactionId *freezeXid,
MultiXactId *minmulti)
{
*freezeXid = InvalidTransactionId;

Basically, on running create table t (i int, j int) using toytable,
leads to this segmentation fault.

Am I missing something here?

I assume you got compiler warnings compiling it? The API for some callbacks changed a bit.

Oh yeah it does.

--
Regards,
Rafia Sabih

#161Rafia Sabih
rafia.pghackers@gmail.com
In reply to: Ashwin Agrawal (#159)
Re: Pluggable Storage - Andres's take

On Mon, 6 May 2019 at 22:39, Ashwin Agrawal <aagrawal@pivotal.io> wrote:

On Mon, May 6, 2019 at 7:14 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On May 6, 2019 3:40:55 AM PDT, Rafia Sabih <rafia.pghackers@gmail.com> wrote:

I was trying the toyam patch and on make check it failed with
segmentation fault at

static void
toyam_relation_set_new_filenode(Relation rel,
char persistence,
TransactionId *freezeXid,
MultiXactId *minmulti)
{
*freezeXid = InvalidTransactionId;

Basically, on running create table t (i int, j int) using toytable,
leads to this segmentation fault.

Am I missing something here?

I assume you got compiler warmings compiling it? The API for some callbacks changed a bit.

The attached patch gets the toy table AM implementation to match the latest
master API. The patch builds on top of the patch from Heikki in [1].
It compiles and works, but the test still continues to fail with a WARNING
for the issue mentioned in [1].

Thanks Ashwin, this works fine with the mentioned warnings of course.

--
Regards,
Rafia Sabih

#162Ashwin Agrawal
aagrawal@pivotal.io
In reply to: Ashwin Agrawal (#159)
Re: Pluggable Storage - Andres's take

On Mon, May 6, 2019 at 1:39 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:

Also wish to point out: while working on Zedstore, we realized that the
TupleDesc from the Relation object can be trusted at the AM layer for the
scan_begin() API. In the ALTER TABLE rewrite case (ATRewriteTables()), the
catalog is updated first, and hence the relation object passed to the AM
layer reflects the new TupleDesc. For heapam it's fine, as it doesn't use
the TupleDesc today during scans in the AM layer for scan_getnextslot().
Hence, the only TupleDesc which can be trusted, and which matches the
on-disk layout of the tuple for scans, is the one from the TupleTableSlot.
Which is a little unfortunate, as the TupleTableSlot is only available in
scan_getnextslot(), and not in scan_begin(). That means any initialization
an AM wishes to do based on the TupleDesc for scans can't be done in
scan_begin(), and is forced to be delayed until it has access to the
TupleTableSlot. We should at least add a comment for scan_begin() to
strongly clarify not to trust the Relation object's TupleDesc. Or maybe
the other alternative would be to have a separate API for the rewrite
case.

Just to correct my typo: I meant to say that the TupleDesc from the
Relation object can't be trusted at the AM layer for the scan_begin() API.

Andres, any thoughts on the above? I see you had proposed "change the
table_beginscan* API so it provides a slot" in [1], but it seems to have
received no response/comments at that time.

[1] /messages/by-id/20181211021340.mqaown4njtcgrjvr@alap3.anarazel.de

#163Andres Freund
andres@anarazel.de
In reply to: Ashwin Agrawal (#156)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-04-29 16:17:41 -0700, Ashwin Agrawal wrote:

On Thu, Apr 25, 2019 at 3:43 PM Andres Freund <andres@anarazel.de> wrote:

Hm. I think some of those changes would be a bit bigger than I initially
though. Attached is a more minimal fix that'd route
RelationGetNumberOfBlocksForFork() through tableam if necessary. I
think it's definitely the right answer for 1), probably the pragmatic
answer to 2), but certainly not for 3).

I've for now made the AM return the size in bytes, and then convert that
into blocks in RelationGetNumberOfBlocksForFork(). Most postgres callers
are going to continue to want it internally as pages (otherwise there's
going to be way too much churn, without a benefit I can see). So I
think that's OK.

I will provide my inputs, Heikki please correct me or add your inputs.

I am not sure how much gain this practically provides, if the rest of the
system continues to use the returned value in terms of blocks. I
understand that being block based (and not just block based, but with all
the blocks of a relation storing data and full tuples) is ingrained in the
system. So, breaking out of it is, yes, a much larger change, and not just
limited to the table AM API.

I don't think it's that ingrained in all that many parts of the
system. Outside of the places I listed upthread, and the one index case
that stashes extra info, which places are that "block based"?

I feel most of the issues discussed here should be faced by zheap as
well, as not all blocks/pages contain data - e.g. TPD pages should be
excluded from sampling and TID scans, etc.

It's not a problem so far, and zheap works on tableam. You can just skip
such blocks during sampling / analyze, and return nothing for tidscans.

2) commands/analyze.c, computing pg_class.relpages

This should imo be moved to the tableam callback. It's currently done
a bit weirdly imo, with fdws computing relpages the callback, but
then also returning the acquirefunc. Seems like it should entirely be
computed as part of calling acquirefunc.

Here I'm not sure routing RelationGetNumberOfBlocksForFork() through
tableam wouldn't be the right minimal approach too. It has the
disadvantage of implying certain values for the
RelationGetNumberOfBlocksForFork(MAIN) return value. The alternative
would be to return the desired sampling range in
table_beginscan_analyze() - but that'd require some duplication because
currently that just uses the generic scan_begin() callback.

Yes, just routing the relation size via the AM layer while still using its
returned value in terms of blocks, and performing block-based sampling on
it, doesn't feel like it resolves the issue. Maybe we need to delegate
sampling completely to the AM layer. Code duplication could be avoided by
similar AMs (heap and zheap) possibly using some common utility functions
to achieve the intended result.

I don't know what this is actually proposing.

I suspect - as previously mentioned- that we're going to have to extend
statistics collection beyond the current approach at some point, but I
don't think that's now. At least to me it's not clear how to best
represent the stats, and how to best use them, if the underlying storage
is fundamentally not block based. Nor how we'd avoid code duplication...

Yes, will have to give more thoughts into this.

3) nodeTidscan, skipping over too large tids
I think this should just be moved into the AMs, there's no need to
have this in nodeTidscan.c

I think here it's *not* actually correct at all to use the relation
size. It's currently doing:

/*
* We silently discard any TIDs that are out of range at the time of scan
* start. (Since we hold at least AccessShareLock on the table, it won't
* be possible for someone to truncate away the blocks we intend to
* visit.)
*/
nblocks = RelationGetNumberOfBlocks(tidstate->ss.ss_currentRelation);

which is fine (except for a certain abstraction leakage) for an AM like
heap or zheap, but I suspect strongly that that's not ok for Ashwin &
Heikki's approach where tid isn't tied to physical representation.

Agreed, it's not nice to have that optimization performed based on the
number of blocks in the generic layer. I feel it's not efficient for zheap
either, due to TPD pages as mentioned above, as the number of blocks
returned will be higher than the actual number of data blocks.

I don't think there's a problem for zheap. The blocks are just
interspersed.

Having pondered this a lot more, I think this is the way to go for
12. Then we can improve this for v13, to be nice.

The proper fix seems to be to introduce a new scan variant
(e.g. table_beginscan_tid()), and then have table_fetch_row_version take
a scan as a parameter. But it seems we'd have to introduce that as a
separate tableam callback, because we'd not want to incur the overhead
of creating an additional scan / RelationGetNumberOfBlocks() checks for
triggers et al.

Thinking out loud here, we can possibly tackle this in multiple ways.
First, the above-mentioned check seems to me more of an optimization than
functionally needed - correct me if wrong. If that's true, we can check
with the AM whether it wishes to apply that relation-size-based
optimization or not.

It'd be really expensive to check this differently for heap. We'd have
to check the relation size, which is out of the question imo.

Greetings,

Andres Freund

#164Andres Freund
andres@anarazel.de
In reply to: Ashwin Agrawal (#162)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-05-07 23:18:39 -0700, Ashwin Agrawal wrote:

On Mon, May 6, 2019 at 1:39 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:

Also wish to point out: while working on Zedstore, we realized that the
TupleDesc from the Relation object can be trusted at the AM layer for the
scan_begin() API. In the ALTER TABLE rewrite case (ATRewriteTables()), the
catalog is updated first, and hence the relation object passed to the AM
layer reflects the new TupleDesc. For heapam it's fine, as it doesn't use
the TupleDesc today during scans in the AM layer for scan_getnextslot().
Hence, the only TupleDesc which can be trusted, and which matches the
on-disk layout of the tuple for scans, is the one from the TupleTableSlot.
Which is a little unfortunate, as the TupleTableSlot is only available in
scan_getnextslot(), and not in scan_begin(). That means any initialization
an AM wishes to do based on the TupleDesc for scans can't be done in
scan_begin(), and is forced to be delayed until it has access to the
TupleTableSlot. We should at least add a comment for scan_begin() to
strongly clarify not to trust the Relation object's TupleDesc. Or maybe
the other alternative would be to have a separate API for the rewrite
case.

Just to correct my typo: I meant to say that the TupleDesc from the
Relation object can't be trusted at the AM layer for the scan_begin() API.

Andres, any thoughts on the above? I see you had proposed "change the
table_beginscan* API so it provides a slot" in [1], but it seems to have
received no response/comments at that time.

[1] /messages/by-id/20181211021340.mqaown4njtcgrjvr@alap3.anarazel.de

I don't think passing a slot at beginscan time is a good idea. There are
several places that want to use different slots for the same scan, and
we probably want to increase that over time (e.g. for batching), not
decrease it.

What kind of initialization do you want to do based on the tuple desc at
beginscan time?

Greetings,

Andres Freund

#165Ashwin Agrawal
aagrawal@pivotal.io
In reply to: Andres Freund (#164)
Re: Pluggable Storage - Andres's take

On Wed, May 8, 2019 at 2:46 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-05-07 23:18:39 -0700, Ashwin Agrawal wrote:

On Mon, May 6, 2019 at 1:39 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:

Also wish to point out: while working on Zedstore, we realized that the
TupleDesc from the Relation object can be trusted at the AM layer for the
scan_begin() API. In the ALTER TABLE rewrite case (ATRewriteTables()), the
catalog is updated first, and hence the relation object passed to the AM
layer reflects the new TupleDesc. For heapam it's fine, as it doesn't use
the TupleDesc today during scans in the AM layer for scan_getnextslot().
Hence, the only TupleDesc which can be trusted, and which matches the
on-disk layout of the tuple for scans, is the one from the TupleTableSlot.
Which is a little unfortunate, as the TupleTableSlot is only available in
scan_getnextslot(), and not in scan_begin(). That means any initialization
an AM wishes to do based on the TupleDesc for scans can't be done in
scan_begin(), and is forced to be delayed until it has access to the
TupleTableSlot. We should at least add a comment for scan_begin() to
strongly clarify not to trust the Relation object's TupleDesc. Or maybe
the other alternative would be to have a separate API for the rewrite
case.

Just to correct my typo: I meant to say that the TupleDesc from the
Relation object can't be trusted at the AM layer for the scan_begin() API.

Andres, any thoughts on the above? I see you had proposed "change the
table_beginscan* API so it provides a slot" in [1], but it seems to have
received no response/comments at that time.

[1] /messages/by-id/20181211021340.mqaown4njtcgrjvr@alap3.anarazel.de

I don't think passing a slot at beginscan time is a good idea. There are
several places that want to use different slots for the same scan, and
we probably want to increase that over time (e.g. for batching), not
decrease it.

What kind of initialization do you want to do based on the tuple desc at
beginscan time?

For Zedstore (a column store), we need to allocate a map (array or
bitmask) to mark which columns to project for the scan. We also need to
allocate the AM-internal scan descriptors corresponding to the number of
attributes involved in the scan. Hence, we need access to the number of
attributes at scan start. Currently, since we are not able to trust the
Relation's TupleDesc, for Zedstore we worked around this by allocating
these things on the first call to getnextslot, when we have access to the
slot (by switching to the memory context used during scan_begin()).
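For illustration, a minimal sketch of that deferred-initialization
workaround - ZedStoreDesc, initialized, scancxt, natts and proj are
hypothetical names invented here, not the actual Zedstore code:

static bool
zedstoream_getnextslot(TableScanDesc sscan, ScanDirection direction,
					   TupleTableSlot *slot)
{
	ZedStoreDesc scan = (ZedStoreDesc) sscan;	/* hypothetical AM struct */

	if (!scan->initialized)
	{
		/*
		 * First call: the slot's TupleDesc matches the on-disk layout, so
		 * do the per-attribute setup here. Allocate in the context that
		 * was current during scan_begin(), so the state survives
		 * per-tuple memory context resets.
		 */
		MemoryContext oldcxt = MemoryContextSwitchTo(scan->scancxt);

		scan->natts = slot->tts_tupleDescriptor->natts;
		scan->proj = palloc0(scan->natts * sizeof(bool));
		/* ... allocate per-attribute scan state here ... */

		MemoryContextSwitchTo(oldcxt);
		scan->initialized = true;
	}

	/* ... fetch the next tuple into the slot ... */
	return false;
}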

#166Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#155)
2 attachment(s)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-04-25 15:43:15 -0700, Andres Freund wrote:

Hm. I think some of those changes would be a bit bigger than I initially
though. Attached is a more minimal fix that'd route
RelationGetNumberOfBlocksForFork() through tableam if necessary. I
think it's definitely the right answer for 1), probably the pragmatic
answer to 2), but certainly not for 3).

I've for now made the AM return the size in bytes, and then convert that
into blocks in RelationGetNumberOfBlocksForFork(). Most postgres callers
are going to continue to want it internally as pages (otherwise there's
going to be way too much churn, without a benefit I can see). So I
think that's OK.

There's also a somewhat weird bit of returning the total relation size
for InvalidForkNumber - it's pretty likely that other AMs wouldn't use
postgres' current forks, but have equivalent concepts. And without that
there'd be no way to get that size. I'm not sure I like this, input
welcome. But it seems good to offer the ability to get the entire size
somehow.

I'm still reasonably happy with this. I'll polish it a bit and push.

3) nodeTidscan, skipping over too large tids
I think this should just be moved into the AMs, there's no need to
have this in nodeTidscan.c

I think here it's *not* actually correct at all to use the relation
size. It's currently doing:

/*
* We silently discard any TIDs that are out of range at the time of scan
* start. (Since we hold at least AccessShareLock on the table, it won't
* be possible for someone to truncate away the blocks we intend to
* visit.)
*/
nblocks = RelationGetNumberOfBlocks(tidstate->ss.ss_currentRelation);

which is fine (except for a certain abstraction leakage) for an AM like
heap or zheap, but I suspect strongly that that's not ok for Ashwin &
Heikki's approach where tid isn't tied to physical representation.

The obvious answer would be to just move that check into the
table_fetch_row_version implementation (currently just calling
heap_fetch()) - but that doesn't seem OK from a performance POV, because
we'd then determine the relation size once for each tid, rather than
once per tidscan. And it'd also check in cases where we know the tid is
supposed to be valid (e.g. fetching trigger tuples and such).

The proper fix seems to be to introduce a new scan variant
(e.g. table_beginscan_tid()), and then have table_fetch_row_version take
a scan as a parameter. But it seems we'd have to introduce that as a
separate tableam callback, because we'd not want to incur the overhead
of creating an additional scan / RelationGetNumberOfBlocks() checks for
triggers et al.

Attached is a prototype of a variation of this. I added a
table_tuple_tid_valid(TableScanDesc sscan, ItemPointer tid)
callback / wrapper. Currently it just takes a "plain" scan, but we could
add a separate table_beginscan variant too.

For heap that just means we can just use HeapScanDesc's rs_nblocks to
filter out invalid tids, and we only need to call
RelationGetNumberOfBlocks() once, rather than on every
table_tuple_tid_valid() / table_get_latest_tid() call. Which is a good
improvement for nodeTidscan's table_get_latest_tid() call (for WHERE
CURRENT OF) - which previously computed the relation size once per
tuple.

Needs a bit of polishing, but I think this is the right direction?
Unless somebody protests, I'm going to push something along those lines
quite soon.

Greetings,

Andres Freund

Attachments:

v2-0001-tableam-Don-t-assume-that-an-AM-uses-md.c-style-s.patch (text/x-diff; charset=us-ascii)
From 5c8edd1803619120c9ac856e7353943281f2d407 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 14 May 2019 23:50:41 -0700
Subject: [PATCH v2 1/2] tableam: Don't assume that an AM uses md.c style
 storage.

---
 src/backend/access/heap/heapam_handler.c | 27 +++++++++++++++
 src/backend/access/table/tableamapi.c    |  3 ++
 src/backend/storage/buffer/bufmgr.c      | 43 ++++++++++++++++++++++--
 src/include/access/tableam.h             | 40 ++++++++++++++++++++++
 4 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 00505ec3f4d..96211c673e1 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1975,6 +1975,31 @@ heapam_scan_get_blocks_done(HeapScanDesc hscan)
 }
 
 
+/* ------------------------------------------------------------------------
+ * Miscellaneous callbacks for the heap AM
+ * ------------------------------------------------------------------------
+ */
+
+static uint64
+heapam_relation_size(Relation rel, ForkNumber forkNumber)
+{
+	uint64		nblocks = 0;
+
+	/* Open it at the smgr level if not already done */
+	RelationOpenSmgr(rel);
+
+	/* InvalidForkNumber indicates the size of all forks */
+	if (forkNumber == InvalidForkNumber)
+	{
+		for (int i = 0; i <= MAX_FORKNUM; i++)
+			nblocks += smgrnblocks(rel->rd_smgr, i);
+	}
+	else
+		nblocks = smgrnblocks(rel->rd_smgr, forkNumber);
+
+	return nblocks * BLCKSZ;
+}
+
 
 /* ------------------------------------------------------------------------
  * Planner related callbacks for the heap AM
@@ -2556,6 +2581,8 @@ static const TableAmRoutine heapam_methods = {
 	.index_build_range_scan = heapam_index_build_range_scan,
 	.index_validate_scan = heapam_index_validate_scan,
 
+	.relation_size = heapam_relation_size,
+
 	.relation_estimate_size = heapam_estimate_rel_size,
 
 	.scan_bitmap_next_block = heapam_scan_bitmap_next_block,
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 0053dc95cab..32877e7674f 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -86,6 +86,9 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->scan_analyze_next_tuple != NULL);
 	Assert(routine->index_build_range_scan != NULL);
 	Assert(routine->index_validate_scan != NULL);
+
+	Assert(routine->relation_size != NULL);
+
 	Assert(routine->relation_estimate_size != NULL);
 
	/* optional, but one callback implies presence of the other */
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 887023fc8a5..fae290b4dbd 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -33,6 +33,7 @@
 #include <sys/file.h>
 #include <unistd.h>
 
+#include "access/tableam.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
 #include "catalog/storage.h"
@@ -2789,14 +2790,50 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln)
 /*
  * RelationGetNumberOfBlocksInFork
  *		Determines the current number of pages in the specified relation fork.
+ *
+ * Note that the accuracy of the result will depend on the details of the
+ * relation's storage. For builtin AMs it'll be accurate, but for external AMs
+ * it might not be.
  */
 BlockNumber
 RelationGetNumberOfBlocksInFork(Relation relation, ForkNumber forkNum)
 {
-	/* Open it at the smgr level if not already done */
-	RelationOpenSmgr(relation);
+	switch (relation->rd_rel->relkind)
+	{
+		case RELKIND_SEQUENCE:
+		case RELKIND_INDEX:
+		case RELKIND_PARTITIONED_INDEX:
+			/* Open it at the smgr level if not already done */
+			RelationOpenSmgr(relation);
 
-	return smgrnblocks(relation->rd_smgr, forkNum);
+			return smgrnblocks(relation->rd_smgr, forkNum);
+
+		case RELKIND_RELATION:
+		case RELKIND_TOASTVALUE:
+		case RELKIND_MATVIEW:
+			{
+				/*
+				 * Not every table AM uses BLCKSZ wide fixed size
+				 * blocks. Therefore tableam returns the size in bytes - but
+				 * for the purpose of this routine, we want the number of
+				 * blocks. Therefore divide, rounding up.
+				 */
+				uint64 szbytes;
+
+				szbytes = table_relation_size(relation, forkNum);
+
+				return (szbytes + (BLCKSZ - 1)) / BLCKSZ;
+			}
+		case RELKIND_VIEW:
+		case RELKIND_COMPOSITE_TYPE:
+		case RELKIND_FOREIGN_TABLE:
+		case RELKIND_PARTITIONED_TABLE:
+		default:
+			Assert(false);
+			break;
+	}
+
+	return 0; /* satisfy compiler */
 }
 
 /*
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index ebfa0d51855..e2062d808ef 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -540,6 +540,22 @@ typedef struct TableAmRoutine
 										struct ValidateIndexState *state);
 
 
+	/* ------------------------------------------------------------------------
+	 * Miscellaneous functions.
+	 * ------------------------------------------------------------------------
+	 */
+
+	/*
+	 * See table_relation_size().
+	 *
+	 * Note that currently a few callers use the MAIN_FORKNUM size to vet the
+	 * validity of tids (e.g. nodeTidscans.c), and others use it to figure out
+	 * the range of potentially interesting blocks (brin, analyze). The
+	 * abstraction around this will need to be improved in the near future.
+	 */
+	uint64		(*relation_size) (Relation rel, ForkNumber forkNumber);
+
+
 	/* ------------------------------------------------------------------------
 	 * Planner related functions.
 	 * ------------------------------------------------------------------------
@@ -550,6 +566,10 @@ typedef struct TableAmRoutine
 	 *
 	 * While block oriented, it shouldn't be too hard for an AM that doesn't
 	 * internally use blocks to convert into a usable representation.
+	 *
+	 * This differs from the relation_size callback by returning size
+	 * estimates (both relation size and tuple count) for planning purposes,
+	 * rather than returning a currently correct estimate.
 	 */
 	void		(*relation_estimate_size) (Relation rel, int32 *attr_widths,
 										   BlockNumber *pages, double *tuples,
@@ -1503,6 +1523,26 @@ table_index_validate_scan(Relation heap_rel,
 }
 
 
+/* ----------------------------------------------------------------------------
+ * Miscellaneous functionality
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * Return the current size of `rel` in bytes. If `forkNumber` is
+ * InvalidForkNumber return the relation's overall size, otherwise the size
+ * for the indicated fork.
+ *
+ * Note that the overall size might not be the equivalent of the sum of sizes
+ * for the individual forks for some AMs, e.g. because the AMs storage does
+ * not neatly map onto the builtin types of forks.
+ */
+static inline uint64
+table_relation_size(Relation rel, ForkNumber forkNumber)
+{
+	return rel->rd_tableam->relation_size(rel, forkNumber);
+}
+
 /* ----------------------------------------------------------------------------
  * Planner related functionality
  * ----------------------------------------------------------------------------
-- 
2.21.0.dirty

v2-0002-tableam-Avoid-relying-on-relation-size-to-determi.patch (text/x-diff; charset=us-ascii)
From 1518a2adff7d6fdd7f2aefd6fbe6b66d179a7f7d Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 15 May 2019 11:52:01 -0700
Subject: [PATCH v2 2/2] tableam: Avoid relying on relation size to determine
 validity of tids.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/heapam.c         | 26 +++-----
 src/backend/access/heap/heapam_handler.c | 11 +++-
 src/backend/access/table/tableam.c       | 27 +++++++++
 src/backend/executor/nodeTidscan.c       | 77 +++++++++++++++---------
 src/backend/utils/adt/tid.c              | 10 ++-
 src/include/access/heapam.h              |  3 +-
 src/include/access/tableam.h             | 37 ++++++++----
 7 files changed, 129 insertions(+), 62 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ec9853603fd..d8d4f3b1f5a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1654,8 +1654,8 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 /*
  *	heap_get_latest_tid -  get the latest tid of a specified tuple
  *
- * Actually, this gets the latest version that is visible according to
- * the passed snapshot.  You can pass SnapshotDirty to get the very latest,
+ * Actually, this gets the latest version that is visible according to the
+ * scan's snapshot.  Create a scan using SnapshotDirty to get the very latest,
  * possibly uncommitted version.
  *
  * *tid is both an input and an output parameter: it is updated to
@@ -1663,28 +1663,20 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
  * if no version of the row passes the snapshot test.
  */
 void
-heap_get_latest_tid(Relation relation,
-					Snapshot snapshot,
+heap_get_latest_tid(TableScanDesc sscan,
 					ItemPointer tid)
 {
-	BlockNumber blk;
+	Relation relation = sscan->rs_rd;
+	Snapshot snapshot = sscan->rs_snapshot;
 	ItemPointerData ctid;
 	TransactionId priorXmax;
 
-	/* this is to avoid Assert failures on bad input */
-	if (!ItemPointerIsValid(tid))
-		return;
-
 	/*
-	 * Since this can be called with user-supplied TID, don't trust the input
-	 * too much.  (RelationGetNumberOfBlocks is an expensive check, so we
-	 * don't check t_ctid links again this way.  Note that it would not do to
-	 * call it just once and save the result, either.)
+	 * table_get_latest_tid verified that the passed in tid is valid.  Assume
+	 * that t_ctid links are valid however - there shouldn't be invalid ones
+	 * in the table.
 	 */
-	blk = ItemPointerGetBlockNumber(tid);
-	if (blk >= RelationGetNumberOfBlocks(relation))
-		elog(ERROR, "block number %u is out of range for relation \"%s\"",
-			 blk, RelationGetRelationName(relation));
+	Assert(ItemPointerIsValid(tid));
 
 	/*
 	 * Loop to chase down t_ctid links.  At top of loop, ctid is the tuple we
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 96211c673e1..d7e1f5eb724 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -73,7 +73,6 @@ heapam_slot_callbacks(Relation relation)
 	return &TTSOpsBufferHeapTuple;
 }
 
-
 /* ------------------------------------------------------------------------
  * Index Scan Callbacks for heap AM
  * ------------------------------------------------------------------------
@@ -204,6 +203,15 @@ heapam_fetch_row_version(Relation relation,
 	return false;
 }
 
+static bool
+heapam_tuple_tid_valid(TableScanDesc scan, ItemPointer tid)
+{
+	HeapScanDesc hscan = (HeapScanDesc) scan;
+
+	return ItemPointerIsValid(tid) &&
+		ItemPointerGetBlockNumber(tid) < hscan->rs_nblocks;
+}
+
 static bool
 heapam_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot,
 								Snapshot snapshot)
@@ -2568,6 +2576,7 @@ static const TableAmRoutine heapam_methods = {
 
 	.tuple_fetch_row_version = heapam_fetch_row_version,
 	.tuple_get_latest_tid = heap_get_latest_tid,
+	.tuple_tid_valid = heapam_tuple_tid_valid,
 	.tuple_satisfies_snapshot = heapam_tuple_satisfies_snapshot,
 	.compute_xid_horizon_for_tuples = heap_compute_xid_horizon_for_tuples,
 
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index baba1ea699b..6e46befdfd9 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -213,6 +213,33 @@ table_index_fetch_tuple_check(Relation rel,
 }
 
 
+/* ------------------------------------------------------------------------
+ * Functions for non-modifying operations on individual tuples
+ * ------------------------------------------------------------------------
+ */
+
+void
+table_get_latest_tid(TableScanDesc scan, ItemPointer tid)
+{
+	Relation rel = scan->rs_rd;
+	const TableAmRoutine *tableam = rel->rd_tableam;
+
+	/*
+	 * Since this can be called with user-supplied TID, don't trust the input
+	 * too much.
+	 */
+	if (!tableam->tuple_tid_valid(scan, tid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("tid (%u, %u) is not valid for relation \"%s\"",
+						ItemPointerGetBlockNumberNoCheck(tid),
+						ItemPointerGetOffsetNumberNoCheck(tid),
+						RelationGetRelationName(rel))));
+
+	return tableam->tuple_get_latest_tid(scan, tid);
+}
+
+
 /* ----------------------------------------------------------------------------
  * Functions to make modifications a bit simpler.
  * ----------------------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 156be56a57d..63802a53419 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -129,20 +129,12 @@ static void
 TidListEval(TidScanState *tidstate)
 {
 	ExprContext *econtext = tidstate->ss.ps.ps_ExprContext;
-	BlockNumber nblocks;
+	TableScanDesc scan = tidstate->ss.ss_currentScanDesc;
 	ItemPointerData *tidList;
 	int			numAllocTids;
 	int			numTids;
 	ListCell   *l;
 
-	/*
-	 * We silently discard any TIDs that are out of range at the time of scan
-	 * start.  (Since we hold at least AccessShareLock on the table, it won't
-	 * be possible for someone to truncate away the blocks we intend to
-	 * visit.)
-	 */
-	nblocks = RelationGetNumberOfBlocks(tidstate->ss.ss_currentRelation);
-
 	/*
 	 * We initialize the array with enough slots for the case that all quals
 	 * are simple OpExprs or CurrentOfExprs.  If there are any
@@ -165,19 +157,26 @@ TidListEval(TidScanState *tidstate)
 				DatumGetPointer(ExecEvalExprSwitchContext(tidexpr->exprstate,
 														  econtext,
 														  &isNull));
-			if (!isNull &&
-				ItemPointerIsValid(itemptr) &&
-				ItemPointerGetBlockNumber(itemptr) < nblocks)
+			if (isNull)
+				continue;
+
+			/*
+			 * We silently discard any TIDs that are out of range at the time
+			 * of scan start.  (Since we hold at least AccessShareLock on the
+			 * table, it won't be possible for someone to truncate away the
+			 * blocks we intend to visit.)
+			 */
+			if (!table_tuple_tid_valid(scan, itemptr))
+				continue;
+
+			if (numTids >= numAllocTids)
 			{
-				if (numTids >= numAllocTids)
-				{
-					numAllocTids *= 2;
-					tidList = (ItemPointerData *)
-						repalloc(tidList,
-								 numAllocTids * sizeof(ItemPointerData));
-				}
-				tidList[numTids++] = *itemptr;
+				numAllocTids *= 2;
+				tidList = (ItemPointerData *)
+					repalloc(tidList,
+							 numAllocTids * sizeof(ItemPointerData));
 			}
+			tidList[numTids++] = *itemptr;
 		}
 		else if (tidexpr->exprstate && tidexpr->isarray)
 		{
@@ -206,13 +205,15 @@ TidListEval(TidScanState *tidstate)
 			}
 			for (i = 0; i < ndatums; i++)
 			{
-				if (!ipnulls[i])
-				{
-					itemptr = (ItemPointer) DatumGetPointer(ipdatums[i]);
-					if (ItemPointerIsValid(itemptr) &&
-						ItemPointerGetBlockNumber(itemptr) < nblocks)
-						tidList[numTids++] = *itemptr;
-				}
+				if (ipnulls[i])
+					continue;
+
+				itemptr = (ItemPointer) DatumGetPointer(ipdatums[i]);
+
+				if (!table_tuple_tid_valid(scan, itemptr))
+					continue;
+
+				tidList[numTids++] = *itemptr;
 			}
 			pfree(ipdatums);
 			pfree(ipnulls);
@@ -306,6 +307,7 @@ TidNext(TidScanState *node)
 	EState	   *estate;
 	ScanDirection direction;
 	Snapshot	snapshot;
+	TableScanDesc scan;
 	Relation	heapRelation;
 	TupleTableSlot *slot;
 	ItemPointerData *tidList;
@@ -325,8 +327,17 @@ TidNext(TidScanState *node)
 	 * First time through, compute the list of TIDs to be visited
 	 */
 	if (node->tss_TidList == NULL)
-		TidListEval(node);
+	{
+		Assert(node->ss.ss_currentScanDesc == NULL);
 
+		node->ss.ss_currentScanDesc =
+			table_beginscan(node->ss.ss_currentRelation,
+							estate->es_snapshot,
+							0, NULL);
+		TidListEval(node);
+	}
+
+	scan = node->ss.ss_currentScanDesc;
 	tidList = node->tss_TidList;
 	numTids = node->tss_NumTids;
 
@@ -365,7 +376,7 @@ TidNext(TidScanState *node)
 		 * current according to our snapshot.
 		 */
 		if (node->tss_isCurrentOf)
-			table_get_latest_tid(heapRelation, snapshot, &tid);
+			table_get_latest_tid(scan, &tid);
 
 		if (table_fetch_row_version(heapRelation, &tid, snapshot, slot))
 			return slot;
@@ -436,8 +447,13 @@ ExecTidScan(PlanState *pstate)
 void
 ExecReScanTidScan(TidScanState *node)
 {
+	if (node->ss.ss_currentScanDesc)
+		table_endscan(node->ss.ss_currentScanDesc);
+
 	if (node->tss_TidList)
 		pfree(node->tss_TidList);
+
+	node->ss.ss_currentScanDesc = NULL;
 	node->tss_TidList = NULL;
 	node->tss_NumTids = 0;
 	node->tss_TidPtr = -1;
@@ -455,6 +471,9 @@ ExecReScanTidScan(TidScanState *node)
 void
 ExecEndTidScan(TidScanState *node)
 {
+	if (node->ss.ss_currentScanDesc)
+		table_endscan(node->ss.ss_currentScanDesc);
+
 	/*
 	 * Free the exprcontext
 	 */
diff --git a/src/backend/utils/adt/tid.c b/src/backend/utils/adt/tid.c
index 6ab26d8ea8b..1aab30b6aab 100644
--- a/src/backend/utils/adt/tid.c
+++ b/src/backend/utils/adt/tid.c
@@ -358,6 +358,7 @@ currtid_byreloid(PG_FUNCTION_ARGS)
 	Relation	rel;
 	AclResult	aclresult;
 	Snapshot	snapshot;
+	TableScanDesc scan;
 
 	result = (ItemPointer) palloc(sizeof(ItemPointerData));
 	if (!reloid)
@@ -380,7 +381,9 @@ currtid_byreloid(PG_FUNCTION_ARGS)
 	ItemPointerCopy(tid, result);
 
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	table_get_latest_tid(rel, snapshot, result);
+	scan = table_beginscan(rel, snapshot, 0, NULL);
+	table_get_latest_tid(scan, result);
+	table_endscan(scan);
 	UnregisterSnapshot(snapshot);
 
 	table_close(rel, AccessShareLock);
@@ -398,6 +401,7 @@ currtid_byrelname(PG_FUNCTION_ARGS)
 	Relation	rel;
 	AclResult	aclresult;
 	Snapshot	snapshot;
+	TableScanDesc scan;
 
 	relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
 	rel = table_openrv(relrv, AccessShareLock);
@@ -415,7 +419,9 @@ currtid_byrelname(PG_FUNCTION_ARGS)
 	ItemPointerCopy(tid, result);
 
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	table_get_latest_tid(rel, snapshot, result);
+	scan = table_beginscan(rel, snapshot, 0, NULL);
+	table_get_latest_tid(scan, result);
+	table_endscan(scan);
 	UnregisterSnapshot(snapshot);
 
 	table_close(rel, AccessShareLock);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 77e5e603b03..6b8c7020c8c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -134,8 +134,7 @@ extern bool heap_hot_search_buffer(ItemPointer tid, Relation relation,
 					   Buffer buffer, Snapshot snapshot, HeapTuple heapTuple,
 					   bool *all_dead, bool first_call);
 
-extern void heap_get_latest_tid(Relation relation, Snapshot snapshot,
-					ItemPointer tid);
+extern void heap_get_latest_tid(TableScanDesc scan, ItemPointer tid);
 extern void setLastTid(const ItemPointer tid);
 
 extern BulkInsertState GetBulkInsertState(void);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e2062d808ef..e3e99a7dcdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -308,12 +308,17 @@ typedef struct TableAmRoutine
 											Snapshot snapshot,
 											TupleTableSlot *slot);
 
+	/*
+	 * Is tid valid for a scan of this relation.
+	 */
+	bool		(*tuple_tid_valid) (TableScanDesc scan,
+									ItemPointer tid);
+
 	/*
 	 * Return the latest version of the tuple at `tid`, by updating `tid` to
 	 * point at the newest version.
 	 */
-	void		(*tuple_get_latest_tid) (Relation rel,
-										 Snapshot snapshot,
+	void		(*tuple_get_latest_tid) (TableScanDesc scan,
 										 ItemPointer tid);
 
 	/*
@@ -548,10 +553,10 @@ typedef struct TableAmRoutine
 	/*
 	 * See table_relation_size().
 	 *
-	 * Note that currently a few callers use the MAIN_FORKNUM size to vet the
-	 * validity of tids (e.g. nodeTidscans.c), and others use it to figure out
-	 * the range of potentially interesting blocks (brin, analyze). The
-	 * abstraction around this will need to be improved in the near future.
+	 * Note that currently a few callers use the MAIN_FORKNUM size to figure
+	 * out the range of potentially interesting blocks (brin, analyze). It's
+	 * likely that we'll need to revise the interface for those at some
+	 * point.
 	 */
 	uint64		(*relation_size) (Relation rel, ForkNumber forkNumber);
 
@@ -985,15 +990,25 @@ table_fetch_row_version(Relation rel,
 	return rel->rd_tableam->tuple_fetch_row_version(rel, tid, snapshot, slot);
 }
 
+/*
+ * Verify that `tid` is a potentially valid tuple identifier. That doesn't
+ * mean that the pointed to row needs to exist or be visible, but that
+ * attempting to fetch the row (e.g. with table_get_latest_tid() or
+ * table_fetch_row_version()) should not error out if called with that tid.
+ *
+ * `scan` needs to have been started via table_beginscan().
+ */
+static inline bool
+table_tuple_tid_valid(TableScanDesc scan, ItemPointer tid)
+{
+	return scan->rs_rd->rd_tableam->tuple_tid_valid(scan, tid);
+}
+
 /*
  * Return the latest version of the tuple at `tid`, by updating `tid` to
  * point at the newest version.
  */
-static inline void
-table_get_latest_tid(Relation rel, Snapshot snapshot, ItemPointer tid)
-{
-	rel->rd_tableam->tuple_get_latest_tid(rel, snapshot, tid);
-}
+extern void table_get_latest_tid(TableScanDesc scan, ItemPointer tid);
 
 /*
  * Return true iff tuple in slot satisfies the snapshot.
-- 
2.21.0.dirty

#167Ashwin Agrawal
aagrawal@pivotal.io
In reply to: Andres Freund (#166)
Re: Pluggable Storage - Andres's take

On Wed, May 15, 2019 at 11:54 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-04-25 15:43:15 -0700, Andres Freund wrote:

3) nodeTidscan, skipping over too large tids
I think this should just be moved into the AMs, there's no need to
have this in nodeTidscan.c

I think here it's *not* actually correct at all to use the relation
size. It's currently doing:

/*
* We silently discard any TIDs that are out of range at the time of scan
* start. (Since we hold at least AccessShareLock on the table, it won't
* be possible for someone to truncate away the blocks we intend to
* visit.)
*/
nblocks = RelationGetNumberOfBlocks(tidstate->ss.ss_currentRelation);

which is fine (except for a certain abstraction leakage) for an AM like
heap or zheap, but I suspect strongly that that's not ok for Ashwin &
Heikki's approach where tid isn't tied to physical representation.

The obvious answer would be to just move that check into the
table_fetch_row_version implementation (currently just calling
heap_fetch()) - but that doesn't seem OK from a performance POV, because
we'd then determine the relation size once for each tid, rather than
once per tidscan. And it'd also check in cases where we know the tid is
supposed to be valid (e.g. fetching trigger tuples and such).

The proper fix seems to be to introduce a new scan variant
(e.g. table_beginscan_tid()), and then have table_fetch_row_version take
a scan as a parameter. But it seems we'd have to introduce that as a
separate tableam callback, because we'd not want to incur the overhead
of creating an additional scan / RelationGetNumberOfBlocks() checks for
triggers et al.

Attached is a prototype of a variation of this. I added a
table_tuple_tid_valid(TableScanDesc sscan, ItemPointer tid)
callback / wrapper. Currently it just takes a "plain" scan, but we could
add a separate table_beginscan variant too.

For heap that just means we can just use HeapScanDesc's rs_nblocks to
filter out invalid tids, and we only need to call
RelationGetNumberOfBlocks() once, rather than on every
table_tuple_tid_valid() / table_get_latest_tid() call. Which is a good
improvement for nodeTidscan's table_get_latest_tid() call (for WHERE
CURRENT OF) - which previously computed the relation size once per
tuple.

Needs a bit of polishing, but I think this is the right direction?

At a high level this looks good to me. Will look into the full details
tomorrow. This aligns with the high-level thought I had, but implemented in
a much better way: consult the AM on whether to perform the optimization or
not. So now, using the new table_tuple_tid_valid() callback, an AM can
either implement some way to validate a TID and optimize the scan, or, if
it has no way to check based on the scan descriptor, decide to always
return true and let table_fetch_row_version() handle things.
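As a sketch of that trivial fallback - myam_tuple_tid_valid is a made-up
name for a hypothetical AM without a cheap block-based bound:

static bool
myam_tuple_tid_valid(TableScanDesc scan, ItemPointer tid)
{
	/*
	 * No scan-local upper bound to check against; accept any well-formed
	 * tid and let tuple_fetch_row_version() reject nonexistent rows.
	 */
	return ItemPointerIsValid(tid);
}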

#168Andres Freund
andres@anarazel.de
In reply to: Ashwin Agrawal (#167)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-05-15 23:00:38 -0700, Ashwin Agrawal wrote:

At a high level this looks good to me. Will look into the full details tomorrow.

Ping?

I'll push the first of the patches soon, and unless you comment on
the second soon, I'll also push ahead. There's a beta upcoming...

- Andres

#169Ashwin Agrawal
aagrawal@pivotal.io
In reply to: Andres Freund (#168)
Re: Pluggable Storage - Andres's take

On Fri, May 17, 2019 at 12:54 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-05-15 23:00:38 -0700, Ashwin Agrawal wrote:

At a high level this looks good to me. Will look into the full details tomorrow.

Ping?

I'll push the first of the patches soon, and unless you comment on
the second soon, I'll also push ahead. There's a beta upcoming...

Sorry for the delay, didn't get to it yesterday. Looked into both the
patches. They both look good to me, thank you.

The relation size API still doesn't address the analyze case, as you
mentioned, but that's surely something we can improve on later.

#170Ashwin Agrawal
aagrawal@pivotal.io
In reply to: Heikki Linnakangas (#146)
Re: Pluggable Storage - Andres's take

On Tue, Apr 9, 2019 at 6:17 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 08/04/2019 20:37, Andres Freund wrote:

On 2019-04-08 15:34:46 +0300, Heikki Linnakangas wrote:

There's a little bug in index-only scan executor node, where it mixes up the
slots to hold a tuple from the index, and from the table. That doesn't cause
any ill effects if the AM uses TTSOpsHeapTuple, but with my toy AM, which
uses a virtual slot, it caused warnings like this from index-only scans:

Hm. That's another one that I think I had fixed previously :(, and then
concluded that it's not actually necessary for some reason. Your fix
looks correct to me. Do you want to commit it? Otherwise I'll look at
it after rebasing zheap, and checking it with that.

I found another slot type confusion bug, while playing with zedstore. In
an Index Scan, if you have an ORDER BY key that needs to be rechecked,
so that it uses the reorder queue, then it will sometimes use the
reorder queue slot, and sometimes the table AM's slot, for the scan
slot. If they're not of the same type, you get an assertion:

TRAP: FailedAssertion("!(op->d.fetch.kind == slot->tts_ops)", File:
"execExprInterp.c", Line: 1905)

Attached is a test for this, again using the toy table AM, extended to
be able to test this. And a fix.

It seems the two patches from email [1] fixing the slot confusion in Index
Scans are still pending to be committed.

[1] /messages/by-id/e71c4da4-3e82-cc4f-32cc-ede387fac8b0@iki.fi

#171Ashwin Agrawal
aagrawal@pivotal.io
In reply to: Andres Freund (#166)
Re: Pluggable Storage - Andres's take

On Wed, May 15, 2019 at 11:54 AM Andres Freund <andres@anarazel.de> wrote:

Attached is a prototype of a variation of this. I added a
table_tuple_tid_valid(TableScanDesc sscan, ItemPointer tid)
callback / wrapper. Currently it just takes a "plain" scan, but we could
add a separate table_beginscan variant too.

For heap that just means we can just use HeapScanDesc's rs_nblocks to
filter out invalid tids, and we only need to call
RelationGetNumberOfBlocks() once, rather than on every
table_tuple_tid_valid() / table_get_latest_tid() call. Which is a good
improvement for nodeTidscan's table_get_latest_tid() call (for WHERE
CURRENT OF) - which previously computed the relation size once per
tuple.

A question on the patch, if not too late:
why call table_beginscan() in TidNext() and not in ExecInitTidScan()?
It seems cleaner to have it in ExecInitTidScan().

#172Andres Freund
andres@anarazel.de
In reply to: Ashwin Agrawal (#171)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-05-17 16:56:04 -0700, Ashwin Agrawal wrote:

A question on the patch, if not too late:
why call table_beginscan() in TidNext() and not in ExecInitTidScan()?
It seems cleaner to have it in ExecInitTidScan().

Largely because it's symmetrical to where most other scans are started
(cf. nodeSeqscan.c, nodeIndexscan.c). But also, there's no need to incur
the cost of a smgrnblocks() etc when the node might never actually be
reached during execution.
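A rough sketch of that lazy pattern, mirroring the shape of the
nodeTidscan.c change in the committed patch:

	/* In TidNext(): start the scan only when the node is first executed */
	if (node->ss.ss_currentScanDesc == NULL)
		node->ss.ss_currentScanDesc =
			table_beginscan(node->ss.ss_currentRelation,
							estate->es_snapshot,
							0, NULL);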

Greetings,

Andres Freund

#173Andres Freund
andres@anarazel.de
In reply to: Ashwin Agrawal (#169)
Re: Pluggable Storage - Andres's take

Hi,

On 2019-05-17 14:49:19 -0700, Ashwin Agrawal wrote:

On Fri, May 17, 2019 at 12:54 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-05-15 23:00:38 -0700, Ashwin Agrawal wrote:

At a high level this looks good to me. Will look into the full details tomorrow.

Ping?

I'll push the first of the patches soon, and unless you comment on
the second soon, I'll also push ahead. There's a beta upcoming...

Sorry for the delay, didn't get to it yesterday. Looked into both the
patches. They both look good to me, thank you.

Pushed both now.

The relation size API still doesn't address the analyze case, as you
mentioned, but that's surely something we can improve on later.

I'm much less concerned about that. You can just return a reasonable
block size from the size callback, and it'll work for block sampling
(and you can just skip pages in the analyze callback if needed, e.g. for
zheap's tpd pages). And we assume that a reasonable block size is
returned by the size callback anyway, for planning purposes (both in
relpages and for estimate_rel_size). We'll probably want to improve
that some day, but it doesn't strike me as hugely urgent.
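To illustrate the "skip pages in the analyze callback" idea, a rough
sketch under the current callback signature - MyAmScanDesc and
IsMetaPage() are hypothetical names, not existing code:

static bool
myam_scan_analyze_next_block(TableScanDesc scan, BlockNumber blockno,
							 BufferAccessStrategy bstrategy)
{
	MyAmScanDesc mscan = (MyAmScanDesc) scan;	/* hypothetical */

	mscan->rs_cbuf = ReadBufferExtended(scan->rs_rd, MAIN_FORKNUM, blockno,
										RBM_NORMAL, bstrategy);

	/* Metadata pages hold no user tuples; tell analyze to sample elsewhere */
	if (IsMetaPage(BufferGetPage(mscan->rs_cbuf)))
	{
		ReleaseBuffer(mscan->rs_cbuf);
		return false;
	}

	mscan->rs_cblock = blockno;
	return true;
}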

Greetings,

Andres Freund

#174Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Ashwin Agrawal (#170)
Re: Pluggable Storage - Andres's take

On 18/05/2019 01:19, Ashwin Agrawal wrote:

On Tue, Apr 9, 2019 at 6:17 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 08/04/2019 20:37, Andres Freund wrote:

On 2019-04-08 15:34:46 +0300, Heikki Linnakangas wrote:

There's a little bug in index-only scan executor node, where it mixes up the
slots to hold a tuple from the index, and from the table. That doesn't cause
any ill effects if the AM uses TTSOpsHeapTuple, but with my toy AM, which
uses a virtual slot, it caused warnings like this from index-only scans:

Hm. That's another one that I think I had fixed previously :(, and then
concluded that it's not actually necessary for some reason. Your fix
looks correct to me. Do you want to commit it? Otherwise I'll look at
it after rebasing zheap, and checking it with that.

I found another slot type confusion bug, while playing with zedstore. In
an Index Scan, if you have an ORDER BY key that needs to be rechecked,
so that it uses the reorder queue, then it will sometimes use the
reorder queue slot, and sometimes the table AM's slot, for the scan
slot. If they're not of the same type, you get an assertion:

TRAP: FailedAssertion("!(op->d.fetch.kind == slot->tts_ops)", File:
"execExprInterp.c", Line: 1905)

Attached is a test for this, again using the toy table AM, extended to
be able to test this. And a fix.

It seems the two patches from email [1] fixing the slot confusion in Index
Scans are still pending to be committed.

[1] /messages/by-id/e71c4da4-3e82-cc4f-32cc-ede387fac8b0@iki.fi

Pushed the first patch now. Andres already fixed the second issue in
commit b8b94ea129.

- Heikki

#175Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Heikki Linnakangas (#174)
Re: Pluggable Storage - Andres's take

On 2019-Jun-06, Heikki Linnakangas wrote:

Pushed the first patch now. Andres already fixed the second issue in commit
b8b94ea129.

Please don't omit the "Discussion:" tag in commit messages.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#176Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Andres Freund (#148)
default_table_access_method is not in sample config file

On 11/04/2019 19:49, Andres Freund wrote:

On 2019-04-11 14:52:40 +0300, Heikki Linnakangas wrote:

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f7f726b5aec..bbcab9ce31a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3638,7 +3638,7 @@ static struct config_string ConfigureNamesString[] =
{"default_table_access_method", PGC_USERSET, CLIENT_CONN_STATEMENT,
gettext_noop("Sets the default table access method for new tables."),
NULL,
-			GUC_IS_NAME
+			GUC_NOT_IN_SAMPLE | GUC_IS_NAME
},
&default_table_access_method,
DEFAULT_TABLE_ACCESS_METHOD,

Hm, I think we should rather add it to sample. That's an oversight, not
intentional.

I just noticed that this is still an issue. default_table_access_method
is not in the sample config file, and it's not marked with
GUC_NOT_IN_SAMPLE. I'll add this to the open items list so we don't forget.

- Heikki

#177Michael Paquier
michael@paquier.xyz
In reply to: Heikki Linnakangas (#176)
1 attachment(s)
Re: default_table_access_method is not in sample config file

On Fri, Aug 09, 2019 at 11:34:05AM +0300, Heikki Linnakangas wrote:

On 11/04/2019 19:49, Andres Freund wrote:

Hm, I think we should rather add it to sample. That's an oversight, not
intentional.

I just noticed that this is still an issue. default_table_access_method is
not in the sample config file, and it's not marked with GUC_NOT_IN_SAMPLE.
I'll add this to the open items list so we don't forget.

I think that we should give it the same visibility as default_tablespace,
so adding it to the sample file sounds good to me.
--
Michael

Attachments:

sample-tab-am.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 65a6da18b3..39fc787851 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -622,6 +622,7 @@
 #default_tablespace = ''		# a tablespace name, '' uses the default
 #temp_tablespaces = ''			# a list of tablespace names, '' uses
 					# only default tablespace
+#default_table_access_method = 'heap'
 #check_function_bodies = on
 #default_transaction_isolation = 'read committed'
 #default_transaction_read_only = off
#178Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#177)
Re: default_table_access_method is not in sample config file

On 2019-08-13 15:03:13 +0900, Michael Paquier wrote:

On Fri, Aug 09, 2019 at 11:34:05AM +0300, Heikki Linnakangas wrote:

On 11/04/2019 19:49, Andres Freund wrote:

Hm, I think we should rather add it to sample. That's an oversight, not
intentional.

I just noticed that this is still an issue. default_table_access_method is
not in the sample config file, and it's not marked with GUC_NOT_IN_SAMPLE.
I'll add this to the open items list so we don't forget.

Thanks!

I think that we should give it the same visibility as default_tablespace,
so adding it to the sample file sounds good to me.

diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 65a6da18b3..39fc787851 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -622,6 +622,7 @@
#default_tablespace = ''		# a tablespace name, '' uses the default
#temp_tablespaces = ''			# a list of tablespace names, '' uses
# only default tablespace
+#default_table_access_method = 'heap'

Pushed, thanks.

#check_function_bodies = on
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = off

Hm. I find the current ordering there a bit weird. Unrelated to your
proposed change. The header of the group is

#------------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#------------------------------------------------------------------------------

# - Statement Behavior -

but I don't quite see GUCs like default_tablespace, search_path (due to
determining a created table's schema), temp_tablespaces,
default_table_access_method fit reasonably well under that heading. They
all can affect persistent state. That seems pretty different from a
number of other settings (client_min_messages,
default_transaction_isolation, lock_timeout, ...) which only have
transient effects.

Should we perhaps split that group? Not that I have a good proposal for
better names.

Greetings,

Andres Freund

#179Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#178)
Re: default_table_access_method is not in sample config file

On Fri, Aug 16, 2019 at 03:29:30PM -0700, Andres Freund wrote:

but I don't quite see GUCs like default_tablespace, search_path (due to
determining a created table's schema), temp_tablespaces,
default_table_access_method fit reasonably well under that heading. They
all can affect persistent state. That seems pretty different from a
number of other settings (client_min_messages,
default_transaction_isolation, lock_timeout, ...) which only have
transient effects.

Agreed.

Should we perhaps split that group? Not that I have a good proposal for
better names.

We could have a section for transaction-related parameters, and move
the vacuum ones into the section for autovacuum so that they get
grouped, renaming the section "autovacuum and vacuum". An idea for a
group for search_path, temp_tablespaces, default_tablespace & co would
be "object parameters", or "relation parameters", for all the
parameters which affect object definitions?
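For illustration only, the split could look something like this in the
sample file (the group names here are invented, not a concrete proposal):

# - Statement Behavior -

#client_min_messages = notice
#default_transaction_isolation = 'read committed'
#lock_timeout = 0

# - Object Creation Defaults -

#search_path = '"$user", public'
#default_tablespace = ''		# a tablespace name, '' uses the default
#temp_tablespaces = ''			# a list of tablespace names, '' uses
					# only default tablespace
#default_table_access_method = 'heap'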
--
Michael