Pluggable Storage - Andres's take
Hi,
As I've previously mentioned I had planned to spend some time to polish
Haribabu's version of the pluggable storage patch and rebase it on the
vtable based slot approach from [1]. While doing so I found more and
more things that I previously hadn't noticed. I started rewriting things
into something closer to what I think we want architecturally.
The current state of my version of the patch is *NOT* ready for proper
review (it doesn't even pass all tests, there are FIXMEs / elog()s). But I
think it's getting close enough to its eventual shape that more eyes,
and potentially more hands on keyboards, can be useful.
The most fundamental issues I had with Haribabu's last version from [2]
are the following:
- The use of TableTuple, a typedef from void *, is bad from multiple
fronts. For one it reduces just about all type safety. There were
numerous bugs in the patch where things were just cast from HeapTuple
to TableTuple to HeapTuple (and even to TupleTableSlot). I think it's
a really, really bad idea to introduce a vague type like this for
development purposes alone, it makes it way too hard to refactor -
essentially throwing the biggest benefit of type safe languages out of
the window.
Additionally I think it's also the wrong approach architecturally. We
shouldn't assume that a tuple can efficiently be represented as a
single palloc'ed chunk. In fact, we should move *away* from relying on
that so much.
I've thus removed the TableTuple type entirely.
- Previous versions of the patchset exposed Buffers in the tableam.h API,
performed buffer locking / pinning / ExecStoreTuple() calls outside of
it. That is wrong in my opinion, as various AMs will deal very
differently with buffer pinning & locking. The relevant logic is
largely moved within the AM. Bringing me to the next point:
- tableam exposed various operations based on HeapTuple/TableTuple's
(and their Buffers). This all needs to be slot based, as we can't
represent the way each AM will deal with this. I've largely converted
the API to be slot based. That has some fallout, but I think largely
works. Lots of outdated comments.
- I think the move of the indexing from outside the table layer into the
storage layer isn't a good idea. It led to having to pass EState into
the tableam, a callback API to perform index updates, etc. This seems
to have at least partially been triggered by the speculative insertion
codepaths. I've reverted this part of the changes. The speculative
insertion / confirm codepaths are now exposed to tableam.h - I think
that's the right thing because we'll likely want to have that
functionality across more than a single tuple in the future.
- The visibility functions relied on the *caller* performing buffer
locking. That's not a great idea, because generic code shouldn't know
about the locking scheme a particular AM needs. I've changed the
external visibility functions to instead take a slot, and perform the
necessary locking inside.
- There were numerous tableam callback uses inside heapam.c - that makes
no sense, we know what the storage is therein. The relevant
- The integration between index lookups and heap lookups based on the
results of an index lookup was IMO too tight. The index code dealt
with heap tuples, which isn't great. I've introduced a new concept, a
'IndexFetchTableData' scan. It's initialized when building an index
scan, and provides the necessary state (say current heap buffer), to
do table lookups from within a heap.
- The am of relations required for bootstrapping was set to 0 - I don't
think that's a good idea. I changed it so it's set to the heap AM as
well.
- HOT was encoded in the API in a bunch of places. That doesn't look
right to me. I tried to improve a bit on that, but I'm not yet quite
sure I like it. Needs written explanation & arguments...
- the heap tableam did a heap_copytuple() nearly everywhere. Leading to
a higher memory usage, because the resulting tuples weren't freed or
anything. There might be a reason for doing such a change - we've
certainly discussed that before - but I'm *vehemently* against doing
that at the same time we introduce pluggable storage. Analyzing the
performance effects will be hard enough without changes like this.
- I've for now backed out the heap rewrite changes, partially. Mostly
because I didn't like the way the abstraction looks, but haven't quite
figured out what it should look like.
- I did not like that speculative tokens were moved to slots. There's
really no reason for them to live outside parameters to tableam.h
functions.
- lotsa additional smaller changes.
- lotsa new bugs
My current working state is at [3] (urls to clone the repo are at [4]).
This is *HEAVILY WIP*. I plan to continue working on it over the next
days, but I'll temporarily focus onto v11 work. If others want I could
move repo to github and grant others write access.
I think the patchseries should eventually look like:
- move vacuumlazy.c (and other similar files) into access/heap, there's
really nothing generic here. This is a fairly independent task.
- slot'ify FDW RefetchForeignRow_function
- vtable based slot API, based on [1]
- slot'ify trigger API
- redo EPQ based on slots (prototyped in my git tree)
- redo trigger API to be slot based
- tuple traversal API changes
- tableam infrastructure, with error if a non-builtin AM is chosen
- move heap and calling code to be tableam based
- make vacuum callback based (not vacuum.c, just vacuumlazy.c)
- [other patches]
- allow other AMs
- introduce test AM
Tasks / Questions:
- split up patch
- Change heap table AM to not allocate handler function for each table,
instead allocate it statically. Avoids a significant amount of data
duplication, and allows for a few more compiler optimizations.
- Merge tableam.h and tableamapi.h and make most tableam.c functions
small inline functions. Having one-line tableam.c wrappers makes this
more expensive than necessary. We'll have big enough trouble not
regressing performance-wise.
- change scan level slot creation to use tableam function for doing so
- get rid of slot->tts_tid, tts_tupleOid and potentially tts_tableOid
- COPY's multi_insert path should probably deal with a bunch of slots,
rather than forming HeapTuples
- bitmap index scans probably need a new tableam.h callback, abstracting
bitgetpage()
- suspect IndexBuildHeapScan might need to move into the tableam.h API -
it's not clear to me that it's realistically possible to do this in a
generic manner.
Greetings,
Andres Freund
[1]: http://archives.postgresql.org/message-id/20180220224318.gw4oe5jadhpmcdnm%40alap3.anarazel.de
[2]: http://archives.postgresql.org/message-id/CAJrrPGcN5A4jH0PJ-s=6k3+SLA4pozC4HHRdmvU1ZBuA20TE-A@mail.gmail.com
[3]: https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/pluggable-storage
[4]: https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=summary
On Tue, Jul 3, 2018 at 5:06 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
As I've previously mentioned I had planned to spend some time to polish
Haribabu's version of the pluggable storage patch and rebase it on the
vtable based slot approach from [1]. While doing so I found more and
more things that I previously hadn't noticed. I started rewriting things
into something closer to what I think we want architecturally.
Thanks for the deep review and changes.
The current state of my version of the patch is *NOT* ready for proper
review (it doesn't even pass all tests, there are FIXMEs / elog()s). But I
think it's getting close enough to its eventual shape that more eyes,
and potentially more hands on keyboards, can be useful.
I will try to update it to make sure that it passes all the tests and also
try to reduce the FIXMEs.
The most fundamental issues I had with Haribabu's last version from [2]
are the following:
- The use of TableTuple, a typedef from void *, is bad from multiple
fronts. For one it reduces just about all type safety. There were
numerous bugs in the patch where things were just cast from HeapTuple
to TableTuple to HeapTuple (and even to TupleTableSlot). I think it's
a really, really bad idea to introduce a vague type like this for
development purposes alone, it makes it way too hard to refactor -
essentially throwing the biggest benefit of type safe languages out of
the window.
My earlier intention was to remove the HeapTuple usage entirely and replace
it with slots everywhere outside the tableam. But it ended up as TableTuple
before reaching that stage because of the heavy use of HeapTuple.
Additionally I think it's also the wrong approach architecturally. We
shouldn't assume that a tuple can efficiently be represented as a
single palloc'ed chunk. In fact, we should move *away* from relying on
that so much.
I've thus removed the TableTuple type entirely.
Thanks for the changes. I didn't check the code yet, so for now, whenever a
HeapTuple is required it will be generated from the slot?
- Previous versions of the patchset exposed Buffers in the tableam.h API,
performed buffer locking / pinning / ExecStoreTuple() calls outside of
it. That is wrong in my opinion, as various AMs will deal very
differently with buffer pinning & locking. The relevant logic is
largely moved within the AM. Bringing me to the next point:
- tableam exposed various operations based on HeapTuple/TableTuple's
(and their Buffers). This all needs to be slot based, as we can't
represent the way each AM will deal with this. I've largely converted
the API to be slot based. That has some fallout, but I think largely
works. Lots of outdated comments.
Yes, I agree with you.
- I think the move of the indexing from outside the table layer into the
storage layer isn't a good idea. It led to having to pass EState into
the tableam, a callback API to perform index updates, etc. This seems
to have at least partially been triggered by the speculative insertion
codepaths. I've reverted this part of the changes. The speculative
insertion / confirm codepaths are now exposed to tableam.h - I think
that's the right thing because we'll likely want to have that
functionality across more than a single tuple in the future.
- The visibility functions relied on the *caller* performing buffer
locking. That's not a great idea, because generic code shouldn't know
about the locking scheme a particular AM needs. I've changed the
external visibility functions to instead take a slot, and perform the
necessary locking inside.
When I first moved all the visibility functions into tableam, I found this
problem, and it will be good if the AM takes care of buffer locking and so on.
- There were numerous tableam callback uses inside heapam.c - that makes
no sense, we know what the storage is therein. The relevant
- The integration between index lookups and heap lookups based on the
results of an index lookup was IMO too tight. The index code dealt
with heap tuples, which isn't great. I've introduced a new concept, a
'IndexFetchTableData' scan. It's initialized when building an index
scan, and provides the necessary state (say current heap buffer), to
do table lookups from within a heap.
I agree that the new concept for accessing the heap from the index will be
good.
- The am of relations required for bootstrapping was set to 0 - I don't
think that's a good idea. I changed it so it's set to the heap AM as
well.
- HOT was encoded in the API in a bunch of places. That doesn't look
right to me. I tried to improve a bit on that, but I'm not yet quite
sure I like it. Needs written explanation & arguments...
- the heap tableam did a heap_copytuple() nearly everywhere. Leading to
a higher memory usage, because the resulting tuples weren't freed or
anything. There might be a reason for doing such a change - we've
certainly discussed that before - but I'm *vehemently* against doing
that at the same time we introduce pluggable storage. Analyzing the
performance effects will be hard enough without changes like this.
How about using a slot instead of a tuple and reusing it? I don't know
yet whether that is possible everywhere.
- I've for now backed out the heap rewrite changes, partially. Mostly
because I didn't like the way the abstraction looks, but haven't quite
figured out what it should look like.
- I did not like that speculative tokens were moved to slots. There's
really no reason for them to live outside parameters to tableam.h
functions.
- lotsa additional smaller changes.
- lotsa new bugs
Thanks for all the changes.
My current working state is at [3] (urls to clone repo are at [4]).
This is *HEAVILY WIP*. I plan to continue working on it over the next
days, but I'll temporarily focus onto v11 work. If others want I could
move repo to github and grant others write access.
Yes, I want to access the code and do further development on it.
Tasks / Questions:
- split up patch
How about generating refactoring changes as patches first based on
the code in your repo as discussed here[1]?
- Change heap table AM to not allocate handler function for each table,
instead allocate it statically. Avoids a significant amount of data
duplication, and allows for a few more compiler optimizations.
Some kind of static variable handler for each tableam, but we need to check
how we can access that static handler from the relation.
- Merge tableam.h and tableamapi.h and make most tableam.c functions
small inline functions. Having one-line tableam.c wrappers makes this
more expensive than necessary. We'll have a big enough trouble not
regressing performancewise.
OK.
- change scan level slot creation to use tableam function for doing so
- get rid of slot->tts_tid, tts_tupleOid and potentially tts_tableOid
So with this there shouldn't be a slot-to-TID mapping, or there
should be some other way?
- COPY's multi_insert path should probably deal with a bunch of slots,
rather than forming HeapTuples
OK.
- bitmap index scans probably need a new tableam.h callback, abstracting
bitgetpage()
OK.
Regards,
Haribabu Kommi
Fujitsu Australia
Hi!
On Tue, Jul 3, 2018 at 10:06 AM Andres Freund <andres@anarazel.de> wrote:
As I've previously mentioned I had planned to spend some time to polish
Haribabu's version of the pluggable storage patch and rebase it on the
vtable based slot approach from [1]. While doing so I found more and
more things that I previously hadn't noticed. I started rewriting things
into something closer to what I think we want architecturally.
Great, thank you for working on this patchset!
The current state of my version of the patch is *NOT* ready for proper
review (it doesn't even pass all tests, there are FIXMEs / elog()s). But I
think it's getting close enough to its eventual shape that more eyes,
and potentially more hands on keyboards, can be useful.
The most fundamental issues I had with Haribabu's last version from [2]
are the following:
- The use of TableTuple, a typedef from void *, is bad from multiple
fronts. For one it reduces just about all type safety. There were
numerous bugs in the patch where things were just cast from HeapTuple
to TableTuple to HeapTuple (and even to TupleTableSlot). I think it's
a really, really bad idea to introduce a vague type like this for
development purposes alone, it makes it way too hard to refactor -
essentially throwing the biggest benefit of type safe languages out of
the window.
Additionally I think it's also the wrong approach architecturally. We
shouldn't assume that a tuple can efficiently be represented as a
single palloc'ed chunk. In fact, we should move *away* from relying on
that so much.
I've thus removed the TableTuple type entirely.
+1, TableTuple was a vague concept.
- Previous versions of the patchset exposed Buffers in the tableam.h API,
performed buffer locking / pinning / ExecStoreTuple() calls outside of
it. That is wrong in my opinion, as various AMs will deal very
differently with buffer pinning & locking. The relevant logic is
largely moved within the AM. Bringing me to the next point:
- tableam exposed various operations based on HeapTuple/TableTuple's
(and their Buffers). This all needs to be slot based, as we can't
represent the way each AM will deal with this. I've largely converted
the API to be slot based. That has some fallout, but I think largely
works. Lots of outdated comments.
Makes sense to me. I like passing TupleTableSlot to tableam API
functions much more.
- I think the move of the indexing from outside the table layer into the
storage layer isn't a good idea. It led to having to pass EState into
the tableam, a callback API to perform index updates, etc. This seems
to have at least partially been triggered by the speculative insertion
codepaths. I've reverted this part of the changes. The speculative
insertion / confirm codepaths are now exposed to tableam.h - I think
that's the right thing because we'll likely want to have that
functionality across more than a single tuple in the future.
I agree that passing EState into tableam doesn't look good. But I
believe that tableam needs way more control over indexes than it has
in your version of the patch. Even if tableam-independent insertion into
indexes on tuple insert is more or less OK, on update we need
something smarter than just inserting index tuples depending on an
"update_indexes" flag. A tableam-specific update method may decide to
update only some of the indexes. For example, when zheap performs an
update in-place, it inserts only into indexes whose fields were updated.
And I think any undo-log based storage would have similar behavior.
Moreover, it might be required to do something with existing index
tuples (for instance, as I know, zheap sets a "deleted" flag on index
tuples related to previous values of the updated fields).
If we would like to move indexing outside of tableam, then we might
turn "update_indexes" from a bool into an enum with values like: "don't
insert index tuples", "insert all index tuples", "insert index tuples
only for updated fields" and so on. But that looks more like a set of
hardcoded cases for particular implementations than a proper API. So,
probably we shouldn't move indexing outside of tableam, but rather
provide better wrappers for doing that in tableam?
- The visibility functions relied on the *caller* performing buffer
locking. That's not a great idea, because generic code shouldn't know
about the locking scheme a particular AM needs. I've changed the
external visibility functions to instead take a slot, and perform the
necessary locking inside.
Makes sense to me. But would it cause extra locking/unlocking and in
turn performance impact?
- There were numerous tableam callback uses inside heapam.c - that makes
no sense, we know what the storage is therein. The relevant
Ok.
- The integration between index lookups and heap lookups based on the
results of an index lookup was IMO too tight. The index code dealt
with heap tuples, which isn't great. I've introduced a new concept, a
'IndexFetchTableData' scan. It's initialized when building an index
scan, and provides the necessary state (say current heap buffer), to
do table lookups from within a heap.
+1
- The am of relations required for bootstrapping was set to 0 - I don't
think that's a good idea. I changed it so it's set to the heap AM as
well.
+1
- HOT was encoded in the API in a bunch of places. That doesn't look
right to me. I tried to improve a bit on that, but I'm not yet quite
sure I like it. Needs written explanation & arguments...
Yes, HOT is a heapam-specific feature. Other tableams might not have
HOT. But it appears that we still expose the hot_search_buffer() function
in the tableam API. That function has no usage, so it's just
redundant and can be removed.
- the heap tableam did a heap_copytuple() nearly everywhere. Leading to
a higher memory usage, because the resulting tuples weren't freed or
anything. There might be a reason for doing such a change - we've
certainly discussed that before - but I'm *vehemently* against doing
that at the same time we introduce pluggable storage. Analyzing the
performance effects will be hard enough without changes like this.
I think once we've switched to slots, doing heap_copytuple() so
frequently is not required anymore.
- I've for now backed out the heap rewrite changes, partially. Mostly
because I didn't like the way the abstraction looks, but haven't quite
figured out what it should look like.
Yeah, it's a hard part, but we need to invent something in this area...
- I did not like that speculative tokens were moved to slots. There's
really no reason for them to live outside parameters to tableam.h
functions.
Good.
My current working state is at [3] (urls to clone repo are at [4]).
This is *HEAVILY WIP*. I plan to continue working on it over the next
days, but I'll temporarily focus onto v11 work. If others want I could
move repo to github and grant others write access.
Github would be more convenient for me.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Thu, Jul 5, 2018 at 3:25 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
My current working state is at [3] (urls to clone repo are at [4]).
This is *HEAVILY WIP*. I plan to continue working on it over the next
days, but I'll temporarily focus onto v11 work. If others want I could
move repo to github and grant others write access.
Github would be more convenient for me.
I've another note. It appears that you left my patch for locking
the last version of a tuple in one call (the heapam_lock_tuple() function)
almost without changes. During the PGCon 2018 Developer meeting I
remember you were somewhat unhappy with this approach. So, do you have
any notes about that for now?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi,
I've pushed up a new version to
https://github.com/anarazel/postgres-pluggable-storage which now passes
all the tests.
Besides a lot of bugfixes, I've rebased the tree, moved TriggerData to
be primarily slot based (with a conversion roundtrip when calling
trigger functions), and a lot of other small things.
On 2018-07-04 20:11:21 +1000, Haribabu Kommi wrote:
On Tue, Jul 3, 2018 at 5:06 PM Andres Freund <andres@anarazel.de> wrote:
The current state of my version of the patch is *NOT* ready for proper
review (it doesn't even pass all tests, there are FIXMEs / elog()s). But I
think it's getting close enough to its eventual shape that more eyes,
and potentially more hands on keyboards, can be useful.
I will try to update it to make sure that it passes all the tests and also
try to reduce the FIXMEs.
Cool.
Alexander, Haribabu, if you give me (privately) your github accounts,
I'll give you write access to that repository.
The most fundamental issues I had with Haribabu's last version from [2]
are the following:
- The use of TableTuple, a typedef from void *, is bad from multiple
fronts. For one it reduces just about all type safety. There were
numerous bugs in the patch where things were just cast from HeapTuple
to TableTuple to HeapTuple (and even to TupleTableSlot). I think it's
a really, really bad idea to introduce a vague type like this for
development purposes alone, it makes it way too hard to refactor -
essentially throwing the biggest benefit of type safe languages out of
the window.
My earlier intention was to remove the HeapTuple usage entirely and replace
it with slots everywhere outside the tableam. But it ended up as TableTuple
before reaching that stage because of the heavy use of HeapTuple.
I don't think that's necessary - a lot of the system catalog accesses
are going to continue to be heap tuple accesses. And the conversions you
did significantly continue to access TableTuples as heap tuples - it was
just that the compiler didn't warn about it anymore.
A prime example of that is the way the rewriteheap / cluster
integration was done. Cluster continued to sort tuples as heap tuples -
even though that's likely incompatible with other tuple formats which
need different state.
Additionally I think it's also the wrong approach architecturally. We
shouldn't assume that a tuple can efficiently be represented as a
single palloc'ed chunk. In fact, we should move *away* from relying on
that so much.
I've thus removed the TableTuple type entirely.
Thanks for the changes. I didn't check the code yet, so for now, whenever a
HeapTuple is required it will be generated from the slot?
Pretty much.
- the heap tableam did a heap_copytuple() nearly everywhere. Leading to
a higher memory usage, because the resulting tuples weren't freed or
anything. There might be a reason for doing such a change - we've
certainly discussed that before - but I'm *vehemently* against doing
that at the same time we introduce pluggable storage. Analyzing the
performance effects will be hard enough without changes like this.
How about using a slot instead of a tuple and reusing it? I don't know
yet whether that is possible everywhere.
Not quite sure what you mean?
Tasks / Questions:
- split up patch
How about generating refactoring changes as patches first based on
the code in your repo as discussed here[1]?
Sure - it was just too much work at the moment ;)
- Change heap table AM to not allocate handler function for each table,
instead allocate it statically. Avoids a significant amount of data
duplication, and allows for a few more compiler optimizations.
Some kind of static variable handler for each tableam, but we need to check
how we can access that static handler from the relation.
I'm not sure what you mean by "how can we access"? We can just return a
pointer from the constant data from the current handler? Except that
adding a bunch of consts would be good, the external interface wouldn't
need to change?
- change scan level slot creation to use tableam function for doing so
- get rid of slot->tts_tid, tts_tupleOid and potentially tts_tableOid
So with this there shouldn't be a slot-to-TID mapping, or there
should be some other way?
I'm not sure I follow?
- bitmap index scans probably need a new tableam.h callback, abstracting
bitgetpage()
OK.
Any chance you could try to tackle this? I'm going to be mostly out
this week, so we'd probably not run across each other's feet...
Greetings,
Andres Freund
Hi,
On 2018-07-05 15:25:25 +0300, Alexander Korotkov wrote:
- I think the move of the indexing from outside the table layer into the
storage layer isn't a good idea. It led to having to pass EState into
the tableam, a callback API to perform index updates, etc. This seems
to have at least partially been triggered by the speculative insertion
codepaths. I've reverted this part of the changes. The speculative
insertion / confirm codepaths are now exposed to tableam.h - I think
that's the right thing because we'll likely want to have that
functionality across more than a single tuple in the future.
I agree that passing EState into tableam doesn't look good. But I
believe that tableam needs way more control over indexes than it has
in your version of the patch. Even if tableam-independent insertion into
indexes on tuple insert is more or less OK, on update we need
something smarter than just inserting index tuples depending on an
"update_indexes" flag. A tableam-specific update method may decide to
update only some of the indexes. For example, when zheap performs an
update in-place, it inserts only into indexes whose fields were updated.
And I think any undo-log based storage would have similar behavior.
Moreover, it might be required to do something with existing index
tuples (for instance, as I know, zheap sets a "deleted" flag on index
tuples related to previous values of the updated fields).
I agree that we probably need more - I'm just inclined to think that we
need a more concrete target to work against. Currently zheap's indexing
logic still is fairly naive, I don't think we'll get the interface right
without having worked further on the zheap side of things.
- The visibility functions relied on the *caller* performing buffer
locking. That's not a great idea, because generic code shouldn't know
about the locking scheme a particular AM needs. I've changed the
external visibility functions to instead take a slot, and perform the
necessary locking inside.
Makes sense to me. But would it cause extra locking/unlocking and in
turn performance impact?
I don't think so - nearly all the performance-relevant cases do all the
visibility logic inside the AM, where the underlying functions that do not
do the locking can be used. Pretty much all the converted places just had
manual LockBuffer calls.
- HOT was encoded in the API in a bunch of places. That doesn't look
right to me. I tried to improve a bit on that, but I'm not yet quite
sure I like it. Needs written explanation & arguments...
Yes, HOT is a heapam-specific feature. Other tableams might not have
HOT. But it appears that we still expose the hot_search_buffer() function
in the tableam API. That function has no usage, so it's just
redundant and can be removed.
Yea, that was a leftover.
- the heap tableam did a heap_copytuple() nearly everywhere. Leading to
a higher memory usage, because the resulting tuples weren't freed or
anything. There might be a reason for doing such a change - we've
certainly discussed that before - but I'm *vehemently* against doing
that at the same time we introduce pluggable storage. Analyzing the
performance effects will be hard enough without changes like this.
I think once we've switched to slots, doing heap_copytuple() so
frequently is not required anymore.
It's mostly gone now.
- I've for now backed out the heap rewrite changes, partially. Mostly
because I didn't like the way the abstraction looks, but haven't quite
figured out what it should look like.
Yeah, it's a hard part, but we need to invent something in this area...
I agree. But I really don't yet quite know what. I somewhat wonder if we
should just add a cluster_rel() callback to the tableam and let it deal
with everything :(. As previously proposed the interface wouldn't have
worked with anything not losslessly encodable into a heaptuple, which is
unlikely to be sufficient.
FWIW, I plan to be mostly out until Thursday this week, and then I'll
rebase onto the new version of the abstract slot patch and then try to
split up the patchset. Once that's done, I'll do a prototype conversion
of zheap, which I'm sure will show up a lot of weaknesses in the current
abstraction. Once that's done I hope we can collaborate / divide &
conquer to get the individual pieces into commit shape.
If either of you wants to get a head start separating something out,
let's try to organize who would do what? The EPQ and trigger
slotification are probably good candidates.
Greetings,
Andres Freund
On Mon, Jul 16, 2018 at 11:35 PM Andres Freund <andres@anarazel.de> wrote:
On 2018-07-04 20:11:21 +1000, Haribabu Kommi wrote:
On Tue, Jul 3, 2018 at 5:06 PM Andres Freund <andres@anarazel.de> wrote:
The most fundamental issues I had with Haribabu's last version from [2]
are the following:
- The use of TableTuple, a typedef from void *, is bad from multiple
fronts. For one it reduces just about all type safety. There were
numerous bugs in the patch where things were just cast from HeapTuple
to TableTuple to HeapTuple (and even to TupleTableSlot). I think it's
a really, really bad idea to introduce a vague type like this for
development purposes alone, it makes it way too hard to refactor -
essentially throwing the biggest benefit of type safe languages out of
the window.
My earlier intention was to remove the HeapTuple usage entirely and replace
it with slots everywhere outside the tableam. But it ended up as TableTuple
before reaching that stage because of the heavy use of HeapTuple.
I don't think that's necessary - a lot of the system catalog accesses
are going to continue to be heap tuple accesses. And the conversions you
did significantly continue to access TableTuples as heap tuples - it was
just that the compiler didn't warn about it anymore.
A prime example of that is the way the rewriteheap / cluster
integration was done. Cluster continued to sort tuples as heap tuples -
even though that's likely incompatible with other tuple formats which
need different state.
OK. Understood.
- the heap tableam did a heap_copytuple() nearly everywhere. Leading to
a higher memory usage, because the resulting tuples weren't freed or
anything. There might be a reason for doing such a change - we've
certainly discussed that before - but I'm *vehemently* against doing
that at the same time we introduce pluggable storage. Analyzing the
performance effects will be hard enough without changes like this.
How about using a slot instead of a tuple and reusing it? I don't know
yet whether it is possible everywhere.
Not quite sure what you mean?
I thought using slots everywhere could reduce the use of heap_copytuple;
I understand from your reply to the other mail that you already did
those changes.
Tasks / Questions:
- split up patch
How about generating refactoring changes as patches first, based on
the code in your repo, as discussed here [1]?
Sure - it was just too much work at the moment ;)
Yes, it is too much work. How about doing this once most of the
open items are finished?
- Change heap table AM to not allocate handler function for each table,
instead allocate it statically. Avoids a significant amount of data
duplication, and allows for a few more compiler optimizations.
Some kind of static variable handler for each tableam, but we need to
check how we can access that static handler from the relation.
I'm not sure what you mean by "how can we access"? We can just return a
pointer to the constant data from the current handler? Except that
adding a bunch of consts would be good, the external interface wouldn't
need to change?
I mean we may need to store some tableam ID in each table, so that based on
that ID we get the static tableam handler, because at any given time there
may be tables from different tableam methods.
- change scan level slot creation to use tableam function for doing so
- get rid of slot->tts_tid, tts_tupleOid and potentially tts_tableOid
So with this there shouldn't be a slot-to-tid mapping, or there
should be some other way?
I'm not sure I follow?
Replacing heaptuple with slot, currently only the tid is passed via the
slot to the tableam methods. To get rid of the tid from the slot, we may
need some other method of passing it?
- bitmap index scans probably need a new tableam.h callback, abstracting
bitgetpage()
OK.
Any chance you could try to tackle this? I'm going to be mostly out
this week, so we'd probably not run across each other's feet...
OK, I will take care of the above point.
Regards,
Haribabu Kommi
Fujitsu Australia
On Tue, Jul 17, 2018 at 11:01 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
On Mon, Jul 16, 2018 at 11:35 PM Andres Freund <andres@anarazel.de> wrote:
On 2018-07-04 20:11:21 +1000, Haribabu Kommi wrote:
On Tue, Jul 3, 2018 at 5:06 PM Andres Freund <andres@anarazel.de>
wrote:
- bitmap index scans probably need a new tableam.h callback, abstracting
bitgetpage()
OK.
Any chance you could try to tackle this? I'm going to be mostly out
this week, so we'd probably not run across each other's feet...
OK, I will take care of the above point.
I added a new API in tableam.h to get all the visible tuples on a page,
abstracting the bitgetpage() function.
- Merge tableam.h and tableamapi.h and make most tableam.c functions
small inline functions. Having one-line tableam.c wrappers makes this
more expensive than necessary. We'll have big enough trouble not
regressing performance-wise.
I merged tableam.h and tableamapi.h into tableam.h and changed all the
functions to inline. This change may have added some additional header
dependencies; I will check whether they can be removed.
Attached are the updated patches on top of your github tree.
Currently I am working on the following.
- I observed that there is a crash when running isolation tests.
- COPY's multi_insert path should probably deal with a bunch of slots,
rather than forming HeapTuples
Regards,
Haribabu Kommi
Fujitsu Australia
Attachments:
Attachment: 0002-New-API-to-get-heap-page-tuples.patch (application/octet-stream)
From e96746d07b8dfcdb17082b54524505163f5e237a Mon Sep 17 00:00:00 2001
From: Hari Babu <haribabuk@fast.au.fujitsu.com>
Date: Tue, 24 Jul 2018 23:18:15 +1000
Subject: [PATCH 2/2] New API to get heap page tuples
This API is used in bitmap scan to get all the visible tuples
from the page.
---
src/backend/access/heap/heapam_handler.c | 105 ++++++++++++++++++++++
src/backend/executor/nodeBitmapHeapscan.c | 100 ++-------------------
src/include/access/tableam.h | 20 +++++
3 files changed, 131 insertions(+), 94 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 42eec2a2ab..a3fe110efe 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,9 +33,13 @@
#include "storage/bufpage.h"
#include "storage/bufmgr.h"
+#include "storage/predicate.h"
#include "access/xact.h"
+static bool heapam_fetch_tuple_from_offset(TableScanDesc sscan, BlockNumber blkno,
+ OffsetNumber offset, TupleTableSlot *slot);
+
/* ----------------------------------------------------------------
* storage AM support routines for heapam
* ----------------------------------------------------------------
@@ -462,6 +466,106 @@ heapam_get_heappagescandesc(TableScanDesc sscan)
return &scan->rs_pagescan;
}
+static void
+heapam_scan_get_page_tuples(TableScanDesc scan,
+ HeapPageScanDesc pagescan,
+ TupleTableSlot *slot,
+ BlockNumber page,
+ int ntuples,
+ OffsetNumber *offsets)
+{
+ Buffer buffer;
+ Snapshot snapshot;
+ int ntup;
+
+ /*
+ * Acquire pin on the target heap page, trading in any pin we held before.
+ */
+ Assert(page < pagescan->rs_nblocks);
+
+ scan->rs_cbuf = ReleaseAndReadBuffer(scan->rs_cbuf,
+ scan->rs_rd,
+ page);
+ buffer = scan->rs_cbuf;
+ snapshot = scan->rs_snapshot;
+
+ ntup = 0;
+
+ /*
+ * Prune and repair fragmentation for the whole page, if possible.
+ */
+ heap_page_prune_opt(scan->rs_rd, buffer);
+
+ /*
+ * We must hold share lock on the buffer content while examining tuple
+ * visibility. Afterwards, however, the tuples we have found to be
+ * visible are guaranteed good as long as we hold the buffer pin.
+ */
+ LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+ /*
+ * We need two separate strategies for lossy and non-lossy cases.
+ */
+ if (ntuples >= 0)
+ {
+ /*
+ * Bitmap is non-lossy, so we just look through the offsets listed in
+ * tbmres; but we have to follow any HOT chain starting at each such
+ * offset.
+ */
+ int curslot;
+
+ for (curslot = 0; curslot < ntuples; curslot++)
+ {
+ OffsetNumber offnum = offsets[curslot];
+ ItemPointerData tid;
+ HeapTupleData heapTuple;
+
+ ItemPointerSet(&tid, page, offnum);
+ if (heap_hot_search_buffer(&tid, scan->rs_rd, buffer, snapshot,
+ &heapTuple, NULL, true))
+ pagescan->rs_vistuples[ntup++] = ItemPointerGetOffsetNumber(&tid);
+ }
+ }
+ else
+ {
+ /*
+ * Bitmap is lossy, so we must examine each item pointer on the page.
+ * But we can ignore HOT chains, since we'll check each tuple anyway.
+ */
+ Page dp = (Page) BufferGetPage(buffer);
+ OffsetNumber maxoff = PageGetMaxOffsetNumber(dp);
+ OffsetNumber offnum;
+
+ for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum = OffsetNumberNext(offnum))
+ {
+ ItemId lp;
+ bool valid;
+ BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
+
+ lp = PageGetItemId(dp, offnum);
+ if (!ItemIdIsNormal(lp))
+ continue;
+
+ /* FIXME: unnecessarily pins */
+ heapam_fetch_tuple_from_offset(scan, page, offnum, slot);
+ valid = HeapTupleSatisfies(bslot->base.tuple, snapshot, bslot->buffer);
+ if (valid)
+ {
+ pagescan->rs_vistuples[ntup++] = offnum;
+ PredicateLockTuple(scan->rs_rd, bslot->base.tuple, snapshot);
+ }
+ CheckForSerializableConflictOut(valid, scan->rs_rd, bslot->base.tuple,
+ buffer, snapshot);
+ }
+ }
+
+ LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+ Assert(ntup <= MaxHeapTuplesPerPage);
+ pagescan->rs_ntuples = ntup;
+}
+
static bool
heapam_fetch_tuple_from_offset(TableScanDesc sscan, BlockNumber blkno, OffsetNumber offset, TupleTableSlot *slot)
{
@@ -683,6 +787,7 @@ heap_tableam_handler(PG_FUNCTION_ARGS)
* BitmapHeap and Sample Scans
*/
amroutine->scan_get_heappagescandesc = heapam_get_heappagescandesc;
+ amroutine->scan_get_page_tuples = heapam_scan_get_page_tuples;
amroutine->sync_scan_report_location = ss_report_location;
amroutine->tuple_fetch_row_version = heapam_fetch_row_version;
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 8521e45132..be123728eb 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -371,100 +371,12 @@ BitmapHeapNext(BitmapHeapScanState *node)
static void
bitgetpage(BitmapHeapScanState *node, TBMIterateResult *tbmres)
{
- TableScanDesc scan = node->ss.ss_currentScanDesc;
- HeapPageScanDesc pagescan = node->pagescan;
- TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
- BlockNumber page = tbmres->blockno;
- Buffer buffer;
- Snapshot snapshot;
- int ntup;
-
- /*
- * Acquire pin on the target heap page, trading in any pin we held before.
- */
- Assert(page < pagescan->rs_nblocks);
-
- scan->rs_cbuf = ReleaseAndReadBuffer(scan->rs_cbuf,
- scan->rs_rd,
- page);
- buffer = scan->rs_cbuf;
- snapshot = scan->rs_snapshot;
-
- ntup = 0;
-
- /*
- * Prune and repair fragmentation for the whole page, if possible.
- */
- heap_page_prune_opt(scan->rs_rd, buffer);
-
- /*
- * We must hold share lock on the buffer content while examining tuple
- * visibility. Afterwards, however, the tuples we have found to be
- * visible are guaranteed good as long as we hold the buffer pin.
- */
- LockBuffer(buffer, BUFFER_LOCK_SHARE);
-
- /*
- * We need two separate strategies for lossy and non-lossy cases.
- */
- if (tbmres->ntuples >= 0)
- {
- /*
- * Bitmap is non-lossy, so we just look through the offsets listed in
- * tbmres; but we have to follow any HOT chain starting at each such
- * offset.
- */
- int curslot;
-
- for (curslot = 0; curslot < tbmres->ntuples; curslot++)
- {
- OffsetNumber offnum = tbmres->offsets[curslot];
- ItemPointerData tid;
- HeapTupleData heapTuple;
-
- ItemPointerSet(&tid, page, offnum);
- if (heap_hot_search_buffer(&tid, scan->rs_rd, buffer, snapshot,
- &heapTuple, NULL, true))
- pagescan->rs_vistuples[ntup++] = ItemPointerGetOffsetNumber(&tid);
- }
- }
- else
- {
- /*
- * Bitmap is lossy, so we must examine each item pointer on the page.
- * But we can ignore HOT chains, since we'll check each tuple anyway.
- */
- Page dp = (Page) BufferGetPage(buffer);
- OffsetNumber maxoff = PageGetMaxOffsetNumber(dp);
- OffsetNumber offnum;
-
- for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum = OffsetNumberNext(offnum))
- {
- ItemId lp;
- bool valid;
- BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
-
- lp = PageGetItemId(dp, offnum);
- if (!ItemIdIsNormal(lp))
- continue;
-
- /* FIXME: unnecessarily pins */
- table_tuple_fetch_from_offset(scan, page, offnum, slot);
- valid = HeapTupleSatisfies(bslot->base.tuple, snapshot, bslot->buffer);
- if (valid)
- {
- pagescan->rs_vistuples[ntup++] = offnum;
- PredicateLockTuple(scan->rs_rd, bslot->base.tuple, snapshot);
- }
- CheckForSerializableConflictOut(valid, scan->rs_rd, bslot->base.tuple,
- buffer, snapshot);
- }
- }
-
- LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-
- Assert(ntup <= MaxHeapTuplesPerPage);
- pagescan->rs_ntuples = ntup;
+ table_scan_get_page_tuples(node->ss.ss_currentScanDesc,
+ node->pagescan,
+ node->ss.ss_ScanTupleSlot,
+ tbmres->blockno,
+ tbmres->ntuples,
+ tbmres->offsets);
}
/*
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index bf675ff881..df60ba3316 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -172,6 +172,13 @@ typedef bool (*TupleFetchFollow_function)(struct IndexFetchTableData *scan,
TupleTableSlot *slot,
bool *call_again, bool *all_dead);
+typedef void (*ScanGetPageTuples_function)(TableScanDesc scan,
+ HeapPageScanDesc pagescan,
+ TupleTableSlot *slot,
+ BlockNumber page,
+ int ntuples,
+ OffsetNumber *offsets);
+
/*
* API struct for a table AM. Note this must be stored in a single palloc'd
* chunk of memory.
@@ -224,6 +231,7 @@ typedef struct TableAmRoutine
ScanGetpage_function scan_getpage;
ScanRescan_function scan_rescan;
ScanUpdateSnapshot_function scan_update_snapshot;
+ ScanGetPageTuples_function scan_get_page_tuples;
BeginIndexFetchTable_function begin_index_fetch;
EndIndexFetchTable_function reset_index_fetch;
@@ -472,6 +480,18 @@ table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot)
scan->rs_rd->rd_tableamroutine->scan_update_snapshot(scan, snapshot);
}
+static inline void
+table_scan_get_page_tuples(TableScanDesc scan,
+ HeapPageScanDesc pagescan,
+ TupleTableSlot *slot,
+ BlockNumber page,
+ int ntuples,
+ OffsetNumber *offsets)
+{
+ scan->rs_rd->rd_tableamroutine->scan_get_page_tuples(scan, pagescan, slot, page, ntuples, offsets);
+}
+
+
static inline TupleTableSlot *
table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot)
{
--
2.18.0.windows.1
Attachment: 0001-Merge-tableam.h-and-tableamapi.h.patch (application/octet-stream)
From 4bb12ad1e73927ee31683d728d5cdaebca6a53a6 Mon Sep 17 00:00:00 2001
From: Hari Babu <haribabuk@fast.au.fujitsu.com>
Date: Mon, 23 Jul 2018 14:44:35 +1000
Subject: [PATCH 1/2] Merge tableam.h and tableamapi.h
And also make most tableam.c functions small inline functions.
Having one-line tableam.c wrappers makes this more expensive
than necessary.
The above change may have added some internal headers that are now
exposed via tableam.h; may need another check.
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/heap/heapam.c | 2 +-
src/backend/access/heap/heapam_handler.c | 2 +-
src/backend/access/heap/rewriteheap.c | 1 +
src/backend/access/nbtree/nbtsort.c | 1 +
src/backend/access/table/Makefile | 2 +-
src/backend/access/table/tableam.c | 472 ---------------
src/backend/access/table/tableamapi.c | 2 +-
src/backend/commands/cluster.c | 2 +-
src/backend/executor/execIndexing.c | 1 +
src/backend/executor/nodeSamplescan.c | 1 +
src/backend/optimizer/util/plancat.c | 2 +-
src/backend/postmaster/autovacuum.c | 1 +
src/backend/storage/lmgr/predicate.c | 1 +
src/backend/utils/adt/ri_triggers.c | 1 +
src/backend/utils/adt/selfuncs.c | 1 +
src/backend/utils/cache/relcache.c | 2 +-
src/include/access/relscan.h | 1 -
src/include/access/tableam.h | 738 ++++++++++++++++++++---
src/include/access/tableamapi.h | 212 -------
src/include/nodes/nodes.h | 2 +-
src/include/utils/tqual.h | 15 -
22 files changed, 669 insertions(+), 795 deletions(-)
delete mode 100644 src/backend/access/table/tableam.c
delete mode 100644 src/include/access/tableamapi.h
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index 3995a88397..959f5e7dc8 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -26,7 +26,7 @@
#include "access/multixact.h"
#include "access/relscan.h"
-#include "access/tableamapi.h"
+#include "access/tableam.h"
#include "access/xact.h"
#include "catalog/namespace.h"
#include "catalog/pg_authid.h"
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 5b8155c911..40c1a5432d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -45,7 +45,7 @@
#include "access/multixact.h"
#include "access/parallel.h"
#include "access/relscan.h"
-#include "access/tableamapi.h"
+#include "access/tableam.h"
#include "access/sysattr.h"
#include "access/transam.h"
#include "access/tuptoaster.h"
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 94c64d8387..42eec2a2ab 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -23,7 +23,7 @@
#include "access/heapam.h"
#include "access/relscan.h"
#include "access/rewriteheap.h"
-#include "access/tableamapi.h"
+#include "access/tableam.h"
#include "catalog/pg_am_d.h"
#include "pgstat.h"
#include "storage/lmgr.h"
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 2ddb421eb0..5dad191ab2 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -110,6 +110,7 @@
#include "access/heapam.h"
#include "access/heapam_xlog.h"
#include "access/rewriteheap.h"
+#include "access/tableam.h"
#include "access/transam.h"
#include "access/tuptoaster.h"
#include "access/xact.h"
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 74f8e1bbeb..be74041df4 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -60,6 +60,7 @@
#include "access/nbtree.h"
#include "access/parallel.h"
#include "access/relscan.h"
+#include "access/tableam.h"
#include "access/xact.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
diff --git a/src/backend/access/table/Makefile b/src/backend/access/table/Makefile
index fe22bf9208..ff0989ed24 100644
--- a/src/backend/access/table/Makefile
+++ b/src/backend/access/table/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/access/table
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = tableam.o tableamapi.o tableam_common.o
+OBJS = tableamapi.o tableam_common.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
deleted file mode 100644
index 77c04aaa27..0000000000
--- a/src/backend/access/table/tableam.c
+++ /dev/null
@@ -1,472 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * tableam.c
- * table access method code
- *
- * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
- * Portions Copyright (c) 1994, Regents of the University of California
- *
- *
- * IDENTIFICATION
- * src/backend/access/table/tableam.c
- *
- *-------------------------------------------------------------------------
- */
-#include "postgres.h"
-
-#include "access/tableam.h"
-#include "access/tableamapi.h"
-#include "access/relscan.h"
-#include "storage/bufmgr.h"
-#include "utils/rel.h"
-#include "utils/tqual.h"
-
-TupleTableSlot*
-table_gimmegimmeslot(Relation relation, List **reglist)
-{
- TupleTableSlot *slot;
-
- slot = relation->rd_tableamroutine->gimmegimmeslot(relation);
-
- if (reglist)
- *reglist = lappend(*reglist, slot);
-
- return slot;
-}
-
-/*
- * table_fetch_row_version - retrieve tuple with given tid
- *
- * XXX: This shouldn't just take a tid, but tid + additional information
- */
-bool
-table_fetch_row_version(Relation r,
- ItemPointer tid,
- Snapshot snapshot,
- TupleTableSlot *slot,
- Relation stats_relation)
-{
- return r->rd_tableamroutine->tuple_fetch_row_version(r, tid,
- snapshot, slot,
- stats_relation);
-}
-
-
-/*
- * table_lock_tuple - lock a tuple in shared or exclusive mode
- *
- * XXX: This shouldn't just take a tid, but tid + additional information
- */
-HTSU_Result
-table_lock_tuple(Relation relation, ItemPointer tid, Snapshot snapshot,
- TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
- LockWaitPolicy wait_policy, uint8 flags,
- HeapUpdateFailureData *hufd)
-{
- return relation->rd_tableamroutine->tuple_lock(relation, tid, snapshot, slot,
- cid, mode, wait_policy,
- flags, hufd);
-}
-
-/* ----------------
- * heap_beginscan_parallel - join a parallel scan
- *
- * Caller must hold a suitable lock on the correct relation.
- * ----------------
- */
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelHeapScanDesc parallel_scan)
-{
- Snapshot snapshot;
-
- Assert(RelationGetRelid(relation) == parallel_scan->phs_relid);
-
- if (!parallel_scan->phs_snapshot_any)
- {
- /* Snapshot was serialized -- restore it */
- snapshot = RestoreSnapshot(parallel_scan->phs_snapshot_data);
- RegisterSnapshot(snapshot);
- }
- else
- {
- /* SnapshotAny passed by caller (not serialized) */
- snapshot = SnapshotAny;
- }
-
- return relation->rd_tableamroutine->scan_begin(relation, snapshot, 0, NULL, parallel_scan,
- true, true, true, false, false, !parallel_scan->phs_snapshot_any);
-}
-
-ParallelHeapScanDesc
-tableam_get_parallelheapscandesc(TableScanDesc sscan)
-{
- return sscan->rs_rd->rd_tableamroutine->scan_get_parallelheapscandesc(sscan);
-}
-
-HeapPageScanDesc
-tableam_get_heappagescandesc(TableScanDesc sscan)
-{
- /*
- * Planner should have already validated whether the current storage
- * supports Page scans are not? This function will be called only from
- * Bitmap Heap scan and sample scan
- */
- Assert(sscan->rs_rd->rd_tableamroutine->scan_get_heappagescandesc != NULL);
-
- return sscan->rs_rd->rd_tableamroutine->scan_get_heappagescandesc(sscan);
-}
-
-void
-table_syncscan_report_location(Relation rel, BlockNumber location)
-{
- return rel->rd_tableamroutine->sync_scan_report_location(rel, location);
-}
-
-/*
- * heap_setscanlimits - restrict range of a heapscan
- *
- * startBlk is the page to start at
- * numBlks is number of pages to scan (InvalidBlockNumber means "all")
- */
-void
-table_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks)
-{
- sscan->rs_rd->rd_tableamroutine->scansetlimits(sscan, startBlk, numBlks);
-}
-
-
-/* ----------------
- * heap_beginscan - begin relation scan
- *
- * heap_beginscan is the "standard" case.
- *
- * heap_beginscan_catalog differs in setting up its own temporary snapshot.
- *
- * heap_beginscan_strat offers an extended API that lets the caller control
- * whether a nondefault buffer access strategy can be used, and whether
- * syncscan can be chosen (possibly resulting in the scan not starting from
- * block zero). Both of these default to true with plain heap_beginscan.
- *
- * heap_beginscan_bm is an alternative entry point for setting up a
- * TableScanDesc for a bitmap heap scan. Although that scan technology is
- * really quite unlike a standard seqscan, there is just enough commonality
- * to make it worth using the same data structure.
- *
- * heap_beginscan_sampling is an alternative entry point for setting up a
- * TableScanDesc for a TABLESAMPLE scan. As with bitmap scans, it's worth
- * using the same data structure although the behavior is rather different.
- * In addition to the options offered by heap_beginscan_strat, this call
- * also allows control of whether page-mode visibility checking is used.
- * ----------------
- */
-TableScanDesc
-table_beginscan(Relation relation, Snapshot snapshot,
- int nkeys, ScanKey key)
-{
- return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
- true, true, true, false, false, false);
-}
-
-TableScanDesc
-table_beginscan_catalog(Relation relation, int nkeys, ScanKey key)
-{
- Oid relid = RelationGetRelid(relation);
- Snapshot snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
-
- return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
- true, true, true, false, false, true);
-}
-
-TableScanDesc
-table_beginscan_strat(Relation relation, Snapshot snapshot,
- int nkeys, ScanKey key,
- bool allow_strat, bool allow_sync)
-{
- return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
- allow_strat, allow_sync, true,
- false, false, false);
-}
-
-TableScanDesc
-table_beginscan_bm(Relation relation, Snapshot snapshot,
- int nkeys, ScanKey key)
-{
- return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
- false, false, true, true, false, false);
-}
-
-TableScanDesc
-table_beginscan_sampling(Relation relation, Snapshot snapshot,
- int nkeys, ScanKey key,
- bool allow_strat, bool allow_sync, bool allow_pagemode)
-{
- return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
- allow_strat, allow_sync, allow_pagemode,
- false, true, false);
-}
-
-/* ----------------
- * heap_rescan - restart a relation scan
- * ----------------
- */
-void
-table_rescan(TableScanDesc scan,
- ScanKey key)
-{
- scan->rs_rd->rd_tableamroutine->scan_rescan(scan, key, false, false, false, false);
-}
-
-/* ----------------
- * heap_rescan_set_params - restart a relation scan after changing params
- *
- * This call allows changing the buffer strategy, syncscan, and pagemode
- * options before starting a fresh scan. Note that although the actual use
- * of syncscan might change (effectively, enabling or disabling reporting),
- * the previously selected startblock will be kept.
- * ----------------
- */
-void
-table_rescan_set_params(TableScanDesc scan, ScanKey key,
- bool allow_strat, bool allow_sync, bool allow_pagemode)
-{
- scan->rs_rd->rd_tableamroutine->scan_rescan(scan, key, true,
- allow_strat, allow_sync, (allow_pagemode && IsMVCCSnapshot(scan->rs_snapshot)));
-}
-
-/* ----------------
- * heap_endscan - end relation scan
- *
- * See how to integrate with index scans.
- * Check handling if reldesc caching.
- * ----------------
- */
-void
-table_endscan(TableScanDesc scan)
-{
- scan->rs_rd->rd_tableamroutine->scan_end(scan);
-}
-
-
-/* ----------------
- * heap_update_snapshot
- *
- * Update snapshot info in heap scan descriptor.
- * ----------------
- */
-void
-table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot)
-{
- scan->rs_rd->rd_tableamroutine->scan_update_snapshot(scan, snapshot);
-}
-
-TupleTableSlot *
-table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot)
-{
- return sscan->rs_rd->rd_tableamroutine->scan_getnextslot(sscan, direction, slot);
-}
-
-bool
-table_tuple_fetch_from_offset(TableScanDesc sscan, BlockNumber blkno, OffsetNumber offset, TupleTableSlot *slot)
-{
- return sscan->rs_rd->rd_tableamroutine->scan_fetch_tuple_from_offset(sscan, blkno, offset, slot);
-}
-
-
-IndexFetchTableData*
-table_begin_index_fetch_table(Relation rel)
-{
- return rel->rd_tableamroutine->begin_index_fetch(rel);
-}
-
-void
-table_reset_index_fetch_table(IndexFetchTableData* scan)
-{
- scan->rel->rd_tableamroutine->reset_index_fetch(scan);
-}
-
-void
-table_end_index_fetch_table(IndexFetchTableData* scan)
-{
- scan->rel->rd_tableamroutine->end_index_fetch(scan);
-}
-
-/*
- * Insert a tuple from a slot into table AM routine
- */
-Oid
-table_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
- int options, BulkInsertState bistate)
-{
- return relation->rd_tableamroutine->tuple_insert(relation, slot, cid, options,
- bistate);
-}
-
-Oid
-table_insert_speculative(Relation relation, TupleTableSlot *slot, CommandId cid,
- int options, BulkInsertState bistate, uint32 specToken)
-{
- return relation->rd_tableamroutine->tuple_insert_speculative(relation, slot, cid, options,
- bistate, specToken);
-}
-
-void table_complete_speculative(Relation relation, TupleTableSlot *slot, uint32 specToken,
- bool succeeded)
-{
- return relation->rd_tableamroutine->tuple_complete_speculative(relation, slot, specToken, succeeded);
-}
-
-/*
- * Delete a tuple from tid using table AM routine
- */
-HTSU_Result
-table_delete(Relation relation, ItemPointer tid, CommandId cid,
- Snapshot crosscheck, bool wait,
- HeapUpdateFailureData *hufd, bool changingPart)
-{
- return relation->rd_tableamroutine->tuple_delete(relation, tid, cid,
- crosscheck, wait, hufd, changingPart);
-}
-
-/*
- * update a tuple from tid using table AM routine
- */
-HTSU_Result
-table_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
- CommandId cid, Snapshot crosscheck, bool wait,
- HeapUpdateFailureData *hufd, LockTupleMode *lockmode,
- bool *update_indexes)
-{
- return relation->rd_tableamroutine->tuple_update(relation, otid, slot,
- cid, crosscheck, wait, hufd,
- lockmode, update_indexes);
-}
-
-bool
-table_fetch_follow(struct IndexFetchTableData *scan,
- ItemPointer tid,
- Snapshot snapshot,
- TupleTableSlot *slot,
- bool *call_again, bool *all_dead)
-{
-
- return scan->rel->rd_tableamroutine->tuple_fetch_follow(scan, tid, snapshot,
- slot, call_again,
- all_dead);
-}
-
-bool
-table_fetch_follow_check(Relation rel,
- ItemPointer tid,
- Snapshot snapshot,
- bool *all_dead)
-{
- IndexFetchTableData *scan = table_begin_index_fetch_table(rel);
- TupleTableSlot *slot = table_gimmegimmeslot(rel, NULL);
- bool call_again = false;
- bool found;
-
- found = table_fetch_follow(scan, tid, snapshot, slot, &call_again, all_dead);
-
- table_end_index_fetch_table(scan);
- ExecDropSingleTupleTableSlot(slot);
-
- return found;
-}
-
-/*
- * table_multi_insert - insert multiple tuple into a table
- */
-void
-table_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
- CommandId cid, int options, BulkInsertState bistate)
-{
- relation->rd_tableamroutine->multi_insert(relation, tuples, ntuples,
- cid, options, bistate);
-}
-
-tuple_data
-table_tuple_get_data(Relation relation, TupleTableSlot *slot, tuple_data_flags flags)
-{
- return relation->rd_tableamroutine->get_tuple_data(slot, flags);
-}
-
-void
-table_get_latest_tid(Relation relation,
- Snapshot snapshot,
- ItemPointer tid)
-{
- relation->rd_tableamroutine->tuple_get_latest_tid(relation, snapshot, tid);
-}
-
-
-void
-table_vacuum_rel(Relation rel, int options,
- struct VacuumParams *params, BufferAccessStrategy bstrategy)
-{
- rel->rd_tableamroutine->relation_vacuum(rel, options, params, bstrategy);
-}
-
-/*
- * table_sync - sync a heap, for use when no WAL has been written
- */
-void
-table_sync(Relation rel)
-{
- rel->rd_tableamroutine->relation_sync(rel);
-}
-
-/*
- * -------------------
- * storage Bulk Insert functions
- * -------------------
- */
-BulkInsertState
-table_getbulkinsertstate(Relation rel)
-{
- return rel->rd_tableamroutine->getbulkinsertstate();
-}
-
-void
-table_freebulkinsertstate(Relation rel, BulkInsertState bistate)
-{
- rel->rd_tableamroutine->freebulkinsertstate(bistate);
-}
-
-void
-table_releasebulkinsertstate(Relation rel, BulkInsertState bistate)
-{
- rel->rd_tableamroutine->releasebulkinsertstate(bistate);
-}
-
-/*
- * -------------------
- * storage tuple rewrite functions
- * -------------------
- */
-RewriteState
-table_begin_rewrite(Relation OldHeap, Relation NewHeap,
- TransactionId OldestXmin, TransactionId FreezeXid,
- MultiXactId MultiXactCutoff, bool use_wal)
-{
- return NewHeap->rd_tableamroutine->begin_heap_rewrite(OldHeap, NewHeap,
- OldestXmin, FreezeXid, MultiXactCutoff, use_wal);
-}
-
-void
-table_end_rewrite(Relation rel, RewriteState state)
-{
- rel->rd_tableamroutine->end_heap_rewrite(state);
-}
-
-void
-table_rewrite_tuple(Relation rel, RewriteState state, HeapTuple oldTuple,
- HeapTuple newTuple)
-{
- rel->rd_tableamroutine->rewrite_heap_tuple(state, oldTuple, newTuple);
-}
-
-bool
-table_rewrite_dead_tuple(Relation rel, RewriteState state, HeapTuple oldTuple)
-{
- return rel->rd_tableamroutine->rewrite_heap_dead_tuple(state, oldTuple);
-}
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index f94660e306..91e5774a6e 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -13,7 +13,7 @@
#include "postgres.h"
#include "access/htup_details.h"
-#include "access/tableamapi.h"
+#include "access/tableam.h"
#include "catalog/pg_am.h"
#include "catalog/pg_proc.h"
#include "utils/syscache.h"
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 14ed2aa393..34f815c28f 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -21,7 +21,7 @@
#include "access/multixact.h"
#include "access/relscan.h"
#include "access/rewriteheap.h"
-#include "access/tableamapi.h"
+#include "access/tableam.h"
#include "access/transam.h"
#include "access/tuptoaster.h"
#include "access/xact.h"
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 542210b29f..80b604821b 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -107,6 +107,7 @@
#include "postgres.h"
#include "access/relscan.h"
+#include "access/tableam.h"
#include "access/xact.h"
#include "catalog/index.h"
#include "executor/executor.h"
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 566dabaa00..b5d02983c5 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -16,6 +16,7 @@
#include "access/hash.h"
#include "access/relscan.h"
+#include "access/tableam.h"
#include "access/tsmapi.h"
#include "executor/executor.h"
#include "executor/nodeSamplescan.h"
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index f3cd64cf62..8fe8312f29 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -21,7 +21,7 @@
#include "access/heapam.h"
#include "access/htup_details.h"
#include "access/nbtree.h"
-#include "access/tableamapi.h"
+#include "access/tableam.h"
#include "access/sysattr.h"
#include "access/transam.h"
#include "access/xlog.h"
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 4455b42875..7142a54ce9 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -69,6 +69,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/reloptions.h"
+#include "access/tableam.h"
#include "access/transam.h"
#include "access/xact.h"
#include "catalog/dependency.h"
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index e8390311d0..2960e21340 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -188,6 +188,7 @@
#include "access/htup_details.h"
#include "access/slru.h"
#include "access/subtrans.h"
+#include "access/tableam.h"
#include "access/transam.h"
#include "access/twophase.h"
#include "access/twophase_rmgr.h"
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index a661f4b047..254041cea7 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -31,6 +31,7 @@
#include "postgres.h"
#include "access/htup_details.h"
+#include "access/tableam.h"
#include "access/sysattr.h"
#include "access/xact.h"
#include "catalog/pg_collation.h"
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 203b83ad06..7fcf077426 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -103,6 +103,7 @@
#include "access/brin.h"
#include "access/gin.h"
#include "access/htup_details.h"
+#include "access/tableam.h"
#include "access/sysattr.h"
#include "catalog/index.h"
#include "catalog/pg_am.h"
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 6360371493..ece332bd44 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -36,7 +36,7 @@
#include "access/nbtree.h"
#include "access/reloptions.h"
#include "access/sysattr.h"
-#include "access/tableamapi.h"
+#include "access/tableam.h"
#include "access/tupdesc_details.h"
#include "access/xact.h"
#include "access/xlog.h"
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index e9d8eed541..97208d4c44 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -16,7 +16,6 @@
#include "access/genam.h"
#include "access/heapam.h"
-#include "access/tableam.h"
#include "access/htup_details.h"
#include "access/itup.h"
#include "access/tupdesc.h"
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index fd05018ee8..bf675ff881 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -14,10 +14,18 @@
#ifndef TABLEAM_H
#define TABLEAM_H
+#include "postgres.h"
+
#include "access/heapam.h"
+#include "access/relscan.h"
#include "executor/tuptable.h"
#include "nodes/execnodes.h"
+#include "nodes/nodes.h"
+#include "fmgr.h"
#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/snapshot.h"
+#include "utils/tqual.h"
#define DEFAULT_TABLE_ACCESS_METHOD "heap_tableam"
@@ -37,103 +45,661 @@ typedef enum tuple_data_flags
CTID
} tuple_data_flags;
-extern TupleTableSlot* table_gimmegimmeslot(Relation relation, List **reglist);
-extern TableScanDesc table_beginscan_parallel(Relation relation, ParallelHeapScanDesc parallel_scan);
-extern ParallelHeapScanDesc tableam_get_parallelheapscandesc(TableScanDesc sscan);
-extern HeapPageScanDesc tableam_get_heappagescandesc(TableScanDesc sscan);
-extern void table_syncscan_report_location(Relation rel, BlockNumber location);
-extern void table_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks);
-extern TableScanDesc table_beginscan(Relation relation, Snapshot snapshot,
- int nkeys, ScanKey key);
-extern TableScanDesc table_beginscan_catalog(Relation relation, int nkeys, ScanKey key);
-extern TableScanDesc table_beginscan_strat(Relation relation, Snapshot snapshot,
- int nkeys, ScanKey key,
- bool allow_strat, bool allow_sync);
-extern TableScanDesc table_beginscan_bm(Relation relation, Snapshot snapshot,
- int nkeys, ScanKey key);
-extern TableScanDesc table_beginscan_sampling(Relation relation, Snapshot snapshot,
- int nkeys, ScanKey key,
- bool allow_strat, bool allow_sync, bool allow_pagemode);
-extern struct IndexFetchTableData* table_begin_index_fetch_table(Relation rel);
-extern void table_reset_index_fetch_table(struct IndexFetchTableData* scan);
-extern void table_end_index_fetch_table(struct IndexFetchTableData* scan);
+/*
+ * Storage routine function hooks
+ */
+typedef bool (*SnapshotSatisfies_function) (TupleTableSlot *slot, Snapshot snapshot);
+typedef HTSU_Result (*SnapshotSatisfiesUpdate_function) (TupleTableSlot *slot, CommandId curcid);
+typedef HTSV_Result (*SnapshotSatisfiesVacuum_function) (TupleTableSlot *slot, TransactionId OldestXmin);
-extern void table_endscan(TableScanDesc scan);
-extern void table_rescan(TableScanDesc scan, ScanKey key);
-extern void table_rescan_set_params(TableScanDesc scan, ScanKey key,
- bool allow_strat, bool allow_sync, bool allow_pagemode);
-extern void table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot);
+typedef Oid (*TupleInsert_function) (Relation rel, TupleTableSlot *slot, CommandId cid,
+ int options, BulkInsertState bistate);
-extern TupleTableSlot *table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot);
-extern bool table_tuple_fetch_from_offset(TableScanDesc sscan, BlockNumber blkno, OffsetNumber offset, TupleTableSlot *slot);
+typedef Oid (*TupleInsertSpeculative_function) (Relation rel,
+ TupleTableSlot *slot,
+ CommandId cid,
+ int options,
+ BulkInsertState bistate,
+ uint32 specToken);
-extern void storage_get_latest_tid(Relation relation,
- Snapshot snapshot,
- ItemPointer tid);
-
-extern bool table_fetch_row_version(Relation relation,
- ItemPointer tid,
- Snapshot snapshot,
- TupleTableSlot *slot,
- Relation stats_relation);
-
-extern bool table_fetch_follow(struct IndexFetchTableData *scan,
- ItemPointer tid,
- Snapshot snapshot,
- TupleTableSlot *slot,
- bool *call_again, bool *all_dead);
-
-extern bool table_fetch_follow_check(Relation rel,
- ItemPointer tid,
- Snapshot snapshot,
- bool *all_dead);
-
-extern HTSU_Result table_lock_tuple(Relation relation, ItemPointer tid, Snapshot snapshot,
- TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
- LockWaitPolicy wait_policy, uint8 flags,
- HeapUpdateFailureData *hufd);
-
-extern Oid table_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
- int options, BulkInsertState bistate);
-extern Oid table_insert_speculative(Relation relation, TupleTableSlot *slot, CommandId cid,
- int options, BulkInsertState bistate, uint32 specToken);
-extern void table_complete_speculative(Relation relation, TupleTableSlot *slot, uint32 specToken,
- bool succeeded);
-
-extern HTSU_Result table_delete(Relation relation, ItemPointer tid, CommandId cid,
- Snapshot crosscheck, bool wait, HeapUpdateFailureData *hufd,
- bool changingPart);
-
-extern HTSU_Result table_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
- CommandId cid, Snapshot crosscheck, bool wait,
- HeapUpdateFailureData *hufd, LockTupleMode *lockmode,
- bool *upddate_indexes);
-
-extern void table_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
- CommandId cid, int options, BulkInsertState bistate);
-
-extern tuple_data table_tuple_get_data(Relation relation, TupleTableSlot *slot, tuple_data_flags flags);
-
-extern void table_get_latest_tid(Relation relation,
- Snapshot snapshot,
- ItemPointer tid);
-extern void table_sync(Relation rel);
+typedef void (*TupleCompleteSpeculative_function) (Relation rel,
+ TupleTableSlot *slot,
+ uint32 specToken,
+ bool succeeded);
+
+typedef HTSU_Result (*TupleDelete_function) (Relation relation,
+ ItemPointer tid,
+ CommandId cid,
+ Snapshot crosscheck,
+ bool wait,
+ HeapUpdateFailureData *hufd,
+ bool changingPart);
+
+typedef HTSU_Result (*TupleUpdate_function) (Relation relation,
+ ItemPointer otid,
+ TupleTableSlot *slot,
+ CommandId cid,
+ Snapshot crosscheck,
+ bool wait,
+ HeapUpdateFailureData *hufd,
+ LockTupleMode *lockmode,
+ bool *update_indexes);
+
+typedef bool (*TupleFetchRowVersion_function) (Relation relation,
+ ItemPointer tid,
+ Snapshot snapshot,
+ TupleTableSlot *slot,
+ Relation stats_relation);
+
+typedef HTSU_Result (*TupleLock_function) (Relation relation,
+ ItemPointer tid,
+ Snapshot snapshot,
+ TupleTableSlot *slot,
+ CommandId cid,
+ LockTupleMode mode,
+ LockWaitPolicy wait_policy,
+ uint8 flags,
+ HeapUpdateFailureData *hufd);
+
+typedef void (*MultiInsert_function) (Relation relation, HeapTuple *tuples, int ntuples,
+ CommandId cid, int options, BulkInsertState bistate);
+
+typedef void (*TupleGetLatestTid_function) (Relation relation,
+ Snapshot snapshot,
+ ItemPointer tid);
+
+typedef tuple_data(*GetTupleData_function) (TupleTableSlot *slot, tuple_data_flags flags);
+
struct VacuumParams;
-extern void table_vacuum_rel(Relation onerel, int options,
+typedef void (*RelationVacuum_function)(Relation onerel, int options,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
-extern BulkInsertState table_getbulkinsertstate(Relation rel);
-extern void table_freebulkinsertstate(Relation rel, BulkInsertState bistate);
-extern void table_releasebulkinsertstate(Relation rel, BulkInsertState bistate);
+typedef void (*RelationSync_function) (Relation relation);
-extern RewriteState table_begin_rewrite(Relation OldHeap, Relation NewHeap,
+typedef BulkInsertState (*GetBulkInsertState_function) (void);
+typedef void (*FreeBulkInsertState_function) (BulkInsertState bistate);
+typedef void (*ReleaseBulkInsertState_function) (BulkInsertState bistate);
+
+typedef RewriteState (*BeginHeapRewrite_function) (Relation OldHeap, Relation NewHeap,
TransactionId OldestXmin, TransactionId FreezeXid,
MultiXactId MultiXactCutoff, bool use_wal);
-extern void table_end_rewrite(Relation rel, RewriteState state);
-extern void table_rewrite_tuple(Relation rel, RewriteState state, HeapTuple oldTuple,
+typedef void (*EndHeapRewrite_function) (RewriteState state);
+typedef void (*RewriteHeapTuple_function) (RewriteState state, HeapTuple oldTuple,
HeapTuple newTuple);
-extern bool table_rewrite_dead_tuple(Relation rel, RewriteState state, HeapTuple oldTuple);
+typedef bool (*RewriteHeapDeadTuple_function) (RewriteState state, HeapTuple oldTuple);
+
+typedef TupleTableSlot* (*Slot_function) (Relation relation);
+
+typedef TableScanDesc (*ScanBegin_function) (Relation relation,
+ Snapshot snapshot,
+ int nkeys, ScanKey key,
+ ParallelHeapScanDesc parallel_scan,
+ bool allow_strat,
+ bool allow_sync,
+ bool allow_pagemode,
+ bool is_bitmapscan,
+ bool is_samplescan,
+ bool temp_snap);
+
+typedef struct IndexFetchTableData* (*BeginIndexFetchTable_function) (Relation relation);
+typedef void (*ResetIndexFetchTable_function) (struct IndexFetchTableData* data);
+typedef void (*EndIndexFetchTable_function) (struct IndexFetchTableData* data);
+
+typedef ParallelHeapScanDesc (*ScanGetParallelheapscandesc_function) (TableScanDesc scan);
+typedef HeapPageScanDesc(*ScanGetHeappagescandesc_function) (TableScanDesc scan);
+typedef void (*SyncScanReportLocation_function) (Relation rel, BlockNumber location);
+typedef void (*ScanSetlimits_function) (TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks);
+
+typedef TupleTableSlot *(*ScanGetnextSlot_function) (TableScanDesc scan,
+ ScanDirection direction, TupleTableSlot *slot);
+
+typedef bool (*ScanFetchTupleFromOffset_function) (TableScanDesc scan,
+ BlockNumber blkno, OffsetNumber offset, TupleTableSlot *slot);
+
+typedef void (*ScanEnd_function) (TableScanDesc scan);
+
+
+typedef void (*ScanGetpage_function) (TableScanDesc scan, BlockNumber page);
+typedef void (*ScanRescan_function) (TableScanDesc scan, ScanKey key, bool set_params,
+ bool allow_strat, bool allow_sync, bool allow_pagemode);
+typedef void (*ScanUpdateSnapshot_function) (TableScanDesc scan, Snapshot snapshot);
+
+typedef bool (*TupleFetchFollow_function)(struct IndexFetchTableData *scan,
+ ItemPointer tid,
+ Snapshot snapshot,
+ TupleTableSlot *slot,
+ bool *call_again, bool *all_dead);
+
+/*
+ * API struct for a table AM. Note this must be stored in a single palloc'd
+ * chunk of memory.
+ */
+typedef struct TableAmRoutine
+{
+ NodeTag type;
+
+ Slot_function gimmegimmeslot;
+
+ SnapshotSatisfies_function snapshot_satisfies;
+ SnapshotSatisfiesUpdate_function snapshot_satisfiesUpdate;
+ SnapshotSatisfiesVacuum_function snapshot_satisfiesVacuum;
+
+ /* Operations on physical tuples */
+ TupleInsert_function tuple_insert;
+ TupleInsertSpeculative_function tuple_insert_speculative;
+ TupleCompleteSpeculative_function tuple_complete_speculative;
+ TupleUpdate_function tuple_update;
+ TupleDelete_function tuple_delete;
+ TupleFetchRowVersion_function tuple_fetch_row_version;
+ TupleLock_function tuple_lock;
+ MultiInsert_function multi_insert;
+ TupleGetLatestTid_function tuple_get_latest_tid;
+ TupleFetchFollow_function tuple_fetch_follow;
+
+ GetTupleData_function get_tuple_data;
+
+ RelationVacuum_function relation_vacuum;
+ RelationSync_function relation_sync;
+
+ GetBulkInsertState_function getbulkinsertstate;
+ FreeBulkInsertState_function freebulkinsertstate;
+ ReleaseBulkInsertState_function releasebulkinsertstate;
+
+ BeginHeapRewrite_function begin_heap_rewrite;
+ EndHeapRewrite_function end_heap_rewrite;
+ RewriteHeapTuple_function rewrite_heap_tuple;
+ RewriteHeapDeadTuple_function rewrite_heap_dead_tuple;
+
+ /* Operations on relation scans */
+ ScanBegin_function scan_begin;
+ ScanGetParallelheapscandesc_function scan_get_parallelheapscandesc;
+ ScanGetHeappagescandesc_function scan_get_heappagescandesc;
+ SyncScanReportLocation_function sync_scan_report_location;
+ ScanSetlimits_function scansetlimits;
+ ScanGetnextSlot_function scan_getnextslot;
+ ScanFetchTupleFromOffset_function scan_fetch_tuple_from_offset;
+ ScanEnd_function scan_end;
+ ScanGetpage_function scan_getpage;
+ ScanRescan_function scan_rescan;
+ ScanUpdateSnapshot_function scan_update_snapshot;
+
+ BeginIndexFetchTable_function begin_index_fetch;
+ EndIndexFetchTable_function reset_index_fetch;
+ EndIndexFetchTable_function end_index_fetch;
+
+} TableAmRoutine;
+
+/*
+ * INLINE functions
+ */
+static inline TupleTableSlot*
+table_gimmegimmeslot(Relation relation, List **reglist)
+{
+ TupleTableSlot *slot;
+
+ slot = relation->rd_tableamroutine->gimmegimmeslot(relation);
+
+ if (reglist)
+ *reglist = lappend(*reglist, slot);
+
+ return slot;
+}
+
+/*
+ * table_fetch_row_version - retrieve tuple with given tid
+ *
+ * XXX: This shouldn't just take a tid, but tid + additional information
+ */
+static inline bool
+table_fetch_row_version(Relation r,
+ ItemPointer tid,
+ Snapshot snapshot,
+ TupleTableSlot *slot,
+ Relation stats_relation)
+{
+ return r->rd_tableamroutine->tuple_fetch_row_version(r, tid,
+ snapshot, slot,
+ stats_relation);
+}
+
+
+/*
+ * table_lock_tuple - lock a tuple in shared or exclusive mode
+ *
+ * XXX: This shouldn't just take a tid, but tid + additional information
+ */
+static inline HTSU_Result
+table_lock_tuple(Relation relation, ItemPointer tid, Snapshot snapshot,
+ TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
+ LockWaitPolicy wait_policy, uint8 flags,
+ HeapUpdateFailureData *hufd)
+{
+ return relation->rd_tableamroutine->tuple_lock(relation, tid, snapshot, slot,
+ cid, mode, wait_policy,
+ flags, hufd);
+}
+
+/* ----------------
+ * table_beginscan_parallel - join a parallel scan
+ *
+ * Caller must hold a suitable lock on the correct relation.
+ * ----------------
+ */
+static inline TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelHeapScanDesc parallel_scan)
+{
+ Snapshot snapshot;
+
+ Assert(RelationGetRelid(relation) == parallel_scan->phs_relid);
+
+ if (!parallel_scan->phs_snapshot_any)
+ {
+ /* Snapshot was serialized -- restore it */
+ snapshot = RestoreSnapshot(parallel_scan->phs_snapshot_data);
+ RegisterSnapshot(snapshot);
+ }
+ else
+ {
+ /* SnapshotAny passed by caller (not serialized) */
+ snapshot = SnapshotAny;
+ }
+
+ return relation->rd_tableamroutine->scan_begin(relation, snapshot, 0, NULL, parallel_scan,
+ true, true, true, false, false, !parallel_scan->phs_snapshot_any);
+}
+
+static inline ParallelHeapScanDesc
+tableam_get_parallelheapscandesc(TableScanDesc sscan)
+{
+ return sscan->rs_rd->rd_tableamroutine->scan_get_parallelheapscandesc(sscan);
+}
+
+static inline HeapPageScanDesc
+tableam_get_heappagescandesc(TableScanDesc sscan)
+{
+ /*
+	 * The planner should already have validated that the current storage
+	 * supports page scans; this function is only called from bitmap heap
+	 * scans and sample scans.
+ */
+ Assert(sscan->rs_rd->rd_tableamroutine->scan_get_heappagescandesc != NULL);
+
+ return sscan->rs_rd->rd_tableamroutine->scan_get_heappagescandesc(sscan);
+}
+
+static inline void
+table_syncscan_report_location(Relation rel, BlockNumber location)
+{
+	rel->rd_tableamroutine->sync_scan_report_location(rel, location);
+}
+
+/*
+ * table_setscanlimits - restrict range of a table scan
+ *
+ * startBlk is the page to start at
+ * numBlks is number of pages to scan (InvalidBlockNumber means "all")
+ */
+static inline void
+table_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks)
+{
+ sscan->rs_rd->rd_tableamroutine->scansetlimits(sscan, startBlk, numBlks);
+}
+
+
+/* ----------------
+ * table_beginscan - begin relation scan
+ *
+ * table_beginscan is the "standard" case.
+ *
+ * table_beginscan_catalog differs in setting up its own temporary snapshot.
+ *
+ * table_beginscan_strat offers an extended API that lets the caller control
+ * whether a nondefault buffer access strategy can be used, and whether
+ * syncscan can be chosen (possibly resulting in the scan not starting from
+ * block zero).  Both of these default to true with plain table_beginscan.
+ *
+ * table_beginscan_bm is an alternative entry point for setting up a
+ * TableScanDesc for a bitmap heap scan.  Although that scan technology is
+ * really quite unlike a standard seqscan, there is just enough commonality
+ * to make it worth using the same data structure.
+ *
+ * table_beginscan_sampling is an alternative entry point for setting up a
+ * TableScanDesc for a TABLESAMPLE scan.  As with bitmap scans, it's worth
+ * using the same data structure although the behavior is rather different.
+ * In addition to the options offered by table_beginscan_strat, this call
+ * also allows control of whether page-mode visibility checking is used.
+ * ----------------
+ */
+static inline TableScanDesc
+table_beginscan(Relation relation, Snapshot snapshot,
+ int nkeys, ScanKey key)
+{
+ return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
+ true, true, true, false, false, false);
+}
+
+static inline TableScanDesc
+table_beginscan_catalog(Relation relation, int nkeys, ScanKey key)
+{
+ Oid relid = RelationGetRelid(relation);
+ Snapshot snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
+
+ return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
+ true, true, true, false, false, true);
+}
+
+static inline TableScanDesc
+table_beginscan_strat(Relation relation, Snapshot snapshot,
+ int nkeys, ScanKey key,
+ bool allow_strat, bool allow_sync)
+{
+ return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
+ allow_strat, allow_sync, true,
+ false, false, false);
+}
+
+static inline TableScanDesc
+table_beginscan_bm(Relation relation, Snapshot snapshot,
+ int nkeys, ScanKey key)
+{
+ return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
+ false, false, true, true, false, false);
+}
+
+static inline TableScanDesc
+table_beginscan_sampling(Relation relation, Snapshot snapshot,
+ int nkeys, ScanKey key,
+ bool allow_strat, bool allow_sync, bool allow_pagemode)
+{
+ return relation->rd_tableamroutine->scan_begin(relation, snapshot, nkeys, key, NULL,
+ allow_strat, allow_sync, allow_pagemode,
+ false, true, false);
+}
+
+/* ----------------
+ * table_rescan - restart a relation scan
+ * ----------------
+ */
+static inline void
+table_rescan(TableScanDesc scan,
+ ScanKey key)
+{
+ scan->rs_rd->rd_tableamroutine->scan_rescan(scan, key, false, false, false, false);
+}
+
+/* ----------------
+ * table_rescan_set_params - restart a relation scan after changing params
+ *
+ * This call allows changing the buffer strategy, syncscan, and pagemode
+ * options before starting a fresh scan. Note that although the actual use
+ * of syncscan might change (effectively, enabling or disabling reporting),
+ * the previously selected startblock will be kept.
+ * ----------------
+ */
+static inline void
+table_rescan_set_params(TableScanDesc scan, ScanKey key,
+ bool allow_strat, bool allow_sync, bool allow_pagemode)
+{
+ scan->rs_rd->rd_tableamroutine->scan_rescan(scan, key, true,
+ allow_strat, allow_sync, (allow_pagemode && IsMVCCSnapshot(scan->rs_snapshot)));
+}
+
+/* ----------------
+ * table_endscan - end relation scan
+ *
+ * XXX: Still to do: see how to integrate with index scans, and check the
+ * handling of reldesc caching.
+ * ----------------
+ */
+static inline void
+table_endscan(TableScanDesc scan)
+{
+ scan->rs_rd->rd_tableamroutine->scan_end(scan);
+}
+
+
+/* ----------------
+ * table_scan_update_snapshot
+ *
+ * Update snapshot info in a table scan descriptor.
+ * ----------------
+ */
+static inline void
+table_scan_update_snapshot(TableScanDesc scan, Snapshot snapshot)
+{
+ scan->rs_rd->rd_tableamroutine->scan_update_snapshot(scan, snapshot);
+}
+
+static inline TupleTableSlot *
+table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *slot)
+{
+ return sscan->rs_rd->rd_tableamroutine->scan_getnextslot(sscan, direction, slot);
+}
+
+static inline bool
+table_tuple_fetch_from_offset(TableScanDesc sscan, BlockNumber blkno, OffsetNumber offset, TupleTableSlot *slot)
+{
+ return sscan->rs_rd->rd_tableamroutine->scan_fetch_tuple_from_offset(sscan, blkno, offset, slot);
+}
+
+
+static inline IndexFetchTableData*
+table_begin_index_fetch_table(Relation rel)
+{
+ return rel->rd_tableamroutine->begin_index_fetch(rel);
+}
+
+static inline void
+table_reset_index_fetch_table(struct IndexFetchTableData* scan)
+{
+ scan->rel->rd_tableamroutine->reset_index_fetch(scan);
+}
+
+static inline void
+table_end_index_fetch_table(struct IndexFetchTableData* scan)
+{
+ scan->rel->rd_tableamroutine->end_index_fetch(scan);
+}
+
+/*
+ * Insert a tuple from a slot via the table AM routine
+ */
+static inline Oid
+table_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
+ int options, BulkInsertState bistate)
+{
+ return relation->rd_tableamroutine->tuple_insert(relation, slot, cid, options,
+ bistate);
+}
+
+static inline Oid
+table_insert_speculative(Relation relation, TupleTableSlot *slot, CommandId cid,
+ int options, BulkInsertState bistate, uint32 specToken)
+{
+ return relation->rd_tableamroutine->tuple_insert_speculative(relation, slot, cid, options,
+ bistate, specToken);
+}
+
+static inline void
+table_complete_speculative(Relation relation, TupleTableSlot *slot, uint32 specToken,
+ bool succeeded)
+{
+ return relation->rd_tableamroutine->tuple_complete_speculative(relation, slot, specToken, succeeded);
+}
+
+/*
+ * Delete the tuple identified by tid via the table AM routine
+ */
+static inline HTSU_Result
+table_delete(Relation relation, ItemPointer tid, CommandId cid,
+ Snapshot crosscheck, bool wait,
+ HeapUpdateFailureData *hufd, bool changingPart)
+{
+ return relation->rd_tableamroutine->tuple_delete(relation, tid, cid,
+ crosscheck, wait, hufd, changingPart);
+}
+
+/*
+ * Update the tuple identified by otid via the table AM routine
+ */
+static inline HTSU_Result
+table_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
+ CommandId cid, Snapshot crosscheck, bool wait,
+ HeapUpdateFailureData *hufd, LockTupleMode *lockmode,
+ bool *update_indexes)
+{
+ return relation->rd_tableamroutine->tuple_update(relation, otid, slot,
+ cid, crosscheck, wait, hufd,
+ lockmode, update_indexes);
+}
+
+static inline bool
+table_fetch_follow(struct IndexFetchTableData *scan,
+ ItemPointer tid,
+ Snapshot snapshot,
+ TupleTableSlot *slot,
+ bool *call_again, bool *all_dead)
+{
+ return scan->rel->rd_tableamroutine->tuple_fetch_follow(scan, tid, snapshot,
+ slot, call_again,
+ all_dead);
+}
+
+static inline bool
+table_fetch_follow_check(Relation rel,
+ ItemPointer tid,
+ Snapshot snapshot,
+ bool *all_dead)
+{
+ IndexFetchTableData *scan = table_begin_index_fetch_table(rel);
+ TupleTableSlot *slot = table_gimmegimmeslot(rel, NULL);
+ bool call_again = false;
+ bool found;
+
+ found = table_fetch_follow(scan, tid, snapshot, slot, &call_again, all_dead);
+
+ table_end_index_fetch_table(scan);
+ ExecDropSingleTupleTableSlot(slot);
+
+ return found;
+}
+
+/*
+ * table_multi_insert - insert multiple tuples into a table
+ */
+static inline void
+table_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
+ CommandId cid, int options, BulkInsertState bistate)
+{
+ relation->rd_tableamroutine->multi_insert(relation, tuples, ntuples,
+ cid, options, bistate);
+}
+
+static inline tuple_data
+table_tuple_get_data(Relation relation, TupleTableSlot *slot, tuple_data_flags flags)
+{
+ return relation->rd_tableamroutine->get_tuple_data(slot, flags);
+}
+
+static inline void
+table_get_latest_tid(Relation relation,
+ Snapshot snapshot,
+ ItemPointer tid)
+{
+ relation->rd_tableamroutine->tuple_get_latest_tid(relation, snapshot, tid);
+}
+
+
+static inline void
+table_vacuum_rel(Relation rel, int options,
+ struct VacuumParams *params, BufferAccessStrategy bstrategy)
+{
+ rel->rd_tableamroutine->relation_vacuum(rel, options, params, bstrategy);
+}
+
+/*
+ * table_sync - sync a table's data, for use when no WAL has been written
+ */
+static inline void
+table_sync(Relation rel)
+{
+ rel->rd_tableamroutine->relation_sync(rel);
+}
+
+/*
+ * -------------------
+ * storage Bulk Insert functions
+ * -------------------
+ */
+static inline BulkInsertState
+table_getbulkinsertstate(Relation rel)
+{
+ return rel->rd_tableamroutine->getbulkinsertstate();
+}
+
+static inline void
+table_freebulkinsertstate(Relation rel, BulkInsertState bistate)
+{
+ rel->rd_tableamroutine->freebulkinsertstate(bistate);
+}
+
+static inline void
+table_releasebulkinsertstate(Relation rel, BulkInsertState bistate)
+{
+ rel->rd_tableamroutine->releasebulkinsertstate(bistate);
+}
+
+/*
+ * -------------------
+ * storage tuple rewrite functions
+ * -------------------
+ */
+static inline RewriteState
+table_begin_rewrite(Relation OldHeap, Relation NewHeap,
+ TransactionId OldestXmin, TransactionId FreezeXid,
+ MultiXactId MultiXactCutoff, bool use_wal)
+{
+ return NewHeap->rd_tableamroutine->begin_heap_rewrite(OldHeap, NewHeap,
+ OldestXmin, FreezeXid, MultiXactCutoff, use_wal);
+}
+
+static inline void
+table_end_rewrite(Relation rel, RewriteState state)
+{
+ rel->rd_tableamroutine->end_heap_rewrite(state);
+}
+
+static inline void
+table_rewrite_tuple(Relation rel, RewriteState state, HeapTuple oldTuple,
+ HeapTuple newTuple)
+{
+ rel->rd_tableamroutine->rewrite_heap_tuple(state, oldTuple, newTuple);
+}
+
+static inline bool
+table_rewrite_dead_tuple(Relation rel, RewriteState state, HeapTuple oldTuple)
+{
+ return rel->rd_tableamroutine->rewrite_heap_dead_tuple(state, oldTuple);
+}
+
+/*
+ * HeapTupleSatisfiesVisibility
+ * True iff heap tuple satisfies a time qual.
+ *
+ * Notes:
+ * Assumes heap tuple is valid.
+ * Beware of multiple evaluations of snapshot argument.
+ * Hint bits in the tuple's t_infomask may be updated as a side effect;
+ * if so, the buffer containing the tuple is marked dirty.
+ */
+#define HeapTupleSatisfiesVisibility(method, slot, snapshot) \
+ (((method)->snapshot_satisfies) (slot, snapshot))
+
+extern TableAmRoutine * GetTableAmRoutine(Oid amhandler);
+extern TableAmRoutine * GetTableAmRoutineByAmId(Oid amoid);
+extern TableAmRoutine * GetHeapamTableAmRoutine(void);
#endif /* TABLEAM_H */
diff --git a/src/include/access/tableamapi.h b/src/include/access/tableamapi.h
deleted file mode 100644
index a4a6e7fd23..0000000000
--- a/src/include/access/tableamapi.h
+++ /dev/null
@@ -1,212 +0,0 @@
-/*---------------------------------------------------------------------
- *
- * tableamapi.h
- * API for Postgres table access methods
- *
- * Copyright (c) 2017, PostgreSQL Global Development Group
- *
- * src/include/access/tableamapi.h
- *---------------------------------------------------------------------
- */
-#ifndef TABLEEAMAPI_H
-#define TABLEEAMAPI_H
-
-#include "access/heapam.h"
-#include "access/tableam.h"
-#include "nodes/execnodes.h"
-#include "nodes/nodes.h"
-#include "fmgr.h"
-#include "utils/snapshot.h"
-
-struct IndexFetchTableData;
-
-/*
- * Storage routine function hooks
- */
-typedef bool (*SnapshotSatisfies_function) (TupleTableSlot *slot, Snapshot snapshot);
-typedef HTSU_Result (*SnapshotSatisfiesUpdate_function) (TupleTableSlot *slot, CommandId curcid);
-typedef HTSV_Result (*SnapshotSatisfiesVacuum_function) (TupleTableSlot *slot, TransactionId OldestXmin);
-
-typedef Oid (*TupleInsert_function) (Relation rel, TupleTableSlot *slot, CommandId cid,
- int options, BulkInsertState bistate);
-
-typedef Oid (*TupleInsertSpeculative_function) (Relation rel,
- TupleTableSlot *slot,
- CommandId cid,
- int options,
- BulkInsertState bistate,
- uint32 specToken);
-
-
-typedef void (*TupleCompleteSpeculative_function) (Relation rel,
- TupleTableSlot *slot,
- uint32 specToken,
- bool succeeded);
-
-typedef HTSU_Result (*TupleDelete_function) (Relation relation,
- ItemPointer tid,
- CommandId cid,
- Snapshot crosscheck,
- bool wait,
- HeapUpdateFailureData *hufd,
- bool changingPart);
-
-typedef HTSU_Result (*TupleUpdate_function) (Relation relation,
- ItemPointer otid,
- TupleTableSlot *slot,
- CommandId cid,
- Snapshot crosscheck,
- bool wait,
- HeapUpdateFailureData *hufd,
- LockTupleMode *lockmode,
- bool *update_indexes);
-
-typedef bool (*TupleFetchRowVersion_function) (Relation relation,
- ItemPointer tid,
- Snapshot snapshot,
- TupleTableSlot *slot,
- Relation stats_relation);
-
-typedef HTSU_Result (*TupleLock_function) (Relation relation,
- ItemPointer tid,
- Snapshot snapshot,
- TupleTableSlot *slot,
- CommandId cid,
- LockTupleMode mode,
- LockWaitPolicy wait_policy,
- uint8 flags,
- HeapUpdateFailureData *hufd);
-
-typedef void (*MultiInsert_function) (Relation relation, HeapTuple *tuples, int ntuples,
- CommandId cid, int options, BulkInsertState bistate);
-
-typedef void (*TupleGetLatestTid_function) (Relation relation,
- Snapshot snapshot,
- ItemPointer tid);
-
-typedef tuple_data(*GetTupleData_function) (TupleTableSlot *slot, tuple_data_flags flags);
-
-struct VacuumParams;
-typedef void (*RelationVacuum_function)(Relation onerel, int options,
- struct VacuumParams *params, BufferAccessStrategy bstrategy);
-
-typedef void (*RelationSync_function) (Relation relation);
-
-typedef BulkInsertState (*GetBulkInsertState_function) (void);
-typedef void (*FreeBulkInsertState_function) (BulkInsertState bistate);
-typedef void (*ReleaseBulkInsertState_function) (BulkInsertState bistate);
-
-typedef RewriteState (*BeginHeapRewrite_function) (Relation OldHeap, Relation NewHeap,
- TransactionId OldestXmin, TransactionId FreezeXid,
- MultiXactId MultiXactCutoff, bool use_wal);
-typedef void (*EndHeapRewrite_function) (RewriteState state);
-typedef void (*RewriteHeapTuple_function) (RewriteState state, HeapTuple oldTuple,
- HeapTuple newTuple);
-typedef bool (*RewriteHeapDeadTuple_function) (RewriteState state, HeapTuple oldTuple);
-
-typedef TupleTableSlot* (*Slot_function) (Relation relation);
-
-typedef TableScanDesc (*ScanBegin_function) (Relation relation,
- Snapshot snapshot,
- int nkeys, ScanKey key,
- ParallelHeapScanDesc parallel_scan,
- bool allow_strat,
- bool allow_sync,
- bool allow_pagemode,
- bool is_bitmapscan,
- bool is_samplescan,
- bool temp_snap);
-
-typedef struct IndexFetchTableData* (*BeginIndexFetchTable_function) (Relation relation);
-typedef void (*ResetIndexFetchTable_function) (struct IndexFetchTableData* data);
-typedef void (*EndIndexFetchTable_function) (struct IndexFetchTableData* data);
-
-typedef ParallelHeapScanDesc (*ScanGetParallelheapscandesc_function) (TableScanDesc scan);
-typedef HeapPageScanDesc(*ScanGetHeappagescandesc_function) (TableScanDesc scan);
-typedef void (*SyncScanReportLocation_function) (Relation rel, BlockNumber location);
-typedef void (*ScanSetlimits_function) (TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks);
-
-typedef TupleTableSlot *(*ScanGetnextSlot_function) (TableScanDesc scan,
- ScanDirection direction, TupleTableSlot *slot);
-
-typedef bool (*ScanFetchTupleFromOffset_function) (TableScanDesc scan,
- BlockNumber blkno, OffsetNumber offset, TupleTableSlot *slot);
-
-typedef void (*ScanEnd_function) (TableScanDesc scan);
-
-
-typedef void (*ScanGetpage_function) (TableScanDesc scan, BlockNumber page);
-typedef void (*ScanRescan_function) (TableScanDesc scan, ScanKey key, bool set_params,
- bool allow_strat, bool allow_sync, bool allow_pagemode);
-typedef void (*ScanUpdateSnapshot_function) (TableScanDesc scan, Snapshot snapshot);
-
-typedef bool (*TupleFetchFollow_function)(struct IndexFetchTableData *scan,
- ItemPointer tid,
- Snapshot snapshot,
- TupleTableSlot *slot,
- bool *call_again, bool *all_dead);
-
-/*
- * API struct for a table AM. Note this must be stored in a single palloc'd
- * chunk of memory.
- */
-typedef struct TableAmRoutine
-{
- NodeTag type;
-
- Slot_function gimmegimmeslot;
-
- SnapshotSatisfies_function snapshot_satisfies;
- SnapshotSatisfiesUpdate_function snapshot_satisfiesUpdate;
- SnapshotSatisfiesVacuum_function snapshot_satisfiesVacuum;
-
- /* Operations on physical tuples */
- TupleInsert_function tuple_insert;
- TupleInsertSpeculative_function tuple_insert_speculative;
- TupleCompleteSpeculative_function tuple_complete_speculative;
- TupleUpdate_function tuple_update;
- TupleDelete_function tuple_delete;
- TupleFetchRowVersion_function tuple_fetch_row_version;
- TupleLock_function tuple_lock;
- MultiInsert_function multi_insert;
- TupleGetLatestTid_function tuple_get_latest_tid;
- TupleFetchFollow_function tuple_fetch_follow;
-
- GetTupleData_function get_tuple_data;
-
- RelationVacuum_function relation_vacuum;
- RelationSync_function relation_sync;
-
- GetBulkInsertState_function getbulkinsertstate;
- FreeBulkInsertState_function freebulkinsertstate;
- ReleaseBulkInsertState_function releasebulkinsertstate;
-
- BeginHeapRewrite_function begin_heap_rewrite;
- EndHeapRewrite_function end_heap_rewrite;
- RewriteHeapTuple_function rewrite_heap_tuple;
- RewriteHeapDeadTuple_function rewrite_heap_dead_tuple;
-
- /* Operations on relation scans */
- ScanBegin_function scan_begin;
- ScanGetParallelheapscandesc_function scan_get_parallelheapscandesc;
- ScanGetHeappagescandesc_function scan_get_heappagescandesc;
- SyncScanReportLocation_function sync_scan_report_location;
- ScanSetlimits_function scansetlimits;
- ScanGetnextSlot_function scan_getnextslot;
- ScanFetchTupleFromOffset_function scan_fetch_tuple_from_offset;
- ScanEnd_function scan_end;
- ScanGetpage_function scan_getpage;
- ScanRescan_function scan_rescan;
- ScanUpdateSnapshot_function scan_update_snapshot;
-
- BeginIndexFetchTable_function begin_index_fetch;
- EndIndexFetchTable_function reset_index_fetch;
- EndIndexFetchTable_function end_index_fetch;
-
-} TableAmRoutine;
-
-extern TableAmRoutine * GetTableAmRoutine(Oid amhandler);
-extern TableAmRoutine * GetTableAmRoutineByAmId(Oid amoid);
-extern TableAmRoutine * GetHeapamTableAmRoutine(void);
-
-#endif /* TABLEEAMAPI_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 945bbc3ddf..c69ca99435 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -502,7 +502,7 @@ typedef enum NodeTag
T_InlineCodeBlock, /* in nodes/parsenodes.h */
T_FdwRoutine, /* in foreign/fdwapi.h */
T_IndexAmRoutine, /* in access/amapi.h */
- T_TableAmRoutine, /* in access/tableamapi.h */
+ T_TableAmRoutine, /* in access/tableam.h */
T_TsmRoutine, /* in access/tsmapi.h */
T_ForeignKeyCacheInfo, /* in utils/rel.h */
T_CallContext /* in nodes/parsenodes.h */
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 9739bed9e0..1fe9cc6402 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -16,10 +16,8 @@
#define TQUAL_H
#include "utils/snapshot.h"
-#include "access/tableamapi.h"
#include "access/xlogdefs.h"
-
/* Static variables representing various special snapshot semantics */
extern PGDLLIMPORT SnapshotData SnapshotSelfData;
extern PGDLLIMPORT SnapshotData SnapshotAnyData;
@@ -33,19 +31,6 @@ extern PGDLLIMPORT SnapshotData CatalogSnapshotData;
((snapshot)->visibility_type == MVCC_VISIBILITY || \
(snapshot)->visibility_type == HISTORIC_MVCC_VISIBILITY)
-/*
- * HeapTupleSatisfiesVisibility
- * True iff heap tuple satisfies a time qual.
- *
- * Notes:
- * Assumes heap tuple is valid.
- * Beware of multiple evaluations of snapshot argument.
- * Hint bits in the HeapTuple's t_infomask may be updated as a side effect;
- * if so, the indicated buffer is marked dirty.
- */
-#define HeapTupleSatisfiesVisibility(method, slot, snapshot) \
- (((method)->snapshot_satisfies) (slot, snapshot))
-
/*
* To avoid leaking too much knowledge about reorderbuffer implementation
* details this is implemented in reorderbuffer.c not tqual.c.
--
2.18.0.windows.1
On Tue, Jul 24, 2018 at 11:31 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

> On Tue, Jul 17, 2018 at 11:01 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
> > I added new API in the tableam.h to get all the page visible tuples to
> > abstract the bitgetpage() function.

> - Merge tableam.h and tableamapi.h and make most tableam.c functions
>   small inline functions. Having one-line tableam.c wrappers makes this
>   more expensive than necessary. We'll have a big enough trouble not
>   regressing performancewise.

I merged tableam.h and tableamapi.h into tableam.h and changed all the
functions to be inline. This change may have added some additional headers;
I will check whether their need can be removed.

Attached are the updated patches on top of your github tree.
Currently I am working on the following.
- I observed a crash when running the isolation tests.
While investigating the crash, I found that it is due to the many FIXMEs in
the code. So for now I made just the minimal changes, and am looking into
correcting the FIXMEs first.
One thing I observed is that a missing relation pointer leads to a crash in
the flow of the EvalPlan* functions, because not all ROW_MARK types
contain a relation pointer.
I will continue to work through all the FIXME fixes.
> - COPY's multi_insert path should probably deal with a bunch of slots,
>   rather than forming HeapTuples

Implemented support for slots in the COPY multi-insert path.
Regards,
Haribabu Kommi
Fujitsu Australia
Attachments:
0002-Isolation-test-fixes-1.patch (application/octet-stream)
From 3e4d02fd2ade9b9b116b013899b4b81b435379a8 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Fri, 3 Aug 2018 11:29:05 +1000
Subject: [PATCH 2/2] Isolation test fixes -1
---
src/backend/access/heap/heapam_handler.c | 9 ++++++---
src/backend/executor/execMain.c | 5 +++--
src/backend/executor/nodeModifyTable.c | 6 +++++-
3 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index a3fe110efe..cce2123416 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -317,7 +317,8 @@ retry:
if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple->t_data),
priorXmax))
{
- ReleaseBuffer(buffer);
+ if (BufferIsValid(buffer))
+ ReleaseBuffer(buffer);
return HeapTupleDeleted;
}
@@ -336,7 +337,8 @@ retry:
if (ItemPointerEquals(&tuple->t_self, &tuple->t_data->t_ctid))
{
/* deleted, so forget about it */
- ReleaseBuffer(buffer);
+ if (BufferIsValid(buffer))
+ ReleaseBuffer(buffer);
return HeapTupleDeleted;
}
@@ -344,7 +346,8 @@ retry:
*tid = tuple->t_data->t_ctid;
/* updated row should have xmin matching this xmax */
priorXmax = HeapTupleHeaderGetUpdateXid(tuple->t_data);
- ReleaseBuffer(buffer);
+ if (BufferIsValid(buffer))
+ ReleaseBuffer(buffer);
/* loop back to fetch next in chain */
}
}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ea60e588a5..fd3e53d1ee 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2552,8 +2552,9 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
* datums that may be present in copyTuple). As with the next step, this
* is to guard against early re-use of the EPQ query.
*/
- if (!TupIsNull(slot))
- ExecMaterializeSlot(slot);
+ /*if (!TupIsNull(slot))
+ * ExecMaterializeSlot(slot);
+ */
#if FIXME
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 5ae0bab9f5..71150ad32e 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1177,7 +1177,11 @@ lreplace:;
if (result == HeapTupleUpdated && !IsolationUsesXactSnapshot())
{
- TupleTableSlot *inputslot = EvalPlanQualSlot(epqstate, resultRelationDesc, resultRelInfo->ri_RangeTableIndex);
+ TupleTableSlot *inputslot;
+
+ EvalPlanQualBegin(epqstate, estate);
+
+ inputslot = EvalPlanQualSlot(epqstate, resultRelationDesc, resultRelInfo->ri_RangeTableIndex);
result = table_lock_tuple(resultRelationDesc, tupleid,
estate->es_snapshot,
--
2.18.0.windows.1
0001-COPY-s-multi_insert-path-deal-with-bunch-of-slots.patch (application/octet-stream)
From 2bcbe0ffc5df81b38e6c6f0093eb486669c7a3b2 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Fri, 3 Aug 2018 11:07:52 +1000
Subject: [PATCH 1/2] COPY's multi_insert path deal with bunch of slots
Support of passing slots instead of tuples when doing
multi insert.
---
src/backend/access/heap/heapam.c | 31 ++++++----
src/backend/commands/copy.c | 100 +++++++++++++++----------------
src/include/access/heapam.h | 3 +-
src/include/access/tableam.h | 6 +-
4 files changed, 74 insertions(+), 66 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 40c1a5432d..7d0d1dc234 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2648,7 +2648,7 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
* temporary context before calling this, if that's a problem.
*/
void
-heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
+heap_multi_insert(Relation relation, TupleTableSlot **slots, int nslots,
CommandId cid, int options, BulkInsertState bistate)
{
TransactionId xid = GetCurrentTransactionId();
@@ -2666,11 +2666,18 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
saveFreeSpace = RelationGetTargetPageFreeSpace(relation,
HEAP_DEFAULT_FILLFACTOR);
- /* Toast and set header data in all the tuples */
- heaptuples = palloc(ntuples * sizeof(HeapTuple));
- for (i = 0; i < ntuples; i++)
- heaptuples[i] = heap_prepare_insert(relation, tuples[i],
+ /* Toast and set header data in all the slots */
+ heaptuples = palloc(nslots * sizeof(HeapTuple));
+ for (i = 0; i < nslots; i++)
+ {
+ heaptuples[i] = heap_prepare_insert(relation, ExecGetHeapTupleFromSlot(slots[i]),
xid, cid, options);
+ if (slots[i]->tts_tupleOid != InvalidOid)
+ HeapTupleSetOid(heaptuples[i], slots[i]->tts_tupleOid);
+
+ if (slots[i]->tts_tableOid != InvalidOid)
+ heaptuples[i]->t_tableOid = slots[i]->tts_tableOid;
+ }
/*
* Allocate some memory to use for constructing the WAL record. Using
@@ -2706,7 +2713,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
ndone = 0;
- while (ndone < ntuples)
+ while (ndone < nslots)
{
Buffer buffer;
Buffer vmbuffer = InvalidBuffer;
@@ -2732,7 +2739,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
* Put that on the page, and then as many other tuples as fit.
*/
RelationPutHeapTuple(relation, buffer, heaptuples[ndone], false);
- for (nthispage = 1; ndone + nthispage < ntuples; nthispage++)
+ for (nthispage = 1; ndone + nthispage < nslots; nthispage++)
{
HeapTuple heaptup = heaptuples[ndone + nthispage];
@@ -2841,7 +2848,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
* emitted by this call to heap_multi_insert(). Needed for logical
* decoding so it knows when to cleanup temporary data.
*/
- if (ndone + nthispage == ntuples)
+ if (ndone + nthispage == nslots)
xlrec->flags |= XLH_INSERT_LAST_IN_MULTI;
if (init)
@@ -2904,7 +2911,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
*/
if (IsCatalogRelation(relation))
{
- for (i = 0; i < ntuples; i++)
+ for (i = 0; i < nslots; i++)
CacheInvalidateHeapTuple(relation, heaptuples[i], NULL);
}
@@ -2913,10 +2920,10 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
* nothing for untoasted tuples (tuples[i] == heaptuples[i)], but it's
* probably faster to always copy than check.
*/
- for (i = 0; i < ntuples; i++)
- tuples[i]->t_self = heaptuples[i]->t_self;
+ for (i = 0; i < nslots; i++)
+ slots[i]->tts_tid = heaptuples[i]->t_self;
- pgstat_count_heap_insert(relation, ntuples);
+ pgstat_count_heap_insert(relation, nslots);
}
/*
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index d49734ddab..62ee8cfea7 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -306,9 +306,9 @@ static void CopyOneRowTo(CopyState cstate, Oid tupleOid,
Datum *values, bool *nulls);
static void CopyFromInsertBatch(CopyState cstate, EState *estate,
CommandId mycid, int hi_options,
- ResultRelInfo *resultRelInfo, TupleTableSlot *myslot,
+ ResultRelInfo *resultRelInfo,
BulkInsertState bistate,
- int nBufferedTuples, HeapTuple *bufferedTuples,
+ int nslots, TupleTableSlot **slots,
uint64 firstBufferedLineNo);
static bool CopyReadLine(CopyState cstate);
static bool CopyReadLineText(CopyState cstate);
@@ -2310,7 +2310,6 @@ CopyFrom(CopyState cstate)
EState *estate = CreateExecutorState(); /* for ExecConstraints() */
ModifyTableState *mtstate;
ExprContext *econtext;
- TupleTableSlot *myslot;
MemoryContext oldcontext = CurrentMemoryContext;
ErrorContextCallback errcallback;
@@ -2319,12 +2318,11 @@ CopyFrom(CopyState cstate)
void *bistate;
uint64 processed = 0;
bool useHeapMultiInsert;
- int nBufferedTuples = 0;
+ int nslots = 0;
int prev_leaf_part_index = -1;
-#define MAX_BUFFERED_TUPLES 1000
- HeapTuple *bufferedTuples = NULL; /* initialize to silence warning */
- Size bufferedTuplesSize = 0;
+#define MAX_BUFFERED_SLOTS 1000
+ TupleTableSlot **slots = NULL; /* initialize to silence warning */
uint64 firstBufferedLineNo = 0;
Assert(cstate->rel);
@@ -2467,10 +2465,6 @@ CopyFrom(CopyState cstate)
estate->es_result_relation_info = resultRelInfo;
estate->es_range_table = cstate->range_table;
- /* Set up a tuple slot too */
- myslot = ExecInitExtraTupleSlot(estate, tupDesc,
- TTS_TYPE_HEAPTUPLE);
-
/*
* Set up a ModifyTableState so we can let FDW(s) init themselves for
* foreign-table result relation(s).
@@ -2541,7 +2535,7 @@ CopyFrom(CopyState cstate)
else
{
useHeapMultiInsert = true;
- bufferedTuples = palloc(MAX_BUFFERED_TUPLES * sizeof(HeapTuple));
+ slots = palloc(MAX_BUFFERED_SLOTS * sizeof(TupleTableSlot *));
}
/*
@@ -2569,11 +2563,17 @@ CopyFrom(CopyState cstate)
TupleTableSlot *slot;
bool skip_tuple;
Oid loaded_oid = InvalidOid;
+ int natts = resultRelInfo->ri_RelationDesc->rd_att->natts;
+ int cnt;
CHECK_FOR_INTERRUPTS();
- if (nBufferedTuples == 0)
+ if (nslots == 0)
{
+ /* Reset Tupletable slots if any */
+ ExecResetTupleTable(estate->es_tupleTable, false);
+ estate->es_tupleTable = NIL;
+
/*
* Reset the per-tuple exprcontext. We can only do this if the
* tuple buffer is empty. (Calling the context the per-tuple
@@ -2588,25 +2588,32 @@ CopyFrom(CopyState cstate)
if (!NextCopyFrom(cstate, econtext, values, nulls, &loaded_oid))
break;
- /* And now we can form the input tuple. */
- tuple = heap_form_tuple(tupDesc, values, nulls);
+ slot = ExecInitExtraTupleSlot(estate,
+ RelationGetDescr(resultRelInfo->ri_RelationDesc),
+ useHeapMultiInsert ? TTS_TYPE_VIRTUAL : TTS_TYPE_HEAPTUPLE);
+
+ /* Directly store the values/nulls array in the slot */
+ memcpy(slot->tts_isnull, nulls, sizeof(bool) * natts);
+ for (cnt = 0; cnt < natts; cnt++)
+ {
+ if (!slot->tts_isnull[cnt])
+ slot->tts_values[cnt] = values[cnt];
+ }
+
+ ExecStoreVirtualTuple(slot);
if (loaded_oid != InvalidOid)
- HeapTupleSetOid(tuple, loaded_oid);
+ slot->tts_tupleOid = loaded_oid;
/*
* Constraints might reference the tableoid column, so initialize
* t_tableOid before evaluating them.
*/
- tuple->t_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+ slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
/* Triggers and stuff need to be invoked in query context. */
MemoryContextSwitchTo(oldcontext);
- /* Place tuple in tuple slot --- but slot shouldn't free it */
- slot = myslot;
- ExecStoreTuple(tuple, slot, InvalidBuffer, false);
-
/* Determine the partition to heap_insert the tuple into */
if (cstate->partition_tuple_routing)
{
@@ -2659,6 +2666,9 @@ CopyFrom(CopyState cstate)
*/
estate->es_result_relation_info = resultRelInfo;
+ /* FIXME: Get the HeapTuple from slot */
+ tuple = ExecGetHeapTupleFromSlot(slot);
+
/*
* If we're capturing transition tuples, we might need to convert
* from the partition rowtype to parent rowtype.
@@ -2699,6 +2709,7 @@ CopyFrom(CopyState cstate)
&slot);
tuple->t_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+ slot->tts_tableOid = tuple->t_tableOid;
}
skip_tuple = false;
@@ -2709,8 +2720,6 @@ CopyFrom(CopyState cstate)
{
if (!ExecBRInsertTriggers(estate, resultRelInfo, slot))
skip_tuple = true; /* "do nothing" */
- else /* trigger might have changed tuple */
- tuple = ExecGetHeapTupleFromSlot(slot);
}
if (!skip_tuple)
@@ -2746,10 +2755,9 @@ CopyFrom(CopyState cstate)
if (useHeapMultiInsert)
{
/* Add this tuple to the tuple buffer */
- if (nBufferedTuples == 0)
+ if (nslots == 0)
firstBufferedLineNo = cstate->cur_lineno;
- bufferedTuples[nBufferedTuples++] = tuple;
- bufferedTuplesSize += tuple->t_len;
+ slots[nslots++] = slot;
/*
* If the buffer filled up, flush it. Also flush if the
@@ -2757,15 +2765,13 @@ CopyFrom(CopyState cstate)
* large, to avoid using large amounts of memory for the
* buffer when the tuples are exceptionally wide.
*/
- if (nBufferedTuples == MAX_BUFFERED_TUPLES ||
- bufferedTuplesSize > 65535)
+ if (nslots == MAX_BUFFERED_SLOTS)
{
CopyFromInsertBatch(cstate, estate, mycid, hi_options,
- resultRelInfo, myslot, bistate,
- nBufferedTuples, bufferedTuples,
+ resultRelInfo, bistate,
+ nslots, slots,
firstBufferedLineNo);
- nBufferedTuples = 0;
- bufferedTuplesSize = 0;
+ nslots = 0;
}
}
else
@@ -2783,15 +2789,12 @@ CopyFrom(CopyState cstate)
if (slot == NULL) /* "do nothing" */
goto next_tuple;
- /* FDW might have changed tuple */
- tuple = ExecGetHeapTupleFromSlot(slot);
-
/*
* AFTER ROW Triggers might reference the tableoid
* column, so initialize t_tableOid before evaluating
* them.
*/
- tuple->t_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+ slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
}
else
{
@@ -2834,10 +2837,10 @@ next_tuple:
}
/* Flush any remaining buffered tuples */
- if (nBufferedTuples > 0)
+ if (nslots > 0)
CopyFromInsertBatch(cstate, estate, mycid, hi_options,
- resultRelInfo, myslot, bistate,
- nBufferedTuples, bufferedTuples,
+ resultRelInfo, bistate,
+ nslots, slots,
firstBufferedLineNo);
/* Done, clean up */
@@ -2900,8 +2903,7 @@ next_tuple:
static void
CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
int hi_options, ResultRelInfo *resultRelInfo,
- TupleTableSlot *myslot, BulkInsertState bistate,
- int nBufferedTuples, HeapTuple *bufferedTuples,
+ BulkInsertState bistate, int nslots, TupleTableSlot **slots,
uint64 firstBufferedLineNo)
{
MemoryContext oldcontext;
@@ -2921,8 +2923,8 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
*/
oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
table_multi_insert(cstate->rel,
- bufferedTuples,
- nBufferedTuples,
+ slots,
+ nslots,
mycid,
hi_options,
bistate);
@@ -2934,16 +2936,15 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
*/
if (resultRelInfo->ri_NumIndices > 0)
{
- for (i = 0; i < nBufferedTuples; i++)
+ for (i = 0; i < nslots; i++)
{
List *recheckIndexes;
cstate->cur_lineno = firstBufferedLineNo + i;
- ExecStoreTuple(bufferedTuples[i], myslot, InvalidBuffer, false);
recheckIndexes =
- ExecInsertIndexTuples(myslot, estate, false, NULL, NIL);
+ ExecInsertIndexTuples(slots[i], estate, false, NULL, NIL);
ExecARInsertTriggers(estate, resultRelInfo,
- myslot,
+ slots[i],
recheckIndexes, cstate->transition_capture);
list_free(recheckIndexes);
}
@@ -2957,12 +2958,11 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
(resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
resultRelInfo->ri_TrigDesc->trig_insert_new_table))
{
- for (i = 0; i < nBufferedTuples; i++)
+ for (i = 0; i < nslots; i++)
{
cstate->cur_lineno = firstBufferedLineNo + i;
- ExecStoreTuple(bufferedTuples[i], myslot, InvalidBuffer, false);
ExecARInsertTriggers(estate, resultRelInfo,
- myslot,
+ slots[i],
NIL, cstate->transition_capture);
}
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 5f89b5b174..a36a0c49e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -16,6 +16,7 @@
#include "access/sdir.h"
#include "access/skey.h"
+#include "executor/tuptable.h"
#include "nodes/lockoptions.h"
#include "nodes/primnodes.h"
#include "storage/bufpage.h"
@@ -168,7 +169,7 @@ extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
extern Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
int options, BulkInsertState bistate);
-extern void heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
+extern void heap_multi_insert(Relation relation, TupleTableSlot **slots, int nslots,
CommandId cid, int options, BulkInsertState bistate);
extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
CommandId cid, Snapshot crosscheck, bool wait,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index df60ba3316..9912a171fb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -103,7 +103,7 @@ typedef HTSU_Result (*TupleLock_function) (Relation relation,
uint8 flags,
HeapUpdateFailureData *hufd);
-typedef void (*MultiInsert_function) (Relation relation, HeapTuple *tuples, int ntuples,
+typedef void (*MultiInsert_function) (Relation relation, TupleTableSlot **slots, int nslots,
CommandId cid, int options, BulkInsertState bistate);
typedef void (*TupleGetLatestTid_function) (Relation relation,
@@ -611,10 +611,10 @@ table_fetch_follow_check(Relation rel,
* table_multi_insert - insert multiple tuple into a table
*/
static inline void
-table_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
+table_multi_insert(Relation relation, TupleTableSlot **slots, int nslots,
CommandId cid, int options, BulkInsertState bistate)
{
- relation->rd_tableamroutine->multi_insert(relation, tuples, ntuples,
+ relation->rd_tableamroutine->multi_insert(relation, slots, nslots,
cid, options, bistate);
}
--
2.18.0.windows.1
Hi,
I'm currently in the process of rebasing zheap onto the pluggable
storage work. The goal, which seems to work surprisingly well, is to
find issues that the current pluggable storage patch doesn't yet deal
with. I plan to push a tree including a lot of fixes and improvements
soon.
On 2018-08-03 12:35:50 +1000, Haribabu Kommi wrote:
> While investigating the crash, I observed that it is due to the many
> FIXMEs in the code. So for now I made just the minimal changes, and am
> looking into correcting the FIXMEs first. One thing I observed is that a
> missing relation pointer leads to a crash in the flow of the EvalPlan*
> functions, because not all ROW_MARK types contain a relation pointer. I
> will continue to work through all the FIXME fixes.

Thanks.

> > - COPY's multi_insert path should probably deal with a bunch of slots,
> >   rather than forming HeapTuples
>
> Implemented support for slots in the COPY multi-insert path.

Cool. I've not yet looked at it, but I plan to do so soon. Will have to
rebase over the other copy changes first :(
- Andres
On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
I'm currently in the process of rebasing zheap onto the pluggable
storage work. The goal, which seems to work surprisingly well, is to
find issues that the current pluggable storage patch doesn't yet deal
with. I plan to push a tree including a lot of fixes and improvements
soon.
Sorry for coming late to this thread.
That's good. Did you find any problems porting zheap onto pluggable
storage? Does it need any API changes or new API requirements?
On 2018-08-03 12:35:50 +1000, Haribabu Kommi wrote:
> While investigating the crash, I observed that it is due to the many
> FIXMEs in the code. So for now I made just the minimal changes, and am
> looking into correcting the FIXMEs first. One thing I observed is that a
> missing relation pointer leads to a crash in the flow of the EvalPlan*
> functions, because not all ROW_MARK types contain a relation pointer. I
> will continue to work through all the FIXME fixes.
Thanks.
I fixed some of the Isolation test problems. All the issues are related to
EPQ slot handling. Still more needs to be fixed.
Have the new TupleTableSlot abstraction patches in the recent thread [1]
fixed any of these issues? If so, I can look into changing the FDW API
to return a slot instead of a tuple.
> - COPY's multi_insert path should probably deal with a bunch of slots,
>   rather than forming HeapTuples
>
> Implemented support for slots in the COPY multi-insert path.
>
> Cool. I've not yet looked at it, but I plan to do so soon. Will have to
> rebase over the other copy changes first :(

OK, understood. Many of the changes in the COPY flow conflict with my
changes. Please let me know once you have done the rebase, and I can
fix those conflicts and regenerate the patch.
Attached is the patch with further fixes.
[1]: /messages/by-id/CAFjFpRcNPQ1oOL41-HQYaEF=Nq6Vbg0eHeFgopJhHw_X2usA5w@mail.gmail.com
Regards,
Haribabu Kommi
Fujitsu Australia
Attachments:
0001-isolation-test-fixes-2.patch (application/octet-stream)
From a064078f3cc917cd548f20cc7327516c3905b35b Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Tue, 21 Aug 2018 16:28:46 +1000
Subject: [PATCH] isolation test fixes -2
---
src/backend/commands/trigger.c | 6 +++++-
src/backend/executor/execMain.c | 9 +++++++--
src/backend/executor/nodeModifyTable.c | 4 ++++
3 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index b2951a237e..801a3fee25 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3326,7 +3326,11 @@ GetTupleForTrigger(EState *estate,
if (TupIsNull(epqslot))
return false;
- ExecCopySlot(newslot, epqslot);
+ if (newslot)
+ ExecCopySlot(newslot, epqslot);
+ else
+ ExecCopySlot(oldslot, epqslot);
+
*is_epqtuple = true;
}
break;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index fd3e53d1ee..dbbebca045 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2477,8 +2477,13 @@ EvalPlanQualSlot(EPQState *epqstate,
MemoryContext oldcontext;
oldcontext = MemoryContextSwitchTo(epqstate->estate->es_query_cxt);
- *slot = table_gimmegimmeslot(relation,
- &epqstate->estate->es_tupleTable);
+
+ if (relation)
+ *slot = table_gimmegimmeslot(relation, &epqstate->estate->es_tupleTable);
+ else
+ *slot = MakeTupleTableSlot(epqstate->origslot->tts_tupleDescriptor, TTS_TYPE_BUFFER);
+
+ epqstate->estate->es_epqTupleSet[rti - 1] = true;
MemoryContextSwitchTo(oldcontext);
}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 71150ad32e..14ca3b976e 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -702,6 +702,9 @@ ldelete:;
if (result == HeapTupleUpdated && !IsolationUsesXactSnapshot())
{
+ EvalPlanQualBegin(epqstate, estate);
+ slot = EvalPlanQualSlot(epqstate, resultRelationDesc, resultRelInfo->ri_RangeTableIndex);
+
result = table_lock_tuple(resultRelationDesc, tupleid,
estate->es_snapshot,
slot, estate->es_output_cid,
@@ -1182,6 +1185,7 @@ lreplace:;
EvalPlanQualBegin(epqstate, estate);
inputslot = EvalPlanQualSlot(epqstate, resultRelationDesc, resultRelInfo->ri_RangeTableIndex);
+ ExecCopySlot(inputslot, slot);
result = table_lock_tuple(resultRelationDesc, tupleid,
estate->es_snapshot,
--
2.18.0.windows.1
Hi,
On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:
On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <andres@anarazel.de> wrote:
> > I'm currently in the process of rebasing zheap onto the pluggable
> > storage work. The goal, which seems to work surprisingly well, is to
> > find issues that the current pluggable storage patch doesn't yet deal
> > with. I plan to push a tree including a lot of fixes and improvements
> > soon.
>
> Sorry for coming late to this thread.

No worries.

> That's good. Did you find any problems porting zheap onto pluggable
> storage? Does it need any API changes or new API requirements?
A lot, yes. The big changes are:
- removal of HeapPageScanDesc
- introduction of explicit support functions for tablesample & bitmap scans
- introduction of callbacks for vacuum_rel, cluster
And quite a bit more along those lines.
> Have the new TupleTableSlot abstraction patches in the recent thread [1]
> fixed any of these issues? If so, I can look into changing the FDW API
> to return a slot instead of a tuple.
Yea, that'd be a good thing to start with.
Greetings,
Andres Freund
On Tue, Aug 21, 2018 at 6:59 PM Andres Freund <andres@anarazel.de> wrote:
On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:
On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <andres@anarazel.de> wrote:
I'm currently in the process of rebasing zheap onto the pluggable
storage work. The goal, which seems to work surprisingly well, is to
find issues that the current pluggable storage patch doesn't yet deal
with. I plan to push a tree including a lot of fixes and improvements
soon.
That's good. Did you find any problems in porting zheap into pluggable
storage? Does it needs any API changes or new API requirement?
A lot, yes. The big changes are:
- removal of HeapPageScanDesc
- introduction of explicit support functions for tablesample & bitmap scans
- introduction of callbacks for vacuum_rel, cluster
And quite a bit more along those lines.
OK. Those are quite a bit of changes.
Does the new TupleTableSlot abstraction patches has fixed any of these
issues in the recent thread [1]? so that I can look into the change of
FDW API to return slot instead of tuple.
Yea, that'd be a good thing to start with.
I found out only the RefetchForeignRow API needs the change and done the
same.
Along with that, I fixed all the issues of running make check-world.
Attached patches
for the same.
Now I will look into the remaining FIXME's that don't conflict with your
further changes.
Regards,
Haribabu Kommi
Fujitsu Australia
Attachments:
0002-check-world-fixes.patch
From 58b1bcd28991a1be04d13783e1c46a1ebf5de51c Mon Sep 17 00:00:00 2001
From: Kommi <haribabuk@fast.au.fujitsu.com>
Date: Fri, 24 Aug 2018 11:21:34 +1000
Subject: [PATCH 2/2] check-world fixes
---
contrib/amcheck/verify_nbtree.c | 1 +
contrib/pg_visibility/pg_visibility.c | 5 +++--
contrib/pgstattuple/pgstatapprox.c | 10 +++++++++-
contrib/pgstattuple/pgstattuple.c | 9 ++++++++-
src/backend/executor/execMain.c | 11 +----------
src/backend/executor/execTuples.c | 22 ++++++++++++++++++++++
src/backend/executor/nodeModifyTable.c | 12 ++++++++++--
src/include/executor/tuptable.h | 1 +
8 files changed, 55 insertions(+), 16 deletions(-)
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 04102dd3df..cb4294af7d 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -25,6 +25,7 @@
#include "access/htup_details.h"
#include "access/nbtree.h"
+#include "access/tableam.h"
#include "access/transam.h"
#include "access/xact.h"
#include "catalog/index.h"
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index dce5262e34..88ca4fd2af 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -563,12 +563,13 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
rel = relation_open(relid, AccessShareLock);
+ /* Only some relkinds have a visibility map */
+ check_relation_relkind(rel);
+
if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("only heap AM is supported")));
- /* Only some relkinds have a visibility map */
- check_relation_relkind(rel);
nblocks = RelationGetNumberOfBlocks(rel);
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index e805981bb9..6aee2ce8ac 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -12,6 +12,7 @@
*/
#include "postgres.h"
+#include "access/tableam.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
#include "access/xact.h"
@@ -69,6 +70,7 @@ statapprox_heap(Relation rel, output_type *stat)
Buffer vmbuffer = InvalidBuffer;
BufferAccessStrategy bstrategy;
TransactionId OldestXmin;
+ TupleTableSlot *slot;
OldestXmin = GetOldestXmin(rel, PROCARRAY_FLAGS_VACUUM);
bstrategy = GetAccessStrategy(BAS_BULKREAD);
@@ -76,6 +78,8 @@ statapprox_heap(Relation rel, output_type *stat)
nblocks = RelationGetNumberOfBlocks(rel);
scanned = 0;
+ slot = MakeSingleTupleTableSlot(RelationGetDescr(rel), TTS_TYPE_BUFFER);
+
for (blkno = 0; blkno < nblocks; blkno++)
{
Buffer buf;
@@ -153,13 +157,15 @@ statapprox_heap(Relation rel, output_type *stat)
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
+ ExecStoreTuple(&tuple, slot, buf, false);
+
/*
* We follow VACUUM's lead in counting INSERT_IN_PROGRESS tuples
* as "dead" while DELETE_IN_PROGRESS tuples are "live". We don't
* bother distinguishing tuples inserted/deleted by our own
* transaction.
*/
- switch (rel->rd_tableamroutine->snapshot_satisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (rel->rd_tableamroutine->snapshot_satisfiesVacuum(slot, OldestXmin))
{
case HEAPTUPLE_LIVE:
case HEAPTUPLE_DELETE_IN_PROGRESS:
@@ -210,6 +216,8 @@ statapprox_heap(Relation rel, output_type *stat)
ReleaseBuffer(vmbuffer);
vmbuffer = InvalidBuffer;
}
+
+ ExecDropSingleTupleTableSlot(slot);
}
/*
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 53fdff6c2c..5965ecbcf8 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -28,6 +28,7 @@
#include "access/hash.h"
#include "access/nbtree.h"
#include "access/relscan.h"
+#include "access/tableam.h"
#include "catalog/namespace.h"
#include "catalog/pg_am.h"
#include "funcapi.h"
@@ -326,12 +327,14 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
Buffer buffer;
pgstattuple_type stat = {0};
SnapshotData SnapshotDirty;
+ TupleTableSlot *slot;
TableAmRoutine *method = rel->rd_tableamroutine;
/* Disable syncscan because we assume we scan from block zero upwards */
scan = table_beginscan_strat(rel, SnapshotAny, 0, NULL, true, false);
InitDirtySnapshot(SnapshotDirty);
+ slot = MakeSingleTupleTableSlot(RelationGetDescr(rel), TTS_TYPE_BUFFER);
pagescan = tableam_get_heappagescandesc(scan);
nblocks = pagescan->rs_nblocks; /* # blocks to be scanned */
@@ -343,7 +346,10 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
- if (HeapTupleSatisfiesVisibility(method, tuple, &SnapshotDirty, scan->rs_cbuf))
+ /* FIXME: change to get the slot directly, instead of tuple */
+ ExecStoreTuple(tuple, slot, scan->rs_cbuf, false);
+
+ if (HeapTupleSatisfiesVisibility(method, slot, &SnapshotDirty))
{
stat.tuple_len += tuple->t_len;
stat.tuple_count++;
@@ -393,6 +399,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
relation_close(rel, AccessShareLock);
stat.table_len = (uint64) nblocks * BLCKSZ;
+ ExecDropSingleTupleTableSlot(slot);
return build_pgstattuple_type(&stat, fcinfo);
}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 1237e809f3..e349a22e94 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2765,16 +2765,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
if (isNull)
continue;
- elog(ERROR, "frak, need to implement ROW_MARK_COPY");
-#ifdef FIXME
- // FIXME: this should just deform the tuple and store it as a
- // virtual one.
- tuple = table_tuple_by_datum(erm->relation, datum, erm->relid);
-
- /* store tuple */
- EvalPlanQualSetTuple(epqstate, erm->rti, tuple);
-#endif
-
+ ExecForceStoreHeapTupleDatum(datum, slot);
}
}
}
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 59ccc1a626..046f7b23dc 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -1385,6 +1385,28 @@ ExecForceStoreHeapTuple(HeapTuple tuple,
}
+void
+ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot)
+{
+ HeapTuple tuple;
+ HeapTupleHeader td;
+ MemoryContext oldContext;
+
+ td = DatumGetHeapTupleHeader(data);
+
+ tuple = (HeapTuple) palloc(HEAPTUPLESIZE + HeapTupleHeaderGetDatumLength(td));
+ tuple->t_len = HeapTupleHeaderGetDatumLength(td);
+ tuple->t_self = td->t_ctid;
+ tuple->t_data = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE);
+ memcpy((char *) tuple->t_data, (char *) td, tuple->t_len);
+
+ ExecClearTuple(slot);
+
+ heap_deform_tuple(tuple, slot->tts_tupleDescriptor,
+ slot->tts_values, slot->tts_isnull);
+ ExecStoreVirtualTuple(slot);
+}
+
/* --------------------------------
* ExecStoreMinimalTuple
*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 14ca3b976e..d7d79f845c 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -609,7 +609,7 @@ ExecDelete(ModifyTableState *mtstate,
bool canSetTag,
bool changingPart,
bool *tupleDeleted,
- TupleTableSlot **epqslot)
+ TupleTableSlot **epqreturnslot)
{
ResultRelInfo *resultRelInfo;
Relation resultRelationDesc;
@@ -634,7 +634,7 @@ ExecDelete(ModifyTableState *mtstate,
bool dodelete;
dodelete = ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
- tupleid, oldtuple, epqslot);
+ tupleid, oldtuple, epqreturnslot);
if (!dodelete) /* "do nothing" */
return NULL;
@@ -727,6 +727,14 @@ ldelete:;
/* Tuple no more passing quals, exiting... */
return NULL;
}
+
+ /* pass the locked tuple slot back to the caller, if requested */
+ if (epqreturnslot)
+ {
+ *epqreturnslot = epqslot;
+ return NULL;
+ }
+
goto ldelete;
}
}
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index dec3e87a1e..4e49304ba7 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -313,6 +313,7 @@ extern TupleTableSlot *ExecStoreMinimalTuple(MinimalTuple mtup,
extern void ExecForceStoreHeapTuple(HeapTuple tuple,
TupleTableSlot *slot);
+extern void ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot);
extern TupleTableSlot *ExecStoreVirtualTuple(TupleTableSlot *slot);
extern TupleTableSlot *ExecStoreAllNullTuple(TupleTableSlot *slot);
--
2.18.0.windows.1
0001-FDW-RefetchForeignRow-API-prototype-change.patch
From 71a9811c742dfe7ac8cfac053d02869540452f92 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 22 Aug 2018 16:45:02 +1000
Subject: [PATCH 1/2] FDW RefetchForeignRow API prototype change
With pluggable storage, direct tuple usage is minimized
and all the external APIs must deal with TupleTableSlot.
---
doc/src/sgml/fdwhandler.sgml | 10 ++++++----
src/backend/executor/execMain.c | 16 ++++++++--------
src/backend/executor/nodeLockRows.c | 20 +++++++-------------
src/include/foreign/fdwapi.h | 9 +++++----
4 files changed, 26 insertions(+), 29 deletions(-)
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 4ce88dd77c..12769f3288 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -988,23 +988,25 @@ GetForeignRowMarkType(RangeTblEntry *rte,
<para>
<programlisting>
-HeapTuple
+TupleTableSlot *
RefetchForeignRow(EState *estate,
ExecRowMark *erm,
Datum rowid,
+ TupleTableSlot *slot,
bool *updated);
</programlisting>
- Re-fetch one tuple from the foreign table, after locking it if required.
+ Re-fetch one tuple slot from the foreign table, after locking it if required.
<literal>estate</literal> is global execution state for the query.
<literal>erm</literal> is the <structname>ExecRowMark</structname> struct describing
the target foreign table and the row lock type (if any) to acquire.
<literal>rowid</literal> identifies the tuple to be fetched.
- <literal>updated</literal> is an output parameter.
+ <literal>slot</literal> contains nothing useful upon call, but can be used to
+ hold the returned tuple. <literal>updated</literal> is an output parameter.
</para>
<para>
- This function should return a palloc'ed copy of the fetched tuple,
+ This function should return a slot containing the fetched tuple
or <literal>NULL</literal> if the row lock couldn't be obtained. The row lock
type to acquire is defined by <literal>erm->markType</literal>, which is the
value previously returned by <function>GetForeignRowMarkType</function>.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index dbbebca045..1237e809f3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2704,23 +2704,24 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
/* fetch requests on foreign tables must be passed to their FDW */
if (erm->relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
- elog(ERROR, "frak, need to change fdw API");
-#ifdef FIXME
FdwRoutine *fdwroutine;
bool updated = false;
fdwroutine = GetFdwRoutineForRelation(erm->relation, false);
+
/* this should have been checked already, but let's be safe */
if (fdwroutine->RefetchForeignRow == NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot lock rows in foreign table \"%s\"",
RelationGetRelationName(erm->relation))));
- tuple = fdwroutine->RefetchForeignRow(epqstate->estate,
- erm,
- datum,
- &updated);
- if (tuple == NULL)
+
+ slot = fdwroutine->RefetchForeignRow(epqstate->estate,
+ erm,
+ datum,
+ slot,
+ &updated);
+ if (slot == NULL)
elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck");
/*
@@ -2728,7 +2729,6 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
* assumes that FDWs can track that exactly, which they might
* not be able to. So just ignore the flag.
*/
-#endif
}
else
{
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index e58e0919d8..3a4071f2e3 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -128,33 +128,27 @@ lnext:
{
FdwRoutine *fdwroutine;
bool updated = false;
- HeapTuple copyTuple;
-
- elog(ERROR, "frak, tuple based API needs to be rewritten");
fdwroutine = GetFdwRoutineForRelation(erm->relation, false);
+
/* this should have been checked already, but let's be safe */
if (fdwroutine->RefetchForeignRow == NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot lock rows in foreign table \"%s\"",
RelationGetRelationName(erm->relation))));
- copyTuple = fdwroutine->RefetchForeignRow(estate,
- erm,
- datum,
- &updated);
- if (copyTuple == NULL)
+ markSlot = fdwroutine->RefetchForeignRow(estate,
+ erm,
+ datum,
+ markSlot,
+ &updated);
+ if (markSlot == NULL)
{
/* couldn't get the lock, so skip this row */
goto lnext;
}
- elog(ERROR, "frak: slotify");
-
- /* save locked tuple for possible EvalPlanQual testing below */
- //*testTuple = copyTuple;
-
/*
* if FDW says tuple was updated before getting locked, we need to
* perform EPQ testing to see if quals are still satisfied
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index c14eb546c6..508b0eece8 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -121,10 +121,11 @@ typedef void (*EndDirectModify_function) (ForeignScanState *node);
typedef RowMarkType (*GetForeignRowMarkType_function) (RangeTblEntry *rte,
LockClauseStrength strength);
-typedef HeapTuple (*RefetchForeignRow_function) (EState *estate,
- ExecRowMark *erm,
- Datum rowid,
- bool *updated);
+typedef TupleTableSlot *(*RefetchForeignRow_function) (EState *estate,
+ ExecRowMark *erm,
+ Datum rowid,
+ TupleTableSlot *slot,
+ bool *updated);
typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
struct ExplainState *es);
--
2.18.0.windows.1
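The contract change in the patch above can be illustrated with a tiny self-contained sketch; DemoSlot, demo_refetch_foreign_row, and the int rowid are invented stand-ins, not the real executor types. The executor now owns the slot and passes it in; the FDW fills it and returns it, or returns NULL when the row lock couldn't be obtained, instead of palloc'ing and returning a HeapTuple:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up stand-in for TupleTableSlot; just enough to show the contract. */
typedef struct DemoSlot
{
	int		value;
	bool	empty;
} DemoSlot;

/*
 * New-style contract: the caller supplies the slot, the callback fills it
 * and returns it, or returns NULL if the row lock could not be obtained.
 * No tuple memory is allocated and handed back to the caller anymore.
 */
static DemoSlot *
demo_refetch_foreign_row(int rowid, DemoSlot *slot, bool *updated)
{
	if (rowid < 0)			/* pretend the remote row lock failed */
		return NULL;

	slot->value = rowid;	/* pretend this is the re-fetched row */
	slot->empty = false;
	*updated = false;		/* conservative: report "not updated" */
	return slot;
}
```

This mirrors why the `#ifdef FIXME` blocks in execMain.c and nodeLockRows.c could be removed: the caller-provided slot replaces the palloc'ed copy entirely.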
Hi,
On 2018-08-24 11:55:41 +1000, Haribabu Kommi wrote:
On Tue, Aug 21, 2018 at 6:59 PM Andres Freund <andres@anarazel.de> wrote:
On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:
On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <andres@anarazel.de> wrote:
I'm currently in the process of rebasing zheap onto the pluggable
storage work. The goal, which seems to work surprisingly well, is to
find issues that the current pluggable storage patch doesn't yet deal
with. I plan to push a tree including a lot of fixes and improvements
soon.
That's good. Did you find any problems in porting zheap into pluggable
storage? Does it needs any API changes or new API requirement?
A lot, yes. The big changes are:
- removal of HeapPageScanDesc
- introduction of explicit support functions for tablesample & bitmap scans
- introduction of callbacks for vacuum_rel, cluster
And quite a bit more along those lines.
OK. Those are quite a bit of changes.
I've pushed a current version of that to my git tree to the
pluggable-storage branch. It's not really a version that I think makes
sense to review or such, but it's probably more useful if you work based
on that. There's also the pluggable-zheap branch, which I found
extremely useful to develop against.
There's a few further changes since last time:
- Pluggable handlers are now stored in static global variables, and thus do not need to be copied anymore
- VACUUM FULL / CLUSTER is moved into one callback that does the actual copying. The various previous rewrite callbacks imo integrated at the wrong level.
- there's a GUC that allows to change the default table AM
- moving COPY to use slots (roughly based on your / Haribabu's patch)
- removed the AM specific shmem initialization callbacks - various AMs are going to need the syncscan APIs, so moving that into AM callbacks doesn't make sense.
Missing:
- callback for the second scan of CREATE INDEX CONCURRENTLY
- commands/analyze.c integration (Working on it)
- fixing your (Haribabu's) slotification of copy patch to compute memory
usage somehow
- table creation callback, currently the pluggable-zheap patch has a few
conditionals outside of access/zheap for that purpose (see RelationTruncate)
- header structure cleanup
And then:
- lotsa cleanups
- rebasing onto a newer version of the abstract slot patchset
- splitting out smaller patches
You'd moved the bulk insert into tableam callbacks - I don't quite get
why? There's not really anything AM specific in that code?
Does the new TupleTableSlot abstraction patches has fixed any of these
issues in the recent thread [1]? so that I can look into the change of
FDW API to return slot instead of tuple.
Yea, that'd be a good thing to start with.
I found out only the RefetchForeignRow API needs the change and done the
same.
Along with that, I fixed all the issues of running make check-world.
Attached patches
for the same.
Thanks, that's really helpful! I'll try to merge these soon.
I'm starting to think that we're getting closer to something that
looks right from a high level, even though there's a lot of details to
clean up.
Greetings,
Andres Freund
On Fri, Aug 24, 2018 at 12:50 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-08-24 11:55:41 +1000, Haribabu Kommi wrote:
On Tue, Aug 21, 2018 at 6:59 PM Andres Freund <andres@anarazel.de>
wrote:
On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:
On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <andres@anarazel.de>
wrote:
I'm currently in the process of rebasing zheap onto the pluggable
storage work. The goal, which seems to work surprisingly well, is to
find issues that the current pluggable storage patch doesn't yet
deal
with. I plan to push a tree including a lot of fixes and
improvements
soon.
That's good. Did you find any problems in porting zheap into
pluggable
storage? Does it needs any API changes or new API requirement?
A lot, yes. The big changes are:
- removal of HeapPageScanDesc
- introduction of explicit support functions for tablesample & bitmap scans
- introduction of callbacks for vacuum_rel, cluster
And quite a bit more along those lines.
OK. Those are quite a bit of changes.
I've pushed a current version of that to my git tree to the
pluggable-storage branch. It's not really a version that I think makes
sense to review or such, but it's probably more useful if you work based
on that. There's also the pluggable-zheap branch, which I found
extremely useful to develop against.
OK. Thanks, will check that also.
There's a few further changes since last time:
- Pluggable handlers are now stored in static global variables, and thus do not need to be copied anymore
- VACUUM FULL / CLUSTER is moved into one callback that does the actual copying. The various previous rewrite callbacks imo integrated at the wrong level.
- there's a GUC that allows to change the default table AM
- moving COPY to use slots (roughly based on your / Haribabu's patch)
- removed the AM specific shmem initialization callbacks - various AMs are going to need the syncscan APIs, so moving that into AM callbacks doesn't make sense.
OK.
Missing:
- callback for the second scan of CREATE INDEX CONCURRENTLY
- commands/analyze.c integration (Working on it)
- fixing your (Haribabu's) slotification of copy patch to compute memory
usage somehow
I will check it.
- table creation callback, currently the pluggable-zheap patch has a few
conditionals outside of access/zheap for that purpose (see
RelationTruncate
I will check it.
And then:
- lotsa cleanups
- rebasing onto a newer version of the abstract slot patchset
- splitting out smaller patches
You'd moved the bulk insert into tableam callbacks - I don't quite get
why? There's not really anything AM specific in that code?
The main reason for adding them to the AM is to give the specific AM
control over deciding whether it can support bulk insert or not.
The current framework doesn't allow AM specific bulk insert state to be
passed from one function to another, and its structure is fixed. This needs
to be enhanced to allow AM specific private members as well.
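One way to do that (purely a sketch; DemoBulkInsertState and its fields are invented names, not the actual BulkInsertState layout) is to keep the executor-visible fields fixed and add an opaque pointer owned by the table AM:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Sketch of an AM-extensible bulk insert state: the generic part stays
 * fixed, while am_private carries whatever per-AM buffering state a given
 * table AM needs across the bulk-insert calls.
 */
typedef struct DemoBulkInsertState
{
	int		ring_size;		/* generic part, shared by all AMs */
	void   *am_private;		/* opaque per-AM state, owned by the table AM */
} DemoBulkInsertState;

/* Hypothetical per-AM extension, e.g. for an AM with its own write buffer. */
typedef struct DemoAmPrivate
{
	int		buffered_tuples;
} DemoAmPrivate;

static DemoBulkInsertState *
demo_get_bulk_insert_state(void)
{
	DemoBulkInsertState *bistate = malloc(sizeof(DemoBulkInsertState));

	bistate->ring_size = 16;
	bistate->am_private = NULL;	/* filled in lazily by the AM itself */
	return bistate;
}
```

With something like this, the generic code never needs to know which AM's state is being carried, which would avoid the AM-specific conditionals being discussed.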
Does the new TupleTableSlot abstraction patches has fixed any of
these
issues in the recent thread [1]? so that I can look into the change
of
FDW API to return slot instead of tuple.
Yea, that'd be a good thing to start with.
I found out only the RefetchForeignRow API needs the change and done the
same.
Along with that, I fixed all the issues of running make check-world.
Attached patches
for the same.
Thanks, that's really helpful! I'll try to merge these soon.
I can share the rebased patches for the fixes, so that it will be easy to
merge.
I'm starting to think that we're getting closer to something that
looks right from a high level, even though there's a lot of details to
clean up.
That's good.
Regards,
Haribabu Kommi
Fujitsu Australia
On Tue, Aug 28, 2018 at 1:48 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
On Fri, Aug 24, 2018 at 12:50 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-08-24 11:55:41 +1000, Haribabu Kommi wrote:
On Tue, Aug 21, 2018 at 6:59 PM Andres Freund <andres@anarazel.de>
wrote:
On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:
On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <andres@anarazel.de>
wrote:
I'm currently in the process of rebasing zheap onto the pluggable
storage work. The goal, which seems to work surprisingly well, is to
find issues that the current pluggable storage patch doesn't yet
deal
with. I plan to push a tree including a lot of fixes and
improvements
soon.
That's good. Did you find any problems in porting zheap into
pluggable
storage? Does it needs any API changes or new API requirement?
A lot, yes. The big changes are:
- removal of HeapPageScanDesc
- introduction of explicit support functions for tablesample & bitmap scans
- introduction of callbacks for vacuum_rel, cluster
And quite a bit more along those lines.
OK. Those are quite a bit of changes.
I've pushed a current version of that to my git tree to the
pluggable-storage branch. It's not really a version that I think makes
sense to review or such, but it's probably more useful if you work based
on that. There's also the pluggable-zheap branch, which I found
extremely useful to develop against.
OK. Thanks, will check that also.
- fixing your (Haribabu's) slotification of copy patch to compute memory
usage somehow
I will check it.
Attached is the copy patch that brings back the size validation.
It computes the tuple size from the first tuple in the batch and uses
that for the rest of the tuples in the batch. This way the calculation
overhead is also reduced. There is a chance that the first tuple is very
small while the rest are very large, but I feel such cases are rare.
- table creation callback, currently the pluggable-zheap patch has a few
conditionals outside of access/zheap for that purpose (see
RelationTruncate)
I will check it.
I found a couple of places where zheap uses some extra logic to verify
whether it is the zheap AM or not, and takes some extra decisions based
on that.
I am analyzing all that extra code to see whether any callbacks can
handle it, and how. I can come back with more details later.
And then:
- lotsa cleanups
- rebasing onto a newer version of the abstract slot patchset
- splitting out smaller patches
You'd moved the bulk insert into tableam callbacks - I don't quite get
why? There's not really anything AM specific in that code?
The main reason of adding them to AM is just to provide a control to
the specific AM to decide whether they can support the bulk insert or
not. Current framework doesn't support AM specific bulk insert state to be
passed from one function to another and its structure is fixed. This needs
to be enhanced to add AM specific private members also.
Do you want me to work on it to make it generic to AM methods to extend
the structure?
Does the new TupleTableSlot abstraction patches has fixed any of
these
issues in the recent thread [1]? so that I can look into the change
of
FDW API to return slot instead of tuple.
Yea, that'd be a good thing to start with.
I found out only the RefetchForeignRow API needs the change and done the
same.
Along with that, I fixed all the issues of running make check-world.
Attached patches
for the same.
Thanks, that's really helpful! I'll try to merge these soon.
I can share the rebased patches for the fixes, so that it will be easy to
merge.
Rebased FDW and check-world fixes patch is attached.
I will continue working on the rest of the miss items.
Regards,
Haribabu Kommi
Fujitsu Australia
Attachments:
0002-copy-memory-limit-fix.patch
From fd6fb3028c1c9f7fcb41d651a324b1b1eb4ab2ce Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 29 Aug 2018 13:52:39 +1000
Subject: [PATCH 2/2] copy memory limit fix
To limit the memory used by COPY FROM after slotification,
calculate the tuple size of the first tuple in the batch and
use that for the remaining tuples in the batch, so that it
approximates the memory usage of the COPY command.
---
src/backend/commands/copy.c | 61 ++++++++++++++++++++++++-------------
1 file changed, 39 insertions(+), 22 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c9272b344a..1e2d5ebb50 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -308,7 +308,7 @@ static void CopyFromInsertBatch(CopyState cstate, EState *estate,
CommandId mycid, int hi_options,
ResultRelInfo *resultRelInfo,
BulkInsertState bistate,
- int nBufferedTuples, TupleTableSlot **bufferedSlots,
+ int nBufferedSlots, TupleTableSlot **bufferedSlots,
uint64 firstBufferedLineNo);
static bool CopyReadLine(CopyState cstate);
static bool CopyReadLineText(CopyState cstate);
@@ -2309,11 +2309,12 @@ CopyFrom(CopyState cstate)
void *bistate;
uint64 processed = 0;
bool useHeapMultiInsert;
- int nBufferedTuples = 0;
+ int nBufferedSlots = 0;
int prev_leaf_part_index = -1;
-#define MAX_BUFFERED_TUPLES 1000
+#define MAX_BUFFERED_SLOTS 1000
TupleTableSlot **bufferedSlots = NULL; /* initialize to silence warning */
+ Size bufferedSlotsSize = 0;
uint64 firstBufferedLineNo = 0;
Assert(cstate->rel);
@@ -2524,7 +2525,7 @@ CopyFrom(CopyState cstate)
else
{
useHeapMultiInsert = true;
- bufferedSlots = palloc0(MAX_BUFFERED_TUPLES * sizeof(TupleTableSlot *));
+ bufferedSlots = palloc0(MAX_BUFFERED_SLOTS * sizeof(TupleTableSlot *));
}
/*
@@ -2562,7 +2563,7 @@ CopyFrom(CopyState cstate)
CHECK_FOR_INTERRUPTS();
- if (nBufferedTuples == 0)
+ if (nBufferedSlots == 0)
{
/*
* Reset the per-tuple exprcontext. We can only do this if the
@@ -2577,14 +2578,14 @@ CopyFrom(CopyState cstate)
myslot = singleslot;
Assert(myslot != NULL);
}
- else if (bufferedSlots[nBufferedTuples] == NULL)
+ else if (bufferedSlots[nBufferedSlots] == NULL)
{
myslot = table_gimmegimmeslot(resultRelInfo->ri_RelationDesc,
&estate->es_tupleTable);
- bufferedSlots[nBufferedTuples] = myslot;
+ bufferedSlots[nBufferedSlots] = myslot;
}
else
- myslot = bufferedSlots[nBufferedTuples];
+ myslot = bufferedSlots[nBufferedSlots];
/* Switch into its memory context */
MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -2750,27 +2751,43 @@ CopyFrom(CopyState cstate)
if (useHeapMultiInsert)
{
+ int tup_size;
+
/* Add this tuple to the tuple buffer */
- if (nBufferedTuples == 0)
+ if (nBufferedSlots == 0)
+ {
firstBufferedLineNo = cstate->cur_lineno;
- Assert(bufferedSlots[nBufferedTuples] == myslot);
- nBufferedTuples++;
+
+ /*
+ * Find out the Tuple size of the first tuple in a batch and
+ * use it for the rest tuples in a batch. There may be scenarios
+ * where the first tuple is very small and rest can be large, but
+ * that's rare and this should work for majority of the scenarios.
+ */
+ tup_size = heap_compute_data_size(myslot->tts_tupleDescriptor,
+ myslot->tts_values,
+ myslot->tts_isnull);
+ }
+
+ Assert(bufferedSlots[nBufferedSlots] == myslot);
+ nBufferedSlots++;
+ bufferedSlotsSize += tup_size;
/*
* If the buffer filled up, flush it. Also flush if the
* total size of all the tuples in the buffer becomes
* large, to avoid using large amounts of memory for the
* buffer when the tuples are exceptionally wide.
- *
- * PBORKED: Re-introduce size limit
*/
- if (nBufferedTuples == MAX_BUFFERED_TUPLES)
+ if (nBufferedSlots == MAX_BUFFERED_SLOTS ||
+ bufferedSlotsSize > 65535)
{
CopyFromInsertBatch(cstate, estate, mycid, hi_options,
resultRelInfo, bistate,
- nBufferedTuples, bufferedSlots,
+ nBufferedSlots, bufferedSlots,
firstBufferedLineNo);
- nBufferedTuples = 0;
+ nBufferedSlots = 0;
+ bufferedSlotsSize = 0;
}
}
else
@@ -2836,10 +2853,10 @@ next_tuple:
}
/* Flush any remaining buffered tuples */
- if (nBufferedTuples > 0)
+ if (nBufferedSlots > 0)
CopyFromInsertBatch(cstate, estate, mycid, hi_options,
resultRelInfo, bistate,
- nBufferedTuples, bufferedSlots,
+ nBufferedSlots, bufferedSlots,
firstBufferedLineNo);
/* Done, clean up */
@@ -2899,7 +2916,7 @@ next_tuple:
static void
CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
int hi_options, ResultRelInfo *resultRelInfo,
- BulkInsertState bistate, int nBufferedTuples, TupleTableSlot **bufferedSlots,
+ BulkInsertState bistate, int nBufferedSlots, TupleTableSlot **bufferedSlots,
uint64 firstBufferedLineNo)
{
MemoryContext oldcontext;
@@ -2920,7 +2937,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
table_multi_insert(cstate->rel,
bufferedSlots,
- nBufferedTuples,
+ nBufferedSlots,
mycid,
hi_options,
bistate);
@@ -2932,7 +2949,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
*/
if (resultRelInfo->ri_NumIndices > 0)
{
- for (i = 0; i < nBufferedTuples; i++)
+ for (i = 0; i < nBufferedSlots; i++)
{
List *recheckIndexes;
@@ -2954,7 +2971,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
(resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
resultRelInfo->ri_TrigDesc->trig_insert_new_table))
{
- for (i = 0; i < nBufferedTuples; i++)
+ for (i = 0; i < nBufferedSlots; i++)
{
cstate->cur_lineno = firstBufferedLineNo + i;
ExecARInsertTriggers(estate, resultRelInfo,
--
2.18.0.windows.1
0001-check-world-fixes.patch
From e64936a923fa2772fa9151a0e51bb4a042bbd36b Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 29 Aug 2018 14:22:07 +1000
Subject: [PATCH 1/2] check-world fixes
---
contrib/pg_visibility/pg_visibility.c | 5 +++--
src/backend/access/heap/heapam.c | 2 --
src/backend/executor/execExprInterp.c | 3 +++
src/backend/executor/execMain.c | 13 ++-----------
src/backend/executor/execTuples.c | 21 +++++++++++++++++++++
src/backend/executor/nodeModifyTable.c | 12 ++++++++++--
src/include/executor/tuptable.h | 1 +
7 files changed, 40 insertions(+), 17 deletions(-)
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index dce5262e34..88ca4fd2af 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -563,12 +563,13 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
rel = relation_open(relid, AccessShareLock);
+ /* Only some relkinds have a visibility map */
+ check_relation_relkind(rel);
+
if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("only heap AM is supported")));
- /* Only some relkinds have a visibility map */
- check_relation_relkind(rel);
nblocks = RelationGetNumberOfBlocks(rel);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b35dcea75e..6d516ccc0b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -83,8 +83,6 @@
/* GUC variable */
bool synchronize_seqscans = true;
-static void heap_parallelscan_startblock_init(HeapScanDesc scan);
-static BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
TransactionId xid, CommandId cid, int options);
static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 5ed9273d32..d91d0f25a1 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -570,6 +570,9 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
Assert(TTS_IS_HEAPTUPLE(scanslot) ||
TTS_IS_BUFFERTUPLE(scanslot));
+ if (hslot->tuple == NULL)
+ ExecMaterializeSlot(scanslot);
+
d = heap_getsysattr(hslot->tuple, attnum,
scanslot->tts_tupleDescriptor,
op->resnull);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f430733b69..faeb960e1d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2559,7 +2559,7 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
* datums that may be present in copyTuple). As with the next step, this
* is to guard against early re-use of the EPQ query.
*/
- if (!TupIsNull(slot))
+ if (!TupIsNull(slot) && !TTS_IS_VIRTUAL(slot))
ExecMaterializeSlot(slot);
#if FIXME
@@ -2766,16 +2766,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
if (isNull)
continue;
- elog(ERROR, "frak, need to implement ROW_MARK_COPY");
-#ifdef FIXME
- // FIXME: this should just deform the tuple and store it as a
- // virtual one.
- tuple = table_tuple_by_datum(erm->relation, datum, erm->relid);
-
- /* store tuple */
- EvalPlanQualSetTuple(epqstate, erm->rti, tuple);
-#endif
-
+ ExecForceStoreHeapTupleDatum(datum, slot);
}
}
}
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 4921835c31..7628799d41 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -1397,6 +1397,27 @@ ExecStoreAllNullTuple(TupleTableSlot *slot)
return ExecFinishStoreSlotValues(slot);
}
+void
+ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot)
+{
+ HeapTuple tuple;
+ HeapTupleHeader td;
+
+ td = DatumGetHeapTupleHeader(data);
+
+ tuple = (HeapTuple) palloc(HEAPTUPLESIZE + HeapTupleHeaderGetDatumLength(td));
+ tuple->t_len = HeapTupleHeaderGetDatumLength(td);
+ tuple->t_self = td->t_ctid;
+ tuple->t_data = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE);
+ memcpy((char *) tuple->t_data, (char *) td, tuple->t_len);
+
+ ExecClearTuple(slot);
+
+ heap_deform_tuple(tuple, slot->tts_tupleDescriptor,
+ slot->tts_values, slot->tts_isnull);
+ ExecFinishStoreSlotValues(slot);
+}
+
/* --------------------------------
* ExecFetchSlotTuple
* Fetch the slot's regular physical tuple.
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 226ba5fc21..0b38259387 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -607,7 +607,7 @@ ExecDelete(ModifyTableState *mtstate,
bool canSetTag,
bool changingPart,
bool *tupleDeleted,
- TupleTableSlot **epqslot)
+ TupleTableSlot **epqreturnslot)
{
ResultRelInfo *resultRelInfo;
Relation resultRelationDesc;
@@ -632,7 +632,7 @@ ExecDelete(ModifyTableState *mtstate,
bool dodelete;
dodelete = ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
- tupleid, oldtuple, epqslot);
+ tupleid, oldtuple, epqreturnslot);
if (!dodelete) /* "do nothing" */
return NULL;
@@ -724,6 +724,14 @@ ldelete:;
/* Tuple no more passing quals, exiting... */
return NULL;
}
+
+ /* Hand the EPQ slot back to the caller instead of retrying here */
+ if (epqreturnslot)
+ {
+ *epqreturnslot = epqslot;
+ return NULL;
+ }
+
goto ldelete;
}
}
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index 1d316deed3..97aa26f5e0 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -476,6 +476,7 @@ extern TupleTableSlot *ExecCopySlot(TupleTableSlot *dstslot,
extern void ExecForceStoreHeapTuple(HeapTuple tuple,
TupleTableSlot *slot);
+extern void ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot);
extern void slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum);
extern Datum slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull);
--
2.18.0.windows.1
Attachment: 0003-FDW-RefetchForeignRow-API-prototype-change.patch (application/octet-stream)
From c1a1ca617f344b8e4d6094da50585783508de0c2 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 22 Aug 2018 16:45:02 +1000
Subject: [PATCH 3/3] FDW RefetchForeignRow API prototype change
With pluggable storage, direct tuple usage is minimized
and all the external APIs must deal with TupleTableSlot.
---
doc/src/sgml/fdwhandler.sgml | 10 ++++++----
src/backend/executor/execMain.c | 16 ++++++++--------
src/backend/executor/nodeLockRows.c | 20 +++++++-------------
src/include/foreign/fdwapi.h | 9 +++++----
4 files changed, 26 insertions(+), 29 deletions(-)
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 4ce88dd77c..12769f3288 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -988,23 +988,25 @@ GetForeignRowMarkType(RangeTblEntry *rte,
<para>
<programlisting>
-HeapTuple
+TupleTableSlot *
RefetchForeignRow(EState *estate,
ExecRowMark *erm,
Datum rowid,
+ TupleTableSlot *slot,
bool *updated);
</programlisting>
- Re-fetch one tuple from the foreign table, after locking it if required.
+ Re-fetch one tuple slot from the foreign table, after locking it if required.
<literal>estate</literal> is global execution state for the query.
<literal>erm</literal> is the <structname>ExecRowMark</structname> struct describing
the target foreign table and the row lock type (if any) to acquire.
<literal>rowid</literal> identifies the tuple to be fetched.
- <literal>updated</literal> is an output parameter.
+ <literal>slot</literal> contains nothing useful upon call, but can be used to
+ hold the returned tuple. <literal>updated</literal> is an output parameter.
</para>
<para>
- This function should return a palloc'ed copy of the fetched tuple,
+ This function should return a slot containing the fetched tuple
or <literal>NULL</literal> if the row lock couldn't be obtained. The row lock
type to acquire is defined by <literal>erm->markType</literal>, which is the
value previously returned by <function>GetForeignRowMarkType</function>.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index faeb960e1d..674569a586 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2705,23 +2705,24 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
/* fetch requests on foreign tables must be passed to their FDW */
if (erm->relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
- elog(ERROR, "frak, need to change fdw API");
-#ifdef FIXME
FdwRoutine *fdwroutine;
bool updated = false;
fdwroutine = GetFdwRoutineForRelation(erm->relation, false);
+
/* this should have been checked already, but let's be safe */
if (fdwroutine->RefetchForeignRow == NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot lock rows in foreign table \"%s\"",
RelationGetRelationName(erm->relation))));
- tuple = fdwroutine->RefetchForeignRow(epqstate->estate,
- erm,
- datum,
- &updated);
- if (tuple == NULL)
+
+ slot = fdwroutine->RefetchForeignRow(epqstate->estate,
+ erm,
+ datum,
+ slot,
+ &updated);
+ if (slot == NULL)
elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck");
/*
@@ -2729,7 +2730,6 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
* assumes that FDWs can track that exactly, which they might
* not be able to. So just ignore the flag.
*/
-#endif
}
else
{
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 668f5fa7a2..e52394a65c 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -128,33 +128,27 @@ lnext:
{
FdwRoutine *fdwroutine;
bool updated = false;
- HeapTuple copyTuple;
-
- elog(ERROR, "frak, tuple based API needs to be rewritten");
fdwroutine = GetFdwRoutineForRelation(erm->relation, false);
+
/* this should have been checked already, but let's be safe */
if (fdwroutine->RefetchForeignRow == NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot lock rows in foreign table \"%s\"",
RelationGetRelationName(erm->relation))));
- copyTuple = fdwroutine->RefetchForeignRow(estate,
- erm,
- datum,
- &updated);
- if (copyTuple == NULL)
+ markSlot = fdwroutine->RefetchForeignRow(estate,
+ erm,
+ datum,
+ markSlot,
+ &updated);
+ if (markSlot == NULL)
{
/* couldn't get the lock, so skip this row */
goto lnext;
}
- elog(ERROR, "frak: slotify");
-
- /* save locked tuple for possible EvalPlanQual testing below */
- //*testTuple = copyTuple;
-
/*
* if FDW says tuple was updated before getting locked, we need to
* perform EPQ testing to see if quals are still satisfied
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index c14eb546c6..508b0eece8 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -121,10 +121,11 @@ typedef void (*EndDirectModify_function) (ForeignScanState *node);
typedef RowMarkType (*GetForeignRowMarkType_function) (RangeTblEntry *rte,
LockClauseStrength strength);
-typedef HeapTuple (*RefetchForeignRow_function) (EState *estate,
- ExecRowMark *erm,
- Datum rowid,
- bool *updated);
+typedef TupleTableSlot *(*RefetchForeignRow_function) (EState *estate,
+ ExecRowMark *erm,
+ Datum rowid,
+ TupleTableSlot *slot,
+ bool *updated);
typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
struct ExplainState *es);
--
2.18.0.windows.1
Hi,
Thanks for the patches!
On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote:
I found a couple of places where zheap uses some extra logic to verify
whether it is the zheap AM or not, and takes some extra decisions based on
that. I am analyzing all of that extra code to see whether any callbacks
can handle it or not, and how. I can come back with more details later.
Yea, I think some of them will need to stay (particularly around
integrating undo) and some other ones we'll need to abstract.
And then:
- lotsa cleanups
- rebasing onto a newer version of the abstract slot patchset
- splitting out smaller patches

You'd moved the bulk insert into tableam callbacks - I don't quite get
why? There's not really anything AM specific in that code?

The main reason for adding them to the AM is just to give the specific AM
control over whether it can support bulk insert or not.

The current framework doesn't support an AM-specific bulk insert state being
passed from one function to another, and its structure is fixed. This needs
to be enhanced to add AM-specific private members as well.

Do you want me to work on making it generic so that AM methods can extend
the structure?
I think the best thing here would be to *remove* all AM abstraction for
bulk insert, until it's actually needed. The likelihood of us getting
the interface right and useful without an actual user seems low. Also,
this already is a huge patch...
@@ -308,7 +308,7 @@ static void CopyFromInsertBatch(CopyState cstate, EState *estate,
 			CommandId mycid, int hi_options,
 			ResultRelInfo *resultRelInfo,
 			BulkInsertState bistate,
-			int nBufferedTuples, TupleTableSlot **bufferedSlots,
+			int nBufferedSlots, TupleTableSlot **bufferedSlots,
 			uint64 firstBufferedLineNo);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
@@ -2309,11 +2309,12 @@ CopyFrom(CopyState cstate)
 	void	   *bistate;
 	uint64		processed = 0;
 	bool		useHeapMultiInsert;
-	int			nBufferedTuples = 0;
+	int			nBufferedSlots = 0;
 	int			prev_leaf_part_index = -1;

-#define MAX_BUFFERED_TUPLES 1000
+#define MAX_BUFFERED_SLOTS 1000
What's the point of these renames? We're still dealing in tuples. Just
seems to make the patch larger.
 if (useHeapMultiInsert)
 {
+	int			tup_size;
+
 	/* Add this tuple to the tuple buffer */
-	if (nBufferedTuples == 0)
+	if (nBufferedSlots == 0)
+	{
 		firstBufferedLineNo = cstate->cur_lineno;
-	Assert(bufferedSlots[nBufferedTuples] == myslot);
-	nBufferedTuples++;
+
+	/*
+	 * Find out the tuple size of the first tuple in a batch and use it
+	 * for the rest of the tuples in the batch. There may be scenarios
+	 * where the first tuple is very small and the rest are large, but
+	 * that's rare and this should work for the majority of scenarios.
+	 */
+	tup_size = heap_compute_data_size(myslot->tts_tupleDescriptor,
+									  myslot->tts_values,
+									  myslot->tts_isnull);
+	}
This seems too expensive to me. I think it'd be better if we instead
used the amount of input data consumed for the tuple as a proxy. Does that
sound reasonable?
Greetings,
Andres Freund
On Tue, Sep 4, 2018 at 10:33 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
Thanks for the patches!
On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote:
I found a couple of places where zheap uses some extra logic to verify
whether it is the zheap AM or not, and takes some extra decisions based on
that. I am analyzing all of that extra code to see whether any callbacks
can handle it or not, and how. I can come back with more details later.

Yea, I think some of them will need to stay (particularly around
integrating undo) and some other ones we'll need to abstract.

OK. I will list all the areas that I found, with my observations on how to
abstract them or leave them, and then implement around that.
And then:
- lotsa cleanups
- rebasing onto a newer version of the abstract slot patchset
- splitting out smaller patches

You'd moved the bulk insert into tableam callbacks - I don't quite get
why? There's not really anything AM specific in that code?

The main reason for adding them to the AM is just to give the specific AM
control over whether it can support bulk insert or not.

The current framework doesn't support an AM-specific bulk insert state being
passed from one function to another, and its structure is fixed. This needs
to be enhanced to add AM-specific private members as well.

Do you want me to work on making it generic so that AM methods can extend
the structure?

I think the best thing here would be to *remove* all AM abstraction for
bulk insert, until it's actually needed. The likelihood of us getting
the interface right and useful without an actual user seems low. Also,
this already is a huge patch...
OK. Will remove them and share the patch.
@@ -308,7 +308,7 @@ static void CopyFromInsertBatch(CopyState cstate, EState *estate,
 			CommandId mycid, int hi_options,
 			ResultRelInfo *resultRelInfo,
 			BulkInsertState bistate,
-			int nBufferedTuples, TupleTableSlot **bufferedSlots,
+			int nBufferedSlots, TupleTableSlot **bufferedSlots,
 			uint64 firstBufferedLineNo);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
@@ -2309,11 +2309,12 @@ CopyFrom(CopyState cstate)
 	void	   *bistate;
 	uint64		processed = 0;
 	bool		useHeapMultiInsert;
-	int			nBufferedTuples = 0;
+	int			nBufferedSlots = 0;
 	int			prev_leaf_part_index = -1;

-#define MAX_BUFFERED_TUPLES 1000
+#define MAX_BUFFERED_SLOTS 1000

What's the point of these renames? We're still dealing in tuples. Just
seems to make the patch larger.
OK. I will correct it.
 if (useHeapMultiInsert)
 {
+	int			tup_size;
+
 	/* Add this tuple to the tuple buffer */
-	if (nBufferedTuples == 0)
+	if (nBufferedSlots == 0)
+	{
 		firstBufferedLineNo = cstate->cur_lineno;
-	Assert(bufferedSlots[nBufferedTuples] == myslot);
-	nBufferedTuples++;
+
+	/*
+	 * Find out the tuple size of the first tuple in a batch and use it
+	 * for the rest of the tuples in the batch. There may be scenarios
+	 * where the first tuple is very small and the rest are large, but
+	 * that's rare and this should work for the majority of scenarios.
+	 */
+	tup_size = heap_compute_data_size(myslot->tts_tupleDescriptor,
+									  myslot->tts_values,
+									  myslot->tts_isnull);
+	}
This seems too expensive to me. I think it'd be better if we instead
used the amount of input data consumed for the tuple as a proxy. Does that
sound reasonable?
Yes, the cstate structure contains the line_buf member, which holds the
length of the row's input line; this can be used as the tuple length to
limit memory usage.

Comments?
Regards,
Haribabu Kommi
Fujitsu Australia
On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
On Tue, Sep 4, 2018 at 10:33 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
Thanks for the patches!
On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote:
I found a couple of places where zheap uses some extra logic to verify
whether it is the zheap AM or not, and takes some extra decisions based on
that. I am analyzing all of that extra code to see whether any callbacks
can handle it or not, and how. I can come back with more details later.

Yea, I think some of them will need to stay (particularly around
integrating undo) and some other ones we'll need to abstract.

OK. I will list all the areas that I found, with my observations on how to
abstract them or leave them, and then implement around that.
The following are the changes where the code specifically checks whether a
relation is a zheap relation.
Overall I found that it needs new APIs in the following locations:
1. RelationSetNewRelfilenode
2. heap_create_init_fork
3. estimate_rel_size
4. A facility for the handler to provide options (like skip WAL, etc.)
_hash_vacuum_one_page:
xlrec.flags = RelationStorageIsZHeap(heapRel) ?
XLOG_HASH_VACUUM_RELATION_STORAGE_ZHEAP : 0;
_bt_delitems_delete:
xlrec_delete.flags = RelationStorageIsZHeap(heapRel) ?
XLOG_BTREE_DELETE_RELATION_STORAGE_ZHEAP : 0;
Storing the handler type, and adding a new API for special handling when
checking for these new types, can remove the need for the above code.
RelationAddExtraBlocks:
if (RelationStorageIsZHeap(relation))
{
ZheapInitPage(page, BufferGetPageSize(buffer));
freespace = PageGetZHeapFreeSpace(page);
}
Adding new APIs for PageInit and PageGetHeapFreeSpace to redirect the calls
to the specific table AM handlers.
visibilitymap_set:
if (RelationStorageIsZHeap(rel))
{
recptr = log_zheap_visible(rel->rd_node, heapBuf, vmBuf,
cutoff_xid, flags);
/*
* We do not have a page wise visibility flag in zheap.
* So no need to set LSN on zheap page.
*/
}
Handler options may remove the need for the above code.
validate_index_heapscan:
/* Set up for predicate or expression evaluation */
/* For zheap relations, the tuple is locally allocated, so free it. */
ExecStoreHeapTuple(heapTuple, slot, RelationStorageIsZHeap(heapRelation));
This will be solved by converting the validate_index_heapscan function to
use slots.
RelationTruncate:
/* Create the meta page for zheap */
if (RelationStorageIsZHeap(rel))
RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
InvalidTransactionId,
InvalidMultiXactId);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
rel->rd_rel->relkind != 'p')
{
heap_create_init_fork(rel);
if (RelationStorageIsZHeap(rel))
ZheapInitMetaPage(rel, INIT_FORKNUM);
}
A new API in RelationSetNewRelfilenode and heap_create_init_fork can solve it.
cluster:
if (RelationStorageIsZHeap(rel))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot cluster a zheap table")));
No change required.
copyFrom:
/*
* In zheap, we don't support the optimization for HEAP_INSERT_SKIP_WAL.
* See zheap_prepare_insert for details.
* PBORKED / ZBORKED: abstract
*/
if (!RelationStorageIsZHeap(cstate->rel) && !XLogIsNeeded())
hi_options |= HEAP_INSERT_SKIP_WAL;
How about requesting the table AM handler to provide options and using them
here?
ExecuteTruncateGuts:
// PBORKED: Need to abstract this
minmulti = GetOldestMultiXactId();
/*
* Need the full transaction-safe pushups.
*
* Create a new empty storage file for the relation, and assign it
* as the relfilenode value. The old storage file is scheduled for
* deletion at commit.
*
* PBORKED: needs to be a callback
*/
if (RelationStorageIsZHeap(rel))
RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
InvalidTransactionId, InvalidMultiXactId);
else
RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
RecentXmin, minmulti);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
{
heap_create_init_fork(rel);
if (RelationStorageIsZHeap(rel))
ZheapInitMetaPage(rel, INIT_FORKNUM);
}
New API inside RelationSetNewRelfilenode can handle it.
ATRewriteCatalogs:
/* Inherit the storage_engine reloption from the parent table. */
if (RelationStorageIsZHeap(rel))
{
static char *validnsps[] = HEAP_RELOPT_NAMESPACES;
DefElem *storage_engine;
storage_engine = makeDefElemExtended("toast", "storage_engine",
(Node *) makeString("zheap"),
DEFELEM_UNSPEC, -1);
reloptions = transformRelOptions((Datum) 0,
list_make1(storage_engine),
"toast",
validnsps, true, false);
}
I don't think anything can be done via an API here.
ATRewriteTable:
/*
* In zheap, we don't support the optimization for HEAP_INSERT_SKIP_WAL.
* See zheap_prepare_insert for details.
*
* ZFIXME / PFIXME: We probably need a different abstraction for this.
*/
if (!RelationStorageIsZHeap(newrel) && !XLogIsNeeded())
hi_options |= HEAP_INSERT_SKIP_WAL;
Handler options can solve this as well.
estimate_rel_size:
if (curpages < 10 &&
(rel->rd_rel->relpages == 0 ||
(RelationStorageIsZHeap(rel) &&
rel->rd_rel->relpages == ZHEAP_METAPAGE + 1)) &&
!rel->rd_rel->relhassubclass &&
rel->rd_rel->relkind != RELKIND_INDEX)
curpages = 10;
/* report estimated # pages */
*pages = curpages;
/* quick exit if rel is clearly empty */
if (curpages == 0 || (RelationStorageIsZHeap(rel) &&
curpages == ZHEAP_METAPAGE + 1))
{
*tuples = 0;
*allvisfrac = 0;
break;
}
/* coerce values in pg_class to more desirable types */
relpages = (BlockNumber) rel->rd_rel->relpages;
reltuples = (double) rel->rd_rel->reltuples;
relallvisible = (BlockNumber) rel->rd_rel->relallvisible;
/*
* If it's a zheap relation, then subtract the pages
* to account for the metapage.
*/
if (relpages > 0 && RelationStorageIsZHeap(rel))
{
curpages--;
relpages--;
}
An API may be needed to estimate the relation size based on the handler type.
pg_stat_get_tuples_hot_updated and others:
/*
* Counter tuples_hot_updated stores number of hot updates for heap table
* and the number of inplace updates for zheap table.
*/
if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL ||
RelationStorageIsZHeap(rel))
result = 0;
else
result = (int64) (tabentry->tuples_hot_updated);
Is the special condition needed? The value should be 0 for zheap anyway,
right?
RelationSetNewRelfilenode:
/* Initialize the metapage for zheap relation. */
if (RelationStorageIsZHeap(relation))
ZheapInitMetaPage(relation, MAIN_FORKNUM);
A new API in RelationSetNewRelfilenode can solve this problem.
And then:
- lotsa cleanups
- rebasing onto a newer version of the abstract slot patchset
- splitting out smaller patches

You'd moved the bulk insert into tableam callbacks - I don't quite get
why? There's not really anything AM specific in that code?

The main reason for adding them to the AM is just to give the specific AM
control over whether it can support bulk insert or not.

The current framework doesn't support an AM-specific bulk insert state being
passed from one function to another, and its structure is fixed. This needs
to be enhanced to add AM-specific private members as well.

Do you want me to work on making it generic so that AM methods can extend
the structure?

I think the best thing here would be to *remove* all AM abstraction for
bulk insert, until it's actually needed. The likelihood of us getting
the interface right and useful without an actual user seems low. Also,
this already is a huge patch...

OK. Will remove them and share the patch.
Bulk insert API changes are removed.
@@ -308,7 +308,7 @@ static void CopyFromInsertBatch(CopyState cstate, EState *estate,
 			CommandId mycid, int hi_options,
 			ResultRelInfo *resultRelInfo,
 			BulkInsertState bistate,
-			int nBufferedTuples, TupleTableSlot **bufferedSlots,
+			int nBufferedSlots, TupleTableSlot **bufferedSlots,
 			uint64 firstBufferedLineNo);
 static bool CopyReadLine(CopyState cstate);
 static bool CopyReadLineText(CopyState cstate);
@@ -2309,11 +2309,12 @@ CopyFrom(CopyState cstate)
 	void	   *bistate;
 	uint64		processed = 0;
 	bool		useHeapMultiInsert;
-	int			nBufferedTuples = 0;
+	int			nBufferedSlots = 0;
 	int			prev_leaf_part_index = -1;

-#define MAX_BUFFERED_TUPLES 1000
+#define MAX_BUFFERED_SLOTS 1000

What's the point of these renames? We're still dealing in tuples. Just
seems to make the patch larger.

OK. I will correct it.
 if (useHeapMultiInsert)
 {
+	int			tup_size;
+
 	/* Add this tuple to the tuple buffer */
-	if (nBufferedTuples == 0)
+	if (nBufferedSlots == 0)
+	{
 		firstBufferedLineNo = cstate->cur_lineno;
-	Assert(bufferedSlots[nBufferedTuples] == myslot);
-	nBufferedTuples++;
+
+	/*
+	 * Find out the tuple size of the first tuple in a batch and use it
+	 * for the rest of the tuples in the batch. There may be scenarios
+	 * where the first tuple is very small and the rest are large, but
+	 * that's rare and this should work for the majority of scenarios.
+	 */
+	tup_size = heap_compute_data_size(myslot->tts_tupleDescriptor,
+									  myslot->tts_values,
+									  myslot->tts_isnull);
+	}

This seems too expensive to me. I think it'd be better if we instead
used the amount of input data consumed for the tuple as a proxy. Does that
sound reasonable?

Yes, the cstate structure contains the line_buf member, which holds the
length of the row's input line; this can be used as the tuple length to
limit memory usage.

Comments?
Attached are the COPY FROM batch insert memory usage limit fix, and grammar
support for the USING method in CREATE TABLE AS as well.
Regards,
Haribabu Kommi
Fujitsu Australia
Attachments:
Attachment: 0001-copy-memory-limit-fix.patch (application/octet-stream)
From 67018b04b7e11ec0f0644afbbd451f5fbaf0a6d6 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 29 Aug 2018 13:52:39 +1000
Subject: [PATCH 1/3] copy memory limit fix
To limit the memory used by COPY FROM after slotification, accumulate
each buffered row's input line length and flush the batch once the
total exceeds a threshold, keeping the COPY command's memory usage
bounded.
---
src/backend/commands/copy.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c9272b344a..a82389b1a8 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2314,6 +2314,7 @@ CopyFrom(CopyState cstate)
#define MAX_BUFFERED_TUPLES 1000
TupleTableSlot **bufferedSlots = NULL; /* initialize to silence warning */
+ Size bufferedSlotsSize = 0;
uint64 firstBufferedLineNo = 0;
Assert(cstate->rel);
@@ -2753,24 +2754,26 @@ CopyFrom(CopyState cstate)
/* Add this tuple to the tuple buffer */
if (nBufferedTuples == 0)
firstBufferedLineNo = cstate->cur_lineno;
+
Assert(bufferedSlots[nBufferedTuples] == myslot);
nBufferedTuples++;
+ bufferedSlotsSize += cstate->line_buf.len;
/*
* If the buffer filled up, flush it. Also flush if the
* total size of all the tuples in the buffer becomes
* large, to avoid using large amounts of memory for the
* buffer when the tuples are exceptionally wide.
- *
- * PBORKED: Re-introduce size limit
*/
- if (nBufferedTuples == MAX_BUFFERED_TUPLES)
+ if (nBufferedTuples == MAX_BUFFERED_TUPLES ||
+ bufferedSlotsSize > 65535)
{
CopyFromInsertBatch(cstate, estate, mycid, hi_options,
resultRelInfo, bistate,
nBufferedTuples, bufferedSlots,
firstBufferedLineNo);
nBufferedTuples = 0;
+ bufferedSlotsSize = 0;
}
}
else
--
2.18.0.windows.1
Attachment: 0003-CREATE-AS-USING-method-grammer-support.patch (application/octet-stream)
From 8b59dbc51a15fe769e29f22d5049fa42bf8eebfc Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 10 Sep 2018 17:20:56 +1000
Subject: [PATCH 3/3] CREATE AS USING method grammar support
This change was missed in the earlier USING grammar support.
---
src/backend/parser/gram.y | 11 ++++++-----
src/include/nodes/primnodes.h | 1 +
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 8f6f9ddae2..a9c5450a37 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4038,7 +4038,6 @@ CreateStatsStmt:
*
*****************************************************************************/
-// PBORKED: storage option
CreateAsStmt:
CREATE OptTemp TABLE create_as_target AS SelectStmt opt_with_data
{
@@ -4069,14 +4068,16 @@ CreateAsStmt:
;
create_as_target:
- qualified_name opt_column_list OptWith OnCommitOption OptTableSpace
+ qualified_name opt_column_list table_access_method_clause
+ OptWith OnCommitOption OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
$$->colNames = $2;
- $$->options = $3;
- $$->onCommit = $4;
- $$->tableSpaceName = $5;
+ $$->accessMethod = $3;
+ $$->options = $4;
+ $$->onCommit = $5;
+ $$->tableSpaceName = $6;
$$->viewQuery = NULL;
$$->skipData = false; /* might get changed later */
}
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..ffc788c4a3 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -108,6 +108,7 @@ typedef struct IntoClause
RangeVar *rel; /* target relation name */
List *colNames; /* column names to assign, or NIL */
+ char *accessMethod; /* table access method */
List *options; /* options from WITH clause */
OnCommitAction onCommit; /* what do we do at COMMIT? */
char *tableSpaceName; /* table space to use, or NULL */
--
2.18.0.windows.1
Attachment: 0002-Remove-of-Bulk-insert-state-API.patch (application/octet-stream)
From b01172f17c561b89c79a88733241e020ecf946e3 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 5 Sep 2018 17:18:09 +1000
Subject: [PATCH 2/3] Remove Bulk Insert state API
There is currently no requirement to expose the Bulk Insert state
APIs, as nothing uses them yet.
---
src/backend/access/heap/heapam_handler.c | 4 ---
src/backend/commands/copy.c | 6 ++---
src/backend/commands/createas.c | 4 +--
src/backend/commands/matview.c | 4 +--
src/backend/commands/tablecmds.c | 4 +--
src/include/access/tableam.h | 32 ------------------------
6 files changed, 9 insertions(+), 45 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 382148ff1d..2d5074734b 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1771,10 +1771,6 @@ static const TableAmRoutine heapam_methods = {
.relation_copy_for_cluster = heap_copy_for_cluster,
.relation_sync = heap_sync,
- .getbulkinsertstate = GetBulkInsertState,
- .freebulkinsertstate = FreeBulkInsertState,
- .releasebulkinsertstate = ReleaseBulkInsertStatePin,
-
.begin_index_fetch = heapam_begin_index_fetch,
.reset_index_fetch = heapam_reset_index_fetch,
.end_index_fetch = heapam_end_index_fetch,
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index a82389b1a8..49e654e4ee 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2546,7 +2546,7 @@ CopyFrom(CopyState cstate)
*/
ExecBSInsertTriggers(estate, resultRelInfo);
- bistate = table_getbulkinsertstate(resultRelInfo->ri_RelationDesc);
+ bistate = GetBulkInsertState();
econtext = GetPerTupleExprContext(estate);
/* Set up callback to identify error line number */
@@ -2639,7 +2639,7 @@ CopyFrom(CopyState cstate)
*/
if (prev_leaf_part_index != leaf_part_index)
{
- table_releasebulkinsertstate(resultRelInfo->ri_RelationDesc, bistate);
+ ReleaseBulkInsertStatePin(bistate);
prev_leaf_part_index = leaf_part_index;
}
@@ -2848,7 +2848,7 @@ next_tuple:
/* Done, clean up */
error_context_stack = errcallback.previous;
- table_freebulkinsertstate(resultRelInfo->ri_RelationDesc, bistate);
+ FreeBulkInsertState(bistate);
MemoryContextSwitchTo(oldcontext);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index f84ef0a65e..852c6becba 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -572,7 +572,7 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
*/
myState->hi_options = HEAP_INSERT_SKIP_FSM |
(XLogIsNeeded() ? 0 : HEAP_INSERT_SKIP_WAL);
- myState->bistate = table_getbulkinsertstate(intoRelationDesc);
+ myState->bistate = GetBulkInsertState();
/* Not using WAL requires smgr_targblock be initially invalid */
Assert(RelationGetTargetBlock(intoRelationDesc) == InvalidBlockNumber);
@@ -611,7 +611,7 @@ intorel_shutdown(DestReceiver *self)
{
DR_intorel *myState = (DR_intorel *) self;
- table_freebulkinsertstate(myState->rel, myState->bistate);
+ FreeBulkInsertState(myState->bistate);
/* If we skipped using WAL, must heap_sync before commit */
if (myState->hi_options & HEAP_INSERT_SKIP_WAL)
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 83ee2f725e..80828ed4a6 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -466,7 +466,7 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
myState->hi_options = HEAP_INSERT_SKIP_FSM | HEAP_INSERT_FROZEN;
if (!XLogIsNeeded())
myState->hi_options |= HEAP_INSERT_SKIP_WAL;
- myState->bistate = table_getbulkinsertstate(transientrel);
+ myState->bistate = GetBulkInsertState();
/* Not using WAL requires smgr_targblock be initially invalid */
Assert(RelationGetTargetBlock(transientrel) == InvalidBlockNumber);
@@ -499,7 +499,7 @@ transientrel_shutdown(DestReceiver *self)
{
DR_transientrel *myState = (DR_transientrel *) self;
- table_freebulkinsertstate(myState->transientrel, myState->bistate);
+ FreeBulkInsertState(myState->bistate);
/* If we skipped using WAL, must heap_sync before commit */
if (myState->hi_options & HEAP_INSERT_SKIP_WAL)
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index ff6e4486f0..d44d865ec7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -4616,7 +4616,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
if (newrel)
{
mycid = GetCurrentCommandId(true);
- bistate = table_getbulkinsertstate(newrel);
+ bistate = GetBulkInsertState();
hi_options = HEAP_INSERT_SKIP_FSM;
if (!XLogIsNeeded())
@@ -4901,7 +4901,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
heap_close(oldrel, NoLock);
if (newrel)
{
- table_freebulkinsertstate(newrel, bistate);
+ FreeBulkInsertState(bistate);
/* If we skipped writing WAL, then we need to sync the heap. */
if (hi_options & HEAP_INSERT_SKIP_WAL)
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 6d50410166..5f6b39c0e0 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -125,10 +125,6 @@ typedef void (*RelationCopyForCluster_function)(Relation NewHeap, Relation OldHe
typedef void (*RelationSync_function) (Relation relation);
-typedef BulkInsertState (*GetBulkInsertState_function) (void);
-typedef void (*FreeBulkInsertState_function) (BulkInsertState bistate);
-typedef void (*ReleaseBulkInsertState_function) (BulkInsertState bistate);
-
typedef const TupleTableSlotOps* (*SlotCallbacks_function) (Relation relation);
typedef TableScanDesc (*ScanBegin_function) (Relation relation,
@@ -217,10 +213,6 @@ typedef struct TableAmRoutine
RelationCopyForCluster_function relation_copy_for_cluster;
RelationSync_function relation_sync;
- GetBulkInsertState_function getbulkinsertstate;
- FreeBulkInsertState_function freebulkinsertstate;
- ReleaseBulkInsertState_function releasebulkinsertstate;
-
/* Operations on relation scans */
ScanBegin_function scan_begin;
ScanSetlimits_function scansetlimits;
@@ -650,30 +642,6 @@ table_sync(Relation rel)
rel->rd_tableamroutine->relation_sync(rel);
}
-/*
- * -------------------
- * storage Bulk Insert functions
- * -------------------
- */
-static inline BulkInsertState
-table_getbulkinsertstate(Relation rel)
-{
- return rel->rd_tableamroutine->getbulkinsertstate();
-}
-
-static inline void
-table_freebulkinsertstate(Relation rel, BulkInsertState bistate)
-{
- rel->rd_tableamroutine->freebulkinsertstate(bistate);
-}
-
-static inline void
-table_releasebulkinsertstate(Relation rel, BulkInsertState bistate)
-{
- rel->rd_tableamroutine->releasebulkinsertstate(bistate);
-}
-
-
static inline double
table_index_build_scan(Relation heapRelation,
Relation indexRelation,
--
2.18.0.windows.1
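The patch above illustrates the design rule the series settles on: operations that genuinely differ per access method stay in the TableAmRoutine dispatch table, while helpers with a single implementation (like the bulk-insert state) are called directly. A self-contained sketch of that split, with all names invented for illustration (these are not the actual PostgreSQL structs):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the real PostgreSQL types. */
typedef struct DemoBulkInsertState { int pages_pinned; } DemoBulkInsertState;

typedef struct DemoTableAmRoutine {
    /* Per-AM operations stay in the dispatch table... */
    int (*tuple_insert)(int value);
} DemoTableAmRoutine;

/* heap implementation of the dispatched operation */
static int heap_tuple_insert(int value) { return value + 1; }

static const DemoTableAmRoutine demo_heapam_methods = {
    .tuple_insert = heap_tuple_insert,
};

/* ...while AM-independent helpers, like the bulk-insert state removed
 * from the vtable in this patch, are plain functions callers invoke
 * directly, with no indirection through the routine table. */
DemoBulkInsertState demo_get_bulk_insert_state(void)
{
    DemoBulkInsertState s = { .pages_pinned = 0 };
    return s;
}

/* Thin wrapper that dispatches through the AM's routine table. */
int demo_table_insert(const DemoTableAmRoutine *am, int value)
{
    return am->tuple_insert(value);
}
```

This mirrors how CopyFrom and friends now call GetBulkInsertState() directly in the diff, rather than going through rel->rd_tableamroutine.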
On Mon, Sep 10, 2018 at 1:12 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
pg_stat_get_tuples_hot_updated and others:
/*
* Counter tuples_hot_updated stores number of hot updates for heap table
* and the number of inplace updates for zheap table.
*/
if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL ||
RelationStorageIsZHeap(rel))
result = 0;
else
result = (int64) (tabentry->tuples_hot_updated);
Is the special condition needed? The values should be 0 because of zheap, right?
I also think so. Beena/Mithun has worked on this part of the code, so
it is better if they also confirm once.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Sep 10, 2018 at 7:33 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Sep 10, 2018 at 1:12 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
pg_stat_get_tuples_hot_updated and others:
/*
* Counter tuples_hot_updated stores number of hot updates for heap table
* and the number of inplace updates for zheap table.
*/
if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL ||
RelationStorageIsZHeap(rel))
result = 0;
else
result = (int64) (tabentry->tuples_hot_updated);
Is the special condition needed? The values should be 0 because of zheap, right?
I also think so. Beena/Mithun has worked on this part of the code, so
it is better if they also confirm once.
Yes pg_stat_get_tuples_hot_updated should return 0 for zheap.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Hello,
On Mon, 10 Sep 2018, 19:33 Amit Kapila, <amit.kapila16@gmail.com> wrote:
On Mon, Sep 10, 2018 at 1:12 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
pg_stat_get_tuples_hot_updated and others:
/*
* Counter tuples_hot_updated stores number of hot updates for heap table
* and the number of inplace updates for zheap table.
*/
if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL ||
RelationStorageIsZHeap(rel))
result = 0;
else
result = (int64) (tabentry->tuples_hot_updated);
Is the special condition needed? The values should be 0 because of zheap, right?
I also think so. Beena/Mithun has worked on this part of the code, so
it is better if they also confirm once.
We have used the hot_updated counter to count the number of in-place updates
for zheap, to avoid introducing a new counter. Though, technically, hot
updates are 0 for zheap, the counter can hold a non-zero value indicating
the in-place updates.
Thank you
On Mon, Sep 10, 2018 at 5:42 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
On Tue, Sep 4, 2018 at 10:33 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
Thanks for the patches!
On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote:
I found a couple of places where zheap uses some extra logic to verify
whether it is the zheap AM or not, and takes some extra decisions based on
that. I am analyzing all of that extra code to see whether any callbacks
can handle it, and how. I can come back with more details later.
Yea, I think some of them will need to stay (particularly around
integrating undo) and some other ones we'll need to abstract.
OK. I will list all the areas that I found, with my observations on how to
abstract them or leave them, and then implement around that.
The following are the changes where the code specifically checks whether
it is a zheap relation or not. Overall I found that it needs new APIs at
the following locations:
1. RelationSetNewRelfilenode
2. heap_create_init_fork
3. estimate_rel_size
4. A facility to provide handler options (like skip WAL, etc.)
During the porting of the Fujitsu in-memory columnar store on top of
pluggable storage, I found that the callers of "heap_beginscan" expect
the returned data to always contain all the records.
For example, in a sequential scan, the heap returns the slot with
the tuple, or with a values array of all the columns; the data then gets
filtered, and the unnecessary columns are later removed by projection.
This works fine for row-based storage. For columnar storage, if
the storage knows that the upper layers need only particular columns,
then it can directly return the specified columns, and there is no
need for a projection step. This would also help columnar storage
return the proper columns in a faster way.
Is it good to pass the plan to the storage, so that it can find out
the columns that need to be returned? And also, if the projection
can be handled in the storage itself for some scenarios, the callers
need to be informed that there is no need to perform the projection
again.
comments?
Regards,
Haribabu Kommi
Fujitsu Australia
Hi,
On 2018-09-21 16:57:43 +1000, Haribabu Kommi wrote:
During the porting of the Fujitsu in-memory columnar store on top of
pluggable storage, I found that the callers of "heap_beginscan" expect
the returned data to always contain all the records.
Right.
For example, in the sequential scan, the heap returns the slot with
the tuple or with value array of all the columns and then the data gets
filtered and later removed the unnecessary columns with projection.
This works fine for the row based storage. For columnar storage, if
the storage knows that upper layers needs only particular columns,
then they can directly return the specified columns and there is no
need of projection step. This will help the columnar storage also
to return proper columns in a faster way.
I think this is an important feature, but I feel fairly strongly that we
should only tackle it in a second version. This patchset is already
pretty darn large. It's imo not just helpful for columnar, but even for
heap - we e.g. spend a lot of time deforming columns that are never
accessed. That's particularly harmful when the leading columns are all
NOT NULL and fixed width, but even if not, it's painful.
Is it good to pass the plan to the storage, so that they can find out
the columns that needs to be returned?
I don't think that's the right approach - this should be a level *below*
plan nodes, not reference them. I suspect we're going to have to have a
new table_scan_set_columnlist() option or such.
And also if the projection can handle in the storage itself for some
scenarios, need to be informed the callers that there is no need to
perform the projection extra.
I don't think that should be done in the storage layer - that's probably
better done introducing custom scan nodes and such. This has costing
implications etc, so this needs to happen *before* planning is finished.
Greetings,
Andres Freund
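The table_scan_set_columnlist() suggested above is, at this point, only a hypothetical API. A self-contained sketch of the idea (all names invented for illustration, not actual PostgreSQL code): the executor registers the wanted attributes on the scan descriptor, and the AM then deforms only those, skipping the rest.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_MAX_ATTRS 8

/* A toy scan descriptor carrying the column list the executor asked for. */
typedef struct DemoScanDesc {
    bool wanted[DEMO_MAX_ATTRS];   /* which attributes to deform */
    int  natts;                    /* total attributes in the relation */
} DemoScanDesc;

/* Hypothetical analogue of table_scan_set_columnlist(): record the
 * attribute numbers the upper layers actually need. */
void demo_scan_set_columnlist(DemoScanDesc *scan, const int *attnums, int n)
{
    memset(scan->wanted, 0, sizeof(scan->wanted));
    for (int i = 0; i < n; i++)
        scan->wanted[attnums[i]] = true;
}

/* "Deform" a row: copy only the requested columns into the values array
 * and report how many attributes were actually deformed. */
int demo_deform(const DemoScanDesc *scan, const int *row, int *values)
{
    int deformed = 0;

    for (int att = 0; att < scan->natts; att++)
    {
        if (scan->wanted[att])
        {
            values[att] = row[att];
            deformed++;
        }
    }
    return deformed;
}
```

As Andres notes, the same mechanism would help the heap AM too, by avoiding deforming of leading columns that are never accessed.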
On Fri, Sep 21, 2018 at 5:05 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-09-21 16:57:43 +1000, Haribabu Kommi wrote:
For example, in the sequential scan, the heap returns the slot with
the tuple or with value array of all the columns and then the data gets
filtered and later removed the unnecessary columns with projection.
This works fine for the row based storage. For columnar storage, if
the storage knows that upper layers needs only particular columns,
then they can directly return the specified columns and there is no
need of projection step. This will help the columnar storage also
to return proper columns in a faster way.
I think this is an important feature, but I feel fairly strongly that we
should only tackle it in a second version. This patchset is already
pretty darn large. It's imo not just helpful for columnar, but even for
heap - we e.g. spend a lot of time deforming columns that are never
accessed. That's particularly harmful when the leading columns are all
NOT NULL and fixed width, but even if not, it's painful.
OK. Thanks for your opinion.
Then I will first try to clean up the open items of the existing patch.
Is it good to pass the plan to the storage, so that they can find out
the columns that needs to be returned?
I don't think that's the right approach - this should be a level *below*
plan nodes, not reference them. I suspect we're going to have to have a
new table_scan_set_columnlist() option or such.
The table_scan_set_columnlist() API could be a good solution for sharing
the columns that are expected.
And also if the projection can handle in the storage itself for some
scenarios, need to be informed the callers that there is no need to
perform the projection extra.
I don't think that should be done in the storage layer - that's probably
better done introducing custom scan nodes and such. This has costing
implications etc, so this needs to happen *before* planning is finished.
Sorry, my explanation was wrong. Assume a scenario where the target list
contains only the plain columns of a table, these columns are already
passed to the storage using the above proposed new API, and there is a
one-to-one mapping between them. Based on the above info, it would be
good to decide whether the projection is required or not.
Regards,
Haribabu Kommi
Fujitsu Australia
On Fri, Aug 24, 2018 at 5:50 AM Andres Freund <andres@anarazel.de> wrote:
I've pushed a current version of that to my git tree to the
pluggable-storage branch. It's not really a version that I think makes
sense to review or such, but it's probably more useful if you work based
on that. There's also the pluggable-zheap branch, which I found
extremely useful to develop against.
BTW, I'm going to take a look at the current shape of this patch and share
my thoughts. But where are the branches you're referring to? On your
postgresql.org git repository the pluggable-storage branch was last updated
on June 7. And on GitHub the branches were updated on August 5
and 14, which is still much older than your email (August 24)...
1. https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/pluggable-storage
2. https://github.com/anarazel/postgres-pluggable-storage
3. https://github.com/anarazel/postgres-pluggable-storage/tree/pluggable-zheap
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Mon, Sep 24, 2018 at 5:02 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Fri, Aug 24, 2018 at 5:50 AM Andres Freund <andres@anarazel.de> wrote:
I've pushed a current version of that to my git tree to the
pluggable-storage branch. It's not really a version that I think makes
sense to review or such, but it's probably more useful if you work based
on that. There's also the pluggable-zheap branch, which I found
extremely useful to develop against.
BTW, I'm going to take a look at the current shape of this patch and share
my thoughts. But where are the branches you're referring to? On your
postgresql.org git repository the pluggable-storage branch was last updated
on June 7. And on GitHub the branches were updated on August 5
and 14, which is still much older than your email (August 24)...
The code is the latest, but the commit time is older; I feel that is because
of a commit squash.
pluggable-storage is the branch where the pluggable storage code is present,
and pluggable-zheap is the branch where zheap is rebased on top of pluggable
storage.
Regards,
Haribabu Kommi
Fujitsu Australia
On Mon, Sep 24, 2018 at 8:04 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
On Mon, Sep 24, 2018 at 5:02 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Fri, Aug 24, 2018 at 5:50 AM Andres Freund <andres@anarazel.de> wrote:
I've pushed a current version of that to my git tree to the
pluggable-storage branch. It's not really a version that I think makes
sense to review or such, but it's probably more useful if you work based
on that. There's also the pluggable-zheap branch, which I found
extremely useful to develop against.
BTW, I'm going to take a look at the current shape of this patch and share
my thoughts. But where are the branches you're referring to? On your
postgresql.org git repository the pluggable-storage branch was last updated
on June 7. And on GitHub the branches were updated on August 5
and 14, which is still much older than your email (August 24)...
The code is the latest, but the commit time is older; I feel that is because
of a commit squash.
pluggable-storage is the branch where the pluggable storage code is present,
and pluggable-zheap is the branch where zheap is rebased on top of pluggable
storage.
Got it, thanks!
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Fri, Sep 21, 2018 at 5:40 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
On Fri, Sep 21, 2018 at 5:05 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-09-21 16:57:43 +1000, Haribabu Kommi wrote:
For example, in the sequential scan, the heap returns the slot with
the tuple or with value array of all the columns and then the data gets
filtered and later removed the unnecessary columns with projection.
This works fine for the row based storage. For columnar storage, if
the storage knows that upper layers needs only particular columns,
then they can directly return the specified columns and there is no
need of projection step. This will help the columnar storage also
to return proper columns in a faster way.
I think this is an important feature, but I feel fairly strongly that we
should only tackle it in a second version. This patchset is already
pretty darn large. It's imo not just helpful for columnar, but even for
heap - we e.g. spend a lot of time deforming columns that are never
accessed. That's particularly harmful when the leading columns are all
NOT NULL and fixed width, but even if not, it's painful.
OK. Thanks for your opinion.
Then I will first try to clean up the open items of the existing patch.
Here I attached further cleanup patches.
1. Re-arrange the GUC variable
2. Added a check function hook for default_table_access_method GUC
3. Added a new hook, validate_index. I tried to change the function
validate_index_heapscan to slotify, but that has many problems, as it
accesses some internals of the HeapScanDesc structure, the buffer, etc.
So I added a new hook and provided a callback to handle the index insert.
Please check and let me know your comments.
I will further add the new APIs that were discussed for zheap storage and
share the patch.
Regards,
Haribabu Kommi
Fujitsu Australia
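The callback design described in item 3 above, and implemented in the attached patch, keeps buffer and HeapScanDesc access inside the AM: the AM performs the merge scan and hands each tuple missing from the index back to the caller. A self-contained toy model of that merge (invented names, plain integers standing in for TIDs, both lists assumed sorted):

```c
#include <assert.h>
#include <stddef.h>

/* Model of the validate_index callback: invoked for each table entry
 * missing from the index, so the caller can insert it.  The real hook
 * passes values/isnull arrays and relations; this sketch uses an int
 * "TID" and an opaque state pointer. */
typedef void (*demo_validate_callback)(int tid, void *state);

typedef struct DemoValidateState { int inserted; } DemoValidateState;

/* "Scan" the table TIDs and invoke the callback for every TID that is
 * not present in the sorted index TID list -- a toy version of the merge
 * that validate_index_heapscan performs against the tuplesort output. */
int demo_validate_scan(const int *table_tids, int ntable,
                       const int *index_tids, int nindex,
                       demo_validate_callback callback, void *state)
{
    int i = 0;          /* cursor into the sorted index TIDs */
    int scanned = 0;

    for (int t = 0; t < ntable; t++)
    {
        scanned++;
        /* advance the index cursor until it reaches or passes this TID */
        while (i < nindex && index_tids[i] < table_tids[t])
            i++;
        if (i >= nindex || index_tids[i] != table_tids[t])
            callback(table_tids[t], state);     /* missing: let caller insert */
    }
    return scanned;
}

/* Caller-side callback: count the would-be index insertions. */
static void demo_insert_callback(int tid, void *state)
{
    (void) tid;
    ((DemoValidateState *) state)->inserted++;
}
```

The real callback additionally receives the heap and index relations plus the IndexInfo, as the patch's IndexValidateCallback signature shows, but the control flow is the same: the scan internals stay on the AM side of the hook.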
Attachments:
Attachment: 0003-validate-index-scan-hook-addition.patch
From 3a783b0e62c6f93eba808e6a3c6be3c479484a5d Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Fri, 28 Sep 2018 11:25:07 +1000
Subject: [PATCH 3/3] validate index scan hook addition
Slotifying validate_index has problems, as it tries to access the buffer
stored in the scan descriptor, so a callback was added to hand control
back to the caller.
This may need a further visit, as the callback may need further
abstraction.
---
src/backend/access/heap/heapam_handler.c | 243 ++++++++++++++++-
src/backend/catalog/index.c | 318 +++--------------------
src/include/access/tableam.h | 27 ++
src/include/catalog/index.h | 48 ++++
4 files changed, 352 insertions(+), 284 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 2d5074734b..ee8a658c6d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1120,6 +1120,246 @@ IndexBuildHeapRangeScan(Relation heapRelation,
return reltuples;
}
+/*
+ * validate_index_heapscan - second table scan for concurrent index build
+ *
+ * This has much code in common with IndexBuildHeapScan, but it's enough
+ * different that it seems cleaner to have two routines not one.
+ */
+static uint64
+validate_index_heapscan(Relation heapRelation,
+ Relation indexRelation,
+ IndexInfo *indexInfo,
+ Snapshot snapshot,
+ Tuplesortstate *tuplesort,
+ IndexValidateCallback callback,
+ void *callback_state)
+{
+ TableScanDesc sscan;
+ HeapScanDesc scan;
+ HeapTuple heapTuple;
+ Datum values[INDEX_MAX_KEYS];
+ bool isnull[INDEX_MAX_KEYS];
+ ExprState *predicate;
+ TupleTableSlot *slot;
+ EState *estate;
+ ExprContext *econtext;
+ BlockNumber root_blkno = InvalidBlockNumber;
+ OffsetNumber root_offsets[MaxHeapTuplesPerPage];
+ bool in_index[MaxHeapTuplesPerPage];
+
+ /* state variables for the merge */
+ ItemPointer indexcursor = NULL;
+ ItemPointerData decoded;
+ bool tuplesort_empty = false;
+ uint64 nhtups = 0;
+
+ /*
+ * sanity checks
+ */
+ Assert(OidIsValid(indexRelation->rd_rel->relam));
+
+ /*
+ * Need an EState for evaluation of index expressions and partial-index
+ * predicates. Also a slot to hold the current tuple.
+ */
+ estate = CreateExecutorState();
+ econtext = GetPerTupleExprContext(estate);
+ slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
+ &TTSOpsHeapTuple);
+
+ /* Arrange for econtext's scan tuple to be the tuple under test */
+ econtext->ecxt_scantuple = slot;
+
+ /* Set up execution state for predicate, if any. */
+ predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
+
+ /*
+ * Prepare for scan of the base relation. We need just those tuples
+ * satisfying the passed-in reference snapshot. We must disable syncscan
+ * here, because it's critical that we read from block zero forward to
+ * match the sorted TIDs.
+ */
+ sscan = heap_beginscan(heapRelation, /* relation */
+ snapshot, /* snapshot */
+ 0, /* number of keys */
+ NULL, /* scan key */
+ NULL,
+ true, /* buffer access strategy OK */
+ false, /* syncscan not OK */
+ true,
+ false,
+ false,
+ false);
+
+ scan = (HeapScanDesc)sscan;
+
+ /*
+ * Scan all tuples matching the snapshot.
+ */
+ while ((heapTuple = heap_getnext(sscan, ForwardScanDirection)) != NULL)
+ {
+ ItemPointer heapcursor = &heapTuple->t_self;
+ ItemPointerData rootTuple;
+ OffsetNumber root_offnum;
+
+ CHECK_FOR_INTERRUPTS();
+
+ nhtups += 1;
+
+ /*
+ * As commented in IndexBuildHeapScan, we should index heap-only
+ * tuples under the TIDs of their root tuples; so when we advance onto
+ * a new heap page, build a map of root item offsets on the page.
+ *
+ * This complicates merging against the tuplesort output: we will
+ * visit the live tuples in order by their offsets, but the root
+ * offsets that we need to compare against the index contents might be
+ * ordered differently. So we might have to "look back" within the
+ * tuplesort output, but only within the current page. We handle that
+ * by keeping a bool array in_index[] showing all the
+ * already-passed-over tuplesort output TIDs of the current page. We
+ * clear that array here, when advancing onto a new heap page.
+ */
+ if (scan->rs_cblock != root_blkno)
+ {
+ Page page = BufferGetPage(scan->rs_cbuf);
+
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+ heap_get_root_tuples(page, root_offsets);
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+
+ memset(in_index, 0, sizeof(in_index));
+
+ root_blkno = scan->rs_cblock;
+ }
+
+ /* Convert actual tuple TID to root TID */
+ rootTuple = *heapcursor;
+ root_offnum = ItemPointerGetOffsetNumber(heapcursor);
+
+ if (HeapTupleIsHeapOnly(heapTuple))
+ {
+ root_offnum = root_offsets[root_offnum - 1];
+ if (!OffsetNumberIsValid(root_offnum))
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
+ ItemPointerGetBlockNumber(heapcursor),
+ ItemPointerGetOffsetNumber(heapcursor),
+ RelationGetRelationName(heapRelation))));
+ ItemPointerSetOffsetNumber(&rootTuple, root_offnum);
+ }
+
+ /*
+ * "merge" by skipping through the index tuples until we find or pass
+ * the current root tuple.
+ */
+ while (!tuplesort_empty &&
+ (!indexcursor ||
+ ItemPointerCompare(indexcursor, &rootTuple) < 0))
+ {
+ Datum ts_val;
+ bool ts_isnull;
+
+ if (indexcursor)
+ {
+ /*
+ * Remember index items seen earlier on the current heap page
+ */
+ if (ItemPointerGetBlockNumber(indexcursor) == root_blkno)
+ in_index[ItemPointerGetOffsetNumber(indexcursor) - 1] = true;
+ }
+
+ tuplesort_empty = !tuplesort_getdatum(tuplesort, true,
+ &ts_val, &ts_isnull, NULL);
+ Assert(tuplesort_empty || !ts_isnull);
+ if (!tuplesort_empty)
+ {
+ itemptr_decode(&decoded, DatumGetInt64(ts_val));
+ indexcursor = &decoded;
+
+ /* If int8 is pass-by-ref, free (encoded) TID Datum memory */
+#ifndef USE_FLOAT8_BYVAL
+ pfree(DatumGetPointer(ts_val));
+#endif
+ }
+ else
+ {
+ /* Be tidy */
+ indexcursor = NULL;
+ }
+ }
+
+ /*
+ * If the tuplesort has overshot *and* we didn't see a match earlier,
+ * then this tuple is missing from the index, so insert it.
+ */
+ if ((tuplesort_empty ||
+ ItemPointerCompare(indexcursor, &rootTuple) > 0) &&
+ !in_index[root_offnum - 1])
+ {
+ MemoryContextReset(econtext->ecxt_per_tuple_memory);
+
+ /* Set up for predicate or expression evaluation */
+ ExecStoreHeapTuple(heapTuple, slot, false);
+
+ /*
+ * In a partial index, discard tuples that don't satisfy the
+ * predicate.
+ */
+ if (predicate != NULL)
+ {
+ if (!ExecQual(predicate, econtext))
+ continue;
+ }
+
+ /*
+ * For the current heap tuple, extract all the attributes we use
+ * in this index, and note which are null. This also performs
+ * evaluation of any expressions needed.
+ */
+ FormIndexDatum(indexInfo,
+ slot,
+ estate,
+ values,
+ isnull);
+
+ /*
+ * You'd think we should go ahead and build the index tuple here,
+ * but some index AMs want to do further processing on the data
+ * first. So pass the values[] and isnull[] arrays, instead.
+ */
+
+ /*
+ * If the tuple is already committed dead, you might think we
+ * could suppress uniqueness checking, but this is no longer true
+ * in the presence of HOT, because the insert is actually a proxy
+ * for a uniqueness check on the whole HOT-chain. That is, the
+ * tuple we have here could be dead because it was already
+ * HOT-updated, and if so the updating transaction will not have
+ * thought it should insert index entries. The index AM will
+ * check the whole HOT-chain and correctly detect a conflict if
+ * there is one.
+ */
+
+ callback(indexRelation, values, isnull, &rootTuple, heapRelation,
+ indexInfo, callback_state);
+ }
+ }
+
+ table_endscan(sscan);
+
+ ExecDropSingleTupleTableSlot(slot);
+
+ FreeExecutorState(estate);
+
+ /* These may have been pointing to the now-gone estate */
+ indexInfo->ii_ExpressionsState = NIL;
+ indexInfo->ii_PredicateState = NULL;
+
+ return nhtups;
+}
static bool
heapam_scan_bitmap_pagescan(TableScanDesc sscan,
@@ -1775,7 +2015,8 @@ static const TableAmRoutine heapam_methods = {
.reset_index_fetch = heapam_reset_index_fetch,
.end_index_fetch = heapam_end_index_fetch,
- .index_build_range_scan = IndexBuildHeapRangeScan
+ .index_build_range_scan = IndexBuildHeapRangeScan,
+ .validate_index_scan = validate_index_heapscan
};
const TableAmRoutine *
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 2fe66972a1..a0096e60ca 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -81,7 +81,7 @@
/* Potentially set by pg_upgrade_support functions */
Oid binary_upgrade_next_index_pg_class_oid = InvalidOid;
-/* state info for validate_index bulkdelete callback */
+/* state info for validate_index bulkdelete callback */
typedef struct
{
Tuplesortstate *tuplesort; /* for sorting the index TIDs */
@@ -134,11 +134,13 @@ static void IndexCheckExclusion(Relation heapRelation,
static inline int64 itemptr_encode(ItemPointer itemptr);
static inline void itemptr_decode(ItemPointer itemptr, int64 encoded);
static bool validate_index_callback(ItemPointer itemptr, void *opaque);
-static void validate_index_heapscan(Relation heapRelation,
- Relation indexRelation,
- IndexInfo *indexInfo,
- Snapshot snapshot,
- v_i_state *state);
+static void validate_index_scan_callbck(Relation indexRelation,
+ Datum *values,
+ bool *isnull,
+ ItemPointer rootTuple,
+ Relation heapRelation,
+ IndexInfo *indexInfo,
+ void *opaque);
static bool ReindexIsCurrentlyProcessingIndex(Oid indexOid);
static void SetReindexProcessing(Oid heapOid, Oid indexOid);
static void ResetReindexProcessing(void);
@@ -2638,11 +2640,13 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
/*
* Now scan the heap and "merge" it with the index
*/
- validate_index_heapscan(heapRelation,
- indexRelation,
- indexInfo,
- snapshot,
- &state);
+ state.htups = table_validate_index(heapRelation,
+ indexRelation,
+ indexInfo,
+ snapshot,
+ state.tuplesort,
+ validate_index_scan_callbck,
+ &state);
/* Done with tuplesort object */
tuplesort_end(state.tuplesort);
@@ -2662,45 +2666,6 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
heap_close(heapRelation, NoLock);
}
-/*
- * itemptr_encode - Encode ItemPointer as int64/int8
- *
- * This representation must produce values encoded as int64 that sort in the
- * same order as their corresponding original TID values would (using the
- * default int8 opclass to produce a result equivalent to the default TID
- * opclass).
- *
- * As noted in validate_index(), this can be significantly faster.
- */
-static inline int64
-itemptr_encode(ItemPointer itemptr)
-{
- BlockNumber block = ItemPointerGetBlockNumber(itemptr);
- OffsetNumber offset = ItemPointerGetOffsetNumber(itemptr);
- int64 encoded;
-
- /*
- * Use the 16 least significant bits for the offset. 32 adjacent bits are
- * used for the block number. Since remaining bits are unused, there
- * cannot be negative encoded values (We assume a two's complement
- * representation).
- */
- encoded = ((uint64) block << 16) | (uint16) offset;
-
- return encoded;
-}
-
-/*
- * itemptr_decode - Decode int64/int8 representation back to ItemPointer
- */
-static inline void
-itemptr_decode(ItemPointer itemptr, int64 encoded)
-{
- BlockNumber block = (BlockNumber) (encoded >> 16);
- OffsetNumber offset = (OffsetNumber) (encoded & 0xFFFF);
-
- ItemPointerSet(itemptr, block, offset);
-}
/*
* validate_index_callback - bulkdelete callback to collect the index TIDs
@@ -2717,242 +2682,29 @@ validate_index_callback(ItemPointer itemptr, void *opaque)
}
/*
- * validate_index_heapscan - second table scan for concurrent index build
- *
- * This has much code in common with IndexBuildHeapScan, but it's enough
- * different that it seems cleaner to have two routines not one.
+ * validate_index_scan_callbck - callback to insert into the index
*/
static void
-validate_index_heapscan(Relation heapRelation,
- Relation indexRelation,
- IndexInfo *indexInfo,
- Snapshot snapshot,
- v_i_state *state)
+validate_index_scan_callbck(Relation indexRelation,
+ Datum *values,
+ bool *isnull,
+ ItemPointer rootTuple,
+ Relation heapRelation,
+ IndexInfo *indexInfo,
+ void *opaque)
{
- TableScanDesc sscan;
- HeapScanDesc scan;
- HeapTuple heapTuple;
- Datum values[INDEX_MAX_KEYS];
- bool isnull[INDEX_MAX_KEYS];
- ExprState *predicate;
- TupleTableSlot *slot;
- EState *estate;
- ExprContext *econtext;
- BlockNumber root_blkno = InvalidBlockNumber;
- OffsetNumber root_offsets[MaxHeapTuplesPerPage];
- bool in_index[MaxHeapTuplesPerPage];
-
- /* state variables for the merge */
- ItemPointer indexcursor = NULL;
- ItemPointerData decoded;
- bool tuplesort_empty = false;
-
- /*
- * sanity checks
- */
- Assert(OidIsValid(indexRelation->rd_rel->relam));
-
- /*
- * Need an EState for evaluation of index expressions and partial-index
- * predicates. Also a slot to hold the current tuple.
- */
- estate = CreateExecutorState();
- econtext = GetPerTupleExprContext(estate);
- slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
- &TTSOpsHeapTuple);
-
- /* Arrange for econtext's scan tuple to be the tuple under test */
- econtext->ecxt_scantuple = slot;
-
- /* Set up execution state for predicate, if any. */
- predicate = ExecPrepareQual(indexInfo->ii_Predicate, estate);
-
- /*
- * Prepare for scan of the base relation. We need just those tuples
- * satisfying the passed-in reference snapshot. We must disable syncscan
- * here, because it's critical that we read from block zero forward to
- * match the sorted TIDs.
- */
- sscan = table_beginscan_strat(heapRelation, /* relation */
- snapshot, /* snapshot */
- 0, /* number of keys */
- NULL, /* scan key */
- true, /* buffer access strategy OK */
- false); /* syncscan not OK */
- scan = (HeapScanDesc) sscan;
-
- /*
- * Scan all tuples matching the snapshot.
- */
- // PBORKED: slotify
- while ((heapTuple = heap_scan_getnext(sscan, ForwardScanDirection)) != NULL)
- {
- ItemPointer heapcursor = &heapTuple->t_self;
- ItemPointerData rootTuple;
- OffsetNumber root_offnum;
-
- CHECK_FOR_INTERRUPTS();
-
- state->htups += 1;
-
- /*
- * As commented in IndexBuildHeapScan, we should index heap-only
- * tuples under the TIDs of their root tuples; so when we advance onto
- * a new heap page, build a map of root item offsets on the page.
- *
- * This complicates merging against the tuplesort output: we will
- * visit the live tuples in order by their offsets, but the root
- * offsets that we need to compare against the index contents might be
- * ordered differently. So we might have to "look back" within the
- * tuplesort output, but only within the current page. We handle that
- * by keeping a bool array in_index[] showing all the
- * already-passed-over tuplesort output TIDs of the current page. We
- * clear that array here, when advancing onto a new heap page.
- */
- if (scan->rs_cblock != root_blkno)
- {
- Page page = BufferGetPage(scan->rs_cbuf);
-
- LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
- heap_get_root_tuples(page, root_offsets);
- LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
- memset(in_index, 0, sizeof(in_index));
-
- root_blkno = scan->rs_cblock;
- }
-
- /* Convert actual tuple TID to root TID */
- rootTuple = *heapcursor;
- root_offnum = ItemPointerGetOffsetNumber(heapcursor);
-
- if (HeapTupleIsHeapOnly(heapTuple))
- {
- root_offnum = root_offsets[root_offnum - 1];
- if (!OffsetNumberIsValid(root_offnum))
- ereport(ERROR,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
- ItemPointerGetBlockNumber(heapcursor),
- ItemPointerGetOffsetNumber(heapcursor),
- RelationGetRelationName(heapRelation))));
- ItemPointerSetOffsetNumber(&rootTuple, root_offnum);
- }
-
- /*
- * "merge" by skipping through the index tuples until we find or pass
- * the current root tuple.
- */
- while (!tuplesort_empty &&
- (!indexcursor ||
- ItemPointerCompare(indexcursor, &rootTuple) < 0))
- {
- Datum ts_val;
- bool ts_isnull;
-
- if (indexcursor)
- {
- /*
- * Remember index items seen earlier on the current heap page
- */
- if (ItemPointerGetBlockNumber(indexcursor) == root_blkno)
- in_index[ItemPointerGetOffsetNumber(indexcursor) - 1] = true;
- }
-
- tuplesort_empty = !tuplesort_getdatum(state->tuplesort, true,
- &ts_val, &ts_isnull, NULL);
- Assert(tuplesort_empty || !ts_isnull);
- if (!tuplesort_empty)
- {
- itemptr_decode(&decoded, DatumGetInt64(ts_val));
- indexcursor = &decoded;
-
- /* If int8 is pass-by-ref, free (encoded) TID Datum memory */
-#ifndef USE_FLOAT8_BYVAL
- pfree(DatumGetPointer(ts_val));
-#endif
- }
- else
- {
- /* Be tidy */
- indexcursor = NULL;
- }
- }
-
- /*
- * If the tuplesort has overshot *and* we didn't see a match earlier,
- * then this tuple is missing from the index, so insert it.
- */
- if ((tuplesort_empty ||
- ItemPointerCompare(indexcursor, &rootTuple) > 0) &&
- !in_index[root_offnum - 1])
- {
- MemoryContextReset(econtext->ecxt_per_tuple_memory);
-
- /* Set up for predicate or expression evaluation */
- ExecStoreHeapTuple(heapTuple, slot, false);
-
- /*
- * In a partial index, discard tuples that don't satisfy the
- * predicate.
- */
- if (predicate != NULL)
- {
- if (!ExecQual(predicate, econtext))
- continue;
- }
-
- /*
- * For the current heap tuple, extract all the attributes we use
- * in this index, and note which are null. This also performs
- * evaluation of any expressions needed.
- */
- FormIndexDatum(indexInfo,
- slot,
- estate,
- values,
- isnull);
-
- /*
- * You'd think we should go ahead and build the index tuple here,
- * but some index AMs want to do further processing on the data
- * first. So pass the values[] and isnull[] arrays, instead.
- */
-
- /*
- * If the tuple is already committed dead, you might think we
- * could suppress uniqueness checking, but this is no longer true
- * in the presence of HOT, because the insert is actually a proxy
- * for a uniqueness check on the whole HOT-chain. That is, the
- * tuple we have here could be dead because it was already
- * HOT-updated, and if so the updating transaction will not have
- * thought it should insert index entries. The index AM will
- * check the whole HOT-chain and correctly detect a conflict if
- * there is one.
- */
-
- index_insert(indexRelation,
- values,
- isnull,
- &rootTuple,
- heapRelation,
- indexInfo->ii_Unique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
- indexInfo);
-
- state->tups_inserted += 1;
- }
- }
-
- table_endscan(sscan);
-
- ExecDropSingleTupleTableSlot(slot);
-
- FreeExecutorState(estate);
-
- /* These may have been pointing to the now-gone estate */
- indexInfo->ii_ExpressionsState = NIL;
- indexInfo->ii_PredicateState = NULL;
+ v_i_state *state = (v_i_state *)opaque;
+
+ index_insert(indexRelation,
+ values,
+ isnull,
+ rootTuple,
+ heapRelation,
+ indexInfo->ii_Unique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
+ indexInfo);
+
+ state->tups_inserted += 1;
}
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 691f687ade..27bf57a486 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -173,6 +173,14 @@ typedef double (*IndexBuildRangeScan_function)(Relation heapRelation,
void *callback_state,
TableScanDesc scan);
+typedef uint64 (*ValidateIndexscan_function)(Relation heapRelation,
+ Relation indexRelation,
+ IndexInfo *indexInfo,
+ Snapshot snapshot,
+ Tuplesortstate *tuplesort,
+ IndexValidateCallback callback,
+ void *callback_state);
+
typedef bool (*BitmapPagescan_function)(TableScanDesc scan,
TBMIterateResult *tbmres);
@@ -236,6 +244,7 @@ typedef struct TableAmRoutine
IndexBuildRangeScan_function index_build_range_scan;
+ ValidateIndexscan_function validate_index_scan;
} TableAmRoutine;
static inline const TupleTableSlotOps*
@@ -691,6 +700,24 @@ table_index_build_range_scan(Relation heapRelation,
scan);
}
+static inline uint64
+table_validate_index(Relation heapRelation,
+ Relation indexRelation,
+ IndexInfo *indexInfo,
+ Snapshot snapshot,
+ Tuplesortstate *tuplesort,
+ IndexValidateCallback callback,
+ void *callback_state)
+{
+ return heapRelation->rd_tableamroutine->validate_index_scan(heapRelation,
+ indexRelation,
+ indexInfo,
+ snapshot,
+ tuplesort,
+ callback,
+ callback_state);
+}
+
extern BlockNumber table_parallelscan_nextpage(TableScanDesc scan);
extern void table_parallelscan_startblock_init(TableScanDesc scan);
extern Size table_parallelscan_estimate(Snapshot snapshot);
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 376907b616..874e956c8e 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -28,6 +28,15 @@ typedef void (*IndexBuildCallback) (Relation index,
bool tupleIsAlive,
void *state);
+/* Typedef for callback function for table_validate_index_scan */
+typedef void (*IndexValidateCallback) (Relation indexRelation,
+ Datum *values,
+ bool *isnull,
+ ItemPointer heap_t_ctid,
+ Relation heapRelation,
+ IndexInfo *indexInfo,
+ void *state);
+
/* Action code for index_set_state_flags */
typedef enum
{
@@ -37,6 +46,45 @@ typedef enum
INDEX_DROP_SET_DEAD
} IndexStateFlagsAction;
+/*
+ * itemptr_encode - Encode ItemPointer as int64/int8
+ *
+ * This representation must produce values encoded as int64 that sort in the
+ * same order as their corresponding original TID values would (using the
+ * default int8 opclass to produce a result equivalent to the default TID
+ * opclass).
+ *
+ * As noted in validate_index(), this can be significantly faster.
+ */
+static inline int64
+itemptr_encode(ItemPointer itemptr)
+{
+ BlockNumber block = ItemPointerGetBlockNumber(itemptr);
+ OffsetNumber offset = ItemPointerGetOffsetNumber(itemptr);
+ int64 encoded;
+
+ /*
+ * Use the 16 least significant bits for the offset. 32 adjacent bits are
+ * used for the block number. Since remaining bits are unused, there
+ * cannot be negative encoded values (We assume a two's complement
+ * representation).
+ */
+ encoded = ((uint64) block << 16) | (uint16) offset;
+
+ return encoded;
+}
+
+/*
+ * itemptr_decode - Decode int64/int8 representation back to ItemPointer
+ */
+static inline void
+itemptr_decode(ItemPointer itemptr, int64 encoded)
+{
+ BlockNumber block = (BlockNumber) (encoded >> 16);
+ OffsetNumber offset = (OffsetNumber) (encoded & 0xFFFF);
+
+ ItemPointerSet(itemptr, block, offset);
+}
extern void index_check_primary_key(Relation heapRel,
IndexInfo *indexInfo,
--
2.18.0.windows.1
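For reference, the sort-order property that the itemptr_encode/itemptr_decode pair (moved into index.h above) depends on is easy to check in isolation. Below is a minimal standalone sketch; DemoBlockNumber, DemoOffsetNumber and the demo_* functions are hypothetical stand-ins for the PostgreSQL types and functions, not the actual headers:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL's BlockNumber and OffsetNumber types. */
typedef uint32_t DemoBlockNumber;
typedef uint16_t DemoOffsetNumber;

/* Same scheme as itemptr_encode: block number in bits 16..47, offset in
 * bits 0..15; the top 16 bits stay zero, so the result is never negative
 * and int64 comparison matches (block, offset) lexicographic TID order. */
static int64_t
demo_itemptr_encode(DemoBlockNumber block, DemoOffsetNumber offset)
{
    return (int64_t) (((uint64_t) block << 16) | offset);
}

/* Inverse of demo_itemptr_encode, mirroring itemptr_decode. */
static void
demo_itemptr_decode(int64_t encoded, DemoBlockNumber *block,
                    DemoOffsetNumber *offset)
{
    *block = (DemoBlockNumber) (encoded >> 16);
    *offset = (DemoOffsetNumber) (encoded & 0xFFFF);
}
```

The round trip is lossless, and sorting the encoded int64 values with the default int8 opclass visits TIDs in the same order the default TID opclass would.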
Attachment: 0001-Movting-GUC-variable-declartion-to-proper-place.patch
From 02340b6422a324eddbaa096fbeef95ed3a4cd6df Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 27 Sep 2018 15:54:10 +1000
Subject: [PATCH 1/3] Moving GUC variable declaration to proper place
---
src/backend/access/heap/heapam.c | 3 ---
src/backend/access/table/tableam.c | 6 ++----
src/include/access/tableam.h | 1 +
3 files changed, 3 insertions(+), 7 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6d516ccc0b..bff7049214 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -80,9 +80,6 @@
#include "nodes/execnodes.h"
#include "executor/executor.h"
-/* GUC variable */
-bool synchronize_seqscans = true;
-
static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
TransactionId xid, CommandId cid, int options);
static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 96c5325ddb..af99264df9 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -15,10 +15,8 @@
#include "storage/bufmgr.h"
#include "storage/shmem.h"
-
-// PBORKED: move to header
-extern bool synchronize_seqscans;
-
+/* GUC variable */
+bool synchronize_seqscans = true;
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 5f6b39c0e0..d0a5f59aa9 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -48,6 +48,7 @@ typedef enum tuple_data_flags
} tuple_data_flags;
extern char *default_table_access_method;
+extern bool synchronize_seqscans;
/*
* Storage routine function hooks
--
2.18.0.windows.1
Attachment: 0002-check_default_table_access_method-hook-to-verify-the.patch
From 8df9ac913811220bbcbfefa57e2204362c481cb9 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 27 Sep 2018 15:54:49 +1000
Subject: [PATCH 2/3] check_default_table_access_method hook to verify the
access method
---
src/backend/access/table/tableamapi.c | 88 +++++++++++++++++++++++++++
src/backend/utils/misc/guc.c | 3 +-
src/include/access/tableam.h | 4 ++
3 files changed, 93 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 7f08500ef4..d4eb889bd2 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -14,11 +14,14 @@
#include "access/htup_details.h"
#include "access/tableam.h"
+#include "access/xact.h"
#include "catalog/pg_am.h"
#include "catalog/pg_proc.h"
+#include "utils/fmgroids.h"
#include "utils/syscache.h"
#include "utils/memutils.h"
+static Oid get_table_am_oid(const char *tableamname, bool missing_ok);
TupleTableSlot*
table_gimmegimmeslot(Relation relation, List **reglist)
@@ -97,3 +100,88 @@ GetTableAmRoutineByAmId(Oid amoid)
/* And finally, call the handler function to get the API struct. */
return GetTableAmRoutine(amhandler);
}
+
+/*
+ * get_table_am_oid - given a table access method name, look up the OID
+ *
+ * If missing_ok is false, throw an error if table access method name not
+ * found. If true, just return InvalidOid.
+ */
+static Oid
+get_table_am_oid(const char *tableamname, bool missing_ok)
+{
+ Oid result;
+ Relation rel;
+ TableScanDesc scandesc;
+ HeapTuple tuple;
+ ScanKeyData entry[1];
+
+ /*
+ * Search pg_am. We use a heapscan here even though there is an
+ * index on name, on the theory that pg_am will usually have just
+ * a few entries and so an indexed lookup is a waste of effort.
+ */
+ rel = heap_open(AccessMethodRelationId, AccessShareLock);
+
+ ScanKeyInit(&entry[0],
+ Anum_pg_am_amname,
+ BTEqualStrategyNumber, F_NAMEEQ,
+ CStringGetDatum(tableamname));
+ scandesc = table_beginscan_catalog(rel, 1, entry);
+ tuple = heap_scan_getnext(scandesc, ForwardScanDirection);
+
+ /* We assume that there can be at most one matching tuple */
+ if (HeapTupleIsValid(tuple) &&
+ ((Form_pg_am) GETSTRUCT(tuple))->amtype == AMTYPE_TABLE)
+ result = HeapTupleGetOid(tuple);
+ else
+ result = InvalidOid;
+
+ table_endscan(scandesc);
+ heap_close(rel, AccessShareLock);
+
+ if (!OidIsValid(result) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("table access method \"%s\" does not exist",
+ tableamname)));
+
+ return result;
+}
+
+/* check_hook: validate new default_table_access_method */
+bool
+check_default_table_access_method(char **newval, void **extra, GucSource source)
+{
+ /*
+ * If we aren't inside a transaction, we cannot do database access so
+ * cannot verify the name. Must accept the value on faith.
+ */
+ if (IsTransactionState())
+ {
+ if (**newval != '\0' &&
+ !OidIsValid(get_table_am_oid(*newval, true)))
+ {
+ /*
+ * When source == PGC_S_TEST, don't throw a hard error for a
+ * nonexistent table access method, only a NOTICE.
+ * See comments in guc.h.
+ */
+ if (source == PGC_S_TEST)
+ {
+ ereport(NOTICE,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("table access method \"%s\" does not exist",
+ *newval)));
+ }
+ else
+ {
+ GUC_check_errdetail("Table access method \"%s\" does not exist.",
+ *newval);
+ return false;
+ }
+ }
+ }
+
+ return true;
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3b996c8088..94b135a48b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3336,8 +3336,7 @@ static struct config_string ConfigureNamesString[] =
},
&default_table_access_method,
DEFAULT_TABLE_ACCESS_METHOD,
- /* PBORKED: a check hook would be good */
- NULL, NULL, NULL
+ check_default_table_access_method, NULL, NULL
},
{
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index d0a5f59aa9..691f687ade 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -23,6 +23,7 @@
#include "nodes/execnodes.h"
#include "nodes/nodes.h"
#include "fmgr.h"
+#include "utils/guc.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
#include "utils/snapshot.h"
@@ -714,4 +715,7 @@ extern const TableAmRoutine * GetTableAmRoutine(Oid amhandler);
extern const TableAmRoutine * GetTableAmRoutineByAmId(Oid amoid);
extern const TableAmRoutine * GetHeapamTableAmRoutine(void);
+extern bool check_default_table_access_method(char **newval, void **extra,
+ GucSource source);
+
#endif /* TABLEAM_H */
--
2.18.0.windows.1
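The check hook in the patch above follows the standard GUC check-hook shape. Here is a minimal sketch of just the decision logic, with hypothetical demo_* stand-ins for IsTransactionState() and get_table_am_oid():

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for IsTransactionState(). */
static bool demo_in_transaction;

/* Hypothetical stand-in for the get_table_am_oid() catalog lookup. */
static bool
demo_am_exists(const char *name)
{
    return strcmp(name, "heap") == 0;
}

/* Mirrors the control flow of check_default_table_access_method: outside a
 * transaction the value is accepted on faith (no catalog access possible);
 * inside one, an empty string or a known AM passes, anything else fails. */
static bool
demo_check_table_am(const char *newval)
{
    if (demo_in_transaction)
    {
        if (newval[0] != '\0' && !demo_am_exists(newval))
            return false;       /* the real hook sets GUC_check_errdetail */
    }
    return true;
}
```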
On 2018-09-28 12:21:08 +1000, Haribabu Kommi wrote:
> Here I attached further cleanup patches.
> 1. Re-arrange the GUC variable
> 2. Added a check function hook for default_table_access_method GUC

Cool.

> 3. Added a new hook validate_index. I tried to change the function
> validate_index_heapscan to slotify, but that have many problems as it
> is accessing some internals of the heapscandesc structure and accessing
> the buffer and etc.
Oops, I also did that locally, in a way. I also made validate a
callback, as the validation logic is going to be specific to the AMs.
Sorry for not pushing that up earlier. I'll try to do that soon,
there's a fair amount of change.
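
The division of labour described here - an AM-specific scan driving an AM-agnostic insertion callback - can be sketched in miniature. All Demo* names below are hypothetical; the real ValidateIndexscan_function and IndexValidateCallback signatures carry relations, snapshots and tuplesort state rather than plain ints:

```c
#include <assert.h>
#include <stddef.h>

/* Plain ints stand in for TIDs in this reduced sketch. */
typedef void (*DemoValidateCallback) (int tid, void *state);

typedef struct DemoAmRoutine
{
    /* AM-specific scan: visits each tuple missing from the index and
     * invokes the AM-agnostic callback for it. */
    void (*validate_scan) (const int *tids, int ntids,
                           DemoValidateCallback callback, void *state);
} DemoAmRoutine;

typedef struct DemoValidateState
{
    int tups_inserted;
} DemoValidateState;

/* "heap"-flavoured implementation of the scan member. */
static void
demo_heap_validate_scan(const int *tids, int ntids,
                        DemoValidateCallback callback, void *state)
{
    for (int i = 0; i < ntids; i++)
        callback(tids[i], state);
}

/* Analogue of validate_index_scan_callback: all it knows is how to
 * insert and count, nothing about buffers or HOT chains. */
static void
demo_insert_callback(int tid, void *opaque)
{
    DemoValidateState *state = (DemoValidateState *) opaque;

    (void) tid;                 /* a real callback would index_insert here */
    state->tups_inserted++;
}

static const DemoAmRoutine demo_heap_am = {
    .validate_scan = demo_heap_validate_scan
};
```

The point of the split is that the buffer locking and root-tuple mapping stay inside the AM's scan, while the callback remains reusable across AMs.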
Greetings,
Andres Freund
On 2018-09-27 20:03:58 -0700, Andres Freund wrote:
> On 2018-09-28 12:21:08 +1000, Haribabu Kommi wrote:
> > Here I attached further cleanup patches.
> > 1. Re-arrange the GUC variable
> > 2. Added a check function hook for default_table_access_method GUC
>
> Cool.
>
> > 3. Added a new hook validate_index. I tried to change the function
> > validate_index_heapscan to slotify, but that have many problems as it
> > is accessing some internals of the heapscandesc structure and accessing
> > the buffer and etc.
>
> Oops, I also did that locally, in a way. I also made validate a
> callback, as the validation logic is going to be specific to the AMs.
> Sorry for not pushing that up earlier. I'll try to do that soon,
> there's a fair amount of change.
I've pushed an updated version, with a fair amount of pending changes,
and I hope all your pending (and not redundant, by our concurrent
development), patches merged.
There's currently 3 regression test failures, that I'll look into
tomorrow:
- partition_prune shows a few additional Heap Blocks: exact=1 lines. I'm
a bit confused as to why, but haven't really investigated yet.
- fast_default fails, because I've undone most of 7636e5c60fea83a9f3c,
I'll have to redo that in a different way.
- I occasionally see failures in aggregates.sql - I've not figured out
what's going on there.
Amit Khandekar said he'll publish a new version of the slot-abstraction
patch tomorrow, so I'll rebase it onto that ASAP.
My next planned steps are a) to try to commit parts of the
slot-abstraction work b) to try to break out a few more pieces out of
the large pluggable storage patch.
Greetings,
Andres Freund
On Wed, Oct 3, 2018 at 3:16 PM Andres Freund <andres@anarazel.de> wrote:
> On 2018-09-27 20:03:58 -0700, Andres Freund wrote:
> > On 2018-09-28 12:21:08 +1000, Haribabu Kommi wrote:
> > > Here I attached further cleanup patches.
> > > 1. Re-arrange the GUC variable
> > > 2. Added a check function hook for default_table_access_method GUC
> >
> > Cool.
> >
> > > 3. Added a new hook validate_index. I tried to change the function
> > > validate_index_heapscan to slotify, but that have many problems as it
> > > is accessing some internals of the heapscandesc structure and accessing
> > > the buffer and etc.
> >
> > Oops, I also did that locally, in a way. I also made validate a
> > callback, as the validation logic is going to be specific to the AMs.
> > Sorry for not pushing that up earlier. I'll try to do that soon,
> > there's a fair amount of change.
>
> I've pushed an updated version, with a fair amount of pending changes,
> and I hope all your pending (and not redundant, by our concurrent
> development), patches merged.
Yes, all the patches are merged.
> There's currently 3 regression test failures, that I'll look into
> tomorrow:
> - partition_prune shows a few additional Heap Blocks: exact=1 lines. I'm
>   a bit confused as to why, but haven't really investigated yet.
> - fast_default fails, because I've undone most of 7636e5c60fea83a9f3c,
>   I'll have to redo that in a different way.
> - I occasionally see failures in aggregates.sql - I've not figured out
>   what's going on there.
I also observed the failure of aggregates.sql, will look into it.
> Amit Khandekar said he'll publish a new version of the slot-abstraction
> patch tomorrow, so I'll rebase it onto that ASAP.
OK.
Here I attached two new API patches.
1. Set New Rel File node
2. Create Init fork
There is another patch, "External Relations", in the older patch set
that is not included in the current git tree. That patch lets
extensions create external relations for their internal purposes
(e.g. columnar relations for a columnar storage engine). This new
relkind can be used for those relations, and this way it provides a
distinction between normal and columnar relations. Do you have any
other idea for supporting that type of relation?
I also want to add a new API to heap_create_with_catalog to let a
pluggable storage engine create additional relations. This API is not
required for every storage engine, so instead of turning the entire
function into an API, how about adding an API at the end of the
function that is called only when it is set, like hook functions? If a
storage engine needs none of the heap_create_with_catalog
functionality, then making it a full API would be better.
Comments?
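
The "call it only when it is set, like hook functions" idea can be sketched as follows. All names here are hypothetical, and the real heap_create_with_catalog tail would pass far richer state than this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: rather than turning all of heap_create_with_catalog
 * into an AM API, the routine table gets an optional hook that is called
 * only when the AM sets it. */
typedef struct DemoCatalogAmRoutine
{
    /* May be NULL: AMs needing no extra catalog work simply leave it unset. */
    void (*create_extra_catalog_entries) (unsigned relid, void *state);
} DemoCatalogAmRoutine;

static int demo_extra_calls;

static void
demo_columnar_extra(unsigned relid, void *state)
{
    (void) relid;
    (void) state;
    demo_extra_calls++;         /* e.g. create auxiliary columnar relations */
}

/* Tail of a demo heap_create_with_catalog: common work, then optional hook. */
static unsigned
demo_create_with_catalog(const DemoCatalogAmRoutine *am)
{
    unsigned relid = 16384;     /* pretend the catalog insertion happened */

    if (am->create_extra_catalog_entries != NULL)
        am->create_extra_catalog_entries(relid, NULL);
    return relid;
}
```

With this shape the heap AM leaves the member NULL and pays nothing, while a columnar AM gets its extra relations created inside the same transactional code path.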
> My next planned steps are a) to try to commit parts of the
> slot-abstraction work b) to try to break out a few more pieces out of
> the large pluggable storage patch.
OK. Let me know your views on which pieces are stable,
so that I can separate them from the larger patch.
Regards,
Haribabu Kommi
Fujitsu Australia
Attachments:
0002-init-fork-API.patch
From 3f1340364236b22f5a2b505e359083494b276b95 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 8 Oct 2018 15:08:59 +1100
Subject: [PATCH 2/2] init fork API
API to create INIT_FORKNUM file with wrapper
table_create_init_fork.
---
src/backend/access/heap/heapam_handler.c | 26 +++++++++++++++++++++++-
src/backend/catalog/heap.c | 24 ++--------------------
src/backend/commands/tablecmds.c | 4 ++--
src/include/access/tableam.h | 8 ++++++++
src/include/catalog/heap.h | 2 --
5 files changed, 37 insertions(+), 27 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 313ed319fc..87d3331ba1 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
#include "catalog/indexing.h"
#include "catalog/pg_am_d.h"
#include "catalog/storage.h"
+#include "catalog/storage_xlog.h"
#include "executor/executor.h"
#include "pgstat.h"
#include "storage/lmgr.h"
@@ -2240,6 +2241,28 @@ RelationSetNewRelfilenode(Relation relation, char persistence,
EOXactListAdd(relation);
}
+/*
+ * Set up an init fork for an unlogged table so that it can be correctly
+ * reinitialized on restart. An immediate sync is required even if the
+ * page has been logged, because the write did not go through
+ * shared_buffers and therefore a concurrent checkpoint may have moved
+ * the redo pointer past our xlog record. Recovery may as well remove it
+ * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
+ * record. Therefore, logging is necessary even if wal_level=minimal.
+ */
+static void
+heap_create_init_fork(Relation rel)
+{
+ Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
+ rel->rd_rel->relkind == RELKIND_MATVIEW ||
+ rel->rd_rel->relkind == RELKIND_TOASTVALUE);
+ RelationOpenSmgr(rel);
+ smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
+ log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
+ smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
+}
+
+
static const TableAmRoutine heapam_methods = {
.type = T_TableAmRoutine,
@@ -2289,7 +2312,8 @@ static const TableAmRoutine heapam_methods = {
.index_validate_scan = validate_index_heapscan,
- .SetNewFileNode = RelationSetNewRelfilenode
+ .SetNewFileNode = RelationSetNewRelfilenode,
+ .CreateInitFork = heap_create_init_fork
};
const TableAmRoutine *
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 38b368f916..8e7c8ce684 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -32,6 +32,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/sysattr.h"
+#include "access/tableam.h"
#include "access/transam.h"
#include "access/xact.h"
#include "access/xlog.h"
@@ -1425,7 +1426,7 @@ heap_create_with_catalog(const char *relname,
*/
if (relpersistence == RELPERSISTENCE_UNLOGGED &&
relkind != RELKIND_PARTITIONED_TABLE)
- heap_create_init_fork(new_rel_desc);
+ table_create_init_fork(new_rel_desc);
/*
* ok, the relation has been cataloged, so close our relations and return
@@ -1437,27 +1438,6 @@ heap_create_with_catalog(const char *relname,
return relid;
}
-/*
- * Set up an init fork for an unlogged table so that it can be correctly
- * reinitialized on restart. An immediate sync is required even if the
- * page has been logged, because the write did not go through
- * shared_buffers and therefore a concurrent checkpoint may have moved
- * the redo pointer past our xlog record. Recovery may as well remove it
- * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
- * record. Therefore, logging is necessary even if wal_level=minimal.
- */
-void
-heap_create_init_fork(Relation rel)
-{
- Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
- rel->rd_rel->relkind == RELKIND_MATVIEW ||
- rel->rd_rel->relkind == RELKIND_TOASTVALUE);
- RelationOpenSmgr(rel);
- smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
- log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
- smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
-}
-
/*
* RelationRemoveInheritance
*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6db214309e..e107afc786 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1647,7 +1647,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
table_set_new_filenode(rel, rel->rd_rel->relpersistence,
RecentXmin, minmulti);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
- heap_create_init_fork(rel);
+ table_create_init_fork(rel);
heap_relid = RelationGetRelid(rel);
toast_relid = rel->rd_rel->reltoastrelid;
@@ -1661,7 +1661,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
table_set_new_filenode(rel, rel->rd_rel->relpersistence,
RecentXmin, minmulti);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
- heap_create_init_fork(rel);
+ table_create_init_fork(rel);
heap_close(rel, NoLock);
}
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 4d5b11c294..f3e36368db 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -196,6 +196,7 @@ typedef bool (*SampleScanNextTuple_function)(TableScanDesc scan, struct SampleSc
typedef void (*SetNewFileNode_function)(Relation relation, char persistence,
TransactionId freezeXid, MultiXactId minmulti);
+typedef void (*CreateInitFork_function)(Relation rel);
/*
* API struct for a table AM. Note this must be allocated in a
@@ -255,6 +256,7 @@ typedef struct TableAmRoutine
IndexValidateScan_function index_validate_scan;
SetNewFileNode_function SetNewFileNode;
+ CreateInitFork_function CreateInitFork;
} TableAmRoutine;
static inline const TupleTableSlotOps*
@@ -754,6 +756,12 @@ table_set_new_filenode(Relation relation, char persistence,
freezeXid, minmulti);
}
+static inline void
+table_create_init_fork(Relation relation)
+{
+ relation->rd_tableamroutine->CreateInitFork(relation);
+}
+
extern BlockNumber table_parallelscan_nextpage(TableScanDesc scan);
extern void table_parallelscan_startblock_init(TableScanDesc scan);
extern Size table_parallelscan_estimate(Snapshot snapshot);
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 4584b3473c..c0e706ecc9 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -77,8 +77,6 @@ extern Oid heap_create_with_catalog(const char *relname,
Oid relrewrite,
ObjectAddress *typaddress);
-extern void heap_create_init_fork(Relation rel);
-
extern void heap_drop_with_catalog(Oid relid);
extern void heap_truncate(List *relids);
--
2.18.0.windows.1
0001-New-API-setNewfilenode.patch
From d8489cf06b9cd186f5dac801879e604bb330f79a Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 8 Oct 2018 14:33:49 +1100
Subject: [PATCH 1/2] New API setNewfilenode
This API can be used to set the filenode of a relation.
The wrapper function for this API is table_set_new_filenode,
using of it for sequence and index to create storage. The
wrapper function name can be updated if required.
---
src/backend/access/heap/heapam_handler.c | 128 ++++++++++++++++++++-
src/backend/catalog/index.c | 2 +-
src/backend/commands/sequence.c | 5 +-
src/backend/commands/tablecmds.c | 6 +-
src/backend/utils/cache/relcache.c | 135 ++---------------------
src/include/access/tableam.h | 13 +++
src/include/utils/relcache.h | 9 +-
7 files changed, 157 insertions(+), 141 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91f..313ed319fc 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -22,6 +22,7 @@
#include "miscadmin.h"
+#include "access/multixact.h"
#include "access/heapam.h"
#include "access/relscan.h"
#include "access/rewriteheap.h"
@@ -29,12 +30,17 @@
#include "access/tsmapi.h"
#include "catalog/catalog.h"
#include "catalog/index.h"
+#include "catalog/indexing.h"
#include "catalog/pg_am_d.h"
+#include "catalog/storage.h"
#include "executor/executor.h"
#include "pgstat.h"
#include "storage/lmgr.h"
#include "utils/builtins.h"
#include "utils/rel.h"
+#include "utils/relcache.h"
+#include "utils/relmapper.h"
+#include "utils/syscache.h"
#include "utils/tqual.h"
#include "storage/bufpage.h"
#include "storage/bufmgr.h"
@@ -2116,6 +2122,124 @@ heap_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
pfree(isnull);
}
+/*
+ * RelationSetNewRelfilenode
+ *
+ * Assign a new relfilenode (physical file name) to the relation.
+ *
+ * This allows a full rewrite of the relation to be done with transactional
+ * safety (since the filenode assignment can be rolled back). Note however
+ * that there is no simple way to access the relation's old data for the
+ * remainder of the current transaction. This limits the usefulness to cases
+ * such as TRUNCATE or rebuilding an index from scratch.
+ *
+ * Caller must already hold exclusive lock on the relation.
+ *
+ * The relation is marked with relfrozenxid = freezeXid (InvalidTransactionId
+ * must be passed for indexes and sequences). This should be a lower bound on
+ * the XIDs that will be put into the new relation contents.
+ *
+ * The new filenode's persistence is set to the given value. This is useful
+ * for the cases that are changing the relation's persistence; other callers
+ * need to pass the original relpersistence value.
+ */
+static void
+RelationSetNewRelfilenode(Relation relation, char persistence,
+ TransactionId freezeXid, MultiXactId minmulti)
+{
+ Oid newrelfilenode;
+ RelFileNodeBackend newrnode;
+ Relation pg_class;
+ HeapTuple tuple;
+ Form_pg_class classform;
+
+ /* Indexes, sequences must have Invalid frozenxid; other rels must not */
+ Assert((relation->rd_rel->relkind == RELKIND_INDEX ||
+ relation->rd_rel->relkind == RELKIND_SEQUENCE) ?
+ freezeXid == InvalidTransactionId :
+ TransactionIdIsNormal(freezeXid));
+ Assert(TransactionIdIsNormal(freezeXid) == MultiXactIdIsValid(minmulti));
+
+ /* Allocate a new relfilenode */
+ newrelfilenode = GetNewRelFileNode(relation->rd_rel->reltablespace, NULL,
+ persistence);
+
+ /*
+ * Get a writable copy of the pg_class tuple for the given relation.
+ */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(RelationGetRelid(relation)));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "could not find tuple for relation %u",
+ RelationGetRelid(relation));
+ classform = (Form_pg_class) GETSTRUCT(tuple);
+
+ /*
+ * Create storage for the main fork of the new relfilenode.
+ *
+ * NOTE: any conflict in relfilenode value will be caught here, if
+ * GetNewRelFileNode messes up for any reason.
+ */
+ newrnode.node = relation->rd_node;
+ newrnode.node.relNode = newrelfilenode;
+ newrnode.backend = relation->rd_backend;
+ RelationCreateStorage(newrnode.node, persistence);
+ smgrclosenode(newrnode);
+
+ /*
+ * Schedule unlinking of the old storage at transaction commit.
+ */
+ RelationDropStorage(relation);
+
+ /*
+ * Now update the pg_class row. However, if we're dealing with a mapped
+ * index, pg_class.relfilenode doesn't change; instead we have to send the
+ * update to the relation mapper.
+ */
+ if (RelationIsMapped(relation))
+ RelationMapUpdateMap(RelationGetRelid(relation),
+ newrelfilenode,
+ relation->rd_rel->relisshared,
+ false);
+ else
+ classform->relfilenode = newrelfilenode;
+
+ /* These changes are safe even for a mapped relation */
+ if (relation->rd_rel->relkind != RELKIND_SEQUENCE)
+ {
+ classform->relpages = 0; /* it's empty until further notice */
+ classform->reltuples = 0;
+ classform->relallvisible = 0;
+ }
+ classform->relfrozenxid = freezeXid;
+ classform->relminmxid = minmulti;
+ classform->relpersistence = persistence;
+
+ CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+
+ heap_freetuple(tuple);
+
+ heap_close(pg_class, RowExclusiveLock);
+
+ /*
+ * Make the pg_class row change visible, as well as the relation map
+ * change if any. This will cause the relcache entry to get updated, too.
+ */
+ CommandCounterIncrement();
+
+ /*
+ * Mark the rel as having been given a new relfilenode in the current
+ * (sub) transaction. This is a hint that can be used to optimize later
+ * operations on the rel in the same transaction.
+ */
+ relation->rd_newRelfilenodeSubid = GetCurrentSubTransactionId();
+
+ /* Flag relation as needing eoxact cleanup (to remove the hint) */
+ EOXactListAdd(relation);
+}
+
static const TableAmRoutine heapam_methods = {
.type = T_TableAmRoutine,
@@ -2163,7 +2287,9 @@ static const TableAmRoutine heapam_methods = {
.index_build_range_scan = IndexBuildHeapRangeScan,
- .index_validate_scan = validate_index_heapscan
+ .index_validate_scan = validate_index_heapscan,
+
+ .SetNewFileNode = RelationSetNewRelfilenode
};
const TableAmRoutine *
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 55477bd995..df213dc07d 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2865,7 +2865,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
}
/* We'll build a new physical relation for the index */
- RelationSetNewRelfilenode(iRel, persistence, InvalidTransactionId,
+ table_set_new_filenode(iRel, persistence, InvalidTransactionId,
InvalidMultiXactId);
/* Initialize the index and rebuild */
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 89122d4ad7..107f9a0176 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -17,6 +17,7 @@
#include "access/bufmask.h"
#include "access/htup_details.h"
#include "access/multixact.h"
+#include "access/tableam.h"
#include "access/transam.h"
#include "access/xact.h"
#include "access/xlog.h"
@@ -315,7 +316,7 @@ ResetSequence(Oid seq_relid)
* sequence's relfrozenxid at 0, since it won't contain any unfrozen XIDs.
* Same with relminmxid, since a sequence will never contain multixacts.
*/
- RelationSetNewRelfilenode(seq_rel, seq_rel->rd_rel->relpersistence,
+ table_set_new_filenode(seq_rel, seq_rel->rd_rel->relpersistence,
InvalidTransactionId, InvalidMultiXactId);
/*
@@ -485,7 +486,7 @@ AlterSequence(ParseState *pstate, AlterSeqStmt *stmt)
* at 0, since it won't contain any unfrozen XIDs. Same with
* relminmxid, since a sequence will never contain multixacts.
*/
- RelationSetNewRelfilenode(seqrel, seqrel->rd_rel->relpersistence,
+ table_set_new_filenode(seqrel, seqrel->rd_rel->relpersistence,
InvalidTransactionId, InvalidMultiXactId);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f3526b267d..6db214309e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1643,10 +1643,8 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
* Create a new empty storage file for the relation, and assign it
* as the relfilenode value. The old storage file is scheduled for
* deletion at commit.
- *
- * PBORKED: needs to be a callback
*/
- RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
+ table_set_new_filenode(rel, rel->rd_rel->relpersistence,
RecentXmin, minmulti);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
heap_create_init_fork(rel);
@@ -1660,7 +1658,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
if (OidIsValid(toast_relid))
{
rel = relation_open(toast_relid, AccessExclusiveLock);
- RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
+ table_set_new_filenode(rel, rel->rd_rel->relpersistence,
RecentXmin, minmulti);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
heap_create_init_fork(rel);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 0d6e5a189f..0592fdc750 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -160,13 +160,6 @@ static Oid eoxact_list[MAX_EOXACT_LIST];
static int eoxact_list_len = 0;
static bool eoxact_list_overflowed = false;
-#define EOXactListAdd(rel) \
- do { \
- if (eoxact_list_len < MAX_EOXACT_LIST) \
- eoxact_list[eoxact_list_len++] = (rel)->rd_id; \
- else \
- eoxact_list_overflowed = true; \
- } while (0)
/*
* EOXactTupleDescArray stores TupleDescs that (might) need AtEOXact
@@ -292,6 +285,14 @@ static void unlink_initfile(const char *initfilename, int elevel);
static bool equalPartitionDescs(PartitionKey key, PartitionDesc partdesc1,
PartitionDesc partdesc2);
+void
+EOXactListAdd(Relation rel)
+{
+ if (eoxact_list_len < MAX_EOXACT_LIST)
+ eoxact_list[eoxact_list_len++] = (rel)->rd_id;
+ else
+ eoxact_list_overflowed = true;
+}
/*
* ScanPgRelation
@@ -3392,126 +3393,6 @@ RelationBuildLocalRelation(const char *relname,
return rel;
}
-
-/*
- * RelationSetNewRelfilenode
- *
- * Assign a new relfilenode (physical file name) to the relation.
- *
- * This allows a full rewrite of the relation to be done with transactional
- * safety (since the filenode assignment can be rolled back). Note however
- * that there is no simple way to access the relation's old data for the
- * remainder of the current transaction. This limits the usefulness to cases
- * such as TRUNCATE or rebuilding an index from scratch.
- *
- * Caller must already hold exclusive lock on the relation.
- *
- * The relation is marked with relfrozenxid = freezeXid (InvalidTransactionId
- * must be passed for indexes and sequences). This should be a lower bound on
- * the XIDs that will be put into the new relation contents.
- *
- * The new filenode's persistence is set to the given value. This is useful
- * for the cases that are changing the relation's persistence; other callers
- * need to pass the original relpersistence value.
- */
-void
-RelationSetNewRelfilenode(Relation relation, char persistence,
- TransactionId freezeXid, MultiXactId minmulti)
-{
- Oid newrelfilenode;
- RelFileNodeBackend newrnode;
- Relation pg_class;
- HeapTuple tuple;
- Form_pg_class classform;
-
- /* Indexes, sequences must have Invalid frozenxid; other rels must not */
- Assert((relation->rd_rel->relkind == RELKIND_INDEX ||
- relation->rd_rel->relkind == RELKIND_SEQUENCE) ?
- freezeXid == InvalidTransactionId :
- TransactionIdIsNormal(freezeXid));
- Assert(TransactionIdIsNormal(freezeXid) == MultiXactIdIsValid(minmulti));
-
- /* Allocate a new relfilenode */
- newrelfilenode = GetNewRelFileNode(relation->rd_rel->reltablespace, NULL,
- persistence);
-
- /*
- * Get a writable copy of the pg_class tuple for the given relation.
- */
- pg_class = heap_open(RelationRelationId, RowExclusiveLock);
-
- tuple = SearchSysCacheCopy1(RELOID,
- ObjectIdGetDatum(RelationGetRelid(relation)));
- if (!HeapTupleIsValid(tuple))
- elog(ERROR, "could not find tuple for relation %u",
- RelationGetRelid(relation));
- classform = (Form_pg_class) GETSTRUCT(tuple);
-
- /*
- * Create storage for the main fork of the new relfilenode.
- *
- * NOTE: any conflict in relfilenode value will be caught here, if
- * GetNewRelFileNode messes up for any reason.
- */
- newrnode.node = relation->rd_node;
- newrnode.node.relNode = newrelfilenode;
- newrnode.backend = relation->rd_backend;
- RelationCreateStorage(newrnode.node, persistence);
- smgrclosenode(newrnode);
-
- /*
- * Schedule unlinking of the old storage at transaction commit.
- */
- RelationDropStorage(relation);
-
- /*
- * Now update the pg_class row. However, if we're dealing with a mapped
- * index, pg_class.relfilenode doesn't change; instead we have to send the
- * update to the relation mapper.
- */
- if (RelationIsMapped(relation))
- RelationMapUpdateMap(RelationGetRelid(relation),
- newrelfilenode,
- relation->rd_rel->relisshared,
- false);
- else
- classform->relfilenode = newrelfilenode;
-
- /* These changes are safe even for a mapped relation */
- if (relation->rd_rel->relkind != RELKIND_SEQUENCE)
- {
- classform->relpages = 0; /* it's empty until further notice */
- classform->reltuples = 0;
- classform->relallvisible = 0;
- }
- classform->relfrozenxid = freezeXid;
- classform->relminmxid = minmulti;
- classform->relpersistence = persistence;
-
- CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
-
- heap_freetuple(tuple);
-
- heap_close(pg_class, RowExclusiveLock);
-
- /*
- * Make the pg_class row change visible, as well as the relation map
- * change if any. This will cause the relcache entry to get updated, too.
- */
- CommandCounterIncrement();
-
- /*
- * Mark the rel as having been given a new relfilenode in the current
- * (sub) transaction. This is a hint that can be used to optimize later
- * operations on the rel in the same transaction.
- */
- relation->rd_newRelfilenodeSubid = GetCurrentSubTransactionId();
-
- /* Flag relation as needing eoxact cleanup (to remove the hint) */
- EOXactListAdd(relation);
-}
-
-
/*
* RelationCacheInitialize
*
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7fe6ff6c22..4d5b11c294 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -194,6 +194,9 @@ struct SampleScanState;
typedef bool (*SampleScanNextBlock_function)(TableScanDesc scan, struct SampleScanState *scanstate);
typedef bool (*SampleScanNextTuple_function)(TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
+typedef void (*SetNewFileNode_function)(Relation relation, char persistence,
+ TransactionId freezeXid, MultiXactId minmulti);
+
/*
* API struct for a table AM. Note this must be allocated in a
* server-lifetime manner, typically as a static const struct.
@@ -250,6 +253,8 @@ typedef struct TableAmRoutine
IndexBuildRangeScan_function index_build_range_scan;
IndexValidateScan_function index_validate_scan;
+
+ SetNewFileNode_function SetNewFileNode;
} TableAmRoutine;
static inline const TupleTableSlotOps*
@@ -741,6 +746,14 @@ table_index_build_range_scan(Relation heapRelation,
scan);
}
+static inline void
+table_set_new_filenode(Relation relation, char persistence,
+ TransactionId freezeXid, MultiXactId minmulti)
+{
+ relation->rd_tableamroutine->SetNewFileNode(relation, persistence,
+ freezeXid, minmulti);
+}
+
extern BlockNumber table_parallelscan_nextpage(TableScanDesc scan);
extern void table_parallelscan_startblock_init(TableScanDesc scan);
extern Size table_parallelscan_estimate(Snapshot snapshot);
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 858a7b30d2..1482dae904 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -33,6 +33,9 @@ typedef struct RelationData *Relation;
*/
typedef Relation *RelationPtr;
+/* Function to store the cleanup OIDs */
+extern void EOXactListAdd(Relation rel);
+
/*
* Routines to open (lookup) and close a relcache entry
*/
@@ -109,12 +112,6 @@ extern Relation RelationBuildLocalRelation(const char *relname,
char relpersistence,
char relkind);
-/*
- * Routine to manage assignment of new relfilenode to a relation
- */
-extern void RelationSetNewRelfilenode(Relation relation, char persistence,
- TransactionId freezeXid, MultiXactId minmulti);
-
/*
* Routines for flushing/rebuilding relcache entries in various scenarios
*/
--
2.18.0.windows.1
Hi!
On Wed, Oct 3, 2018 at 8:16 AM Andres Freund <andres@anarazel.de> wrote:
I've pushed an updated version with a fair amount of pending changes,
and I hope I've merged all your pending patches (those not made
redundant by our concurrent development).
I'd also like to share some patches. I've used the current state of
pluggable-zheap as the base for them.
* 0001-remove-extra-snapshot-functions.patch – removes the
snapshot_satisfiesUpdate() and snapshot_satisfiesVacuum() functions
from the tableam API. snapshot_satisfiesUpdate() was completely unused,
and snapshot_satisfiesVacuum() was used only in heap_copy_for_cluster(),
so I've replaced it with a direct heapam_satisfies_vacuum() call.
* 0002-add-costing-function-to-API.patch – adds functions for costing
sequential scans and table sample scans to the tableam API. The zheap
costing functions are for now copies of the heap ones; that should be
adjusted in the future. Estimation of heap lookups during index scans
should also be pluggable, but that is not yet implemented (TODO).
I've examined the code in the pluggable-zheap branch and on the EDB
github [1], and I didn't find anything related to "delete-marking"
indexes as described on slide #25 of the presentation [2]. So,
basically, the contract between heap and indexes remains unchanged:
once you update one indexed field, you have to update all the others.
Did I understand correctly that this is postponed?
And couple more notes from me:
* Right now table_fetch_row_version() is called in most places with
SnapshotAny. That might work in the majority of cases, because in
heap there can't be multiple tuples residing at the same TID, while
zheap always returns the most recent tuple at a given TID. But I think
it would be better to provide some meaningful snapshot instead of
SnapshotAny. Even if the best we can do is ask for the most recent
tuple at some TID, we need a more consistent way of asking the table
AM for it. I'm going to elaborate more on this.
* I'm not really sure we need the ability to iterate over multiple
tuples referenced from an index. It seems that the only place which
really needs this is heap_copy_for_cluster(), which is itself table AM
specific. Also, zheap doesn't seem to be able to return more than one
tuple from zheapam_fetch_follow(). So, I'm going to investigate this
further, and if the iteration is really unneeded, I'll propose a patch
to delete it.
1. https://github.com/EnterpriseDB/zheap
2. http://www.pgcon.org/2018/schedule/attachments/501_zheap-a-new-storage-format-postgresql-5.pdf
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-remove-extra-snapshot-functions.patch
commit fea0ea9bceb698dbe88bee42d5fc5f3332a658dd
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed Sep 26 15:29:43 2018 +0300
Remove some snapshot functions from TableAmRoutine
snapshot_satisfiesUpdate was unused. snapshot_satisfiesVacuum was used only
inside heap_copy_for_cluster, so it was replaced with a direct
heapam_satisfies_vacuum() call.
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91fd..28c475e7bdc 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -483,21 +483,6 @@ heapam_satisfies(TupleTableSlot *slot, Snapshot snapshot)
return res;
}
-static HTSU_Result
-heapam_satisfies_update(TupleTableSlot *slot, CommandId curcid)
-{
- BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
- HTSU_Result res;
-
- LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
- res = HeapTupleSatisfiesUpdate(bslot->base.tuple,
- curcid,
- bslot->buffer);
- LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
-
- return res;
-}
-
static HTSV_Result
heapam_satisfies_vacuum(TupleTableSlot *slot, TransactionId OldestXmin)
{
@@ -2003,7 +1988,7 @@ heap_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
break;
}
- switch (OldHeap->rd_tableamroutine->snapshot_satisfiesVacuum(slot, OldestXmin))
+ switch (heapam_satisfies_vacuum(slot, OldestXmin))
{
case HEAPTUPLE_DEAD:
/* Definitely dead */
@@ -2122,8 +2107,6 @@ static const TableAmRoutine heapam_methods = {
.slot_callbacks = heapam_slot_callbacks,
.snapshot_satisfies = heapam_satisfies,
- .snapshot_satisfiesUpdate = heapam_satisfies_update,
- .snapshot_satisfiesVacuum = heapam_satisfies_vacuum,
.scan_begin = heap_beginscan,
.scansetlimits = heap_setscanlimits,
diff --git a/src/backend/access/zheap/zheapam_handler.c b/src/backend/access/zheap/zheapam_handler.c
index bec9b16f7d6..e707baa1b5d 100644
--- a/src/backend/access/zheap/zheapam_handler.c
+++ b/src/backend/access/zheap/zheapam_handler.c
@@ -486,40 +486,6 @@ zheapam_satisfies(TupleTableSlot *slot, Snapshot snapshot)
#endif
}
-static HTSU_Result
-zheapam_satisfies_update(TupleTableSlot *slot, CommandId curcid)
-{
- elog(ERROR, "would need to track buffer or refetch");
-#if ZBORKED
- BufferHeapTupleTableSlot *zslot = (BufferHeapTupleTableSlot *) slot;
- HTSU_Result res;
-
-
- LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
- res = HeapTupleSatisfiesUpdate(bslot->base.tuple,
- curcid,
- bslot->buffer);
- LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
-
- return res;
-#endif
-}
-
-static HTSV_Result
-zheapam_satisfies_vacuum(TupleTableSlot *slot, TransactionId OldestXmin)
-{
- BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
- HTSV_Result res;
-
- LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
- res = HeapTupleSatisfiesVacuum(bslot->base.tuple,
- OldestXmin,
- bslot->buffer);
- LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
-
- return res;
-}
-
static IndexFetchTableData*
zheapam_begin_index_fetch(Relation rel)
{
@@ -1621,8 +1587,6 @@ static const TableAmRoutine zheapam_methods = {
.slot_callbacks = zheapam_slot_callbacks,
.snapshot_satisfies = zheapam_satisfies,
- .snapshot_satisfiesUpdate = zheapam_satisfies_update,
- .snapshot_satisfiesVacuum = zheapam_satisfies_vacuum,
.scan_begin = zheap_beginscan,
.scansetlimits = zheap_setscanlimits,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7fe6ff6c221..fb37a739918 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -55,8 +55,6 @@ extern bool synchronize_seqscans;
* Storage routine function hooks
*/
typedef bool (*SnapshotSatisfies_function) (TupleTableSlot *slot, Snapshot snapshot);
-typedef HTSU_Result (*SnapshotSatisfiesUpdate_function) (TupleTableSlot *slot, CommandId curcid);
-typedef HTSV_Result (*SnapshotSatisfiesVacuum_function) (TupleTableSlot *slot, TransactionId OldestXmin);
typedef Oid (*TupleInsert_function) (Relation rel, TupleTableSlot *slot, CommandId cid,
int options, BulkInsertState bistate);
@@ -205,8 +203,6 @@ typedef struct TableAmRoutine
SlotCallbacks_function slot_callbacks;
SnapshotSatisfies_function snapshot_satisfies;
- SnapshotSatisfiesUpdate_function snapshot_satisfiesUpdate;
- SnapshotSatisfiesVacuum_function snapshot_satisfiesVacuum;
/* Operations on physical tuples */
TupleInsert_function tuple_insert;
0002-add-costing-function-to-API.patch
commit fa33fbebe3e33f09fdb32961b2ffdf5c7262b74a
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date: Mon Oct 15 21:33:31 2018 +0300
Add costing function to tableam interface
Costs of sequential scans and table sample scans are estimated using
cost_seqscan() and cost_samplescan(), but they should be table access method
specific, because different table AMs could have different costs. This commit
introduces the zheap cost functions as copies of the heap ones; that should be
adjusted in the future. Making the cost of heap lookups during index scans
pluggable is a TODO.
diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index aee7bfd8346..e13b0e0b8fa 100644
--- a/src/backend/access/heap/Makefile
+++ b/src/backend/access/heap/Makefile
@@ -13,6 +13,7 @@ top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
OBJS = heapam.o heapam_handler.o heapam_visibility.o hio.o pruneheap.o \
- rewriteheap.o syncscan.o tuptoaster.o vacuumlazy.o visibilitymap.o
+ rewriteheap.o syncscan.o tuptoaster.o vacuumlazy.o visibilitymap.o \
+ heapam_cost.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 28c475e7bdc..35b230c3606 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2146,7 +2146,10 @@ static const TableAmRoutine heapam_methods = {
.index_build_range_scan = IndexBuildHeapRangeScan,
- .index_validate_scan = validate_index_heapscan
+ .index_validate_scan = validate_index_heapscan,
+
+ .cost_scan = heapam_cost_scan,
+ .cost_samplescan = heapam_cost_samplescan
};
const TableAmRoutine *
diff --git a/src/backend/access/zheap/Makefile b/src/backend/access/zheap/Makefile
index 75b0ff69ebf..4458e1a4238 100644
--- a/src/backend/access/zheap/Makefile
+++ b/src/backend/access/zheap/Makefile
@@ -13,6 +13,6 @@ top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
OBJS = prunetpd.o prunezheap.o tpd.o tpdxlog.o zheapam.o zheapam_handler.o zheapamutils.o zheapamxlog.o \
- zhio.o zmultilocker.o ztuptoaster.o ztqual.o zvacuumlazy.o
+ zhio.o zmultilocker.o ztuptoaster.o ztqual.o zvacuumlazy.o zheapam_cost.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/zheap/zheapam_handler.c b/src/backend/access/zheap/zheapam_handler.c
index e707baa1b5d..da95c881a77 100644
--- a/src/backend/access/zheap/zheapam_handler.c
+++ b/src/backend/access/zheap/zheapam_handler.c
@@ -1627,6 +1627,8 @@ static const TableAmRoutine zheapam_methods = {
.index_build_range_scan = IndexBuildZHeapRangeScan,
.index_validate_scan = validate_index_zheapscan,
+ .cost_scan = zheapam_cost_scan,
+ .cost_samplescan = zheapam_cost_samplescan
};
Datum
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7bf67a05295..abdddacf89a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -75,6 +75,7 @@
#include "access/amapi.h"
#include "access/htup_details.h"
+#include "access/tableam.h"
#include "access/tsmapi.h"
#include "executor/executor.h"
#include "executor/nodeHash.h"
@@ -150,9 +151,6 @@ static MergeScanSelCache *cached_scansel(PlannerInfo *root,
static void cost_rescan(PlannerInfo *root, Path *path,
Cost *rescan_startup_cost, Cost *rescan_total_cost);
static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
-static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
- ParamPathInfo *param_info,
- QualCost *qpqual_cost);
static bool has_indexed_join_quals(NestPath *joinpath);
static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
List *quals);
@@ -174,7 +172,6 @@ static Cost append_nonpartial_cost(List *subpaths, int numpaths,
static void set_rel_width(PlannerInfo *root, RelOptInfo *rel);
static double relation_byte_size(double tuples, int width);
static double page_size(double tuples, int width);
-static double get_parallel_divisor(Path *path);
/*
@@ -209,70 +206,9 @@ void
cost_seqscan(Path *path, PlannerInfo *root,
RelOptInfo *baserel, ParamPathInfo *param_info)
{
- Cost startup_cost = 0;
- Cost cpu_run_cost;
- Cost disk_run_cost;
- double spc_seq_page_cost;
- QualCost qpqual_cost;
- Cost cpu_per_tuple;
-
- /* Should only be applied to base relations */
- Assert(baserel->relid > 0);
- Assert(baserel->rtekind == RTE_RELATION);
-
- /* Mark the path with the correct row estimate */
- if (param_info)
- path->rows = param_info->ppi_rows;
- else
- path->rows = baserel->rows;
-
- if (!enable_seqscan)
- startup_cost += disable_cost;
-
- /* fetch estimated page cost for tablespace containing table */
- get_tablespace_page_costs(baserel->reltablespace,
- NULL,
- &spc_seq_page_cost);
-
- /*
- * disk costs
- */
- disk_run_cost = spc_seq_page_cost * baserel->pages;
-
- /* CPU costs */
- get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
-
- startup_cost += qpqual_cost.startup;
- cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
- cpu_run_cost = cpu_per_tuple * baserel->tuples;
- /* tlist eval costs are paid per output row, not per tuple scanned */
- startup_cost += path->pathtarget->cost.startup;
- cpu_run_cost += path->pathtarget->cost.per_tuple * path->rows;
-
- /* Adjust costing for parallelism, if used. */
- if (path->parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(path);
-
- /* The CPU cost is divided among all the workers. */
- cpu_run_cost /= parallel_divisor;
-
- /*
- * It may be possible to amortize some of the I/O cost, but probably
- * not very much, because most operating systems already do aggressive
- * prefetching. For now, we assume that the disk run cost can't be
- * amortized at all.
- */
-
- /*
- * In the case of a parallel plan, the row count needs to represent
- * the number of tuples processed per worker.
- */
- path->rows = clamp_row_est(path->rows / parallel_divisor);
- }
-
- path->startup_cost = startup_cost;
- path->total_cost = startup_cost + cpu_run_cost + disk_run_cost;
+ TableScanCost_function cost_func = (TableScanCost_function) baserel->cost_scan;
+ Assert(cost_func != NULL);
+ cost_func(path, root, baserel, param_info);
}
/*
@@ -286,65 +222,9 @@ void
cost_samplescan(Path *path, PlannerInfo *root,
RelOptInfo *baserel, ParamPathInfo *param_info)
{
- Cost startup_cost = 0;
- Cost run_cost = 0;
- RangeTblEntry *rte;
- TableSampleClause *tsc;
- TsmRoutine *tsm;
- double spc_seq_page_cost,
- spc_random_page_cost,
- spc_page_cost;
- QualCost qpqual_cost;
- Cost cpu_per_tuple;
-
- /* Should only be applied to base relations with tablesample clauses */
- Assert(baserel->relid > 0);
- rte = planner_rt_fetch(baserel->relid, root);
- Assert(rte->rtekind == RTE_RELATION);
- tsc = rte->tablesample;
- Assert(tsc != NULL);
- tsm = GetTsmRoutine(tsc->tsmhandler);
-
- /* Mark the path with the correct row estimate */
- if (param_info)
- path->rows = param_info->ppi_rows;
- else
- path->rows = baserel->rows;
-
- /* fetch estimated page cost for tablespace containing table */
- get_tablespace_page_costs(baserel->reltablespace,
- &spc_random_page_cost,
- &spc_seq_page_cost);
-
- /* if NextSampleBlock is used, assume random access, else sequential */
- spc_page_cost = (tsm->NextSampleBlock != NULL) ?
- spc_random_page_cost : spc_seq_page_cost;
-
- /*
- * disk costs (recall that baserel->pages has already been set to the
- * number of pages the sampling method will visit)
- */
- run_cost += spc_page_cost * baserel->pages;
-
- /*
- * CPU costs (recall that baserel->tuples has already been set to the
- * number of tuples the sampling method will select). Note that we ignore
- * execution cost of the TABLESAMPLE parameter expressions; they will be
- * evaluated only once per scan, and in most usages they'll likely be
- * simple constants anyway. We also don't charge anything for the
- * calculations the sampling method might do internally.
- */
- get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
-
- startup_cost += qpqual_cost.startup;
- cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
- run_cost += cpu_per_tuple * baserel->tuples;
- /* tlist eval costs are paid per output row, not per tuple scanned */
- startup_cost += path->pathtarget->cost.startup;
- run_cost += path->pathtarget->cost.per_tuple * path->rows;
-
- path->startup_cost = startup_cost;
- path->total_cost = startup_cost + run_cost;
+ TableScanCost_function cost_func = (TableScanCost_function) baserel->cost_samplescan;
+ Assert(cost_func != NULL);
+ cost_func(path, root, baserel, param_info);
}
/*
@@ -3988,7 +3868,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
* some of the quals. We assume baserestrictcost was previously set by
* set_baserel_size_estimates().
*/
-static void
+void
get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info,
QualCost *qpqual_cost)
@@ -5343,7 +5223,7 @@ page_size(double tuples, int width)
* Estimate the fraction of the work that each worker will do given the
* number of workers budgeted for the path.
*/
-static double
+double
get_parallel_divisor(Path *path)
{
double parallel_divisor = path->parallel_workers;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 86029cd1327..45f3b0372b9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -93,6 +93,8 @@ static void set_baserel_partition_key_exprs(Relation relation,
* pages number of pages
* tuples number of tuples
* rel_parallel_workers user-defined number of parallel workers
+ * cost_scan costing function for sequential scan
+ * cost_samplescan costing function for sample scan
*
* Also, add information about the relation's foreign keys to root->fkey_list.
*
@@ -443,6 +445,18 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
rel->fdwroutine = NULL;
}
+ /* Get costing functions */
+ if (relation->rd_tableamroutine != NULL)
+ {
+ rel->cost_scan = relation->rd_tableamroutine->cost_scan;
+ rel->cost_samplescan = relation->rd_tableamroutine->cost_samplescan;
+ }
+ else
+ {
+ rel->cost_scan = NULL;
+ rel->cost_samplescan = NULL;
+ }
+
/* Collect info about relation's foreign keys, if relevant */
get_relation_foreign_keys(root, rel, relation, inhparent);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index dba50178887..2f342c7fef1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -17,6 +17,7 @@
#include "access/genham.h"
#include "nodes/lockoptions.h"
#include "nodes/primnodes.h"
+#include "optimizer/cost.h"
#include "storage/bufpage.h"
#include "storage/lockdefs.h"
#include "utils/relcache.h"
@@ -59,6 +60,13 @@ extern Relation heap_openrv_extended(const RangeVar *relation,
#define heap_close(r,l) relation_close(r,l)
+/* in heap/heapam_cost.c */
+extern void heapam_cost_scan(Path *path, PlannerInfo *root,
+ RelOptInfo *baserel, ParamPathInfo *param_info);
+extern void heapam_cost_samplescan(Path *path, PlannerInfo *root,
+ RelOptInfo *baserel, ParamPathInfo *param_info);
+
+
/* struct definitions appear in relscan.h */
typedef struct TableScanDescData *TableScanDesc;
typedef struct HeapScanDescData *HeapScanDesc;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index fb37a739918..beea954885a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
#include "executor/tuptable.h"
#include "nodes/execnodes.h"
#include "nodes/nodes.h"
+#include "optimizer/cost.h"
#include "fmgr.h"
#include "utils/guc.h"
#include "utils/rel.h"
@@ -192,6 +193,8 @@ struct SampleScanState;
typedef bool (*SampleScanNextBlock_function)(TableScanDesc scan, struct SampleScanState *scanstate);
typedef bool (*SampleScanNextTuple_function)(TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
+typedef void (*TableScanCost_function)(Path *path, PlannerInfo *root, RelOptInfo *baserel, ParamPathInfo *param_info);
+
/*
* API struct for a table AM. Note this must be allocated in a
* server-lifetime manner, typically as a static const struct.
@@ -246,6 +249,10 @@ typedef struct TableAmRoutine
IndexBuildRangeScan_function index_build_range_scan;
IndexValidateScan_function index_validate_scan;
+
+ /* Costing functions */
+ TableScanCost_function cost_scan;
+ TableScanCost_function cost_samplescan;
} TableAmRoutine;
static inline const TupleTableSlotOps*
diff --git a/src/include/access/zheap.h b/src/include/access/zheap.h
index c657c728ec3..583cf25f965 100644
--- a/src/include/access/zheap.h
+++ b/src/include/access/zheap.h
@@ -20,6 +20,7 @@
#include "access/hio.h"
#include "access/undoinsert.h"
#include "access/zhtup.h"
+#include "optimizer/cost.h"
#include "utils/rel.h"
#include "utils/snapshot.h"
@@ -211,4 +212,11 @@ typedef struct ZHeapFreeOffsetRanges
int nranges;
} ZHeapFreeOffsetRanges;
+/* Zheap costing functions */
+extern void zheapam_cost_scan(Path *path, PlannerInfo *root,
+ RelOptInfo *baserel, ParamPathInfo *param_info);
+extern void zheapam_cost_samplescan(Path *path, PlannerInfo *root,
+ RelOptInfo *baserel, ParamPathInfo *param_info);
+
+
#endif /* ZHEAP_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index adb42650479..6c51fe27460 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -704,6 +704,10 @@ typedef struct RelOptInfo
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
List *partitioned_child_rels; /* List of RT indexes. */
+
+ /* Rather than include tableam.h here, we declare costing functions like this */
+ void (*cost_scan) (); /* sequential scan cost estimator */
+ void (*cost_samplescan) (); /* sample scan cost estimator */
} RelOptInfo;
/*
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 77ca7ff8371..0af574c41fe 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -166,6 +166,9 @@ extern void cost_gather(GatherPath *path, PlannerInfo *root,
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
+extern void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
+ ParamPathInfo *param_info,
+ QualCost *qpqual_cost);
extern void compute_semi_anti_join_factors(PlannerInfo *root,
RelOptInfo *joinrel,
RelOptInfo *outerrel,
@@ -198,6 +201,7 @@ extern void set_tablefunc_size_estimates(PlannerInfo *root, RelOptInfo *rel);
extern void set_namedtuplestore_size_estimates(PlannerInfo *root, RelOptInfo *rel);
extern void set_foreign_size_estimates(PlannerInfo *root, RelOptInfo *rel);
extern PathTarget *set_pathtarget_cost_width(PlannerInfo *root, PathTarget *target);
+extern double get_parallel_divisor(Path *path);
extern double compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel,
Path *bitmapqual, int loop_count, Cost *cost, double *tuple);
On Tue, Oct 9, 2018 at 1:46 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
On Wed, Oct 3, 2018 at 3:16 PM Andres Freund <andres@anarazel.de> wrote:
On 2018-09-27 20:03:58 -0700, Andres Freund wrote:
On 2018-09-28 12:21:08 +1000, Haribabu Kommi wrote:
Here I have attached further cleanup patches:
1. Re-arrange the GUC variable.
2. Added a check function hook for the default_table_access_method GUC.
Cool.
3. Added a new hook, validate_index. I tried to change the function
validate_index_heapscan to slotify, but that has many problems, as it
accesses some internals of the HeapScanDesc structure, the buffer, etc.
Oops, I also did that locally, in a way. I also made validate a
callback, as the validation logic is going to be specific to the AMs.
Sorry for not pushing that up earlier. I'll try to do that soon,
there's a fair amount of change.
I've pushed an updated version, with a fair amount of pending changes,
and I hope all your pending (and not redundant due to our concurrent
development) patches are merged.
Yes, all the patches are merged.
There are currently 3 regression test failures that I'll look into
tomorrow:
- partition_prune shows a few additional "Heap Blocks: exact=1" lines. I'm
a bit confused as to why, but haven't really investigated yet.
- fast_default fails, because I've undone most of 7636e5c60fea83a9f3c;
I'll have to redo that in a different way.
- I occasionally see failures in aggregates.sql - I've not figured out
what's going on there.
I also observed the failure of aggregates.sql, will look into it.
Amit Khandekar said he'll publish a new version of the slot-abstraction
patch tomorrow, so I'll rebase it onto that ASAP.
OK.
Here I attached two new API patches.
1. Set New Rel File node
2. Create Init fork
The above patches have a problem: while testing, they lead to a crash.
Sorry for not testing earlier. The index relation also creates the
NewRelFileNode; because that function was moved into the pluggable table
access method, and index relations have no tableam routine, it leads to
a crash.
So moving the storage creation methods into table access methods doesn't
work. We may need common access methods that are shared across both
tables and indexes.
Regards,
Haribabu Kommi
Fujitsu Australia
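The crash described above comes from dereferencing a table AM routine on relations that don't have one (indexes). A minimal, self-contained sketch of the defensive pattern, with pared-down hypothetical stand-ins for Relation, TableAmRoutine, and RelOptInfo (the real definitions live in utils/rel.h, access/tableam.h, and nodes/relation.h):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real planner/catalog types */
typedef void (*CostFn) (void);

typedef struct TableAmRoutine
{
	CostFn		cost_scan;
	CostFn		cost_samplescan;
} TableAmRoutine;

typedef struct Relation
{
	/* NULL for index relations, which have no table AM */
	const TableAmRoutine *rd_tableamroutine;
} Relation;

typedef struct RelOptInfo
{
	CostFn		cost_scan;
	CostFn		cost_samplescan;
} RelOptInfo;

/*
 * Mirrors the guarded copy in the get_relation_info() hunk above: never
 * dereference rd_tableamroutine without checking, since index relations
 * leave it NULL.
 */
static void
copy_costing_hooks(RelOptInfo *rel, const Relation *relation)
{
	if (relation->rd_tableamroutine != NULL)
	{
		rel->cost_scan = relation->rd_tableamroutine->cost_scan;
		rel->cost_samplescan = relation->rd_tableamroutine->cost_samplescan;
	}
	else
	{
		rel->cost_scan = NULL;
		rel->cost_samplescan = NULL;
	}
}
```

Code that later invokes the hooks must then either check for NULL or be reachable only for table relations; the same consideration applies to the relfilenode-creation paths that crashed here.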
On Tue, Oct 16, 2018 at 12:37 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
I've examined the code in the pluggable-zheap branch and on EDB's github
[1], and I didn't find anything related to "delete-marking" indexes as
stated on slide #25 of the presentation [2]. So, basically, the contract
between heap and indexes remains unchanged: once you update one indexed
field, you have to update all the others.
Yes, this will be the behavior for the first version.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Oct 9, 2018 at 1:46 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
- I occasionally see failures in aggregates.sql - I've not figured out
what's going on there.
I also observed the failure of aggregates.sql, will look into it.
The random failure of aggregates.sql is as follows:
SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
! avg_32
! ---------------------
! 32.6666666666666667
(1 row)
-- In 7.1, avg(float4) is computed using float8 arithmetic.
--- 8,16 ----
(1 row)
SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
! avg_32
! --------
!
(1 row)
The same NULL result appears for another aggregate query, on column b.
The aggtest table is accessed by two tests that run in parallel:
aggregates.sql and transactions.sql. In transactions.sql, all the
records in the aggtest table are deleted inside a transaction, and the
transaction is then aborted.
I suspect that some visibility checks have a race condition that leads
to no visible records in the aggtest table, so the query returns a NULL
result.
If I try the scenario manually, by opening a transaction and deleting
the records, the issue does not occur.
I am yet to find the cause of this problem.
Regards,
Haribabu Kommi
On Tue, Oct 16, 2018 at 6:06 AM Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:
Hi!
On Wed, Oct 3, 2018 at 8:16 AM Andres Freund <andres@anarazel.de> wrote:
I've pushed an updated version, with a fair amount of pending changes,
and I hope all your pending (and not redundant due to our concurrent
development) patches are merged.
I'd like to also share some patches. I've used the current state of
pluggable-zheap as the base of my patches.
Thanks for the review and patches.
* 0001-remove-extra-snapshot-functions.patch – removes the
snapshot_satisfiesUpdate() and snapshot_satisfiesVacuum() functions
from the tableam API. snapshot_satisfiesUpdate() was completely unused.
snapshot_satisfiesVacuum() was used only in heap_copy_for_cluster(),
so I've replaced it with a direct heapam_satisfies_vacuum() call.
Thanks for the correction.
* 0002-add-costing-function-to-API.patch – adds functions for costing
sequential and table sample scans to the tableam API. The zheap costing
functions are currently copies of the heap costing functions; this
should be adjusted in the future.
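To make the shape of the new hook concrete, here is a minimal, self-contained sketch of the dispatch pattern: the planner-facing cost_seqscan() delegates to a per-AM callback stored on the RelOptInfo. The type names mirror the patch, but the structs are pared-down toys, and toyam_cost_scan() with its constants is purely hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; the real definitions live in
 * nodes/relation.h and access/tableam.h. */
typedef double Cost;

typedef struct Path
{
	double		rows;
	Cost		startup_cost;
	Cost		total_cost;
} Path;

typedef struct RelOptInfo RelOptInfo;

/* Mirrors TableScanCost_function, minus the other planner arguments */
typedef void (*TableScanCost_function) (Path *path, RelOptInfo *baserel);

struct RelOptInfo
{
	double		pages;
	double		tuples;
	/* per-AM costing hook, copied from the table AM routine */
	TableScanCost_function cost_scan;
};

/* A toy per-AM estimator standing in for heapam_cost_scan() */
static void
toyam_cost_scan(Path *path, RelOptInfo *baserel)
{
	const Cost	seq_page_cost = 1.0;	/* hypothetical GUC values */
	const Cost	cpu_tuple_cost = 0.01;

	path->rows = baserel->tuples;
	path->startup_cost = 0;
	path->total_cost = seq_page_cost * baserel->pages +
		cpu_tuple_cost * baserel->tuples;
}

/* cost_seqscan() reduced to its new role: delegate to the AM's hook */
static void
cost_seqscan(Path *path, RelOptInfo *baserel)
{
	assert(baserel->cost_scan != NULL);
	baserel->cost_scan(path, baserel);
}
```

With this shape, a new table AM only needs to supply its own cost_scan callback; costsize.c no longer has to know anything about the AM's storage layout.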
This patch is missing the new *_cost.c files that add the specific cost
functions.
Estimation for heap lookups during index scans should also be pluggable,
but that is not yet implemented (TODO).
Yes. Is it possible to use the same API that is added by the above
patch?
Regards,
Haribabu Kommi
Fujitsu Australia
On Thu, Oct 18, 2018 at 6:28 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
On Tue, Oct 16, 2018 at 6:06 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
* 0002-add-costing-function-to-API.patch – adds functions for costing
sequential and table sample scans to the tableam API. The zheap costing
functions are currently copies of the heap costing functions; this
should be adjusted in the future.
This patch is missing the new *_cost.c files that add the specific cost
functions.
Thank you for noticing. Revised patchset is attached.
Estimation for heap lookups during index scans should also be pluggable,
but that is not yet implemented (TODO).
Yes. Is it possible to use the same API that is added by the above
patch?
I'm not yet sure; I'll elaborate more on that. I'd like to keep the
number of costing functions small. Handling the costing of index scan
heap fetches will probably require a function signature change.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0002-add-costing-function-to-API-2.patch (application/octet-stream)
commit 3a0868018d84db24024e2731e5bd6972b66d10e4
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date: Mon Oct 15 21:33:31 2018 +0300
Add costing function to tableam interface
Costs of sequential scans and table sample scans are estimated using
cost_seqscan() and cost_samplescan(). But they should be table access
method specific, because different table AMs could have different costs.
This commit introduces zheap cost functions that are currently the same
as heap's; that should be adjusted in the future. Making the costs of
heap lookups during index scans pluggable is a TODO.
diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index aee7bfd8346..e13b0e0b8fa 100644
--- a/src/backend/access/heap/Makefile
+++ b/src/backend/access/heap/Makefile
@@ -13,6 +13,7 @@ top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
OBJS = heapam.o heapam_handler.o heapam_visibility.o hio.o pruneheap.o \
- rewriteheap.o syncscan.o tuptoaster.o vacuumlazy.o visibilitymap.o
+ rewriteheap.o syncscan.o tuptoaster.o vacuumlazy.o visibilitymap.o \
+ heapam_cost.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam_cost.c b/src/backend/access/heap/heapam_cost.c
new file mode 100644
index 00000000000..5c685c262a2
--- /dev/null
+++ b/src/backend/access/heap/heapam_cost.c
@@ -0,0 +1,187 @@
+/*-------------------------------------------------------------------------
+ *
+ * heapam_cost.c
+ * costing functions for heap access method
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/access/heap/heapam_cost.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "access/amapi.h"
+#include "access/htup_details.h"
+#include "access/tableam.h"
+#include "access/tsmapi.h"
+#include "executor/executor.h"
+#include "executor/nodeHash.h"
+#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "parser/parsetree.h"
+#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/spccache.h"
+#include "utils/tuplesort.h"
+
+/*
+ * heapam_cost_scan
+ * Determines and returns the cost of scanning a relation sequentially.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+heapam_cost_scan(Path *path, PlannerInfo *root,
+ RelOptInfo *baserel, ParamPathInfo *param_info)
+{
+ Cost startup_cost = 0;
+ Cost cpu_run_cost;
+ Cost disk_run_cost;
+ double spc_seq_page_cost;
+ QualCost qpqual_cost;
+ Cost cpu_per_tuple;
+
+ /* Should only be applied to base relations */
+ Assert(baserel->relid > 0);
+ Assert(baserel->rtekind == RTE_RELATION);
+
+ /* Mark the path with the correct row estimate */
+ if (param_info)
+ path->rows = param_info->ppi_rows;
+ else
+ path->rows = baserel->rows;
+
+ if (!enable_seqscan)
+ startup_cost += disable_cost;
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ NULL,
+ &spc_seq_page_cost);
+
+ /*
+ * disk costs
+ */
+ disk_run_cost = spc_seq_page_cost * baserel->pages;
+
+ /* CPU costs */
+ get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+ startup_cost += qpqual_cost.startup;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ cpu_run_cost = cpu_per_tuple * baserel->tuples;
+ /* tlist eval costs are paid per output row, not per tuple scanned */
+ startup_cost += path->pathtarget->cost.startup;
+ cpu_run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+ /* Adjust costing for parallelism, if used. */
+ if (path->parallel_workers > 0)
+ {
+ double parallel_divisor = get_parallel_divisor(path);
+
+ /* The CPU cost is divided among all the workers. */
+ cpu_run_cost /= parallel_divisor;
+
+ /*
+ * It may be possible to amortize some of the I/O cost, but probably
+ * not very much, because most operating systems already do aggressive
+ * prefetching. For now, we assume that the disk run cost can't be
+ * amortized at all.
+ */
+
+ /*
+ * In the case of a parallel plan, the row count needs to represent
+ * the number of tuples processed per worker.
+ */
+ path->rows = clamp_row_est(path->rows / parallel_divisor);
+ }
+
+ path->startup_cost = startup_cost;
+ path->total_cost = startup_cost + cpu_run_cost + disk_run_cost;
+}
+
+/*
+ * heapam_cost_samplescan
+ * Determines and returns the cost of scanning a relation using sampling.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+heapam_cost_samplescan(Path *path, PlannerInfo *root,
+ RelOptInfo *baserel, ParamPathInfo *param_info)
+{
+ Cost startup_cost = 0;
+ Cost run_cost = 0;
+ RangeTblEntry *rte;
+ TableSampleClause *tsc;
+ TsmRoutine *tsm;
+ double spc_seq_page_cost,
+ spc_random_page_cost,
+ spc_page_cost;
+ QualCost qpqual_cost;
+ Cost cpu_per_tuple;
+
+ /* Should only be applied to base relations with tablesample clauses */
+ Assert(baserel->relid > 0);
+ rte = planner_rt_fetch(baserel->relid, root);
+ Assert(rte->rtekind == RTE_RELATION);
+ tsc = rte->tablesample;
+ Assert(tsc != NULL);
+ tsm = GetTsmRoutine(tsc->tsmhandler);
+
+ /* Mark the path with the correct row estimate */
+ if (param_info)
+ path->rows = param_info->ppi_rows;
+ else
+ path->rows = baserel->rows;
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ &spc_random_page_cost,
+ &spc_seq_page_cost);
+
+ /* if NextSampleBlock is used, assume random access, else sequential */
+ spc_page_cost = (tsm->NextSampleBlock != NULL) ?
+ spc_random_page_cost : spc_seq_page_cost;
+
+ /*
+ * disk costs (recall that baserel->pages has already been set to the
+ * number of pages the sampling method will visit)
+ */
+ run_cost += spc_page_cost * baserel->pages;
+
+ /*
+ * CPU costs (recall that baserel->tuples has already been set to the
+ * number of tuples the sampling method will select). Note that we ignore
+ * execution cost of the TABLESAMPLE parameter expressions; they will be
+ * evaluated only once per scan, and in most usages they'll likely be
+ * simple constants anyway. We also don't charge anything for the
+ * calculations the sampling method might do internally.
+ */
+ get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+ startup_cost += qpqual_cost.startup;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ run_cost += cpu_per_tuple * baserel->tuples;
+ /* tlist eval costs are paid per output row, not per tuple scanned */
+ startup_cost += path->pathtarget->cost.startup;
+ run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+ path->startup_cost = startup_cost;
+ path->total_cost = startup_cost + run_cost;
+}
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 28c475e7bdc..35b230c3606 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2146,7 +2146,10 @@ static const TableAmRoutine heapam_methods = {
.index_build_range_scan = IndexBuildHeapRangeScan,
- .index_validate_scan = validate_index_heapscan
+ .index_validate_scan = validate_index_heapscan,
+
+ .cost_scan = heapam_cost_scan,
+ .cost_samplescan = heapam_cost_samplescan
};
const TableAmRoutine *
diff --git a/src/backend/access/zheap/Makefile b/src/backend/access/zheap/Makefile
index 75b0ff69ebf..4458e1a4238 100644
--- a/src/backend/access/zheap/Makefile
+++ b/src/backend/access/zheap/Makefile
@@ -13,6 +13,6 @@ top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
OBJS = prunetpd.o prunezheap.o tpd.o tpdxlog.o zheapam.o zheapam_handler.o zheapamutils.o zheapamxlog.o \
- zhio.o zmultilocker.o ztuptoaster.o ztqual.o zvacuumlazy.o
+ zhio.o zmultilocker.o ztuptoaster.o ztqual.o zvacuumlazy.o zheapam_cost.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/zheap/zheapam_cost.c b/src/backend/access/zheap/zheapam_cost.c
new file mode 100644
index 00000000000..613ee1fb6e7
--- /dev/null
+++ b/src/backend/access/zheap/zheapam_cost.c
@@ -0,0 +1,187 @@
+/*-------------------------------------------------------------------------
+ *
+ * zheapam_cost.c
+ * costing functions for zheap access method
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/access/zheap/zheapam_cost.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "access/amapi.h"
+#include "access/htup_details.h"
+#include "access/tableam.h"
+#include "access/tsmapi.h"
+#include "executor/executor.h"
+#include "executor/nodeHash.h"
+#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "parser/parsetree.h"
+#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/spccache.h"
+#include "utils/tuplesort.h"
+
+/*
+ * zheapam_cost_scan
+ * Determines and returns the cost of scanning a relation sequentially.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+zheapam_cost_scan(Path *path, PlannerInfo *root,
+ RelOptInfo *baserel, ParamPathInfo *param_info)
+{
+ Cost startup_cost = 0;
+ Cost cpu_run_cost;
+ Cost disk_run_cost;
+ double spc_seq_page_cost;
+ QualCost qpqual_cost;
+ Cost cpu_per_tuple;
+
+ /* Should only be applied to base relations */
+ Assert(baserel->relid > 0);
+ Assert(baserel->rtekind == RTE_RELATION);
+
+ /* Mark the path with the correct row estimate */
+ if (param_info)
+ path->rows = param_info->ppi_rows;
+ else
+ path->rows = baserel->rows;
+
+ if (!enable_seqscan)
+ startup_cost += disable_cost;
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ NULL,
+ &spc_seq_page_cost);
+
+ /*
+ * disk costs
+ */
+ disk_run_cost = spc_seq_page_cost * baserel->pages;
+
+ /* CPU costs */
+ get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+ startup_cost += qpqual_cost.startup;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ cpu_run_cost = cpu_per_tuple * baserel->tuples;
+ /* tlist eval costs are paid per output row, not per tuple scanned */
+ startup_cost += path->pathtarget->cost.startup;
+ cpu_run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+ /* Adjust costing for parallelism, if used. */
+ if (path->parallel_workers > 0)
+ {
+ double parallel_divisor = get_parallel_divisor(path);
+
+ /* The CPU cost is divided among all the workers. */
+ cpu_run_cost /= parallel_divisor;
+
+ /*
+ * It may be possible to amortize some of the I/O cost, but probably
+ * not very much, because most operating systems already do aggressive
+ * prefetching. For now, we assume that the disk run cost can't be
+ * amortized at all.
+ */
+
+ /*
+ * In the case of a parallel plan, the row count needs to represent
+ * the number of tuples processed per worker.
+ */
+ path->rows = clamp_row_est(path->rows / parallel_divisor);
+ }
+
+ path->startup_cost = startup_cost;
+ path->total_cost = startup_cost + cpu_run_cost + disk_run_cost;
+}
+
+/*
+ * zheapam_cost_samplescan
+ * Determines and returns the cost of scanning a relation using sampling.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+zheapam_cost_samplescan(Path *path, PlannerInfo *root,
+ RelOptInfo *baserel, ParamPathInfo *param_info)
+{
+ Cost startup_cost = 0;
+ Cost run_cost = 0;
+ RangeTblEntry *rte;
+ TableSampleClause *tsc;
+ TsmRoutine *tsm;
+ double spc_seq_page_cost,
+ spc_random_page_cost,
+ spc_page_cost;
+ QualCost qpqual_cost;
+ Cost cpu_per_tuple;
+
+ /* Should only be applied to base relations with tablesample clauses */
+ Assert(baserel->relid > 0);
+ rte = planner_rt_fetch(baserel->relid, root);
+ Assert(rte->rtekind == RTE_RELATION);
+ tsc = rte->tablesample;
+ Assert(tsc != NULL);
+ tsm = GetTsmRoutine(tsc->tsmhandler);
+
+ /* Mark the path with the correct row estimate */
+ if (param_info)
+ path->rows = param_info->ppi_rows;
+ else
+ path->rows = baserel->rows;
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ &spc_random_page_cost,
+ &spc_seq_page_cost);
+
+ /* if NextSampleBlock is used, assume random access, else sequential */
+ spc_page_cost = (tsm->NextSampleBlock != NULL) ?
+ spc_random_page_cost : spc_seq_page_cost;
+
+ /*
+ * disk costs (recall that baserel->pages has already been set to the
+ * number of pages the sampling method will visit)
+ */
+ run_cost += spc_page_cost * baserel->pages;
+
+ /*
+ * CPU costs (recall that baserel->tuples has already been set to the
+ * number of tuples the sampling method will select). Note that we ignore
+ * execution cost of the TABLESAMPLE parameter expressions; they will be
+ * evaluated only once per scan, and in most usages they'll likely be
+ * simple constants anyway. We also don't charge anything for the
+ * calculations the sampling method might do internally.
+ */
+ get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+ startup_cost += qpqual_cost.startup;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ run_cost += cpu_per_tuple * baserel->tuples;
+ /* tlist eval costs are paid per output row, not per tuple scanned */
+ startup_cost += path->pathtarget->cost.startup;
+ run_cost += path->pathtarget->cost.per_tuple * path->rows;
+
+ path->startup_cost = startup_cost;
+ path->total_cost = startup_cost + run_cost;
+}
diff --git a/src/backend/access/zheap/zheapam_handler.c b/src/backend/access/zheap/zheapam_handler.c
index e707baa1b5d..da95c881a77 100644
--- a/src/backend/access/zheap/zheapam_handler.c
+++ b/src/backend/access/zheap/zheapam_handler.c
@@ -1627,6 +1627,8 @@ static const TableAmRoutine zheapam_methods = {
.index_build_range_scan = IndexBuildZHeapRangeScan,
.index_validate_scan = validate_index_zheapscan,
+ .cost_scan = zheapam_cost_scan,
+ .cost_samplescan = zheapam_cost_samplescan
};
Datum
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7bf67a05295..abdddacf89a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -75,6 +75,7 @@
#include "access/amapi.h"
#include "access/htup_details.h"
+#include "access/tableam.h"
#include "access/tsmapi.h"
#include "executor/executor.h"
#include "executor/nodeHash.h"
@@ -150,9 +151,6 @@ static MergeScanSelCache *cached_scansel(PlannerInfo *root,
static void cost_rescan(PlannerInfo *root, Path *path,
Cost *rescan_startup_cost, Cost *rescan_total_cost);
static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
-static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
- ParamPathInfo *param_info,
- QualCost *qpqual_cost);
static bool has_indexed_join_quals(NestPath *joinpath);
static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
List *quals);
@@ -174,7 +172,6 @@ static Cost append_nonpartial_cost(List *subpaths, int numpaths,
static void set_rel_width(PlannerInfo *root, RelOptInfo *rel);
static double relation_byte_size(double tuples, int width);
static double page_size(double tuples, int width);
-static double get_parallel_divisor(Path *path);
/*
@@ -209,70 +206,9 @@ void
cost_seqscan(Path *path, PlannerInfo *root,
RelOptInfo *baserel, ParamPathInfo *param_info)
{
- Cost startup_cost = 0;
- Cost cpu_run_cost;
- Cost disk_run_cost;
- double spc_seq_page_cost;
- QualCost qpqual_cost;
- Cost cpu_per_tuple;
-
- /* Should only be applied to base relations */
- Assert(baserel->relid > 0);
- Assert(baserel->rtekind == RTE_RELATION);
-
- /* Mark the path with the correct row estimate */
- if (param_info)
- path->rows = param_info->ppi_rows;
- else
- path->rows = baserel->rows;
-
- if (!enable_seqscan)
- startup_cost += disable_cost;
-
- /* fetch estimated page cost for tablespace containing table */
- get_tablespace_page_costs(baserel->reltablespace,
- NULL,
- &spc_seq_page_cost);
-
- /*
- * disk costs
- */
- disk_run_cost = spc_seq_page_cost * baserel->pages;
-
- /* CPU costs */
- get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
-
- startup_cost += qpqual_cost.startup;
- cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
- cpu_run_cost = cpu_per_tuple * baserel->tuples;
- /* tlist eval costs are paid per output row, not per tuple scanned */
- startup_cost += path->pathtarget->cost.startup;
- cpu_run_cost += path->pathtarget->cost.per_tuple * path->rows;
-
- /* Adjust costing for parallelism, if used. */
- if (path->parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(path);
-
- /* The CPU cost is divided among all the workers. */
- cpu_run_cost /= parallel_divisor;
-
- /*
- * It may be possible to amortize some of the I/O cost, but probably
- * not very much, because most operating systems already do aggressive
- * prefetching. For now, we assume that the disk run cost can't be
- * amortized at all.
- */
-
- /*
- * In the case of a parallel plan, the row count needs to represent
- * the number of tuples processed per worker.
- */
- path->rows = clamp_row_est(path->rows / parallel_divisor);
- }
-
- path->startup_cost = startup_cost;
- path->total_cost = startup_cost + cpu_run_cost + disk_run_cost;
+ TableScanCost_function cost_func = (TableScanCost_function) baserel->cost_scan;
+ Assert(cost_func != NULL);
+ return cost_func(path, root, baserel, param_info);
}
/*
@@ -286,65 +222,9 @@ void
cost_samplescan(Path *path, PlannerInfo *root,
RelOptInfo *baserel, ParamPathInfo *param_info)
{
- Cost startup_cost = 0;
- Cost run_cost = 0;
- RangeTblEntry *rte;
- TableSampleClause *tsc;
- TsmRoutine *tsm;
- double spc_seq_page_cost,
- spc_random_page_cost,
- spc_page_cost;
- QualCost qpqual_cost;
- Cost cpu_per_tuple;
-
- /* Should only be applied to base relations with tablesample clauses */
- Assert(baserel->relid > 0);
- rte = planner_rt_fetch(baserel->relid, root);
- Assert(rte->rtekind == RTE_RELATION);
- tsc = rte->tablesample;
- Assert(tsc != NULL);
- tsm = GetTsmRoutine(tsc->tsmhandler);
-
- /* Mark the path with the correct row estimate */
- if (param_info)
- path->rows = param_info->ppi_rows;
- else
- path->rows = baserel->rows;
-
- /* fetch estimated page cost for tablespace containing table */
- get_tablespace_page_costs(baserel->reltablespace,
- &spc_random_page_cost,
- &spc_seq_page_cost);
-
- /* if NextSampleBlock is used, assume random access, else sequential */
- spc_page_cost = (tsm->NextSampleBlock != NULL) ?
- spc_random_page_cost : spc_seq_page_cost;
-
- /*
- * disk costs (recall that baserel->pages has already been set to the
- * number of pages the sampling method will visit)
- */
- run_cost += spc_page_cost * baserel->pages;
-
- /*
- * CPU costs (recall that baserel->tuples has already been set to the
- * number of tuples the sampling method will select). Note that we ignore
- * execution cost of the TABLESAMPLE parameter expressions; they will be
- * evaluated only once per scan, and in most usages they'll likely be
- * simple constants anyway. We also don't charge anything for the
- * calculations the sampling method might do internally.
- */
- get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
-
- startup_cost += qpqual_cost.startup;
- cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
- run_cost += cpu_per_tuple * baserel->tuples;
- /* tlist eval costs are paid per output row, not per tuple scanned */
- startup_cost += path->pathtarget->cost.startup;
- run_cost += path->pathtarget->cost.per_tuple * path->rows;
-
- path->startup_cost = startup_cost;
- path->total_cost = startup_cost + run_cost;
+ TableScanCost_function cost_func = (TableScanCost_function) baserel->cost_samplescan;
+ Assert(cost_func != NULL);
+ return cost_func(path, root, baserel, param_info);
}
/*
@@ -3988,7 +3868,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
* some of the quals. We assume baserestrictcost was previously set by
* set_baserel_size_estimates().
*/
-static void
+void
get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info,
QualCost *qpqual_cost)
@@ -5343,7 +5223,7 @@ page_size(double tuples, int width)
* Estimate the fraction of the work that each worker will do given the
* number of workers budgeted for the path.
*/
-static double
+double
get_parallel_divisor(Path *path)
{
double parallel_divisor = path->parallel_workers;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 86029cd1327..45f3b0372b9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -93,6 +93,8 @@ static void set_baserel_partition_key_exprs(Relation relation,
* pages number of pages
* tuples number of tuples
* rel_parallel_workers user-defined number of parallel workers
+ * cost_scan costing function for sequential scan
+ * cost_samplescan costing function for sample scan
*
* Also, add information about the relation's foreign keys to root->fkey_list.
*
@@ -443,6 +445,18 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
rel->fdwroutine = NULL;
}
+ /* Get costing functions */
+ if (relation->rd_tableamroutine != NULL)
+ {
+ rel->cost_scan = relation->rd_tableamroutine->cost_scan;
+ rel->cost_samplescan = relation->rd_tableamroutine->cost_samplescan;
+ }
+ else
+ {
+ rel->cost_scan = NULL;
+ rel->cost_samplescan = NULL;
+ }
+
/* Collect info about relation's foreign keys, if relevant */
get_relation_foreign_keys(root, rel, relation, inhparent);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index dba50178887..2f342c7fef1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -17,6 +17,7 @@
#include "access/genham.h"
#include "nodes/lockoptions.h"
#include "nodes/primnodes.h"
+#include "optimizer/cost.h"
#include "storage/bufpage.h"
#include "storage/lockdefs.h"
#include "utils/relcache.h"
@@ -59,6 +60,13 @@ extern Relation heap_openrv_extended(const RangeVar *relation,
#define heap_close(r,l) relation_close(r,l)
+/* in heap/heapam_cost.c */
+extern void heapam_cost_scan(Path *path, PlannerInfo *root,
+ RelOptInfo *baserel, ParamPathInfo *param_info);
+extern void heapam_cost_samplescan(Path *path, PlannerInfo *root,
+ RelOptInfo *baserel, ParamPathInfo *param_info);
+
+
/* struct definitions appear in relscan.h */
typedef struct TableScanDescData *TableScanDesc;
typedef struct HeapScanDescData *HeapScanDesc;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index fb37a739918..beea954885a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
#include "executor/tuptable.h"
#include "nodes/execnodes.h"
#include "nodes/nodes.h"
+#include "optimizer/cost.h"
#include "fmgr.h"
#include "utils/guc.h"
#include "utils/rel.h"
@@ -192,6 +193,8 @@ struct SampleScanState;
typedef bool (*SampleScanNextBlock_function)(TableScanDesc scan, struct SampleScanState *scanstate);
typedef bool (*SampleScanNextTuple_function)(TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
+typedef void (*TableScanCost_function)(Path *path, PlannerInfo *root, RelOptInfo *baserel, ParamPathInfo *param_info);
+
/*
* API struct for a table AM. Note this must be allocated in a
* server-lifetime manner, typically as a static const struct.
@@ -246,6 +249,10 @@ typedef struct TableAmRoutine
IndexBuildRangeScan_function index_build_range_scan;
IndexValidateScan_function index_validate_scan;
+
+ /* Costing functions */
+ TableScanCost_function cost_scan;
+ TableScanCost_function cost_samplescan;
} TableAmRoutine;
static inline const TupleTableSlotOps*
diff --git a/src/include/access/zheap.h b/src/include/access/zheap.h
index c657c728ec3..583cf25f965 100644
--- a/src/include/access/zheap.h
+++ b/src/include/access/zheap.h
@@ -20,6 +20,7 @@
#include "access/hio.h"
#include "access/undoinsert.h"
#include "access/zhtup.h"
+#include "optimizer/cost.h"
#include "utils/rel.h"
#include "utils/snapshot.h"
@@ -211,4 +212,11 @@ typedef struct ZHeapFreeOffsetRanges
int nranges;
} ZHeapFreeOffsetRanges;
+/* Zheap costing functions */
+extern void zheapam_cost_scan(Path *path, PlannerInfo *root,
+ RelOptInfo *baserel, ParamPathInfo *param_info);
+extern void zheapam_cost_samplescan(Path *path, PlannerInfo *root,
+ RelOptInfo *baserel, ParamPathInfo *param_info);
+
+
#endif /* ZHEAP_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index adb42650479..6c51fe27460 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -704,6 +704,10 @@ typedef struct RelOptInfo
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
List *partitioned_child_rels; /* List of RT indexes. */
+
+ /* Rather than include tableam.h here, we declare costing functions like this */
+ void (*cost_scan) (); /* sequential scan cost estimator */
+ void (*cost_samplescan) (); /* sample scan cost estimator */
} RelOptInfo;
/*
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 77ca7ff8371..0af574c41fe 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -166,6 +166,9 @@ extern void cost_gather(GatherPath *path, PlannerInfo *root,
extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
+extern void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
+ ParamPathInfo *param_info,
+ QualCost *qpqual_cost);
extern void compute_semi_anti_join_factors(PlannerInfo *root,
RelOptInfo *joinrel,
RelOptInfo *outerrel,
@@ -198,6 +201,7 @@ extern void set_tablefunc_size_estimates(PlannerInfo *root, RelOptInfo *rel);
extern void set_namedtuplestore_size_estimates(PlannerInfo *root, RelOptInfo *rel);
extern void set_foreign_size_estimates(PlannerInfo *root, RelOptInfo *rel);
extern PathTarget *set_pathtarget_cost_width(PlannerInfo *root, PathTarget *target);
+extern double get_parallel_divisor(Path *path);
extern double compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel,
Path *bitmapqual, int loop_count, Cost *cost, double *tuple);
Attachments:
0001-remove-extra-snapshot-functions-2.patch
commit fea0ea9bceb698dbe88bee42d5fc5f3332a658dd
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed Sep 26 15:29:43 2018 +0300
Remove some snapshot functions from TableAmRoutine
snapshot_satisfiesUpdate was unused. snapshot_satisfiesVacuum was used only
inside heap_copy_for_cluster, so it was replaced with a direct
heapam_satisfies_vacuum() call.
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91fd..28c475e7bdc 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -483,21 +483,6 @@ heapam_satisfies(TupleTableSlot *slot, Snapshot snapshot)
return res;
}
-static HTSU_Result
-heapam_satisfies_update(TupleTableSlot *slot, CommandId curcid)
-{
- BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
- HTSU_Result res;
-
- LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
- res = HeapTupleSatisfiesUpdate(bslot->base.tuple,
- curcid,
- bslot->buffer);
- LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
-
- return res;
-}
-
static HTSV_Result
heapam_satisfies_vacuum(TupleTableSlot *slot, TransactionId OldestXmin)
{
@@ -2003,7 +1988,7 @@ heap_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
break;
}
- switch (OldHeap->rd_tableamroutine->snapshot_satisfiesVacuum(slot, OldestXmin))
+ switch (heapam_satisfies_vacuum(slot, OldestXmin))
{
case HEAPTUPLE_DEAD:
/* Definitely dead */
@@ -2122,8 +2107,6 @@ static const TableAmRoutine heapam_methods = {
.slot_callbacks = heapam_slot_callbacks,
.snapshot_satisfies = heapam_satisfies,
- .snapshot_satisfiesUpdate = heapam_satisfies_update,
- .snapshot_satisfiesVacuum = heapam_satisfies_vacuum,
.scan_begin = heap_beginscan,
.scansetlimits = heap_setscanlimits,
diff --git a/src/backend/access/zheap/zheapam_handler.c b/src/backend/access/zheap/zheapam_handler.c
index bec9b16f7d6..e707baa1b5d 100644
--- a/src/backend/access/zheap/zheapam_handler.c
+++ b/src/backend/access/zheap/zheapam_handler.c
@@ -486,40 +486,6 @@ zheapam_satisfies(TupleTableSlot *slot, Snapshot snapshot)
#endif
}
-static HTSU_Result
-zheapam_satisfies_update(TupleTableSlot *slot, CommandId curcid)
-{
- elog(ERROR, "would need to track buffer or refetch");
-#if ZBORKED
- BufferHeapTupleTableSlot *zslot = (BufferHeapTupleTableSlot *) slot;
- HTSU_Result res;
-
-
- LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
- res = HeapTupleSatisfiesUpdate(bslot->base.tuple,
- curcid,
- bslot->buffer);
- LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
-
- return res;
-#endif
-}
-
-static HTSV_Result
-zheapam_satisfies_vacuum(TupleTableSlot *slot, TransactionId OldestXmin)
-{
- BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
- HTSV_Result res;
-
- LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
- res = HeapTupleSatisfiesVacuum(bslot->base.tuple,
- OldestXmin,
- bslot->buffer);
- LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
-
- return res;
-}
-
static IndexFetchTableData*
zheapam_begin_index_fetch(Relation rel)
{
@@ -1621,8 +1587,6 @@ static const TableAmRoutine zheapam_methods = {
.slot_callbacks = zheapam_slot_callbacks,
.snapshot_satisfies = zheapam_satisfies,
- .snapshot_satisfiesUpdate = zheapam_satisfies_update,
- .snapshot_satisfiesVacuum = zheapam_satisfies_vacuum,
.scan_begin = zheap_beginscan,
.scansetlimits = zheap_setscanlimits,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7fe6ff6c221..fb37a739918 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -55,8 +55,6 @@ extern bool synchronize_seqscans;
* Storage routine function hooks
*/
typedef bool (*SnapshotSatisfies_function) (TupleTableSlot *slot, Snapshot snapshot);
-typedef HTSU_Result (*SnapshotSatisfiesUpdate_function) (TupleTableSlot *slot, CommandId curcid);
-typedef HTSV_Result (*SnapshotSatisfiesVacuum_function) (TupleTableSlot *slot, TransactionId OldestXmin);
typedef Oid (*TupleInsert_function) (Relation rel, TupleTableSlot *slot, CommandId cid,
int options, BulkInsertState bistate);
@@ -205,8 +203,6 @@ typedef struct TableAmRoutine
SlotCallbacks_function slot_callbacks;
SnapshotSatisfies_function snapshot_satisfies;
- SnapshotSatisfiesUpdate_function snapshot_satisfiesUpdate;
- SnapshotSatisfiesVacuum_function snapshot_satisfiesVacuum;
/* Operations on physical tuples */
TupleInsert_function tuple_insert;
On Thu, Oct 18, 2018 at 1:04 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
On Tue, Oct 9, 2018 at 1:46 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote: I also observed the failure of aggregates.sql, will look into it.
The random failure of aggregates.sql is as follows:
  SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
!        avg_32
! ---------------------
!  32.6666666666666667
  (1 row)

  -- In 7.1, avg(float4) is computed using float8 arithmetic.
--- 8,16 ----
  (1 row)

  SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
!  avg_32
! --------
!
  (1 row)

The same NULL result appears for another aggregate query, on column b.
The aggtest table is accessed by two tests that run in parallel:
aggregates.sql and transactions.sql. In transactions.sql, all the records in
the aggtest table are deleted inside a transaction, which is then aborted.
I suspect that some visibility check has a race condition that makes the
aggtest table appear empty, so the query returns a NULL result. If I try the
scenario manually, opening a transaction and deleting the records, the issue
does not occur. I have not yet found the cause of this problem, and I have
not been able to construct a test case that reproduces it reliably for
debugging; it happens randomly. I will try to add some logging to track it
down.
While investigating the above problem, I found some corrections:
1. Remove the tableam_common.c file, as it is not used.
2. Remove the extra heap tuple visibility check in the heapgettup_pagemode function.
3. New API for the init fork.
Regards,
Haribabu Kommi
Fujitsu Australia
Attachments:
0003-init-fork-API.patch
From edc69f750fa13e252bff67f4aa615d4fdcec2b5e Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 17:58:27 +1100
Subject: [PATCH 3/3] init fork API
API to create INIT_FORKNUM file with wrapper
table_create_init_fork.
---
src/backend/access/heap/heapam_handler.c | 27 +++++++++++++++++++++++-
src/backend/catalog/heap.c | 24 ++-------------------
src/backend/commands/tablecmds.c | 4 ++--
src/include/access/tableam.h | 10 +++++++++
src/include/catalog/heap.h | 2 --
5 files changed, 40 insertions(+), 27 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91f..c0cfbe74b1 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -30,6 +30,7 @@
#include "catalog/catalog.h"
#include "catalog/index.h"
#include "catalog/pg_am_d.h"
+#include "catalog/storage_xlog.h"
#include "executor/executor.h"
#include "pgstat.h"
#include "storage/lmgr.h"
@@ -2116,6 +2117,28 @@ heap_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
pfree(isnull);
}
+/*
+ * Set up an init fork for an unlogged table so that it can be correctly
+ * reinitialized on restart. An immediate sync is required even if the
+ * page has been logged, because the write did not go through
+ * shared_buffers and therefore a concurrent checkpoint may have moved
+ * the redo pointer past our xlog record. Recovery may as well remove it
+ * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
+ * record. Therefore, logging is necessary even if wal_level=minimal.
+ */
+static void
+heap_create_init_fork(Relation rel)
+{
+ Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
+ rel->rd_rel->relkind == RELKIND_MATVIEW ||
+ rel->rd_rel->relkind == RELKIND_TOASTVALUE);
+ RelationOpenSmgr(rel);
+ smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
+ log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
+ smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
+}
+
+
static const TableAmRoutine heapam_methods = {
.type = T_TableAmRoutine,
@@ -2163,7 +2186,9 @@ static const TableAmRoutine heapam_methods = {
.index_build_range_scan = IndexBuildHeapRangeScan,
- .index_validate_scan = validate_index_heapscan
+ .index_validate_scan = validate_index_heapscan,
+
+ .CreateInitFork = heap_create_init_fork
};
const TableAmRoutine *
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 38b368f916..8e7c8ce684 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -32,6 +32,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/sysattr.h"
+#include "access/tableam.h"
#include "access/transam.h"
#include "access/xact.h"
#include "access/xlog.h"
@@ -1425,7 +1426,7 @@ heap_create_with_catalog(const char *relname,
*/
if (relpersistence == RELPERSISTENCE_UNLOGGED &&
relkind != RELKIND_PARTITIONED_TABLE)
- heap_create_init_fork(new_rel_desc);
+ table_create_init_fork(new_rel_desc);
/*
* ok, the relation has been cataloged, so close our relations and return
@@ -1437,27 +1438,6 @@ heap_create_with_catalog(const char *relname,
return relid;
}
-/*
- * Set up an init fork for an unlogged table so that it can be correctly
- * reinitialized on restart. An immediate sync is required even if the
- * page has been logged, because the write did not go through
- * shared_buffers and therefore a concurrent checkpoint may have moved
- * the redo pointer past our xlog record. Recovery may as well remove it
- * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
- * record. Therefore, logging is necessary even if wal_level=minimal.
- */
-void
-heap_create_init_fork(Relation rel)
-{
- Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
- rel->rd_rel->relkind == RELKIND_MATVIEW ||
- rel->rd_rel->relkind == RELKIND_TOASTVALUE);
- RelationOpenSmgr(rel);
- smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
- log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
- smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
-}
-
/*
* RelationRemoveInheritance
*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f3526b267d..3c46a48882 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1649,7 +1649,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
RecentXmin, minmulti);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
- heap_create_init_fork(rel);
+ table_create_init_fork(rel);
heap_relid = RelationGetRelid(rel);
toast_relid = rel->rd_rel->reltoastrelid;
@@ -1663,7 +1663,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
RecentXmin, minmulti);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
- heap_create_init_fork(rel);
+ table_create_init_fork(rel);
heap_close(rel, NoLock);
}
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7fe6ff6c22..79c71b06e5 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -194,6 +194,8 @@ struct SampleScanState;
typedef bool (*SampleScanNextBlock_function)(TableScanDesc scan, struct SampleScanState *scanstate);
typedef bool (*SampleScanNextTuple_function)(TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
+typedef void (*CreateInitFork_function)(Relation rel);
+
/*
* API struct for a table AM. Note this must be allocated in a
* server-lifetime manner, typically as a static const struct.
@@ -250,6 +252,8 @@ typedef struct TableAmRoutine
IndexBuildRangeScan_function index_build_range_scan;
IndexValidateScan_function index_validate_scan;
+
+ CreateInitFork_function CreateInitFork;
} TableAmRoutine;
static inline const TupleTableSlotOps*
@@ -741,6 +745,12 @@ table_index_build_range_scan(Relation heapRelation,
scan);
}
+static inline void
+table_create_init_fork(Relation relation)
+{
+ relation->rd_tableamroutine->CreateInitFork(relation);
+}
+
extern BlockNumber table_parallelscan_nextpage(TableScanDesc scan);
extern void table_parallelscan_startblock_init(TableScanDesc scan);
extern Size table_parallelscan_estimate(Snapshot snapshot);
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 4584b3473c..c0e706ecc9 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -77,8 +77,6 @@ extern Oid heap_create_with_catalog(const char *relname,
Oid relrewrite,
ObjectAddress *typaddress);
-extern void heap_create_init_fork(Relation rel);
-
extern void heap_drop_with_catalog(Oid relid);
extern void heap_truncate(List *relids);
--
2.18.0.windows.1
0002-Remove-the-extra-Tuple-visibility-function.patch
From 3b4974dd4fa27f7a726b49cb8b16828818fe4093 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 16:07:07 +1100
Subject: [PATCH 2/3] Remove the extra Tuple visibility function
In heapgettup_pagemode, a tuple visibility check was added during the early
development of pluggable storage, but the visibility check is already
carried out in the heapgetpage function itself.
---
src/backend/access/heap/heapam.c | 28 +++++++++++-----------------
1 file changed, 11 insertions(+), 17 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ec99d0bcae..ef6b4c3e54 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -947,31 +947,25 @@ heapgettup_pagemode(HeapScanDesc scan,
/*
* if current tuple qualifies, return it.
*/
- if (HeapTupleSatisfies(tuple, scan->rs_scan.rs_snapshot, scan->rs_cbuf))
+ if (key != NULL)
{
- /*
- * if current tuple qualifies, return it.
- */
- if (key != NULL)
- {
- bool valid;
+ bool valid;
- HeapKeyTest(tuple, RelationGetDescr(scan->rs_scan.rs_rd),
- nkeys, key, valid);
- if (valid)
- {
- scan->rs_cindex = lineindex;
- LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
- return;
- }
- }
- else
+ HeapKeyTest(tuple, RelationGetDescr(scan->rs_scan.rs_rd),
+ nkeys, key, valid);
+ if (valid)
{
scan->rs_cindex = lineindex;
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
return;
}
}
+ else
+ {
+ scan->rs_cindex = lineindex;
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+ return;
+ }
/*
* otherwise move to the next item on the page
--
2.18.0.windows.1
0001-Remove-the-old-slot-interface-file-and-also-update-t.patch
From ab6169b484b134522a4bacb22f226c3dd4f67e42 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 16:06:10 +1100
Subject: [PATCH 1/3] Remove the old slot interface file and also update the
Makefile.
---
src/backend/access/table/Makefile | 2 +-
src/backend/access/table/tableam_common.c | 0
2 files changed, 1 insertion(+), 1 deletion(-)
delete mode 100644 src/backend/access/table/tableam_common.c
diff --git a/src/backend/access/table/Makefile b/src/backend/access/table/Makefile
index fe22bf9208..006ba99182 100644
--- a/src/backend/access/table/Makefile
+++ b/src/backend/access/table/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/access/table
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = tableam.o tableamapi.o tableam_common.o
+OBJS = tableam.o tableamapi.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/table/tableam_common.c b/src/backend/access/table/tableam_common.c
deleted file mode 100644
index e69de29bb2..0000000000
--
2.18.0.windows.1
On Mon, Oct 22, 2018 at 6:16 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
On Thu, Oct 18, 2018 at 1:04 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote: I am not yet able to generate a test case where the above issue can
occur easily for debugging, it is happening randomly. I will try to add some
logs to find out the problem.
I was able to construct a simple test and found the problem. The issue is
with the following SQL:
SELECT *
INTO TABLE xacttest
FROM aggtest;
During processing of this query, the tuple selected from aggtest is passed
to the intorel_receive() function, and that same tuple is then used for the
insert. Because of this, the tuple's xmin gets overwritten, and a concurrent
query on aggtest sees no visible rows, hence the NULL aggregate results.
I fixed this issue by materializing the slot.
While running the above test, I found another issue during ANALYZE: it tries
to access an invalid offset. A fix patch is attached.
Regards,
Haribabu Kommi
Fujitsu Australia
Attachments:
0001-scan-start-offset-fix-during-analyze.patch
From 82ae4f7a6ef78c9e04bc5abeeb0593b890ee454b Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Tue, 23 Oct 2018 15:42:57 +1100
Subject: [PATCH 1/2] scan start offset fix during analyze
---
src/backend/access/heap/heapam_handler.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c0cfbe74b1..ae832e1f71 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1742,7 +1742,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
{
HeapScanDesc scan = (HeapScanDesc) sscan;
Page targpage;
- OffsetNumber targoffset = scan->rs_cindex;
+ OffsetNumber targoffset;
OffsetNumber maxoffset;
BufferHeapTupleTableSlot *hslot;
@@ -1752,7 +1752,9 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
maxoffset = PageGetMaxOffsetNumber(targpage);
/* Inner loop over all tuples on the selected page */
- for (targoffset = scan->rs_cindex; targoffset <= maxoffset; targoffset++)
+ for (targoffset = scan->rs_cindex ? scan->rs_cindex : FirstOffsetNumber;
+ targoffset <= maxoffset;
+ targoffset++)
{
ItemId itemid;
HeapTuple targtuple = &hslot->base.tupdata;
--
2.18.0.windows.1
0002-Materialize-all-the-slots-before-they-are-processed-.patch
From 52865d2ede0c6d0b2b8af26d67736124cd44450d Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Tue, 23 Oct 2018 17:15:18 +1100
Subject: [PATCH 2/2] Materialize all the slots before they are processed using
into_rel
---
src/backend/commands/createas.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 84de804175..27a28a896d 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -593,6 +593,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
if (myState->rel->rd_rel->relhasoids)
slot->tts_tupleOid = InvalidOid;
+ ExecMaterializeSlot(slot);
table_insert(myState->rel,
slot,
myState->output_cid,
--
2.18.0.windows.1
On Tue, Oct 23, 2018 at 5:49 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
I am able to generate the simple test and found the problem. [...] I fixed
this issue by materializing the slot.
The wrong patch was attached to the earlier mail, sorry for the
inconvenience. The proper fix patch is attached here.
I will look into the isolation test failures.
Regards,
Haribabu Kommi
Fujitsu Australia
Attachments:
0002-Materialize-the-slot-before-they-are-processed-using.patch
From f0d9dbf5c5608beb99b879e7317b68a285bbeab8 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Tue, 23 Oct 2018 17:15:18 +1100
Subject: [PATCH 2/2] Materialize the slot before they are processed using
intorel_receive
---
src/backend/commands/createas.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 84de804175..d3ffe417ff 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -593,6 +593,10 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
if (myState->rel->rd_rel->relhasoids)
slot->tts_tupleOid = InvalidOid;
+ /* Materialize the slot */
+ if (!TTS_IS_VIRTUAL(slot))
+ ExecMaterializeSlot(slot);
+
table_insert(myState->rel,
slot,
myState->output_cid,
--
2.18.0.windows.1
On Tue, Oct 23, 2018 at 6:11 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
On Tue, Oct 23, 2018 at 5:49 PM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote: I am able to generate the simple test and found the problem. [...]
Attached proper fix patch. I will look into isolation test failures.
Attached is a cumulative patch with all the fixes I shared in earlier mails.
Except for the fast_default test, the remaining test failures are fixed.
Regards,
Haribabu Kommi
Fujitsu Australia
Attachments:
0002-init-fork-API.patch
From 118907d991360848715c893f8a9cf892ecc2bd5b Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 17:58:27 +1100
Subject: [PATCH 2/2] init fork API
API to create INIT_FORKNUM file with wrapper
table_create_init_fork.
---
src/backend/access/heap/heapam_handler.c | 27 +++++++++++++++++++++++-
src/backend/catalog/heap.c | 24 ++-------------------
src/backend/commands/tablecmds.c | 4 ++--
src/include/access/tableam.h | 10 +++++++++
src/include/catalog/heap.h | 2 --
5 files changed, 40 insertions(+), 27 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3254e30a45..ae832e1f71 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -30,6 +30,7 @@
#include "catalog/catalog.h"
#include "catalog/index.h"
#include "catalog/pg_am_d.h"
+#include "catalog/storage_xlog.h"
#include "executor/executor.h"
#include "pgstat.h"
#include "storage/lmgr.h"
@@ -2118,6 +2119,28 @@ heap_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
pfree(isnull);
}
+/*
+ * Set up an init fork for an unlogged table so that it can be correctly
+ * reinitialized on restart. An immediate sync is required even if the
+ * page has been logged, because the write did not go through
+ * shared_buffers and therefore a concurrent checkpoint may have moved
+ * the redo pointer past our xlog record. Recovery may as well remove it
+ * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
+ * record. Therefore, logging is necessary even if wal_level=minimal.
+ */
+static void
+heap_create_init_fork(Relation rel)
+{
+ Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
+ rel->rd_rel->relkind == RELKIND_MATVIEW ||
+ rel->rd_rel->relkind == RELKIND_TOASTVALUE);
+ RelationOpenSmgr(rel);
+ smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
+ log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
+ smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
+}
+
+
static const TableAmRoutine heapam_methods = {
.type = T_TableAmRoutine,
@@ -2165,7 +2188,9 @@ static const TableAmRoutine heapam_methods = {
.index_build_range_scan = IndexBuildHeapRangeScan,
- .index_validate_scan = validate_index_heapscan
+ .index_validate_scan = validate_index_heapscan,
+
+ .CreateInitFork = heap_create_init_fork
};
const TableAmRoutine *
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 38b368f916..8e7c8ce684 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -32,6 +32,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/sysattr.h"
+#include "access/tableam.h"
#include "access/transam.h"
#include "access/xact.h"
#include "access/xlog.h"
@@ -1425,7 +1426,7 @@ heap_create_with_catalog(const char *relname,
*/
if (relpersistence == RELPERSISTENCE_UNLOGGED &&
relkind != RELKIND_PARTITIONED_TABLE)
- heap_create_init_fork(new_rel_desc);
+ table_create_init_fork(new_rel_desc);
/*
* ok, the relation has been cataloged, so close our relations and return
@@ -1437,27 +1438,6 @@ heap_create_with_catalog(const char *relname,
return relid;
}
-/*
- * Set up an init fork for an unlogged table so that it can be correctly
- * reinitialized on restart. An immediate sync is required even if the
- * page has been logged, because the write did not go through
- * shared_buffers and therefore a concurrent checkpoint may have moved
- * the redo pointer past our xlog record. Recovery may as well remove it
- * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
- * record. Therefore, logging is necessary even if wal_level=minimal.
- */
-void
-heap_create_init_fork(Relation rel)
-{
- Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
- rel->rd_rel->relkind == RELKIND_MATVIEW ||
- rel->rd_rel->relkind == RELKIND_TOASTVALUE);
- RelationOpenSmgr(rel);
- smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
- log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
- smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
-}
-
/*
* RelationRemoveInheritance
*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f3526b267d..3c46a48882 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1649,7 +1649,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
RecentXmin, minmulti);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
- heap_create_init_fork(rel);
+ table_create_init_fork(rel);
heap_relid = RelationGetRelid(rel);
toast_relid = rel->rd_rel->reltoastrelid;
@@ -1663,7 +1663,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
RecentXmin, minmulti);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
- heap_create_init_fork(rel);
+ table_create_init_fork(rel);
heap_close(rel, NoLock);
}
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7fe6ff6c22..79c71b06e5 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -194,6 +194,8 @@ struct SampleScanState;
typedef bool (*SampleScanNextBlock_function)(TableScanDesc scan, struct SampleScanState *scanstate);
typedef bool (*SampleScanNextTuple_function)(TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
+typedef void (*CreateInitFork_function)(Relation rel);
+
/*
* API struct for a table AM. Note this must be allocated in a
* server-lifetime manner, typically as a static const struct.
@@ -250,6 +252,8 @@ typedef struct TableAmRoutine
IndexBuildRangeScan_function index_build_range_scan;
IndexValidateScan_function index_validate_scan;
+
+ CreateInitFork_function CreateInitFork;
} TableAmRoutine;
static inline const TupleTableSlotOps*
@@ -741,6 +745,12 @@ table_index_build_range_scan(Relation heapRelation,
scan);
}
+static inline void
+table_create_init_fork(Relation relation)
+{
+ relation->rd_tableamroutine->CreateInitFork(relation);
+}
+
extern BlockNumber table_parallelscan_nextpage(TableScanDesc scan);
extern void table_parallelscan_startblock_init(TableScanDesc scan);
extern Size table_parallelscan_estimate(Snapshot snapshot);
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 4584b3473c..c0e706ecc9 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -77,8 +77,6 @@ extern Oid heap_create_with_catalog(const char *relname,
Oid relrewrite,
ObjectAddress *typaddress);
-extern void heap_create_init_fork(Relation rel);
-
extern void heap_drop_with_catalog(Oid relid);
extern void heap_truncate(List *relids);
--
2.18.0.windows.1
Attachments:
0001-Further-fixes-and-cleanup.patch
From c2dc08ce7d41294cc52cc2820578e5a1668dddf7 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 16:06:10 +1100
Subject: [PATCH 1/2] Further fixes and cleanup
1. Remove the old slot interface file and also update the Makefile.
2. CREATE AS USING method grammar support
This change was missed during the earlier USING grammar support.
3. Remove the extra tuple visibility check
In heapgettup_pagemode the tuple visibility check was added
during the early development of pluggable storage, but the visibility
check is already carried out in the heapgetpage function itself.
regression fixes
1. scan start offset fix during analyze
2. Materialize the slot before they are processed using intorel_receive
3. ROW_MARK_COPY support by force store of heap tuple
4. partition prune extra heap page fix
---
contrib/pg_visibility/pg_visibility.c | 5 ++--
src/backend/access/heap/heapam.c | 28 +++++++++--------------
src/backend/access/heap/heapam_handler.c | 6 +++--
src/backend/access/table/Makefile | 2 +-
src/backend/access/table/tableam_common.c | 0
src/backend/commands/createas.c | 4 ++++
src/backend/executor/execExprInterp.c | 3 +++
src/backend/executor/execMain.c | 13 ++---------
src/backend/executor/execTuples.c | 21 +++++++++++++++++
src/backend/executor/nodeBitmapHeapscan.c | 12 ++++++++++
src/backend/executor/nodeModifyTable.c | 12 ++++++++--
src/backend/parser/gram.y | 11 +++++----
src/include/executor/tuptable.h | 1 +
src/include/nodes/primnodes.h | 1 +
14 files changed, 79 insertions(+), 40 deletions(-)
delete mode 100644 src/backend/access/table/tableam_common.c
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index dce5262e34..88ca4fd2af 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -563,12 +563,13 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
rel = relation_open(relid, AccessShareLock);
+ /* Only some relkinds have a visibility map */
+ check_relation_relkind(rel);
+
if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("only heap AM is supported")));
- /* Only some relkinds have a visibility map */
- check_relation_relkind(rel);
nblocks = RelationGetNumberOfBlocks(rel);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ec99d0bcae..ef6b4c3e54 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -947,31 +947,25 @@ heapgettup_pagemode(HeapScanDesc scan,
/*
* if current tuple qualifies, return it.
*/
- if (HeapTupleSatisfies(tuple, scan->rs_scan.rs_snapshot, scan->rs_cbuf))
+ if (key != NULL)
{
- /*
- * if current tuple qualifies, return it.
- */
- if (key != NULL)
- {
- bool valid;
+ bool valid;
- HeapKeyTest(tuple, RelationGetDescr(scan->rs_scan.rs_rd),
- nkeys, key, valid);
- if (valid)
- {
- scan->rs_cindex = lineindex;
- LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
- return;
- }
- }
- else
+ HeapKeyTest(tuple, RelationGetDescr(scan->rs_scan.rs_rd),
+ nkeys, key, valid);
+ if (valid)
{
scan->rs_cindex = lineindex;
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
return;
}
}
+ else
+ {
+ scan->rs_cindex = lineindex;
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+ return;
+ }
/*
* otherwise move to the next item on the page
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91f..3254e30a45 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1741,7 +1741,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
{
HeapScanDesc scan = (HeapScanDesc) sscan;
Page targpage;
- OffsetNumber targoffset = scan->rs_cindex;
+ OffsetNumber targoffset;
OffsetNumber maxoffset;
BufferHeapTupleTableSlot *hslot;
@@ -1751,7 +1751,9 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
maxoffset = PageGetMaxOffsetNumber(targpage);
/* Inner loop over all tuples on the selected page */
- for (targoffset = scan->rs_cindex; targoffset <= maxoffset; targoffset++)
+ for (targoffset = scan->rs_cindex ? scan->rs_cindex : FirstOffsetNumber;
+ targoffset <= maxoffset;
+ targoffset++)
{
ItemId itemid;
HeapTuple targtuple = &hslot->base.tupdata;
diff --git a/src/backend/access/table/Makefile b/src/backend/access/table/Makefile
index fe22bf9208..006ba99182 100644
--- a/src/backend/access/table/Makefile
+++ b/src/backend/access/table/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/access/table
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = tableam.o tableamapi.o tableam_common.o
+OBJS = tableam.o tableamapi.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/table/tableam_common.c b/src/backend/access/table/tableam_common.c
deleted file mode 100644
index e69de29bb2..0000000000
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 84de804175..d3ffe417ff 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -593,6 +593,10 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
if (myState->rel->rd_rel->relhasoids)
slot->tts_tupleOid = InvalidOid;
+ /* Materialize the slot */
+ if (!TTS_IS_VIRTUAL(slot))
+ ExecMaterializeSlot(slot);
+
table_insert(myState->rel,
slot,
myState->output_cid,
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index ef94ac4aa0..12651f5ceb 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -570,6 +570,9 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
Assert(TTS_IS_HEAPTUPLE(scanslot) ||
TTS_IS_BUFFERTUPLE(scanslot));
+ if (hslot->tuple == NULL)
+ ExecMaterializeSlot(scanslot);
+
d = heap_getsysattr(hslot->tuple, attnum,
scanslot->tts_tupleDescriptor,
op->resnull);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e055c0a7c6..34ef86a5bd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2594,7 +2594,7 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
* datums that may be present in copyTuple). As with the next step, this
* is to guard against early re-use of the EPQ query.
*/
- if (!TupIsNull(slot))
+ if (!TupIsNull(slot) && !TTS_IS_VIRTUAL(slot))
ExecMaterializeSlot(slot);
#if FIXME
@@ -2787,16 +2787,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
if (isNull)
continue;
- elog(ERROR, "frak, need to implement ROW_MARK_COPY");
-#ifdef FIXME
- // FIXME: this should just deform the tuple and store it as a
- // virtual one.
- tuple = table_tuple_by_datum(erm->relation, datum, erm->relid);
-
- /* store tuple */
- EvalPlanQualSetTuple(epqstate, erm->rti, tuple);
-#endif
-
+ ExecForceStoreHeapTupleDatum(datum, slot);
}
}
}
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 917bf80f71..74149cc3ad 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -1364,6 +1364,27 @@ ExecStoreAllNullTuple(TupleTableSlot *slot)
return ExecFinishStoreSlotValues(slot);
}
+void
+ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot)
+{
+ HeapTuple tuple;
+ HeapTupleHeader td;
+
+ td = DatumGetHeapTupleHeader(data);
+
+ tuple = (HeapTuple) palloc(HEAPTUPLESIZE + HeapTupleHeaderGetDatumLength(td));
+ tuple->t_len = HeapTupleHeaderGetDatumLength(td);
+ tuple->t_self = td->t_ctid;
+ tuple->t_data = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE);
+ memcpy((char *) tuple->t_data, (char *) td, tuple->t_len);
+
+ ExecClearTuple(slot);
+
+ heap_deform_tuple(tuple, slot->tts_tupleDescriptor,
+ slot->tts_values, slot->tts_isnull);
+ ExecFinishStoreSlotValues(slot);
+}
+
/* --------------------------------
* ExecFetchSlotTuple
* Fetch the slot's regular physical tuple.
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 56880e3d16..36ca07beb2 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -224,6 +224,18 @@ BitmapHeapNext(BitmapHeapScanState *node)
BitmapAdjustPrefetchIterator(node, tbmres);
+ /*
+ * Ignore any claimed entries past what we think is the end of the
+ * relation. (This is probably not necessary given that we got at
+ * least AccessShareLock on the table before performing any of the
+ * indexscans, but let's be safe.)
+ */
+ if (tbmres->blockno >= scan->rs_nblocks)
+ {
+ node->tbmres = tbmres = NULL;
+ continue;
+ }
+
/*
* We can skip fetching the heap page if we don't need any fields
* from the heap, and the bitmap entries don't need rechecking,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 3cc9092413..307c12ee69 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -607,7 +607,7 @@ ExecDelete(ModifyTableState *mtstate,
bool canSetTag,
bool changingPart,
bool *tupleDeleted,
- TupleTableSlot **epqslot)
+ TupleTableSlot **epqreturnslot)
{
ResultRelInfo *resultRelInfo;
Relation resultRelationDesc;
@@ -632,7 +632,7 @@ ExecDelete(ModifyTableState *mtstate,
bool dodelete;
dodelete = ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
- tupleid, oldtuple, epqslot);
+ tupleid, oldtuple, epqreturnslot);
if (!dodelete) /* "do nothing" */
return NULL;
@@ -724,6 +724,14 @@ ldelete:;
/* Tuple no more passing quals, exiting... */
return NULL;
}
+
+ /* If requested, pass the EPQ slot back to the caller instead of retrying here */
+ if (epqreturnslot)
+ {
+ *epqreturnslot = epqslot;
+ return NULL;
+ }
+
goto ldelete;
}
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 54382aba88..f030ad25a2 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4037,7 +4037,6 @@ CreateStatsStmt:
*
*****************************************************************************/
-// PBORKED: storage option
CreateAsStmt:
CREATE OptTemp TABLE create_as_target AS SelectStmt opt_with_data
{
@@ -4068,14 +4067,16 @@ CreateAsStmt:
;
create_as_target:
- qualified_name opt_column_list OptWith OnCommitOption OptTableSpace
+ qualified_name opt_column_list table_access_method_clause
+ OptWith OnCommitOption OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
$$->colNames = $2;
- $$->options = $3;
- $$->onCommit = $4;
- $$->tableSpaceName = $5;
+ $$->accessMethod = $3;
+ $$->options = $4;
+ $$->onCommit = $5;
+ $$->tableSpaceName = $6;
$$->viewQuery = NULL;
$$->skipData = false; /* might get changed later */
}
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index 05f38cfd0d..20fc425a27 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -476,6 +476,7 @@ extern TupleTableSlot *ExecCopySlot(TupleTableSlot *dstslot,
extern void ExecForceStoreHeapTuple(HeapTuple tuple,
TupleTableSlot *slot);
+extern void ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot);
extern void slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum);
extern Datum slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull);
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 40f6eb03d2..4d194a8c2a 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -111,6 +111,7 @@ typedef struct IntoClause
RangeVar *rel; /* target relation name */
List *colNames; /* column names to assign, or NIL */
+ char *accessMethod; /* table access method */
List *options; /* options from WITH clause */
OnCommitAction onCommit; /* what do we do at COMMIT? */
char *tableSpaceName; /* table space to use, or NULL */
--
2.18.0.windows.1
On Fri, 26 Oct 2018 at 13:25, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
Here I attached the cumulative patch with all the fixes that were shared in earlier mails by me.
Except for the fast_default test, the rest of the test failures are fixed.
Hi,
If I understand correctly, these patches are for the branch "pluggable-storage"
in [1]https://github.com/anarazel/postgres-pluggable-storage (at least I couldn't apply them cleanly to "pluggable-zheap" branch),
right? I've tried to experiment a bit with the current status of the patch, and
accidentally stumbled upon what seems to be an issue - when I run pgbench
against it with some significant number of clients and the script [2]https://gist.github.com/erthalion/c85ba0e12146596d24c572234501e756:
$ pgbench -T 60 -c 128 -j 64 -f zipfian.sql
I got the following error for some clients:
client 117 aborted in command 5 (SQL) of script 0; ERROR:
unrecognized heap_update status: 1
This problem couldn't be reproduced on the master branch, so I've tried to
investigate it. It comes from nodeModifyTable.c:1267, when we've got
HeapTupleInvisible as a result, and this value in turn comes from
table_lock_tuple. Everything points to the new way of handling the HeapTupleUpdated
result from heap_update, when the table_lock_tuple call was introduced. Since I
don't see anything similar in the master branch, can anyone clarify why this
lock is necessary here? Out of curiosity I've rearranged the code that handles
HeapTupleUpdated back to a switch and removed table_lock_tuple (see the attached
patch; it can be applied on top of the latest two patches posted by Haribabu),
and it seems to solve the issue.
[1]: https://github.com/anarazel/postgres-pluggable-storage
[2]: https://gist.github.com/erthalion/c85ba0e12146596d24c572234501e756
Attachments:
unrecognized_heap_status.patch
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 307c12ee69..074593b1cc 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1170,42 +1170,6 @@ lreplace:;
true /* wait for commit */,
&hufd, &lockmode, &update_indexes);
- if (result == HeapTupleUpdated && !IsolationUsesXactSnapshot())
- {
- TupleTableSlot *inputslot;
-
- EvalPlanQualBegin(epqstate, estate);
-
- inputslot = EvalPlanQualSlot(epqstate, resultRelationDesc, resultRelInfo->ri_RangeTableIndex);
- ExecCopySlot(inputslot, slot);
-
- result = table_lock_tuple(resultRelationDesc, tupleid,
- estate->es_snapshot,
- inputslot, estate->es_output_cid,
- lockmode, LockWaitBlock,
- TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
- &hufd);
- /* hari FIXME*/
- /*Assert(result != HeapTupleUpdated && hufd.traversed);*/
- if (result == HeapTupleMayBeUpdated)
- {
- TupleTableSlot *epqslot;
-
- epqslot = EvalPlanQual(estate,
- epqstate,
- resultRelationDesc,
- resultRelInfo->ri_RangeTableIndex,
- inputslot);
- if (TupIsNull(epqslot))
- {
- /* Tuple no more passing quals, exiting... */
- return NULL;
- }
- slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
- goto lreplace;
- }
- }
-
switch (result)
{
case HeapTupleSelfUpdated:
@@ -1250,10 +1214,37 @@ lreplace:;
ereport(ERROR,
(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
errmsg("could not serialize access due to concurrent update")));
- else
- /* shouldn't get there */
- elog(ERROR, "wrong heap_delete status: %u", result);
- break;
+
+ if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("tuple to be updated was already moved to another partition due to concurrent update")));
+
+ if (!ItemPointerEquals(tupleid, &hufd.ctid))
+ {
+ TupleTableSlot *epqslot;
+ TupleTableSlot *inputslot;
+
+ EvalPlanQualBegin(epqstate, estate);
+
+ inputslot = EvalPlanQualSlot(epqstate, resultRelationDesc, resultRelInfo->ri_RangeTableIndex);
+ ExecCopySlot(inputslot, slot);
+
+ epqslot = EvalPlanQual(estate,
+ epqstate,
+ resultRelationDesc,
+ resultRelInfo->ri_RangeTableIndex,
+ inputslot);
+ if (!TupIsNull(epqslot))
+ {
+ *tupleid = hufd.ctid;
+ slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+ goto lreplace;
+ }
+ }
+ /* tuple already deleted; nothing to do */
+ return NULL;
case HeapTupleDeleted:
if (IsolationUsesXactSnapshot())
On Mon, Oct 29, 2018 at 7:40 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
On Fri, 26 Oct 2018 at 13:25, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
Here I attached the cumulative patch with all the fixes that were shared in earlier mails by me.
Except for the fast_default test, the rest of the test failures are fixed.
Hi,
If I understand correctly, these patches are for the branch "pluggable-storage"
in [1] (at least I couldn't apply them cleanly to the "pluggable-zheap" branch),
right?
Yes, the patches attached are for pluggable-storage branch.
I've tried to experiment a bit with the current status of the patch, and
accidentally stumbled upon what seems to be an issue - when I run pgbench
against it with some significant number of clients and the script [2]:
$ pgbench -T 60 -c 128 -j 64 -f zipfian.sql
Thanks for testing the patches.
I've got for some client an error:
client 117 aborted in command 5 (SQL) of script 0; ERROR:
unrecognized heap_update status: 1
This error corresponds to the tuple state HeapTupleInvisible. As per the comments
in heap_lock_tuple, this is possible with ON CONFLICT ... DO UPDATE, but because
table_lock_tuple was reorganized out of EvalPlanQual(), the invisible
result is now returned in other cases as well. This case was missed in the new code.
This problem couldn't be reproduced on the master branch, so I've tried to
investigate it. It comes from nodeModifyTable.c:1267, when we've got
HeapTupleInvisible as a result, and this value in turn comes from
table_lock_tuple. Everything points to the new way of handling the HeapTupleUpdated
result from heap_update, when the table_lock_tuple call was introduced. Since I
don't see anything similar in the master branch, can anyone clarify why this
lock is necessary here?
In the master branch code also, there is a tuple lock happening inside the
EvalPlanQual() function; in the pluggable-storage code the lock is kept outside,
and the function calls were rearranged, to make it easier for table access
methods to provide their own MVCC implementation.
Out of curiosity I've rearranged the code that handles
HeapTupleUpdated back to a switch and removed table_lock_tuple (see the attached
patch; it can be applied on top of the latest two patches posted by Haribabu),
and it seems to solve the issue.
Thanks for the patch. I didn't reproduce the problem, but based on the error from
your mail, the attached draft patch handling invisible tuples in the update and
delete cases should also fix it.
Regards,
Haribabu Kommi
Fujitsu Australia
Attachments:
0001-Handling-HeapTupleInvisible-case.patch
From c34faa5ad6af9ef4562feabd6bb4d361fe813945 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 29 Oct 2018 15:53:30 +1100
Subject: [PATCH] Handling HeapTupleInvisible case
In update/delete scenarios, when the tuple is
concurrently updated/deleted, sometimes locking
of a tuple may return HeapTupleInvisible. Handle
that case as nothing to do.
---
src/backend/executor/nodeModifyTable.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 307c12ee69..b3851b180d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -734,6 +734,11 @@ ldelete:;
goto ldelete;
}
+ else if (result == HeapTupleInvisible)
+ {
+ /* tuple is not visible; nothing to do */
+ return NULL;
+ }
}
switch (result)
@@ -1204,6 +1209,11 @@ lreplace:;
slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
goto lreplace;
}
+ else if (result == HeapTupleInvisible)
+ {
+ /* tuple is not visible; nothing to do */
+ return NULL;
+ }
}
switch (result)
--
2.18.0.windows.1
On Mon, 29 Oct 2018 at 05:56, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
This problem couldn't be reproduced on the master branch, so I've tried to
investigate it. It comes from nodeModifyTable.c:1267, when we've got
HeapTupleInvisible as a result, and this value in turn comes from
table_lock_tuple. Everything points to the new way of handling HeapTupleUpdated
result from heap_update, when the table_lock_tuple call was introduced. Since I
don't see anything similar in the master branch, can anyone clarify why this
lock is necessary here?
In the master branch code also, there is a tuple lock happening inside the
EvalPlanQual() function; in the pluggable-storage code the lock is kept outside,
and the function calls were rearranged, to make it easier for table access
methods to provide their own MVCC implementation.
Yes, now I see it, thanks. Also I can confirm that the attached patch solves
this issue.
FYI, alongside reviewing the code changes I've run a few performance tests
(that's why I hit this issue with pgbench in the first place). In the case of high
concurrency so far I see a small performance degradation in comparison with the
master branch (about 2-5% of average latency, depending on the level of
concurrency), but can't really say why exactly (perf just shows barely
noticeable overhead here and there; maybe what I see is actually a cumulative
impact).
On Wed, Oct 31, 2018 at 9:34 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
On Mon, 29 Oct 2018 at 05:56, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
This problem couldn't be reproduced on the master branch, so I've tried to
investigate it. It comes from nodeModifyTable.c:1267, when we've got
HeapTupleInvisible as a result, and this value in turn comes from
table_lock_tuple. Everything points to the new way of handling the HeapTupleUpdated
result from heap_update, when the table_lock_tuple call was introduced. Since I
don't see anything similar in the master branch, can anyone clarify why this
lock is necessary here?
In the master branch code also, there is a tuple lock happening inside the
EvalPlanQual() function; in the pluggable-storage code the lock is kept outside,
and the function calls were rearranged, to make it easier for table access
methods to provide their own MVCC implementation.
Yes, now I see it, thanks. Also I can confirm that the attached patch solves
this issue.
Thanks for the testing and confirmation.
FYI, alongside reviewing the code changes I've run a few performance tests
(that's why I hit this issue with pgbench in the first place). In the case of high
concurrency so far I see a small performance degradation in comparison with the
master branch (about 2-5% of average latency, depending on the level of
concurrency), but can't really say why exactly (perf just shows barely
noticeable overhead here and there; maybe what I see is actually a cumulative
impact).
Thanks for sharing your observation; I will also analyze and try to find
the performance bottlenecks that are causing the overhead.
Here I attached the cumulative fixes of the patches, new API additions for
zheap, and a basic outline of the documentation.
Regards,
Haribabu Kommi
Fujitsu Australia
Attachments:
0003-First-draft-of-pluggable-storage-documentation.patch
From 2a530ea1306c291a40fff4042a0b1a5755dcefc9 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 1 Nov 2018 12:00:10 +1100
Subject: [PATCH 3/3] First draft of pluggable-storage documentation
---
doc/src/sgml/{indexam.sgml => am.sgml} | 590 ++++++++++++++++++++-
doc/src/sgml/catalogs.sgml | 5 +-
doc/src/sgml/config.sgml | 24 +
doc/src/sgml/filelist.sgml | 2 +-
doc/src/sgml/postgres.sgml | 2 +-
doc/src/sgml/ref/create_access_method.sgml | 12 +-
doc/src/sgml/ref/create_table.sgml | 18 +-
doc/src/sgml/ref/create_table_as.sgml | 14 +
doc/src/sgml/release-9.6.sgml | 2 +-
doc/src/sgml/xindex.sgml | 2 +-
10 files changed, 640 insertions(+), 31 deletions(-)
rename doc/src/sgml/{indexam.sgml => am.sgml} (78%)
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/am.sgml
similarity index 78%
rename from doc/src/sgml/indexam.sgml
rename to doc/src/sgml/am.sgml
index beb99d1831..dc13bc1073 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/am.sgml
@@ -1,16 +1,20 @@
-<!-- doc/src/sgml/indexam.sgml -->
+<!-- doc/src/sgml/am.sgml -->
-<chapter id="indexam">
- <title>Index Access Method Interface Definition</title>
+<chapter id="am">
+ <title>Access Method Interface Definition</title>
<para>
This chapter defines the interface between the core
- <productname>PostgreSQL</productname> system and <firstterm>index access
- methods</firstterm>, which manage individual index types. The core system
- knows nothing about indexes beyond what is specified here, so it is
- possible to develop entirely new index types by writing add-on code.
+ <productname>PostgreSQL</productname> system and <firstterm>access
+ methods</firstterm>, which manage individual <literal>INDEX</literal>
+ and <literal>TABLE</literal> types. The core system knows nothing
+ about these access methods beyond what is specified here, so it is
+ possible to develop entirely new access method types by writing add-on code.
</para>
-
+
+ <sect1 id="index-access-methods">
+ <title>Overview of Index access methods</title>
+
<para>
All indexes in <productname>PostgreSQL</productname> are what are known
technically as <firstterm>secondary indexes</firstterm>; that is, the index is
@@ -42,8 +46,8 @@
dead tuples are reclaimed (by vacuuming) when the dead tuples themselves
are reclaimed.
</para>
-
- <sect1 id="index-api">
+
+ <sect2 id="index-api">
<title>Basic API Structure for Indexes</title>
<para>
@@ -217,9 +221,9 @@ typedef struct IndexAmRoutine
conditions.
</para>
- </sect1>
+ </sect2>
- <sect1 id="index-functions">
+ <sect2 id="index-functions">
<title>Index Access Method Functions</title>
<para>
@@ -709,9 +713,11 @@ amparallelrescan (IndexScanDesc scan);
the beginning.
</para>
- </sect1>
+ </sect2>
+
+
- <sect1 id="index-scanning">
+ <sect2 id="index-scanning">
<title>Index Scanning</title>
<para>
@@ -864,9 +870,9 @@ amparallelrescan (IndexScanDesc scan);
if its internal implementation is unsuited to one API or the other.
</para>
- </sect1>
+ </sect2>
- <sect1 id="index-locking">
+ <sect2 id="index-locking">
<title>Index Locking Considerations</title>
<para>
@@ -978,9 +984,9 @@ amparallelrescan (IndexScanDesc scan);
reduce the frequency of such transaction cancellations.
</para>
- </sect1>
+ </sect2>
- <sect1 id="index-unique-checks">
+ <sect2 id="index-unique-checks">
<title>Index Uniqueness Checks</title>
<para>
@@ -1127,9 +1133,9 @@ amparallelrescan (IndexScanDesc scan);
</itemizedlist>
</para>
- </sect1>
+ </sect2>
- <sect1 id="index-cost-estimation">
+ <sect2 id="index-cost-estimation">
<title>Index Cost Estimation Functions</title>
<para>
@@ -1376,5 +1382,549 @@ cost_qual_eval(&index_qual_cost, path->indexquals, root);
Examples of cost estimator functions can be found in
<filename>src/backend/utils/adt/selfuncs.c</filename>.
</para>
+ </sect2>
</sect1>
+
+ <sect1 id="table-access-methods">
+ <title>Overview of Table access methods</title>
+
+ <para>
+ Tables in <productname>PostgreSQL</productname> are the primary data store.
+ Each table is stored as its own physical <firstterm>relation</firstterm> and so
+ is described by an entry in the <structname>pg_class</structname> catalog.
+ The contents of a table are entirely under the control of its access method.
+ (All the access methods furthermore use the standard page layout described in
+ <xref linkend="storage-page-layout"/>.)
+ </para>
+
+ <sect2 id="table-api">
+ <title>Table access method API</title>
+
+ <para>
+ Each table access method is described by a row in the
+ <link linkend="catalog-pg-am"><structname>pg_am</structname></link>
+ system catalog. The <structname>pg_am</structname> entry
+ specifies a name and a <firstterm>handler function</firstterm> for the access
+ method. These entries can be created and deleted using the
+ <xref linkend="sql-create-access-method"/> and
+ <xref linkend="sql-drop-access-method"/> SQL commands.
+ </para>
+
+ <para>
+ A table access method handler function must be declared to accept a
+ single argument of type <type>internal</type> and to return the
+ pseudo-type <type>table_am_handler</type>. The argument is a dummy value that
+ simply serves to prevent handler functions from being called directly from
+ SQL commands. The result of the function must be a palloc'd struct of
+ type <structname>TableAmRoutine</structname>, which contains everything
+ that the core code needs to know to make use of the table access method.
+ The <structname>TableAmRoutine</structname> struct, also called the access
+ method's <firstterm>API struct</firstterm>, includes fields specifying assorted
+ fixed properties of the access method, such as whether it can support
+ bitmap scans. More importantly, it contains pointers to support
+ functions for the access method, which do all of the real work to access
+ tables. These support functions are plain C functions and are not
+ visible or callable at the SQL level. The support functions are described
+ in <xref linkend="table-functions"/>.
+ </para>
+
+ <para>
+ The structure <structname>TableAmRoutine</structname> is defined thus:
+<programlisting>
+typedef struct TableAmRoutine
+{
+ NodeTag type;
+
+ SlotCallbacks_function slot_callbacks;
+
+ SnapshotSatisfies_function snapshot_satisfies;
+ SnapshotSatisfiesUpdate_function snapshot_satisfiesUpdate;
+ SnapshotSatisfiesVacuum_function snapshot_satisfiesVacuum;
+
+ /* Operations on physical tuples */
+ TupleInsert_function tuple_insert;
+ TupleInsertSpeculative_function tuple_insert_speculative;
+ TupleCompleteSpeculative_function tuple_complete_speculative;
+ TupleUpdate_function tuple_update;
+ TupleDelete_function tuple_delete;
+ TupleFetchRowVersion_function tuple_fetch_row_version;
+ TupleLock_function tuple_lock;
+ MultiInsert_function multi_insert;
+ TupleGetLatestTid_function tuple_get_latest_tid;
+ TupleFetchFollow_function tuple_fetch_follow;
+
+ GetTupleData_function get_tuple_data;
+
+ RelationVacuum_function relation_vacuum;
+ RelationScanAnalyzeNextBlock_function scan_analyze_next_block;
+ RelationScanAnalyzeNextTuple_function scan_analyze_next_tuple;
+ RelationCopyForCluster_function relation_copy_for_cluster;
+ RelationSync_function relation_sync;
+
+ /* Operations on relation scans */
+ ScanBegin_function scan_begin;
+ ScanSetlimits_function scansetlimits;
+ ScanGetnextSlot_function scan_getnextslot;
+
+ BitmapPagescan_function scan_bitmap_pagescan;
+ BitmapPagescanNext_function scan_bitmap_pagescan_next;
+
+ SampleScanNextBlock_function scan_sample_next_block;
+ SampleScanNextTuple_function scan_sample_next_tuple;
+
+ ScanEnd_function scan_end;
+ ScanRescan_function scan_rescan;
+ ScanUpdateSnapshot_function scan_update_snapshot;
+
+ BeginIndexFetchTable_function begin_index_fetch;
+ EndIndexFetchTable_function reset_index_fetch;
+ EndIndexFetchTable_function end_index_fetch;
+
+
+ IndexBuildRangeScan_function index_build_range_scan;
+ IndexValidateScan_function index_validate_scan;
+
+ CreateInitFork_function CreateInitFork;
+} TableAmRoutine;
+</programlisting>
+ </para>
+
+ <para>
+ An individual table is defined by a
+ <link linkend="catalog-pg-class"><structname>pg_class</structname></link>
+ entry that describes it as a physical relation.
+ </para>
+
+ </sect2>
+
+ <sect2 id="table-functions">
+ <title>Table Access Method Functions</title>
+
+ <para>
+ The table construction and maintenance functions that a table access
+ method must provide in <structname>TableAmRoutine</structname> are:
+ </para>
+
+ <para>
+<programlisting>
+TupleTableSlotOps *
+slot_callbacks (Relation relation);
+</programlisting>
+ API to access the slot-specific callbacks of the access method.
+ The following slot implementations are available:
+ <structname>TTSOpsVirtual</structname>,
+ <structname>TTSOpsHeapTuple</structname>,
+ <structname>TTSOpsMinimalTuple</structname> and
+ <structname>TTSOpsBufferTuple</structname>.
+ </para>
+
+ <para>
+<programlisting>
+bool
+snapshot_satisfies (TupleTableSlot *slot, Snapshot snapshot);
+</programlisting>
+ API to check whether the tuple in the provided slot is visible to the
+ current transaction according to the given snapshot.
+ </para>
+
+ <para>
+<programlisting>
+Oid
+tuple_insert (Relation rel, TupleTableSlot *slot, CommandId cid,
+ int options, BulkInsertState bistate);
+</programlisting>
+ API to insert the tuple contained in the provided slot, returning the
+ <literal>ItemPointerData</literal> of the location where it was inserted.
+ </para>
+
+ <para>
+<programlisting>
+Oid
+tuple_insert_speculative (Relation rel,
+ TupleTableSlot *slot,
+ CommandId cid,
+ int options,
+ BulkInsertState bistate,
+ uint32 specToken);
+</programlisting>
+ API to insert the tuple marked with a speculative token. This API is
+ similar to <literal>tuple_insert</literal>, but takes additional
+ speculative-insertion information.
+ </para>
+
+ <para>
+<programlisting>
+void
+tuple_complete_speculative (Relation rel,
+ TupleTableSlot *slot,
+ uint32 specToken,
+ bool succeeded);
+</programlisting>
+ API to complete the speculative insertion started by <literal>tuple_insert_speculative</literal>,
+ confirming or aborting the tuple depending on whether the index insertion succeeded.
+ </para>
+
+
+ <para>
+<programlisting>
+HTSU_Result
+tuple_update (Relation relation,
+ ItemPointer otid,
+ TupleTableSlot *slot,
+ CommandId cid,
+ Snapshot crosscheck,
+ bool wait,
+ HeapUpdateFailureData *hufd,
+ LockTupleMode *lockmode,
+ bool *update_indexes);
+</programlisting>
+ API to update the existing tuple with new data.
+ </para>
+
+
+ <para>
+<programlisting>
+HTSU_Result
+tuple_delete (Relation relation,
+ ItemPointer tid,
+ CommandId cid,
+ Snapshot crosscheck,
+ bool wait,
+ HeapUpdateFailureData *hufd,
+ bool changingPart);
+</programlisting>
+ API to delete the existing tuple.
+ </para>
+
+
+ <para>
+<programlisting>
+bool
+tuple_fetch_row_version (Relation relation,
+ ItemPointer tid,
+ Snapshot snapshot,
+ TupleTableSlot *slot,
+ Relation stats_relation);
+</programlisting>
+ API to fetch the tuple identified by the given ItemPointer and store it
+ in the provided slot.
+ </para>
+
+
+ <para>
+<programlisting>
+HTSU_Result
+tuple_lock (Relation relation,
+ ItemPointer tid,
+ Snapshot snapshot,
+ TupleTableSlot *slot,
+ CommandId cid,
+ LockTupleMode mode,
+ LockWaitPolicy wait_policy,
+ uint8 flags,
+ HeapUpdateFailureData *hufd);
+</programlisting>
+ API to lock the tuple specified by the given ItemPointer, fetching the
+ newest version of the tuple and its TID.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+multi_insert (Relation relation, TupleTableSlot **slots, int nslots,
+ CommandId cid, int options, BulkInsertState bistate);
+</programlisting>
+ API to insert multiple tuples at a time into the relation.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+tuple_get_latest_tid (Relation relation,
+ Snapshot snapshot,
+ ItemPointer tid);
+</programlisting>
+ API to get the latest TID of the tuple with the given ItemPointer.
+ </para>
+
+
+ <para>
+<programlisting>
+bool
+tuple_fetch_follow (struct IndexFetchTableData *scan,
+ ItemPointer tid,
+ Snapshot snapshot,
+ TupleTableSlot *slot,
+ bool *call_again, bool *all_dead);
+</programlisting>
+ API to fetch the tuple versions reachable from the given ItemPointer.
+ </para>
+
+
+ <para>
+<programlisting>
+tuple_data
+get_tuple_data (TupleTableSlot *slot, tuple_data_flags flags);
+</programlisting>
+ API to return the internal structure members of the HeapTuple.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+relation_vacuum (Relation onerel, int options,
+ struct VacuumParams *params, BufferAccessStrategy bstrategy);
+</programlisting>
+ API to vacuum a single relation.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+scan_analyze_next_block (TableScanDesc scan, BlockNumber blockno,
+ BufferAccessStrategy bstrategy);
+</programlisting>
+ API to fill the scan descriptor with the buffer of the specified block.
+ </para>
+
+
+ <para>
+<programlisting>
+bool
+scan_analyze_next_tuple (TableScanDesc scan, TransactionId OldestXmin,
+ double *liverows, double *deadrows, TupleTableSlot *slot));
+</programlisting>
+ API to return the next tuple of the block being analyzed in the provided
+ slot, updating the counts of live and dead rows.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+relation_copy_for_cluster (Relation NewHeap, Relation OldHeap, Relation OldIndex,
+ bool use_sort,
+ TransactionId OldestXmin, TransactionId FreezeXid, MultiXactId MultiXactCutoff,
+ double *num_tuples, double *tups_vacuumed, double *tups_recently_dead);
+</programlisting>
+ API to copy one relation to another, using either an index scan or a sequential scan.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+relation_sync (Relation relation);
+</programlisting>
+ API to sync the relation to disk, useful for cases where no WAL is written.
+ </para>
+
+
+ <para>
+<programlisting>
+TableScanDesc
+scan_begin (Relation relation,
+ Snapshot snapshot,
+ int nkeys, ScanKey key,
+ ParallelTableScanDesc parallel_scan,
+ bool allow_strat,
+ bool allow_sync,
+ bool allow_pagemode,
+ bool is_bitmapscan,
+ bool is_samplescan,
+ bool temp_snap);
+</programlisting>
+ API to begin a scan of the provided relation, returning a
+ <structname>TableScanDesc</structname> structure.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+scansetlimits (TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks);
+</programlisting>
+ API to set the block range limits of the relation scan.
+ </para>
+
+
+ <para>
+<programlisting>
+TupleTableSlot *
+scan_getnextslot (TableScanDesc scan,
+ ScanDirection direction, TupleTableSlot *slot);
+</programlisting>
+ API to store the next visible tuple from the relation scan in the provided
+ slot and return it.
+ </para>
+
+
+ <para>
+<programlisting>
+bool
+scan_bitmap_pagescan (TableScanDesc scan,
+ TBMIterateResult *tbmres);
+</programlisting>
+ API to scan the specified block and collect its valid item pointers into
+ the scan descriptor, as directed by the bitmap iteration result.
+ </para>
+
+
+ <para>
+<programlisting>
+bool
+scan_bitmap_pagescan_next (TableScanDesc scan,
+ TupleTableSlot *slot);
+</programlisting>
+ API to return the next tuple identified by the previously collected bitmap
+ item pointers, storing it in the provided slot.
+ </para>
+
+
+ <para>
+<programlisting>
+bool
+scan_sample_next_block (TableScanDesc scan, struct SampleScanState *scanstate);
+</programlisting>
+ API to select the next block of the relation to be scanned, as chosen by
+ the sampling method, and prepare it for tuple retrieval.
+ </para>
+
+
+ <para>
+<programlisting>
+bool
+scan_sample_next_tuple (TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
+</programlisting>
+ API to return the next tuple of the current sample block, as selected by
+ the sampling method, in the provided slot.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+scan_end (TableScanDesc scan);
+</programlisting>
+ API to end the relation scan.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+scan_rescan (TableScanDesc scan, ScanKey key, bool set_params,
+ bool allow_strat, bool allow_sync, bool allow_pagemode);
+</programlisting>
+ API to restart the relation scan with the provided parameters.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+scan_update_snapshot (TableScanDesc scan, Snapshot snapshot);
+</programlisting>
+ API to update the relation scan with the new snapshot.
+ </para>
+
+ <para>
+<programlisting>
+IndexFetchTableData *
+begin_index_fetch (Relation relation);
+</programlisting>
+ API to prepare the <structname>IndexFetchTableData</structname> for the relation.
+ </para>
+
+ <para>
+<programlisting>
+void
+reset_index_fetch (struct IndexFetchTableData* data);
+</programlisting>
+ API to reset the prepared internal members of the <structname>IndexFetchTableData</structname>.
+ </para>
+
+ <para>
+<programlisting>
+void
+end_index_fetch (struct IndexFetchTableData* data);
+</programlisting>
+ API to clear and free the <structname>IndexFetchTableData</structname>.
+ </para>
+
+ <para>
+<programlisting>
+double
+index_build_range_scan (Relation heapRelation,
+ Relation indexRelation,
+ IndexInfo *indexInfo,
+ bool allow_sync,
+ bool anyvisible,
+ BlockNumber start_blockno,
+ BlockNumber end_blockno,
+ IndexBuildCallback callback,
+ void *callback_state,
+ TableScanDesc scan);
+</programlisting>
+ API to scan the block range specified by the caller and insert the
+ qualifying records into the index using the provided callback
+ function pointer.
+ </para>
+
+ <para>
+<programlisting>
+void
+index_validate_scan (Relation heapRelation,
+ Relation indexRelation,
+ IndexInfo *indexInfo,
+ Snapshot snapshot,
+ struct ValidateIndexState *state);
+</programlisting>
+ API to perform the table scan and insert the qualifying records into the
+ index. This API is similar to <function>index_build_range_scan</function>;
+ it is used for concurrent index builds.
+ </para>
+
+ </sect2>
+
+ <sect2>
+ <title>Table scanning</title>
+
+ <para>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Table insert/update/delete</title>
+
+ <para>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Table locking</title>
+
+ <para>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Table vacuum</title>
+
+ <para>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Table fetch</title>
+
+ <para>
+ </para>
+ </sect2>
+
+ </sect1>
</chapter>
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 0179deea2e..f0c8037bbc 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -587,8 +587,9 @@
The catalog <structname>pg_am</structname> stores information about
relation access methods. There is one row for each access method supported
by the system.
- Currently, only indexes have access methods. The requirements for index
- access methods are discussed in detail in <xref linkend="indexam"/>.
+ Currently, only <literal>INDEX</literal> and <literal>TABLE</literal> have
+ access methods. The requirements for access methods are discussed in detail
+ in <xref linkend="am"/>.
</para>
<table>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f11b8f724c..8765d7c57c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6585,6 +6585,30 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-default-table-access-method" xreflabel="default_table_access_method">
+ <term><varname>default_table_access_method</varname> (<type>string</type>)
+ <indexterm>
+ <primary><varname>default_table_access_method</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ This variable specifies the table access method with which to create
+ objects (tables and materialized views) when a <command>CREATE</command> command does
+ not explicitly specify an access method.
+ </para>
+
+ <para>
+ The value is either the name of a table access method, or an empty string
+ to specify using the default table access method of the current database.
+ If the value does not match the name of any existing table access method,
+ <productname>PostgreSQL</productname> will automatically use the default
+ table access method of the current database.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-default-tablespace" xreflabel="default_tablespace">
<term><varname>default_tablespace</varname> (<type>string</type>)
<indexterm>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 48ac14a838..99a6496502 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -90,7 +90,7 @@
<!ENTITY gin SYSTEM "gin.sgml">
<!ENTITY brin SYSTEM "brin.sgml">
<!ENTITY planstats SYSTEM "planstats.sgml">
-<!ENTITY indexam SYSTEM "indexam.sgml">
+<!ENTITY am SYSTEM "am.sgml">
<!ENTITY nls SYSTEM "nls.sgml">
<!ENTITY plhandler SYSTEM "plhandler.sgml">
<!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 0070603fc3..3e66ae9c8a 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -251,7 +251,7 @@
&tablesample-method;
&custom-scan;
&geqo;
- &indexam;
+ &am;
&generic-wal;
&btree;
&gist;
diff --git a/doc/src/sgml/ref/create_access_method.sgml b/doc/src/sgml/ref/create_access_method.sgml
index 851c5e63be..256914022a 100644
--- a/doc/src/sgml/ref/create_access_method.sgml
+++ b/doc/src/sgml/ref/create_access_method.sgml
@@ -61,7 +61,8 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
<listitem>
<para>
This clause specifies the type of access method to define.
- Only <literal>INDEX</literal> is supported at present.
+ Only <literal>INDEX</literal> and <literal>TABLE</literal>
+ are supported at present.
</para>
</listitem>
</varlistentry>
@@ -76,9 +77,12 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
declared to take a single argument of type <type>internal</type>,
and its return type depends on the type of access method;
for <literal>INDEX</literal> access methods, it must
- be <type>index_am_handler</type>. The C-level API that the handler
- function must implement varies depending on the type of access method.
- The index access method API is described in <xref linkend="indexam"/>.
+ be <type>index_am_handler</type> and for <literal>TABLE</literal>
+ access methods, it must be <type>table_am_handler</type>.
+ The C-level API that the handler function must implement varies
+ depending on the type of access method. The index access method API
+ is described in <xref linkend="index-access-methods"/> and the table access method
+ API is described in <xref linkend="table-access-methods"/>.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 10428f8ff0..87e0f01ab2 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -29,6 +29,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
] )
[ INHERITS ( <replaceable>parent_table</replaceable> [, ... ] ) ]
[ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
[ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -40,6 +41,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
[, ... ]
) ]
[ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
[ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -51,6 +53,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
[, ... ]
) ] { FOR VALUES <replaceable class="parameter">partition_bound_spec</replaceable> | DEFAULT }
[ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
[ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -955,7 +958,7 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
<para>
The access method must support <literal>amgettuple</literal> (see <xref
- linkend="indexam"/>); at present this means <acronym>GIN</acronym>
+ linkend="index-access-methods"/>); at present this means <acronym>GIN</acronym>
cannot be used. Although it's allowed, there is little point in using
B-tree or hash indexes with an exclusion constraint, because this
does nothing that an ordinary unique constraint doesn't do better.
@@ -1138,6 +1141,19 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+ <listitem>
+ <para>
+ This clause specifies the optional access method for the new table;
+ see <xref linkend="table-access-methods"/> for more information.
+ If this option is not specified, the default table access method
+ is used for the new table; see <xref linkend="guc-default-table-access-method"/>
+ for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
<listitem>
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 527138e787..2acf52d2f5 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
<synopsis>
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
[ (<replaceable>column_name</replaceable> [, ...] ) ]
+ [ USING <replaceable class="parameter">method</replaceable> ]
[ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -120,6 +121,19 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+ <listitem>
+ <para>
+ This clause specifies the optional access method for the new table;
+ see <xref linkend="table-access-methods"/> for more information.
+ If this option is not specified, the default table access method
+ is used for the new table; see <xref linkend="guc-default-table-access-method"/>
+ for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
<listitem>
diff --git a/doc/src/sgml/release-9.6.sgml b/doc/src/sgml/release-9.6.sgml
index acb6a88b31..68c79db4b5 100644
--- a/doc/src/sgml/release-9.6.sgml
+++ b/doc/src/sgml/release-9.6.sgml
@@ -10081,7 +10081,7 @@ This commit is also listed under libpq and PL/pgSQL
2016-08-13 [ed0097e4f] Add SQL-accessible functions for inspecting index AM pro
-->
<para>
- Restructure <link linkend="indexam">index access
+ Restructure <link linkend="index-access-methods">index access
method <acronym>API</acronym></link> to hide most of it at
the <application>C</application> level (Alexander Korotkov, Andrew Gierth)
</para>
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index 9446f8b836..4fa821160c 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -36,7 +36,7 @@
described in <classname>pg_am</classname>. It is possible to add a
new index access method by writing the necessary code and
then creating an entry in <classname>pg_am</classname> — but that is
- beyond the scope of this chapter (see <xref linkend="indexam"/>).
+ beyond the scope of this chapter (see <xref linkend="am"/>).
</para>
<para>
--
2.18.0.windows.1
Attachment: 0002-New-API-s-are-added.patch
From caf9356688a722c2246969e2f1421bf25c7c882e Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 17:58:27 +1100
Subject: [PATCH 2/3] New API's are added
1. init fork API
2. set New filenode
3. Estimate rel size
Set new filenode and estimate rel size are added as function hooks;
they are not compulsory for heap, but when they exist, they take
over control.
---
src/backend/access/heap/heapam_handler.c | 27 +++++++++++++++++++++++-
src/backend/catalog/heap.c | 24 ++-------------------
src/backend/commands/tablecmds.c | 4 ++--
src/backend/optimizer/util/plancat.c | 12 +++++++++++
src/backend/utils/cache/relcache.c | 12 +++++++++++
src/include/access/tableam.h | 16 ++++++++++++++
src/include/catalog/heap.h | 2 --
7 files changed, 70 insertions(+), 27 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3254e30a45..ae832e1f71 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -30,6 +30,7 @@
#include "catalog/catalog.h"
#include "catalog/index.h"
#include "catalog/pg_am_d.h"
+#include "catalog/storage_xlog.h"
#include "executor/executor.h"
#include "pgstat.h"
#include "storage/lmgr.h"
@@ -2118,6 +2119,28 @@ heap_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
pfree(isnull);
}
+/*
+ * Set up an init fork for an unlogged table so that it can be correctly
+ * reinitialized on restart. An immediate sync is required even if the
+ * page has been logged, because the write did not go through
+ * shared_buffers and therefore a concurrent checkpoint may have moved
+ * the redo pointer past our xlog record. Recovery may as well remove it
+ * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
+ * record. Therefore, logging is necessary even if wal_level=minimal.
+ */
+static void
+heap_create_init_fork(Relation rel)
+{
+ Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
+ rel->rd_rel->relkind == RELKIND_MATVIEW ||
+ rel->rd_rel->relkind == RELKIND_TOASTVALUE);
+ RelationOpenSmgr(rel);
+ smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
+ log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
+ smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
+}
+
+
static const TableAmRoutine heapam_methods = {
.type = T_TableAmRoutine,
@@ -2165,7 +2188,9 @@ static const TableAmRoutine heapam_methods = {
.index_build_range_scan = IndexBuildHeapRangeScan,
- .index_validate_scan = validate_index_heapscan
+ .index_validate_scan = validate_index_heapscan,
+
+ .CreateInitFork = heap_create_init_fork
};
const TableAmRoutine *
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 38b368f916..8e7c8ce684 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -32,6 +32,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/sysattr.h"
+#include "access/tableam.h"
#include "access/transam.h"
#include "access/xact.h"
#include "access/xlog.h"
@@ -1425,7 +1426,7 @@ heap_create_with_catalog(const char *relname,
*/
if (relpersistence == RELPERSISTENCE_UNLOGGED &&
relkind != RELKIND_PARTITIONED_TABLE)
- heap_create_init_fork(new_rel_desc);
+ table_create_init_fork(new_rel_desc);
/*
* ok, the relation has been cataloged, so close our relations and return
@@ -1437,27 +1438,6 @@ heap_create_with_catalog(const char *relname,
return relid;
}
-/*
- * Set up an init fork for an unlogged table so that it can be correctly
- * reinitialized on restart. An immediate sync is required even if the
- * page has been logged, because the write did not go through
- * shared_buffers and therefore a concurrent checkpoint may have moved
- * the redo pointer past our xlog record. Recovery may as well remove it
- * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
- * record. Therefore, logging is necessary even if wal_level=minimal.
- */
-void
-heap_create_init_fork(Relation rel)
-{
- Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
- rel->rd_rel->relkind == RELKIND_MATVIEW ||
- rel->rd_rel->relkind == RELKIND_TOASTVALUE);
- RelationOpenSmgr(rel);
- smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
- log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
- smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
-}
-
/*
* RelationRemoveInheritance
*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f3526b267d..3c46a48882 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1649,7 +1649,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
RecentXmin, minmulti);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
- heap_create_init_fork(rel);
+ table_create_init_fork(rel);
heap_relid = RelationGetRelid(rel);
toast_relid = rel->rd_rel->reltoastrelid;
@@ -1663,7 +1663,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
RecentXmin, minmulti);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
- heap_create_init_fork(rel);
+ table_create_init_fork(rel);
heap_close(rel, NoLock);
}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8da468a86f..3355f8bff4 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -947,6 +947,18 @@ estimate_rel_size(Relation rel, int32 *attr_widths,
BlockNumber relallvisible;
double density;
+ /*
+ * If the relation contains any specific EstimateRelSize
+ * function, use that instead of the regular default heap method.
+ */
+ if (rel->rd_tableamroutine &&
+ rel->rd_tableamroutine->EstimateRelSize)
+ {
+ rel->rd_tableamroutine->EstimateRelSize(rel, attr_widths, pages,
+ tuples, allvisfrac);
+ return;
+ }
+
switch (rel->rd_rel->relkind)
{
case RELKIND_RELATION:
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 0d6e5a189f..9cc8e98e40 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3424,6 +3424,18 @@ RelationSetNewRelfilenode(Relation relation, char persistence,
HeapTuple tuple;
Form_pg_class classform;
+ /*
+ * If the relation contains any specific SetNewFilenode
+ * function, use that instead of the regular default heap method.
+ */
+ if (relation->rd_tableamroutine &&
+ relation->rd_tableamroutine->SetNewFileNode)
+ {
+ relation->rd_tableamroutine->SetNewFileNode(relation, persistence,
+ freezeXid, minmulti);
+ return;
+ }
+
/* Indexes, sequences must have Invalid frozenxid; other rels must not */
Assert((relation->rd_rel->relkind == RELKIND_INDEX ||
relation->rd_rel->relkind == RELKIND_SEQUENCE) ?
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7fe6ff6c22..eb7c9b8007 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -194,6 +194,12 @@ struct SampleScanState;
typedef bool (*SampleScanNextBlock_function)(TableScanDesc scan, struct SampleScanState *scanstate);
typedef bool (*SampleScanNextTuple_function)(TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
+typedef void (*CreateInitFork_function)(Relation rel);
+typedef void (*EstimateRelSize_function)(Relation rel, int32 *attr_widths,
+ BlockNumber *pages, double *tuples, double *allvisfrac);
+typedef void (*SetNewFileNode_function)(Relation relation, char persistence,
+ TransactionId freezeXid, MultiXactId minmulti);
+
/*
* API struct for a table AM. Note this must be allocated in a
* server-lifetime manner, typically as a static const struct.
@@ -250,6 +256,10 @@ typedef struct TableAmRoutine
IndexBuildRangeScan_function index_build_range_scan;
IndexValidateScan_function index_validate_scan;
+
+ CreateInitFork_function CreateInitFork;
+ EstimateRelSize_function EstimateRelSize;
+ SetNewFileNode_function SetNewFileNode;
} TableAmRoutine;
static inline const TupleTableSlotOps*
@@ -741,6 +751,12 @@ table_index_build_range_scan(Relation heapRelation,
scan);
}
+static inline void
+table_create_init_fork(Relation relation)
+{
+ relation->rd_tableamroutine->CreateInitFork(relation);
+}
+
extern BlockNumber table_parallelscan_nextpage(TableScanDesc scan);
extern void table_parallelscan_startblock_init(TableScanDesc scan);
extern Size table_parallelscan_estimate(Snapshot snapshot);
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 4584b3473c..c0e706ecc9 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -77,8 +77,6 @@ extern Oid heap_create_with_catalog(const char *relname,
Oid relrewrite,
ObjectAddress *typaddress);
-extern void heap_create_init_fork(Relation rel);
-
extern void heap_drop_with_catalog(Oid relid);
extern void heap_truncate(List *relids);
--
2.18.0.windows.1
Attachment: 0001-Further-fixes-and-cleanup.patch (application/octet-stream)
From 5b03e5e2aab35477e409b531cca09e9fef528e6f Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 16:06:10 +1100
Subject: [PATCH 1/3] Further fixes and cleanup
1. Remove the old slot interface file and also update the Makefile.
2. CREATE AS USING method grammar support
This change was missed during the earlier USING grammar support.
3. Remove the extra tuple visibility check
In heapgettup_pagemode, a tuple visibility check was added
during the early development of pluggable storage, but the visibility
check is already carried out in the heapgetpage function itself.
4. Handling HeapTupleInvisible case
In update/delete scenarios, when the tuple is
concurrently updated/deleted, locking the tuple
may sometimes return HeapTupleInvisible. Treat
that case as nothing to do.
regression fixes
1. scan start offset fix during analyze
2. Materialize the slot before they are processed using intorel_receive
3. ROW_MARK_COPY support by force store of heap tuple
4. partition prune extra heap page fix
---
contrib/pg_visibility/pg_visibility.c | 5 ++--
src/backend/access/heap/heapam.c | 28 +++++++++--------------
src/backend/access/heap/heapam_handler.c | 6 +++--
src/backend/access/table/Makefile | 2 +-
src/backend/access/table/tableam_common.c | 0
src/backend/commands/createas.c | 4 ++++
src/backend/executor/execExprInterp.c | 16 +++++--------
src/backend/executor/execMain.c | 13 ++---------
src/backend/executor/execTuples.c | 21 +++++++++++++++++
src/backend/executor/nodeBitmapHeapscan.c | 12 ++++++++++
src/backend/executor/nodeModifyTable.c | 22 ++++++++++++++++--
src/backend/parser/gram.y | 11 +++++----
src/include/executor/tuptable.h | 1 +
src/include/nodes/primnodes.h | 1 +
14 files changed, 92 insertions(+), 50 deletions(-)
delete mode 100644 src/backend/access/table/tableam_common.c
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index dce5262e34..88ca4fd2af 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -563,12 +563,13 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
rel = relation_open(relid, AccessShareLock);
+ /* Only some relkinds have a visibility map */
+ check_relation_relkind(rel);
+
if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("only heap AM is supported")));
- /* Only some relkinds have a visibility map */
- check_relation_relkind(rel);
nblocks = RelationGetNumberOfBlocks(rel);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ec99d0bcae..ef6b4c3e54 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -947,31 +947,25 @@ heapgettup_pagemode(HeapScanDesc scan,
/*
* if current tuple qualifies, return it.
*/
- if (HeapTupleSatisfies(tuple, scan->rs_scan.rs_snapshot, scan->rs_cbuf))
+ if (key != NULL)
{
- /*
- * if current tuple qualifies, return it.
- */
- if (key != NULL)
- {
- bool valid;
+ bool valid;
- HeapKeyTest(tuple, RelationGetDescr(scan->rs_scan.rs_rd),
- nkeys, key, valid);
- if (valid)
- {
- scan->rs_cindex = lineindex;
- LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
- return;
- }
- }
- else
+ HeapKeyTest(tuple, RelationGetDescr(scan->rs_scan.rs_rd),
+ nkeys, key, valid);
+ if (valid)
{
scan->rs_cindex = lineindex;
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
return;
}
}
+ else
+ {
+ scan->rs_cindex = lineindex;
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+ return;
+ }
/*
* otherwise move to the next item on the page
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91f..3254e30a45 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1741,7 +1741,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
{
HeapScanDesc scan = (HeapScanDesc) sscan;
Page targpage;
- OffsetNumber targoffset = scan->rs_cindex;
+ OffsetNumber targoffset;
OffsetNumber maxoffset;
BufferHeapTupleTableSlot *hslot;
@@ -1751,7 +1751,9 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
maxoffset = PageGetMaxOffsetNumber(targpage);
/* Inner loop over all tuples on the selected page */
- for (targoffset = scan->rs_cindex; targoffset <= maxoffset; targoffset++)
+ for (targoffset = scan->rs_cindex ? scan->rs_cindex : FirstOffsetNumber;
+ targoffset <= maxoffset;
+ targoffset++)
{
ItemId itemid;
HeapTuple targtuple = &hslot->base.tupdata;
diff --git a/src/backend/access/table/Makefile b/src/backend/access/table/Makefile
index fe22bf9208..006ba99182 100644
--- a/src/backend/access/table/Makefile
+++ b/src/backend/access/table/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/access/table
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = tableam.o tableamapi.o tableam_common.o
+OBJS = tableam.o tableamapi.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/table/tableam_common.c b/src/backend/access/table/tableam_common.c
deleted file mode 100644
index e69de29bb2..0000000000
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 84de804175..d3ffe417ff 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -593,6 +593,10 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
if (myState->rel->rd_rel->relhasoids)
slot->tts_tupleOid = InvalidOid;
+ /* Materialize the slot */
+ if (!TTS_IS_VIRTUAL(slot))
+ ExecMaterializeSlot(slot);
+
table_insert(myState->rel,
slot,
myState->output_cid,
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index ef94ac4aa0..8df85c2f48 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -529,20 +529,13 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
Assert(TTS_IS_HEAPTUPLE(outerslot) ||
TTS_IS_BUFFERTUPLE(outerslot));
- /* The slot should have a valid heap tuple. */
-#if FIXME
- /* The slot should have a valid heap tuple. */
- Assert(hslot->tuple != NULL);
-#endif
-
- /*
- * hari
- * Assert(outerslot->tts_storageslotam->slot_is_physical_tuple(outerslot));
- */
if (attnum == TableOidAttributeNumber)
d = ObjectIdGetDatum(outerslot->tts_tableOid);
else
{
+ /* The slot should have a valid heap tuple. */
+ Assert(hslot->tuple != NULL);
+
/* heap_getsysattr has sufficient defenses against bad attnums */
d = heap_getsysattr(hslot->tuple, attnum,
outerslot->tts_tupleDescriptor,
@@ -570,6 +563,9 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
Assert(TTS_IS_HEAPTUPLE(scanslot) ||
TTS_IS_BUFFERTUPLE(scanslot));
+ if (hslot->tuple == NULL)
+ ExecMaterializeSlot(scanslot);
+
d = heap_getsysattr(hslot->tuple, attnum,
scanslot->tts_tupleDescriptor,
op->resnull);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e055c0a7c6..34ef86a5bd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2594,7 +2594,7 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
* datums that may be present in copyTuple). As with the next step, this
* is to guard against early re-use of the EPQ query.
*/
- if (!TupIsNull(slot))
+ if (!TupIsNull(slot) && !TTS_IS_VIRTUAL(slot))
ExecMaterializeSlot(slot);
#if FIXME
@@ -2787,16 +2787,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
if (isNull)
continue;
- elog(ERROR, "frak, need to implement ROW_MARK_COPY");
-#ifdef FIXME
- // FIXME: this should just deform the tuple and store it as a
- // virtual one.
- tuple = table_tuple_by_datum(erm->relation, datum, erm->relid);
-
- /* store tuple */
- EvalPlanQualSetTuple(epqstate, erm->rti, tuple);
-#endif
-
+ ExecForceStoreHeapTupleDatum(datum, slot);
}
}
}
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 917bf80f71..74149cc3ad 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -1364,6 +1364,27 @@ ExecStoreAllNullTuple(TupleTableSlot *slot)
return ExecFinishStoreSlotValues(slot);
}
+void
+ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot)
+{
+ HeapTuple tuple;
+ HeapTupleHeader td;
+
+ td = DatumGetHeapTupleHeader(data);
+
+ tuple = (HeapTuple) palloc(HEAPTUPLESIZE + HeapTupleHeaderGetDatumLength(td));
+ tuple->t_len = HeapTupleHeaderGetDatumLength(td);
+ tuple->t_self = td->t_ctid;
+ tuple->t_data = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE);
+ memcpy((char *) tuple->t_data, (char *) td, tuple->t_len);
+
+ ExecClearTuple(slot);
+
+ heap_deform_tuple(tuple, slot->tts_tupleDescriptor,
+ slot->tts_values, slot->tts_isnull);
+ ExecFinishStoreSlotValues(slot);
+}
+
/* --------------------------------
* ExecFetchSlotTuple
* Fetch the slot's regular physical tuple.
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 56880e3d16..36ca07beb2 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -224,6 +224,18 @@ BitmapHeapNext(BitmapHeapScanState *node)
BitmapAdjustPrefetchIterator(node, tbmres);
+ /*
+ * Ignore any claimed entries past what we think is the end of the
+ * relation. (This is probably not necessary given that we got at
+ * least AccessShareLock on the table before performing any of the
+ * indexscans, but let's be safe.)
+ */
+ if (tbmres->blockno >= scan->rs_nblocks)
+ {
+ node->tbmres = tbmres = NULL;
+ continue;
+ }
+
/*
* We can skip fetching the heap page if we don't need any fields
* from the heap, and the bitmap entries don't need rechecking,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 3cc9092413..b3851b180d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -607,7 +607,7 @@ ExecDelete(ModifyTableState *mtstate,
bool canSetTag,
bool changingPart,
bool *tupleDeleted,
- TupleTableSlot **epqslot)
+ TupleTableSlot **epqreturnslot)
{
ResultRelInfo *resultRelInfo;
Relation resultRelationDesc;
@@ -632,7 +632,7 @@ ExecDelete(ModifyTableState *mtstate,
bool dodelete;
dodelete = ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
- tupleid, oldtuple, epqslot);
+ tupleid, oldtuple, epqreturnslot);
if (!dodelete) /* "do nothing" */
return NULL;
@@ -724,8 +724,21 @@ ldelete:;
/* Tuple no more passing quals, exiting... */
return NULL;
}
+
+ /* Pass the EPQ-substituted tuple back to the caller, if requested */
+ if (epqreturnslot)
+ {
+ *epqreturnslot = epqslot;
+ return NULL;
+ }
+
goto ldelete;
}
+ else if (result == HeapTupleInvisible)
+ {
+ /* tuple is not visible; nothing to do */
+ return NULL;
+ }
}
switch (result)
@@ -1196,6 +1209,11 @@ lreplace:;
slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
goto lreplace;
}
+ else if (result == HeapTupleInvisible)
+ {
+ /* tuple is not visible; nothing to do */
+ return NULL;
+ }
}
switch (result)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 54382aba88..f030ad25a2 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4037,7 +4037,6 @@ CreateStatsStmt:
*
*****************************************************************************/
-// PBORKED: storage option
CreateAsStmt:
CREATE OptTemp TABLE create_as_target AS SelectStmt opt_with_data
{
@@ -4068,14 +4067,16 @@ CreateAsStmt:
;
create_as_target:
- qualified_name opt_column_list OptWith OnCommitOption OptTableSpace
+ qualified_name opt_column_list table_access_method_clause
+ OptWith OnCommitOption OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
$$->colNames = $2;
- $$->options = $3;
- $$->onCommit = $4;
- $$->tableSpaceName = $5;
+ $$->accessMethod = $3;
+ $$->options = $4;
+ $$->onCommit = $5;
+ $$->tableSpaceName = $6;
$$->viewQuery = NULL;
$$->skipData = false; /* might get changed later */
}
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index 05f38cfd0d..20fc425a27 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -476,6 +476,7 @@ extern TupleTableSlot *ExecCopySlot(TupleTableSlot *dstslot,
extern void ExecForceStoreHeapTuple(HeapTuple tuple,
TupleTableSlot *slot);
+extern void ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot);
extern void slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum);
extern Datum slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull);
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 40f6eb03d2..4d194a8c2a 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -111,6 +111,7 @@ typedef struct IntoClause
RangeVar *rel; /* target relation name */
List *colNames; /* column names to assign, or NIL */
+ char *accessMethod; /* table access method */
List *options; /* options from WITH clause */
OnCommitAction onCommit; /* what do we do at COMMIT? */
char *tableSpaceName; /* table space to use, or NULL */
--
2.18.0.windows.1
On Fri, Nov 2, 2018 at 11:17 AM Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
On Wed, Oct 31, 2018 at 9:34 PM Dmitry Dolgov <9erthalion6@gmail.com>
wrote:

FYI, alongside reviewing the code changes I've run a few performance tests
(that's why I hit this issue with pgbench in the first place). In case of
high concurrency, so far I see a small performance degradation in
comparison with the master branch (about 2-5% of average latency,
depending on the level of concurrency), but I can't really say why exactly
(perf just shows barely noticeable overhead here and there; maybe what I
see is actually a cumulative impact).

Thanks for sharing your observation, I will also analyze and try to find
out the performance bottlenecks that are causing the overhead.
I tried running the pgbench performance tests with minimal clients on my
laptop and I didn't find any performance issues; maybe the issue is
visible only with higher client counts. Even with the perf tool, I am not
able to pinpoint a clear problem function. As you said, it may be the
combination of all the changes that leads to some overhead.
I have attached the cumulative patches with further fixes, along with
basic syntax regression tests.
Regards,
Haribabu Kommi
Fujitsu Australia
Attachments:
Attachment: 0003-First-draft-of-pluggable-storage-documentation.patch (application/octet-stream)
From 2c1414f2e847577174ba3087868e4920342dfeb1 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Thu, 1 Nov 2018 12:00:10 +1100
Subject: [PATCH 3/3] First draft of pluggable-storage documentation
---
doc/src/sgml/{indexam.sgml => am.sgml} | 590 ++++++++++++++++++++-
doc/src/sgml/catalogs.sgml | 5 +-
doc/src/sgml/config.sgml | 24 +
doc/src/sgml/filelist.sgml | 2 +-
doc/src/sgml/postgres.sgml | 2 +-
doc/src/sgml/ref/create_access_method.sgml | 12 +-
doc/src/sgml/ref/create_table.sgml | 18 +-
doc/src/sgml/ref/create_table_as.sgml | 14 +
doc/src/sgml/release-9.6.sgml | 2 +-
doc/src/sgml/xindex.sgml | 2 +-
10 files changed, 640 insertions(+), 31 deletions(-)
rename doc/src/sgml/{indexam.sgml => am.sgml} (78%)
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/am.sgml
similarity index 78%
rename from doc/src/sgml/indexam.sgml
rename to doc/src/sgml/am.sgml
index beb99d1831..dc13bc1073 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/am.sgml
@@ -1,16 +1,20 @@
-<!-- doc/src/sgml/indexam.sgml -->
+<!-- doc/src/sgml/am.sgml -->
-<chapter id="indexam">
- <title>Index Access Method Interface Definition</title>
+<chapter id="am">
+ <title>Access Method Interface Definition</title>
<para>
This chapter defines the interface between the core
- <productname>PostgreSQL</productname> system and <firstterm>index access
- methods</firstterm>, which manage individual index types. The core system
- knows nothing about indexes beyond what is specified here, so it is
- possible to develop entirely new index types by writing add-on code.
+ <productname>PostgreSQL</productname> system and <firstterm>access
+ methods</firstterm>, which manage individual <literal>INDEX</literal>
+ and <literal>TABLE</literal> types. The core system knows nothing
+ about these access methods beyond what is specified here, so it is
+ possible to develop entirely new access method types by writing add-on code.
</para>
-
+
+ <sect1 id="index-access-methods">
+ <title>Overview of Index access methods</title>
+
<para>
All indexes in <productname>PostgreSQL</productname> are what are known
technically as <firstterm>secondary indexes</firstterm>; that is, the index is
@@ -42,8 +46,8 @@
dead tuples are reclaimed (by vacuuming) when the dead tuples themselves
are reclaimed.
</para>
-
- <sect1 id="index-api">
+
+ <sect2 id="index-api">
<title>Basic API Structure for Indexes</title>
<para>
@@ -217,9 +221,9 @@ typedef struct IndexAmRoutine
conditions.
</para>
- </sect1>
+ </sect2>
- <sect1 id="index-functions">
+ <sect2 id="index-functions">
<title>Index Access Method Functions</title>
<para>
@@ -709,9 +713,11 @@ amparallelrescan (IndexScanDesc scan);
the beginning.
</para>
- </sect1>
+ </sect2>
+
+
- <sect1 id="index-scanning">
+ <sect2 id="index-scanning">
<title>Index Scanning</title>
<para>
@@ -864,9 +870,9 @@ amparallelrescan (IndexScanDesc scan);
if its internal implementation is unsuited to one API or the other.
</para>
- </sect1>
+ </sect2>
- <sect1 id="index-locking">
+ <sect2 id="index-locking">
<title>Index Locking Considerations</title>
<para>
@@ -978,9 +984,9 @@ amparallelrescan (IndexScanDesc scan);
reduce the frequency of such transaction cancellations.
</para>
- </sect1>
+ </sect2>
- <sect1 id="index-unique-checks">
+ <sect2 id="index-unique-checks">
<title>Index Uniqueness Checks</title>
<para>
@@ -1127,9 +1133,9 @@ amparallelrescan (IndexScanDesc scan);
</itemizedlist>
</para>
- </sect1>
+ </sect2>
- <sect1 id="index-cost-estimation">
+ <sect2 id="index-cost-estimation">
<title>Index Cost Estimation Functions</title>
<para>
@@ -1376,5 +1382,549 @@ cost_qual_eval(&index_qual_cost, path->indexquals, root);
Examples of cost estimator functions can be found in
<filename>src/backend/utils/adt/selfuncs.c</filename>.
</para>
+ </sect2>
</sect1>
+
+ <sect1 id="table-access-methods">
+ <title>Overview of Table access methods </title>
+
+ <para>
+ Tables are the primary data store in <productname>PostgreSQL</productname>.
+ Each table is stored as its own physical <firstterm>relation</firstterm> and so
+ is described by an entry in the <structname>pg_class</structname> catalog.
+ The contents of a table are entirely under the control of its access method.
+ (All the access methods furthermore use the standard page layout described in
+ <xref linkend="storage-page-layout"/>.)
+ </para>
+
+ <sect2 id="table-api">
+ <title>Table access method API</title>
+
+ <para>
+ Each table access method is described by a row in the
+ <link linkend="catalog-pg-am"><structname>pg_am</structname></link>
+ system catalog. The <structname>pg_am</structname> entry
+ specifies a name and a <firstterm>handler function</firstterm> for the access
+ method. These entries can be created and deleted using the
+ <xref linkend="sql-create-access-method"/> and
+ <xref linkend="sql-drop-access-method"/> SQL commands.
+ </para>
+
+ <para>
+ A table access method handler function must be declared to accept a
+ single argument of type <type>internal</type> and to return the
+ pseudo-type <type>table_am_handler</type>. The argument is a dummy value that
+ simply serves to prevent handler functions from being called directly from
+ SQL commands. The result of the function must be a palloc'd struct of
+ type <structname>TableAmRoutine</structname>, which contains everything
+ that the core code needs to know to make use of the table access method.
+ The <structname>TableAmRoutine</structname> struct, also called the access
+ method's <firstterm>API struct</firstterm>, includes fields specifying assorted
+ fixed properties of the access method, such as whether it can support
+ bitmap scans. More importantly, it contains pointers to support
+ functions for the access method, which do all of the real work to access
+ tables. These support functions are plain C functions and are not
+ visible or callable at the SQL level. The support functions are described
+ in <xref linkend="table-functions"/>.
+ </para>
+
+ <para>
+ The structure <structname>TableAmRoutine</structname> is defined thus:
+<programlisting>
+typedef struct TableAmRoutine
+{
+ NodeTag type;
+
+ SlotCallbacks_function slot_callbacks;
+
+ SnapshotSatisfies_function snapshot_satisfies;
+ SnapshotSatisfiesUpdate_function snapshot_satisfiesUpdate;
+ SnapshotSatisfiesVacuum_function snapshot_satisfiesVacuum;
+
+ /* Operations on physical tuples */
+ TupleInsert_function tuple_insert;
+ TupleInsertSpeculative_function tuple_insert_speculative;
+ TupleCompleteSpeculative_function tuple_complete_speculative;
+ TupleUpdate_function tuple_update;
+ TupleDelete_function tuple_delete;
+ TupleFetchRowVersion_function tuple_fetch_row_version;
+ TupleLock_function tuple_lock;
+ MultiInsert_function multi_insert;
+ TupleGetLatestTid_function tuple_get_latest_tid;
+ TupleFetchFollow_function tuple_fetch_follow;
+
+ GetTupleData_function get_tuple_data;
+
+ RelationVacuum_function relation_vacuum;
+ RelationScanAnalyzeNextBlock_function scan_analyze_next_block;
+ RelationScanAnalyzeNextTuple_function scan_analyze_next_tuple;
+ RelationCopyForCluster_function relation_copy_for_cluster;
+ RelationSync_function relation_sync;
+
+ /* Operations on relation scans */
+ ScanBegin_function scan_begin;
+ ScanSetlimits_function scansetlimits;
+ ScanGetnextSlot_function scan_getnextslot;
+
+ BitmapPagescan_function scan_bitmap_pagescan;
+ BitmapPagescanNext_function scan_bitmap_pagescan_next;
+
+ SampleScanNextBlock_function scan_sample_next_block;
+ SampleScanNextTuple_function scan_sample_next_tuple;
+
+ ScanEnd_function scan_end;
+ ScanRescan_function scan_rescan;
+ ScanUpdateSnapshot_function scan_update_snapshot;
+
+ BeginIndexFetchTable_function begin_index_fetch;
+ EndIndexFetchTable_function reset_index_fetch;
+ EndIndexFetchTable_function end_index_fetch;
+
+
+ IndexBuildRangeScan_function index_build_range_scan;
+ IndexValidateScan_function index_validate_scan;
+
+ CreateInitFork_function CreateInitFork;
+} TableAmRoutine;
+</programlisting>
+ </para>
+
+ <para>
+ An individual table is defined by a
+ <link linkend="catalog-pg-class"><structname>pg_class</structname></link>
+ entry that describes it as a physical relation.
+ </para>
+
+ </sect2>
+
+ <sect2 id="table-functions">
+ <title>Table Access Method Functions</title>
+
+ <para>
+ The table construction and maintenance functions that a table access
+ method must provide in <structname>TableAmRoutine</structname> are:
+ </para>
+
+ <para>
+<programlisting>
+TupleTableSlotOps *
+slot_callbacks (Relation relation);
+</programlisting>
+ API to get the slot-specific callbacks for the access method.
+ The following callback sets are available:
+ <structname>TTSOpsVirtual</structname>,
+ <structname>TTSOpsHeapTuple</structname>,
+ <structname>TTSOpsMinimalTuple</structname>,
+ <structname>TTSOpsBufferTuple</structname>.
+
+ <para>
+<programlisting>
+bool
+snapshot_satisfies (TupleTableSlot *slot, Snapshot snapshot);
+</programlisting>
+ API to check whether the tuple in the provided slot is visible to the
+ current transaction according to the snapshot.
+ </para>
+
+ <para>
+<programlisting>
+Oid
+tuple_insert (Relation rel, TupleTableSlot *slot, CommandId cid,
+ int options, BulkInsertState bistate);
+</programlisting>
+ API to insert the tuple, providing the <literal>ItemPointerData</literal>
+ of the location where the tuple was successfully inserted.
+ </para>
+
+ <para>
+<programlisting>
+Oid
+tuple_insert_speculative (Relation rel,
+ TupleTableSlot *slot,
+ CommandId cid,
+ int options,
+ BulkInsertState bistate,
+ uint32 specToken);
+</programlisting>
+ API to insert the tuple with a speculative token. This API is similar
+ to <literal>tuple_insert</literal>, but carries additional speculative
+ information.
+ </para>
+
+ <para>
+<programlisting>
+void
+tuple_complete_speculative (Relation rel,
+ TupleTableSlot *slot,
+ uint32 specToken,
+ bool succeeded);
+</programlisting>
+ API to complete the speculative insertion started by
+ <literal>tuple_insert_speculative</literal>, once the corresponding
+ index insertion has finished.
+ </para>
+
+
+ <para>
+<programlisting>
+HTSU_Result
+tuple_update (Relation relation,
+ ItemPointer otid,
+ TupleTableSlot *slot,
+ CommandId cid,
+ Snapshot crosscheck,
+ bool wait,
+ HeapUpdateFailureData *hufd,
+ LockTupleMode *lockmode,
+ bool *update_indexes);
+</programlisting>
+ API to update the existing tuple with new data.
+ </para>
+
+
+ <para>
+<programlisting>
+HTSU_Result
+tuple_delete (Relation relation,
+ ItemPointer tid,
+ CommandId cid,
+ Snapshot crosscheck,
+ bool wait,
+ HeapUpdateFailureData *hufd,
+ bool changingPart);
+</programlisting>
+ API to delete the existing tuple.
+ </para>
+
+
+ <para>
+<programlisting>
+bool
+tuple_fetch_row_version (Relation relation,
+ ItemPointer tid,
+ Snapshot snapshot,
+ TupleTableSlot *slot,
+ Relation stats_relation);
+</programlisting>
+ API to fetch the tuple identified by the ItemPointer and store it as a
+ buffered heap tuple in the provided slot.
+ </para>
+
+
+ <para>
+<programlisting>
+HTSU_Result
+tuple_lock (Relation relation,
+ ItemPointer tid,
+ Snapshot snapshot,
+ TupleTableSlot *slot,
+ CommandId cid,
+ LockTupleMode mode,
+ LockWaitPolicy wait_policy,
+ uint8 flags,
+ HeapUpdateFailureData *hufd);
+</programlisting>
+ API to lock the tuple specified by the ItemPointer, fetching the newest
+ version of the tuple and its TID.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+multi_insert (Relation relation, TupleTableSlot **slots, int nslots,
+ CommandId cid, int options, BulkInsertState bistate);
+</programlisting>
+ API to insert multiple tuples at a time into the relation.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+tuple_get_latest_tid (Relation relation,
+ Snapshot snapshot,
+ ItemPointer tid);
+</programlisting>
+ API to get the latest TID of the tuple with the given ItemPointer.
+ </para>
+
+
+ <para>
+<programlisting>
+bool
+tuple_fetch_follow (struct IndexFetchTableData *scan,
+ ItemPointer tid,
+ Snapshot snapshot,
+ TupleTableSlot *slot,
+ bool *call_again, bool *all_dead);
+</programlisting>
+ API to fetch the tuples of the page that satisfy the given ItemPointer.
+ </para>
+
+
+ <para>
+<programlisting>
+tuple_data
+get_tuple_data (TupleTableSlot *slot, tuple_data_flags flags);
+</programlisting>
+ API to return the internal structure members of the HeapTuple.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+relation_vacuum (Relation onerel, int options,
+ struct VacuumParams *params, BufferAccessStrategy bstrategy);
+</programlisting>
+ API to perform vacuum for one heap relation.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+scan_analyze_next_block (TableScanDesc scan, BlockNumber blockno,
+ BufferAccessStrategy bstrategy);
+</programlisting>
+ API to fill the scan descriptor with the buffer of the specified block.
+ </para>
+
+
+ <para>
+<programlisting>
+bool
+scan_analyze_next_tuple (TableScanDesc scan, TransactionId OldestXmin,
+ double *liverows, double *deadrows, TupleTableSlot *slot));
+</programlisting>
+ API to analyze the block, store the next buffered heap tuple in the
+ slot, and also report the counts of live and dead rows.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+relation_copy_for_cluster (Relation NewHeap, Relation OldHeap, Relation OldIndex,
+ bool use_sort,
+ TransactionId OldestXmin, TransactionId FreezeXid, MultiXactId MultiXactCutoff,
+ double *num_tuples, double *tups_vacuumed, double *tups_recently_dead);
+</programlisting>
+ API to copy one relation to another, using either an index or a table scan.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+relation_sync (Relation relation);
+</programlisting>
+ API to sync the relation to disk, useful for cases in which no WAL is written.
+ </para>
+
+
+ <para>
+<programlisting>
+TableScanDesc
+scan_begin (Relation relation,
+ Snapshot snapshot,
+ int nkeys, ScanKey key,
+ ParallelTableScanDesc parallel_scan,
+ bool allow_strat,
+ bool allow_sync,
+ bool allow_pagemode,
+ bool is_bitmapscan,
+ bool is_samplescan,
+ bool temp_snap);
+</programlisting>
+ API to start the relation scan for the provided relation and returns the
+ <structname>TableScanDesc</structname> structure.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+scansetlimits (TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks);
+</programlisting>
+ API to set the relation scan range limits.
+ </para>
+
+
+ <para>
+<programlisting>
+TupleTableSlot *
+scan_getnextslot (TableScanDesc scan,
+ ScanDirection direction, TupleTableSlot *slot);
+</programlisting>
+ API to fill the next visible tuple from the relation scan in the provided slot
+ and return it.
+ </para>
+
+
+ <para>
+<programlisting>
+bool
+scan_bitmap_pagescan (TableScanDesc scan,
+ TBMIterateResult *tbmres);
+</programlisting>
+ API to scan the relation and fill the scan description bitmap with valid item pointers
+ for the specified block.
+ </para>
+
+
+ <para>
+<programlisting>
+bool
+scan_bitmap_pagescan_next (TableScanDesc scan,
+ TupleTableSlot *slot);
+</programlisting>
+ API to fill the buffered heap tuple data from the bitmap scanned item pointers and store
+ it in the provided slot.
+ </para>
+
+
+ <para>
+<programlisting>
+bool
+scan_sample_next_block (TableScanDesc scan, struct SampleScanState *scanstate);
+</programlisting>
+ API to scan the relation and fill the scan description bitmap with valid item pointers
+ for the specified block provided by the sample method.
+ </para>
+
+
+ <para>
+<programlisting>
+bool
+scan_sample_next_tuple (TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
+</programlisting>
+ API to return the next tuple from the current sample block, as selected by the sample
+ method, and store it in the provided slot.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+scan_end (TableScanDesc scan);
+</programlisting>
+ API to end the relation scan.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+scan_rescan (TableScanDesc scan, ScanKey key, bool set_params,
+ bool allow_strat, bool allow_sync, bool allow_pagemode);
+</programlisting>
+ API to restart the relation scan with the provided parameters.
+ </para>
+
+
+ <para>
+<programlisting>
+void
+scan_update_snapshot (TableScanDesc scan, Snapshot snapshot);
+</programlisting>
+ API to update the relation scan with the new snapshot.
+ </para>
+
+ <para>
+<programlisting>
+IndexFetchTableData *
+begin_index_fetch (Relation relation);
+</programlisting>
+ API to prepare the <structname>IndexFetchTableData</structname> for the relation.
+ </para>
+
+ <para>
+<programlisting>
+void
+reset_index_fetch (struct IndexFetchTableData* data);
+</programlisting>
+ API to reset the internal state of the <structname>IndexFetchTableData</structname>.
+ </para>
+
+ <para>
+<programlisting>
+void
+end_index_fetch (struct IndexFetchTableData* data);
+</programlisting>
+ API to clear and free the <structname>IndexFetchTableData</structname>.
+ </para>
+
+ <para>
+<programlisting>
+double
+index_build_range_scan (Relation heapRelation,
+ Relation indexRelation,
+ IndexInfo *indexInfo,
+ bool allow_sync,
+ bool anyvisible,
+ BlockNumber start_blockno,
+ BlockNumber end_blockno,
+ IndexBuildCallback callback,
+ void *callback_state,
+ TableScanDesc scan);
+</programlisting>
+ API to perform a table scan over the block range specified by the caller
+ and insert the qualifying tuples into the index using the provided callback
+ function.
+ </para>
+
+ <para>
+<programlisting>
+void
+index_validate_scan (Relation heapRelation,
+ Relation indexRelation,
+ IndexInfo *indexInfo,
+ Snapshot snapshot,
+ struct ValidateIndexState *state);
+</programlisting>
+ API to perform a table scan and insert the qualifying tuples into the index.
+ This API is similar to <function>index_build_range_scan</function>; it is
+ used during concurrent index builds.
+ </para>
+
+ </sect2>
+
+ <sect2>
+ <title>Table scanning</title>
+
+ <para>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Table insert/update/delete</title>
+
+ <para>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Table locking</title>
+
+ <para>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Table vacuum</title>
+
+ <para>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Table fetch</title>
+
+ <para>
+ </para>
+ </sect2>
+
+ </sect1>
</chapter>
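As a rough illustration of the scan_begin / scan_getnextslot / scan_end lifecycle documented above, here is a standalone C sketch. ToyScanDesc, ToySlot and the toy_* functions are simplified stand-ins invented for this example, not the actual PostgreSQL types:

```c
#include <stdbool.h>

/* Simplified stand-ins for TableScanDesc and TupleTableSlot. */
typedef struct ToyScanDesc
{
	const int  *rows;
	int			nrows;
	int			pos;
} ToyScanDesc;

typedef struct ToySlot
{
	int			value;
	bool		valid;
} ToySlot;

/* Mirrors scan_begin: set up per-scan state and hand it back. */
static ToyScanDesc
toy_scan_begin(const int *rows, int nrows)
{
	ToyScanDesc scan = {rows, nrows, 0};

	return scan;
}

/* Mirrors scan_getnextslot: fill the caller's slot, report end of scan. */
static bool
toy_scan_getnextslot(ToyScanDesc *scan, ToySlot *slot)
{
	if (scan->pos >= scan->nrows)
	{
		slot->valid = false;
		return false;
	}
	slot->value = scan->rows[scan->pos++];
	slot->valid = true;
	return true;
}

/* Mirrors scan_end: release per-scan state (nothing to free here). */
static void
toy_scan_end(ToyScanDesc *scan)
{
	scan->pos = scan->nrows;
}

/* A caller following the lifecycle: begin, loop over slots, end. */
static int
toy_scan_sum(const int *rows, int nrows)
{
	ToyScanDesc scan = toy_scan_begin(rows, nrows);
	ToySlot		slot;
	int			sum = 0;

	while (toy_scan_getnextslot(&scan, &slot))
		sum += slot.value;
	toy_scan_end(&scan);
	return sum;
}

/* Example: sum four "tuples". */
static int
toy_demo(void)
{
	const int	rows[] = {1, 2, 3, 4};

	return toy_scan_sum(rows, 4);
}
```

Note the key point of the slot-based design: the caller owns the slot and the AM fills it, so no palloc'ed tuple ever crosses the API boundary.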
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 0179deea2e..f0c8037bbc 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -587,8 +587,9 @@
The catalog <structname>pg_am</structname> stores information about
relation access methods. There is one row for each access method supported
by the system.
- Currently, only indexes have access methods. The requirements for index
- access methods are discussed in detail in <xref linkend="indexam"/>.
+ Currently, only tables and indexes have access methods. The requirements for
+ access methods are discussed in detail in <xref linkend="am"/>.
</para>
<table>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f11b8f724c..8765d7c57c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6585,6 +6585,30 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-default-table-access-method" xreflabel="default_table_access_method">
+ <term><varname>default_table_access_method</varname> (<type>string</type>)
+ <indexterm>
+ <primary><varname>default_table_access_method</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ This variable specifies the default table access method to use when
+ creating objects (tables and materialized views) if the <command>CREATE</command>
+ command does not explicitly specify an access method.
+ </para>
+
+ <para>
+ The value is either the name of a table access method, or an empty string
+ to specify the default table access method of the current database.
+ If the value does not match the name of any existing table access method,
+ <productname>PostgreSQL</productname> will automatically use the default
+ table access method of the current database.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-default-tablespace" xreflabel="default_tablespace">
<term><varname>default_tablespace</varname> (<type>string</type>)
<indexterm>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 48ac14a838..99a6496502 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -90,7 +90,7 @@
<!ENTITY gin SYSTEM "gin.sgml">
<!ENTITY brin SYSTEM "brin.sgml">
<!ENTITY planstats SYSTEM "planstats.sgml">
-<!ENTITY indexam SYSTEM "indexam.sgml">
+<!ENTITY am SYSTEM "am.sgml">
<!ENTITY nls SYSTEM "nls.sgml">
<!ENTITY plhandler SYSTEM "plhandler.sgml">
<!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 0070603fc3..3e66ae9c8a 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -251,7 +251,7 @@
&tablesample-method;
&custom-scan;
&geqo;
- &indexam;
+ &am;
&generic-wal;
&btree;
&gist;
diff --git a/doc/src/sgml/ref/create_access_method.sgml b/doc/src/sgml/ref/create_access_method.sgml
index 851c5e63be..256914022a 100644
--- a/doc/src/sgml/ref/create_access_method.sgml
+++ b/doc/src/sgml/ref/create_access_method.sgml
@@ -61,7 +61,8 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
<listitem>
<para>
This clause specifies the type of access method to define.
- Only <literal>INDEX</literal> is supported at present.
+ Only <literal>INDEX</literal> and <literal>TABLE</literal>
+ are supported at present.
</para>
</listitem>
</varlistentry>
@@ -76,9 +77,12 @@ CREATE ACCESS METHOD <replaceable class="parameter">name</replaceable>
declared to take a single argument of type <type>internal</type>,
and its return type depends on the type of access method;
for <literal>INDEX</literal> access methods, it must
- be <type>index_am_handler</type>. The C-level API that the handler
- function must implement varies depending on the type of access method.
- The index access method API is described in <xref linkend="indexam"/>.
+ be <type>index_am_handler</type> and for <literal>TABLE</literal>
+ access methods, it must be <type>table_am_handler</type>.
+ The C-level API that the handler function must implement varies
+ depending on the type of access method. The index access method API
+ is described in <xref linkend="index-access-methods"/> and the table access method
+ API is described in <xref linkend="table-access-methods"/>.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 10428f8ff0..87e0f01ab2 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -29,6 +29,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
] )
[ INHERITS ( <replaceable>parent_table</replaceable> [, ... ] ) ]
[ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
[ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -40,6 +41,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
[, ... ]
) ]
[ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
[ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -51,6 +53,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
[, ... ]
) ] { FOR VALUES <replaceable class="parameter">partition_bound_spec</replaceable> | DEFAULT }
[ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ USING <replaceable class="parameter">method</replaceable> ]
[ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -955,7 +958,7 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
<para>
The access method must support <literal>amgettuple</literal> (see <xref
- linkend="indexam"/>); at present this means <acronym>GIN</acronym>
+ linkend="index-access-methods"/>); at present this means <acronym>GIN</acronym>
cannot be used. Although it's allowed, there is little point in using
B-tree or hash indexes with an exclusion constraint, because this
does nothing that an ordinary unique constraint doesn't do better.
@@ -1138,6 +1141,19 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+ <listitem>
+ <para>
+ This optional clause specifies the table access method to use for the
+ new table; see <xref linkend="table-access-methods"/> for more information.
+ If this option is not specified, the default table access method is
+ used for the new table; see <xref linkend="guc-default-table-access-method"/>
+ for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
<listitem>
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 527138e787..2acf52d2f5 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
<synopsis>
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
[ (<replaceable>column_name</replaceable> [, ...] ) ]
+ [ USING <replaceable class="parameter">method</replaceable> ]
[ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -120,6 +121,19 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>USING <replaceable class="parameter">method</replaceable></literal></term>
+ <listitem>
+ <para>
+ This optional clause specifies the table access method to use for the
+ new table; see <xref linkend="table-access-methods"/> for more information.
+ If this option is not specified, the default table access method is
+ used for the new table; see <xref linkend="guc-default-table-access-method"/>
+ for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
<listitem>
diff --git a/doc/src/sgml/release-9.6.sgml b/doc/src/sgml/release-9.6.sgml
index acb6a88b31..68c79db4b5 100644
--- a/doc/src/sgml/release-9.6.sgml
+++ b/doc/src/sgml/release-9.6.sgml
@@ -10081,7 +10081,7 @@ This commit is also listed under libpq and PL/pgSQL
2016-08-13 [ed0097e4f] Add SQL-accessible functions for inspecting index AM pro
-->
<para>
- Restructure <link linkend="indexam">index access
+ Restructure <link linkend="index-access-methods">index access
method <acronym>API</acronym></link> to hide most of it at
the <application>C</application> level (Alexander Korotkov, Andrew Gierth)
</para>
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index 9446f8b836..4fa821160c 100644
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -36,7 +36,7 @@
described in <classname>pg_am</classname>. It is possible to add a
new index access method by writing the necessary code and
then creating an entry in <classname>pg_am</classname> — but that is
- beyond the scope of this chapter (see <xref linkend="indexam"/>).
+ beyond the scope of this chapter (see <xref linkend="am"/>).
</para>
<para>
--
2.18.0.windows.1
Attachment: 0002-New-API-s-are-added.patch
From 826223860a977394ed2ebc1f07c1533b0f240e9c Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 17:58:27 +1100
Subject: [PATCH 2/3] New API's are added
1. Init fork API
2. Set new filenode
3. Estimate rel size
SetNewFileNode and EstimateRelSize are added as optional function
hooks; they are not compulsory for heap. When an AM provides them,
they take over from the default implementation.
---
src/backend/access/heap/heapam_handler.c | 27 +++++++++++++++++++++++-
src/backend/catalog/heap.c | 24 ++-------------------
src/backend/commands/tablecmds.c | 4 ++--
src/backend/optimizer/util/plancat.c | 12 +++++++++++
src/backend/utils/cache/relcache.c | 12 +++++++++++
src/include/access/tableam.h | 16 ++++++++++++++
src/include/catalog/heap.h | 2 --
7 files changed, 70 insertions(+), 27 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3254e30a45..ae832e1f71 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -30,6 +30,7 @@
#include "catalog/catalog.h"
#include "catalog/index.h"
#include "catalog/pg_am_d.h"
+#include "catalog/storage_xlog.h"
#include "executor/executor.h"
#include "pgstat.h"
#include "storage/lmgr.h"
@@ -2118,6 +2119,28 @@ heap_copy_for_cluster(Relation OldHeap, Relation NewHeap, Relation OldIndex,
pfree(isnull);
}
+/*
+ * Set up an init fork for an unlogged table so that it can be correctly
+ * reinitialized on restart. An immediate sync is required even if the
+ * page has been logged, because the write did not go through
+ * shared_buffers and therefore a concurrent checkpoint may have moved
+ * the redo pointer past our xlog record. Recovery may as well remove it
+ * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
+ * record. Therefore, logging is necessary even if wal_level=minimal.
+ */
+static void
+heap_create_init_fork(Relation rel)
+{
+ Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
+ rel->rd_rel->relkind == RELKIND_MATVIEW ||
+ rel->rd_rel->relkind == RELKIND_TOASTVALUE);
+ RelationOpenSmgr(rel);
+ smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
+ log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
+ smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
+}
+
+
static const TableAmRoutine heapam_methods = {
.type = T_TableAmRoutine,
@@ -2165,7 +2188,9 @@ static const TableAmRoutine heapam_methods = {
.index_build_range_scan = IndexBuildHeapRangeScan,
- .index_validate_scan = validate_index_heapscan
+ .index_validate_scan = validate_index_heapscan,
+
+ .CreateInitFork = heap_create_init_fork
};
const TableAmRoutine *
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 398f90775f..840319668a 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -32,6 +32,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/sysattr.h"
+#include "access/tableam.h"
#include "access/transam.h"
#include "access/xact.h"
#include "access/xlog.h"
@@ -1442,7 +1443,7 @@ heap_create_with_catalog(const char *relname,
*/
if (relpersistence == RELPERSISTENCE_UNLOGGED &&
relkind != RELKIND_PARTITIONED_TABLE)
- heap_create_init_fork(new_rel_desc);
+ table_create_init_fork(new_rel_desc);
/*
* ok, the relation has been cataloged, so close our relations and return
@@ -1454,27 +1455,6 @@ heap_create_with_catalog(const char *relname,
return relid;
}
-/*
- * Set up an init fork for an unlogged table so that it can be correctly
- * reinitialized on restart. An immediate sync is required even if the
- * page has been logged, because the write did not go through
- * shared_buffers and therefore a concurrent checkpoint may have moved
- * the redo pointer past our xlog record. Recovery may as well remove it
- * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
- * record. Therefore, logging is necessary even if wal_level=minimal.
- */
-void
-heap_create_init_fork(Relation rel)
-{
- Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
- rel->rd_rel->relkind == RELKIND_MATVIEW ||
- rel->rd_rel->relkind == RELKIND_TOASTVALUE);
- RelationOpenSmgr(rel);
- smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
- log_smgrcreate(&rel->rd_smgr->smgr_rnode.node, INIT_FORKNUM);
- smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);
-}
-
/*
* RelationRemoveInheritance
*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f3526b267d..3c46a48882 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1649,7 +1649,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
RecentXmin, minmulti);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
- heap_create_init_fork(rel);
+ table_create_init_fork(rel);
heap_relid = RelationGetRelid(rel);
toast_relid = rel->rd_rel->reltoastrelid;
@@ -1663,7 +1663,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
RecentXmin, minmulti);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
- heap_create_init_fork(rel);
+ table_create_init_fork(rel);
heap_close(rel, NoLock);
}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8da468a86f..3355f8bff4 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -947,6 +947,18 @@ estimate_rel_size(Relation rel, int32 *attr_widths,
BlockNumber relallvisible;
double density;
+ /*
+ * If the relation's access method provides its own EstimateRelSize
+ * function, use that instead of the regular default heap method.
+ */
+ if (rel->rd_tableamroutine &&
+ rel->rd_tableamroutine->EstimateRelSize)
+ {
+ rel->rd_tableamroutine->EstimateRelSize(rel, attr_widths, pages,
+ tuples, allvisfrac);
+ return;
+ }
+
switch (rel->rd_rel->relkind)
{
case RELKIND_RELATION:
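The EstimateRelSize hunk above follows an "optional override" pattern: the callback may be NULL, in which case the built-in heap code runs. A minimal standalone sketch of that dispatch (all names here are invented for illustration, not the real API):

```c
#include <stddef.h>

/* Toy routine table: the hook may be NULL, as EstimateRelSize may be. */
typedef struct ToyAmRoutine
{
	int			(*EstimateRelSize) (int nblocks);
} ToyAmRoutine;

/* Built-in fallback, standing in for the regular heap estimation. */
static int
toy_default_estimate(int nblocks)
{
	return nblocks * 100;		/* pretend: 100 tuples per block */
}

/* A hypothetical AM with denser pages overrides the estimate. */
static int
toy_dense_estimate(int nblocks)
{
	return nblocks * 400;
}

/* Mirrors estimate_rel_size: use the AM hook if present, else the default. */
static int
toy_estimate_rel_size(const ToyAmRoutine *am, int nblocks)
{
	if (am && am->EstimateRelSize)
		return am->EstimateRelSize(nblocks);
	return toy_default_estimate(nblocks);
}

static int
toy_estimate_with_hook(void)
{
	ToyAmRoutine am = {toy_dense_estimate};

	return toy_estimate_rel_size(&am, 2);
}

static int
toy_estimate_without_hook(void)
{
	ToyAmRoutine am = {NULL};

	return toy_estimate_rel_size(&am, 2);
}
```

The same NULL-check-then-dispatch shape is what the RelationSetNewRelfilenode hunk uses for SetNewFileNode.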
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 0d6e5a189f..9cc8e98e40 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3424,6 +3424,18 @@ RelationSetNewRelfilenode(Relation relation, char persistence,
HeapTuple tuple;
Form_pg_class classform;
+ /*
+ * If the relation's access method provides its own SetNewFileNode
+ * function, use that instead of the regular default heap method.
+ */
+ if (relation->rd_tableamroutine &&
+ relation->rd_tableamroutine->SetNewFileNode)
+ {
+ relation->rd_tableamroutine->SetNewFileNode(relation, persistence,
+ freezeXid, minmulti);
+ return;
+ }
+
/* Indexes, sequences must have Invalid frozenxid; other rels must not */
Assert((relation->rd_rel->relkind == RELKIND_INDEX ||
relation->rd_rel->relkind == RELKIND_SEQUENCE) ?
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7fe6ff6c22..eb7c9b8007 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -194,6 +194,12 @@ struct SampleScanState;
typedef bool (*SampleScanNextBlock_function)(TableScanDesc scan, struct SampleScanState *scanstate);
typedef bool (*SampleScanNextTuple_function)(TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
+typedef void (*CreateInitFork_function)(Relation rel);
+typedef void (*EstimateRelSize_function)(Relation rel, int32 *attr_widths,
+ BlockNumber *pages, double *tuples, double *allvisfrac);
+typedef void (*SetNewFileNode_function)(Relation relation, char persistence,
+ TransactionId freezeXid, MultiXactId minmulti);
+
/*
* API struct for a table AM. Note this must be allocated in a
* server-lifetime manner, typically as a static const struct.
@@ -250,6 +256,10 @@ typedef struct TableAmRoutine
IndexBuildRangeScan_function index_build_range_scan;
IndexValidateScan_function index_validate_scan;
+
+ CreateInitFork_function CreateInitFork;
+ EstimateRelSize_function EstimateRelSize;
+ SetNewFileNode_function SetNewFileNode;
} TableAmRoutine;
static inline const TupleTableSlotOps*
@@ -741,6 +751,12 @@ table_index_build_range_scan(Relation heapRelation,
scan);
}
+static inline void
+table_create_init_fork(Relation relation)
+{
+ relation->rd_tableamroutine->CreateInitFork(relation);
+}
+
extern BlockNumber table_parallelscan_nextpage(TableScanDesc scan);
extern void table_parallelscan_startblock_init(TableScanDesc scan);
extern Size table_parallelscan_estimate(Snapshot snapshot);
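As the header comment in tableam.h says, a TableAmRoutine is meant to be a server-lifetime static const struct of callbacks, the way heapam_methods is built. The shape can be sketched standalone with designated initializers (toy types invented for this example, not the real API):

```c
/* Toy analogue of TableAmRoutine: function pointers in a const struct. */
typedef struct ToyTableAmRoutine
{
	const char *name;
	void		(*create_init_fork) (int *forks_created);
} ToyTableAmRoutine;

static void
toy_heap_create_init_fork(int *forks_created)
{
	(*forks_created)++;
}

/* Static, const, server-lifetime: designated initializers name each slot. */
static const ToyTableAmRoutine toy_heapam_methods = {
	.name = "heap",
	.create_init_fork = toy_heap_create_init_fork,
};

/* Mirrors table_create_init_fork: a thin wrapper dispatching via the AM. */
static void
toy_table_create_init_fork(const ToyTableAmRoutine *am, int *forks_created)
{
	am->create_init_fork(forks_created);
}

static int
toy_demo_init_fork(void)
{
	int			forks_created = 0;

	toy_table_create_init_fork(&toy_heapam_methods, &forks_created);
	return forks_created;
}
```

Designated initializers also make it harmless to append new optional members like CreateInitFork: existing AM structs simply leave them NULL-initialized.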
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 4584b3473c..c0e706ecc9 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -77,8 +77,6 @@ extern Oid heap_create_with_catalog(const char *relname,
Oid relrewrite,
ObjectAddress *typaddress);
-extern void heap_create_init_fork(Relation rel);
-
extern void heap_drop_with_catalog(Oid relid);
extern void heap_truncate(List *relids);
--
2.18.0.windows.1
Attachment: 0001-Further-fixes-and-cleanup.patch
From 2982d89e825c334d07aa14e8e5038ea02034e581 Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Mon, 22 Oct 2018 16:06:10 +1100
Subject: [PATCH 1/3] Further fixes and cleanup
1. Remove the old slot interface file and also update the Makefile.
2. CREATE AS USING method grammar support
3. Materialized view grammar support
This change was missed during the earlier USING grammar support.
4. Remove the extra tuple visibility check
In heapgettup_pagemode a tuple visibility check was added
during the early development of pluggable storage, but the
visibility check is already carried out in the heapgetpage
function itself.
5. Handle the HeapTupleInvisible case
In update/delete scenarios, when the tuple is concurrently
updated/deleted, locking the tuple may sometimes return
HeapTupleInvisible. Handle that case as a no-op.
Regression fixes:
1. Scan start offset fix during analyze
2. Materialize the slot before it is processed by intorel_receive
3. ROW_MARK_COPY support by force-storing the heap tuple
4. Partition prune extra heap page fix
5. Addition of basic syntax usage tests
---
contrib/pg_visibility/pg_visibility.c | 5 +-
src/backend/access/heap/heapam.c | 28 +++-----
src/backend/access/heap/heapam_handler.c | 6 +-
src/backend/access/heap/heapam_visibility.c | 4 +-
src/backend/access/index/genam.c | 3 +-
src/backend/access/table/Makefile | 2 +-
src/backend/access/table/tableam_common.c | 0
src/backend/catalog/heap.c | 17 +++++
src/backend/commands/cluster.c | 6 +-
src/backend/commands/createas.c | 5 ++
src/backend/executor/execExprInterp.c | 16 ++---
src/backend/executor/execMain.c | 13 +---
src/backend/executor/execReplication.c | 3 -
src/backend/executor/execTuples.c | 21 ++++++
src/backend/executor/nodeBitmapHeapscan.c | 12 ++++
src/backend/executor/nodeModifyTable.c | 22 +++++-
src/backend/parser/gram.y | 18 ++---
src/include/executor/tuptable.h | 1 +
src/include/nodes/primnodes.h | 1 +
src/test/regress/expected/create_am.out | 78 +++++++++++++++++++++
src/test/regress/sql/create_am.sql | 46 ++++++++++++
21 files changed, 243 insertions(+), 64 deletions(-)
delete mode 100644 src/backend/access/table/tableam_common.c
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index dce5262e34..88ca4fd2af 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -563,12 +563,13 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
rel = relation_open(relid, AccessShareLock);
+ /* Only some relkinds have a visibility map */
+ check_relation_relkind(rel);
+
if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("only heap AM is supported")));
- /* Only some relkinds have a visibility map */
- check_relation_relkind(rel);
nblocks = RelationGetNumberOfBlocks(rel);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ec99d0bcae..ef6b4c3e54 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -947,31 +947,25 @@ heapgettup_pagemode(HeapScanDesc scan,
/*
* if current tuple qualifies, return it.
*/
- if (HeapTupleSatisfies(tuple, scan->rs_scan.rs_snapshot, scan->rs_cbuf))
+ if (key != NULL)
{
- /*
- * if current tuple qualifies, return it.
- */
- if (key != NULL)
- {
- bool valid;
+ bool valid;
- HeapKeyTest(tuple, RelationGetDescr(scan->rs_scan.rs_rd),
- nkeys, key, valid);
- if (valid)
- {
- scan->rs_cindex = lineindex;
- LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
- return;
- }
- }
- else
+ HeapKeyTest(tuple, RelationGetDescr(scan->rs_scan.rs_rd),
+ nkeys, key, valid);
+ if (valid)
{
scan->rs_cindex = lineindex;
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
return;
}
}
+ else
+ {
+ scan->rs_cindex = lineindex;
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+ return;
+ }
/*
* otherwise move to the next item on the page
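The restructured heapgettup_pagemode hunk above drops the redundant HeapTupleSatisfies() test (visibility was already settled in heapgetpage) and keeps only the optional key test. The resulting control flow, sketched standalone with invented toy names:

```c
#include <stdbool.h>
#include <stddef.h>

typedef bool (*ToyKeyTest) (int tuple);

static bool
toy_is_even(int tuple)
{
	return tuple % 2 == 0;
}

/*
 * Return the index of the first qualifying item on the "page", or -1.
 * Every item is assumed visible already, as after heapgetpage().
 */
static int
toy_next_qualifying(const int *items, int nitems, ToyKeyTest key)
{
	for (int lineindex = 0; lineindex < nitems; lineindex++)
	{
		if (key != NULL)
		{
			if (key(items[lineindex]))
				return lineindex;	/* current tuple qualifies: return it */
			/* otherwise move to the next item on the page */
		}
		else
			return lineindex;		/* no keys: every visible item qualifies */
	}
	return -1;						/* page exhausted */
}

static int
toy_demo_with_key(void)
{
	const int	page[] = {1, 3, 4, 5};

	return toy_next_qualifying(page, 4, toy_is_even);
}

static int
toy_demo_without_key(void)
{
	const int	page[] = {1, 3, 4, 5};

	return toy_next_qualifying(page, 4, NULL);
}
```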
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91f..3254e30a45 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1741,7 +1741,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
{
HeapScanDesc scan = (HeapScanDesc) sscan;
Page targpage;
- OffsetNumber targoffset = scan->rs_cindex;
+ OffsetNumber targoffset;
OffsetNumber maxoffset;
BufferHeapTupleTableSlot *hslot;
@@ -1751,7 +1751,9 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
maxoffset = PageGetMaxOffsetNumber(targpage);
/* Inner loop over all tuples on the selected page */
- for (targoffset = scan->rs_cindex; targoffset <= maxoffset; targoffset++)
+ for (targoffset = scan->rs_cindex ? scan->rs_cindex : FirstOffsetNumber;
+ targoffset <= maxoffset;
+ targoffset++)
{
ItemId itemid;
HeapTuple targtuple = &hslot->base.tupdata;
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 8233475aa0..7bad246f55 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1838,8 +1838,10 @@ HeapTupleSatisfies(HeapTuple stup, Snapshot snapshot, Buffer buffer)
case NON_VACUUMABLE_VISIBILTY:
return HeapTupleSatisfiesNonVacuumable(stup, snapshot, buffer);
break;
- default:
+ case END_OF_VISIBILITY:
Assert(0);
break;
}
+
+ return false; /* keep compiler quiet */
}
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index e06bd0479f..94c9702dc1 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,10 +455,9 @@ systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
if (sysscan->irel)
{
- IndexScanDesc scan = sysscan->iscan;
IndexFetchHeapData *hscan = (IndexFetchHeapData *) sysscan->iscan->xs_heapfetch;
- Assert(IsMVCCSnapshot(scan->xs_snapshot));
+ Assert(IsMVCCSnapshot(sysscan->iscan->xs_snapshot));
//Assert(tup == &hscan->xs_ctup); replace by peeking into slot?
Assert(BufferIsValid(hscan->xs_cbuf));
/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
diff --git a/src/backend/access/table/Makefile b/src/backend/access/table/Makefile
index fe22bf9208..006ba99182 100644
--- a/src/backend/access/table/Makefile
+++ b/src/backend/access/table/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/access/table
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = tableam.o tableamapi.o tableam_common.o
+OBJS = tableam.o tableamapi.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/table/tableam_common.c b/src/backend/access/table/tableam_common.c
deleted file mode 100644
index e69de29bb2..0000000000
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 38b368f916..398f90775f 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -42,6 +42,7 @@
#include "catalog/index.h"
#include "catalog/objectaccess.h"
#include "catalog/partition.h"
+#include "catalog/pg_am.h"
#include "catalog/pg_attrdef.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
@@ -1388,6 +1389,22 @@ heap_create_with_catalog(const char *relname,
recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
}
+ /*
+ * Make a dependency link to force the relation to be deleted if its
+ * access method is. Do this only for relations and materialized views.
+ *
+ * No need to add an explicit dependency with toast, as the original
+ * table depends on it.
+ */
+ if ((relkind == RELKIND_RELATION) ||
+ (relkind == RELKIND_MATVIEW))
+ {
+ referenced.classId = AccessMethodRelationId;
+ referenced.objectId = accessmtd;
+ referenced.objectSubId = 0;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+ }
+
if (relacl != NULL)
{
int nnewmembers;
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 63974979da..3ce8862a01 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -755,8 +755,6 @@ copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
Relation relRelation;
HeapTuple reltup;
Form_pg_class relform;
- TupleDesc oldTupDesc;
- TupleDesc newTupDesc;
TransactionId OldestXmin;
TransactionId FreezeXid;
MultiXactId MultiXactCutoff;
@@ -784,9 +782,7 @@ copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
* Their tuple descriptors should be exactly alike, but here we only need
* assume that they have the same number of columns.
*/
- oldTupDesc = RelationGetDescr(OldHeap);
- newTupDesc = RelationGetDescr(NewHeap);
- Assert(newTupDesc->natts == oldTupDesc->natts);
+ Assert(RelationGetDescr(NewHeap)->natts == RelationGetDescr(OldHeap)->natts);
/*
* If the OldHeap has a toast table, get lock on the toast table to keep
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 84de804175..82c0eb2824 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -108,6 +108,7 @@ create_ctas_internal(List *attrList, IntoClause *into)
create->oncommit = into->onCommit;
create->tablespacename = into->tableSpaceName;
create->if_not_exists = false;
+ create->accessMethod = into->accessMethod;
// PBORKED: toast options
@@ -593,6 +594,10 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
if (myState->rel->rd_rel->relhasoids)
slot->tts_tupleOid = InvalidOid;
+ /* Materialize the slot */
+ if (!TTS_IS_VIRTUAL(slot))
+ ExecMaterializeSlot(slot);
+
table_insert(myState->rel,
slot,
myState->output_cid,
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index ef94ac4aa0..8df85c2f48 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -529,20 +529,13 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
Assert(TTS_IS_HEAPTUPLE(outerslot) ||
TTS_IS_BUFFERTUPLE(outerslot));
- /* The slot should have a valid heap tuple. */
-#if FIXME
- /* The slot should have a valid heap tuple. */
- Assert(hslot->tuple != NULL);
-#endif
-
- /*
- * hari
- * Assert(outerslot->tts_storageslotam->slot_is_physical_tuple(outerslot));
- */
if (attnum == TableOidAttributeNumber)
d = ObjectIdGetDatum(outerslot->tts_tableOid);
else
{
+ /* The slot should have a valid heap tuple. */
+ Assert(hslot->tuple != NULL);
+
/* heap_getsysattr has sufficient defenses against bad attnums */
d = heap_getsysattr(hslot->tuple, attnum,
outerslot->tts_tupleDescriptor,
@@ -570,6 +563,9 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
Assert(TTS_IS_HEAPTUPLE(scanslot) ||
TTS_IS_BUFFERTUPLE(scanslot));
+ if (hslot->tuple == NULL)
+ ExecMaterializeSlot(scanslot);
+
d = heap_getsysattr(hslot->tuple, attnum,
scanslot->tts_tupleDescriptor,
op->resnull);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e055c0a7c6..34ef86a5bd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2594,7 +2594,7 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
* datums that may be present in copyTuple). As with the next step, this
* is to guard against early re-use of the EPQ query.
*/
- if (!TupIsNull(slot))
+ if (!TupIsNull(slot) && !TTS_IS_VIRTUAL(slot))
ExecMaterializeSlot(slot);
#if FIXME
@@ -2787,16 +2787,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
if (isNull)
continue;
- elog(ERROR, "frak, need to implement ROW_MARK_COPY");
-#ifdef FIXME
- // FIXME: this should just deform the tuple and store it as a
- // virtual one.
- tuple = table_tuple_by_datum(erm->relation, datum, erm->relid);
-
- /* store tuple */
- EvalPlanQualSetTuple(epqstate, erm->rti, tuple);
-#endif
-
+ ExecForceStoreHeapTupleDatum(datum, slot);
}
}
}
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 452973e4ca..489e7d42a2 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -239,9 +239,6 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
SnapshotData snap;
TransactionId xwait;
bool found;
- TupleDesc desc = RelationGetDescr(rel);
-
- Assert(equalTupleDescs(desc, outslot->tts_tupleDescriptor));
/* Start a heap scan. */
InitDirtySnapshot(snap);
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 917bf80f71..74149cc3ad 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -1364,6 +1364,27 @@ ExecStoreAllNullTuple(TupleTableSlot *slot)
return ExecFinishStoreSlotValues(slot);
}
+void
+ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot)
+{
+ HeapTuple tuple;
+ HeapTupleHeader td;
+
+ td = DatumGetHeapTupleHeader(data);
+
+ tuple = (HeapTuple) palloc(HEAPTUPLESIZE + HeapTupleHeaderGetDatumLength(td));
+ tuple->t_len = HeapTupleHeaderGetDatumLength(td);
+ tuple->t_self = td->t_ctid;
+ tuple->t_data = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE);
+ memcpy((char *) tuple->t_data, (char *) td, tuple->t_len);
+
+ ExecClearTuple(slot);
+
+ heap_deform_tuple(tuple, slot->tts_tupleDescriptor,
+ slot->tts_values, slot->tts_isnull);
+ ExecFinishStoreSlotValues(slot);
+}
+
/* --------------------------------
* ExecFetchSlotTuple
* Fetch the slot's regular physical tuple.
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 56880e3d16..36ca07beb2 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -224,6 +224,18 @@ BitmapHeapNext(BitmapHeapScanState *node)
BitmapAdjustPrefetchIterator(node, tbmres);
+ /*
+ * Ignore any claimed entries past what we think is the end of the
+ * relation. (This is probably not necessary given that we got at
+ * least AccessShareLock on the table before performing any of the
+ * indexscans, but let's be safe.)
+ */
+ if (tbmres->blockno >= scan->rs_nblocks)
+ {
+ node->tbmres = tbmres = NULL;
+ continue;
+ }
+
/*
* We can skip fetching the heap page if we don't need any fields
* from the heap, and the bitmap entries don't need rechecking,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 3cc9092413..b3851b180d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -607,7 +607,7 @@ ExecDelete(ModifyTableState *mtstate,
bool canSetTag,
bool changingPart,
bool *tupleDeleted,
- TupleTableSlot **epqslot)
+ TupleTableSlot **epqreturnslot)
{
ResultRelInfo *resultRelInfo;
Relation resultRelationDesc;
@@ -632,7 +632,7 @@ ExecDelete(ModifyTableState *mtstate,
bool dodelete;
dodelete = ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
- tupleid, oldtuple, epqslot);
+ tupleid, oldtuple, epqreturnslot);
if (!dodelete) /* "do nothing" */
return NULL;
@@ -724,8 +724,21 @@ ldelete:;
/* Tuple no more passing quals, exiting... */
return NULL;
}
+
+ /* If requested, pass the EPQ-surviving tuple back to the caller */
+ if (epqreturnslot)
+ {
+ *epqreturnslot = epqslot;
+ return NULL;
+ }
+
goto ldelete;
}
+ else if (result == HeapTupleInvisible)
+ {
+ /* tuple is not visible; nothing to do */
+ return NULL;
+ }
}
switch (result)
@@ -1196,6 +1209,11 @@ lreplace:;
slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
goto lreplace;
}
+ else if (result == HeapTupleInvisible)
+ {
+ /* tuple is not visible; nothing to do */
+ return NULL;
+ }
}
switch (result)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 54382aba88..ea48e1d6e8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4037,7 +4037,6 @@ CreateStatsStmt:
*
*****************************************************************************/
-// PBORKED: storage option
CreateAsStmt:
CREATE OptTemp TABLE create_as_target AS SelectStmt opt_with_data
{
@@ -4068,14 +4067,16 @@ CreateAsStmt:
;
create_as_target:
- qualified_name opt_column_list OptWith OnCommitOption OptTableSpace
+ qualified_name opt_column_list table_access_method_clause
+ OptWith OnCommitOption OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
$$->colNames = $2;
- $$->options = $3;
- $$->onCommit = $4;
- $$->tableSpaceName = $5;
+ $$->accessMethod = $3;
+ $$->options = $4;
+ $$->onCommit = $5;
+ $$->tableSpaceName = $6;
$$->viewQuery = NULL;
$$->skipData = false; /* might get changed later */
}
@@ -4125,14 +4126,15 @@ CreateMatViewStmt:
;
create_mv_target:
- qualified_name opt_column_list opt_reloptions OptTableSpace
+ qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
$$->colNames = $2;
- $$->options = $3;
+ $$->accessMethod = $3;
+ $$->options = $4;
$$->onCommit = ONCOMMIT_NOOP;
- $$->tableSpaceName = $4;
+ $$->tableSpaceName = $5;
$$->viewQuery = NULL; /* filled at analysis time */
$$->skipData = false; /* might get changed later */
}
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index 05f38cfd0d..20fc425a27 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -476,6 +476,7 @@ extern TupleTableSlot *ExecCopySlot(TupleTableSlot *dstslot,
extern void ExecForceStoreHeapTuple(HeapTuple tuple,
TupleTableSlot *slot);
+extern void ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot);
extern void slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum);
extern Datum slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull);
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 40f6eb03d2..4d194a8c2a 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -111,6 +111,7 @@ typedef struct IntoClause
RangeVar *rel; /* target relation name */
List *colNames; /* column names to assign, or NIL */
+ char *accessMethod; /* table access method */
List *options; /* options from WITH clause */
OnCommitAction onCommit; /* what do we do at COMMIT? */
char *tableSpaceName; /* table space to use, or NULL */
diff --git a/src/test/regress/expected/create_am.out b/src/test/regress/expected/create_am.out
index 47dd885c4e..a4094ca3f1 100644
--- a/src/test/regress/expected/create_am.out
+++ b/src/test/regress/expected/create_am.out
@@ -99,3 +99,81 @@ HINT: Use DROP ... CASCADE to drop the dependent objects too.
-- Drop access method cascade
DROP ACCESS METHOD gist2 CASCADE;
NOTICE: drop cascades to index grect2ind2
+-- Create a heap2 table am handler with heapam handler
+CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;
+SELECT * FROM pg_am where amtype = 't';
+ amname | amhandler | amtype
+--------+----------------------+--------
+ heap | heap_tableam_handler | t
+ heap2 | heap_tableam_handler | t
+(2 rows)
+
+CREATE TABLE tbl_heap2(f1 int, f2 char(100)) using heap2;
+INSERT INTO tbl_heap2 VALUES(generate_series(1,10), 'Test series');
+SELECT count(*) FROM tbl_heap2;
+ count
+-------
+ 10
+(1 row)
+
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+ where a.oid = r.relam AND r.relname = 'tbl_heap2';
+ relname | relkind | amname
+-----------+---------+--------
+ tbl_heap2 | r | heap2
+(1 row)
+
+-- create table as using heap2
+CREATE TABLE tblas_heap2 using heap2 AS select * from tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+ where a.oid = r.relam AND r.relname = 'tblas_heap2';
+ relname | relkind | amname
+-------------+---------+--------
+ tblas_heap2 | r | heap2
+(1 row)
+
+--
+-- select into doesn't support new syntax, so it should be
+-- default access method.
+--
+SELECT INTO tblselectinto_heap from tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+ where a.oid = r.relam AND r.relname = 'tblselectinto_heap';
+ relname | relkind | amname
+--------------------+---------+--------
+ tblselectinto_heap | r | heap
+(1 row)
+
+DROP TABLE tblselectinto_heap;
+-- create materialized view using heap2
+CREATE MATERIALIZED VIEW mv_heap2 USING heap2 AS
+ SELECT * FROM tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+ where a.oid = r.relam AND r.relname = 'mv_heap2';
+ relname | relkind | amname
+----------+---------+--------
+ mv_heap2 | m | heap2
+(1 row)
+
+-- Try creating the unsupported relation kinds with using syntax
+CREATE VIEW test_view USING heap2 AS SELECT * FROM tbl_heap2;
+ERROR: syntax error at or near "USING"
+LINE 1: CREATE VIEW test_view USING heap2 AS SELECT * FROM tbl_heap2...
+ ^
+CREATE SEQUENCE test_seq USING heap2;
+ERROR: syntax error at or near "USING"
+LINE 1: CREATE SEQUENCE test_seq USING heap2;
+ ^
+-- Drop table access method, but fails as objects depends on it
+DROP ACCESS METHOD heap2;
+ERROR: cannot drop access method heap2 because other objects depend on it
+DETAIL: table tbl_heap2 depends on access method heap2
+table tblas_heap2 depends on access method heap2
+materialized view mv_heap2 depends on access method heap2
+HINT: Use DROP ... CASCADE to drop the dependent objects too.
+-- Drop table access method with cascade
+DROP ACCESS METHOD heap2 CASCADE;
+NOTICE: drop cascades to 3 other objects
+DETAIL: drop cascades to table tbl_heap2
+drop cascades to table tblas_heap2
+drop cascades to materialized view mv_heap2
diff --git a/src/test/regress/sql/create_am.sql b/src/test/regress/sql/create_am.sql
index 3e0ac104f3..0472a60f20 100644
--- a/src/test/regress/sql/create_am.sql
+++ b/src/test/regress/sql/create_am.sql
@@ -66,3 +66,49 @@ DROP ACCESS METHOD gist2;
-- Drop access method cascade
DROP ACCESS METHOD gist2 CASCADE;
+
+-- Create a heap2 table am handler with heapam handler
+CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;
+
+SELECT * FROM pg_am where amtype = 't';
+
+CREATE TABLE tbl_heap2(f1 int, f2 char(100)) using heap2;
+INSERT INTO tbl_heap2 VALUES(generate_series(1,10), 'Test series');
+SELECT count(*) FROM tbl_heap2;
+
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+ where a.oid = r.relam AND r.relname = 'tbl_heap2';
+
+-- create table as using heap2
+CREATE TABLE tblas_heap2 using heap2 AS select * from tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+ where a.oid = r.relam AND r.relname = 'tblas_heap2';
+
+--
+-- select into doesn't support new syntax, so it should be
+-- default access method.
+--
+SELECT INTO tblselectinto_heap from tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+ where a.oid = r.relam AND r.relname = 'tblselectinto_heap';
+
+DROP TABLE tblselectinto_heap;
+
+-- create materialized view using heap2
+CREATE MATERIALIZED VIEW mv_heap2 USING heap2 AS
+ SELECT * FROM tbl_heap2;
+
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+ where a.oid = r.relam AND r.relname = 'mv_heap2';
+
+-- Try creating the unsupported relation kinds with using syntax
+CREATE VIEW test_view USING heap2 AS SELECT * FROM tbl_heap2;
+
+CREATE SEQUENCE test_seq USING heap2;
+
+
+-- Drop table access method, but fails as objects depends on it
+DROP ACCESS METHOD heap2;
+
+-- Drop table access method with cascade
+DROP ACCESS METHOD heap2 CASCADE;
--
2.18.0.windows.1
Ashwin (copied) and I got a chance to go through the latest code from
Andres' github repository. We would like to share some
comments/questions:
The TupleTableSlot argument is well suited for row-oriented storage.
For a column-oriented storage engine, a projection list indicating the
columns to be scanned may be necessary. Is it possible to share this
information with current interface?
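As a concrete illustration of the projection-list idea, here is a minimal sketch of a scan-begin call that carries a column bitmap so a columnar AM could skip unneeded columns. The names (ColumnSet, scan_begin_projected, scan_needs_column) are hypothetical, not part of the patch under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical projection set: bit i set => column i is needed. */
typedef struct ColumnSet
{
	uint64_t	mask;		/* supports up to 64 columns in this toy */
} ColumnSet;

/* Toy scan descriptor carrying the projection. */
typedef struct ProjScanDesc
{
	ColumnSet	projection;
	int			ncols;
} ProjScanDesc;

/*
 * Sketch of a scan-begin variant taking the list of needed column
 * numbers; a column store would consult the resulting mask to decide
 * which column files to read at all.
 */
static ProjScanDesc
scan_begin_projected(int ncols, const int *needed, int nneeded)
{
	ProjScanDesc scan = { {0}, ncols };

	for (int i = 0; i < nneeded; i++)
		scan.projection.mask |= (uint64_t) 1 << needed[i];
	return scan;
}

static bool
scan_needs_column(const ProjScanDesc *scan, int col)
{
	return (scan->projection.mask >> col) & 1;
}
```

A row-oriented AM could simply ignore the projection and return full tuples, which is why adding such a parameter later need not break the heap implementation.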
We realized that DDLs such as heap_create_with_catalog() are not
generalized. Haribabu's latest patch that adds
SetNewFileNode_function() and CreateInitFork_function() is a step
towards this end. However, the current API assumes that the storage
engine uses relation forks. Isn't that too restrictive?
TupleDelete_function() accepts changingPart as a parameter to indicate
if this deletion is part of a movement from one partition to another.
Partitioning is a higher level abstraction as compared to storage.
Ideally, storage layer should have no knowledge of partitioning. The
tuple delete API should not accept any parameter related to
partitioning.
The API needs to be more accommodating towards block sizes used in
storage engines. Currently, the same block size as heap seems to be
assumed, as evident from the type of some members of generic scan
object:
typedef struct TableScanDescData
{
/* state set up at initscan time */
BlockNumber rs_nblocks; /* total number of blocks in rel */
BlockNumber rs_startblock; /* block # to start at */
BlockNumber rs_numblocks; /* max number of blocks to scan */
/* rs_numblocks is usually InvalidBlockNumber, meaning "scan whole rel" */
bool rs_syncscan; /* report location to syncscan logic? */
} TableScanDescData;
Using bytes to represent this information would be more generic. E.g.
rs_startlocation as bytes/offset instead of rs_startblock and so on.
Asim
On Thu, Nov 22, 2018 at 1:12 PM Asim R P <apraveen@pivotal.io> wrote:
Ashwin (copied) and I got a chance to go through the latest code from
Andres' github repository. We would like to share some
comments/questions:
Thanks for the review.
The TupleTableSlot argument is well suited for row-oriented storage.
For a column-oriented storage engine, a projection list indicating the
columns to be scanned may be necessary. Is it possible to share this
information with current interface?
Currently all the interfaces are designed for row-oriented storage; as you
said, we need a new API for a projection list. The current patch set is
already big and needs to be stabilized first; in the next set of patches,
new APIs useful for columnar storage will be added.
We realized that DDLs such as heap_create_with_catalog() are not
generalized. Haribabu's latest patch that adds
SetNewFileNode_function() and CreateInitFort_function() is a step
towards this end. However, the current API assumes that the storage
engine uses relation forks. Isn't that too restrictive?
The current set of APIs makes many assumptions and reuses the existing
framework. Thanks for your point, I will check how to enhance it.
TupleDelete_function() accepts changingPart as a parameter to indicate
if this deletion is part of a movement from one partition to another.
Partitioning is a higher level abstraction as compared to storage.
Ideally, storage layer should have no knowledge of partitioning. The
tuple delete API should not accept any parameter related to
partitioning.
Thanks for your point, I will look into how to extract it.
The API needs to be more accommodating towards block sizes used in
storage engines. Currently, the same block size as heap seems to be
assumed, as evident from the type of some members of generic scan
object:
typedef struct TableScanDescData
{
/* state set up at initscan time */
BlockNumber rs_nblocks; /* total number of blocks in rel */
BlockNumber rs_startblock; /* block # to start at */
BlockNumber rs_numblocks; /* max number of blocks to scan */
/* rs_numblocks is usually InvalidBlockNumber, meaning "scan whole rel" */
bool rs_syncscan; /* report location to syncscan logic? */
} TableScanDescData;
Using bytes to represent this information would be more generic. E.g.
rs_startlocation as bytes/offset instead of rs_startblock and so on.
I suspect this is not the only thing that needs a change to support
different block sizes for different storage interfaces. Thanks for your
point; this can definitely be taken care of in the next set of patches.
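To make the block-size concern above concrete, here is a minimal sketch of a byte-addressed scan range, as the review suggests. All names here (ByteScanRange, range_start_block, range_nblocks) are illustrative inventions, not part of the actual patch set; an AM with its own block size would derive block numbers from byte offsets like this.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical byte-addressed variant of the scan-range fields in
 * TableScanDescData.  Instead of BlockNumber rs_startblock /
 * rs_numblocks, the range is expressed in bytes, so an AM that does
 * not use heap-sized blocks can still interpret it.
 */
typedef struct ByteScanRange
{
	uint64_t	start_offset;	/* byte offset to start at */
	uint64_t	nbytes;			/* number of bytes to scan */
} ByteScanRange;

/* Translate the byte offset into the AM's own block numbering. */
static uint32_t
range_start_block(const ByteScanRange *r, uint32_t am_block_size)
{
	return (uint32_t) (r->start_offset / am_block_size);
}

/* Number of AM-sized blocks the range covers, rounding up. */
static uint64_t
range_nblocks(const ByteScanRange *r, uint32_t am_block_size)
{
	return (r->nbytes + am_block_size - 1) / am_block_size;
}
```

The same byte range maps to different block counts for an 8 kB heap-style AM versus a hypothetical 16 kB AM, which is exactly the flexibility the review is asking for.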
Andres, now that the TupleTableSlot changes are committed, do you want me
to share the rebased pluggable storage patch, or are you already working on it?
Regards,
Haribabu Kommi
Fujitsu Australia
Hi,
FWIW, now that oids are removed, and the tuple table slot abstraction
got in, I'm working on rebasing the pluggable storage patchset ontop of
that.
On 2018-11-27 12:48:36 +1100, Haribabu Kommi wrote:
On Thu, Nov 22, 2018 at 1:12 PM Asim R P <apraveen@pivotal.io> wrote:
Ashwin (copied) and I got a chance to go through the latest code from
Andres' github repository. We would like to share some
comments/questions:
Thanks for the review.
The TupleTableSlot argument is well suited for row-oriented storage.
For a column-oriented storage engine, a projection list indicating the
columns to be scanned may be necessary. Is it possible to share this
information with current interface?
Currently all the interfaces are designed for row-oriented storage, as you
said we need a new API for projection list. The current patch set itself
is big and it needs to stabilized and then in the next set of the patches,
those new API's will be added that will be useful for columnar storage.
Precisely.
TupleDelete_function() accepts changingPart as a parameter to indicate
if this deletion is part of a movement from one partition to another.
Partitioning is a higher level abstraction as compared to storage.
Ideally, storage layer should have no knowledge of partitioning. The
tuple delete API should not accept any parameter related to
partitioning.
Thanks for your point, will look into it in how to change extract it.
I don't think that's actually a problem. The changingPart parameter is
just a marker that the deletion is part of moving a tuple across
partitions. For heap and everything compatible, that's used to include
information in the tuple so that concurrent modifications error out when
reaching such a tuple via EPQ.
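Andres's explanation can be sketched as a toy model: the AM's delete callback merely records the marker, and it is the EPQ-style chain-following code, not the storage layer, that acts on it. Everything below (ToyTuple, toy_tuple_delete, toy_follow_update_chain) is illustrative, not the real heapam code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a stored tuple's status bits. */
typedef struct ToyTuple
{
	bool		deleted;
	bool		moved_partitions;	/* analogue of the changingPart marker */
} ToyTuple;

/*
 * Toy delete callback: changing_part is just recorded on the tuple,
 * the AM itself knows nothing else about partitioning.
 */
static void
toy_tuple_delete(ToyTuple *tup, bool changing_part)
{
	tup->deleted = true;
	tup->moved_partitions = changing_part;
}

/*
 * What EvalPlanQual-style code would do when following the update
 * chain: -1 = error "tuple was moved to another partition",
 * 0 = plainly deleted, 1 = still live.
 */
static int
toy_follow_update_chain(const ToyTuple *tup)
{
	if (tup->moved_partitions)
		return -1;
	return tup->deleted ? 0 : 1;
}
```

The point of the sketch is that the marker is opaque to the storage layer; only the executor's concurrency handling interprets it, which is why the parameter arguably doesn't leak partitioning knowledge into the AM.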
Andres, as the tupletableslot changes are committed, do you want me to
share the rebased pluggable storage patch? you already working on it?
Working on it.
Greetings,
Andres Freund
Hi,
On 2018/11/02 9:17, Haribabu Kommi wrote:
Here I attached the cumulative fixes of the patches, new API additions for
zheap and
basic outline of the documentation.
I've read the documentation patch while also looking at the code and here
are some comments.
+ Each table is stored as its own physical
<firstterm>relation</firstterm> and so
+ is described by an entry in the <structname>pg_class</structname> catalog.
I think the "so" in "and so is described by an entry in..." is not necessary.
+ The contents of an table are entirely under the control of its access
method.
"a" table
+ (All the access methods furthermore use the standard page layout
described in
+ <xref linkend="storage-page-layout"/>.)
Maybe write the two sentences above as:
A table's content is entirely controlled by its access method, although
all access methods use the same standard page layout described in <xref
linkend="storage-page-layout"/>.
+ SlotCallbacks_function slot_callbacks;
+
+ SnapshotSatisfies_function snapshot_satisfies;
+ SnapshotSatisfiesUpdate_function snapshot_satisfiesUpdate;
+ SnapshotSatisfiesVacuum_function snapshot_satisfiesVacuum;
Like other functions, how about a one sentence comment for these, like:
/*
* Function to get an AM-specific set of functions for manipulating
* TupleTableSlots
*/
SlotCallbacks_function slot_callbacks;
/* AM-specific snapshot visibility determination functions */
SnapshotSatisfies_function snapshot_satisfies;
SnapshotSatisfiesUpdate_function snapshot_satisfiesUpdate;
SnapshotSatisfiesVacuum_function snapshot_satisfiesVacuum;
+ TupleFetchFollow_function tuple_fetch_follow;
+
+ GetTupleData_function get_tuple_data;
How about removing the empty line so that get_tuple_data can be seen as
part of the group /* Operations on physical tuples */
+ RelationVacuum_function relation_vacuum;
+ RelationScanAnalyzeNextBlock_function scan_analyze_next_block;
+ RelationScanAnalyzeNextTuple_function scan_analyze_next_tuple;
+ RelationCopyForCluster_function relation_copy_for_cluster;
+ RelationSync_function relation_sync;
Add /* Operations to support VACUUM/ANALYZE */ as a description for this
group?
+ BitmapPagescan_function scan_bitmap_pagescan;
+ BitmapPagescanNext_function scan_bitmap_pagescan_next;
Add /* Operations to support bitmap scans */ as a description for this group?
+ SampleScanNextBlock_function scan_sample_next_block;
+ SampleScanNextTuple_function scan_sample_next_tuple;
Add /* Operations to support sampling scans */ as a description for this
group?
+ ScanEnd_function scan_end;
+ ScanRescan_function scan_rescan;
+ ScanUpdateSnapshot_function scan_update_snapshot;
Move these two to be in the /* Operations on relation scans */ group?
+ BeginIndexFetchTable_function begin_index_fetch;
+ EndIndexFetchTable_function reset_index_fetch;
+ EndIndexFetchTable_function end_index_fetch;
Add /* Operations to support index scans */ as a description for this group?
+ IndexBuildRangeScan_function index_build_range_scan;
+ IndexValidateScan_function index_validate_scan;
Add /* Operations to support index build */ as a description for this group?
+ CreateInitFork_function CreateInitFork;
Add /* Function to create an init fork for unlogged tables */?
By the way, I can see the following two in the source code, but not in the
documentation.
EstimateRelSize_function EstimateRelSize;
SetNewFileNode_function SetNewFileNode;
+ The table construction and maintenance functions that an table access
+ method must provide in <structname>TableAmRoutine</structname> are:
"a" table access method
+ <para>
+<programlisting>
+TupleTableSlotOps *
+slot_callbacks (Relation relation);
+</programlisting>
+ API to access the slot specific methods;
+ Following methods are available;
+ <structname>TTSOpsVirtual</structname>,
+ <structname>TTSOpsHeapTuple</structname>,
+ <structname>TTSOpsMinimalTuple</structname>,
+ <structname>TTSOpsBufferTuple</structname>,
+ </para>
Unless I'm misunderstanding what the TupleTableSlotOps abstraction is or
its relations to the TableAmRoutine abstraction, I think the text
description could better be written as:
"API to get the slot operations struct for a given table access method"
It's not clear to me why the various TTSOps* structs are listed here. Is the
point that different AMs may choose one of the listed alternatives? For
example, I see that heap AM implementation returns TTOpsBufferTuple, so it
manipulates slots containing buffered tuples, right? Other AMs are free
to return any one of these? For example, some AMs may never use buffer
manager and hence not use TTOpsBufferTuple. Is that understanding correct?
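The understanding questioned above can be illustrated with a small self-contained sketch of the vtable choice: each AM's slot_callbacks() simply returns the slot-ops variant it wants the executor to use. The structs and function names here are stand-ins, not the real TTSOps* definitions from tuptable.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for TupleTableSlotOps. */
typedef struct SlotOps
{
	const char *name;
	bool		uses_buffer_pins;
} SlotOps;

/* Stand-ins for the TTSOps* variants mentioned in the docs. */
static const SlotOps TTSOpsVirtualStub = { "virtual", false };
static const SlotOps TTSOpsBufferTupleStub = { "buffer", true };

/*
 * A heap-like AM reads tuples through shared buffers, so it hands the
 * executor the buffered-tuple slot ops (as heapam returns
 * TTSOpsBufferTuple).
 */
static const SlotOps *
heaplike_slot_callbacks(void)
{
	return &TTSOpsBufferTupleStub;
}

/*
 * A hypothetical AM that never touches the buffer manager could pick
 * virtual slots instead.
 */
static const SlotOps *
inmemory_slot_callbacks(void)
{
	return &TTSOpsVirtualStub;
}
```

If this reading is right, the executor only ever sees the ops struct the AM returned, so each AM is indeed free to pick whichever variant matches how it materializes tuples.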
+ <para>
+<programlisting>
+bool
+snapshot_satisfies (TupleTableSlot *slot, Snapshot snapshot);
+</programlisting>
+ API to check whether the provided slot is visible to the current
+ transaction according the snapshot.
+ </para>
Do you mean:
"API to check whether the tuple contained in the provided slot is
visible...."?
+ <para>
+<programlisting>
+Oid
+tuple_insert (Relation rel, TupleTableSlot *slot, CommandId cid,
+ int options, BulkInsertState bistate);
+</programlisting>
+ API to insert the tuple and provide the <literal>ItemPointerData</literal>
+ where the tuple is successfully inserted.
+ </para>
It's not clear from the signature where you get the ItemPointerData.
Looking at heapam_tuple_insert which puts it in slot->tts_tid, I think
this should mention it a bit differently, like:
API to insert the tuple contained in the provided slot and return its TID,
that is, the location where the tuple is successfully inserted
+ API to insert the tuple with a speculative token. This API is similar
+ like <literal>tuple_insert</literal>, with additional speculative
+ information.
How about:
This API is similar to <literal>tuple_insert</literal>, although with
additional information necessary for speculative insertion
+ <para>
+<programlisting>
+void
+tuple_complete_speculative (Relation rel,
+ TupleTableSlot *slot,
+ uint32 specToken,
+ bool succeeded);
+</programlisting>
+ API to complete the state of the tuple inserted by the API
<literal>tuple_insert_speculative</literal>
+ with the successful completion of the index insert.
+ </para>
How about:
API to complete the speculative insertion of a tuple started by
<literal>tuple_insert_speculative</literal>, invoked after finishing the
index insert
+ <para>
+<programlisting>
+bool
+tuple_fetch_row_version (Relation relation,
+ ItemPointer tid,
+ Snapshot snapshot,
+ TupleTableSlot *slot,
+ Relation stats_relation);
+</programlisting>
+ API to fetch and store the Buffered Heap tuple in the provided slot
+ based on the ItemPointer.
+ </para>
It seems that this description is based on what heapam_fetch_row_version()
does, but it should be more generic, maybe like:
API to fetch a buffered tuple given its TID and store it in the provided slot
+ <para>
+<programlisting>
+HTSU_Result
+TupleLock_function (Relation relation,
+ ItemPointer tid,
+ Snapshot snapshot,
+ TupleTableSlot *slot,
+ CommandId cid,
+ LockTupleMode mode,
+ LockWaitPolicy wait_policy,
+ uint8 flags,
+ HeapUpdateFailureData *hufd);
I guess you meant to write "tuple_lock" here, not "TupleLock_function".
+</programlisting>
+ API to lock the specified the ItemPointer tuple and fetches the newest
version of
+ its tuple and TID.
+ </para>
How about:
API to lock the specified tuple and return the TID of its newest version
+ <para>
+<programlisting>
+void
+tuple_get_latest_tid (Relation relation,
+ Snapshot snapshot,
+ ItemPointer tid);
+</programlisting>
+ API to get the the latest TID of the tuple with the given itempointer.
+ </para>
How about:
API to get TID of the latest version of the specified tuple
+ <para>
+<programlisting>
+bool
+tuple_fetch_follow (struct IndexFetchTableData *scan,
+ ItemPointer tid,
+ Snapshot snapshot,
+ TupleTableSlot *slot,
+ bool *call_again, bool *all_dead);
+</programlisting>
+ API to get the all the tuples of the page that satisfies itempointer.
+ </para>
IIUC, "all the tuples of the page" in the above sentence means all the
tuples in the HOT chain of a given heap tuple, making this description of
the API slightly specific to the heap AM. Can we make the description
more generic or is the API itself very specific that it cannot be
expressed in generic terms? Ignoring that for a moment, I think the
sentence contains more "the"s than there need to be, so maybe write as:
API to get all tuples on a given page that are linked to the tuple of the
given TID
+ <para>
+<programlisting>
+tuple_data
+get_tuple_data (TupleTableSlot *slot, tuple_data_flags flags);
+</programlisting>
+ API to return the internal structure members of the HeapTuple.
+ </para>
I think this description doesn't mention enough details of both the
information that needs to be specified when calling the function (what's
in flags) and the information that's returned.
+ <para>
+<programlisting>
+bool
+scan_analyze_next_tuple (TableScanDesc scan, TransactionId OldestXmin,
+ double *liverows, double *deadrows, TupleTableSlot
*slot));
+</programlisting>
+ API to analyze the block and fill the buffered heap tuple in the slot
and also
+ provide the live and dead rows.
+ </para>
How about:
API to get the next tuple from the block being scanned, which also updates
the number of live and dead rows encountered
+ <para>
+<programlisting>
+void
+relation_copy_for_cluster (Relation NewHeap, Relation OldHeap, Relation
OldIndex,
+ bool use_sort,
+ TransactionId OldestXmin, TransactionId FreezeXid,
MultiXactId MultiXactCutoff,
+ double *num_tuples, double *tups_vacuumed, double
*tups_recently_dead);
+</programlisting>
+ API to copy one relation to another relation eith using the Index or
table scan.
+ </para>
Typo: eith -> either
But maybe, rewrite this as:
API to make a copy of the content of a relation, optionally sorted using
either the specified index or by sorting explicitly
+ <para>
+<programlisting>
+TableScanDesc
+scan_begin (Relation relation,
+ Snapshot snapshot,
+ int nkeys, ScanKey key,
+ ParallelTableScanDesc parallel_scan,
+ bool allow_strat,
+ bool allow_sync,
+ bool allow_pagemode,
+ bool is_bitmapscan,
+ bool is_samplescan,
+ bool temp_snap);
+</programlisting>
+ API to start the relation scan for the provided relation and returns the
+ <structname>TableScanDesc</structname> structure.
+ </para>
How about:
API to start a scan of a relation using specified options, which returns
the <structname>TableScanDesc</structname> structure to be used for
subsequent scan operations
+ <para>
+<programlisting>
+void
+scansetlimits (TableScanDesc sscan, BlockNumber startBlk, BlockNumber
numBlks);
+</programlisting>
+ API to fix the relation scan range limits.
+ </para>
How about:
API to set scan range endpoints
+ <para>
+<programlisting>
+bool
+scan_bitmap_pagescan (TableScanDesc scan,
+ TBMIterateResult *tbmres);
+</programlisting>
+ API to scan the relation and fill the scan description bitmap with
valid item pointers
+ for the specified block.
+ </para>
This says "to scan the relation", but seems to be concerned with only a
page worth of data as the name also says. Also, it's not clear what "scan
description bitmap" means. Maybe write as:
API to scan the relation block specified in the scan descriptor to collect
and return the tuples requested by the given bitmap
+ <para>
+<programlisting>
+bool
+scan_bitmap_pagescan_next (TableScanDesc scan,
+ TupleTableSlot *slot);
+</programlisting>
+ API to fill the buffered heap tuple data from the bitmap scanned item
pointers and store
+ it in the provided slot.
+ </para>
How about:
API to select the next tuple from the set of tuples of a given page
specified in the scan descriptor and return in the provided slot; returns
false if no more tuples to return on the given page
+ <para>
+<programlisting>
+bool
+scan_sample_next_block (TableScanDesc scan, struct SampleScanState
*scanstate);
+</programlisting>
+ API to scan the relation and fill the scan description bitmap with
valid item pointers
+ for the specified block provided by the sample method.
+ </para>
Looking at the code, this API selects the next block using the sampling
method and nothing more, although I see that the heap AM implementation
also does heapgetpage thus collecting live tuples in the array known only
to heap AM. So, how about:
API to select the next block of the relation using the given sampling
method and set its information in the scan descriptor
+ <para>
+<programlisting>
+bool
+scan_sample_next_tuple (TableScanDesc scan, struct SampleScanState
*scanstate, TupleTableSlot *slot);
+</programlisting>
+ API to fill the buffered heap tuple data from the bitmap scanned item
pointers based on the sample
+ method and store it in the provided slot.
+ </para>
How about:
API to select the next tuple using the given sampling method from the set
of tuples collected from the block previously selected by the sampling method
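The block/tuple pairing of these two callbacks is a two-level iteration protocol: the outer call advances to the next sampled block, the inner call yields tuples from that block until it is exhausted. A minimal standalone sketch (all names invented, no real sampling logic):

```c
#include <stdbool.h>

/* Illustrative stand-in for the sample-scan state carried between calls. */
typedef struct ToySampleScan
{
	int		block;				/* current block, -1 before the scan starts */
	int		nblocks;
	int		offset;				/* next tuple within the current block */
	int		tuples_per_block;
} ToySampleScan;

/* Like scan_sample_next_block(): advance to the next block, or report
 * that the scan is finished.  A real sampler might skip blocks here. */
bool
toy_sample_next_block(ToySampleScan *scan)
{
	if (scan->block + 1 >= scan->nblocks)
		return false;
	scan->block++;
	scan->offset = 0;
	return true;
}

/* Like scan_sample_next_tuple(): yield tuples from the current block;
 * returning false tells the caller to pick the next block. */
bool
toy_sample_next_tuple(ToySampleScan *scan, int *tuple_id)
{
	if (scan->offset >= scan->tuples_per_block)
		return false;
	*tuple_id = scan->block * scan->tuples_per_block + scan->offset++;
	return true;
}
```

The executor drives exactly this nested loop, which is why the per-block tuple collection can stay private to the AM.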
+ <para>
+<programlisting>
+void
+scan_rescan (TableScanDesc scan, ScanKey key, bool set_params,
+ bool allow_strat, bool allow_sync, bool allow_pagemode);
+</programlisting>
+ API to restart the relation scan with provided data.
+ </para>
How about:
API to restart the given scan using provided options, releasing any
resources (such as buffer pins) already held by the scan
+ <para>
+<programlisting>
+void
+scan_update_snapshot (TableScanDesc scan, Snapshot snapshot);
+</programlisting>
+ API to update the relation scan with the new snapshot.
+ </para>
How about:
API to set the visibility snapshot to be used by a given scan
+ <para>
+<programlisting>
+IndexFetchTableData *
+begin_index_fetch (Relation relation);
+</programlisting>
+ API to prepare the <structname>IndexFetchTableData</structname> for
the relation.
+ </para>
This API is a bit vague. As in, it's not clear from the name when it's to
be called and what's to be done with the returned struct. How about at
least adding more details about what the returned struct is for, like:

API to get the <structname>IndexFetchTableData</structname> to be assigned
to an index scan on the specified relation
+ <para>
+<programlisting>
+void
+reset_index_fetch (struct IndexFetchTableData* data);
+</programlisting>
+ API to reset the prepared internal members of the
<structname>IndexFetchTableData</structname>.
+ </para>
This description seems wrong if I look at the code. Its purpose seems to
be to reset the AM-specific members, such as releasing the buffer pin held
in xs_cbuf in the heap AM's case.
How about:
API to release AM-specific resources held by the
<structname>IndexFetchTableData</structname> of a given index scan
+ <para>
+<programlisting>
+void
+end_index_fetch (struct IndexFetchTableData* data);
+</programlisting>
+ API to clear and free the <structname>IndexFetchTableData</structname>.
+ </para>
Given above, how about:
API to release AM-specific resources held by the
<structname>IndexFetchTableData</structname> of a given index scan and
free the memory of <structname>IndexFetchTableData</structname> itself
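The begin/reset/end triple discussed above is a classic resource-lifecycle pattern: reset releases the AM-specific resource but keeps the struct alive for further fetches, while end does both. A minimal standalone sketch, with an integer "pin" standing in for an AM-specific resource such as a buffer pin (all names invented):

```c
#include <stdlib.h>

/* Illustrative fetch-state object with the same three-call lifecycle as
 * begin_index_fetch / reset_index_fetch / end_index_fetch. */
typedef struct ToyIndexFetch
{
	int		pin;				/* -1 means "no resource currently held" */
	int		resets;				/* how many times a resource was released */
} ToyIndexFetch;

ToyIndexFetch *
toy_begin_index_fetch(void)
{
	ToyIndexFetch *f = malloc(sizeof(ToyIndexFetch));

	f->pin = -1;
	f->resets = 0;
	return f;
}

/* Release the AM-specific resource, but keep the struct so the index
 * scan can keep fetching (e.g. ReleaseBuffer() in the heap AM's case). */
void
toy_reset_index_fetch(ToyIndexFetch *f)
{
	if (f->pin != -1)
	{
		f->pin = -1;
		f->resets++;
	}
}

/* Release resources and free the struct itself. */
void
toy_end_index_fetch(ToyIndexFetch *f)
{
	toy_reset_index_fetch(f);
	free(f);
}
```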
+ <para>
+<programlisting>
+double
+index_build_range_scan (Relation heapRelation,
+ Relation indexRelation,
+ IndexInfo *indexInfo,
+ bool allow_sync,
+ bool anyvisible,
+ BlockNumber start_blockno,
+ BlockNumber end_blockno,
+ IndexBuildCallback callback,
+ void *callback_state,
+ TableScanDesc scan);
+</programlisting>
+ API to perform the table scan with bounded range specified by the caller
+ and insert the satisfied records into the index using the provided
callback
+ function pointer.
+ </para>
This is a rather heavyweight API, and the above description lacks some
details. Also, isn't it a bit misleading to use the name end_blockno if it
is interpreted as num_blocks by the internal APIs?
How about:
API to scan the specified blocks of the given table and insert them into
the specified index using the provided callback function
+ <para>
+<programlisting>
+void
+index_validate_scan (Relation heapRelation,
+ Relation indexRelation,
+ IndexInfo *indexInfo,
+ Snapshot snapshot,
+ struct ValidateIndexState *state);
+</programlisting>
+ API to perform the table scan and insert the satisfied records into
the index.
+ This API is similar like <function>index_build_range_scan</function>.
This
+ is used in the scenario of concurrent index build.
+ </para>
This one's a complicated API too. How about:
API to scan the table according to the given snapshot and insert tuples
satisfying the snapshot into the specified index, provided their TIDs are
also present in the <structname>ValidateIndexState</structname> struct;
this API is used as the last phase of a concurrent index build
+ <sect2>
+ <title>Table scanning</title>
+
+ <para>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Table insert/update/delete</title>
+
+ <para>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Table locking</title>
+
+ <para>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Table vacuum</title>
+
+ <para>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Table fetch</title>
+
+ <para>
+ </para>
+ </sect2>
Seems like you forgot to put the individual API descriptions under these
sub-headers. Actually, I think it'd be better to try to format this page
to looks more like the following:
https://www.postgresql.org/docs/devel/fdw-callbacks.html
- Currently, only indexes have access methods. The requirements for index
- access methods are discussed in detail in <xref linkend="indexam"/>.
+ Currently, only <literal>INDEX</literal> and <literal>TABLE</literal> have
+ access methods. The requirements for access methods are discussed in
detail
+ in <xref linkend="am"/>.
Hmm, I don't see why you decided to add literal tags to INDEX and TABLE.
Couldn't this have been written as:
Currently, only tables and indexes have access methods. The requirements
for access methods are discussed in detail in <xref linkend="am"/>.
+ This variable specifies the default table access method using
which to create
+ objects (tables and materialized views) when a
<command>CREATE</command> command does
+ not explicitly specify a access method.
"variable" is not wrong, but "parameter" is used more often for GUCs. "a
access method" should be "an access method".
Maybe you could write this as:
This variable specifies the default table access method to use when
creating tables or materialized views if the <command>CREATE</command>
does not explicitly specify an access method.
+ If the value does not match the name of any existing table access
methods,
+ <productname>PostgreSQL</productname> will automatically use the
default
+ table access method of the current database.
any existing table methods -> any existing table method
Although, shouldn't that cause an error instead of silently falling back
to the database's default access method?
Thank you for working on this. Really looking forward to how this shapes
up. :)
Thanks,
Amit
On Fri, Nov 16, 2018 at 2:05 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
I tried running the pgbench performance tests with minimal clients on my
laptop and I didn't find any performance issues; maybe the issue is visible
only with higher client counts. Even with the perf tool, I am not able to
identify a clear problem function. As you said, combining all the changes
leads to some overhead.
Just out of curiosity I've also tried tpc-c from oltpbench (in the very same
simple environment), it doesn't show any significant difference from master as
well.
Here I attached the cumulative patches with further fixes and basic syntax regress tests also.
While testing the latest version I've noticed that you didn't include the
fix for HeapTupleInvisible (so I see the error again); was that intentional?
On Tue, Nov 27, 2018 at 2:55 AM Andres Freund <andres@anarazel.de> wrote:
FWIW, now that oids are removed, and the tuple table slot abstraction
got in, I'm working on rebasing the pluggable storage patchset ontop of
that.
Yes, please. I've tried it myself for reviewing purposes, but the rebasing
speed was not impressive. Also I want to suggest moving it off github and
making it a regular patchset, since it's already a bit confusing in the
sense of what goes where and which patch to apply on top of which branch.
Hi,
Thanks for these changes. I've merged a good chunk of them.
On 2018-11-16 12:05:26 +1100, Haribabu Kommi wrote:
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91f..3254e30a45 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1741,7 +1741,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 	{
 		HeapScanDesc scan = (HeapScanDesc) sscan;
 		Page		targpage;
-		OffsetNumber targoffset = scan->rs_cindex;
+		OffsetNumber targoffset;
 		OffsetNumber maxoffset;
 		BufferHeapTupleTableSlot *hslot;
@@ -1751,7 +1751,9 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 		maxoffset = PageGetMaxOffsetNumber(targpage);

 		/* Inner loop over all tuples on the selected page */
-		for (targoffset = scan->rs_cindex; targoffset <= maxoffset; targoffset++)
+		for (targoffset = scan->rs_cindex ? scan->rs_cindex : FirstOffsetNumber;
+			 targoffset <= maxoffset;
+			 targoffset++)
 		{
 			ItemId		itemid;
 			HeapTuple	targtuple = &hslot->base.tupdata;
I thought it was better to fix the initialization for rs_cindex - any
reason you didn't go for that?
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 8233475aa0..7bad246f55 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1838,8 +1838,10 @@ HeapTupleSatisfies(HeapTuple stup, Snapshot snapshot, Buffer buffer)
 		case NON_VACUUMABLE_VISIBILTY:
 			return HeapTupleSatisfiesNonVacuumable(stup, snapshot, buffer);
 			break;
-		default:
+		case END_OF_VISIBILITY:
 			Assert(0);
 			break;
 	}
+
+	return false;				/* keep compiler quiet */

I don't understand why END_OF_VISIBILITY is a good idea? I now removed
END_OF_VISIBILITY, and the default case.
@@ -593,6 +594,10 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 	if (myState->rel->rd_rel->relhasoids)
 		slot->tts_tupleOid = InvalidOid;

+	/* Materialize the slot */
+	if (!TTS_IS_VIRTUAL(slot))
+		ExecMaterializeSlot(slot);
+
 	table_insert(myState->rel, slot, myState->output_cid,
What's the point of adding materialization here?
@@ -570,6 +563,9 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 	Assert(TTS_IS_HEAPTUPLE(scanslot) ||
 		   TTS_IS_BUFFERTUPLE(scanslot));

+	if (hslot->tuple == NULL)
+		ExecMaterializeSlot(scanslot);
+
 	d = heap_getsysattr(hslot->tuple, attnum,
 						scanslot->tts_tupleDescriptor,
 						op->resnull);
Same?
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e055c0a7c6..34ef86a5bd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2594,7 +2594,7 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
 	 * datums that may be present in copyTuple).  As with the next step, this
 	 * is to guard against early re-use of the EPQ query.
 	 */
-	if (!TupIsNull(slot))
+	if (!TupIsNull(slot) && !TTS_IS_VIRTUAL(slot))
 		ExecMaterializeSlot(slot);
Same?
#if FIXME
@@ -2787,16 +2787,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 			if (isNull)
 				continue;

-			elog(ERROR, "frak, need to implement ROW_MARK_COPY");
-#ifdef FIXME
-			// FIXME: this should just deform the tuple and store it as a
-			// virtual one.
-			tuple = table_tuple_by_datum(erm->relation, datum, erm->relid);
-
-			/* store tuple */
-			EvalPlanQualSetTuple(epqstate, erm->rti, tuple);
-#endif
-
+			ExecForceStoreHeapTupleDatum(datum, slot);
 		}
 	}
 }
Cool.
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 56880e3d16..36ca07beb2 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -224,6 +224,18 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			BitmapAdjustPrefetchIterator(node, tbmres);

+			/*
+			 * Ignore any claimed entries past what we think is the end of the
+			 * relation.  (This is probably not necessary given that we got at
+			 * least AccessShareLock on the table before performing any of the
+			 * indexscans, but let's be safe.)
+			 */
+			if (tbmres->blockno >= scan->rs_nblocks)
+			{
+				node->tbmres = tbmres = NULL;
+				continue;
+			}
+
I moved this into the storage engine, there just was a minor bug
preventing the already existing check from taking effect. I don't think
we should expose this kind of thing to the outside of the storage
engine.
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 54382aba88..ea48e1d6e8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4037,7 +4037,6 @@ CreateStatsStmt:
  *
  *****************************************************************************/

-// PBORKED: storage option
CreateAsStmt:
CREATE OptTemp TABLE create_as_target AS SelectStmt opt_with_data
{
@@ -4068,14 +4067,16 @@ CreateAsStmt:
 		;

 create_as_target:
-			qualified_name opt_column_list OptWith OnCommitOption OptTableSpace
+			qualified_name opt_column_list table_access_method_clause
+			OptWith OnCommitOption OptTableSpace
 				{
 					$$ = makeNode(IntoClause);
 					$$->rel = $1;
 					$$->colNames = $2;
-					$$->options = $3;
-					$$->onCommit = $4;
-					$$->tableSpaceName = $5;
+					$$->accessMethod = $3;
+					$$->options = $4;
+					$$->onCommit = $5;
+					$$->tableSpaceName = $6;
 					$$->viewQuery = NULL;
 					$$->skipData = false;	/* might get changed later */
 				}
@@ -4125,14 +4126,15 @@ CreateMatViewStmt:
 		;

 create_mv_target:
-			qualified_name opt_column_list opt_reloptions OptTableSpace
+			qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
 				{
 					$$ = makeNode(IntoClause);
 					$$->rel = $1;
 					$$->colNames = $2;
-					$$->options = $3;
+					$$->accessMethod = $3;
+					$$->options = $4;
 					$$->onCommit = ONCOMMIT_NOOP;
-					$$->tableSpaceName = $4;
+					$$->tableSpaceName = $5;
 					$$->viewQuery = NULL;	/* filled at analysis time */
 					$$->skipData = false;	/* might get changed later */
 				}
Cool. I wonder if we should also somehow support SELECT INTO w/ USING?
You've apparently started to do so with?
diff --git a/src/test/regress/expected/create_am.out b/src/test/regress/expected/create_am.out
index 47dd885c4e..a4094ca3f1 100644
--- a/src/test/regress/expected/create_am.out
+++ b/src/test/regress/expected/create_am.out
@@ -99,3 +99,81 @@ HINT:  Use DROP ... CASCADE to drop the dependent objects too.
 -- Drop access method cascade
 DROP ACCESS METHOD gist2 CASCADE;
 NOTICE:  drop cascades to index grect2ind2
+-- Create a heap2 table am handler with heapam handler
+CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;
+SELECT * FROM pg_am where amtype = 't';
+ amname |      amhandler       | amtype
+--------+----------------------+--------
+ heap   | heap_tableam_handler | t
+ heap2  | heap_tableam_handler | t
+(2 rows)
+
+CREATE TABLE tbl_heap2(f1 int, f2 char(100)) using heap2;
+INSERT INTO tbl_heap2 VALUES(generate_series(1,10), 'Test series');
+SELECT count(*) FROM tbl_heap2;
+ count
+-------
+    10
+(1 row)
+
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+   where a.oid = r.relam AND r.relname = 'tbl_heap2';
+  relname  | relkind | amname
+-----------+---------+--------
+ tbl_heap2 | r       | heap2
+(1 row)
+
+-- create table as using heap2
+CREATE TABLE tblas_heap2 using heap2 AS select * from tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+   where a.oid = r.relam AND r.relname = 'tblas_heap2';
+   relname   | relkind | amname
+-------------+---------+--------
+ tblas_heap2 | r       | heap2
+(1 row)
+
+--
+-- select into doesn't support new syntax, so it should be
+-- default access method.
+--
+SELECT INTO tblselectinto_heap from tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+   where a.oid = r.relam AND r.relname = 'tblselectinto_heap';
+      relname       | relkind | amname
+--------------------+---------+--------
+ tblselectinto_heap | r       | heap
+(1 row)
+
+DROP TABLE tblselectinto_heap;
+-- create materialized view using heap2
+CREATE MATERIALIZED VIEW mv_heap2 USING heap2 AS
+   SELECT * FROM tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+   where a.oid = r.relam AND r.relname = 'mv_heap2';
+ relname  | relkind | amname
+----------+---------+--------
+ mv_heap2 | m       | heap2
+(1 row)
+
+-- Try creating the unsupported relation kinds with using syntax
+CREATE VIEW test_view USING heap2 AS SELECT * FROM tbl_heap2;
+ERROR:  syntax error at or near "USING"
+LINE 1: CREATE VIEW test_view USING heap2 AS SELECT * FROM tbl_heap2...
+                              ^
+CREATE SEQUENCE test_seq USING heap2;
+ERROR:  syntax error at or near "USING"
+LINE 1: CREATE SEQUENCE test_seq USING heap2;
+                                 ^
+-- Drop table access method, but fails as objects depends on it
+DROP ACCESS METHOD heap2;
+ERROR:  cannot drop access method heap2 because other objects depend on it
+DETAIL:  table tbl_heap2 depends on access method heap2
+table tblas_heap2 depends on access method heap2
+materialized view mv_heap2 depends on access method heap2
+HINT:  Use DROP ... CASCADE to drop the dependent objects too.
+-- Drop table access method with cascade
+DROP ACCESS METHOD heap2 CASCADE;
+NOTICE:  drop cascades to 3 other objects
+DETAIL:  drop cascades to table tbl_heap2
+drop cascades to table tblas_heap2
+drop cascades to materialized view mv_heap2
diff --git a/src/test/regress/sql/create_am.sql b/src/test/regress/sql/create_am.sql
index 3e0ac104f3..0472a60f20 100644
--- a/src/test/regress/sql/create_am.sql
+++ b/src/test/regress/sql/create_am.sql
@@ -66,3 +66,49 @@ DROP ACCESS METHOD gist2;
 -- Drop access method cascade
 DROP ACCESS METHOD gist2 CASCADE;
+
+-- Create a heap2 table am handler with heapam handler
+CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;
+
+SELECT * FROM pg_am where amtype = 't';
+
+CREATE TABLE tbl_heap2(f1 int, f2 char(100)) using heap2;
+INSERT INTO tbl_heap2 VALUES(generate_series(1,10), 'Test series');
+SELECT count(*) FROM tbl_heap2;
+
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+   where a.oid = r.relam AND r.relname = 'tbl_heap2';
+
+-- create table as using heap2
+CREATE TABLE tblas_heap2 using heap2 AS select * from tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+   where a.oid = r.relam AND r.relname = 'tblas_heap2';
+
+--
+-- select into doesn't support new syntax, so it should be
+-- default access method.
+--
+SELECT INTO tblselectinto_heap from tbl_heap2;
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+   where a.oid = r.relam AND r.relname = 'tblselectinto_heap';
+
+DROP TABLE tblselectinto_heap;
+
+-- create materialized view using heap2
+CREATE MATERIALIZED VIEW mv_heap2 USING heap2 AS
+   SELECT * FROM tbl_heap2;
+
+SELECT r.relname, r.relkind, a.amname from pg_class as r, pg_am as a
+   where a.oid = r.relam AND r.relname = 'mv_heap2';
+
+-- Try creating the unsupported relation kinds with using syntax
+CREATE VIEW test_view USING heap2 AS SELECT * FROM tbl_heap2;
+
+CREATE SEQUENCE test_seq USING heap2;
+
+-- Drop table access method, but fails as objects depends on it
+DROP ACCESS METHOD heap2;
+
+-- Drop table access method with cascade
+DROP ACCESS METHOD heap2 CASCADE;
--
2.18.0.windows.1
Nice!
Greetings,
Andres Freund
Hi,
On 2018-11-26 17:55:57 -0800, Andres Freund wrote:
FWIW, now that oids are removed, and the tuple table slot abstraction
got in, I'm working on rebasing the pluggable storage patchset ontop of
that.
I've pushed a version of that to the git tree, including a rebased
version of zheap:
https://github.com/anarazel/postgres-pluggable-storage
https://github.com/anarazel/postgres-pluggable-zheap
I'm still working on moving some of the out-of-access/zheap
modifications into pluggable storage (see e.g. the first commit of the
pluggable-zheap series). But this should allow others to start on a more
recent codebasis.
My next steps are:
- make relation creation properly pluggable
- remove the typedefs from tableam.h, instead move them into the
TableAmRoutine struct.
- Move rs_{nblocks, startblock, numblocks} out of TableScanDescData
- Move HeapScanDesc and IndexFetchHeapData out of relscan.h
- See if the slot in SysScanDescData can be avoided, it's not exactly
free of overhead.
- remove ExecSlotCompare(), it's entirely unrelated to these changes imo
(and in the wrong place)
- rename HeapUpdateFailureData et al to not reference Heap
- split pluggable storage patchset, to commit earlier:
- EvalPlanQual slotification
- trigger slotification
- split of IndexBuildHeapScan out of index.c
I'm wondering whether we should add
table_beginscan/table_getnextslot/index_getnext_slot using the old API
in an earlier commit that then could be committed separately, allowing
the tablecmd.c changes to be committed soon.
I'm wondering whether we should change the table_beginscan* API so it
provides a slot - pretty much every caller has to do so, and it seems
just as easy to create/dispose via table_beginscan/endscan.
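To make Andres's suggestion concrete for readers, here is a hypothetical sketch of that calling convention: the scan owns its slot, so callers never create or dispose one themselves. All names and structures here are invented for illustration, not the proposed PostgreSQL API.

```c
#include <stdlib.h>
#include <stdbool.h>

/* Invented slot/scan types; the slot lives inside the scan, so
 * toy_beginscan creates it and toy_endscan disposes of it. */
typedef struct ToySlot
{
	int		value;
	bool	empty;
} ToySlot;

typedef struct ToyScan
{
	ToySlot	slot;				/* created by beginscan, gone after endscan */
	int		next;
	int		limit;
} ToyScan;

ToyScan *
toy_beginscan(int limit)
{
	ToyScan *scan = malloc(sizeof(ToyScan));

	scan->slot.empty = true;
	scan->next = 0;
	scan->limit = limit;
	return scan;
}

/* Return the scan-owned slot filled with the next tuple, or NULL. */
ToySlot *
toy_getnextslot(ToyScan *scan)
{
	if (scan->next >= scan->limit)
		return NULL;
	scan->slot.value = scan->next++;
	scan->slot.empty = false;
	return &scan->slot;
}

void
toy_endscan(ToyScan *scan)
{
	free(scan);					/* the slot goes away with the scan */
}
```

The trade-off is the one the paragraph above names: every caller currently allocates a slot anyway, so letting beginscan/endscan manage it removes boilerplate without losing flexibility.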
Further tasks I'm not yet planning to tackle, that I'd welcome help on:
- pg_dump support
- pg_upgrade testing
- I think we should consider removing HeapTuple->t_tableOid, it should
imo live entirely in the slot
Greetings,
Andres Freund
On Tue, Dec 11, 2018 at 3:13 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-11-26 17:55:57 -0800, Andres Freund wrote:
FWIW, now that oids are removed, and the tuple table slot abstraction
got in, I'm working on rebasing the pluggable storage patchset ontop of
that.

I've pushed a version of that to the git tree, including a rebased
version of zheap:
https://github.com/anarazel/postgres-pluggable-storage
https://github.com/anarazel/postgres-pluggable-zheap
Great, thanks!
As a side note, I assume the last reference should be this, right?
https://github.com/anarazel/postgres-pluggable-storage/tree/pluggable-zheap
Further tasks I'm not yet planning to tackle, that I'd welcome help on:
- pg_dump support
- pg_upgrade testing
- I think we should consider removing HeapTuple->t_tableOid, it should
imo live entirely in the slot
I would love to try help with pg_dump support.
Hello.
At Tue, 27 Nov 2018 14:58:35 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <080ce65e-7b96-adbf-1c8c-7c88d87eaeda@lab.ntt.co.jp>
+  <para>
+<programlisting>
+TupleTableSlotOps *
+slot_callbacks (Relation relation);
+</programlisting>
+   API to access the slot specific methods;
+   Following methods are available;
+   <structname>TTSOpsVirtual</structname>,
+   <structname>TTSOpsHeapTuple</structname>,
+   <structname>TTSOpsMinimalTuple</structname>,
+   <structname>TTSOpsBufferTuple</structname>,
+  </para>

Unless I'm misunderstanding what the TupleTableSlotOps abstraction is or
its relations to the TableAmRoutine abstraction, I think the text
description could better be written as:
"API to get the slot operations struct for a given table access method"
It's not clear to me why various TTSOps* structs are listed here? Is the
point that different AMs may choose one of the listed alternatives? For
example, I see that heap AM implementation returns TTOpsBufferTuple, so it
manipulates slots containing buffered tuples, right? Other AMs are free
to return any one of these? For example, some AMs may never use buffer
manager and hence not use TTOpsBufferTuple. Is that understanding correct?
Yeah, I'm not sure why it should not be a pointer to the struct itself but
a function. And the four structs don't seem relevant to table AMs. Perhaps
clear, getsomeattrs and so on should be listed instead.
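For readers following this exchange, the pattern under discussion is a vtable of per-slot-type callbacks, with the AM choosing which vtable its slots use. A minimal standalone sketch with invented names (the real TupleTableSlotOps has callbacks like clear and getsomeattrs, as mentioned above):

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative vtable: each slot type supplies its own callbacks. */
typedef struct ToySlotOps
{
	const char *name;
	void	  (*clear)(int *state);
} ToySlotOps;

static void
virtual_clear(int *state)
{
	*state = 0;					/* a virtual slot just drops its datums */
}

static void
buffer_clear(int *state)
{
	*state = -1;				/* a buffer slot would also unpin its buffer */
}

static const ToySlotOps ToyOpsVirtual = {"virtual", virtual_clear};
static const ToySlotOps ToyOpsBuffer = {"buffer", buffer_clear};

/* An AM-level callback in the spirit of slot_callbacks(): report which
 * ops struct this AM's slots should use. */
const ToySlotOps *
toy_am_slot_callbacks(bool uses_buffers)
{
	return uses_buffers ? &ToyOpsBuffer : &ToyOpsVirtual;
}
```

This also illustrates Kyotaro's point: the callback could just as well be a plain struct pointer in the AM routine, since it only selects a vtable.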
+  <para>
+<programlisting>
+Oid
+tuple_insert (Relation rel, TupleTableSlot *slot, CommandId cid,
+       int options, BulkInsertState bistate);
+</programlisting>
+   API to insert the tuple and provide the <literal>ItemPointerData</literal>
+   where the tuple is successfully inserted.
+  </para>

It's not clear from the signature where you get the ItemPointerData.
Looking at heapam_tuple_insert which puts it in slot->tts_tid, I think
this should mention it a bit differently, like:
API to insert the tuple contained in the provided slot and return its TID,
that is, the location where the tuple is successfully inserted
It is actually an OID, not a TID, in the current code. The TID is
handled internally.
+  <para>
+<programlisting>
+bool
+tuple_fetch_follow (struct IndexFetchTableData *scan,
+       ItemPointer tid,
+       Snapshot snapshot,
+       TupleTableSlot *slot,
+       bool *call_again, bool *all_dead);
+</programlisting>
+   API to get the all the tuples of the page that satisfies itempointer.
+  </para>

IIUC, "all the tuples of of the page" in the above sentence means all the
tuples in the HOT chain of a given heap tuple, making this description of
the API slightly specific to the heap AM. Can we make the description
more generic or is the API itself very specific that it cannot be
expressed in generic terms? Ignoring that for a moment, I think the
sentence contains more "the"s than there need to be, so maybe write as:
API to get all tuples on a given page that are linked to the tuple of the
given TID
Mmm. This is exposing MVCC matters to indexam. I suppose we
should refactor this API.
+  <para>
+<programlisting>
+tuple_data
+get_tuple_data (TupleTableSlot *slot, tuple_data_flags flags);
+</programlisting>
+   API to return the internal structure members of the HeapTuple.
+  </para>

I think this description doesn't mention enough details of both the
information that needs to be specified when calling the function (what's
in flags) and the information that's returned.
(I suppose it will be described in later sections.)
+  <para>
+<programlisting>
+bool
+scan_analyze_next_tuple (TableScanDesc scan, TransactionId OldestXmin,
+       double *liverows, double *deadrows, TupleTableSlot *slot));
+</programlisting>
+   API to analyze the block and fill the buffered heap tuple in the slot and also
+   provide the live and dead rows.
+  </para>

How about:
API to get the next tuple from the block being scanned, which also updates
the number of live and dead rows encountered
"live" and "dead" are MVCC terms. I suppose that we should stash
out the deadrows somwhere else. (But analyze code would need to
be modified if we do so.)
+<programlisting>
+void
+scansetlimits (TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlks);
+</programlisting>
+   API to fix the relation scan range limits.
+  </para>

How about:
API to set scan range endpoints
This sets the start point and the number of blocks. Just "API to set
scan range" would be sufficient, referring to the parameter list.
+  <para>
+<programlisting>
+bool
+scan_bitmap_pagescan (TableScanDesc scan,
+       TBMIterateResult *tbmres);
+</programlisting>
+   API to scan the relation and fill the scan description bitmap with valid item pointers
+   for the specified block.
+  </para>

This says "to scan the relation", but seems to be concerned with only a
page worth of data as the name also says. Also, it's not clear what "scan
description bitmap" means. Maybe write as:API to scan the relation block specified in the scan descriptor to collect
and return the tuples requested by the given bitmap
"API to collect the tuples in a page requested by the given
bitmpap scan result." something? I think detailed explanation
would be required apart from the one-line description. Anyway the
name TBMIterateResult doesn't seem proper to expose.
+  <para>
+<programlisting>
+bool
+scan_sample_next_block (TableScanDesc scan, struct SampleScanState *scanstate);
+</programlisting>
+   API to scan the relation and fill the scan description bitmap with valid item pointers
+   for the specified block provided by the sample method.
+  </para>

Looking at the code, this API selects the next block using the sampling
method and nothing more, although I see that the heap AM implementation
also does heapgetpage thus collecting live tuples in the array known only
to heap AM. So, how about:
API to select the next block of the relation using the given sampling
method and set its information in the scan descriptor
"block" and "page" seems randomly choosed here and there. I don't
mind that seen in the core but..
+  <para>
+<programlisting>
+bool
+scan_sample_next_tuple (TableScanDesc scan, struct SampleScanState *scanstate, TupleTableSlot *slot);
+</programlisting>
+   API to fill the buffered heap tuple data from the bitmap scanned item pointers based on the sample
+   method and store it in the provided slot.
+  </para>

How about:
API to select the next tuple using the given sampling method from the set
of tuples collected from the block previously selected by the sampling method
I'm not sure "from the set of tuples collected" is true. Just
"the state of sample scan" or something wouldn't be fine?
+  <para>
+<programlisting>
+void
+scan_rescan (TableScanDesc scan, ScanKey key, bool set_params,
+       bool allow_strat, bool allow_sync, bool allow_pagemode);
+</programlisting>
+   API to restart the relation scan with provided data.
+  </para>

How about:
API to restart the given scan using provided options, releasing any
resources (such as buffer pins) already held by the scan
It looks too detailed to me, but "with provided data" looks too
coarse..
+  <para>
+<programlisting>
+void
+scan_update_snapshot (TableScanDesc scan, Snapshot snapshot);
+</programlisting>
+   API to update the relation scan with the new snapshot.
+  </para>

How about:
API to set the visibility snapshot to be used by a given scan
If so, the function name should be "scan_set_snapshot". Anyway, the
current name reads like "the function to update a snapshot (itself)".
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello.
(in the next branch:)
At Tue, 27 Nov 2018 14:58:35 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <080ce65e-7b96-adbf-1c8c-7c88d87eaeda@lab.ntt.co.jp>
Thank you for working on this. Really looking forward to how this shapes
up. :)
+1.
I looked through the documentation part, as where I can do something.
am.html:
61.1. Overview of Index access methods
61.1.1. Basic API Structure for Indexes
61.1.2. Index Access Method Functions
61.1.3. Index Scanning
61.2. Overview of Table access methods
61.2.1. Table access method API
61.2.2. Table Access Method Functions
61.2.3. Table scanning
Aren't 61.1 and 61.2 better in the reverse order?
Is there a reason for the difference of the titles between 61.1.1
and 61.2.1? The contents are quite similar.
+ <sect2 id="table-api">
+ <title>Table access method API</title>
The member names of the index AM struct begin with "am", but the table AM
members don't have a unified prefix. That seems a bit inconsistent.
Perhaps we should rename some of the long and internal names..
+ <sect2 id="table-functions">
+ <title>Table Access Method Functions</title>
Table AM functions are far finer-grained than the index AM ones. I think
that AM developers need a more concrete description of what every API
function does, along with an explanation of the various previously-internal
structs.
I suppose that how the functions are used in core code paths will
be written in the following sections.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, Dec 11, 2018 at 12:47 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
Thanks for these changes. I've merged a good chunk of them.
Thanks.
On 2018-11-16 12:05:26 +1100, Haribabu Kommi wrote:
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c3960dc91f..3254e30a45 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1741,7 +1741,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 	{
 		HeapScanDesc scan = (HeapScanDesc) sscan;
 		Page		targpage;
-		OffsetNumber targoffset = scan->rs_cindex;
+		OffsetNumber targoffset;
 		OffsetNumber maxoffset;
 		BufferHeapTupleTableSlot *hslot;
@@ -1751,7 +1751,9 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
 		maxoffset = PageGetMaxOffsetNumber(targpage);

 		/* Inner loop over all tuples on the selected page */
-		for (targoffset = scan->rs_cindex; targoffset <= maxoffset; targoffset++)
+		for (targoffset = scan->rs_cindex ? scan->rs_cindex : FirstOffsetNumber;
+			 targoffset <= maxoffset;
+			 targoffset++)
 		{
 			ItemId		itemid;
 			HeapTuple	targtuple = &hslot->base.tupdata;

I thought it was better to fix the initialization for rs_cindex - any
reason you didn't go for that?
No specific reason. Thanks for the correction.
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 8233475aa0..7bad246f55 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1838,8 +1838,10 @@ HeapTupleSatisfies(HeapTuple stup, Snapshot snapshot, Buffer buffer)
 		case NON_VACUUMABLE_VISIBILTY:
 			return HeapTupleSatisfiesNonVacuumable(stup, snapshot, buffer);
 			break;
-		default:
+		case END_OF_VISIBILITY:
 			Assert(0);
 			break;
 	}
+
+	return false;	/* keep compiler quiet */

I don't understand why END_OF_VISIBILITY is good idea? I now removed
END_OF_VISIBILITY, and the default case.
OK.
@@ -593,6 +594,10 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 	if (myState->rel->rd_rel->relhasoids)
 		slot->tts_tupleOid = InvalidOid;

+	/* Materialize the slot */
+	if (!TTS_IS_VIRTUAL(slot))
+		ExecMaterializeSlot(slot);
+
 	table_insert(myState->rel, slot, myState->output_cid,

What's the point of adding materialization here?
In earlier testing I observed that the received slot is a buffer slot
pointing at the original tuple; when that tuple is inserted into the new
table, the transaction id changes and the tuple becomes invisible.
That is why I added the materialization here.
@@ -570,6 +563,9 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 	Assert(TTS_IS_HEAPTUPLE(scanslot) ||
 		   TTS_IS_BUFFERTUPLE(scanslot));

+	if (hslot->tuple == NULL)
+		ExecMaterializeSlot(scanslot);
+
 	d = heap_getsysattr(hslot->tuple, attnum, scanslot->tts_tupleDescriptor,
 						op->resnull);
Same?
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e055c0a7c6..34ef86a5bd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2594,7 +2594,7 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
 	 * datums that may be present in copyTuple). As with the next step, this
 	 * is to guard against early re-use of the EPQ query.
 	 */
-	if (!TupIsNull(slot))
+	if (!TupIsNull(slot) && !TTS_IS_VIRTUAL(slot))
 		ExecMaterializeSlot(slot);

Same?
Materializing a virtual tuple was throwing an error earlier; that is
why I added the check.
index 56880e3d16..36ca07beb2 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -224,6 +224,18 @@ BitmapHeapNext(BitmapHeapScanState *node)
 	BitmapAdjustPrefetchIterator(node, tbmres);

+	/*
+	 * Ignore any claimed entries past what we think is the end of the
+	 * relation. (This is probably not necessary given that we got at
+	 * least AccessShareLock on the table before performing any of the
+	 * indexscans, but let's be safe.)
+	 */
+	if (tbmres->blockno >= scan->rs_nblocks)
+	{
+		node->tbmres = tbmres = NULL;
+		continue;
+	}
+

I moved this into the storage engine, there just was a minor bug
preventing the already existing check from taking effect. I don't think
we should expose this kind of thing to the outside of the storage
engine.
OK.
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 54382aba88..ea48e1d6e8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4037,7 +4037,6 @@ CreateStatsStmt:
  *****************************************************************************/

-// PBORKED: storage option
 CreateAsStmt:
 		CREATE OptTemp TABLE create_as_target AS SelectStmt opt_with_data
 		{
@@ -4068,14 +4067,16 @@ CreateAsStmt:
 		;

create_as_target:
-			qualified_name opt_column_list OptWith OnCommitOption OptTableSpace
+			qualified_name opt_column_list table_access_method_clause
+			OptWith OnCommitOption OptTableSpace
				{
					$$ = makeNode(IntoClause);
					$$->rel = $1;
					$$->colNames = $2;
-					$$->options = $3;
-					$$->onCommit = $4;
-					$$->tableSpaceName = $5;
+					$$->accessMethod = $3;
+					$$->options = $4;
+					$$->onCommit = $5;
+					$$->tableSpaceName = $6;
					$$->viewQuery = NULL;
					$$->skipData = false;	/* might get changed later */
				}
@@ -4125,14 +4126,15 @@ CreateMatViewStmt:
 		;

create_mv_target:
-			qualified_name opt_column_list opt_reloptions OptTableSpace
+			qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
				{
					$$ = makeNode(IntoClause);
					$$->rel = $1;
					$$->colNames = $2;
-					$$->options = $3;
+					$$->accessMethod = $3;
+					$$->options = $4;
					$$->onCommit = ONCOMMIT_NOOP;
-					$$->tableSpaceName = $4;
+					$$->tableSpaceName = $5;
					$$->viewQuery = NULL;	/* filled at analysis time */
					$$->skipData = false;	/* might get changed later */
				}
Cool. I wonder if we should also somehow support SELECT INTO w/ USING?
You've apparently started to do so with?
I thought the same, but SELECT INTO is deprecated syntax; is it fine to
add the new syntax there?
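For reference, the grammar changes quoted above add a table_access_method_clause to create_as_target and create_mv_target. As a sketch (assuming the clause is spelled USING &lt;amname&gt;, matching the CREATE TABLE syntax elsewhere in the patch, and that an AM named "heap" is registered), this would accept statements like:

```sql
-- Sketch only: assumes table_access_method_clause is spelled
-- "USING <amname>" and that an access method named "heap" exists.
CREATE TABLE copy_of_src USING heap AS
    SELECT * FROM src;

-- Materialized views pick up the same clause via create_mv_target.
CREATE MATERIALIZED VIEW mv_of_src USING heap AS
    SELECT * FROM src;
```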
Regards,
Haribabu Kommi
Fujitsu Australia
On Tue, Dec 11, 2018 at 3:13 AM Andres Freund <andres@anarazel.de> wrote:
Further tasks I'm not yet planning to tackle, that I'd welcome help on:
- pg_dump support
- pg_upgrade testing
- I think we should consider removing HeapTuple->t_tableOid, it should
imo live entirely in the slot
I'm a bit confused, though: what kind of pg_dump support are you talking about?
After a quick glance I don't see any table access specific logic there so far.
To check it I've created a test access method (which is a copy of heap, but
with some small differences) and pg_dump worked as expected.
As a side note, the table description doesn't mention which access
method a table uses; it's probably useful to show that with \d+
(see the attached patch).
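The catalog join the patch adds to describe.c can also be issued by hand to look up a table's access method (a sketch; the table name is just one from the regression tests). The LEFT JOIN means relations without storage, such as foreign tables, come back with a NULL amname:

```sql
-- The same join \d+ would perform: pg_class.relam -> pg_am.amname.
SELECT c.relname, am.amname
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_am am ON (c.relam = am.oid)
WHERE c.relname = 'check_con_tbl';
```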
Attachments:
describe_am.patch (application/octet-stream)
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 0a181b01d9..a292c531b5 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1484,6 +1484,7 @@ describeOneTableDetails(const char *schemaname,
char *reloftype;
char relpersistence;
char relreplident;
+ char *relam;
} tableinfo;
bool show_column_details = false;
@@ -1503,9 +1504,10 @@ describeOneTableDetails(const char *schemaname,
"c.relhastriggers, c.relrowsecurity, c.relforcerowsecurity, "
"false AS relhasoids, %s, c.reltablespace, "
"CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, "
- "c.relpersistence, c.relreplident\n"
+ "c.relpersistence, c.relreplident, am.amname\n"
"FROM pg_catalog.pg_class c\n "
"LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
+ "LEFT JOIN pg_catalog.pg_am am ON (c.relam = am.oid)\n"
"WHERE c.oid = '%s';",
(verbose ?
"pg_catalog.array_to_string(c.reloptions || "
@@ -1656,6 +1658,8 @@ describeOneTableDetails(const char *schemaname,
*(PQgetvalue(res, 0, 11)) : 0;
tableinfo.relreplident = (pset.sversion >= 90400) ?
*(PQgetvalue(res, 0, 12)) : 'd';
+ tableinfo.relam = (pset.sversion >= 120000) ?
+ pg_strdup(PQgetvalue(res, 0, 13)) : NULL;
PQclear(res);
res = NULL;
@@ -3141,6 +3145,15 @@ describeOneTableDetails(const char *schemaname,
/* Tablespace info */
add_tablespace_footer(&cont, tableinfo.relkind, tableinfo.tablespace,
true);
+
+ /* Access method info */
+ if (pset.sversion >= 120000 && verbose)
+ {
+ printfPQExpBuffer(&buf, _("Access method: %s"), tableinfo.relam);
+ printTableAddFooter(&cont, buf.data);
+ }
+
+
}
/* reloptions, if verbose */
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 19bb538411..84d182303e 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -432,6 +432,7 @@ alter table check_con_tbl add check (check_con_function(check_con_tbl.*));
f1 | integer | | | | plain | |
Check constraints:
"check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+Access method: heap
copy check_con_tbl from stdin;
NOTICE: input = {"f1":1}
diff --git a/src/test/regress/expected/create_table.out b/src/test/regress/expected/create_table.out
index 7e52c27e3f..dbaa713e6b 100644
--- a/src/test/regress/expected/create_table.out
+++ b/src/test/regress/expected/create_table.out
@@ -438,6 +438,7 @@ Number of partitions: 0
b | text | | | | extended | |
Partition key: RANGE (((a + 1)), substr(b, 1, 5))
Number of partitions: 0
+Access method: heap
INSERT INTO partitioned2 VALUES (1, 'hello');
ERROR: no partition of relation "partitioned2" found for row
@@ -451,6 +452,7 @@ CREATE TABLE part2_1 PARTITION OF partitioned2 FOR VALUES FROM (-1, 'aaaaa') TO
b | text | | | | extended | |
Partition of: partitioned2 FOR VALUES FROM ('-1', 'aaaaa') TO (100, 'ccccc')
Partition constraint: (((a + 1) IS NOT NULL) AND (substr(b, 1, 5) IS NOT NULL) AND (((a + 1) > '-1'::integer) OR (((a + 1) = '-1'::integer) AND (substr(b, 1, 5) >= 'aaaaa'::text))) AND (((a + 1) < 100) OR (((a + 1) = 100) AND (substr(b, 1, 5) < 'ccccc'::text))))
+Access method: heap
DROP TABLE partitioned, partitioned2;
--
@@ -783,6 +785,7 @@ drop table parted_collate_must_match;
b | integer | | not null | 1 | plain | |
Partition of: parted FOR VALUES IN ('b')
Partition constraint: ((a IS NOT NULL) AND (a = 'b'::text))
+Access method: heap
-- Both partition bound and partition key in describe output
\d+ part_c
@@ -795,6 +798,7 @@ Partition of: parted FOR VALUES IN ('c')
Partition constraint: ((a IS NOT NULL) AND (a = 'c'::text))
Partition key: RANGE (b)
Partitions: part_c_1_10 FOR VALUES FROM (1) TO (10)
+Access method: heap
-- a level-2 partition's constraint will include the parent's expressions
\d+ part_c_1_10
@@ -805,6 +809,7 @@ Partitions: part_c_1_10 FOR VALUES FROM (1) TO (10)
b | integer | | not null | 0 | plain | |
Partition of: part_c FOR VALUES FROM (1) TO (10)
Partition constraint: ((a IS NOT NULL) AND (a = 'c'::text) AND (b IS NOT NULL) AND (b >= 1) AND (b < 10))
+Access method: heap
-- Show partition count in the parent's describe output
-- Tempted to include \d+ output listing partitions with bound info but
@@ -839,6 +844,7 @@ CREATE TABLE unbounded_range_part PARTITION OF range_parted4 FOR VALUES FROM (MI
c | integer | | | | plain | |
Partition of: range_parted4 FOR VALUES FROM (MINVALUE, MINVALUE, MINVALUE) TO (MAXVALUE, MAXVALUE, MAXVALUE)
Partition constraint: ((abs(a) IS NOT NULL) AND (abs(b) IS NOT NULL) AND (c IS NOT NULL))
+Access method: heap
DROP TABLE unbounded_range_part;
CREATE TABLE range_parted4_1 PARTITION OF range_parted4 FOR VALUES FROM (MINVALUE, MINVALUE, MINVALUE) TO (1, MAXVALUE, MAXVALUE);
@@ -851,6 +857,7 @@ CREATE TABLE range_parted4_1 PARTITION OF range_parted4 FOR VALUES FROM (MINVALU
c | integer | | | | plain | |
Partition of: range_parted4 FOR VALUES FROM (MINVALUE, MINVALUE, MINVALUE) TO (1, MAXVALUE, MAXVALUE)
Partition constraint: ((abs(a) IS NOT NULL) AND (abs(b) IS NOT NULL) AND (c IS NOT NULL) AND (abs(a) <= 1))
+Access method: heap
CREATE TABLE range_parted4_2 PARTITION OF range_parted4 FOR VALUES FROM (3, 4, 5) TO (6, 7, MAXVALUE);
\d+ range_parted4_2
@@ -862,6 +869,7 @@ CREATE TABLE range_parted4_2 PARTITION OF range_parted4 FOR VALUES FROM (3, 4, 5
c | integer | | | | plain | |
Partition of: range_parted4 FOR VALUES FROM (3, 4, 5) TO (6, 7, MAXVALUE)
Partition constraint: ((abs(a) IS NOT NULL) AND (abs(b) IS NOT NULL) AND (c IS NOT NULL) AND ((abs(a) > 3) OR ((abs(a) = 3) AND (abs(b) > 4)) OR ((abs(a) = 3) AND (abs(b) = 4) AND (c >= 5))) AND ((abs(a) < 6) OR ((abs(a) = 6) AND (abs(b) <= 7))))
+Access method: heap
CREATE TABLE range_parted4_3 PARTITION OF range_parted4 FOR VALUES FROM (6, 8, MINVALUE) TO (9, MAXVALUE, MAXVALUE);
\d+ range_parted4_3
@@ -873,6 +881,7 @@ CREATE TABLE range_parted4_3 PARTITION OF range_parted4 FOR VALUES FROM (6, 8, M
c | integer | | | | plain | |
Partition of: range_parted4 FOR VALUES FROM (6, 8, MINVALUE) TO (9, MAXVALUE, MAXVALUE)
Partition constraint: ((abs(a) IS NOT NULL) AND (abs(b) IS NOT NULL) AND (c IS NOT NULL) AND ((abs(a) > 6) OR ((abs(a) = 6) AND (abs(b) >= 8))) AND (abs(a) <= 9))
+Access method: heap
DROP TABLE range_parted4;
-- user-defined operator class in partition key
@@ -909,6 +918,7 @@ SELECT obj_description('parted_col_comment'::regclass);
b | text | | | | extended | |
Partition key: LIST (a)
Number of partitions: 0
+Access method: heap
DROP TABLE parted_col_comment;
-- list partitioning on array type column
@@ -921,6 +931,7 @@ CREATE TABLE arrlp12 PARTITION OF arrlp FOR VALUES IN ('{1}', '{2}');
a | integer[] | | | | extended | |
Partition of: arrlp FOR VALUES IN ('{1}', '{2}')
Partition constraint: ((a IS NOT NULL) AND ((a = '{1}'::integer[]) OR (a = '{2}'::integer[])))
+Access method: heap
DROP TABLE arrlp;
-- partition on boolean column
@@ -935,6 +946,7 @@ create table boolspart_f partition of boolspart for values in (false);
Partition key: LIST (a)
Partitions: boolspart_f FOR VALUES IN (false),
boolspart_t FOR VALUES IN (true)
+Access method: heap
drop table boolspart;
-- partitions mixing temporary and permanent relations
diff --git a/src/test/regress/expected/create_table_like.out b/src/test/regress/expected/create_table_like.out
index b582211270..951d876216 100644
--- a/src/test/regress/expected/create_table_like.out
+++ b/src/test/regress/expected/create_table_like.out
@@ -164,6 +164,7 @@ CREATE TABLE ctlt12_storage (LIKE ctlt1 INCLUDING STORAGE, LIKE ctlt2 INCLUDING
a | text | | not null | | main | |
b | text | | | | extended | |
c | text | | | | external | |
+Access method: heap
CREATE TABLE ctlt12_comments (LIKE ctlt1 INCLUDING COMMENTS, LIKE ctlt2 INCLUDING COMMENTS);
\d+ ctlt12_comments
@@ -173,6 +174,7 @@ CREATE TABLE ctlt12_comments (LIKE ctlt1 INCLUDING COMMENTS, LIKE ctlt2 INCLUDIN
a | text | | not null | | extended | | A
b | text | | | | extended | | B
c | text | | | | extended | | C
+Access method: heap
CREATE TABLE ctlt1_inh (LIKE ctlt1 INCLUDING CONSTRAINTS INCLUDING COMMENTS) INHERITS (ctlt1);
NOTICE: merging column "a" with inherited definition
@@ -187,6 +189,7 @@ NOTICE: merging constraint "ctlt1_a_check" with inherited definition
Check constraints:
"ctlt1_a_check" CHECK (length(a) > 2)
Inherits: ctlt1
+Access method: heap
SELECT description FROM pg_description, pg_constraint c WHERE classoid = 'pg_constraint'::regclass AND objoid = c.oid AND c.conrelid = 'ctlt1_inh'::regclass;
description
@@ -208,6 +211,7 @@ Check constraints:
"ctlt3_a_check" CHECK (length(a) < 5)
Inherits: ctlt1,
ctlt3
+Access method: heap
CREATE TABLE ctlt13_like (LIKE ctlt3 INCLUDING CONSTRAINTS INCLUDING COMMENTS INCLUDING STORAGE) INHERITS (ctlt1);
NOTICE: merging column "a" with inherited definition
@@ -222,6 +226,7 @@ Check constraints:
"ctlt1_a_check" CHECK (length(a) > 2)
"ctlt3_a_check" CHECK (length(a) < 5)
Inherits: ctlt1
+Access method: heap
SELECT description FROM pg_description, pg_constraint c WHERE classoid = 'pg_constraint'::regclass AND objoid = c.oid AND c.conrelid = 'ctlt13_like'::regclass;
description
@@ -244,6 +249,7 @@ Check constraints:
"ctlt1_a_check" CHECK (length(a) > 2)
Statistics objects:
"public"."ctlt_all_a_b_stat" (ndistinct, dependencies) ON a, b FROM ctlt_all
+Access method: heap
SELECT c.relname, objsubid, description FROM pg_description, pg_index i, pg_class c WHERE classoid = 'pg_class'::regclass AND objoid = i.indexrelid AND c.oid = i.indexrelid AND i.indrelid = 'ctlt_all'::regclass ORDER BY c.relname, objsubid;
relname | objsubid | description
diff --git a/src/test/regress/expected/domain.out b/src/test/regress/expected/domain.out
index 0b5a9041b0..976fd7446f 100644
--- a/src/test/regress/expected/domain.out
+++ b/src/test/regress/expected/domain.out
@@ -282,6 +282,7 @@ Rules:
silly AS
ON DELETE TO dcomptable DO INSTEAD UPDATE dcomptable SET d1.r = (dcomptable.d1).r - 1::double precision, d1.i = (dcomptable.d1).i + 1::double precision
WHERE (dcomptable.d1).i > 0::double precision
+Access method: heap
drop table dcomptable;
drop type comptype cascade;
@@ -419,6 +420,7 @@ Rules:
silly AS
ON DELETE TO dcomptable DO INSTEAD UPDATE dcomptable SET d1[1].r = dcomptable.d1[1].r - 1::double precision, d1[1].i = dcomptable.d1[1].i + 1::double precision
WHERE dcomptable.d1[1].i > 0::double precision
+Access method: heap
drop table dcomptable;
drop type comptype cascade;
diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out
index 4d82d3a7e8..94ab874d75 100644
--- a/src/test/regress/expected/foreign_data.out
+++ b/src/test/regress/expected/foreign_data.out
@@ -731,6 +731,7 @@ Check constraints:
"ft1_c3_check" CHECK (c3 >= '01-01-1994'::date AND c3 <= '01-31-1994'::date)
Server: s0
FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method:
\det+
List of foreign tables
@@ -800,6 +801,7 @@ Check constraints:
"ft1_c3_check" CHECK (c3 >= '01-01-1994'::date AND c3 <= '01-31-1994'::date)
Server: s0
FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method:
-- can't change the column type if it's used elsewhere
CREATE TABLE use_ft1_column_type (x ft1);
@@ -1339,6 +1341,7 @@ CREATE FOREIGN TABLE ft2 () INHERITS (fd_pt1)
c2 | text | | | | extended | |
c3 | date | | | | plain | |
Child tables: ft2
+Access method: heap
\d+ ft2
Foreign table "public.ft2"
@@ -1350,6 +1353,7 @@ Child tables: ft2
Server: s0
FDW options: (delimiter ',', quote '"', "be quoted" 'value')
Inherits: fd_pt1
+Access method:
DROP FOREIGN TABLE ft2;
\d+ fd_pt1
@@ -1359,6 +1363,7 @@ DROP FOREIGN TABLE ft2;
c1 | integer | | not null | | plain | |
c2 | text | | | | extended | |
c3 | date | | | | plain | |
+Access method: heap
CREATE FOREIGN TABLE ft2 (
c1 integer NOT NULL,
@@ -1374,6 +1379,7 @@ CREATE FOREIGN TABLE ft2 (
c3 | date | | | | | plain | |
Server: s0
FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method:
ALTER FOREIGN TABLE ft2 INHERIT fd_pt1;
\d+ fd_pt1
@@ -1384,6 +1390,7 @@ ALTER FOREIGN TABLE ft2 INHERIT fd_pt1;
c2 | text | | | | extended | |
c3 | date | | | | plain | |
Child tables: ft2
+Access method: heap
\d+ ft2
Foreign table "public.ft2"
@@ -1395,6 +1402,7 @@ Child tables: ft2
Server: s0
FDW options: (delimiter ',', quote '"', "be quoted" 'value')
Inherits: fd_pt1
+Access method:
CREATE TABLE ct3() INHERITS(ft2);
CREATE FOREIGN TABLE ft3 (
@@ -1418,6 +1426,7 @@ FDW options: (delimiter ',', quote '"', "be quoted" 'value')
Inherits: fd_pt1
Child tables: ct3,
ft3
+Access method:
\d+ ct3
Table "public.ct3"
@@ -1427,6 +1436,7 @@ Child tables: ct3,
c2 | text | | | | extended | |
c3 | date | | | | plain | |
Inherits: ft2
+Access method: heap
\d+ ft3
Foreign table "public.ft3"
@@ -1437,6 +1447,7 @@ Inherits: ft2
c3 | date | | | | | plain | |
Server: s0
Inherits: ft2
+Access method:
-- add attributes recursively
ALTER TABLE fd_pt1 ADD COLUMN c4 integer;
@@ -1457,6 +1468,7 @@ ALTER TABLE fd_pt1 ADD COLUMN c8 integer;
c7 | integer | | not null | | plain | |
c8 | integer | | | | plain | |
Child tables: ft2
+Access method: heap
\d+ ft2
Foreign table "public.ft2"
@@ -1475,6 +1487,7 @@ FDW options: (delimiter ',', quote '"', "be quoted" 'value')
Inherits: fd_pt1
Child tables: ct3,
ft3
+Access method:
\d+ ct3
Table "public.ct3"
@@ -1489,6 +1502,7 @@ Child tables: ct3,
c7 | integer | | not null | | plain | |
c8 | integer | | | | plain | |
Inherits: ft2
+Access method: heap
\d+ ft3
Foreign table "public.ft3"
@@ -1504,6 +1518,7 @@ Inherits: ft2
c8 | integer | | | | | plain | |
Server: s0
Inherits: ft2
+Access method:
-- alter attributes recursively
ALTER TABLE fd_pt1 ALTER COLUMN c4 SET DEFAULT 0;
@@ -1531,6 +1546,7 @@ ALTER TABLE fd_pt1 ALTER COLUMN c8 SET STORAGE EXTERNAL;
c7 | integer | | | | plain | |
c8 | text | | | | external | |
Child tables: ft2
+Access method: heap
\d+ ft2
Foreign table "public.ft2"
@@ -1549,6 +1565,7 @@ FDW options: (delimiter ',', quote '"', "be quoted" 'value')
Inherits: fd_pt1
Child tables: ct3,
ft3
+Access method:
-- drop attributes recursively
ALTER TABLE fd_pt1 DROP COLUMN c4;
@@ -1564,6 +1581,7 @@ ALTER TABLE fd_pt1 DROP COLUMN c8;
c2 | text | | | | extended | |
c3 | date | | | | plain | |
Child tables: ft2
+Access method: heap
\d+ ft2
Foreign table "public.ft2"
@@ -1577,6 +1595,7 @@ FDW options: (delimiter ',', quote '"', "be quoted" 'value')
Inherits: fd_pt1
Child tables: ct3,
ft3
+Access method:
-- add constraints recursively
ALTER TABLE fd_pt1 ADD CONSTRAINT fd_pt1chk1 CHECK (c1 > 0) NO INHERIT;
@@ -1604,6 +1623,7 @@ Check constraints:
"fd_pt1chk1" CHECK (c1 > 0) NO INHERIT
"fd_pt1chk2" CHECK (c2 <> ''::text)
Child tables: ft2
+Access method: heap
\d+ ft2
Foreign table "public.ft2"
@@ -1619,6 +1639,7 @@ FDW options: (delimiter ',', quote '"', "be quoted" 'value')
Inherits: fd_pt1
Child tables: ct3,
ft3
+Access method:
\set VERBOSITY terse
DROP FOREIGN TABLE ft2; -- ERROR
@@ -1648,6 +1669,7 @@ Check constraints:
"fd_pt1chk1" CHECK (c1 > 0) NO INHERIT
"fd_pt1chk2" CHECK (c2 <> ''::text)
Child tables: ft2
+Access method: heap
\d+ ft2
Foreign table "public.ft2"
@@ -1661,6 +1683,7 @@ Check constraints:
Server: s0
FDW options: (delimiter ',', quote '"', "be quoted" 'value')
Inherits: fd_pt1
+Access method:
-- drop constraints recursively
ALTER TABLE fd_pt1 DROP CONSTRAINT fd_pt1chk1 CASCADE;
@@ -1678,6 +1701,7 @@ ALTER TABLE fd_pt1 ADD CONSTRAINT fd_pt1chk3 CHECK (c2 <> '') NOT VALID;
Check constraints:
"fd_pt1chk3" CHECK (c2 <> ''::text) NOT VALID
Child tables: ft2
+Access method: heap
\d+ ft2
Foreign table "public.ft2"
@@ -1692,6 +1716,7 @@ Check constraints:
Server: s0
FDW options: (delimiter ',', quote '"', "be quoted" 'value')
Inherits: fd_pt1
+Access method:
-- VALIDATE CONSTRAINT need do nothing on foreign tables
ALTER TABLE fd_pt1 VALIDATE CONSTRAINT fd_pt1chk3;
@@ -1705,6 +1730,7 @@ ALTER TABLE fd_pt1 VALIDATE CONSTRAINT fd_pt1chk3;
Check constraints:
"fd_pt1chk3" CHECK (c2 <> ''::text)
Child tables: ft2
+Access method: heap
\d+ ft2
Foreign table "public.ft2"
@@ -1719,6 +1745,7 @@ Check constraints:
Server: s0
FDW options: (delimiter ',', quote '"', "be quoted" 'value')
Inherits: fd_pt1
+Access method:
-- changes name of an attribute recursively
ALTER TABLE fd_pt1 RENAME COLUMN c1 TO f1;
@@ -1736,6 +1763,7 @@ ALTER TABLE fd_pt1 RENAME CONSTRAINT fd_pt1chk3 TO f2_check;
Check constraints:
"f2_check" CHECK (f2 <> ''::text)
Child tables: ft2
+Access method: heap
\d+ ft2
Foreign table "public.ft2"
@@ -1750,6 +1778,7 @@ Check constraints:
Server: s0
FDW options: (delimiter ',', quote '"', "be quoted" 'value')
Inherits: fd_pt1
+Access method:
-- TRUNCATE doesn't work on foreign tables, either directly or recursively
TRUNCATE ft2; -- ERROR
@@ -1799,6 +1828,7 @@ CREATE FOREIGN TABLE fd_pt2_1 PARTITION OF fd_pt2 FOR VALUES IN (1)
c3 | date | | | | plain | |
Partition key: LIST (c1)
Partitions: fd_pt2_1 FOR VALUES IN (1)
+Access method: heap
\d+ fd_pt2_1
Foreign table "public.fd_pt2_1"
@@ -1811,6 +1841,7 @@ Partition of: fd_pt2 FOR VALUES IN (1)
Partition constraint: ((c1 IS NOT NULL) AND (c1 = 1))
Server: s0
FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method:
-- partition cannot have additional columns
DROP FOREIGN TABLE fd_pt2_1;
@@ -1830,6 +1861,7 @@ CREATE FOREIGN TABLE fd_pt2_1 (
c4 | character(1) | | | | | extended | |
Server: s0
FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method:
ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1); -- ERROR
ERROR: table "fd_pt2_1" contains column "c4" not found in parent "fd_pt2"
@@ -1844,6 +1876,7 @@ DROP FOREIGN TABLE fd_pt2_1;
c3 | date | | | | plain | |
Partition key: LIST (c1)
Number of partitions: 0
+Access method: heap
CREATE FOREIGN TABLE fd_pt2_1 (
c1 integer NOT NULL,
@@ -1859,6 +1892,7 @@ CREATE FOREIGN TABLE fd_pt2_1 (
c3 | date | | | | | plain | |
Server: s0
FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method:
-- no attach partition validation occurs for foreign tables
ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1);
@@ -1871,6 +1905,7 @@ ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1);
c3 | date | | | | plain | |
Partition key: LIST (c1)
Partitions: fd_pt2_1 FOR VALUES IN (1)
+Access method: heap
\d+ fd_pt2_1
Foreign table "public.fd_pt2_1"
@@ -1883,6 +1918,7 @@ Partition of: fd_pt2 FOR VALUES IN (1)
Partition constraint: ((c1 IS NOT NULL) AND (c1 = 1))
Server: s0
FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method:
-- cannot add column to a partition
ALTER TABLE fd_pt2_1 ADD c4 char;
@@ -1899,6 +1935,7 @@ ALTER TABLE fd_pt2_1 ADD CONSTRAINT p21chk CHECK (c2 <> '');
c3 | date | | | | plain | |
Partition key: LIST (c1)
Partitions: fd_pt2_1 FOR VALUES IN (1)
+Access method: heap
\d+ fd_pt2_1
Foreign table "public.fd_pt2_1"
@@ -1913,6 +1950,7 @@ Check constraints:
"p21chk" CHECK (c2 <> ''::text)
Server: s0
FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method:
-- cannot drop inherited NOT NULL constraint from a partition
ALTER TABLE fd_pt2_1 ALTER c1 DROP NOT NULL;
@@ -1929,6 +1967,7 @@ ALTER TABLE fd_pt2 ALTER c2 SET NOT NULL;
c3 | date | | | | plain | |
Partition key: LIST (c1)
Number of partitions: 0
+Access method: heap
\d+ fd_pt2_1
Foreign table "public.fd_pt2_1"
@@ -1941,6 +1980,7 @@ Check constraints:
"p21chk" CHECK (c2 <> ''::text)
Server: s0
FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method:
ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1); -- ERROR
ERROR: column "c2" in child table must be marked NOT NULL
@@ -1959,6 +1999,7 @@ Partition key: LIST (c1)
Check constraints:
"fd_pt2chk1" CHECK (c1 > 0)
Number of partitions: 0
+Access method: heap
\d+ fd_pt2_1
Foreign table "public.fd_pt2_1"
@@ -1971,6 +2012,7 @@ Check constraints:
"p21chk" CHECK (c2 <> ''::text)
Server: s0
FDW options: (delimiter ',', quote '"', "be quoted" 'value')
+Access method:
ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1); -- ERROR
ERROR: child table is missing constraint "fd_pt2chk1"
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index f259d07535..7bfc11c770 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1001,6 +1001,7 @@ ALTER TABLE inhts RENAME d TO dd;
dd | integer | | | | plain | |
Inherits: inht1,
inhs1
+Access method: heap
DROP TABLE inhts;
-- Test for renaming in diamond inheritance
@@ -1021,6 +1022,7 @@ ALTER TABLE inht1 RENAME aa TO aaa;
z | integer | | | | plain | |
Inherits: inht2,
inht3
+Access method: heap
CREATE TABLE inhts (d int) INHERITS (inht2, inhs1);
NOTICE: merging multiple inherited definitions of column "b"
@@ -1038,6 +1040,7 @@ ERROR: cannot rename inherited column "b"
d | integer | | | | plain | |
Inherits: inht2,
inhs1
+Access method: heap
WITH RECURSIVE r AS (
SELECT 'inht1'::regclass AS inhrelid
@@ -1084,6 +1087,7 @@ CREATE TABLE test_constraints_inh () INHERITS (test_constraints);
Indexes:
"test_constraints_val1_val2_key" UNIQUE CONSTRAINT, btree (val1, val2)
Child tables: test_constraints_inh
+Access method: heap
ALTER TABLE ONLY test_constraints DROP CONSTRAINT test_constraints_val1_val2_key;
\d+ test_constraints
@@ -1094,6 +1098,7 @@ ALTER TABLE ONLY test_constraints DROP CONSTRAINT test_constraints_val1_val2_key
val1 | character varying | | | | extended | |
val2 | integer | | | | plain | |
Child tables: test_constraints_inh
+Access method: heap
\d+ test_constraints_inh
Table "public.test_constraints_inh"
@@ -1103,6 +1108,7 @@ Child tables: test_constraints_inh
val1 | character varying | | | | extended | |
val2 | integer | | | | plain | |
Inherits: test_constraints
+Access method: heap
DROP TABLE test_constraints_inh;
DROP TABLE test_constraints;
@@ -1119,6 +1125,7 @@ CREATE TABLE test_ex_constraints_inh () INHERITS (test_ex_constraints);
Indexes:
"test_ex_constraints_c_excl" EXCLUDE USING gist (c WITH &&)
Child tables: test_ex_constraints_inh
+Access method: heap
ALTER TABLE test_ex_constraints DROP CONSTRAINT test_ex_constraints_c_excl;
\d+ test_ex_constraints
@@ -1127,6 +1134,7 @@ ALTER TABLE test_ex_constraints DROP CONSTRAINT test_ex_constraints_c_excl;
--------+--------+-----------+----------+---------+---------+--------------+-------------
c | circle | | | | plain | |
Child tables: test_ex_constraints_inh
+Access method: heap
\d+ test_ex_constraints_inh
Table "public.test_ex_constraints_inh"
@@ -1134,6 +1142,7 @@ Child tables: test_ex_constraints_inh
--------+--------+-----------+----------+---------+---------+--------------+-------------
c | circle | | | | plain | |
Inherits: test_ex_constraints
+Access method: heap
DROP TABLE test_ex_constraints_inh;
DROP TABLE test_ex_constraints;
@@ -1150,6 +1159,7 @@ Indexes:
"test_primary_constraints_pkey" PRIMARY KEY, btree (id)
Referenced by:
TABLE "test_foreign_constraints" CONSTRAINT "test_foreign_constraints_id1_fkey" FOREIGN KEY (id1) REFERENCES test_primary_constraints(id)
+Access method: heap
\d+ test_foreign_constraints
Table "public.test_foreign_constraints"
@@ -1159,6 +1169,7 @@ Referenced by:
Foreign-key constraints:
"test_foreign_constraints_id1_fkey" FOREIGN KEY (id1) REFERENCES test_primary_constraints(id)
Child tables: test_foreign_constraints_inh
+Access method: heap
ALTER TABLE test_foreign_constraints DROP CONSTRAINT test_foreign_constraints_id1_fkey;
\d+ test_foreign_constraints
@@ -1167,6 +1178,7 @@ ALTER TABLE test_foreign_constraints DROP CONSTRAINT test_foreign_constraints_id
--------+---------+-----------+----------+---------+---------+--------------+-------------
id1 | integer | | | | plain | |
Child tables: test_foreign_constraints_inh
+Access method: heap
\d+ test_foreign_constraints_inh
Table "public.test_foreign_constraints_inh"
@@ -1174,6 +1186,7 @@ Child tables: test_foreign_constraints_inh
--------+---------+-----------+----------+---------+---------+--------------+-------------
id1 | integer | | | | plain | |
Inherits: test_foreign_constraints
+Access method: heap
DROP TABLE test_foreign_constraints_inh;
DROP TABLE test_foreign_constraints;
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index 1cf6531c01..48ad462e3d 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -156,6 +156,7 @@ Rules:
irule3 AS
ON INSERT TO inserttest2 DO INSERT INTO inserttest (f4[1].if1, f4[1].if2[2]) SELECT new.f1,
new.f2
+Access method: heap
drop table inserttest2;
drop table inserttest;
@@ -461,6 +462,7 @@ Partitions: part_aa_bb FOR VALUES IN ('aa', 'bb'),
part_null FOR VALUES IN (NULL),
part_xx_yy FOR VALUES IN ('xx', 'yy'), PARTITIONED,
part_default DEFAULT, PARTITIONED
+Access method: heap
-- cleanup
drop table range_parted, list_parted;
@@ -476,6 +478,7 @@ create table part_default partition of list_parted default;
a | integer | | | | plain | |
Partition of: list_parted DEFAULT
No partition constraint
+Access method: heap
insert into part_default values (null);
insert into part_default values (1);
@@ -813,6 +816,7 @@ Partitions: mcrparted1_lt_b FOR VALUES FROM (MINVALUE, MINVALUE) TO ('b', MINVAL
mcrparted6_common_ge_10 FOR VALUES FROM ('common', 10) TO ('common', MAXVALUE),
mcrparted7_gt_common_lt_d FOR VALUES FROM ('common', MAXVALUE) TO ('d', MINVALUE),
mcrparted8_ge_d FOR VALUES FROM ('d', MINVALUE) TO (MAXVALUE, MAXVALUE)
+Access method: heap
\d+ mcrparted1_lt_b
Table "public.mcrparted1_lt_b"
@@ -822,6 +826,7 @@ Partitions: mcrparted1_lt_b FOR VALUES FROM (MINVALUE, MINVALUE) TO ('b', MINVAL
b | integer | | | | plain | |
Partition of: mcrparted FOR VALUES FROM (MINVALUE, MINVALUE) TO ('b', MINVALUE)
Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a < 'b'::text))
+Access method: heap
\d+ mcrparted2_b
Table "public.mcrparted2_b"
@@ -831,6 +836,7 @@ Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a < 'b'::text))
b | integer | | | | plain | |
Partition of: mcrparted FOR VALUES FROM ('b', MINVALUE) TO ('c', MINVALUE)
Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a >= 'b'::text) AND (a < 'c'::text))
+Access method: heap
\d+ mcrparted3_c_to_common
Table "public.mcrparted3_c_to_common"
@@ -840,6 +846,7 @@ Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a >= 'b'::text)
b | integer | | | | plain | |
Partition of: mcrparted FOR VALUES FROM ('c', MINVALUE) TO ('common', MINVALUE)
Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a >= 'c'::text) AND (a < 'common'::text))
+Access method: heap
\d+ mcrparted4_common_lt_0
Table "public.mcrparted4_common_lt_0"
@@ -849,6 +856,7 @@ Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a >= 'c'::text)
b | integer | | | | plain | |
Partition of: mcrparted FOR VALUES FROM ('common', MINVALUE) TO ('common', 0)
Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a = 'common'::text) AND (b < 0))
+Access method: heap
\d+ mcrparted5_common_0_to_10
Table "public.mcrparted5_common_0_to_10"
@@ -858,6 +866,7 @@ Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a = 'common'::te
b | integer | | | | plain | |
Partition of: mcrparted FOR VALUES FROM ('common', 0) TO ('common', 10)
Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a = 'common'::text) AND (b >= 0) AND (b < 10))
+Access method: heap
\d+ mcrparted6_common_ge_10
Table "public.mcrparted6_common_ge_10"
@@ -867,6 +876,7 @@ Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a = 'common'::te
b | integer | | | | plain | |
Partition of: mcrparted FOR VALUES FROM ('common', 10) TO ('common', MAXVALUE)
Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a = 'common'::text) AND (b >= 10))
+Access method: heap
\d+ mcrparted7_gt_common_lt_d
Table "public.mcrparted7_gt_common_lt_d"
@@ -876,6 +886,7 @@ Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a = 'common'::te
b | integer | | | | plain | |
Partition of: mcrparted FOR VALUES FROM ('common', MAXVALUE) TO ('d', MINVALUE)
Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a > 'common'::text) AND (a < 'd'::text))
+Access method: heap
\d+ mcrparted8_ge_d
Table "public.mcrparted8_ge_d"
@@ -885,6 +896,7 @@ Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a > 'common'::te
b | integer | | | | plain | |
Partition of: mcrparted FOR VALUES FROM ('d', MINVALUE) TO (MAXVALUE, MAXVALUE)
Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a >= 'd'::text))
+Access method: heap
insert into mcrparted values ('aaa', 0), ('b', 0), ('bz', 10), ('c', -10),
('comm', -10), ('common', -10), ('common', 0), ('common', 10),
diff --git a/src/test/regress/expected/matview.out b/src/test/regress/expected/matview.out
index 08cd4bea48..af943ea430 100644
--- a/src/test/regress/expected/matview.out
+++ b/src/test/regress/expected/matview.out
@@ -104,6 +104,7 @@ View definition:
mvtest_tv.totamt
FROM mvtest_tv
ORDER BY mvtest_tv.type;
+Access method: heap
\d+ mvtest_tvm
Materialized view "public.mvtest_tvm"
@@ -116,6 +117,7 @@ View definition:
mvtest_tv.totamt
FROM mvtest_tv
ORDER BY mvtest_tv.type;
+Access method: heap
\d+ mvtest_tvvm
Materialized view "public.mvtest_tvvm"
@@ -125,6 +127,7 @@ View definition:
View definition:
SELECT mvtest_tvv.grandtot
FROM mvtest_tvv;
+Access method: heap
\d+ mvtest_bb
Materialized view "public.mvtest_bb"
@@ -136,6 +139,7 @@ Indexes:
View definition:
SELECT mvtest_tvvmv.grandtot
FROM mvtest_tvvmv;
+Access method: heap
-- test schema behavior
CREATE SCHEMA mvtest_mvschema;
@@ -152,6 +156,7 @@ Indexes:
View definition:
SELECT sum(mvtest_tvm.totamt) AS grandtot
FROM mvtest_mvschema.mvtest_tvm;
+Access method: heap
SET search_path = mvtest_mvschema, public;
\d+ mvtest_tvm
@@ -165,6 +170,7 @@ View definition:
mvtest_tv.totamt
FROM mvtest_tv
ORDER BY mvtest_tv.type;
+Access method: heap
-- modify the underlying table data
INSERT INTO mvtest_t VALUES (6, 'z', 13);
@@ -369,6 +375,7 @@ UNION ALL
SELECT mvtest_vt2.moo,
3 * mvtest_vt2.moo
FROM mvtest_vt2;
+Access method: heap
CREATE MATERIALIZED VIEW mv_test3 AS SELECT * FROM mv_test2 WHERE moo = 12345;
SELECT relispopulated FROM pg_class WHERE oid = 'mv_test3'::regclass;
@@ -507,6 +514,7 @@ View definition:
'foo'::text AS u,
'foo'::text AS u2,
NULL::text AS n;
+Access method: heap
SELECT * FROM mv_unspecified_types;
i | num | u | u2 | n
diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
index afbbdd543d..439a592778 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -74,6 +74,7 @@ Indexes:
"testpub_tbl2_pkey" PRIMARY KEY, btree (id)
Publications:
"testpub_foralltables"
+Access method: heap
\dRp+ testpub_foralltables
Publication testpub_foralltables
@@ -150,6 +151,7 @@ Publications:
"testpib_ins_trunct"
"testpub_default"
"testpub_fortbl"
+Access method: heap
\d+ testpub_tbl1
Table "public.testpub_tbl1"
@@ -163,6 +165,7 @@ Publications:
"testpib_ins_trunct"
"testpub_default"
"testpub_fortbl"
+Access method: heap
\dRp+ testpub_default
Publication testpub_default
@@ -188,6 +191,7 @@ Indexes:
Publications:
"testpib_ins_trunct"
"testpub_fortbl"
+Access method: heap
-- permissions
SET ROLE regress_publication_user2;
diff --git a/src/test/regress/expected/replica_identity.out b/src/test/regress/expected/replica_identity.out
index 175ecd2879..9ae7a090b4 100644
--- a/src/test/regress/expected/replica_identity.out
+++ b/src/test/regress/expected/replica_identity.out
@@ -171,6 +171,7 @@ Indexes:
"test_replica_identity_hash" hash (nonkey)
"test_replica_identity_keyab" btree (keya, keyb)
Replica Identity: FULL
+Access method: heap
ALTER TABLE test_replica_identity REPLICA IDENTITY NOTHING;
SELECT relreplident FROM pg_class WHERE oid = 'test_replica_identity'::regclass;
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 1d12b01068..b01ff58c41 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -958,6 +958,7 @@ Policies:
Partitions: part_document_fiction FOR VALUES FROM (11) TO (12),
part_document_nonfiction FOR VALUES FROM (99) TO (100),
part_document_satire FOR VALUES FROM (55) TO (56)
+Access method: heap
SELECT * FROM pg_policies WHERE schemaname = 'regress_rls_schema' AND tablename like '%part_document%' ORDER BY policyname;
schemaname | tablename | policyname | permissive | roles | cmd | qual | with_check
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b68b8d273f..22c38ae2e8 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2817,6 +2817,7 @@ Rules:
r3 AS
ON DELETE TO rules_src DO
NOTIFY rules_src_deletion
+Access method: heap
--
-- Ensure an aliased target relation for insert is correctly deparsed.
@@ -2845,6 +2846,7 @@ Rules:
r5 AS
ON UPDATE TO rules_src DO INSTEAD UPDATE rules_log trgt SET tag = 'updated'::text
WHERE trgt.f1 = new.f1
+Access method: heap
--
-- check alter rename rule
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index d09326c182..6b857bbc14 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -669,6 +669,7 @@ create table part_def partition of range_parted default;
e | character varying | | | | extended | |
Partition of: range_parted DEFAULT
Partition constraint: (NOT ((a IS NOT NULL) AND (b IS NOT NULL) AND (((a = 'a'::text) AND (b >= '1'::bigint) AND (b < '10'::bigint)) OR ((a = 'a'::text) AND (b >= '10'::bigint) AND (b < '20'::bigint)) OR ((a = 'b'::text) AND (b >= '1'::bigint) AND (b < '10'::bigint)) OR ((a = 'b'::text) AND (b >= '10'::bigint) AND (b < '20'::bigint)) OR ((a = 'b'::text) AND (b >= '20'::bigint) AND (b < '30'::bigint)))))
+Access method: heap
insert into range_parted values ('c', 9);
-- ok
Hi,
On 2018-12-15 20:15:12 +0100, Dmitry Dolgov wrote:
On Tue, Dec 11, 2018 at 3:13 AM Andres Freund <andres@anarazel.de> wrote:
Further tasks I'm not yet planning to tackle, that I'd welcome help on:
- pg_dump support
- pg_upgrade testing
- I think we should consider removing HeapTuple->t_tableOid, it should
imo live entirely in the slot
I'm a bit confused, though: what kind of pg_dump support are you talking about?
After a quick glance I don't see any table-access-specific logic there so far.
To check it I've created a test access method (which is a copy of heap, but
with some small differences) and pg_dump worked as expected.
We need to dump the table access method at dump time, otherwise we lose
that information.
As a side note, the table description doesn't mention which access method
a table uses; it's probably useful to show that with \d+
(see the attached patch).
I'm not convinced that's really worth the cost of including it in \d
(rather than \d+ or such). When developing an alternative access method
it's extremely useful to be able to just change the default access
method, and run the existing tests, which this makes harder. It's also a
lot of churn.
Greetings,
Andres Freund
On Sat, Dec 15, 2018 at 8:37 PM Andres Freund <andres@anarazel.de> wrote:
We need to dump the table access method at dump time, otherwise we lose
that information.
Oh, right. So, something like in the attached patch?
As a side note, in a table description I haven't found any mention of which
access method is used for this table, probably it's useful to show that with \d+
(see the attached patch).
I'm not convinced that's really worth the cost of including it in \d
(rather than \d+ or such).
Maybe I'm missing the point, but I meant exactly the same thing: the patch
suggested in the previous email adds this info to \d+.
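Concretely, with Dmitry's pg_dump patch the access method is attached to each table definition, so a dump of a table created under a non-default AM would look roughly like this (the `myam` access method and `myam_tableam_handler` names here are hypothetical, used only for illustration):

```sql
-- restored first, since the table depends on it
CREATE ACCESS METHOD myam TYPE TABLE HANDLER myam_tableam_handler;

-- pg_dump appends the USING clause only when amname is not "heap"
CREATE TABLE public.orders (
    id integer,
    total numeric
) USING myam;
```

Tables on the default heap AM dump without any USING clause, so restores into older servers keep working for the common case.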
Attachment: pg_dump_access_method.patch
commit 37cfd7cf84fcdaeff7ba5ed6e56c6692377e9b37
Author: erthalion <9erthalion6@gmail.com>
Date: Sun Dec 16 20:31:33 2018 +0100
pg_dump support
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 637c79af48..fca00d7b5c 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -5829,6 +5829,7 @@ getTables(Archive *fout, int *numTables)
int i_partkeydef;
int i_ispartition;
int i_partbound;
+ int i_amname;
/*
* Find all the tables and table-like objects.
@@ -5914,7 +5915,7 @@ getTables(Archive *fout, int *numTables)
"tc.relfrozenxid AS tfrozenxid, "
"tc.relminmxid AS tminmxid, "
"c.relpersistence, c.relispopulated, "
- "c.relreplident, c.relpages, "
+ "c.relreplident, c.relpages, am.amname AS amname, "
"CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -5945,6 +5946,7 @@ getTables(Archive *fout, int *numTables)
"d.objsubid = 0 AND "
"d.refclassid = c.tableoid AND d.deptype IN ('a', 'i')) "
"LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid) "
+ "LEFT JOIN pg_am am ON (c.relam = am.oid) "
"LEFT JOIN pg_init_privs pip ON "
"(c.oid = pip.objoid "
"AND pip.classoid = 'pg_class'::regclass "
@@ -6412,6 +6414,7 @@ getTables(Archive *fout, int *numTables)
i_partkeydef = PQfnumber(res, "partkeydef");
i_ispartition = PQfnumber(res, "ispartition");
i_partbound = PQfnumber(res, "partbound");
+ i_amname = PQfnumber(res, "amname");
if (dopt->lockWaitTimeout)
{
@@ -6481,6 +6484,10 @@ getTables(Archive *fout, int *numTables)
else
tblinfo[i].checkoption = pg_strdup(PQgetvalue(res, i, i_checkoption));
tblinfo[i].toast_reloptions = pg_strdup(PQgetvalue(res, i, i_toastreloptions));
+ if (PQgetisnull(res, i, i_amname))
+ tblinfo[i].amname = NULL;
+ else
+ tblinfo[i].amname = pg_strdup(PQgetvalue(res, i, i_amname));
/* other fields were zeroed above */
@@ -12546,6 +12553,9 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
case AMTYPE_INDEX:
appendPQExpBuffer(q, "TYPE INDEX ");
break;
+ case AMTYPE_TABLE:
+ appendPQExpBuffer(q, "TYPE TABLE ");
+ break;
default:
write_msg(NULL, "WARNING: invalid type \"%c\" of access method \"%s\"\n",
aminfo->amtype, qamname);
@@ -15601,6 +15611,9 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
if (tbinfo->relkind == RELKIND_PARTITIONED_TABLE)
appendPQExpBuffer(q, "\nPARTITION BY %s", tbinfo->partkeydef);
+ if (tbinfo->amname != NULL && strcmp(tbinfo->amname, "heap") != 0)
+ appendPQExpBuffer(q, "\nUSING %s", tbinfo->amname);
+
if (tbinfo->relkind == RELKIND_FOREIGN_TABLE)
appendPQExpBuffer(q, "\nSERVER %s", fmtId(srvname));
}
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 789d6a24e2..4ca6a802f3 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -324,6 +324,7 @@ typedef struct _tableInfo
char *partkeydef; /* partition key definition */
char *partbound; /* partition bound definition */
bool needs_override; /* has GENERATED ALWAYS AS IDENTITY */
+ char *amname; /* table access method */
/*
* Stuff computed only for dumpable tables.
On Mon, Dec 10, 2018 at 8:13 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
Just out of curiosity I've also tried tpc-c from oltpbench (in the very same
simple environment); it doesn't show any significant difference from master
either.
FWIW, I have found BenchmarkSQL to be significantly better than
oltpbench, having used both quite a bit now:
https://bitbucket.org/openscg/benchmarksql
For example, oltpbench requires a max_connections setting that far
exceeds the number of terminals/clients used by the benchmark, because
the number of connections used during bulk loading far exceeds what is
truly required. BenchmarkSQL also makes it easy to generate useful
html reports, complete with graphs.
--
Peter Geoghegan
On Sat, Dec 15, 2018 at 8:37 PM Andres Freund <andres@anarazel.de> wrote:
We need to dump the table access method at dump time, otherwise we lose
that information.
As a result of the discussion in [1] (btw, thanks for starting it), here is a
proposed solution that tracks the current default_table_access_method. Next I'll
tackle the similar issue for psql and probably add some tests for both patches.
[1]: /messages/by-id/20190107235616.6lur25ph22u5u5av@alap3.anarazel.de
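With this approach the restore script switches the GUC only when the access method changes between TOC entries, instead of attaching USING to every CREATE TABLE. A sketch of what such a dump script might contain (table names and the `myam` AM are hypothetical):

```sql
SET default_table_access_method = heap;

CREATE TABLE public.orders (
    id integer,
    total numeric
);

-- emitted only because the AM differs from the one currently in effect
SET default_table_access_method = myam;

CREATE TABLE public.events (
    id integer,
    payload text
);
```

This mirrors how pg_dump already handles tablespaces via default_tablespace, and keeps single-AM dumps free of per-table clutter.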
Attachment: pg_dump_access_method.patch
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 58bd3805f4..f9bae43132 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -85,6 +85,7 @@ static void _becomeUser(ArchiveHandle *AH, const char *user);
static void _becomeOwner(ArchiveHandle *AH, TocEntry *te);
static void _selectOutputSchema(ArchiveHandle *AH, const char *schemaName);
static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
+static void _selectTableAccessMethod(ArchiveHandle *AH, const char *tableam);
static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
static void processSearchPathEntry(ArchiveHandle *AH, TocEntry *te);
@@ -1072,6 +1073,7 @@ ArchiveEntry(Archive *AHX,
const char *namespace,
const char *tablespace,
const char *owner,
+ const char *tableam,
const char *desc, teSection section,
const char *defn,
const char *dropStmt, const char *copyStmt,
@@ -1099,6 +1101,7 @@ ArchiveEntry(Archive *AHX,
newToc->tag = pg_strdup(tag);
newToc->namespace = namespace ? pg_strdup(namespace) : NULL;
newToc->tablespace = tablespace ? pg_strdup(tablespace) : NULL;
+ newToc->tableam = tableam ? pg_strdup(tableam) : NULL;
newToc->owner = pg_strdup(owner);
newToc->desc = pg_strdup(desc);
newToc->defn = pg_strdup(defn);
@@ -2367,6 +2370,7 @@ _allocAH(const char *FileSpec, const ArchiveFormat fmt,
AH->currUser = NULL; /* unknown */
AH->currSchema = NULL; /* ditto */
AH->currTablespace = NULL; /* ditto */
+ AH->currTableAm = NULL; /* ditto */
AH->toc = (TocEntry *) pg_malloc0(sizeof(TocEntry));
@@ -2594,6 +2598,7 @@ WriteToc(ArchiveHandle *AH)
WriteStr(AH, te->namespace);
WriteStr(AH, te->tablespace);
WriteStr(AH, te->owner);
+ WriteStr(AH, te->tableam);
WriteStr(AH, "false");
/* Dump list of dependencies */
@@ -2696,6 +2701,7 @@ ReadToc(ArchiveHandle *AH)
te->tablespace = ReadStr(AH);
te->owner = ReadStr(AH);
+ te->tableam = ReadStr(AH);
if (AH->version < K_VERS_1_9 || strcmp(ReadStr(AH), "true") == 0)
write_msg(modulename,
"WARNING: restoring tables WITH OIDS is not supported anymore");
@@ -3288,6 +3294,9 @@ _reconnectToDB(ArchiveHandle *AH, const char *dbname)
if (AH->currTablespace)
free(AH->currTablespace);
AH->currTablespace = NULL;
+ if (AH->currTableAm)
+ free(AH->currTableAm);
+ AH->currTableAm = NULL;
/* re-establish fixed state */
_doSetFixedOutputState(AH);
@@ -3448,6 +3457,48 @@ _selectTablespace(ArchiveHandle *AH, const char *tablespace)
destroyPQExpBuffer(qry);
}
+/*
+ * Set the proper default_table_access_method value for the table.
+ */
+static void
+_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
+{
+ PQExpBuffer cmd;
+ const char *want, *have;
+
+ have = AH->currTableAm;
+ want = tableam;
+
+ if (!want)
+ return;
+
+ if (have && strcmp(want, have) == 0)
+ return;
+
+ cmd = createPQExpBuffer();
+ appendPQExpBuffer(cmd, "SET default_table_access_method = %s;", fmtId(want));
+
+ if (RestoringToDB(AH))
+ {
+ PGresult *res;
+
+ res = PQexec(AH->connection, cmd->data);
+
+ if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
+ warn_or_exit_horribly(AH, modulename,
+ "could not set default_table_access_method: %s",
+ PQerrorMessage(AH->connection));
+
+ PQclear(res);
+ }
+ else
+ ahprintf(AH, "%s\n\n", cmd->data);
+
+ destroyPQExpBuffer(cmd);
+
+ AH->currTableAm = pg_strdup(want);
+}
+
/*
* Extract an object description for a TOC entry, and append it to buf.
*
@@ -3547,6 +3598,7 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
_selectTablespace(AH, te->tablespace);
+ _selectTableAccessMethod(AH, te->tableam);
/* Emit header comment for item */
if (!AH->noTocComments)
@@ -4021,6 +4073,9 @@ restore_toc_entries_prefork(ArchiveHandle *AH, TocEntry *pending_list)
if (AH->currTablespace)
free(AH->currTablespace);
AH->currTablespace = NULL;
+ if (AH->currTableAm)
+ free(AH->currTableAm);
+ AH->currTableAm = NULL;
}
/*
@@ -4816,6 +4871,7 @@ CloneArchive(ArchiveHandle *AH)
clone->currUser = NULL;
clone->currSchema = NULL;
clone->currTablespace = NULL;
+ clone->currTableAm = NULL;
/* savedPassword must be local in case we change it while connecting */
if (clone->savedPassword)
@@ -4906,6 +4962,8 @@ DeCloneArchive(ArchiveHandle *AH)
free(AH->currSchema);
if (AH->currTablespace)
free(AH->currTablespace);
+ if (AH->currTableAm)
+ free(AH->currTableAm);
if (AH->savedPassword)
free(AH->savedPassword);
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 306d2ceba9..719065565b 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -347,6 +347,7 @@ struct _archiveHandle
char *currUser; /* current username, or NULL if unknown */
char *currSchema; /* current schema, or NULL */
char *currTablespace; /* current tablespace, or NULL */
+ char *currTableAm; /* current table access method, or NULL */
void *lo_buf;
size_t lo_buf_used;
@@ -373,6 +374,8 @@ struct _tocEntry
char *namespace; /* null or empty string if not in a schema */
char *tablespace; /* null if not in a tablespace; empty string
* means use database default */
+ char *tableam; /* table access method, only for TABLE tags */
+
char *owner;
char *desc;
char *defn;
@@ -410,7 +413,7 @@ extern TocEntry *ArchiveEntry(Archive *AHX,
CatalogId catalogId, DumpId dumpId,
const char *tag,
const char *namespace, const char *tablespace,
- const char *owner,
+ const char *owner, const char *amname,
const char *desc, teSection section,
const char *defn,
const char *dropStmt, const char *copyStmt,
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 637c79af48..a3878ed9a2 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2136,7 +2136,7 @@ dumpTableData(Archive *fout, TableDataInfo *tdinfo)
te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
tbinfo->dobj.name, tbinfo->dobj.namespace->dobj.name,
- NULL, tbinfo->rolname,
+ NULL, tbinfo->rolname, NULL,
"TABLE DATA", SECTION_DATA,
"", "", copyStmt,
&(tbinfo->dobj.dumpId), 1,
@@ -2188,6 +2188,7 @@ refreshMatViewData(Archive *fout, TableDataInfo *tdinfo)
tbinfo->dobj.namespace->dobj.name, /* Namespace */
NULL, /* Tablespace */
tbinfo->rolname, /* Owner */
+ NULL, /* Table access method */
"MATERIALIZED VIEW DATA", /* Desc */
SECTION_POST_DATA, /* Section */
q->data, /* Create */
@@ -2726,6 +2727,7 @@ dumpDatabase(Archive *fout)
NULL, /* Namespace */
NULL, /* Tablespace */
dba, /* Owner */
+ NULL, /* Table access method */
"DATABASE", /* Desc */
SECTION_PRE_DATA, /* Section */
creaQry->data, /* Create */
@@ -2762,7 +2764,7 @@ dumpDatabase(Archive *fout)
appendPQExpBufferStr(dbQry, ";\n");
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- labelq->data, NULL, NULL, dba,
+ labelq->data, NULL, NULL, dba, NULL,
"COMMENT", SECTION_NONE,
dbQry->data, "", NULL,
&(dbDumpId), 1,
@@ -2789,7 +2791,7 @@ dumpDatabase(Archive *fout)
emitShSecLabels(conn, shres, seclabelQry, "DATABASE", datname);
if (seclabelQry->len > 0)
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- labelq->data, NULL, NULL, dba,
+ labelq->data, NULL, NULL, dba, NULL,
"SECURITY LABEL", SECTION_NONE,
seclabelQry->data, "", NULL,
&(dbDumpId), 1,
@@ -2859,7 +2861,7 @@ dumpDatabase(Archive *fout)
if (creaQry->len > 0)
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- datname, NULL, NULL, dba,
+ datname, NULL, NULL, dba, NULL,
"DATABASE PROPERTIES", SECTION_PRE_DATA,
creaQry->data, delQry->data, NULL,
&(dbDumpId), 1,
@@ -2904,7 +2906,7 @@ dumpDatabase(Archive *fout)
atooid(PQgetvalue(lo_res, 0, i_relminmxid)),
LargeObjectRelationId);
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- "pg_largeobject", NULL, NULL, "",
+ "pg_largeobject", NULL, NULL, "", NULL,
"pg_largeobject", SECTION_PRE_DATA,
loOutQry->data, "", NULL,
NULL, 0,
@@ -3014,7 +3016,7 @@ dumpEncoding(Archive *AH)
appendPQExpBufferStr(qry, ";\n");
ArchiveEntry(AH, nilCatalogId, createDumpId(),
- "ENCODING", NULL, NULL, "",
+ "ENCODING", NULL, NULL, "", NULL,
"ENCODING", SECTION_PRE_DATA,
qry->data, "", NULL,
NULL, 0,
@@ -3041,7 +3043,7 @@ dumpStdStrings(Archive *AH)
stdstrings);
ArchiveEntry(AH, nilCatalogId, createDumpId(),
- "STDSTRINGS", NULL, NULL, "",
+ "STDSTRINGS", NULL, NULL, "", NULL,
"STDSTRINGS", SECTION_PRE_DATA,
qry->data, "", NULL,
NULL, 0,
@@ -3097,7 +3099,7 @@ dumpSearchPath(Archive *AH)
write_msg(NULL, "saving search_path = %s\n", path->data);
ArchiveEntry(AH, nilCatalogId, createDumpId(),
- "SEARCHPATH", NULL, NULL, "",
+ "SEARCHPATH", NULL, NULL, "", NULL,
"SEARCHPATH", SECTION_PRE_DATA,
qry->data, "", NULL,
NULL, 0,
@@ -3275,7 +3277,7 @@ dumpBlob(Archive *fout, BlobInfo *binfo)
ArchiveEntry(fout, binfo->dobj.catId, binfo->dobj.dumpId,
binfo->dobj.name,
NULL, NULL,
- binfo->rolname,
+ binfo->rolname, NULL,
"BLOB", SECTION_PRE_DATA,
cquery->data, dquery->data, NULL,
NULL, 0,
@@ -3581,6 +3583,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
polinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"ROW SECURITY", SECTION_POST_DATA,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -3637,6 +3640,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
polinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"POLICY", SECTION_POST_DATA,
query->data, delqry->data, NULL,
NULL, 0,
@@ -3811,6 +3815,7 @@ dumpPublication(Archive *fout, PublicationInfo *pubinfo)
NULL,
NULL,
pubinfo->rolname,
+ NULL,
"PUBLICATION", SECTION_POST_DATA,
query->data, delq->data, NULL,
NULL, 0,
@@ -3954,6 +3959,7 @@ dumpPublicationTable(Archive *fout, PublicationRelInfo *pubrinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
"",
+ NULL,
"PUBLICATION TABLE", SECTION_POST_DATA,
query->data, "", NULL,
NULL, 0,
@@ -4147,6 +4153,7 @@ dumpSubscription(Archive *fout, SubscriptionInfo *subinfo)
NULL,
NULL,
subinfo->rolname,
+ NULL,
"SUBSCRIPTION", SECTION_POST_DATA,
query->data, delq->data, NULL,
NULL, 0,
@@ -5829,6 +5836,7 @@ getTables(Archive *fout, int *numTables)
int i_partkeydef;
int i_ispartition;
int i_partbound;
+ int i_amname;
/*
* Find all the tables and table-like objects.
@@ -5914,7 +5922,7 @@ getTables(Archive *fout, int *numTables)
"tc.relfrozenxid AS tfrozenxid, "
"tc.relminmxid AS tminmxid, "
"c.relpersistence, c.relispopulated, "
- "c.relreplident, c.relpages, "
+ "c.relreplident, c.relpages, am.amname AS amname, "
"CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -5945,6 +5953,7 @@ getTables(Archive *fout, int *numTables)
"d.objsubid = 0 AND "
"d.refclassid = c.tableoid AND d.deptype IN ('a', 'i')) "
"LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid) "
+ "LEFT JOIN pg_am am ON (c.relam = am.oid) "
"LEFT JOIN pg_init_privs pip ON "
"(c.oid = pip.objoid "
"AND pip.classoid = 'pg_class'::regclass "
@@ -6412,6 +6421,7 @@ getTables(Archive *fout, int *numTables)
i_partkeydef = PQfnumber(res, "partkeydef");
i_ispartition = PQfnumber(res, "ispartition");
i_partbound = PQfnumber(res, "partbound");
+ i_amname = PQfnumber(res, "amname");
if (dopt->lockWaitTimeout)
{
@@ -6481,6 +6491,11 @@ getTables(Archive *fout, int *numTables)
else
tblinfo[i].checkoption = pg_strdup(PQgetvalue(res, i, i_checkoption));
tblinfo[i].toast_reloptions = pg_strdup(PQgetvalue(res, i, i_toastreloptions));
+ if (PQgetisnull(res, i, i_amname))
+ tblinfo[i].amname = NULL;
+ else
+ tblinfo[i].amname = pg_strdup(PQgetvalue(res, i, i_amname));
+
/* other fields were zeroed above */
@@ -9355,7 +9370,7 @@ dumpComment(Archive *fout, const char *type, const char *name,
* post-data.
*/
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- tag->data, namespace, NULL, owner,
+ tag->data, namespace, NULL, owner, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(dumpId), 1,
@@ -9423,7 +9438,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
ArchiveEntry(fout, nilCatalogId, createDumpId(),
tag->data,
tbinfo->dobj.namespace->dobj.name,
- NULL, tbinfo->rolname,
+ NULL, tbinfo->rolname, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -9447,7 +9462,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
ArchiveEntry(fout, nilCatalogId, createDumpId(),
tag->data,
tbinfo->dobj.namespace->dobj.name,
- NULL, tbinfo->rolname,
+ NULL, tbinfo->rolname, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -9728,7 +9743,7 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
TocEntry *te;
te = ArchiveEntry(fout, dobj->catId, dobj->dumpId,
- dobj->name, NULL, NULL, "",
+ dobj->name, NULL, NULL, "", NULL,
"BLOBS", SECTION_DATA,
"", "", NULL,
NULL, 0,
@@ -9802,7 +9817,7 @@ dumpNamespace(Archive *fout, NamespaceInfo *nspinfo)
ArchiveEntry(fout, nspinfo->dobj.catId, nspinfo->dobj.dumpId,
nspinfo->dobj.name,
NULL, NULL,
- nspinfo->rolname,
+ nspinfo->rolname, NULL,
"SCHEMA", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -9938,7 +9953,7 @@ dumpExtension(Archive *fout, ExtensionInfo *extinfo)
ArchiveEntry(fout, extinfo->dobj.catId, extinfo->dobj.dumpId,
extinfo->dobj.name,
NULL, NULL,
- "",
+ "", NULL,
"EXTENSION", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10090,6 +10105,7 @@ dumpEnumType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10217,6 +10233,7 @@ dumpRangeType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10290,6 +10307,7 @@ dumpUndefinedType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10572,6 +10590,7 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10729,6 +10748,7 @@ dumpDomain(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"DOMAIN", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10951,6 +10971,7 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -11085,7 +11106,7 @@ dumpCompositeTypeColComments(Archive *fout, TypeInfo *tyinfo)
ArchiveEntry(fout, nilCatalogId, createDumpId(),
target->data,
tyinfo->dobj.namespace->dobj.name,
- NULL, tyinfo->rolname,
+ NULL, tyinfo->rolname, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(tyinfo->dobj.dumpId), 1,
@@ -11142,6 +11163,7 @@ dumpShellType(Archive *fout, ShellTypeInfo *stinfo)
stinfo->dobj.namespace->dobj.name,
NULL,
stinfo->baseType->rolname,
+ NULL,
"SHELL TYPE", SECTION_PRE_DATA,
q->data, "", NULL,
NULL, 0,
@@ -11251,7 +11273,7 @@ dumpProcLang(Archive *fout, ProcLangInfo *plang)
if (plang->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, plang->dobj.catId, plang->dobj.dumpId,
plang->dobj.name,
- NULL, NULL, plang->lanowner,
+ NULL, NULL, plang->lanowner, NULL,
"PROCEDURAL LANGUAGE", SECTION_PRE_DATA,
defqry->data, delqry->data, NULL,
NULL, 0,
@@ -11924,6 +11946,7 @@ dumpFunc(Archive *fout, FuncInfo *finfo)
finfo->dobj.namespace->dobj.name,
NULL,
finfo->rolname,
+ NULL,
keyword, SECTION_PRE_DATA,
q->data, delqry->data, NULL,
NULL, 0,
@@ -12056,7 +12079,7 @@ dumpCast(Archive *fout, CastInfo *cast)
if (cast->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, cast->dobj.catId, cast->dobj.dumpId,
labelq->data,
- NULL, NULL, "",
+ NULL, NULL, "", NULL,
"CAST", SECTION_PRE_DATA,
defqry->data, delqry->data, NULL,
NULL, 0,
@@ -12184,7 +12207,7 @@ dumpTransform(Archive *fout, TransformInfo *transform)
if (transform->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, transform->dobj.catId, transform->dobj.dumpId,
labelq->data,
- NULL, NULL, "",
+ NULL, NULL, "", NULL,
"TRANSFORM", SECTION_PRE_DATA,
defqry->data, delqry->data, NULL,
transform->dobj.dependencies, transform->dobj.nDeps,
@@ -12400,6 +12423,7 @@ dumpOpr(Archive *fout, OprInfo *oprinfo)
oprinfo->dobj.namespace->dobj.name,
NULL,
oprinfo->rolname,
+ NULL,
"OPERATOR", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -12546,6 +12570,9 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
case AMTYPE_INDEX:
appendPQExpBuffer(q, "TYPE INDEX ");
break;
+ case AMTYPE_TABLE:
+ appendPQExpBuffer(q, "TYPE TABLE ");
+ break;
default:
write_msg(NULL, "WARNING: invalid type \"%c\" of access method \"%s\"\n",
aminfo->amtype, qamname);
@@ -12570,6 +12597,7 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
NULL,
NULL,
"",
+ NULL,
"ACCESS METHOD", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -12936,6 +12964,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
opcinfo->dobj.namespace->dobj.name,
NULL,
opcinfo->rolname,
+ NULL,
"OPERATOR CLASS", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -13203,6 +13232,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
opfinfo->dobj.namespace->dobj.name,
NULL,
opfinfo->rolname,
+ NULL,
"OPERATOR FAMILY", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -13346,6 +13376,7 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
collinfo->dobj.namespace->dobj.name,
NULL,
collinfo->rolname,
+ NULL,
"COLLATION", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -13441,6 +13472,7 @@ dumpConversion(Archive *fout, ConvInfo *convinfo)
convinfo->dobj.namespace->dobj.name,
NULL,
convinfo->rolname,
+ NULL,
"CONVERSION", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -13930,6 +13962,7 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
agginfo->aggfn.dobj.namespace->dobj.name,
NULL,
agginfo->aggfn.rolname,
+ NULL,
"AGGREGATE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14028,6 +14061,7 @@ dumpTSParser(Archive *fout, TSParserInfo *prsinfo)
prsinfo->dobj.namespace->dobj.name,
NULL,
"",
+ NULL,
"TEXT SEARCH PARSER", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14108,6 +14142,7 @@ dumpTSDictionary(Archive *fout, TSDictInfo *dictinfo)
dictinfo->dobj.namespace->dobj.name,
NULL,
dictinfo->rolname,
+ NULL,
"TEXT SEARCH DICTIONARY", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14169,6 +14204,7 @@ dumpTSTemplate(Archive *fout, TSTemplateInfo *tmplinfo)
tmplinfo->dobj.namespace->dobj.name,
NULL,
"",
+ NULL,
"TEXT SEARCH TEMPLATE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14289,6 +14325,7 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
cfginfo->dobj.namespace->dobj.name,
NULL,
cfginfo->rolname,
+ NULL,
"TEXT SEARCH CONFIGURATION", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14355,6 +14392,7 @@ dumpForeignDataWrapper(Archive *fout, FdwInfo *fdwinfo)
NULL,
NULL,
fdwinfo->rolname,
+ NULL,
"FOREIGN DATA WRAPPER", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14446,6 +14484,7 @@ dumpForeignServer(Archive *fout, ForeignServerInfo *srvinfo)
NULL,
NULL,
srvinfo->rolname,
+ NULL,
"SERVER", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14564,6 +14603,7 @@ dumpUserMappings(Archive *fout,
namespace,
NULL,
owner,
+ NULL,
"USER MAPPING", SECTION_PRE_DATA,
q->data, delq->data, NULL,
&dumpId, 1,
@@ -14643,6 +14683,7 @@ dumpDefaultACL(Archive *fout, DefaultACLInfo *daclinfo)
daclinfo->dobj.namespace ? daclinfo->dobj.namespace->dobj.name : NULL,
NULL,
daclinfo->defaclrole,
+ NULL,
"DEFAULT ACL", SECTION_POST_DATA,
q->data, "", NULL,
NULL, 0,
@@ -14741,6 +14782,7 @@ dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
tag->data, nspname,
NULL,
owner ? owner : "",
+ NULL,
"ACL", SECTION_NONE,
sql->data, "", NULL,
&(objDumpId), 1,
@@ -14826,7 +14868,7 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
appendPQExpBuffer(tag, "%s %s", type, name);
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- tag->data, namespace, NULL, owner,
+ tag->data, namespace, NULL, owner, NULL,
"SECURITY LABEL", SECTION_NONE,
query->data, "", NULL,
&(dumpId), 1,
@@ -14908,7 +14950,7 @@ dumpTableSecLabel(Archive *fout, TableInfo *tbinfo, const char *reltypename)
ArchiveEntry(fout, nilCatalogId, createDumpId(),
target->data,
tbinfo->dobj.namespace->dobj.name,
- NULL, tbinfo->rolname,
+ NULL, tbinfo->rolname, NULL,
"SECURITY LABEL", SECTION_NONE,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -15994,6 +16036,8 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
tbinfo->dobj.namespace->dobj.name,
(tbinfo->relkind == RELKIND_VIEW) ? NULL : tbinfo->reltablespace,
tbinfo->rolname,
+ (tbinfo->relkind == RELKIND_RELATION) ?
+ tbinfo->amname : NULL,
reltypename,
tbinfo->postponed_def ?
SECTION_POST_DATA : SECTION_PRE_DATA,
@@ -16074,6 +16118,7 @@ dumpAttrDef(Archive *fout, AttrDefInfo *adinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"DEFAULT", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16190,6 +16235,7 @@ dumpIndex(Archive *fout, IndxInfo *indxinfo)
tbinfo->dobj.namespace->dobj.name,
indxinfo->tablespace,
tbinfo->rolname,
+ NULL,
"INDEX", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16234,6 +16280,7 @@ dumpIndexAttach(Archive *fout, IndexAttachInfo *attachinfo)
attachinfo->dobj.namespace->dobj.name,
NULL,
"",
+ NULL,
"INDEX ATTACH", SECTION_POST_DATA,
q->data, "", NULL,
NULL, 0,
@@ -16289,6 +16336,7 @@ dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo)
statsextinfo->dobj.namespace->dobj.name,
NULL,
statsextinfo->rolname,
+ NULL,
"STATISTICS", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16450,6 +16498,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
tbinfo->dobj.namespace->dobj.name,
indxinfo->tablespace,
tbinfo->rolname,
+ NULL,
"CONSTRAINT", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16490,6 +16539,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"FK CONSTRAINT", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16522,6 +16572,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"CHECK CONSTRAINT", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16555,6 +16606,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"CHECK CONSTRAINT", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16829,6 +16881,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"SEQUENCE", SECTION_PRE_DATA,
query->data, delqry->data, NULL,
NULL, 0,
@@ -16870,6 +16923,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"SEQUENCE OWNED BY", SECTION_PRE_DATA,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -16938,6 +16992,7 @@ dumpSequenceData(Archive *fout, TableDataInfo *tdinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"SEQUENCE SET", SECTION_DATA,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -17137,6 +17192,7 @@ dumpTrigger(Archive *fout, TriggerInfo *tginfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"TRIGGER", SECTION_POST_DATA,
query->data, delqry->data, NULL,
NULL, 0,
@@ -17223,7 +17279,7 @@ dumpEventTrigger(Archive *fout, EventTriggerInfo *evtinfo)
if (evtinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, evtinfo->dobj.catId, evtinfo->dobj.dumpId,
evtinfo->dobj.name, NULL, NULL,
- evtinfo->evtowner,
+ evtinfo->evtowner, NULL,
"EVENT TRIGGER", SECTION_POST_DATA,
query->data, delqry->data, NULL,
NULL, 0,
@@ -17384,6 +17440,7 @@ dumpRule(Archive *fout, RuleInfo *rinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"RULE", SECTION_POST_DATA,
cmd->data, delcmd->data, NULL,
NULL, 0,
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 789d6a24e2..4024d0c1e3 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -324,6 +324,7 @@ typedef struct _tableInfo
char *partkeydef; /* partition key definition */
char *partbound; /* partition bound definition */
bool needs_override; /* has GENERATED ALWAYS AS IDENTITY */
+ char *amname; /* table access method */
/*
* Stuff computed only for dumpable tables.
Hi,
On 2019-01-12 01:35:06 +0100, Dmitry Dolgov wrote:
On Sat, Dec 15, 2018 at 8:37 PM Andres Freund <andres@anarazel.de> wrote:
We need to dump the table access method at dump time, otherwise we lose
that information.

As a result of the discussion in [1] (btw, thanks for starting it), here is
proposed solution with tracking current default_table_access_method. Next I'll
tackle similar issue for psql and probably add some tests for both patches.
Thanks!
+/*
+ * Set the proper default_table_access_method value for the table.
+ */
+static void
+_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
+{
+	PQExpBuffer cmd = createPQExpBuffer();
+	const char *want, *have;
+
+	have = AH->currTableAm;
+	want = tableam;
+
+	if (!want)
+		return;
+
+	if (have && strcmp(want, have) == 0)
+		return;
+
+	appendPQExpBuffer(cmd, "SET default_table_access_method = %s;", tableam);
This needs escaping, at the very least with "", but better with proper
routines for dealing with identifiers.
@@ -5914,7 +5922,7 @@ getTables(Archive *fout, int *numTables)
 		"tc.relfrozenxid AS tfrozenxid, "
 		"tc.relminmxid AS tminmxid, "
 		"c.relpersistence, c.relispopulated, "
-		"c.relreplident, c.relpages, "
+		"c.relreplident, c.relpages, am.amname AS amname, "
That AS doesn't do anything, does it?
/* other fields were zeroed above */
@@ -9355,7 +9370,7 @@ dumpComment(Archive *fout, const char *type, const char *name,
 	 * post-data.
 	 */
 	ArchiveEntry(fout, nilCatalogId, createDumpId(),
-				 tag->data, namespace, NULL, owner,
+				 tag->data, namespace, NULL, owner, NULL,
 				 "COMMENT", SECTION_NONE,
 				 query->data, "", NULL,
 				 &(dumpId), 1,
We really ought to move the arguments to a struct, so we don't generate
quite as many useless diffs whenever we do a change around one of
these...
Greetings,
Andres Freund
On Sat, Jan 12, 2019 at 1:44 AM Andres Freund <andres@anarazel.de> wrote:
+ appendPQExpBuffer(cmd, "SET default_table_access_method = %s;", tableam);
This needs escaping, at the very least with "", but better with proper
routines for dealing with identifiers.
Thanks for noticing, fixed.
@@ -5914,7 +5922,7 @@ getTables(Archive *fout, int *numTables)
 		"tc.relfrozenxid AS tfrozenxid, "
 		"tc.relminmxid AS tminmxid, "
 		"c.relpersistence, c.relispopulated, "
-		"c.relreplident, c.relpages, "
+		"c.relreplident, c.relpages, am.amname AS amname, "

That AS doesn't do anything, does it?

Right, I've renamed it a few times and forgot to get rid of it. Removed.
/* other fields were zeroed above */
@@ -9355,7 +9370,7 @@ dumpComment(Archive *fout, const char *type, const char *name,
 	 * post-data.
 	 */
 	ArchiveEntry(fout, nilCatalogId, createDumpId(),
-				 tag->data, namespace, NULL, owner,
+				 tag->data, namespace, NULL, owner, NULL,
 				 "COMMENT", SECTION_NONE,
 				 query->data, "", NULL,
 				 &(dumpId), 1,

We really ought to move the arguments to a struct, so we don't generate
quite as many useless diffs whenever we do a change around one of
these...
That's what I thought too. Maybe then I'll suggest a mini-patch against master to
refactor these arguments out into a separate struct, so we can leverage it here.
Attachments:
pg_dump_access_method_v2.patch
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 58bd3805f4..6f1b717e06 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -85,6 +85,7 @@ static void _becomeUser(ArchiveHandle *AH, const char *user);
static void _becomeOwner(ArchiveHandle *AH, TocEntry *te);
static void _selectOutputSchema(ArchiveHandle *AH, const char *schemaName);
static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
+static void _selectTableAccessMethod(ArchiveHandle *AH, const char *tableam);
static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
static void processSearchPathEntry(ArchiveHandle *AH, TocEntry *te);
@@ -1072,6 +1073,7 @@ ArchiveEntry(Archive *AHX,
const char *namespace,
const char *tablespace,
const char *owner,
+ const char *tableam,
const char *desc, teSection section,
const char *defn,
const char *dropStmt, const char *copyStmt,
@@ -1099,6 +1101,7 @@ ArchiveEntry(Archive *AHX,
newToc->tag = pg_strdup(tag);
newToc->namespace = namespace ? pg_strdup(namespace) : NULL;
newToc->tablespace = tablespace ? pg_strdup(tablespace) : NULL;
+ newToc->tableam = tableam ? pg_strdup(tableam) : NULL;
newToc->owner = pg_strdup(owner);
newToc->desc = pg_strdup(desc);
newToc->defn = pg_strdup(defn);
@@ -2367,6 +2370,7 @@ _allocAH(const char *FileSpec, const ArchiveFormat fmt,
AH->currUser = NULL; /* unknown */
AH->currSchema = NULL; /* ditto */
AH->currTablespace = NULL; /* ditto */
+ AH->currTableAm = NULL; /* ditto */
AH->toc = (TocEntry *) pg_malloc0(sizeof(TocEntry));
@@ -2594,6 +2598,7 @@ WriteToc(ArchiveHandle *AH)
WriteStr(AH, te->namespace);
WriteStr(AH, te->tablespace);
WriteStr(AH, te->owner);
+ WriteStr(AH, te->tableam);
WriteStr(AH, "false");
/* Dump list of dependencies */
@@ -2696,6 +2701,7 @@ ReadToc(ArchiveHandle *AH)
te->tablespace = ReadStr(AH);
te->owner = ReadStr(AH);
+ te->tableam = ReadStr(AH);
if (AH->version < K_VERS_1_9 || strcmp(ReadStr(AH), "true") == 0)
write_msg(modulename,
"WARNING: restoring tables WITH OIDS is not supported anymore");
@@ -3288,6 +3294,9 @@ _reconnectToDB(ArchiveHandle *AH, const char *dbname)
if (AH->currTablespace)
free(AH->currTablespace);
AH->currTablespace = NULL;
+ if (AH->currTableAm)
+ free(AH->currTableAm);
+ AH->currTableAm = NULL;
/* re-establish fixed state */
_doSetFixedOutputState(AH);
@@ -3448,6 +3457,48 @@ _selectTablespace(ArchiveHandle *AH, const char *tablespace)
destroyPQExpBuffer(qry);
}
+/*
+ * Set the proper default_table_access_method value for the table.
+ */
+static void
+_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
+{
+ PQExpBuffer cmd = createPQExpBuffer();
+ const char *want, *have;
+
+ have = AH->currTableAm;
+ want = tableam;
+
+ if (!want)
+ return;
+
+ if (have && strcmp(want, have) == 0)
+ return;
+
+
+ appendPQExpBuffer(cmd, "SET default_table_access_method = %s;", fmtId(want));
+
+ if (RestoringToDB(AH))
+ {
+ PGresult *res;
+
+ res = PQexec(AH->connection, cmd->data);
+
+ if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
+ warn_or_exit_horribly(AH, modulename,
+ "could not set default_table_access_method: %s",
+ PQerrorMessage(AH->connection));
+
+ PQclear(res);
+ }
+ else
+ ahprintf(AH, "%s\n\n", cmd->data);
+
+ destroyPQExpBuffer(cmd);
+
+ AH->currTableAm = pg_strdup(want);
+}
+
/*
* Extract an object description for a TOC entry, and append it to buf.
*
@@ -3547,6 +3598,7 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
_selectTablespace(AH, te->tablespace);
+ _selectTableAccessMethod(AH, te->tableam);
/* Emit header comment for item */
if (!AH->noTocComments)
@@ -4021,6 +4073,9 @@ restore_toc_entries_prefork(ArchiveHandle *AH, TocEntry *pending_list)
if (AH->currTablespace)
free(AH->currTablespace);
AH->currTablespace = NULL;
+ if (AH->currTableAm)
+ free(AH->currTableAm);
+ AH->currTableAm = NULL;
}
/*
@@ -4816,6 +4871,7 @@ CloneArchive(ArchiveHandle *AH)
clone->currUser = NULL;
clone->currSchema = NULL;
clone->currTablespace = NULL;
+ clone->currTableAm = NULL;
/* savedPassword must be local in case we change it while connecting */
if (clone->savedPassword)
@@ -4906,6 +4962,8 @@ DeCloneArchive(ArchiveHandle *AH)
free(AH->currSchema);
if (AH->currTablespace)
free(AH->currTablespace);
+ if (AH->currTableAm)
+ free(AH->currTableAm);
if (AH->savedPassword)
free(AH->savedPassword);
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 306d2ceba9..719065565b 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -347,6 +347,7 @@ struct _archiveHandle
char *currUser; /* current username, or NULL if unknown */
char *currSchema; /* current schema, or NULL */
char *currTablespace; /* current tablespace, or NULL */
+ char *currTableAm; /* current table access method, or NULL */
void *lo_buf;
size_t lo_buf_used;
@@ -373,6 +374,8 @@ struct _tocEntry
char *namespace; /* null or empty string if not in a schema */
char *tablespace; /* null if not in a tablespace; empty string
* means use database default */
+ char *tableam; /* table access method, only for TABLE tags */
+
char *owner;
char *desc;
char *defn;
@@ -410,7 +413,7 @@ extern TocEntry *ArchiveEntry(Archive *AHX,
CatalogId catalogId, DumpId dumpId,
const char *tag,
const char *namespace, const char *tablespace,
- const char *owner,
+ const char *owner, const char *amname,
const char *desc, teSection section,
const char *defn,
const char *dropStmt, const char *copyStmt,
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 637c79af48..512c486546 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2136,7 +2136,7 @@ dumpTableData(Archive *fout, TableDataInfo *tdinfo)
te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
tbinfo->dobj.name, tbinfo->dobj.namespace->dobj.name,
- NULL, tbinfo->rolname,
+ NULL, tbinfo->rolname, NULL,
"TABLE DATA", SECTION_DATA,
"", "", copyStmt,
&(tbinfo->dobj.dumpId), 1,
@@ -2188,6 +2188,7 @@ refreshMatViewData(Archive *fout, TableDataInfo *tdinfo)
tbinfo->dobj.namespace->dobj.name, /* Namespace */
NULL, /* Tablespace */
tbinfo->rolname, /* Owner */
+ NULL, /* Table access method */
"MATERIALIZED VIEW DATA", /* Desc */
SECTION_POST_DATA, /* Section */
q->data, /* Create */
@@ -2726,6 +2727,7 @@ dumpDatabase(Archive *fout)
NULL, /* Namespace */
NULL, /* Tablespace */
dba, /* Owner */
+ NULL, /* Table access method */
"DATABASE", /* Desc */
SECTION_PRE_DATA, /* Section */
creaQry->data, /* Create */
@@ -2762,7 +2764,7 @@ dumpDatabase(Archive *fout)
appendPQExpBufferStr(dbQry, ";\n");
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- labelq->data, NULL, NULL, dba,
+ labelq->data, NULL, NULL, dba, NULL,
"COMMENT", SECTION_NONE,
dbQry->data, "", NULL,
&(dbDumpId), 1,
@@ -2789,7 +2791,7 @@ dumpDatabase(Archive *fout)
emitShSecLabels(conn, shres, seclabelQry, "DATABASE", datname);
if (seclabelQry->len > 0)
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- labelq->data, NULL, NULL, dba,
+ labelq->data, NULL, NULL, dba, NULL,
"SECURITY LABEL", SECTION_NONE,
seclabelQry->data, "", NULL,
&(dbDumpId), 1,
@@ -2859,7 +2861,7 @@ dumpDatabase(Archive *fout)
if (creaQry->len > 0)
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- datname, NULL, NULL, dba,
+ datname, NULL, NULL, dba, NULL,
"DATABASE PROPERTIES", SECTION_PRE_DATA,
creaQry->data, delQry->data, NULL,
&(dbDumpId), 1,
@@ -2904,7 +2906,7 @@ dumpDatabase(Archive *fout)
atooid(PQgetvalue(lo_res, 0, i_relminmxid)),
LargeObjectRelationId);
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- "pg_largeobject", NULL, NULL, "",
+ "pg_largeobject", NULL, NULL, "", NULL,
"pg_largeobject", SECTION_PRE_DATA,
loOutQry->data, "", NULL,
NULL, 0,
@@ -3014,7 +3016,7 @@ dumpEncoding(Archive *AH)
appendPQExpBufferStr(qry, ";\n");
ArchiveEntry(AH, nilCatalogId, createDumpId(),
- "ENCODING", NULL, NULL, "",
+ "ENCODING", NULL, NULL, "", NULL,
"ENCODING", SECTION_PRE_DATA,
qry->data, "", NULL,
NULL, 0,
@@ -3041,7 +3043,7 @@ dumpStdStrings(Archive *AH)
stdstrings);
ArchiveEntry(AH, nilCatalogId, createDumpId(),
- "STDSTRINGS", NULL, NULL, "",
+ "STDSTRINGS", NULL, NULL, "", NULL,
"STDSTRINGS", SECTION_PRE_DATA,
qry->data, "", NULL,
NULL, 0,
@@ -3097,7 +3099,7 @@ dumpSearchPath(Archive *AH)
write_msg(NULL, "saving search_path = %s\n", path->data);
ArchiveEntry(AH, nilCatalogId, createDumpId(),
- "SEARCHPATH", NULL, NULL, "",
+ "SEARCHPATH", NULL, NULL, "", NULL,
"SEARCHPATH", SECTION_PRE_DATA,
qry->data, "", NULL,
NULL, 0,
@@ -3275,7 +3277,7 @@ dumpBlob(Archive *fout, BlobInfo *binfo)
ArchiveEntry(fout, binfo->dobj.catId, binfo->dobj.dumpId,
binfo->dobj.name,
NULL, NULL,
- binfo->rolname,
+ binfo->rolname, NULL,
"BLOB", SECTION_PRE_DATA,
cquery->data, dquery->data, NULL,
NULL, 0,
@@ -3581,6 +3583,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
polinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"ROW SECURITY", SECTION_POST_DATA,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -3637,6 +3640,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
polinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"POLICY", SECTION_POST_DATA,
query->data, delqry->data, NULL,
NULL, 0,
@@ -3811,6 +3815,7 @@ dumpPublication(Archive *fout, PublicationInfo *pubinfo)
NULL,
NULL,
pubinfo->rolname,
+ NULL,
"PUBLICATION", SECTION_POST_DATA,
query->data, delq->data, NULL,
NULL, 0,
@@ -3954,6 +3959,7 @@ dumpPublicationTable(Archive *fout, PublicationRelInfo *pubrinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
"",
+ NULL,
"PUBLICATION TABLE", SECTION_POST_DATA,
query->data, "", NULL,
NULL, 0,
@@ -4147,6 +4153,7 @@ dumpSubscription(Archive *fout, SubscriptionInfo *subinfo)
NULL,
NULL,
subinfo->rolname,
+ NULL,
"SUBSCRIPTION", SECTION_POST_DATA,
query->data, delq->data, NULL,
NULL, 0,
@@ -5829,6 +5836,7 @@ getTables(Archive *fout, int *numTables)
int i_partkeydef;
int i_ispartition;
int i_partbound;
+ int i_amname;
/*
* Find all the tables and table-like objects.
@@ -5914,7 +5922,7 @@ getTables(Archive *fout, int *numTables)
"tc.relfrozenxid AS tfrozenxid, "
"tc.relminmxid AS tminmxid, "
"c.relpersistence, c.relispopulated, "
- "c.relreplident, c.relpages, "
+ "c.relreplident, c.relpages, am.amname, "
"CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -5945,6 +5953,7 @@ getTables(Archive *fout, int *numTables)
"d.objsubid = 0 AND "
"d.refclassid = c.tableoid AND d.deptype IN ('a', 'i')) "
"LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid) "
+ "LEFT JOIN pg_am am ON (c.relam = am.oid) "
"LEFT JOIN pg_init_privs pip ON "
"(c.oid = pip.objoid "
"AND pip.classoid = 'pg_class'::regclass "
@@ -6412,6 +6421,7 @@ getTables(Archive *fout, int *numTables)
i_partkeydef = PQfnumber(res, "partkeydef");
i_ispartition = PQfnumber(res, "ispartition");
i_partbound = PQfnumber(res, "partbound");
+ i_amname = PQfnumber(res, "amname");
if (dopt->lockWaitTimeout)
{
@@ -6481,6 +6491,11 @@ getTables(Archive *fout, int *numTables)
else
tblinfo[i].checkoption = pg_strdup(PQgetvalue(res, i, i_checkoption));
tblinfo[i].toast_reloptions = pg_strdup(PQgetvalue(res, i, i_toastreloptions));
+ if (PQgetisnull(res, i, i_amname))
+ tblinfo[i].amname = NULL;
+ else
+ tblinfo[i].amname = pg_strdup(PQgetvalue(res, i, i_amname));
+
/* other fields were zeroed above */
@@ -9355,7 +9370,7 @@ dumpComment(Archive *fout, const char *type, const char *name,
* post-data.
*/
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- tag->data, namespace, NULL, owner,
+ tag->data, namespace, NULL, owner, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(dumpId), 1,
@@ -9423,7 +9438,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
ArchiveEntry(fout, nilCatalogId, createDumpId(),
tag->data,
tbinfo->dobj.namespace->dobj.name,
- NULL, tbinfo->rolname,
+ NULL, tbinfo->rolname, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -9447,7 +9462,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
ArchiveEntry(fout, nilCatalogId, createDumpId(),
tag->data,
tbinfo->dobj.namespace->dobj.name,
- NULL, tbinfo->rolname,
+ NULL, tbinfo->rolname, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -9728,7 +9743,7 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
TocEntry *te;
te = ArchiveEntry(fout, dobj->catId, dobj->dumpId,
- dobj->name, NULL, NULL, "",
+ dobj->name, NULL, NULL, "", NULL,
"BLOBS", SECTION_DATA,
"", "", NULL,
NULL, 0,
@@ -9802,7 +9817,7 @@ dumpNamespace(Archive *fout, NamespaceInfo *nspinfo)
ArchiveEntry(fout, nspinfo->dobj.catId, nspinfo->dobj.dumpId,
nspinfo->dobj.name,
NULL, NULL,
- nspinfo->rolname,
+ nspinfo->rolname, NULL,
"SCHEMA", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -9938,7 +9953,7 @@ dumpExtension(Archive *fout, ExtensionInfo *extinfo)
ArchiveEntry(fout, extinfo->dobj.catId, extinfo->dobj.dumpId,
extinfo->dobj.name,
NULL, NULL,
- "",
+ "", NULL,
"EXTENSION", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10090,6 +10105,7 @@ dumpEnumType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10217,6 +10233,7 @@ dumpRangeType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10290,6 +10307,7 @@ dumpUndefinedType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10572,6 +10590,7 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10729,6 +10748,7 @@ dumpDomain(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"DOMAIN", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10951,6 +10971,7 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -11085,7 +11106,7 @@ dumpCompositeTypeColComments(Archive *fout, TypeInfo *tyinfo)
ArchiveEntry(fout, nilCatalogId, createDumpId(),
target->data,
tyinfo->dobj.namespace->dobj.name,
- NULL, tyinfo->rolname,
+ NULL, tyinfo->rolname, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(tyinfo->dobj.dumpId), 1,
@@ -11142,6 +11163,7 @@ dumpShellType(Archive *fout, ShellTypeInfo *stinfo)
stinfo->dobj.namespace->dobj.name,
NULL,
stinfo->baseType->rolname,
+ NULL,
"SHELL TYPE", SECTION_PRE_DATA,
q->data, "", NULL,
NULL, 0,
@@ -11251,7 +11273,7 @@ dumpProcLang(Archive *fout, ProcLangInfo *plang)
if (plang->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, plang->dobj.catId, plang->dobj.dumpId,
plang->dobj.name,
- NULL, NULL, plang->lanowner,
+ NULL, NULL, plang->lanowner, NULL,
"PROCEDURAL LANGUAGE", SECTION_PRE_DATA,
defqry->data, delqry->data, NULL,
NULL, 0,
@@ -11924,6 +11946,7 @@ dumpFunc(Archive *fout, FuncInfo *finfo)
finfo->dobj.namespace->dobj.name,
NULL,
finfo->rolname,
+ NULL,
keyword, SECTION_PRE_DATA,
q->data, delqry->data, NULL,
NULL, 0,
@@ -12056,7 +12079,7 @@ dumpCast(Archive *fout, CastInfo *cast)
if (cast->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, cast->dobj.catId, cast->dobj.dumpId,
labelq->data,
- NULL, NULL, "",
+ NULL, NULL, "", NULL,
"CAST", SECTION_PRE_DATA,
defqry->data, delqry->data, NULL,
NULL, 0,
@@ -12184,7 +12207,7 @@ dumpTransform(Archive *fout, TransformInfo *transform)
if (transform->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, transform->dobj.catId, transform->dobj.dumpId,
labelq->data,
- NULL, NULL, "",
+ NULL, NULL, "", NULL,
"TRANSFORM", SECTION_PRE_DATA,
defqry->data, delqry->data, NULL,
transform->dobj.dependencies, transform->dobj.nDeps,
@@ -12400,6 +12423,7 @@ dumpOpr(Archive *fout, OprInfo *oprinfo)
oprinfo->dobj.namespace->dobj.name,
NULL,
oprinfo->rolname,
+ NULL,
"OPERATOR", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -12546,6 +12570,9 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
case AMTYPE_INDEX:
appendPQExpBuffer(q, "TYPE INDEX ");
break;
+ case AMTYPE_TABLE:
+ appendPQExpBuffer(q, "TYPE TABLE ");
+ break;
default:
write_msg(NULL, "WARNING: invalid type \"%c\" of access method \"%s\"\n",
aminfo->amtype, qamname);
@@ -12570,6 +12597,7 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
NULL,
NULL,
"",
+ NULL,
"ACCESS METHOD", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -12936,6 +12964,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
opcinfo->dobj.namespace->dobj.name,
NULL,
opcinfo->rolname,
+ NULL,
"OPERATOR CLASS", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -13203,6 +13232,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
opfinfo->dobj.namespace->dobj.name,
NULL,
opfinfo->rolname,
+ NULL,
"OPERATOR FAMILY", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -13346,6 +13376,7 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
collinfo->dobj.namespace->dobj.name,
NULL,
collinfo->rolname,
+ NULL,
"COLLATION", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -13441,6 +13472,7 @@ dumpConversion(Archive *fout, ConvInfo *convinfo)
convinfo->dobj.namespace->dobj.name,
NULL,
convinfo->rolname,
+ NULL,
"CONVERSION", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -13930,6 +13962,7 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
agginfo->aggfn.dobj.namespace->dobj.name,
NULL,
agginfo->aggfn.rolname,
+ NULL,
"AGGREGATE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14028,6 +14061,7 @@ dumpTSParser(Archive *fout, TSParserInfo *prsinfo)
prsinfo->dobj.namespace->dobj.name,
NULL,
"",
+ NULL,
"TEXT SEARCH PARSER", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14108,6 +14142,7 @@ dumpTSDictionary(Archive *fout, TSDictInfo *dictinfo)
dictinfo->dobj.namespace->dobj.name,
NULL,
dictinfo->rolname,
+ NULL,
"TEXT SEARCH DICTIONARY", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14169,6 +14204,7 @@ dumpTSTemplate(Archive *fout, TSTemplateInfo *tmplinfo)
tmplinfo->dobj.namespace->dobj.name,
NULL,
"",
+ NULL,
"TEXT SEARCH TEMPLATE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14289,6 +14325,7 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
cfginfo->dobj.namespace->dobj.name,
NULL,
cfginfo->rolname,
+ NULL,
"TEXT SEARCH CONFIGURATION", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14355,6 +14392,7 @@ dumpForeignDataWrapper(Archive *fout, FdwInfo *fdwinfo)
NULL,
NULL,
fdwinfo->rolname,
+ NULL,
"FOREIGN DATA WRAPPER", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14446,6 +14484,7 @@ dumpForeignServer(Archive *fout, ForeignServerInfo *srvinfo)
NULL,
NULL,
srvinfo->rolname,
+ NULL,
"SERVER", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14564,6 +14603,7 @@ dumpUserMappings(Archive *fout,
namespace,
NULL,
owner,
+ NULL,
"USER MAPPING", SECTION_PRE_DATA,
q->data, delq->data, NULL,
&dumpId, 1,
@@ -14643,6 +14683,7 @@ dumpDefaultACL(Archive *fout, DefaultACLInfo *daclinfo)
daclinfo->dobj.namespace ? daclinfo->dobj.namespace->dobj.name : NULL,
NULL,
daclinfo->defaclrole,
+ NULL,
"DEFAULT ACL", SECTION_POST_DATA,
q->data, "", NULL,
NULL, 0,
@@ -14741,6 +14782,7 @@ dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
tag->data, nspname,
NULL,
owner ? owner : "",
+ NULL,
"ACL", SECTION_NONE,
sql->data, "", NULL,
&(objDumpId), 1,
@@ -14826,7 +14868,7 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
appendPQExpBuffer(tag, "%s %s", type, name);
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- tag->data, namespace, NULL, owner,
+ tag->data, namespace, NULL, owner, NULL,
"SECURITY LABEL", SECTION_NONE,
query->data, "", NULL,
&(dumpId), 1,
@@ -14908,7 +14950,7 @@ dumpTableSecLabel(Archive *fout, TableInfo *tbinfo, const char *reltypename)
ArchiveEntry(fout, nilCatalogId, createDumpId(),
target->data,
tbinfo->dobj.namespace->dobj.name,
- NULL, tbinfo->rolname,
+ NULL, tbinfo->rolname, NULL,
"SECURITY LABEL", SECTION_NONE,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -15994,6 +16036,8 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
tbinfo->dobj.namespace->dobj.name,
(tbinfo->relkind == RELKIND_VIEW) ? NULL : tbinfo->reltablespace,
tbinfo->rolname,
+ (tbinfo->relkind == RELKIND_RELATION) ?
+ tbinfo->amname : NULL,
reltypename,
tbinfo->postponed_def ?
SECTION_POST_DATA : SECTION_PRE_DATA,
@@ -16074,6 +16118,7 @@ dumpAttrDef(Archive *fout, AttrDefInfo *adinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"DEFAULT", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16190,6 +16235,7 @@ dumpIndex(Archive *fout, IndxInfo *indxinfo)
tbinfo->dobj.namespace->dobj.name,
indxinfo->tablespace,
tbinfo->rolname,
+ NULL,
"INDEX", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16234,6 +16280,7 @@ dumpIndexAttach(Archive *fout, IndexAttachInfo *attachinfo)
attachinfo->dobj.namespace->dobj.name,
NULL,
"",
+ NULL,
"INDEX ATTACH", SECTION_POST_DATA,
q->data, "", NULL,
NULL, 0,
@@ -16289,6 +16336,7 @@ dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo)
statsextinfo->dobj.namespace->dobj.name,
NULL,
statsextinfo->rolname,
+ NULL,
"STATISTICS", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16450,6 +16498,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
tbinfo->dobj.namespace->dobj.name,
indxinfo->tablespace,
tbinfo->rolname,
+ NULL,
"CONSTRAINT", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16490,6 +16539,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"FK CONSTRAINT", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16522,6 +16572,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"CHECK CONSTRAINT", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16555,6 +16606,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"CHECK CONSTRAINT", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16829,6 +16881,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"SEQUENCE", SECTION_PRE_DATA,
query->data, delqry->data, NULL,
NULL, 0,
@@ -16870,6 +16923,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"SEQUENCE OWNED BY", SECTION_PRE_DATA,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -16938,6 +16992,7 @@ dumpSequenceData(Archive *fout, TableDataInfo *tdinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"SEQUENCE SET", SECTION_DATA,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -17137,6 +17192,7 @@ dumpTrigger(Archive *fout, TriggerInfo *tginfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"TRIGGER", SECTION_POST_DATA,
query->data, delqry->data, NULL,
NULL, 0,
@@ -17223,7 +17279,7 @@ dumpEventTrigger(Archive *fout, EventTriggerInfo *evtinfo)
if (evtinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, evtinfo->dobj.catId, evtinfo->dobj.dumpId,
evtinfo->dobj.name, NULL, NULL,
- evtinfo->evtowner,
+ evtinfo->evtowner, NULL,
"EVENT TRIGGER", SECTION_POST_DATA,
query->data, delqry->data, NULL,
NULL, 0,
@@ -17384,6 +17440,7 @@ dumpRule(Archive *fout, RuleInfo *rinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"RULE", SECTION_POST_DATA,
cmd->data, delcmd->data, NULL,
NULL, 0,
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 789d6a24e2..4024d0c1e3 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -324,6 +324,7 @@ typedef struct _tableInfo
char *partkeydef; /* partition key definition */
char *partbound; /* partition bound definition */
bool needs_override; /* has GENERATED ALWAYS AS IDENTITY */
+ char *amname; /* table access method */
/*
* Stuff computed only for dumpable tables.
Thanks for the patch updates.
A few comments so far from me:
+static void _selectTableAccessMethod(ArchiveHandle *AH, const char
*tablespace);
tablespace => tableam
+_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
+{
+ PQExpBuffer cmd = createPQExpBuffer();
createPQExpBuffer() should be moved after the below statement, so that
it does not leak memory:
if (have && strcmp(want, have) == 0)
return;
char *tableam; /* table access method, onlyt for TABLE tags */
Indentation is a bit misaligned. onlyt => only
@@ -2696,6 +2701,7 @@ ReadToc(ArchiveHandle *AH)
te->tablespace = ReadStr(AH);
te->owner = ReadStr(AH);
+ te->tableam = ReadStr(AH);
Above, I am not sure about this, but possibly we may need to have an
archive-version check like the one done for tablespace:
if (AH->version >= K_VERS_1_10)
te->tablespace = ReadStr(AH);
So how about bumping up the archive version and doing these checks?
Otherwise, if we run pg_restore using an old version, we may read some
junk into te->tableam, or possibly crash. As I said, I am not sure
about this due to my lack of a clear understanding of archive versioning,
but let me know if you indeed find this issue to be true.
On Mon, Jan 14, 2019 at 2:07 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
createPQExpBuffer() should be moved after the below statement, so that
it does not leak memory
Thanks for noticing, fixed.
So how about bumping up the archive version and doing these checks ?
Yeah, you're right, I've added this check.
Attachments:
pg_dump_access_method_v3.patch (application/octet-stream)
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 58bd3805f4..aadeacf95d 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -85,6 +85,7 @@ static void _becomeUser(ArchiveHandle *AH, const char *user);
static void _becomeOwner(ArchiveHandle *AH, TocEntry *te);
static void _selectOutputSchema(ArchiveHandle *AH, const char *schemaName);
static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
+static void _selectTableAccessMethod(ArchiveHandle *AH, const char *tablespace);
static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
static void processSearchPathEntry(ArchiveHandle *AH, TocEntry *te);
@@ -1072,6 +1073,7 @@ ArchiveEntry(Archive *AHX,
const char *namespace,
const char *tablespace,
const char *owner,
+ const char *tableam,
const char *desc, teSection section,
const char *defn,
const char *dropStmt, const char *copyStmt,
@@ -1099,6 +1101,7 @@ ArchiveEntry(Archive *AHX,
newToc->tag = pg_strdup(tag);
newToc->namespace = namespace ? pg_strdup(namespace) : NULL;
newToc->tablespace = tablespace ? pg_strdup(tablespace) : NULL;
+ newToc->tableam = tableam ? pg_strdup(tableam) : NULL;
newToc->owner = pg_strdup(owner);
newToc->desc = pg_strdup(desc);
newToc->defn = pg_strdup(defn);
@@ -2367,6 +2370,7 @@ _allocAH(const char *FileSpec, const ArchiveFormat fmt,
AH->currUser = NULL; /* unknown */
AH->currSchema = NULL; /* ditto */
AH->currTablespace = NULL; /* ditto */
+ AH->currTableAm = NULL; /* ditto */
AH->toc = (TocEntry *) pg_malloc0(sizeof(TocEntry));
@@ -2594,6 +2598,7 @@ WriteToc(ArchiveHandle *AH)
WriteStr(AH, te->namespace);
WriteStr(AH, te->tablespace);
WriteStr(AH, te->owner);
+ WriteStr(AH, te->tableam);
WriteStr(AH, "false");
/* Dump list of dependencies */
@@ -2696,6 +2701,9 @@ ReadToc(ArchiveHandle *AH)
te->tablespace = ReadStr(AH);
te->owner = ReadStr(AH);
+ if (AH->version >= K_VERS_1_14)
+ te->tableam = ReadStr(AH);
+
if (AH->version < K_VERS_1_9 || strcmp(ReadStr(AH), "true") == 0)
write_msg(modulename,
"WARNING: restoring tables WITH OIDS is not supported anymore");
@@ -3288,6 +3296,9 @@ _reconnectToDB(ArchiveHandle *AH, const char *dbname)
if (AH->currTablespace)
free(AH->currTablespace);
AH->currTablespace = NULL;
+ if (AH->currTableAm)
+ free(AH->currTableAm);
+ AH->currTableAm = NULL;
/* re-establish fixed state */
_doSetFixedOutputState(AH);
@@ -3448,6 +3459,48 @@ _selectTablespace(ArchiveHandle *AH, const char *tablespace)
destroyPQExpBuffer(qry);
}
+/*
+ * Set the proper default_table_access_method value for the table.
+ */
+static void
+_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
+{
+ PQExpBuffer cmd;
+ const char *want, *have;
+
+ have = AH->currTableAm;
+ want = tableam;
+
+ if (!want)
+ return;
+
+ if (have && strcmp(want, have) == 0)
+ return;
+
+ cmd = createPQExpBuffer();
+ appendPQExpBuffer(cmd, "SET default_table_access_method = %s;", fmtId(want));
+
+ if (RestoringToDB(AH))
+ {
+ PGresult *res;
+
+ res = PQexec(AH->connection, cmd->data);
+
+ if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
+ warn_or_exit_horribly(AH, modulename,
+ "could not set default_table_access_method: %s",
+ PQerrorMessage(AH->connection));
+
+ PQclear(res);
+ }
+ else
+ ahprintf(AH, "%s\n\n", cmd->data);
+
+ destroyPQExpBuffer(cmd);
+
+ AH->currTableAm = pg_strdup(want);
+}
+
/*
* Extract an object description for a TOC entry, and append it to buf.
*
@@ -3547,6 +3600,7 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
_selectTablespace(AH, te->tablespace);
+ _selectTableAccessMethod(AH, te->tableam);
/* Emit header comment for item */
if (!AH->noTocComments)
@@ -4021,6 +4075,9 @@ restore_toc_entries_prefork(ArchiveHandle *AH, TocEntry *pending_list)
if (AH->currTablespace)
free(AH->currTablespace);
AH->currTablespace = NULL;
+ if (AH->currTableAm)
+ free(AH->currTableAm);
+ AH->currTableAm = NULL;
}
/*
@@ -4816,6 +4873,7 @@ CloneArchive(ArchiveHandle *AH)
clone->currUser = NULL;
clone->currSchema = NULL;
clone->currTablespace = NULL;
+ clone->currTableAm = NULL;
/* savedPassword must be local in case we change it while connecting */
if (clone->savedPassword)
@@ -4906,6 +4964,8 @@ DeCloneArchive(ArchiveHandle *AH)
free(AH->currSchema);
if (AH->currTablespace)
free(AH->currTablespace);
+ if (AH->currTableAm)
+ free(AH->currTableAm);
if (AH->savedPassword)
free(AH->savedPassword);
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 306d2ceba9..8131ff0a3d 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -94,6 +94,7 @@ typedef z_stream *z_streamp;
* entries */
#define K_VERS_1_13 MAKE_ARCHIVE_VERSION(1, 13, 0) /* change search_path
* behavior */
+#define K_VERS_1_14 MAKE_ARCHIVE_VERSION(1, 14, 0) /* add tableam */
/* Current archive version number (the format we can output) */
#define K_VERS_MAJOR 1
@@ -347,6 +348,7 @@ struct _archiveHandle
char *currUser; /* current username, or NULL if unknown */
char *currSchema; /* current schema, or NULL */
char *currTablespace; /* current tablespace, or NULL */
+ char *currTableAm; /* current table access method, or NULL */
void *lo_buf;
size_t lo_buf_used;
@@ -373,6 +375,8 @@ struct _tocEntry
char *namespace; /* null or empty string if not in a schema */
char *tablespace; /* null if not in a tablespace; empty string
* means use database default */
+ char *tableam; /* table access method, only for TABLE tags */
+
char *owner;
char *desc;
char *defn;
@@ -410,7 +414,7 @@ extern TocEntry *ArchiveEntry(Archive *AHX,
CatalogId catalogId, DumpId dumpId,
const char *tag,
const char *namespace, const char *tablespace,
- const char *owner,
+ const char *owner, const char *amname,
const char *desc, teSection section,
const char *defn,
const char *dropStmt, const char *copyStmt,
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 637c79af48..512c486546 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2136,7 +2136,7 @@ dumpTableData(Archive *fout, TableDataInfo *tdinfo)
te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
tbinfo->dobj.name, tbinfo->dobj.namespace->dobj.name,
- NULL, tbinfo->rolname,
+ NULL, tbinfo->rolname, NULL,
"TABLE DATA", SECTION_DATA,
"", "", copyStmt,
&(tbinfo->dobj.dumpId), 1,
@@ -2188,6 +2188,7 @@ refreshMatViewData(Archive *fout, TableDataInfo *tdinfo)
tbinfo->dobj.namespace->dobj.name, /* Namespace */
NULL, /* Tablespace */
tbinfo->rolname, /* Owner */
+ NULL, /* Table access method */
"MATERIALIZED VIEW DATA", /* Desc */
SECTION_POST_DATA, /* Section */
q->data, /* Create */
@@ -2726,6 +2727,7 @@ dumpDatabase(Archive *fout)
NULL, /* Namespace */
NULL, /* Tablespace */
dba, /* Owner */
+ NULL, /* Table access method */
"DATABASE", /* Desc */
SECTION_PRE_DATA, /* Section */
creaQry->data, /* Create */
@@ -2762,7 +2764,7 @@ dumpDatabase(Archive *fout)
appendPQExpBufferStr(dbQry, ";\n");
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- labelq->data, NULL, NULL, dba,
+ labelq->data, NULL, NULL, dba, NULL,
"COMMENT", SECTION_NONE,
dbQry->data, "", NULL,
&(dbDumpId), 1,
@@ -2789,7 +2791,7 @@ dumpDatabase(Archive *fout)
emitShSecLabels(conn, shres, seclabelQry, "DATABASE", datname);
if (seclabelQry->len > 0)
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- labelq->data, NULL, NULL, dba,
+ labelq->data, NULL, NULL, dba, NULL,
"SECURITY LABEL", SECTION_NONE,
seclabelQry->data, "", NULL,
&(dbDumpId), 1,
@@ -2859,7 +2861,7 @@ dumpDatabase(Archive *fout)
if (creaQry->len > 0)
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- datname, NULL, NULL, dba,
+ datname, NULL, NULL, dba, NULL,
"DATABASE PROPERTIES", SECTION_PRE_DATA,
creaQry->data, delQry->data, NULL,
&(dbDumpId), 1,
@@ -2904,7 +2906,7 @@ dumpDatabase(Archive *fout)
atooid(PQgetvalue(lo_res, 0, i_relminmxid)),
LargeObjectRelationId);
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- "pg_largeobject", NULL, NULL, "",
+ "pg_largeobject", NULL, NULL, "", NULL,
"pg_largeobject", SECTION_PRE_DATA,
loOutQry->data, "", NULL,
NULL, 0,
@@ -3014,7 +3016,7 @@ dumpEncoding(Archive *AH)
appendPQExpBufferStr(qry, ";\n");
ArchiveEntry(AH, nilCatalogId, createDumpId(),
- "ENCODING", NULL, NULL, "",
+ "ENCODING", NULL, NULL, "", NULL,
"ENCODING", SECTION_PRE_DATA,
qry->data, "", NULL,
NULL, 0,
@@ -3041,7 +3043,7 @@ dumpStdStrings(Archive *AH)
stdstrings);
ArchiveEntry(AH, nilCatalogId, createDumpId(),
- "STDSTRINGS", NULL, NULL, "",
+ "STDSTRINGS", NULL, NULL, "", NULL,
"STDSTRINGS", SECTION_PRE_DATA,
qry->data, "", NULL,
NULL, 0,
@@ -3097,7 +3099,7 @@ dumpSearchPath(Archive *AH)
write_msg(NULL, "saving search_path = %s\n", path->data);
ArchiveEntry(AH, nilCatalogId, createDumpId(),
- "SEARCHPATH", NULL, NULL, "",
+ "SEARCHPATH", NULL, NULL, "", NULL,
"SEARCHPATH", SECTION_PRE_DATA,
qry->data, "", NULL,
NULL, 0,
@@ -3275,7 +3277,7 @@ dumpBlob(Archive *fout, BlobInfo *binfo)
ArchiveEntry(fout, binfo->dobj.catId, binfo->dobj.dumpId,
binfo->dobj.name,
NULL, NULL,
- binfo->rolname,
+ binfo->rolname, NULL,
"BLOB", SECTION_PRE_DATA,
cquery->data, dquery->data, NULL,
NULL, 0,
@@ -3581,6 +3583,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
polinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"ROW SECURITY", SECTION_POST_DATA,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -3637,6 +3640,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
polinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"POLICY", SECTION_POST_DATA,
query->data, delqry->data, NULL,
NULL, 0,
@@ -3811,6 +3815,7 @@ dumpPublication(Archive *fout, PublicationInfo *pubinfo)
NULL,
NULL,
pubinfo->rolname,
+ NULL,
"PUBLICATION", SECTION_POST_DATA,
query->data, delq->data, NULL,
NULL, 0,
@@ -3954,6 +3959,7 @@ dumpPublicationTable(Archive *fout, PublicationRelInfo *pubrinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
"",
+ NULL,
"PUBLICATION TABLE", SECTION_POST_DATA,
query->data, "", NULL,
NULL, 0,
@@ -4147,6 +4153,7 @@ dumpSubscription(Archive *fout, SubscriptionInfo *subinfo)
NULL,
NULL,
subinfo->rolname,
+ NULL,
"SUBSCRIPTION", SECTION_POST_DATA,
query->data, delq->data, NULL,
NULL, 0,
@@ -5829,6 +5836,7 @@ getTables(Archive *fout, int *numTables)
int i_partkeydef;
int i_ispartition;
int i_partbound;
+ int i_amname;
/*
* Find all the tables and table-like objects.
@@ -5914,7 +5922,7 @@ getTables(Archive *fout, int *numTables)
"tc.relfrozenxid AS tfrozenxid, "
"tc.relminmxid AS tminmxid, "
"c.relpersistence, c.relispopulated, "
- "c.relreplident, c.relpages, "
+ "c.relreplident, c.relpages, am.amname, "
"CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -5945,6 +5953,7 @@ getTables(Archive *fout, int *numTables)
"d.objsubid = 0 AND "
"d.refclassid = c.tableoid AND d.deptype IN ('a', 'i')) "
"LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid) "
+ "LEFT JOIN pg_am am ON (c.relam = am.oid) "
"LEFT JOIN pg_init_privs pip ON "
"(c.oid = pip.objoid "
"AND pip.classoid = 'pg_class'::regclass "
@@ -6412,6 +6421,7 @@ getTables(Archive *fout, int *numTables)
i_partkeydef = PQfnumber(res, "partkeydef");
i_ispartition = PQfnumber(res, "ispartition");
i_partbound = PQfnumber(res, "partbound");
+ i_amname = PQfnumber(res, "amname");
if (dopt->lockWaitTimeout)
{
@@ -6481,6 +6491,11 @@ getTables(Archive *fout, int *numTables)
else
tblinfo[i].checkoption = pg_strdup(PQgetvalue(res, i, i_checkoption));
tblinfo[i].toast_reloptions = pg_strdup(PQgetvalue(res, i, i_toastreloptions));
+ if (PQgetisnull(res, i, i_amname))
+ tblinfo[i].amname = NULL;
+ else
+ tblinfo[i].amname = pg_strdup(PQgetvalue(res, i, i_amname));
+
/* other fields were zeroed above */
@@ -9355,7 +9370,7 @@ dumpComment(Archive *fout, const char *type, const char *name,
* post-data.
*/
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- tag->data, namespace, NULL, owner,
+ tag->data, namespace, NULL, owner, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(dumpId), 1,
@@ -9423,7 +9438,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
ArchiveEntry(fout, nilCatalogId, createDumpId(),
tag->data,
tbinfo->dobj.namespace->dobj.name,
- NULL, tbinfo->rolname,
+ NULL, tbinfo->rolname, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -9447,7 +9462,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
ArchiveEntry(fout, nilCatalogId, createDumpId(),
tag->data,
tbinfo->dobj.namespace->dobj.name,
- NULL, tbinfo->rolname,
+ NULL, tbinfo->rolname, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -9728,7 +9743,7 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
TocEntry *te;
te = ArchiveEntry(fout, dobj->catId, dobj->dumpId,
- dobj->name, NULL, NULL, "",
+ dobj->name, NULL, NULL, "", NULL,
"BLOBS", SECTION_DATA,
"", "", NULL,
NULL, 0,
@@ -9802,7 +9817,7 @@ dumpNamespace(Archive *fout, NamespaceInfo *nspinfo)
ArchiveEntry(fout, nspinfo->dobj.catId, nspinfo->dobj.dumpId,
nspinfo->dobj.name,
NULL, NULL,
- nspinfo->rolname,
+ nspinfo->rolname, NULL,
"SCHEMA", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -9938,7 +9953,7 @@ dumpExtension(Archive *fout, ExtensionInfo *extinfo)
ArchiveEntry(fout, extinfo->dobj.catId, extinfo->dobj.dumpId,
extinfo->dobj.name,
NULL, NULL,
- "",
+ "", NULL,
"EXTENSION", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10090,6 +10105,7 @@ dumpEnumType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10217,6 +10233,7 @@ dumpRangeType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10290,6 +10307,7 @@ dumpUndefinedType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10572,6 +10590,7 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10729,6 +10748,7 @@ dumpDomain(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"DOMAIN", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10951,6 +10971,7 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -11085,7 +11106,7 @@ dumpCompositeTypeColComments(Archive *fout, TypeInfo *tyinfo)
ArchiveEntry(fout, nilCatalogId, createDumpId(),
target->data,
tyinfo->dobj.namespace->dobj.name,
- NULL, tyinfo->rolname,
+ NULL, tyinfo->rolname, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(tyinfo->dobj.dumpId), 1,
@@ -11142,6 +11163,7 @@ dumpShellType(Archive *fout, ShellTypeInfo *stinfo)
stinfo->dobj.namespace->dobj.name,
NULL,
stinfo->baseType->rolname,
+ NULL,
"SHELL TYPE", SECTION_PRE_DATA,
q->data, "", NULL,
NULL, 0,
@@ -11251,7 +11273,7 @@ dumpProcLang(Archive *fout, ProcLangInfo *plang)
if (plang->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, plang->dobj.catId, plang->dobj.dumpId,
plang->dobj.name,
- NULL, NULL, plang->lanowner,
+ NULL, NULL, plang->lanowner, NULL,
"PROCEDURAL LANGUAGE", SECTION_PRE_DATA,
defqry->data, delqry->data, NULL,
NULL, 0,
@@ -11924,6 +11946,7 @@ dumpFunc(Archive *fout, FuncInfo *finfo)
finfo->dobj.namespace->dobj.name,
NULL,
finfo->rolname,
+ NULL,
keyword, SECTION_PRE_DATA,
q->data, delqry->data, NULL,
NULL, 0,
@@ -12056,7 +12079,7 @@ dumpCast(Archive *fout, CastInfo *cast)
if (cast->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, cast->dobj.catId, cast->dobj.dumpId,
labelq->data,
- NULL, NULL, "",
+ NULL, NULL, "", NULL,
"CAST", SECTION_PRE_DATA,
defqry->data, delqry->data, NULL,
NULL, 0,
@@ -12184,7 +12207,7 @@ dumpTransform(Archive *fout, TransformInfo *transform)
if (transform->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, transform->dobj.catId, transform->dobj.dumpId,
labelq->data,
- NULL, NULL, "",
+ NULL, NULL, "", NULL,
"TRANSFORM", SECTION_PRE_DATA,
defqry->data, delqry->data, NULL,
transform->dobj.dependencies, transform->dobj.nDeps,
@@ -12400,6 +12423,7 @@ dumpOpr(Archive *fout, OprInfo *oprinfo)
oprinfo->dobj.namespace->dobj.name,
NULL,
oprinfo->rolname,
+ NULL,
"OPERATOR", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -12546,6 +12570,9 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
case AMTYPE_INDEX:
appendPQExpBuffer(q, "TYPE INDEX ");
break;
+ case AMTYPE_TABLE:
+ appendPQExpBuffer(q, "TYPE TABLE ");
+ break;
default:
write_msg(NULL, "WARNING: invalid type \"%c\" of access method \"%s\"\n",
aminfo->amtype, qamname);
@@ -12570,6 +12597,7 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
NULL,
NULL,
"",
+ NULL,
"ACCESS METHOD", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -12936,6 +12964,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
opcinfo->dobj.namespace->dobj.name,
NULL,
opcinfo->rolname,
+ NULL,
"OPERATOR CLASS", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -13203,6 +13232,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
opfinfo->dobj.namespace->dobj.name,
NULL,
opfinfo->rolname,
+ NULL,
"OPERATOR FAMILY", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -13346,6 +13376,7 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
collinfo->dobj.namespace->dobj.name,
NULL,
collinfo->rolname,
+ NULL,
"COLLATION", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -13441,6 +13472,7 @@ dumpConversion(Archive *fout, ConvInfo *convinfo)
convinfo->dobj.namespace->dobj.name,
NULL,
convinfo->rolname,
+ NULL,
"CONVERSION", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -13930,6 +13962,7 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
agginfo->aggfn.dobj.namespace->dobj.name,
NULL,
agginfo->aggfn.rolname,
+ NULL,
"AGGREGATE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14028,6 +14061,7 @@ dumpTSParser(Archive *fout, TSParserInfo *prsinfo)
prsinfo->dobj.namespace->dobj.name,
NULL,
"",
+ NULL,
"TEXT SEARCH PARSER", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14108,6 +14142,7 @@ dumpTSDictionary(Archive *fout, TSDictInfo *dictinfo)
dictinfo->dobj.namespace->dobj.name,
NULL,
dictinfo->rolname,
+ NULL,
"TEXT SEARCH DICTIONARY", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14169,6 +14204,7 @@ dumpTSTemplate(Archive *fout, TSTemplateInfo *tmplinfo)
tmplinfo->dobj.namespace->dobj.name,
NULL,
"",
+ NULL,
"TEXT SEARCH TEMPLATE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14289,6 +14325,7 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
cfginfo->dobj.namespace->dobj.name,
NULL,
cfginfo->rolname,
+ NULL,
"TEXT SEARCH CONFIGURATION", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14355,6 +14392,7 @@ dumpForeignDataWrapper(Archive *fout, FdwInfo *fdwinfo)
NULL,
NULL,
fdwinfo->rolname,
+ NULL,
"FOREIGN DATA WRAPPER", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14446,6 +14484,7 @@ dumpForeignServer(Archive *fout, ForeignServerInfo *srvinfo)
NULL,
NULL,
srvinfo->rolname,
+ NULL,
"SERVER", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14564,6 +14603,7 @@ dumpUserMappings(Archive *fout,
namespace,
NULL,
owner,
+ NULL,
"USER MAPPING", SECTION_PRE_DATA,
q->data, delq->data, NULL,
&dumpId, 1,
@@ -14643,6 +14683,7 @@ dumpDefaultACL(Archive *fout, DefaultACLInfo *daclinfo)
daclinfo->dobj.namespace ? daclinfo->dobj.namespace->dobj.name : NULL,
NULL,
daclinfo->defaclrole,
+ NULL,
"DEFAULT ACL", SECTION_POST_DATA,
q->data, "", NULL,
NULL, 0,
@@ -14741,6 +14782,7 @@ dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
tag->data, nspname,
NULL,
owner ? owner : "",
+ NULL,
"ACL", SECTION_NONE,
sql->data, "", NULL,
&(objDumpId), 1,
@@ -14826,7 +14868,7 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
appendPQExpBuffer(tag, "%s %s", type, name);
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- tag->data, namespace, NULL, owner,
+ tag->data, namespace, NULL, owner, NULL,
"SECURITY LABEL", SECTION_NONE,
query->data, "", NULL,
&(dumpId), 1,
@@ -14908,7 +14950,7 @@ dumpTableSecLabel(Archive *fout, TableInfo *tbinfo, const char *reltypename)
ArchiveEntry(fout, nilCatalogId, createDumpId(),
target->data,
tbinfo->dobj.namespace->dobj.name,
- NULL, tbinfo->rolname,
+ NULL, tbinfo->rolname, NULL,
"SECURITY LABEL", SECTION_NONE,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -15994,6 +16036,8 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
tbinfo->dobj.namespace->dobj.name,
(tbinfo->relkind == RELKIND_VIEW) ? NULL : tbinfo->reltablespace,
tbinfo->rolname,
+ (tbinfo->relkind == RELKIND_RELATION) ?
+ tbinfo->amname : NULL,
reltypename,
tbinfo->postponed_def ?
SECTION_POST_DATA : SECTION_PRE_DATA,
@@ -16074,6 +16118,7 @@ dumpAttrDef(Archive *fout, AttrDefInfo *adinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"DEFAULT", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16190,6 +16235,7 @@ dumpIndex(Archive *fout, IndxInfo *indxinfo)
tbinfo->dobj.namespace->dobj.name,
indxinfo->tablespace,
tbinfo->rolname,
+ NULL,
"INDEX", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16234,6 +16280,7 @@ dumpIndexAttach(Archive *fout, IndexAttachInfo *attachinfo)
attachinfo->dobj.namespace->dobj.name,
NULL,
"",
+ NULL,
"INDEX ATTACH", SECTION_POST_DATA,
q->data, "", NULL,
NULL, 0,
@@ -16289,6 +16336,7 @@ dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo)
statsextinfo->dobj.namespace->dobj.name,
NULL,
statsextinfo->rolname,
+ NULL,
"STATISTICS", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16450,6 +16498,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
tbinfo->dobj.namespace->dobj.name,
indxinfo->tablespace,
tbinfo->rolname,
+ NULL,
"CONSTRAINT", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16490,6 +16539,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"FK CONSTRAINT", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16522,6 +16572,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"CHECK CONSTRAINT", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16555,6 +16606,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"CHECK CONSTRAINT", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16829,6 +16881,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"SEQUENCE", SECTION_PRE_DATA,
query->data, delqry->data, NULL,
NULL, 0,
@@ -16870,6 +16923,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"SEQUENCE OWNED BY", SECTION_PRE_DATA,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -16938,6 +16992,7 @@ dumpSequenceData(Archive *fout, TableDataInfo *tdinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"SEQUENCE SET", SECTION_DATA,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -17137,6 +17192,7 @@ dumpTrigger(Archive *fout, TriggerInfo *tginfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"TRIGGER", SECTION_POST_DATA,
query->data, delqry->data, NULL,
NULL, 0,
@@ -17223,7 +17279,7 @@ dumpEventTrigger(Archive *fout, EventTriggerInfo *evtinfo)
if (evtinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, evtinfo->dobj.catId, evtinfo->dobj.dumpId,
evtinfo->dobj.name, NULL, NULL,
- evtinfo->evtowner,
+ evtinfo->evtowner, NULL,
"EVENT TRIGGER", SECTION_POST_DATA,
query->data, delqry->data, NULL,
NULL, 0,
@@ -17384,6 +17440,7 @@ dumpRule(Archive *fout, RuleInfo *rinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"RULE", SECTION_POST_DATA,
cmd->data, delcmd->data, NULL,
NULL, 0,
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 789d6a24e2..4024d0c1e3 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -324,6 +324,7 @@ typedef struct _tableInfo
char *partkeydef; /* partition key definition */
char *partbound; /* partition bound definition */
bool needs_override; /* has GENERATED ALWAYS AS IDENTITY */
+ char *amname; /* table access method */
/*
* Stuff computed only for dumpable tables.
On Tue, Dec 11, 2018 at 1:13 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-11-26 17:55:57 -0800, Andres Freund wrote:
Further tasks I'm not yet planning to tackle, that I'd welcome help on:
- pg_upgrade testing
I did the pg_upgrade testing from an older version with some tables and views
present, and all of them were properly transformed into the new server with
heap as the default access method.
I will add Dmitry's pg_dump patch and test pg_upgrade to confirm that
the proper access method is retained on the upgraded database.
- I think we should consider removing HeapTuple->t_tableOid, it should
imo live entirely in the slot
I removed the t_tableOid from HeapTuple, and during testing I found some
problems with triggers; I will post the patch once it is fixed.
Regards,
Haribabu Kommi
Fujitsu Australia
Hi,
On 2019-01-15 18:02:38 +1100, Haribabu Kommi wrote:
On Tue, Dec 11, 2018 at 1:13 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-11-26 17:55:57 -0800, Andres Freund wrote:
Further tasks I'm not yet planning to tackle, that I'd welcome help on:
- pg_upgrade testing
I did the pg_upgrade testing from an older version with some tables and views
present, and all of them were properly transformed into the new server with
heap as the default access method.
I will add Dmitry's pg_dump patch and test pg_upgrade to confirm
the proper access method is retained on the upgraded database.
- I think we should consider removing HeapTuple->t_tableOid, it should
imo live entirely in the slot
I removed the t_tableOid from HeapTuple and during testing I found some
problems with triggers, will post the patch once it is fixed.
Please note that I'm working on a heavily revised version of the patch
right now, trying to clean up a lot of things (you might have seen some
of the threads I started). I hope to post it ~Thursday. Local-ish
patches shouldn't be a problem though.
Greetings,
Andres Freund
On Sat, 12 Jan 2019 at 18:11, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
On Sat, Jan 12, 2019 at 1:44 AM Andres Freund <andres@anarazel.de> wrote:
/* other fields were zeroed above */
@@ -9355,7 +9370,7 @@ dumpComment(Archive *fout, const char *type, const char *name,
 * post-data.
 */
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- tag->data, namespace, NULL, owner,
+ tag->data, namespace, NULL, owner, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(dumpId), 1,
We really ought to move the arguments to a struct, so we don't generate
quite as much useless diffs whenever we do a change around one of
these...
That's what I thought too. Maybe then I'll suggest a mini-patch to the master to
refactor these arguments out into a separate struct, so we can leverage it here.
Then for each of the calls, we would need to declare that structure
variable (with = {0}) and assign required fields in that structure
before passing it to ArchiveEntry(). But a major point of
ArchiveEntry() is to avoid doing this and instead conveniently pass
those fields as parameters. This will add unnecessarily many lines of
code. I think a better way is to have an ArchiveEntry() function with
a limited number of parameters, and an ArchiveEntryEx() with those
extra parameters which are not needed in usual cases. E.g. we can have
tablespace, tableam, dumpFn and dumpArg as those extra arguments of
ArchiveEntryEx(), because in most places these are passed as NULL.
All future arguments would go in ArchiveEntryEx().
Comments ?
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Tue, 15 Jan 2019 at 12:27, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
On Mon, Jan 14, 2019 at 2:07 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
createPQExpBuffer() should be moved after the below statement, so that
it does not leak memory

Thanks for noticing, fixed.
Looks good.
So how about bumping up the archive version and doing these checks?
Yeah, you're right, I've added this check.
Need to bump K_VERS_MINOR as well.
On Mon, 14 Jan 2019 at 18:36, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
+static void _selectTableAccessMethod(ArchiveHandle *AH, const char
*tablespace);
tablespace => tableam
This is yet to be addressed.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Tue, Jan 15, 2019 at 10:52 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Need to bump K_VERS_MINOR as well.
I've bumped it up, but somehow this change escaped the previous version.
It should now be there, thanks!
On Mon, 14 Jan 2019 at 18:36, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
+static void _selectTableAccessMethod(ArchiveHandle *AH, const char
*tablespace);
tablespace => tableam

This is yet to be addressed.
Fixed.
Also I guess another attached patch should address the psql part, namely
displaying a table access method with \d+ and possibility to hide it with a
psql variable (HIDE_TABLEAM, but I'm open to suggestions about the name).
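For illustration, the intended interaction might look like the session below. This is a sketch only: the exact footer wording and default behavior depend on the final form of the patch.

```psql
-- with the variable unset (or off), \d+ shows the access method footer
\set HIDE_TABLEAM off
\d+ mytable
-- ...
-- Access method: heap

-- with it on, the footer is suppressed when the table uses the
-- default_table_access_method, keeping regression output stable
\set HIDE_TABLEAM on
\d+ mytable
```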
Attachments:
Attachment: pg_dump_access_method_v4.patch (application/octet-stream)
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 58bd3805f4..38f24ba1bf 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -85,6 +85,7 @@ static void _becomeUser(ArchiveHandle *AH, const char *user);
static void _becomeOwner(ArchiveHandle *AH, TocEntry *te);
static void _selectOutputSchema(ArchiveHandle *AH, const char *schemaName);
static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
+static void _selectTableAccessMethod(ArchiveHandle *AH, const char *tableam);
static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
static void processSearchPathEntry(ArchiveHandle *AH, TocEntry *te);
@@ -1072,6 +1073,7 @@ ArchiveEntry(Archive *AHX,
const char *namespace,
const char *tablespace,
const char *owner,
+ const char *tableam,
const char *desc, teSection section,
const char *defn,
const char *dropStmt, const char *copyStmt,
@@ -1099,6 +1101,7 @@ ArchiveEntry(Archive *AHX,
newToc->tag = pg_strdup(tag);
newToc->namespace = namespace ? pg_strdup(namespace) : NULL;
newToc->tablespace = tablespace ? pg_strdup(tablespace) : NULL;
+ newToc->tableam = tableam ? pg_strdup(tableam) : NULL;
newToc->owner = pg_strdup(owner);
newToc->desc = pg_strdup(desc);
newToc->defn = pg_strdup(defn);
@@ -2367,6 +2370,7 @@ _allocAH(const char *FileSpec, const ArchiveFormat fmt,
AH->currUser = NULL; /* unknown */
AH->currSchema = NULL; /* ditto */
AH->currTablespace = NULL; /* ditto */
+ AH->currTableAm = NULL; /* ditto */
AH->toc = (TocEntry *) pg_malloc0(sizeof(TocEntry));
@@ -2594,6 +2598,7 @@ WriteToc(ArchiveHandle *AH)
WriteStr(AH, te->namespace);
WriteStr(AH, te->tablespace);
WriteStr(AH, te->owner);
+ WriteStr(AH, te->tableam);
WriteStr(AH, "false");
/* Dump list of dependencies */
@@ -2696,6 +2701,9 @@ ReadToc(ArchiveHandle *AH)
te->tablespace = ReadStr(AH);
te->owner = ReadStr(AH);
+ if (AH->version >= K_VERS_1_14)
+ te->tableam = ReadStr(AH);
+
if (AH->version < K_VERS_1_9 || strcmp(ReadStr(AH), "true") == 0)
write_msg(modulename,
"WARNING: restoring tables WITH OIDS is not supported anymore");
@@ -3288,6 +3296,9 @@ _reconnectToDB(ArchiveHandle *AH, const char *dbname)
if (AH->currTablespace)
free(AH->currTablespace);
AH->currTablespace = NULL;
+ if (AH->currTableAm)
+ free(AH->currTableAm);
+ AH->currTableAm = NULL;
/* re-establish fixed state */
_doSetFixedOutputState(AH);
@@ -3448,6 +3459,48 @@ _selectTablespace(ArchiveHandle *AH, const char *tablespace)
destroyPQExpBuffer(qry);
}
+/*
+ * Set the proper default_table_access_method value for the table.
+ */
+static void
+_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
+{
+ PQExpBuffer cmd;
+ const char *want, *have;
+
+ have = AH->currTableAm;
+ want = tableam;
+
+ if (!want)
+ return;
+
+ if (have && strcmp(want, have) == 0)
+ return;
+
+ cmd = createPQExpBuffer();
+ appendPQExpBuffer(cmd, "SET default_table_access_method = %s;", fmtId(want));
+
+ if (RestoringToDB(AH))
+ {
+ PGresult *res;
+
+ res = PQexec(AH->connection, cmd->data);
+
+ if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
+ warn_or_exit_horribly(AH, modulename,
+ "could not set default_table_access_method: %s",
+ PQerrorMessage(AH->connection));
+
+ PQclear(res);
+ }
+ else
+ ahprintf(AH, "%s\n\n", cmd->data);
+
+ destroyPQExpBuffer(cmd);
+
+ AH->currTableAm = pg_strdup(want);
+}
+
/*
* Extract an object description for a TOC entry, and append it to buf.
*
@@ -3547,6 +3600,7 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
_becomeOwner(AH, te);
_selectOutputSchema(AH, te->namespace);
_selectTablespace(AH, te->tablespace);
+ _selectTableAccessMethod(AH, te->tableam);
/* Emit header comment for item */
if (!AH->noTocComments)
@@ -4021,6 +4075,9 @@ restore_toc_entries_prefork(ArchiveHandle *AH, TocEntry *pending_list)
if (AH->currTablespace)
free(AH->currTablespace);
AH->currTablespace = NULL;
+ if (AH->currTableAm)
+ free(AH->currTableAm);
+ AH->currTableAm = NULL;
}
/*
@@ -4816,6 +4873,7 @@ CloneArchive(ArchiveHandle *AH)
clone->currUser = NULL;
clone->currSchema = NULL;
clone->currTablespace = NULL;
+ clone->currTableAm = NULL;
/* savedPassword must be local in case we change it while connecting */
if (clone->savedPassword)
@@ -4906,6 +4964,8 @@ DeCloneArchive(ArchiveHandle *AH)
free(AH->currSchema);
if (AH->currTablespace)
free(AH->currTablespace);
+ if (AH->currTableAm)
+ free(AH->currTableAm);
if (AH->savedPassword)
free(AH->savedPassword);
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 306d2ceba9..c719fca0ad 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -94,10 +94,11 @@ typedef z_stream *z_streamp;
* entries */
#define K_VERS_1_13 MAKE_ARCHIVE_VERSION(1, 13, 0) /* change search_path
* behavior */
+#define K_VERS_1_14 MAKE_ARCHIVE_VERSION(1, 14, 0) /* add tableam */
/* Current archive version number (the format we can output) */
#define K_VERS_MAJOR 1
-#define K_VERS_MINOR 13
+#define K_VERS_MINOR 14
#define K_VERS_REV 0
#define K_VERS_SELF MAKE_ARCHIVE_VERSION(K_VERS_MAJOR, K_VERS_MINOR, K_VERS_REV);
@@ -347,6 +348,7 @@ struct _archiveHandle
char *currUser; /* current username, or NULL if unknown */
char *currSchema; /* current schema, or NULL */
char *currTablespace; /* current tablespace, or NULL */
+ char *currTableAm; /* current table access method, or NULL */
void *lo_buf;
size_t lo_buf_used;
@@ -373,6 +375,8 @@ struct _tocEntry
char *namespace; /* null or empty string if not in a schema */
char *tablespace; /* null if not in a tablespace; empty string
* means use database default */
+ char *tableam; /* table access method, only for TABLE tags */
+
char *owner;
char *desc;
char *defn;
@@ -410,7 +414,7 @@ extern TocEntry *ArchiveEntry(Archive *AHX,
CatalogId catalogId, DumpId dumpId,
const char *tag,
const char *namespace, const char *tablespace,
- const char *owner,
+ const char *owner, const char *amname,
const char *desc, teSection section,
const char *defn,
const char *dropStmt, const char *copyStmt,
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 637c79af48..512c486546 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2136,7 +2136,7 @@ dumpTableData(Archive *fout, TableDataInfo *tdinfo)
te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
tbinfo->dobj.name, tbinfo->dobj.namespace->dobj.name,
- NULL, tbinfo->rolname,
+ NULL, tbinfo->rolname, NULL,
"TABLE DATA", SECTION_DATA,
"", "", copyStmt,
&(tbinfo->dobj.dumpId), 1,
@@ -2188,6 +2188,7 @@ refreshMatViewData(Archive *fout, TableDataInfo *tdinfo)
tbinfo->dobj.namespace->dobj.name, /* Namespace */
NULL, /* Tablespace */
tbinfo->rolname, /* Owner */
+ NULL, /* Table access method */
"MATERIALIZED VIEW DATA", /* Desc */
SECTION_POST_DATA, /* Section */
q->data, /* Create */
@@ -2726,6 +2727,7 @@ dumpDatabase(Archive *fout)
NULL, /* Namespace */
NULL, /* Tablespace */
dba, /* Owner */
+ NULL, /* Table access method */
"DATABASE", /* Desc */
SECTION_PRE_DATA, /* Section */
creaQry->data, /* Create */
@@ -2762,7 +2764,7 @@ dumpDatabase(Archive *fout)
appendPQExpBufferStr(dbQry, ";\n");
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- labelq->data, NULL, NULL, dba,
+ labelq->data, NULL, NULL, dba, NULL,
"COMMENT", SECTION_NONE,
dbQry->data, "", NULL,
&(dbDumpId), 1,
@@ -2789,7 +2791,7 @@ dumpDatabase(Archive *fout)
emitShSecLabels(conn, shres, seclabelQry, "DATABASE", datname);
if (seclabelQry->len > 0)
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- labelq->data, NULL, NULL, dba,
+ labelq->data, NULL, NULL, dba, NULL,
"SECURITY LABEL", SECTION_NONE,
seclabelQry->data, "", NULL,
&(dbDumpId), 1,
@@ -2859,7 +2861,7 @@ dumpDatabase(Archive *fout)
if (creaQry->len > 0)
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- datname, NULL, NULL, dba,
+ datname, NULL, NULL, dba, NULL,
"DATABASE PROPERTIES", SECTION_PRE_DATA,
creaQry->data, delQry->data, NULL,
&(dbDumpId), 1,
@@ -2904,7 +2906,7 @@ dumpDatabase(Archive *fout)
atooid(PQgetvalue(lo_res, 0, i_relminmxid)),
LargeObjectRelationId);
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- "pg_largeobject", NULL, NULL, "",
+ "pg_largeobject", NULL, NULL, "", NULL,
"pg_largeobject", SECTION_PRE_DATA,
loOutQry->data, "", NULL,
NULL, 0,
@@ -3014,7 +3016,7 @@ dumpEncoding(Archive *AH)
appendPQExpBufferStr(qry, ";\n");
ArchiveEntry(AH, nilCatalogId, createDumpId(),
- "ENCODING", NULL, NULL, "",
+ "ENCODING", NULL, NULL, "", NULL,
"ENCODING", SECTION_PRE_DATA,
qry->data, "", NULL,
NULL, 0,
@@ -3041,7 +3043,7 @@ dumpStdStrings(Archive *AH)
stdstrings);
ArchiveEntry(AH, nilCatalogId, createDumpId(),
- "STDSTRINGS", NULL, NULL, "",
+ "STDSTRINGS", NULL, NULL, "", NULL,
"STDSTRINGS", SECTION_PRE_DATA,
qry->data, "", NULL,
NULL, 0,
@@ -3097,7 +3099,7 @@ dumpSearchPath(Archive *AH)
write_msg(NULL, "saving search_path = %s\n", path->data);
ArchiveEntry(AH, nilCatalogId, createDumpId(),
- "SEARCHPATH", NULL, NULL, "",
+ "SEARCHPATH", NULL, NULL, "", NULL,
"SEARCHPATH", SECTION_PRE_DATA,
qry->data, "", NULL,
NULL, 0,
@@ -3275,7 +3277,7 @@ dumpBlob(Archive *fout, BlobInfo *binfo)
ArchiveEntry(fout, binfo->dobj.catId, binfo->dobj.dumpId,
binfo->dobj.name,
NULL, NULL,
- binfo->rolname,
+ binfo->rolname, NULL,
"BLOB", SECTION_PRE_DATA,
cquery->data, dquery->data, NULL,
NULL, 0,
@@ -3581,6 +3583,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
polinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"ROW SECURITY", SECTION_POST_DATA,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -3637,6 +3640,7 @@ dumpPolicy(Archive *fout, PolicyInfo *polinfo)
polinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"POLICY", SECTION_POST_DATA,
query->data, delqry->data, NULL,
NULL, 0,
@@ -3811,6 +3815,7 @@ dumpPublication(Archive *fout, PublicationInfo *pubinfo)
NULL,
NULL,
pubinfo->rolname,
+ NULL,
"PUBLICATION", SECTION_POST_DATA,
query->data, delq->data, NULL,
NULL, 0,
@@ -3954,6 +3959,7 @@ dumpPublicationTable(Archive *fout, PublicationRelInfo *pubrinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
"",
+ NULL,
"PUBLICATION TABLE", SECTION_POST_DATA,
query->data, "", NULL,
NULL, 0,
@@ -4147,6 +4153,7 @@ dumpSubscription(Archive *fout, SubscriptionInfo *subinfo)
NULL,
NULL,
subinfo->rolname,
+ NULL,
"SUBSCRIPTION", SECTION_POST_DATA,
query->data, delq->data, NULL,
NULL, 0,
@@ -5829,6 +5836,7 @@ getTables(Archive *fout, int *numTables)
int i_partkeydef;
int i_ispartition;
int i_partbound;
+ int i_amname;
/*
* Find all the tables and table-like objects.
@@ -5914,7 +5922,7 @@ getTables(Archive *fout, int *numTables)
"tc.relfrozenxid AS tfrozenxid, "
"tc.relminmxid AS tminmxid, "
"c.relpersistence, c.relispopulated, "
- "c.relreplident, c.relpages, "
+ "c.relreplident, c.relpages, am.amname, "
"CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
"d.refobjid AS owning_tab, "
"d.refobjsubid AS owning_col, "
@@ -5945,6 +5953,7 @@ getTables(Archive *fout, int *numTables)
"d.objsubid = 0 AND "
"d.refclassid = c.tableoid AND d.deptype IN ('a', 'i')) "
"LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid) "
+ "LEFT JOIN pg_am am ON (c.relam = am.oid) "
"LEFT JOIN pg_init_privs pip ON "
"(c.oid = pip.objoid "
"AND pip.classoid = 'pg_class'::regclass "
@@ -6412,6 +6421,7 @@ getTables(Archive *fout, int *numTables)
i_partkeydef = PQfnumber(res, "partkeydef");
i_ispartition = PQfnumber(res, "ispartition");
i_partbound = PQfnumber(res, "partbound");
+ i_amname = PQfnumber(res, "amname");
if (dopt->lockWaitTimeout)
{
@@ -6481,6 +6491,11 @@ getTables(Archive *fout, int *numTables)
else
tblinfo[i].checkoption = pg_strdup(PQgetvalue(res, i, i_checkoption));
tblinfo[i].toast_reloptions = pg_strdup(PQgetvalue(res, i, i_toastreloptions));
+ if (PQgetisnull(res, i, i_amname))
+ tblinfo[i].amname = NULL;
+ else
+ tblinfo[i].amname = pg_strdup(PQgetvalue(res, i, i_amname));
+
/* other fields were zeroed above */
@@ -9355,7 +9370,7 @@ dumpComment(Archive *fout, const char *type, const char *name,
* post-data.
*/
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- tag->data, namespace, NULL, owner,
+ tag->data, namespace, NULL, owner, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(dumpId), 1,
@@ -9423,7 +9438,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
ArchiveEntry(fout, nilCatalogId, createDumpId(),
tag->data,
tbinfo->dobj.namespace->dobj.name,
- NULL, tbinfo->rolname,
+ NULL, tbinfo->rolname, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -9447,7 +9462,7 @@ dumpTableComment(Archive *fout, TableInfo *tbinfo,
ArchiveEntry(fout, nilCatalogId, createDumpId(),
tag->data,
tbinfo->dobj.namespace->dobj.name,
- NULL, tbinfo->rolname,
+ NULL, tbinfo->rolname, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -9728,7 +9743,7 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
TocEntry *te;
te = ArchiveEntry(fout, dobj->catId, dobj->dumpId,
- dobj->name, NULL, NULL, "",
+ dobj->name, NULL, NULL, "", NULL,
"BLOBS", SECTION_DATA,
"", "", NULL,
NULL, 0,
@@ -9802,7 +9817,7 @@ dumpNamespace(Archive *fout, NamespaceInfo *nspinfo)
ArchiveEntry(fout, nspinfo->dobj.catId, nspinfo->dobj.dumpId,
nspinfo->dobj.name,
NULL, NULL,
- nspinfo->rolname,
+ nspinfo->rolname, NULL,
"SCHEMA", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -9938,7 +9953,7 @@ dumpExtension(Archive *fout, ExtensionInfo *extinfo)
ArchiveEntry(fout, extinfo->dobj.catId, extinfo->dobj.dumpId,
extinfo->dobj.name,
NULL, NULL,
- "",
+ "", NULL,
"EXTENSION", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10090,6 +10105,7 @@ dumpEnumType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10217,6 +10233,7 @@ dumpRangeType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10290,6 +10307,7 @@ dumpUndefinedType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10572,6 +10590,7 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10729,6 +10748,7 @@ dumpDomain(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"DOMAIN", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -10951,6 +10971,7 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"TYPE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -11085,7 +11106,7 @@ dumpCompositeTypeColComments(Archive *fout, TypeInfo *tyinfo)
ArchiveEntry(fout, nilCatalogId, createDumpId(),
target->data,
tyinfo->dobj.namespace->dobj.name,
- NULL, tyinfo->rolname,
+ NULL, tyinfo->rolname, NULL,
"COMMENT", SECTION_NONE,
query->data, "", NULL,
&(tyinfo->dobj.dumpId), 1,
@@ -11142,6 +11163,7 @@ dumpShellType(Archive *fout, ShellTypeInfo *stinfo)
stinfo->dobj.namespace->dobj.name,
NULL,
stinfo->baseType->rolname,
+ NULL,
"SHELL TYPE", SECTION_PRE_DATA,
q->data, "", NULL,
NULL, 0,
@@ -11251,7 +11273,7 @@ dumpProcLang(Archive *fout, ProcLangInfo *plang)
if (plang->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, plang->dobj.catId, plang->dobj.dumpId,
plang->dobj.name,
- NULL, NULL, plang->lanowner,
+ NULL, NULL, plang->lanowner, NULL,
"PROCEDURAL LANGUAGE", SECTION_PRE_DATA,
defqry->data, delqry->data, NULL,
NULL, 0,
@@ -11924,6 +11946,7 @@ dumpFunc(Archive *fout, FuncInfo *finfo)
finfo->dobj.namespace->dobj.name,
NULL,
finfo->rolname,
+ NULL,
keyword, SECTION_PRE_DATA,
q->data, delqry->data, NULL,
NULL, 0,
@@ -12056,7 +12079,7 @@ dumpCast(Archive *fout, CastInfo *cast)
if (cast->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, cast->dobj.catId, cast->dobj.dumpId,
labelq->data,
- NULL, NULL, "",
+ NULL, NULL, "", NULL,
"CAST", SECTION_PRE_DATA,
defqry->data, delqry->data, NULL,
NULL, 0,
@@ -12184,7 +12207,7 @@ dumpTransform(Archive *fout, TransformInfo *transform)
if (transform->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, transform->dobj.catId, transform->dobj.dumpId,
labelq->data,
- NULL, NULL, "",
+ NULL, NULL, "", NULL,
"TRANSFORM", SECTION_PRE_DATA,
defqry->data, delqry->data, NULL,
transform->dobj.dependencies, transform->dobj.nDeps,
@@ -12400,6 +12423,7 @@ dumpOpr(Archive *fout, OprInfo *oprinfo)
oprinfo->dobj.namespace->dobj.name,
NULL,
oprinfo->rolname,
+ NULL,
"OPERATOR", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -12546,6 +12570,9 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
case AMTYPE_INDEX:
appendPQExpBuffer(q, "TYPE INDEX ");
break;
+ case AMTYPE_TABLE:
+ appendPQExpBuffer(q, "TYPE TABLE ");
+ break;
default:
write_msg(NULL, "WARNING: invalid type \"%c\" of access method \"%s\"\n",
aminfo->amtype, qamname);
@@ -12570,6 +12597,7 @@ dumpAccessMethod(Archive *fout, AccessMethodInfo *aminfo)
NULL,
NULL,
"",
+ NULL,
"ACCESS METHOD", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -12936,6 +12964,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
opcinfo->dobj.namespace->dobj.name,
NULL,
opcinfo->rolname,
+ NULL,
"OPERATOR CLASS", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -13203,6 +13232,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
opfinfo->dobj.namespace->dobj.name,
NULL,
opfinfo->rolname,
+ NULL,
"OPERATOR FAMILY", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -13346,6 +13376,7 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
collinfo->dobj.namespace->dobj.name,
NULL,
collinfo->rolname,
+ NULL,
"COLLATION", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -13441,6 +13472,7 @@ dumpConversion(Archive *fout, ConvInfo *convinfo)
convinfo->dobj.namespace->dobj.name,
NULL,
convinfo->rolname,
+ NULL,
"CONVERSION", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -13930,6 +13962,7 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
agginfo->aggfn.dobj.namespace->dobj.name,
NULL,
agginfo->aggfn.rolname,
+ NULL,
"AGGREGATE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14028,6 +14061,7 @@ dumpTSParser(Archive *fout, TSParserInfo *prsinfo)
prsinfo->dobj.namespace->dobj.name,
NULL,
"",
+ NULL,
"TEXT SEARCH PARSER", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14108,6 +14142,7 @@ dumpTSDictionary(Archive *fout, TSDictInfo *dictinfo)
dictinfo->dobj.namespace->dobj.name,
NULL,
dictinfo->rolname,
+ NULL,
"TEXT SEARCH DICTIONARY", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14169,6 +14204,7 @@ dumpTSTemplate(Archive *fout, TSTemplateInfo *tmplinfo)
tmplinfo->dobj.namespace->dobj.name,
NULL,
"",
+ NULL,
"TEXT SEARCH TEMPLATE", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14289,6 +14325,7 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
cfginfo->dobj.namespace->dobj.name,
NULL,
cfginfo->rolname,
+ NULL,
"TEXT SEARCH CONFIGURATION", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14355,6 +14392,7 @@ dumpForeignDataWrapper(Archive *fout, FdwInfo *fdwinfo)
NULL,
NULL,
fdwinfo->rolname,
+ NULL,
"FOREIGN DATA WRAPPER", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14446,6 +14484,7 @@ dumpForeignServer(Archive *fout, ForeignServerInfo *srvinfo)
NULL,
NULL,
srvinfo->rolname,
+ NULL,
"SERVER", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -14564,6 +14603,7 @@ dumpUserMappings(Archive *fout,
namespace,
NULL,
owner,
+ NULL,
"USER MAPPING", SECTION_PRE_DATA,
q->data, delq->data, NULL,
&dumpId, 1,
@@ -14643,6 +14683,7 @@ dumpDefaultACL(Archive *fout, DefaultACLInfo *daclinfo)
daclinfo->dobj.namespace ? daclinfo->dobj.namespace->dobj.name : NULL,
NULL,
daclinfo->defaclrole,
+ NULL,
"DEFAULT ACL", SECTION_POST_DATA,
q->data, "", NULL,
NULL, 0,
@@ -14741,6 +14782,7 @@ dumpACL(Archive *fout, CatalogId objCatId, DumpId objDumpId,
tag->data, nspname,
NULL,
owner ? owner : "",
+ NULL,
"ACL", SECTION_NONE,
sql->data, "", NULL,
&(objDumpId), 1,
@@ -14826,7 +14868,7 @@ dumpSecLabel(Archive *fout, const char *type, const char *name,
appendPQExpBuffer(tag, "%s %s", type, name);
ArchiveEntry(fout, nilCatalogId, createDumpId(),
- tag->data, namespace, NULL, owner,
+ tag->data, namespace, NULL, owner, NULL,
"SECURITY LABEL", SECTION_NONE,
query->data, "", NULL,
&(dumpId), 1,
@@ -14908,7 +14950,7 @@ dumpTableSecLabel(Archive *fout, TableInfo *tbinfo, const char *reltypename)
ArchiveEntry(fout, nilCatalogId, createDumpId(),
target->data,
tbinfo->dobj.namespace->dobj.name,
- NULL, tbinfo->rolname,
+ NULL, tbinfo->rolname, NULL,
"SECURITY LABEL", SECTION_NONE,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -15994,6 +16036,8 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
tbinfo->dobj.namespace->dobj.name,
(tbinfo->relkind == RELKIND_VIEW) ? NULL : tbinfo->reltablespace,
tbinfo->rolname,
+ (tbinfo->relkind == RELKIND_RELATION) ?
+ tbinfo->amname : NULL,
reltypename,
tbinfo->postponed_def ?
SECTION_POST_DATA : SECTION_PRE_DATA,
@@ -16074,6 +16118,7 @@ dumpAttrDef(Archive *fout, AttrDefInfo *adinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"DEFAULT", SECTION_PRE_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16190,6 +16235,7 @@ dumpIndex(Archive *fout, IndxInfo *indxinfo)
tbinfo->dobj.namespace->dobj.name,
indxinfo->tablespace,
tbinfo->rolname,
+ NULL,
"INDEX", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16234,6 +16280,7 @@ dumpIndexAttach(Archive *fout, IndexAttachInfo *attachinfo)
attachinfo->dobj.namespace->dobj.name,
NULL,
"",
+ NULL,
"INDEX ATTACH", SECTION_POST_DATA,
q->data, "", NULL,
NULL, 0,
@@ -16289,6 +16336,7 @@ dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo)
statsextinfo->dobj.namespace->dobj.name,
NULL,
statsextinfo->rolname,
+ NULL,
"STATISTICS", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16450,6 +16498,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
tbinfo->dobj.namespace->dobj.name,
indxinfo->tablespace,
tbinfo->rolname,
+ NULL,
"CONSTRAINT", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16490,6 +16539,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"FK CONSTRAINT", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16522,6 +16572,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"CHECK CONSTRAINT", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16555,6 +16606,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
tyinfo->dobj.namespace->dobj.name,
NULL,
tyinfo->rolname,
+ NULL,
"CHECK CONSTRAINT", SECTION_POST_DATA,
q->data, delq->data, NULL,
NULL, 0,
@@ -16829,6 +16881,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"SEQUENCE", SECTION_PRE_DATA,
query->data, delqry->data, NULL,
NULL, 0,
@@ -16870,6 +16923,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"SEQUENCE OWNED BY", SECTION_PRE_DATA,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -16938,6 +16992,7 @@ dumpSequenceData(Archive *fout, TableDataInfo *tdinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"SEQUENCE SET", SECTION_DATA,
query->data, "", NULL,
&(tbinfo->dobj.dumpId), 1,
@@ -17137,6 +17192,7 @@ dumpTrigger(Archive *fout, TriggerInfo *tginfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"TRIGGER", SECTION_POST_DATA,
query->data, delqry->data, NULL,
NULL, 0,
@@ -17223,7 +17279,7 @@ dumpEventTrigger(Archive *fout, EventTriggerInfo *evtinfo)
if (evtinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
ArchiveEntry(fout, evtinfo->dobj.catId, evtinfo->dobj.dumpId,
evtinfo->dobj.name, NULL, NULL,
- evtinfo->evtowner,
+ evtinfo->evtowner, NULL,
"EVENT TRIGGER", SECTION_POST_DATA,
query->data, delqry->data, NULL,
NULL, 0,
@@ -17384,6 +17440,7 @@ dumpRule(Archive *fout, RuleInfo *rinfo)
tbinfo->dobj.namespace->dobj.name,
NULL,
tbinfo->rolname,
+ NULL,
"RULE", SECTION_POST_DATA,
cmd->data, delcmd->data, NULL,
NULL, 0,
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 789d6a24e2..4024d0c1e3 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -324,6 +324,7 @@ typedef struct _tableInfo
char *partkeydef; /* partition key definition */
char *partbound; /* partition bound definition */
bool needs_override; /* has GENERATED ALWAYS AS IDENTITY */
+ char *amname; /* table access method */
/*
* Stuff computed only for dumpable tables.
Attachment: psql_describe_am.patch (application/octet-stream)
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 0a181b01d9..3eef06ab7d 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1484,6 +1484,8 @@ describeOneTableDetails(const char *schemaname,
char *reloftype;
char relpersistence;
char relreplident;
+ char *relam;
+ bool relam_is_default;
} tableinfo;
bool show_column_details = false;
@@ -1503,9 +1505,11 @@ describeOneTableDetails(const char *schemaname,
"c.relhastriggers, c.relrowsecurity, c.relforcerowsecurity, "
"false AS relhasoids, %s, c.reltablespace, "
"CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, "
- "c.relpersistence, c.relreplident\n"
+ "c.relpersistence, c.relreplident, am.amname,"
+ "am.amname = current_setting('default_table_access_method') \n"
"FROM pg_catalog.pg_class c\n "
"LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
+ "LEFT JOIN pg_catalog.pg_am am ON (c.relam = am.oid)\n"
"WHERE c.oid = '%s';",
(verbose ?
"pg_catalog.array_to_string(c.reloptions || "
@@ -1656,6 +1660,17 @@ describeOneTableDetails(const char *schemaname,
*(PQgetvalue(res, 0, 11)) : 0;
tableinfo.relreplident = (pset.sversion >= 90400) ?
*(PQgetvalue(res, 0, 12)) : 'd';
+ if (pset.sversion >= 120000)
+ {
+ tableinfo.relam = PQgetisnull(res, 0, 13) ?
+ (char *) NULL : pg_strdup(PQgetvalue(res, 0, 13));
+ tableinfo.relam_is_default = strcmp(PQgetvalue(res, 0, 14), "t") == 0;
+ }
+ else
+ {
+ tableinfo.relam = NULL;
+ tableinfo.relam_is_default = false;
+ }
PQclear(res);
res = NULL;
@@ -3141,6 +3156,16 @@ describeOneTableDetails(const char *schemaname,
/* Tablespace info */
add_tablespace_footer(&cont, tableinfo.relkind, tableinfo.tablespace,
true);
+
+ /* Access method info */
+ if (pset.sversion >= 120000 && verbose && tableinfo.relam != NULL &&
+ !(pset.hide_tableam && tableinfo.relam_is_default))
+ {
+ printfPQExpBuffer(&buf, _("Access method: %s"), fmtId(tableinfo.relam));
+ printTableAddFooter(&cont, buf.data);
+ }
+
+
}
/* reloptions, if verbose */
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 176c85afd0..0c62dfac30 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -140,6 +140,7 @@ typedef struct _psqlSettings
const char *prompt3;
PGVerbosity verbosity; /* current error verbosity level */
PGContextVisibility show_context; /* current context display level */
+ bool hide_tableam;
} PsqlSettings;
extern PsqlSettings pset;
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index e7536a8a06..b757febcc5 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -1128,6 +1128,11 @@ show_context_hook(const char *newval)
return true;
}
+static bool
+hide_tableam_hook(const char *newval)
+{
+ return ParseVariableBool(newval, "HIDE_TABLEAM", &pset.hide_tableam);
+}
static void
EstablishVariableSpace(void)
@@ -1191,4 +1196,7 @@ EstablishVariableSpace(void)
SetVariableHooks(pset.vars, "SHOW_CONTEXT",
show_context_substitute_hook,
show_context_hook);
+ SetVariableHooks(pset.vars, "HIDE_TABLEAM",
+ bool_substitute_hook,
+ hide_tableam_hook);
}
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 19bb538411..031c09422c 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
CREATE TEMP TABLE x (
a serial,
b int,
diff --git a/src/test/regress/expected/create_table.out b/src/test/regress/expected/create_table.out
index 7e52c27e3f..15c4474235 100644
--- a/src/test/regress/expected/create_table.out
+++ b/src/test/regress/expected/create_table.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- CREATE_TABLE
--
diff --git a/src/test/regress/expected/create_table_like.out b/src/test/regress/expected/create_table_like.out
index b582211270..08cbeadf0c 100644
--- a/src/test/regress/expected/create_table_like.out
+++ b/src/test/regress/expected/create_table_like.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
/* Test inheritance of structure (LIKE) */
CREATE TABLE inhx (xx text DEFAULT 'text');
/*
diff --git a/src/test/regress/expected/domain.out b/src/test/regress/expected/domain.out
index 0b5a9041b0..e2568008d9 100644
--- a/src/test/regress/expected/domain.out
+++ b/src/test/regress/expected/domain.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- Test domains.
--
diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out
index 4d82d3a7e8..60a28c09fb 100644
--- a/src/test/regress/expected/foreign_data.out
+++ b/src/test/regress/expected/foreign_data.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- Test foreign-data wrapper and server management.
--
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index f259d07535..5236af2744 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- Test inheritance features
--
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index 1cf6531c01..ab90db4d66 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- insert with DEFAULT in the target_list
--
diff --git a/src/test/regress/expected/matview.out b/src/test/regress/expected/matview.out
index 08cd4bea48..ff9ade106d 100644
--- a/src/test/regress/expected/matview.out
+++ b/src/test/regress/expected/matview.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
-- create a table to use as a basis for views and materialized views in various combinations
CREATE TABLE mvtest_t (id int NOT NULL PRIMARY KEY, type text NOT NULL, amt numeric NOT NULL);
INSERT INTO mvtest_t VALUES
diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
index afbbdd543d..cc5d42dbf2 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- PUBLICATION
--
diff --git a/src/test/regress/expected/replica_identity.out b/src/test/regress/expected/replica_identity.out
index 175ecd2879..f3331ee833 100644
--- a/src/test/regress/expected/replica_identity.out
+++ b/src/test/regress/expected/replica_identity.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
CREATE TABLE test_replica_identity (
id serial primary key,
keya text not null,
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 1d12b01068..a3c3befb04 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- Test of Row-level security feature
--
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b68b8d273f..feb7cd95b1 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- RULES
-- From Jan's original setup_ruletest.sql and run_ruletest.sql
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index d09326c182..bf209c14b0 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- UPDATE syntax tests
--
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index e36df8858e..bec93af438 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
CREATE TEMP TABLE x (
a serial,
b int,
diff --git a/src/test/regress/sql/create_table.sql b/src/test/regress/sql/create_table.sql
index a2cae9663c..fb1e9083a4 100644
--- a/src/test/regress/sql/create_table.sql
+++ b/src/test/regress/sql/create_table.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- CREATE_TABLE
--
diff --git a/src/test/regress/sql/create_table_like.sql b/src/test/regress/sql/create_table_like.sql
index 65c3880792..46889a5b1f 100644
--- a/src/test/regress/sql/create_table_like.sql
+++ b/src/test/regress/sql/create_table_like.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
/* Test inheritance of structure (LIKE) */
CREATE TABLE inhx (xx text DEFAULT 'text');
diff --git a/src/test/regress/sql/domain.sql b/src/test/regress/sql/domain.sql
index 68da27de22..c12f0b3093 100644
--- a/src/test/regress/sql/domain.sql
+++ b/src/test/regress/sql/domain.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- Test domains.
--
diff --git a/src/test/regress/sql/foreign_data.sql b/src/test/regress/sql/foreign_data.sql
index d6fb3fae4e..3aba5a9236 100644
--- a/src/test/regress/sql/foreign_data.sql
+++ b/src/test/regress/sql/foreign_data.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- Test foreign-data wrapper and server management.
--
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index 425052c1f4..40dffccc7d 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- Test inheritance features
--
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index a7f659bc2b..85b5ca909c 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- insert with DEFAULT in the target_list
--
diff --git a/src/test/regress/sql/matview.sql b/src/test/regress/sql/matview.sql
index d96175aa26..2ce509581d 100644
--- a/src/test/regress/sql/matview.sql
+++ b/src/test/regress/sql/matview.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
-- create a table to use as a basis for views and materialized views in various combinations
CREATE TABLE mvtest_t (id int NOT NULL PRIMARY KEY, type text NOT NULL, amt numeric NOT NULL);
INSERT INTO mvtest_t VALUES
diff --git a/src/test/regress/sql/publication.sql b/src/test/regress/sql/publication.sql
index 815410b3c5..d5a1370c74 100644
--- a/src/test/regress/sql/publication.sql
+++ b/src/test/regress/sql/publication.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- PUBLICATION
--
diff --git a/src/test/regress/sql/replica_identity.sql b/src/test/regress/sql/replica_identity.sql
index b08a3623b8..90db4a7c40 100644
--- a/src/test/regress/sql/replica_identity.sql
+++ b/src/test/regress/sql/replica_identity.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
CREATE TABLE test_replica_identity (
id serial primary key,
keya text not null,
diff --git a/src/test/regress/sql/rowsecurity.sql b/src/test/regress/sql/rowsecurity.sql
index 38e9b38bc4..4b1c2ee619 100644
--- a/src/test/regress/sql/rowsecurity.sql
+++ b/src/test/regress/sql/rowsecurity.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- Test of Row-level security feature
--
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index f4ee30ec8f..a54038f3a2 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- RULES
-- From Jan's original setup_ruletest.sql and run_ruletest.sql
diff --git a/src/test/regress/sql/update.sql b/src/test/regress/sql/update.sql
index c9bb3b53d3..837b6d1871 100644
--- a/src/test/regress/sql/update.sql
+++ b/src/test/regress/sql/update.sql
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
--
-- UPDATE syntax tests
--
Hi,
On 2019-01-15 14:37:36 +0530, Amit Khandekar wrote:
Then for each of the calls, we would need to declare that structure
variable (with = {0}) and assign required fields in that structure
before passing it to ArchiveEntry(). But a major purpose of
ArchiveEntry() is to avoid doing this and instead conveniently pass
those fields as parameters. This would add unnecessary extra lines of
code. I think a better way is to have an ArchiveEntry() function with a
limited number of parameters, and an ArchiveEntryEx() with those
extra parameters which are not needed in usual cases.
I don't think that'll really solve the problem. I think it might be more
reasonable to rely on structs. Now that we can rely on designated
initializers for structs we can do something like
ArchiveEntry((ArchiveArgs){.tablespace = 3,
.dumpFn = somefunc,
...});
and unused arguments will automatically be initialized to zero. Or we
could pass the struct as a pointer, which might be more efficient (although I
doubt it matters here):
ArchiveEntry(&(ArchiveArgs){.tablespace = 3,
.dumpFn = somefunc,
...});
What do others think? It'd probably be a good idea to start a new
thread about this.
Greetings,
Andres Freund
On Tue, 15 Jan 2019 at 17:58, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
On Tue, Jan 15, 2019 at 10:52 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Need to bump K_VERS_MINOR as well.
I've bumped it up, but somehow this change escaped the previous version. Now
it should be there, thanks!
On Mon, 14 Jan 2019 at 18:36, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
+static void _selectTableAccessMethod(ArchiveHandle *AH, const char
*tablespace);
tablespace => tableam
This is yet to be addressed.
Fixed.
Thanks, the patch looks good to me. Of course there's the other thread
about ArchiveEntry arguments which may alter this patch, but
otherwise, I have no more comments on this patch.
Also I guess another attached patch should address the psql part, namely
displaying a table access method with \d+ and the possibility to hide it with
a psql variable (HIDE_TABLEAM, but I'm open to suggestions about the name).
Will have a look at this one.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Fri, 18 Jan 2019 at 10:13, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Tue, 15 Jan 2019 at 17:58, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
Also I guess another attached patch should address the psql part, namely
displaying a table access method with \d+ and the possibility to hide it with
a psql variable (HIDE_TABLEAM, but I'm open to suggestions about the name).
I am ok with the name.
Will have a look at this one.
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
CREATE TEMP TABLE x (
I thought we wanted to avoid having to add this setting in individual
regression tests. Can't we do this in pg_regress as a common setting?
+ /* Access method info */
+ if (pset.sversion >= 120000 && verbose && tableinfo.relam != NULL &&
+ !(pset.hide_tableam && tableinfo.relam_is_default))
+ {
+ printfPQExpBuffer(&buf, _("Access method: %s"),
fmtId(tableinfo.relam));
So this will make psql hide the access method if it's the same as the
default. I understand that this was kind of concluded in the other
thread "Displaying and dumping of table access methods". But IMHO, if
the hide_tableam is false, we should *always* show the access method,
regardless of the default value. I mean, we can make it simple: off
means always show the table access method, on means never show it,
regardless of the default access method. And this will also work with
regression tests. If some regression test specifically wants to output
the access method, it can have a "\set HIDE_TABLEAM off" command.
If we hide the method when it's the default, then for a regression test that
wants to forcibly show the table access method of all tables, it won't
show up for tables that have the default access method.
------------
+ if (pset.sversion >= 120000 && verbose && tableinfo.relam != NULL &&
If the server does not support relam, tableinfo.relam will be NULL
anyway. So I think the sversion check is not needed.
------------
+ printfPQExpBuffer(&buf, _("Access method: %s"), fmtId(tableinfo.relam));
fmtId is not required. In fact, we should display the access method
name as-is. fmtId is required only for identifiers present in SQL
queries.
-----------
+ printfPQExpBuffer(&buf, _("Access method: %s"), fmtId(tableinfo.relam));
+ printTableAddFooter(&cont, buf.data);
+ }
+
+
}
Last two blank lines are not needed.
-----------
+ bool hide_tableam;
} PsqlSettings;
These variables, it seems, are supposed to be grouped together by type.
-----------
I believe you are going to add a new regression testcase for the change ?
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Fri, Jan 18, 2019 at 11:22 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -1,3 +1,4 @@
+\set HIDE_TABLEAM on
I thought we wanted to avoid having to add this setting in individual
regression tests. Can't we do this in pg_regress as a common setting?
Yeah, you're probably right. Actually, I couldn't find anything that looks like
"common settings", and so far I've placed it into psql_start_test as a psql
argument. But not sure, maybe there is a better place.
+ /* Access method info */
+ if (pset.sversion >= 120000 && verbose && tableinfo.relam != NULL &&
+     !(pset.hide_tableam && tableinfo.relam_is_default))
+ {
+     printfPQExpBuffer(&buf, _("Access method: %s"), fmtId(tableinfo.relam));
So this will make psql hide the access method if it's the same as the
default. I understand that this was kind of concluded in the other
thread "Displaying and dumping of table access methods". But IMHO, if
hide_tableam is false, we should *always* show the access method,
regardless of the default value. I mean, we can make it simple: off
means always show the table access method, on means never show it,
regardless of the default access method. And this will also work with
regression tests. If some regression test specifically wants to output
the access method, it can have a "\set HIDE_TABLEAM off" command.
If we hide the method when it's the default, then for a regression test that
wants to forcibly show the table access method of all tables, it won't
show up for tables that have the default access method.
I can't imagine what kind of test would need to forcibly show the table
access method of all the tables. Even if you need to verify the tableam
for something, maybe it's even easier to just select it from pg_am?
+ if (pset.sversion >= 120000 && verbose && tableinfo.relam != NULL &&
If the server does not support relam, tableinfo.relam will be NULL
anyway. So I think the sversion check is not needed.
------------
+ printfPQExpBuffer(&buf, _("Access method: %s"), fmtId(tableinfo.relam));
fmtId is not required.
-----------
+ printfPQExpBuffer(&buf, _("Access method: %s"), fmtId(tableinfo.relam));
+ printTableAddFooter(&cont, buf.data);
+ }
+
+
}
Last two blank lines are not needed.
Right, fixed.
+ bool hide_tableam;
} PsqlSettings;
These variables, it seems, are supposed to be grouped together by type.
Well, this grouping looks strange to me. But since I don't have a strong
opinion, I moved the variable.
I believe you are going to add a new regression testcase for the change ?
Yep.
Attachments:
psql_describe_am_v2.patch (application/octet-stream)
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 0a181b01d9..f76c734a28 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1484,6 +1484,8 @@ describeOneTableDetails(const char *schemaname,
char *reloftype;
char relpersistence;
char relreplident;
+ char *relam;
+ bool relam_is_default;
} tableinfo;
bool show_column_details = false;
@@ -1503,9 +1505,11 @@ describeOneTableDetails(const char *schemaname,
"c.relhastriggers, c.relrowsecurity, c.relforcerowsecurity, "
"false AS relhasoids, %s, c.reltablespace, "
"CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, "
- "c.relpersistence, c.relreplident\n"
+ "c.relpersistence, c.relreplident, am.amname,"
+ "am.amname = current_setting('default_table_access_method') \n"
"FROM pg_catalog.pg_class c\n "
"LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
+ "LEFT JOIN pg_catalog.pg_am am ON (c.relam = am.oid)\n"
"WHERE c.oid = '%s';",
(verbose ?
"pg_catalog.array_to_string(c.reloptions || "
@@ -1656,6 +1660,17 @@ describeOneTableDetails(const char *schemaname,
*(PQgetvalue(res, 0, 11)) : 0;
tableinfo.relreplident = (pset.sversion >= 90400) ?
*(PQgetvalue(res, 0, 12)) : 'd';
+ if (pset.sversion >= 120000)
+ {
+ tableinfo.relam = PQgetisnull(res, 0, 13) ?
+ (char *) NULL : pg_strdup(PQgetvalue(res, 0, 13));
+ tableinfo.relam_is_default = strcmp(PQgetvalue(res, 0, 14), "t") == 0;
+ }
+ else
+ {
+ tableinfo.relam = NULL;
+ tableinfo.relam_is_default = false;
+ }
PQclear(res);
res = NULL;
@@ -3141,6 +3156,14 @@ describeOneTableDetails(const char *schemaname,
/* Tablespace info */
add_tablespace_footer(&cont, tableinfo.relkind, tableinfo.tablespace,
true);
+
+ /* Access method info */
+ if (verbose && tableinfo.relam != NULL &&
+ !(pset.hide_tableam && tableinfo.relam_is_default))
+ {
+ printfPQExpBuffer(&buf, _("Access method: %s"), tableinfo.relam);
+ printTableAddFooter(&cont, buf.data);
+ }
}
/* reloptions, if verbose */
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 176c85afd0..058233b348 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -127,6 +127,7 @@ typedef struct _psqlSettings
bool quiet;
bool singleline;
bool singlestep;
+ bool hide_tableam;
int fetch_count;
int histsize;
int ignoreeof;
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index e7536a8a06..b757febcc5 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -1128,6 +1128,11 @@ show_context_hook(const char *newval)
return true;
}
+static bool
+hide_tableam_hook(const char *newval)
+{
+ return ParseVariableBool(newval, "HIDE_TABLEAM", &pset.hide_tableam);
+}
static void
EstablishVariableSpace(void)
@@ -1191,4 +1196,7 @@ EstablishVariableSpace(void)
SetVariableHooks(pset.vars, "SHOW_CONTEXT",
show_context_substitute_hook,
show_context_hook);
+ SetVariableHooks(pset.vars, "HIDE_TABLEAM",
+ bool_substitute_hook,
+ hide_tableam_hook);
}
diff --git a/src/test/regress/pg_regress_main.c b/src/test/regress/pg_regress_main.c
index bd613e4fda..1b4bca704b 100644
--- a/src/test/regress/pg_regress_main.c
+++ b/src/test/regress/pg_regress_main.c
@@ -74,10 +74,11 @@ psql_start_test(const char *testname,
}
offset += snprintf(psql_cmd + offset, sizeof(psql_cmd) - offset,
- "\"%s%spsql\" -X -a -q -d \"%s\" < \"%s\" > \"%s\" 2>&1",
+ "\"%s%spsql\" -X -a -q -d \"%s\" -v %s < \"%s\" > \"%s\" 2>&1",
bindir ? bindir : "",
bindir ? "/" : "",
dblist->str,
+ "HIDE_TABLEAM=\"on\"",
infile,
outfile);
if (offset >= sizeof(psql_cmd))
On Tue, Jan 15, 2019 at 6:05 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-01-15 18:02:38 +1100, Haribabu Kommi wrote:
On Tue, Dec 11, 2018 at 1:13 PM Andres Freund <andres@anarazel.de>
wrote:
Hi,
On 2018-11-26 17:55:57 -0800, Andres Freund wrote:
Further tasks I'm not yet planning to tackle, that I'd welcome help on:
- pg_upgrade testing
I did the pg_upgrade testing from an older version with some tables and
views existing, and all of them are properly transferred into the new
server with heap as the default access method.
I will add Dmitry's pg_dump patch and test pg_upgrade to confirm
the proper access method is retained on the upgraded database.
- I think we should consider removing HeapTuple->t_tableOid, it should
imo live entirely in the slot
I removed the t_tableOid from HeapTuple and during testing I found some
problems with triggers, will post the patch once it is fixed.
Please note that I'm working on a heavily revised version of the patch
right now, trying to clean up a lot of things (you might have seen some
of the threads I started). I hope to post it ~Thursday. Local-ish
patches shouldn't be a problem though.
Yes, I am checking you other threads of refactoring and cleanups.
I will rebase this patch once the revised code is available.
I am not able to remove t_tableOid from HeapTuple completely,
because of its use in triggers: the slot is not available in triggers,
so I need to store the tableOid as part of the tuple as well.
Currently, t_tableOid is set only when the tuple is formed
from the slot, and its other uses are replaced with the slot member.
Comments?
Regards,
Haribabu Kommi
Fujitsu Australia
Attachments:
0001-Reduce-the-use-of-HeapTuple-t_tableOid.patch (application/octet-stream)
From 58ee84b870221a70f8995fd27f1de0e83ec5602a Mon Sep 17 00:00:00 2001
From: kommih <haribabuk@fast.au.fujitsu.com>
Date: Wed, 16 Jan 2019 18:43:47 +1100
Subject: [PATCH] Reduce the use of HeapTuple t_tableOid
Except in triggers and where the HeapTuple is generated
and passed from slots, t_tableOid is still used.
This needs to be replaced once the tableOid is stored
as a separate variable/parameter.
---
contrib/hstore/hstore_io.c | 2 -
contrib/pg_visibility/pg_visibility.c | 1 -
contrib/pgstattuple/pgstatapprox.c | 1 -
contrib/pgstattuple/pgstattuple.c | 3 +-
contrib/postgres_fdw/postgres_fdw.c | 14 ++++++-
src/backend/access/common/heaptuple.c | 7 ----
src/backend/access/heap/heapam.c | 41 ++++++-------------
src/backend/access/heap/heapam_handler.c | 29 ++++++-------
src/backend/access/heap/heapam_visibility.c | 20 ++++-----
src/backend/access/heap/pruneheap.c | 2 -
src/backend/access/heap/tuptoaster.c | 3 --
src/backend/access/heap/vacuumlazy.c | 2 -
src/backend/access/index/genam.c | 4 +-
src/backend/catalog/indexing.c | 2 +-
src/backend/commands/analyze.c | 2 +-
src/backend/commands/functioncmds.c | 3 +-
src/backend/commands/schemacmds.c | 1 -
src/backend/commands/trigger.c | 21 +++++-----
src/backend/executor/execExprInterp.c | 1 -
src/backend/executor/execTuples.c | 25 +++++++----
src/backend/executor/execUtils.c | 2 -
src/backend/executor/nodeAgg.c | 3 +-
src/backend/executor/nodeGather.c | 1 +
src/backend/executor/nodeGatherMerge.c | 1 +
src/backend/executor/nodeIndexonlyscan.c | 4 +-
src/backend/executor/nodeIndexscan.c | 3 +-
src/backend/executor/nodeModifyTable.c | 6 +--
src/backend/executor/nodeSetOp.c | 1 +
src/backend/executor/spi.c | 1 -
src/backend/executor/tqueue.c | 1 -
src/backend/replication/logical/decode.c | 9 ----
.../replication/logical/reorderbuffer.c | 4 +-
src/backend/utils/adt/expandedrecord.c | 1 -
src/backend/utils/adt/jsonfuncs.c | 2 -
src/backend/utils/adt/rowtypes.c | 10 -----
src/backend/utils/cache/catcache.c | 1 -
src/backend/utils/sort/tuplesort.c | 7 ++--
src/include/access/heapam.h | 2 +-
src/include/executor/tuptable.h | 5 +--
src/include/utils/tqual.h | 1 +
src/pl/plpgsql/src/pl_exec.c | 2 -
src/test/regress/regress.c | 1 -
42 files changed, 98 insertions(+), 154 deletions(-)
diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 745497c76f..05244e77ef 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -845,7 +845,6 @@ hstore_from_record(PG_FUNCTION_ARGS)
/* Build a temporary HeapTuple control structure */
tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
ItemPointerSetInvalid(&(tuple.t_self));
- tuple.t_tableOid = InvalidOid;
tuple.t_data = rec;
values = (Datum *) palloc(ncolumns * sizeof(Datum));
@@ -998,7 +997,6 @@ hstore_populate_record(PG_FUNCTION_ARGS)
/* Build a temporary HeapTuple control structure */
tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
ItemPointerSetInvalid(&(tuple.t_self));
- tuple.t_tableOid = InvalidOid;
tuple.t_data = rec;
}
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index ce9ca704f6..1b1e00d724 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -655,7 +655,6 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
ItemPointerSet(&(tuple.t_self), blkno, offnum);
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = relid;
/*
* If we're checking whether the page is all-visible, we expect
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index c59fd10dc1..cef8606550 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -152,7 +152,6 @@ statapprox_heap(Relation rel, output_type *stat)
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(rel);
/*
* We follow VACUUM's lead in counting INSERT_IN_PROGRESS tuples
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 520438d779..a39b03bde5 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -344,7 +344,8 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);
- if (HeapTupleSatisfies(tuple, &SnapshotDirty, hscan->rs_cbuf))
+ if (HeapTupleSatisfies(tuple, RelationGetRelid(hscan->rs_scan.rs_rd),
+ &SnapshotDirty, hscan->rs_cbuf))
{
stat.tuple_len += tuple->t_len;
stat.tuple_count++;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index cc5b928950..5355e0e00e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1445,6 +1445,8 @@ postgresIterateForeignScan(ForeignScanState *node)
*/
ExecStoreHeapTuple(fsstate->tuples[fsstate->next_tuple++],
slot,
+ fsstate->rel ?
+ RelationGetRelid(fsstate->rel) : InvalidOid,
false);
return slot;
@@ -3538,7 +3540,11 @@ store_returning_result(PgFdwModifyState *fmstate,
NULL,
fmstate->temp_cxt);
/* tuple will be deleted when it is cleared from the slot */
- ExecStoreHeapTuple(newtup, slot, true);
+ ExecStoreHeapTuple(newtup,
+ slot,
+ fmstate->rel ?
+ RelationGetRelid(fmstate->rel) : InvalidOid,
+ true);
}
PG_CATCH();
{
@@ -3810,7 +3816,11 @@ get_returning_data(ForeignScanState *node)
dmstate->retrieved_attrs,
node,
dmstate->temp_cxt);
- ExecStoreHeapTuple(newtup, slot, false);
+ ExecStoreHeapTuple(newtup,
+ slot,
+ dmstate->rel ?
+ RelationGetRelid(dmstate->rel) : InvalidOid,
+ false);
}
PG_CATCH();
{
diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index 06dd628a5b..5beef33291 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -689,7 +689,6 @@ heap_copytuple(HeapTuple tuple)
newTuple = (HeapTuple) palloc(HEAPTUPLESIZE + tuple->t_len);
newTuple->t_len = tuple->t_len;
newTuple->t_self = tuple->t_self;
- newTuple->t_tableOid = tuple->t_tableOid;
newTuple->t_data = (HeapTupleHeader) ((char *) newTuple + HEAPTUPLESIZE);
memcpy((char *) newTuple->t_data, (char *) tuple->t_data, tuple->t_len);
return newTuple;
@@ -715,7 +714,6 @@ heap_copytuple_with_tuple(HeapTuple src, HeapTuple dest)
dest->t_len = src->t_len;
dest->t_self = src->t_self;
- dest->t_tableOid = src->t_tableOid;
dest->t_data = (HeapTupleHeader) palloc(src->t_len);
memcpy((char *) dest->t_data, (char *) src->t_data, src->t_len);
}
@@ -850,7 +848,6 @@ expand_tuple(HeapTuple *targetHeapTuple,
= targetTHeader
= (HeapTupleHeader) ((char *) *targetHeapTuple + HEAPTUPLESIZE);
(*targetHeapTuple)->t_len = len;
- (*targetHeapTuple)->t_tableOid = sourceTuple->t_tableOid;
(*targetHeapTuple)->t_self = sourceTuple->t_self;
targetTHeader->t_infomask = sourceTHeader->t_infomask;
@@ -1078,7 +1075,6 @@ heap_form_tuple(TupleDesc tupleDescriptor,
*/
tuple->t_len = len;
ItemPointerSetInvalid(&(tuple->t_self));
- tuple->t_tableOid = InvalidOid;
HeapTupleHeaderSetDatumLength(td, len);
HeapTupleHeaderSetTypeId(td, tupleDescriptor->tdtypeid);
@@ -1162,7 +1158,6 @@ heap_modify_tuple(HeapTuple tuple,
*/
newTuple->t_data->t_ctid = tuple->t_data->t_ctid;
newTuple->t_self = tuple->t_self;
- newTuple->t_tableOid = tuple->t_tableOid;
return newTuple;
}
@@ -1225,7 +1220,6 @@ heap_modify_tuple_by_cols(HeapTuple tuple,
*/
newTuple->t_data->t_ctid = tuple->t_data->t_ctid;
newTuple->t_self = tuple->t_self;
- newTuple->t_tableOid = tuple->t_tableOid;
return newTuple;
}
@@ -1465,7 +1459,6 @@ heap_tuple_from_minimal_tuple(MinimalTuple mtup)
result = (HeapTuple) palloc(HEAPTUPLESIZE + len);
result->t_len = len;
ItemPointerSetInvalid(&(result->t_self));
- result->t_tableOid = InvalidOid;
result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
memcpy((char *) result->t_data + MINIMAL_TUPLE_OFFSET, mtup, mtup->t_len);
memset(result->t_data, 0, offsetof(HeapTupleHeaderData, t_infomask2));
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f769d828ff..22080fc5bc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -423,7 +423,6 @@ heapgetpage(TableScanDesc sscan, BlockNumber page)
HeapTupleData loctup;
bool valid;
- loctup.t_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
loctup.t_len = ItemIdGetLength(lpp);
ItemPointerSet(&(loctup.t_self), page, lineoff);
@@ -431,7 +430,8 @@ heapgetpage(TableScanDesc sscan, BlockNumber page)
if (all_visible)
valid = true;
else
- valid = HeapTupleSatisfies(&loctup, snapshot, buffer);
+ valid = HeapTupleSatisfies(&loctup, RelationGetRelid(scan->rs_scan.rs_rd),
+ snapshot, buffer);
CheckForSerializableConflictOut(valid, scan->rs_scan.rs_rd, &loctup,
buffer, snapshot);
@@ -646,7 +646,8 @@ heapgettup(HeapScanDesc scan,
/*
* if current tuple qualifies, return it.
*/
- valid = HeapTupleSatisfies(tuple, snapshot, scan->rs_cbuf);
+ valid = HeapTupleSatisfies(tuple, RelationGetRelid(scan->rs_scan.rs_rd),
+ snapshot, scan->rs_cbuf);
CheckForSerializableConflictOut(valid, scan->rs_scan.rs_rd, tuple,
scan->rs_cbuf, snapshot);
@@ -1442,9 +1443,6 @@ heap_beginscan(Relation relation, Snapshot snapshot,
if (!is_bitmapscan && snapshot)
PredicateLockRelation(relation, snapshot);
- /* we only need to set this up once */
- scan->rs_ctup.t_tableOid = RelationGetRelid(relation);
-
/*
* we do this here instead of in initscan() because heap_rescan also calls
* initscan() and we don't want to allocate memory again
@@ -1657,6 +1655,7 @@ heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *s
pgstat_count_heap_getnext(scan->rs_scan.rs_rd);
+ slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
return ExecStoreBufferHeapTuple(&scan->rs_ctup, slot,
scan->rs_cbuf);
}
@@ -1760,12 +1759,11 @@ heap_fetch(Relation relation,
ItemPointerCopy(tid, &(tuple->t_self));
tuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
tuple->t_len = ItemIdGetLength(lp);
- tuple->t_tableOid = RelationGetRelid(relation);
/*
* check time qualification of tuple, then release lock
*/
- valid = HeapTupleSatisfies(tuple, snapshot, buffer);
+ valid = HeapTupleSatisfies(tuple, RelationGetRelid(relation), snapshot, buffer);
if (valid)
PredicateLockTuple(relation, tuple, snapshot);
@@ -1870,7 +1868,6 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
heapTuple->t_data = (HeapTupleHeader) PageGetItem(dp, lp);
heapTuple->t_len = ItemIdGetLength(lp);
- heapTuple->t_tableOid = RelationGetRelid(relation);
ItemPointerSetOffsetNumber(&heapTuple->t_self, offnum);
/*
@@ -1907,7 +1904,8 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
ItemPointerSet(&(heapTuple->t_self), BufferGetBlockNumber(buffer), offnum);
/* If it's visible per the snapshot, we must return it */
- valid = HeapTupleSatisfies(heapTuple, snapshot, buffer);
+ valid = HeapTupleSatisfies(heapTuple, RelationGetRelid(relation),
+ snapshot, buffer);
CheckForSerializableConflictOut(valid, relation, heapTuple,
buffer, snapshot);
/* reset to original, non-redirected, tid */
@@ -2064,7 +2062,6 @@ heap_get_latest_tid(Relation relation,
tp.t_self = ctid;
tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
tp.t_len = ItemIdGetLength(lp);
- tp.t_tableOid = RelationGetRelid(relation);
/*
* After following a t_ctid link, we might arrive at an unrelated
@@ -2081,7 +2078,7 @@ heap_get_latest_tid(Relation relation,
* Check time qualification of tuple; if visible, set it as the new
* result candidate.
*/
- valid = HeapTupleSatisfies(&tp, snapshot, buffer);
+ valid = HeapTupleSatisfies(&tp, RelationGetRelid(relation), snapshot, buffer);
CheckForSerializableConflictOut(valid, relation, &tp, buffer, snapshot);
if (valid)
*tid = ctid;
@@ -2433,7 +2430,6 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
HeapTupleHeaderSetCmin(tup->t_data, cid);
HeapTupleHeaderSetXmax(tup->t_data, 0); /* for cleanliness */
- tup->t_tableOid = RelationGetRelid(relation);
/*
* If the new tuple is too big for storage or contains already toasted
@@ -2491,9 +2487,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
{
heaptuples[i] = heap_prepare_insert(relation, ExecFetchSlotHeapTuple(slots[i], true, NULL),
xid, cid, options);
-
- if (slots[i]->tts_tableOid != InvalidOid)
- heaptuples[i]->t_tableOid = slots[i]->tts_tableOid;
}
/*
@@ -2883,7 +2876,6 @@ heap_delete(Relation relation, ItemPointer tid,
lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
Assert(ItemIdIsNormal(lp));
- tp.t_tableOid = RelationGetRelid(relation);
tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
tp.t_len = ItemIdGetLength(lp);
tp.t_self = *tid;
@@ -3000,7 +2992,7 @@ l1:
if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated)
{
/* Perform additional check for transaction-snapshot mode RI updates */
- if (!HeapTupleSatisfies(&tp, crosscheck, buffer))
+ if (!HeapTupleSatisfies(&tp, RelationGetRelid(relation), crosscheck, buffer))
result = HeapTupleUpdated;
}
@@ -3404,14 +3396,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
* Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
* properly.
*/
- oldtup.t_tableOid = RelationGetRelid(relation);
oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
oldtup.t_len = ItemIdGetLength(lp);
oldtup.t_self = *otid;
- /* the new tuple is ready, except for this: */
- newtup->t_tableOid = RelationGetRelid(relation);
-
/* Determine columns modified by the update. */
modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
&oldtup, newtup);
@@ -3642,7 +3630,7 @@ l2:
if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated)
{
/* Perform additional check for transaction-snapshot mode RI updates */
- if (!HeapTupleSatisfies(&oldtup, crosscheck, buffer))
+ if (!HeapTupleSatisfies(&oldtup, RelationGetRelid(relation), crosscheck, buffer))
result = HeapTupleUpdated;
}
@@ -4267,14 +4255,14 @@ ProjIndexIsUnchanged(Relation relation, HeapTuple oldtup, HeapTuple newtup)
int i;
ResetExprContext(econtext);
- ExecStoreHeapTuple(oldtup, slot, false);
+ ExecStoreHeapTuple(oldtup, slot, RelationGetRelid(relation), false);
FormIndexDatum(indexInfo,
slot,
estate,
old_values,
old_isnull);
- ExecStoreHeapTuple(newtup, slot, false);
+ ExecStoreHeapTuple(newtup, slot, RelationGetRelid(relation), false);
FormIndexDatum(indexInfo,
slot,
estate,
@@ -4486,7 +4474,6 @@ heap_lock_tuple(Relation relation, ItemPointer tid,
tuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
tuple->t_len = ItemIdGetLength(lp);
- tuple->t_tableOid = RelationGetRelid(relation);
tuple->t_self = *tid;
l3:
@@ -6044,7 +6031,6 @@ heap_abort_speculative(Relation relation, HeapTuple tuple)
lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
Assert(ItemIdIsNormal(lp));
- tp.t_tableOid = RelationGetRelid(relation);
tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
tp.t_len = ItemIdGetLength(lp);
tp.t_self = *tid;
@@ -7747,7 +7733,6 @@ log_heap_new_cid(Relation relation, HeapTuple tup)
HeapTupleHeader hdr = tup->t_data;
Assert(ItemPointerIsValid(&tup->t_self));
- Assert(tup->t_tableOid != InvalidOid);
xlrec.top_xid = GetTopTransactionId();
xlrec.target_node = relation->rd_node;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 95513dfec8..9119bdf162 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -90,8 +90,6 @@ heapam_heap_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
/* Update the tuple with table oid */
slot->tts_tableOid = RelationGetRelid(relation);
- if (slot->tts_tableOid != InvalidOid)
- tuple->t_tableOid = slot->tts_tableOid;
/* Perform the insertion, and copy the resulting ItemPointer */
heap_insert(relation, tuple, cid, options, bistate);
@@ -110,8 +108,6 @@ heapam_heap_insert_speculative(Relation relation, TupleTableSlot *slot, CommandI
/* Update the tuple with table oid */
slot->tts_tableOid = RelationGetRelid(relation);
- if (slot->tts_tableOid != InvalidOid)
- tuple->t_tableOid = slot->tts_tableOid;
HeapTupleHeaderSetSpeculativeToken(tuple->t_data, specToken);
@@ -386,10 +382,6 @@ heapam_heap_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
HeapTuple tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
HTSU_Result result;
- /* Update the tuple with table oid */
- if (slot->tts_tableOid != InvalidOid)
- tuple->t_tableOid = slot->tts_tableOid;
-
result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
hufd, lockmode);
ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
@@ -450,7 +442,7 @@ heapam_satisfies(Relation rel, TupleTableSlot *slot, Snapshot snapshot)
* Caller should be holding pin, but not lock.
*/
LockBuffer(bslot->buffer, BUFFER_LOCK_SHARE);
- res = HeapTupleSatisfies(bslot->base.tuple, snapshot, bslot->buffer);
+ res = HeapTupleSatisfies(bslot->base.tuple, RelationGetRelid(rel), snapshot, bslot->buffer);
LockBuffer(bslot->buffer, BUFFER_LOCK_UNLOCK);
return res;
@@ -984,7 +976,7 @@ IndexBuildHeapRangeScan(Relation heapRelation,
MemoryContextReset(econtext->ecxt_per_tuple_memory);
/* Set up for predicate or expression evaluation */
- ExecStoreHeapTuple(heapTuple, slot, false);
+ ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(sscan->rs_rd), false);
/*
* In a partial index, discard tuples that don't satisfy the
@@ -1240,7 +1232,7 @@ validate_index_heapscan(Relation heapRelation,
MemoryContextReset(econtext->ecxt_per_tuple_memory);
/* Set up for predicate or expression evaluation */
- ExecStoreHeapTuple(heapTuple, slot, false);
+ ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(sscan->rs_rd), false);
/*
* In a partial index, discard tuples that don't satisfy the
@@ -1393,9 +1385,11 @@ heapam_scan_bitmap_pagescan(TableScanDesc sscan,
continue;
loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
loctup.t_len = ItemIdGetLength(lp);
- loctup.t_tableOid = scan->rs_scan.rs_rd->rd_id;
ItemPointerSet(&loctup.t_self, page, offnum);
- valid = HeapTupleSatisfies(&loctup, snapshot, buffer);
+ valid = HeapTupleSatisfies(&loctup,
+ RelationGetRelid(scan->rs_scan.rs_rd),
+ snapshot,
+ buffer);
if (valid)
{
scan->rs_vistuples[ntup++] = offnum;
@@ -1432,7 +1426,6 @@ heapam_scan_bitmap_pagescan_next(TableScanDesc sscan, TupleTableSlot *slot)
scan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
scan->rs_ctup.t_len = ItemIdGetLength(lp);
- scan->rs_ctup.t_tableOid = scan->rs_scan.rs_rd->rd_id;
ItemPointerSet(&scan->rs_ctup.t_self, scan->rs_cblock, targoffset);
pgstat_count_heap_fetch(scan->rs_scan.rs_rd);
@@ -1444,6 +1437,7 @@ heapam_scan_bitmap_pagescan_next(TableScanDesc sscan, TupleTableSlot *slot)
ExecStoreBufferHeapTuple(&scan->rs_ctup,
slot,
scan->rs_cbuf);
+ slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
scan->rs_cindex++;
@@ -1490,7 +1484,9 @@ SampleHeapTupleVisible(HeapScanDesc scan, Buffer buffer,
else
{
/* Otherwise, we have to check the tuple individually. */
- return HeapTupleSatisfies(tuple, scan->rs_scan.rs_snapshot, buffer);
+ return HeapTupleSatisfies(tuple,
+ RelationGetRelid(scan->rs_scan.rs_rd),
+ scan->rs_scan.rs_snapshot, buffer);
}
}
@@ -1635,6 +1631,7 @@ heapam_scan_sample_next_tuple(TableScanDesc sscan, struct SampleScanState *scans
continue;
ExecStoreBufferHeapTuple(tuple, slot, scan->rs_cbuf);
+ slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
/* Found visible tuple, return it. */
if (!pagemode)
@@ -1720,7 +1717,6 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
ItemPointerSet(&targtuple->t_self, scan->rs_cblock, scan->rs_cindex);
- targtuple->t_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
targtuple->t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
targtuple->t_len = ItemIdGetLength(itemid);
@@ -1792,6 +1788,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc sscan, TransactionId OldestXmin, do
if (sample_it)
{
ExecStoreBufferHeapTuple(targtuple, slot, scan->rs_cbuf);
+ slot->tts_tableOid = RelationGetRelid(scan->rs_scan.rs_rd);
scan->rs_cindex++;
/* note that we leave the buffer locked here! */
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 1ac1a20c1d..cd4d3af3c3 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -179,7 +179,6 @@ HeapTupleSatisfiesSelf(HeapTuple htup, Snapshot snapshot, Buffer buffer)
HeapTupleHeader tuple = htup->t_data;
Assert(ItemPointerIsValid(&htup->t_self));
- Assert(htup->t_tableOid != InvalidOid);
if (!HeapTupleHeaderXminCommitted(tuple))
{
@@ -370,7 +369,6 @@ HeapTupleSatisfiesToast(HeapTuple htup, Snapshot snapshot,
HeapTupleHeader tuple = htup->t_data;
Assert(ItemPointerIsValid(&htup->t_self));
- Assert(htup->t_tableOid != InvalidOid);
if (!HeapTupleHeaderXminCommitted(tuple))
{
@@ -464,7 +462,6 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
HeapTupleHeader tuple = htup->t_data;
Assert(ItemPointerIsValid(&htup->t_self));
- Assert(htup->t_tableOid != InvalidOid);
if (!HeapTupleHeaderXminCommitted(tuple))
{
@@ -757,7 +754,6 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
HeapTupleHeader tuple = htup->t_data;
Assert(ItemPointerIsValid(&htup->t_self));
- Assert(htup->t_tableOid != InvalidOid);
snapshot->xmin = snapshot->xmax = InvalidTransactionId;
snapshot->speculativeToken = 0;
@@ -981,7 +977,6 @@ HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
HeapTupleHeader tuple = htup->t_data;
Assert(ItemPointerIsValid(&htup->t_self));
- Assert(htup->t_tableOid != InvalidOid);
if (!HeapTupleHeaderXminCommitted(tuple))
{
@@ -1183,7 +1178,6 @@ HeapTupleSatisfiesVacuum(HeapTuple stup, TransactionId OldestXmin,
HeapTupleHeader tuple = htup->t_data;
Assert(ItemPointerIsValid(&htup->t_self));
- Assert(htup->t_tableOid != InvalidOid);
/*
* Has inserting transaction committed?
@@ -1611,7 +1605,6 @@ HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin)
HeapTupleHeader tuple = htup->t_data;
Assert(ItemPointerIsValid(&htup->t_self));
- Assert(htup->t_tableOid != InvalidOid);
/*
* If the inserting transaction is marked invalid, then it aborted, and
@@ -1675,7 +1668,7 @@ TransactionIdInArray(TransactionId xid, TransactionId *xip, Size num)
* complicated than when dealing "only" with the present.
*/
static bool
-HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
+HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Oid relid, Snapshot snapshot,
Buffer buffer)
{
HeapTupleHeader tuple = htup->t_data;
@@ -1683,7 +1676,6 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
TransactionId xmax = HeapTupleHeaderGetRawXmax(tuple);
Assert(ItemPointerIsValid(&htup->t_self));
- Assert(htup->t_tableOid != InvalidOid);
/* inserting transaction aborted */
if (HeapTupleHeaderXminInvalid(tuple))
@@ -1704,7 +1696,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
* values externally.
*/
resolved = ResolveCminCmaxDuringDecoding(HistoricSnapshotGetTupleCids(), snapshot,
- htup, buffer,
+ htup, relid, buffer,
&cmin, &cmax);
if (!resolved)
@@ -1775,7 +1767,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
/* Lookup actual cmin/cmax values */
resolved = ResolveCminCmaxDuringDecoding(HistoricSnapshotGetTupleCids(), snapshot,
- htup, buffer,
+ htup, relid, buffer,
&cmin, &cmax);
if (!resolved)
@@ -1813,8 +1805,10 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
}
bool
-HeapTupleSatisfies(HeapTuple stup, Snapshot snapshot, Buffer buffer)
+HeapTupleSatisfies(HeapTuple stup, Oid relid, Snapshot snapshot, Buffer buffer)
{
+ Assert(relid != InvalidOid);
+
switch (snapshot->visibility_type)
{
case MVCC_VISIBILITY:
@@ -1833,7 +1827,7 @@ HeapTupleSatisfies(HeapTuple stup, Snapshot snapshot, Buffer buffer)
return HeapTupleSatisfiesDirty(stup, snapshot, buffer);
break;
case HISTORIC_MVCC_VISIBILITY:
- return HeapTupleSatisfiesHistoricMVCC(stup, snapshot, buffer);
+ return HeapTupleSatisfiesHistoricMVCC(stup, relid, snapshot, buffer);
break;
case NON_VACUUMABLE_VISIBILTY:
return HeapTupleSatisfiesNonVacuumable(stup, snapshot, buffer);
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c2f5343dac..79964e157a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -367,8 +367,6 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
i;
HeapTupleData tup;
- tup.t_tableOid = RelationGetRelid(relation);
-
rootlp = PageGetItemId(dp, rootoffnum);
/*
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 486cde4aff..5349dbf805 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1023,7 +1023,6 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
result_tuple = (HeapTuple) palloc0(HEAPTUPLESIZE + new_tuple_len);
result_tuple->t_len = new_tuple_len;
result_tuple->t_self = newtup->t_self;
- result_tuple->t_tableOid = newtup->t_tableOid;
new_data = (HeapTupleHeader) ((char *) result_tuple + HEAPTUPLESIZE);
result_tuple->t_data = new_data;
@@ -1124,7 +1123,6 @@ toast_flatten_tuple(HeapTuple tup, TupleDesc tupleDesc)
* a syscache entry.
*/
new_tuple->t_self = tup->t_self;
- new_tuple->t_tableOid = tup->t_tableOid;
new_tuple->t_data->t_choice = tup->t_data->t_choice;
new_tuple->t_data->t_ctid = tup->t_data->t_ctid;
@@ -1195,7 +1193,6 @@ toast_flatten_tuple_to_datum(HeapTupleHeader tup,
/* Build a temporary HeapTuple control structure */
tmptup.t_len = tup_len;
ItemPointerSetInvalid(&(tmptup.t_self));
- tmptup.t_tableOid = InvalidOid;
tmptup.t_data = tup;
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 429f9ad52a..87589a927e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1004,7 +1004,6 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(onerel);
tupgone = false;
@@ -2238,7 +2237,6 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(rel);
switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
{
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5f033c5ee4..02afc191b6 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -462,7 +462,7 @@ systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
Assert(BufferIsValid(hscan->xs_cbuf));
/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
LockBuffer(hscan->xs_cbuf, BUFFER_LOCK_SHARE);
- result = HeapTupleSatisfies(tup, freshsnap, hscan->xs_cbuf);
+ result = HeapTupleSatisfies(tup, RelationGetRelid(sysscan->heap_rel), freshsnap, hscan->xs_cbuf);
LockBuffer(hscan->xs_cbuf, BUFFER_LOCK_UNLOCK);
}
else
@@ -474,7 +474,7 @@ systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
Assert(BufferIsValid(scan->rs_cbuf));
/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
- result = HeapTupleSatisfies(tup, freshsnap, scan->rs_cbuf);
+ result = HeapTupleSatisfies(tup, RelationGetRelid(sysscan->heap_rel), freshsnap, scan->rs_cbuf);
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
}
return result;
diff --git a/src/backend/catalog/indexing.c b/src/backend/catalog/indexing.c
index 52a2ccb40f..88b8df0b7a 100644
--- a/src/backend/catalog/indexing.c
+++ b/src/backend/catalog/indexing.c
@@ -97,7 +97,7 @@ CatalogIndexInsert(CatalogIndexState indstate, HeapTuple heapTuple)
/* Need a slot to hold the tuple being examined */
slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation),
&TTSOpsHeapTuple);
- ExecStoreHeapTuple(heapTuple, slot, false);
+ ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(heapRelation), false);
/*
* for each index, form and insert the index tuple
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 29e2377b52..b2697dae44 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -758,7 +758,7 @@ compute_index_stats(Relation onerel, double totalrows,
ResetExprContext(econtext);
/* Set up for predicate or expression evaluation */
- ExecStoreHeapTuple(heapTuple, slot, false);
+ ExecStoreHeapTuple(heapTuple, slot, RelationGetRelid(onerel), false);
/* If index is partial, check predicate */
if (predicate != NULL)
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index ebece4d1d7..a9f7d55940 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -2360,10 +2360,9 @@ ExecuteCallStmt(CallStmt *stmt, ParamListInfo params, bool atomic, DestReceiver
rettupdata.t_len = HeapTupleHeaderGetDatumLength(td);
ItemPointerSetInvalid(&(rettupdata.t_self));
- rettupdata.t_tableOid = InvalidOid;
rettupdata.t_data = td;
- slot = ExecStoreHeapTuple(&rettupdata, tstate->slot, false);
+ slot = ExecStoreHeapTuple(&rettupdata, tstate->slot, InvalidOid, false);
tstate->dest->receiveSlot(slot, tstate->dest);
end_tup_output(tstate);
diff --git a/src/backend/commands/schemacmds.c b/src/backend/commands/schemacmds.c
index f0ebe2d1c3..ba31f46019 100644
--- a/src/backend/commands/schemacmds.c
+++ b/src/backend/commands/schemacmds.c
@@ -355,7 +355,6 @@ AlterSchemaOwner_internal(HeapTuple tup, Relation rel, Oid newOwnerId)
{
Form_pg_namespace nspForm;
- Assert(tup->t_tableOid == NamespaceRelationId);
Assert(RelationGetRelid(rel) == NamespaceRelationId);
nspForm = (Form_pg_namespace) GETSTRUCT(tup);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6a00a96f59..313222008d 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2573,7 +2573,7 @@ ExecBRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
}
if (newtuple != oldtuple)
{
- ExecForceStoreHeapTuple(newtuple, slot);
+ ExecForceStoreHeapTuple(newtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
newtuple = ExecFetchSlotHeapTuple(slot, true, NULL);
}
}
@@ -2653,7 +2653,8 @@ ExecIRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
}
if (oldtuple != newtuple)
{
- ExecForceStoreHeapTuple(newtuple, LocTriggerData.tg_trigslot);
+ ExecForceStoreHeapTuple(newtuple, LocTriggerData.tg_trigslot,
+ RelationGetRelid(relinfo->ri_RelationDesc));
newtuple = ExecFetchSlotHeapTuple(slot, true, NULL);
}
}
@@ -2777,7 +2778,7 @@ ExecBRDeleteTriggers(EState *estate, EPQState *epqstate,
else
{
trigtuple = fdw_trigtuple;
- ExecForceStoreHeapTuple(trigtuple, slot);
+ ExecForceStoreHeapTuple(trigtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
}
LocTriggerData.type = T_TriggerData;
@@ -2854,7 +2855,7 @@ ExecARDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
}
else
{
- ExecForceStoreHeapTuple(fdw_trigtuple, slot);
+ ExecForceStoreHeapTuple(fdw_trigtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
}
AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_DELETE,
@@ -2884,7 +2885,7 @@ ExecIRDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
LocTriggerData.tg_oldtable = NULL;
LocTriggerData.tg_newtable = NULL;
- ExecForceStoreHeapTuple(trigtuple, slot);
+ ExecForceStoreHeapTuple(trigtuple, slot, RelationGetRelid(relinfo->ri_RelationDesc));
for (i = 0; i < trigdesc->numtriggers; i++)
{
@@ -3044,7 +3045,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
}
else
{
- ExecForceStoreHeapTuple(fdw_trigtuple, oldslot);
+ ExecForceStoreHeapTuple(fdw_trigtuple, oldslot, RelationGetRelid(relinfo->ri_RelationDesc));
trigtuple = fdw_trigtuple;
}
@@ -3090,7 +3091,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
}
if (newtuple != oldtuple)
- ExecForceStoreHeapTuple(newtuple, newslot);
+ ExecForceStoreHeapTuple(newtuple, newslot, RelationGetRelid(relinfo->ri_RelationDesc));
}
if (false && trigtuple != fdw_trigtuple && trigtuple != newtuple)
heap_freetuple(trigtuple);
@@ -3132,7 +3133,7 @@ ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
NULL,
NULL);
else if (fdw_trigtuple != NULL)
- ExecForceStoreHeapTuple(fdw_trigtuple, oldslot);
+ ExecForceStoreHeapTuple(fdw_trigtuple, oldslot, RelationGetRelid(relinfo->ri_RelationDesc));
AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_UPDATE,
true, oldslot, newslot, recheckIndexes,
@@ -3161,7 +3162,7 @@ ExecIRUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
LocTriggerData.tg_oldtable = NULL;
LocTriggerData.tg_newtable = NULL;
- ExecForceStoreHeapTuple(trigtuple, oldslot);
+ ExecForceStoreHeapTuple(trigtuple, oldslot, RelationGetRelid(relinfo->ri_RelationDesc));
for (i = 0; i < trigdesc->numtriggers; i++)
{
@@ -3193,7 +3194,7 @@ ExecIRUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
return false; /* "do nothing" */
if (oldtuple != newtuple)
- ExecForceStoreHeapTuple(newtuple, newslot);
+ ExecForceStoreHeapTuple(newtuple, newslot, RelationGetRelid(relinfo->ri_RelationDesc));
}
return true;
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 6cac1cf99c..b2a70bc07d 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3004,7 +3004,6 @@ ExecEvalFieldStoreDeForm(ExprState *state, ExprEvalStep *op, ExprContext *econte
tuphdr = DatumGetHeapTupleHeader(tupDatum);
tmptup.t_len = HeapTupleHeaderGetDatumLength(tuphdr);
ItemPointerSetInvalid(&(tmptup.t_self));
- tmptup.t_tableOid = InvalidOid;
tmptup.t_data = tuphdr;
heap_deform_tuple(&tmptup, tupDesc,
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index d91a71a7c1..ac8e8dc8cd 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -383,7 +383,7 @@ tts_heap_copyslot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
tuple = ExecCopySlotHeapTuple(srcslot);
MemoryContextSwitchTo(oldcontext);
- ExecStoreHeapTuple(tuple, dstslot, true);
+ ExecStoreHeapTuple(tuple, dstslot, srcslot->tts_tableOid, true);
}
static HeapTuple
@@ -1126,6 +1126,7 @@ MakeTupleTableSlot(TupleDesc tupleDesc,
slot->tts_tupleDescriptor = tupleDesc;
slot->tts_mcxt = CurrentMemoryContext;
slot->tts_nvalid = 0;
+ slot->tts_tableOid = InvalidOid;
if (tupleDesc != NULL)
{
@@ -1388,6 +1389,7 @@ ExecSetSlotDescriptor(TupleTableSlot *slot, /* slot to change */
TupleTableSlot *
ExecStoreHeapTuple(HeapTuple tuple,
TupleTableSlot *slot,
+ Oid relid,
bool shouldFree)
{
/*
@@ -1405,7 +1407,7 @@ ExecStoreHeapTuple(HeapTuple tuple,
else
elog(ERROR, "trying to store a heap tuple into wrong type of slot");
- slot->tts_tableOid = tuple->t_tableOid;
+ slot->tts_tableOid = relid;
return slot;
}
@@ -1446,8 +1448,6 @@ ExecStoreBufferHeapTuple(HeapTuple tuple,
elog(ERROR, "trying to store an on-disk heap tuple into wrong type of slot");
tts_buffer_heap_store_tuple(slot, tuple, buffer);
- slot->tts_tableOid = tuple->t_tableOid;
-
return slot;
}
@@ -1482,11 +1482,12 @@ ExecStoreMinimalTuple(MinimalTuple mtup,
*/
void
ExecForceStoreHeapTuple(HeapTuple tuple,
- TupleTableSlot *slot)
+ TupleTableSlot *slot,
+ Oid relid)
{
if (TTS_IS_HEAPTUPLE(slot))
{
- ExecStoreHeapTuple(tuple, slot, false);
+ ExecStoreHeapTuple(tuple, slot, relid, false);
}
else if (TTS_IS_BUFFERTUPLE(slot))
{
@@ -1499,6 +1500,7 @@ ExecForceStoreHeapTuple(HeapTuple tuple,
oldContext = MemoryContextSwitchTo(slot->tts_mcxt);
bslot->base.tuple = heap_copytuple(tuple);
MemoryContextSwitchTo(oldContext);
+ slot->tts_tableOid = relid;
}
else
{
@@ -1506,6 +1508,7 @@ ExecForceStoreHeapTuple(HeapTuple tuple,
heap_deform_tuple(tuple, slot->tts_tupleDescriptor,
slot->tts_values, slot->tts_isnull);
ExecStoreVirtualTuple(slot);
+ slot->tts_tableOid = relid;
}
}
@@ -1639,6 +1642,8 @@ ExecStoreAllNullTuple(TupleTableSlot *slot)
HeapTuple
ExecFetchSlotHeapTuple(TupleTableSlot *slot, bool materialize, bool *shouldFree)
{
+ HeapTuple htup;
+
/*
* sanity checks
*/
@@ -1653,14 +1658,18 @@ ExecFetchSlotHeapTuple(TupleTableSlot *slot, bool materialize, bool *shouldFree)
{
if (shouldFree)
*shouldFree = true;
- return slot->tts_ops->copy_heap_tuple(slot);
+ htup = slot->tts_ops->copy_heap_tuple(slot);
}
else
{
if (shouldFree)
*shouldFree = false;
- return slot->tts_ops->get_heap_tuple(slot);
+ htup = slot->tts_ops->get_heap_tuple(slot);
}
+
+ htup->t_tableOid = slot->tts_tableOid;
+
+ return htup;
}
/* --------------------------------
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 4031642b80..db2020bd0d 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1070,7 +1070,6 @@ GetAttributeByName(HeapTupleHeader tuple, const char *attname, bool *isNull)
*/
tmptup.t_len = HeapTupleHeaderGetDatumLength(tuple);
ItemPointerSetInvalid(&(tmptup.t_self));
- tmptup.t_tableOid = InvalidOid;
tmptup.t_data = tuple;
result = heap_getattr(&tmptup,
@@ -1118,7 +1117,6 @@ GetAttributeByNum(HeapTupleHeader tuple,
*/
tmptup.t_len = HeapTupleHeaderGetDatumLength(tuple);
ItemPointerSetInvalid(&(tmptup.t_self));
- tmptup.t_tableOid = InvalidOid;
tmptup.t_data = tuple;
result = heap_getattr(&tmptup,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index daf56cd3d1..ced48bb791 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -1801,7 +1801,8 @@ agg_retrieve_direct(AggState *aggstate)
* cleared from the slot.
*/
ExecForceStoreHeapTuple(aggstate->grp_firstTuple,
- firstSlot);
+ firstSlot,
+ InvalidOid);
aggstate->grp_firstTuple = NULL; /* don't keep two pointers */
/* set up for first advance_aggregates call */
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 1dd8bb3f3a..1d4d79a3ab 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -280,6 +280,7 @@ gather_getnext(GatherState *gatherstate)
{
ExecStoreHeapTuple(tup, /* tuple to store */
fslot, /* slot to store the tuple */
+ InvalidOid,
true); /* pfree tuple when done with it */
return fslot;
}
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 54ef0ca7b7..73625965b2 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -703,6 +703,7 @@ gather_merge_readnext(GatherMergeState *gm_state, int reader, bool nowait)
ExecStoreHeapTuple(tup, /* tuple to store */
gm_state->gm_slots[reader], /* slot in which to store
* the tuple */
+ InvalidOid,
true); /* pfree tuple when done with it */
return true;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c39c4f453d..3c51c4f635 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -203,8 +203,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
*/
Assert(slot->tts_tupleDescriptor->natts ==
scandesc->xs_hitupdesc->natts);
- ExecForceStoreHeapTuple(scandesc->xs_hitup, slot);
- slot->tts_tableOid = RelationGetRelid(scandesc->heapRelation);
+ ExecForceStoreHeapTuple(scandesc->xs_hitup, slot,
+ RelationGetRelid(scandesc->heapRelation));
}
else if (scandesc->xs_itup)
StoreIndexTuple(slot, scandesc->xs_itup, scandesc->xs_itupdesc);
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index b38dadaa9a..28b3bdb4d4 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -247,8 +247,7 @@ IndexNextWithReorder(IndexScanState *node)
tuple = reorderqueue_pop(node);
/* Pass 'true', as the tuple in the queue is a palloc'd copy */
- slot->tts_tableOid = RelationGetRelid(scandesc->heapRelation);
- ExecStoreHeapTuple(tuple, slot, true);
+ ExecStoreHeapTuple(tuple, slot, RelationGetRelid(scandesc->heapRelation), true);
return slot;
}
}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d1ac9fc2e9..8aa7501830 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -842,7 +842,7 @@ ldelete:;
slot = ExecTriggerGetReturnSlot(estate, resultRelationDesc);
if (oldtuple != NULL)
{
- ExecForceStoreHeapTuple(oldtuple, slot);
+ ExecForceStoreHeapTuple(oldtuple, slot, RelationGetRelid(resultRelationDesc));
}
else
{
@@ -2035,10 +2035,6 @@ ExecModifyTable(PlanState *pstate)
oldtupdata.t_len =
HeapTupleHeaderGetDatumLength(oldtupdata.t_data);
ItemPointerSetInvalid(&(oldtupdata.t_self));
- /* Historically, view triggers see invalid t_tableOid. */
- oldtupdata.t_tableOid =
- (relkind == RELKIND_VIEW) ? InvalidOid :
- RelationGetRelid(resultRelInfo->ri_RelationDesc);
oldtuple = &oldtupdata;
}
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 48b7aa9b8b..4f7da00d82 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -269,6 +269,7 @@ setop_retrieve_direct(SetOpState *setopstate)
*/
ExecStoreHeapTuple(setopstate->grp_firstTuple,
resultTupleSlot,
+ InvalidOid,
true);
setopstate->grp_firstTuple = NULL; /* don't keep two pointers */
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 34664e76d1..d9398ed527 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -870,7 +870,6 @@ SPI_modifytuple(Relation rel, HeapTuple tuple, int natts, int *attnum,
*/
mtuple->t_data->t_ctid = tuple->t_data->t_ctid;
mtuple->t_self = tuple->t_self;
- mtuple->t_tableOid = tuple->t_tableOid;
}
else
{
diff --git a/src/backend/executor/tqueue.c b/src/backend/executor/tqueue.c
index e2b596cf74..d3ef18e264 100644
--- a/src/backend/executor/tqueue.c
+++ b/src/backend/executor/tqueue.c
@@ -206,7 +206,6 @@ TupleQueueReaderNext(TupleQueueReader *reader, bool nowait, bool *done)
* (which had better be sufficiently aligned).
*/
ItemPointerSetInvalid(&htup.t_self);
- htup.t_tableOid = InvalidOid;
htup.t_len = nbytes;
htup.t_data = data;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index e3b05657f8..a3430e1336 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -940,12 +940,6 @@ DecodeMultiInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
/* not a disk based tuple */
ItemPointerSetInvalid(&tuple->tuple.t_self);
- /*
- * We can only figure this out after reassembling the
- * transactions.
- */
- tuple->tuple.t_tableOid = InvalidOid;
-
tuple->tuple.t_len = datalen + SizeofHeapTupleHeader;
memset(header, 0, SizeofHeapTupleHeader);
@@ -1033,9 +1027,6 @@ DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tuple)
/* not a disk based tuple */
ItemPointerSetInvalid(&tuple->tuple.t_self);
- /* we can only figure this out after reassembling the transactions */
- tuple->tuple.t_tableOid = InvalidOid;
-
/* data is not stored aligned, copy to aligned storage */
memcpy((char *) &xlhdr,
data,
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 23466bade2..60ee12b91a 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3482,7 +3482,7 @@ UpdateLogicalMappings(HTAB *tuplecid_data, Oid relid, Snapshot snapshot)
bool
ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data,
Snapshot snapshot,
- HeapTuple htup, Buffer buffer,
+ HeapTuple htup, Oid relid, Buffer buffer,
CommandId *cmin, CommandId *cmax)
{
ReorderBufferTupleCidKey key;
@@ -3524,7 +3524,7 @@ restart:
*/
if (ent == NULL && !updated_mapping)
{
- UpdateLogicalMappings(tuplecid_data, htup->t_tableOid, snapshot);
+ UpdateLogicalMappings(tuplecid_data, relid, snapshot);
/* now check but don't update for a mapping again */
updated_mapping = true;
goto restart;
diff --git a/src/backend/utils/adt/expandedrecord.c b/src/backend/utils/adt/expandedrecord.c
index 5561b741e9..fbf26b0891 100644
--- a/src/backend/utils/adt/expandedrecord.c
+++ b/src/backend/utils/adt/expandedrecord.c
@@ -610,7 +610,6 @@ make_expanded_record_from_datum(Datum recorddatum, MemoryContext parentcontext)
tmptup.t_len = HeapTupleHeaderGetDatumLength(tuphdr);
ItemPointerSetInvalid(&(tmptup.t_self));
- tmptup.t_tableOid = InvalidOid;
tmptup.t_data = tuphdr;
oldcxt = MemoryContextSwitchTo(objcxt);
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index fc1581c92b..5fe0659dcf 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -3147,7 +3147,6 @@ populate_record(TupleDesc tupdesc,
/* Build a temporary HeapTuple control structure */
tuple.t_len = HeapTupleHeaderGetDatumLength(defaultval);
ItemPointerSetInvalid(&(tuple.t_self));
- tuple.t_tableOid = InvalidOid;
tuple.t_data = defaultval;
/* Break down the tuple into fields */
@@ -3546,7 +3545,6 @@ populate_recordset_record(PopulateRecordsetState *state, JsObject *obj)
/* ok, save into tuplestore */
tuple.t_len = HeapTupleHeaderGetDatumLength(tuphead);
ItemPointerSetInvalid(&(tuple.t_self));
- tuple.t_tableOid = InvalidOid;
tuple.t_data = tuphead;
tuplestore_puttuple(state->tuple_store, &tuple);
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index 5f729342f8..060ee6c6ca 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -324,7 +324,6 @@ record_out(PG_FUNCTION_ARGS)
/* Build a temporary HeapTuple control structure */
tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
ItemPointerSetInvalid(&(tuple.t_self));
- tuple.t_tableOid = InvalidOid;
tuple.t_data = rec;
/*
@@ -671,7 +670,6 @@ record_send(PG_FUNCTION_ARGS)
/* Build a temporary HeapTuple control structure */
tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
ItemPointerSetInvalid(&(tuple.t_self));
- tuple.t_tableOid = InvalidOid;
tuple.t_data = rec;
/*
@@ -821,11 +819,9 @@ record_cmp(FunctionCallInfo fcinfo)
/* Build temporary HeapTuple control structures */
tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
ItemPointerSetInvalid(&(tuple1.t_self));
- tuple1.t_tableOid = InvalidOid;
tuple1.t_data = record1;
tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
ItemPointerSetInvalid(&(tuple2.t_self));
- tuple2.t_tableOid = InvalidOid;
tuple2.t_data = record2;
/*
@@ -1063,11 +1059,9 @@ record_eq(PG_FUNCTION_ARGS)
/* Build temporary HeapTuple control structures */
tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
ItemPointerSetInvalid(&(tuple1.t_self));
- tuple1.t_tableOid = InvalidOid;
tuple1.t_data = record1;
tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
ItemPointerSetInvalid(&(tuple2.t_self));
- tuple2.t_tableOid = InvalidOid;
tuple2.t_data = record2;
/*
@@ -1326,11 +1320,9 @@ record_image_cmp(FunctionCallInfo fcinfo)
/* Build temporary HeapTuple control structures */
tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
ItemPointerSetInvalid(&(tuple1.t_self));
- tuple1.t_tableOid = InvalidOid;
tuple1.t_data = record1;
tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
ItemPointerSetInvalid(&(tuple2.t_self));
- tuple2.t_tableOid = InvalidOid;
tuple2.t_data = record2;
/*
@@ -1570,11 +1562,9 @@ record_image_eq(PG_FUNCTION_ARGS)
/* Build temporary HeapTuple control structures */
tuple1.t_len = HeapTupleHeaderGetDatumLength(record1);
ItemPointerSetInvalid(&(tuple1.t_self));
- tuple1.t_tableOid = InvalidOid;
tuple1.t_data = record1;
tuple2.t_len = HeapTupleHeaderGetDatumLength(record2);
ItemPointerSetInvalid(&(tuple2.t_self));
- tuple2.t_tableOid = InvalidOid;
tuple2.t_data = record2;
/*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index b31fd5acea..7bf2f4617f 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1846,7 +1846,6 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
MAXIMUM_ALIGNOF + dtp->t_len);
ct->tuple.t_len = dtp->t_len;
ct->tuple.t_self = dtp->t_self;
- ct->tuple.t_tableOid = dtp->t_tableOid;
ct->tuple.t_data = (HeapTupleHeader)
MAXALIGN(((char *) ct) + sizeof(CatCTup));
/* copy tuple contents */
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 7d2b6facf2..3bd8cde14b 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -3792,11 +3792,11 @@ comparetup_cluster(const SortTuple *a, const SortTuple *b,
ecxt_scantuple = GetPerTupleExprContext(state->estate)->ecxt_scantuple;
- ExecStoreHeapTuple(ltup, ecxt_scantuple, false);
+ ExecStoreHeapTuple(ltup, ecxt_scantuple, InvalidOid, false);
FormIndexDatum(state->indexInfo, ecxt_scantuple, state->estate,
l_index_values, l_index_isnull);
- ExecStoreHeapTuple(rtup, ecxt_scantuple, false);
+ ExecStoreHeapTuple(rtup, ecxt_scantuple, InvalidOid, false);
FormIndexDatum(state->indexInfo, ecxt_scantuple, state->estate,
r_index_values, r_index_isnull);
@@ -3926,8 +3926,7 @@ readtup_cluster(Tuplesortstate *state, SortTuple *stup,
tuple->t_len = t_len;
LogicalTapeReadExact(state->tapeset, tapenum,
&tuple->t_self, sizeof(ItemPointerData));
- /* We don't currently bother to reconstruct t_tableOid */
- tuple->t_tableOid = InvalidOid;
+
/* Read in the tuple body */
LogicalTapeReadExact(state->tapeset, tapenum,
tuple->t_data, tuple->t_len);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a309db1a1c..8dc1880925 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -218,7 +218,7 @@ extern void heap_vacuum_rel(Relation onerel, int options,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
/* in heap/heapam_visibility.c */
-extern bool HeapTupleSatisfies(HeapTuple stup, Snapshot snapshot, Buffer buffer);
+extern bool HeapTupleSatisfies(HeapTuple stup, Oid relid, Snapshot snapshot, Buffer buffer);
extern HTSU_Result HeapTupleSatisfiesUpdate(HeapTuple stup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple stup, TransactionId OldestXmin,
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index c87689b3dd..b1d69ab5ea 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -306,9 +306,8 @@ extern TupleTableSlot *MakeSingleTupleTableSlot(TupleDesc tupdesc,
extern void ExecDropSingleTupleTableSlot(TupleTableSlot *slot);
extern void ExecSetSlotDescriptor(TupleTableSlot *slot, TupleDesc tupdesc);
extern TupleTableSlot *ExecStoreHeapTuple(HeapTuple tuple,
- TupleTableSlot *slot,
- bool shouldFree);
-extern void ExecForceStoreHeapTuple(HeapTuple tuple, TupleTableSlot *slot);
+ TupleTableSlot *slot, Oid relid, bool shouldFree);
+extern void ExecForceStoreHeapTuple(HeapTuple tuple, TupleTableSlot *slot, Oid relid);
/* FIXME: Remove */
extern void ExecForceStoreHeapTupleDatum(Datum data, TupleTableSlot *slot);
extern TupleTableSlot *ExecStoreBufferHeapTuple(HeapTuple tuple,
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 1fe9cc6402..ccd81dff39 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -39,6 +39,7 @@ struct HTAB;
extern bool ResolveCminCmaxDuringDecoding(struct HTAB *tuplecid_data,
Snapshot snapshot,
HeapTuple htup,
+ Oid relid,
Buffer buffer,
CommandId *cmin, CommandId *cmax);
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 1e0617322b..efce063ecd 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -7234,7 +7234,6 @@ deconstruct_composite_datum(Datum value, HeapTupleData *tmptup)
/* Build a temporary HeapTuple control structure */
tmptup->t_len = HeapTupleHeaderGetDatumLength(td);
ItemPointerSetInvalid(&(tmptup->t_self));
- tmptup->t_tableOid = InvalidOid;
tmptup->t_data = td;
/* Extract rowtype info and find a tupdesc */
@@ -7403,7 +7402,6 @@ exec_move_row_from_datum(PLpgSQL_execstate *estate,
/* Build a temporary HeapTuple control structure */
tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
ItemPointerSetInvalid(&(tmptup.t_self));
- tmptup.t_tableOid = InvalidOid;
tmptup.t_data = td;
/* Extract rowtype info */
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index a2e57768d4..76dff2a51d 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -524,7 +524,6 @@ make_tuple_indirect(PG_FUNCTION_ARGS)
/* Build a temporary HeapTuple control structure */
tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
ItemPointerSetInvalid(&(tuple.t_self));
- tuple.t_tableOid = InvalidOid;
tuple.t_data = rec;
values = (Datum *) palloc(ncolumns * sizeof(Datum));
--
2.18.0.windows.1
Hi,
(resending with compressed attachments, perhaps that'll go through)
On 2018-12-10 18:13:40 -0800, Andres Freund wrote:
On 2018-11-26 17:55:57 -0800, Andres Freund wrote:
FWIW, now that oids are removed, and the tuple table slot abstraction
got in, I'm working on rebasing the pluggable storage patchset ontop of
that. I've pushed a version of that to the git tree, including a rebased
version of zheap:
https://github.com/anarazel/postgres-pluggable-storage
https://github.com/anarazel/postgres-pluggable-zheap
I've pushed the newest, substantially revised, version to the same
repository. Note that while the newest pluggable-zheap version is newer
than my last email, it's not based on the latest version, and
pluggable-zheap development is now happening in the main zheap
repository.
My next steps are:
- make relation creation properly pluggable
- remove the typedefs from tableam.h, instead move them into the
TableAmRoutine struct.
- Move rs_{nblocks, startblock, numblocks} out of TableScanDescData
- Move HeapScanDesc and IndexFetchHeapData out of relscan.h
- remove ExecSlotCompare(), it's entirely unrelated to these changes imo
(and in the wrong place)
These are done.
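To illustrate the "move the typedefs into the TableAmRoutine struct" idea, here's a minimal standalone sketch of the callback-struct pattern: generic code only ever dispatches through a struct of function pointers that each AM fills in. All names (DemoTableAmRoutine, scan_begin, etc.) are illustrative, not the actual tableam.h declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a vtable-style AM routine struct: instead of free-standing
 * typedefs, each access method fills in a struct of function pointers. */
typedef struct DemoTableAmRoutine
{
    /* begin a scan; returns an AM-private opaque handle */
    void *(*scan_begin) (int nkeys);
    /* fetch next block number, or -1 when the scan is done */
    int   (*scan_next_block) (void *scan);
    void  (*scan_end) (void *scan);
} DemoTableAmRoutine;

/* a toy "heap" AM that scans blocks 0..2 */
static int demo_state = 0;

static void *
demo_scan_begin(int nkeys)
{
    (void) nkeys;
    demo_state = 0;
    return &demo_state;
}

static int
demo_scan_next_block(void *scan)
{
    int *pos = (int *) scan;

    return (*pos < 3) ? (*pos)++ : -1;
}

static void
demo_scan_end(void *scan)
{
    (void) scan;
}

static const DemoTableAmRoutine demo_heap_routine = {
    demo_scan_begin, demo_scan_next_block, demo_scan_end
};

/* generic executor-side code only ever goes through the routine struct */
static int
count_blocks(const DemoTableAmRoutine *am)
{
    void *scan = am->scan_begin(0);
    int   n = 0;

    while (am->scan_next_block(scan) != -1)
        n++;
    am->scan_end(scan);
    return n;
}
```

The point is that callers need no AM-specific types or casts, which is exactly what keeping the callback signatures inside the routine struct buys over scattered typedefs.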
- split pluggable storage patchset, to commit earlier:
- EvalPlanQual slotification
- trigger slotification
- split of IndexBuildHeapScan out of index.c
The patchset is now pretty granularly split into individual pieces.
There's two commits that might be worthwhile to split up further:
1) The commit introducing table_beginscan et al, currently also
introduces indexscans through tableam.
2) The commit introducing table_(insert|delete|update) also includes
table_lock_tuple(), which in turn changes a bunch of EPQ related
code. It's probably worthwhile to break that out.
I tried to make each individual commit make some sense, and pass all
tests on its own. That requires some changes that are then obsoleted
in a later commit, but it's not as much as I feared.
- rename HeapUpdateFailureData et al to not reference Heap
I've not done that, I decided it's best to do that after all the work
has gone in.
- See if the slot in SysScanDescData can be avoided, it's not exactly
free of overhead.
After reconsidering, I don't think it's worth doing so.
There's pretty substantial changes in this series, besides the things
mentioned above:
- I re-introduced parallel scan into pluggable storage, but added a set
of helper functions to avoid having to duplicate the current block
based logic from heap. That way it can be shared between most/all
block based AMs
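The shared block-based helper boils down to workers atomically claiming the next block from scan state in shared memory, so any block-oriented AM can reuse the same allocator instead of duplicating heap's logic. A minimal standalone sketch, with purely illustrative names (the real helpers live in tableam and have different signatures):

```c
#include <assert.h>
#include <stdatomic.h>

/* Shared parallel-scan state, as it might live in DSM: workers claim
 * blocks by atomically incrementing a counter. */
typedef struct DemoParallelBlockState
{
    int         nblocks;        /* total blocks in the relation */
    atomic_int  next_block;     /* next block to hand out */
} DemoParallelBlockState;

static void
demo_parallelscan_initialize(DemoParallelBlockState *ps, int nblocks)
{
    ps->nblocks = nblocks;
    atomic_init(&ps->next_block, 0);
}

/* Called by each worker: returns the claimed block number, or -1 once
 * the scan is exhausted.  fetch_add makes concurrent claims safe. */
static int
demo_parallelscan_nextpage(DemoParallelBlockState *ps)
{
    int blk = atomic_fetch_add(&ps->next_block, 1);

    return (blk < ps->nblocks) ? blk : -1;
}
```

An AM's scan callback would loop calling the nextpage helper and read each claimed block itself; only the per-block tuple extraction stays AM-specific.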
- latestRemovedXid handling is moved into the table-AM, that's required
for correct replay on Hot-Standby, where we do not know the AM of the
current relation
- the whole truncation and relation creation code has been overhauled
- the order of functions in tableam.h, heapam_handler.c etc has been
made more sensible
- a number of callbacks have been obsoleted (relation_sync,
relation_create_init_fork, scansetlimits)
- A bunch of prerequisite work has been merged
- (heap|relation)_(open|openrv|close) have been split into their own
files
- To avoid having to care about the bulk-insert flags code that uses a
bulk-insert now unconditionally calls table_finish_bulk_insert(). The
AM then internally can decide what it needs to do in case of
e.g. HEAP_INSERT_SKIP_WAL. Zheap currently for example doesn't
implement that (because UNDO handling is complicated), and this way it
can just ignore the option, without needing call-site code for that.
- A *lot* of cleanups
Todo:
- merge psql / pg_dump support by Dmitry
- consider removing scan_update_snapshot
- consider removing table_gimmegimmeslot()
- add substantial docs for every callback
- consider revising the current table_lock_tuple() API, I'm not quite
convinced that's right
- reconsider heap_fetch() API changes, causes unnecessary pain
- polish the split out trigger and EPQ changes, so they can be merged
soon-ish
I plan to merge the first few commits pretty soon (as largely announced
in related threads).
While I saw an initial attempt at writing sgml docs for the table AM
API, I'm not convinced that's the best approach. I think it might make
more sense to have high-level docs in sgml, but then do all the
per-callback docs in tableam.h.
Greetings,
Andres Freund
Attachments:
v12-0003-Replace-uses-of-heap_open-et-al-with-table_open-.patch.gz (application/x-patch-gzip)
i� 6�Q?<-s��pc����X��]������h��7R��Q�d=?���a������Y%~'9�Y���}�a�*�0)��#6N��w�6�_�8R����Q�zH<��:B=����60��L����l�����v@������W��a��D��4`����Y���C�I�c6r����*e�(����A���1�Lo7:�S5�Z*�����~��� Gb�S�:^�U�"8*A�~������8���cP����sGR���)3 �\x*T7�2T�B�9U=N������(\�`P���&����'�8Bur>�������J��m��p>VF��+��� �~������'�4��(y�s��K!(�@����������E�3� ��'6����8���Ku�lv�-.����y`�q.�E�_�V����������5�����=p��^�%�\��M�a��\*g�t���\�w+8<� ��P�b�l:�'���7��O��"t��n0C�W�]��3E����TH���g+hYp����]6�tpC%P�0UeP*����,K<:�C%�t�1�*���-'#G�o����{n�|(h�f�l�(��������V��� m�� �!e�h)�^K��w��t,����������Q�����3;3!UK��x����UY�g�XU��rn�����o�(���HdU�K��5C�u=�d���K�cUbxze��r���h�y����q���L�d���T���>P��:�������og���"