What to name the current heap after pluggable storage / what to rename?
Hi,
The current pluggable table storage patchset [1]/messages/by-id/20180703070645.wchpu5muyto5n647@alap3.anarazel.de introduces the ability
to specify the access method of a table (CREATE TABLE ... USING
"ident"). The patchset currently names the current storage method
(i.e. heapam.c et al) "heap" (whereas e.g. zheap's in named, drumroll,
zheap).
I'm concerned that naming it heap, and the corresponding functions
fitting with that name, will be confusing. There's a lot of functions
that have a heap_ prefix (or similar) that aren't really dependent on
how the storage works, but much more general infrastructure (consider
e.g. heap_create_with_catalog()).
One solution is to just live with the confusion, add a few header
comments to files like src/backend/catalog/heap.c, explaining that
they're more general than heapam.c (and in the patch heapam_handler.c,
which mostly dispatches to heapam.c).
Another approach would be to come up with a different, potentially
witty, name for the current postgres table access method. Then either
rename a few of the heapam.c et functions, or live with the slightly
more confusing names.
Another would be to be aggressive in renaming, and deconflict by
renaming functions like heap_create[_with_catalog] etc to sound more
accurate. I think that has some appeal, because a lot of those names
aren't describing their tasks particularly well.
Greetings,
Andres Freund
[1]: /messages/by-id/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
On 19/12/2018 05:17, Andres Freund wrote:
I'm concerned that naming it heap, and the corresponding functions
fitting with that name, will be confusing. There's a lot of functions
that have a heap_ prefix (or similar) that aren't really dependent on
how the storage works, but much more general infrastructure (consider
e.g. heap_create_with_catalog()).
I'm wondering where the choice of the name "heap" originally came from
and what it refers to. In "The Design of Postgres"[0]http://db.cs.berkeley.edu/papers/ERL-M85-95.pdf, it is said that
"All relations will be stored as heaps (as in [ASTR76]) with an optional
collection of secondary indexes." But ASTR76[1]http://daslab.seas.harvard.edu/reading-group/papers/astrahan-1976.pdf does not mention the
word "heap", so it doesn't appear to refer to any specific method or
algorithm.
The heap is clearly not the tree data structure. Is it meant in the
sense of memory allocation, a place to store a large amount of stuff in
a mostly unorganized way?
[0]: http://db.cs.berkeley.edu/papers/ERL-M85-95.pdf
[1]: http://daslab.seas.harvard.edu/reading-group/papers/astrahan-1976.pdf
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I'm wondering where the choice of the name "heap" originally came from
and what it refers to.
It seems to me that "heap" is an Oracle word (as explained here[1]https://docs.oracle.com/en/database/oracle/oracle-database/18/tgsql/optimizer-access-paths.html#GUID-53905BCD-3EAD-4B44-8CCC-C442A80647E8).
By default, a table is organized as a heap, which means that the database
places rows where they fit best rather than in a user-specified order.
[1]: https://docs.oracle.com/en/database/oracle/oracle-database/18/tgsql/optimizer-access-paths.html#GUID-53905BCD-3EAD-4B44-8CCC-C442A80647E8
https://docs.oracle.com/en/database/oracle/oracle-database/18/tgsql/optimizer-access-paths.html#GUID-53905BCD-3EAD-4B44-8CCC-C442A80647E8
--
Adoptez l'éco-attitude
N'imprimez ce mail que si c'est vraiment nécessaire
On Wed, Dec 19, 2018 at 7:44 PM Arkhena <Arkhena@gmail.com> wrote:
I'm wondering where the choice of the name "heap" originally came from
and what it refers to.It seems to me that "heap" is an Oracle word (as explained here[1]).
By default, a table is organized as a heap, which means that the database places rows where they fit best rather than in a user-specified order.
No, it's more widely used than that, and we're using it with the
standard meaning AFAIK:
https://docs.microsoft.com/en-us/sql/relational-databases/indexes/heaps-tables-without-clustered-indexes?view=sql-server-2017
http://docs.actian.com/ingres/10.2/index.html#page/DatabaseAdmin/Heap_Storage_Structure.htm
It just means tuples stored in no particular order (as opposed to eg
btree tables, in systems that support those).
--
Thomas Munro
http://www.enterprisedb.com
On 19/12/2018 11:15, Thomas Munro wrote:
It just means tuples stored in no particular order (as opposed to eg
btree tables, in systems that support those).
So would the proposed pluggable storage API allow index-organized base
storage and other non-heap layouts?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2018-12-19 12:02:38 +0100, Peter Eisentraut wrote:
On 19/12/2018 11:15, Thomas Munro wrote:
It just means tuples stored in no particular order (as opposed to eg
btree tables, in systems that support those).So would the proposed pluggable storage API allow index-organized base
storage and other non-heap layouts?
Well, that depends on what "non-heap layouts" you're thinking of. I
think there'd be some further work needed to make efficient IOTs
possible, but the patchset gets us a long way to be able to do that in a
pluggable fashion. Biggest problem would probably be extending the
existing index AMs, for secondary indexes, to point to a key wider than
a tid, without loosing too much efficiency.
Greetings,
Andres Freund
Hi,
On 2018-Dec-19, Andres Freund wrote:
Well, that depends on what "non-heap layouts" you're thinking of. I
think there'd be some further work needed to make efficient IOTs
possible, but the patchset gets us a long way to be able to do that in a
pluggable fashion. Biggest problem would probably be extending the
existing index AMs, for secondary indexes, to point to a key wider than
a tid, without loosing too much efficiency.
I think the important question is will we eventually get the option to
do "CREATE TABLE ... STORAGE indexorg" (or whatever name) rather than
are we already getting that feature, and I think the answer is clearly
yes, so we should keep using the word "heap" in the name as the primary
feature of the historical implementation.
The "zheap" name makes it clear that it is still a heap; the main
difference (if I understand correctly) is how does tuple
updating/deletion work.
The current heap implementation is for "non-overwriting storage
management", but that's a mouthful and acronyms based on
"non-overwriting" don't seem great ("noheap" seems a bit silly. Maybe
"nowheap" is better? How about "nosheap"?)
Maybe we can take the easy way and use something like "stdheap".
Or just "nheap".
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Call it "pile" and "hoard":
https://www.thesaurus.com/browse/heap
https://www.thesaurus.com/browse/pile
https://www.thesaurus.com/browse/hoard
ср, 19 дек. 2018 г. в 07:17, Andres Freund <andres@anarazel.de>:
Hi,
The current pluggable table storage patchset [1] introduces the ability
to specify the access method of a table (CREATE TABLE ... USING
"ident"). The patchset currently names the current storage method
(i.e. heapam.c et al) "heap" (whereas e.g. zheap's in named, drumroll,
zheap).I'm concerned that naming it heap, and the corresponding functions
fitting with that name, will be confusing. There's a lot of functions
that have a heap_ prefix (or similar) that aren't really dependent on
how the storage works, but much more general infrastructure (consider
e.g. heap_create_with_catalog()).One solution is to just live with the confusion, add a few header
comments to files like src/backend/catalog/heap.c, explaining that
they're more general than heapam.c (and in the patch heapam_handler.c,
which mostly dispatches to heapam.c).Another approach would be to come up with a different, potentially
witty, name for the current postgres table access method. Then either
rename a few of the heapam.c et functions, or live with the slightly
more confusing names.Another would be to be aggressive in renaming, and deconflict by
renaming functions like heap_create[_with_catalog] etc to sound more
accurate. I think that has some appeal, because a lot of those names
aren't describing their tasks particularly well.Greetings,
Andres Freund
[1]
/messages/by-id/20180703070645.wchpu5muyto5n647@alap3.anarazel.de--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
On Wed, Dec 19, 2018 at 08:32:28AM -0300, Alvaro Herrera wrote:
Hi,
On 2018-Dec-19, Andres Freund wrote:
Well, that depends on what "non-heap layouts" you're thinking of. I
think there'd be some further work needed to make efficient IOTs
possible, but the patchset gets us a long way to be able to do that in a
pluggable fashion. Biggest problem would probably be extending the
existing index AMs, for secondary indexes, to point to a key wider than
a tid, without loosing too much efficiency.I think the important question is will we eventually get the option to
do "CREATE TABLE ... STORAGE indexorg" (or whatever name) rather than
are we already getting that feature, and I think the answer is clearly
yes, so we should keep using the word "heap" in the name as the primary
feature of the historical implementation.The "zheap" name makes it clear that it is still a heap; the main
difference (if I understand correctly) is how does tuple
updating/deletion work.The current heap implementation is for "non-overwriting storage
management", but that's a mouthful and acronyms based on
"non-overwriting" don't seem great ("noheap" seems a bit silly. Maybe
"nowheap" is better? How about "nosheap"?)Maybe we can take the easy way and use something like "stdheap".
Or just "nheap".
oheap for "original?"
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Dec 18, 2018 at 11:17 PM Andres Freund <andres@anarazel.de> wrote:
The current pluggable table storage patchset [1] introduces the ability
to specify the access method of a table (CREATE TABLE ... USING
"ident"). The patchset currently names the current storage method
(i.e. heapam.c et al) "heap" (whereas e.g. zheap's in named, drumroll,
zheap).
I vote for calling the current heap "heap" - i.e. what the patchset is
currently doing. As others have already noted, that's a perfectly
good word for storing stuff in no particular order, and it's also a
term with a very long history. If we call it "oheap" or "pile" or
something based on a clever pun, then we'll just be making users learn
a new word for, as far as I can see, no real benefit.
Another would be to be aggressive in renaming, and deconflict by
renaming functions like heap_create[_with_catalog] etc to sound more
accurate. I think that has some appeal, because a lot of those names
aren't describing their tasks particularly well.
I like that option.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 19 Dec 2018, at 20:21, Robert Haas <robertmhaas@gmail.com> wrote:
I vote for calling the current heap "heap" - i.e. what the patchset is
currently doing. As others have already noted, that's a perfectly
good word for storing stuff in no particular order, and it's also a
term with a very long history.
FWIW, +1
cheers ./daniel
Hi,
On 2018-12-19 14:21:29 -0500, Robert Haas wrote:
On Tue, Dec 18, 2018 at 11:17 PM Andres Freund <andres@anarazel.de> wrote:
Another would be to be aggressive in renaming, and deconflict by
renaming functions like heap_create[_with_catalog] etc to sound more
accurate. I think that has some appeal, because a lot of those names
aren't describing their tasks particularly well.I like that option.
I'd like to start doing that by moving the functions in the following
heapam.h block elsewhere:
/* in heap/heapam.c */
extern Relation relation_open(Oid relationId, LOCKMODE lockmode);
extern Relation try_relation_open(Oid relationId, LOCKMODE lockmode);
extern Relation relation_openrv(const RangeVar *relation, LOCKMODE lockmode);
extern Relation relation_openrv_extended(const RangeVar *relation,
LOCKMODE lockmode, bool missing_ok);
extern void relation_close(Relation relation, LOCKMODE lockmode);
extern Relation heap_open(Oid relationId, LOCKMODE lockmode);
extern Relation heap_openrv(const RangeVar *relation, LOCKMODE lockmode);
extern Relation heap_openrv_extended(const RangeVar *relation,
LOCKMODE lockmode, bool missing_ok);
ISTM that the first block would best belong into new files like
access/rel[ation].h and access/common/rel[ation].h. I think the second
set should be renamed to be table_open() (with backward compat
redirects, there's way way too many references) and should go into
access/table.h access/table/table.c alongside tableam.[ch], but I could
also see just putting them into relation.[ch].
Comments?
Greetings,
Andres Freund
On Fri, Jan 11, 2019 at 11:05 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-12-19 14:21:29 -0500, Robert Haas wrote:
On Tue, Dec 18, 2018 at 11:17 PM Andres Freund <andres@anarazel.de>
wrote:
Another would be to be aggressive in renaming, and deconflict by
renaming functions like heap_create[_with_catalog] etc to sound more
accurate. I think that has some appeal, because a lot of those names
aren't describing their tasks particularly well.I like that option.
I'd like to start doing that by moving the functions in the following
heapam.h block elsewhere:/* in heap/heapam.c */
extern Relation relation_open(Oid relationId, LOCKMODE lockmode);
extern Relation try_relation_open(Oid relationId, LOCKMODE lockmode);
extern Relation relation_openrv(const RangeVar *relation, LOCKMODE
lockmode);
extern Relation relation_openrv_extended(const RangeVar *relation,
LOCKMODE lockmode, bool
missing_ok);
extern void relation_close(Relation relation, LOCKMODE lockmode);extern Relation heap_open(Oid relationId, LOCKMODE lockmode);
extern Relation heap_openrv(const RangeVar *relation, LOCKMODE lockmode);
extern Relation heap_openrv_extended(const RangeVar *relation,
LOCKMODE lockmode, bool
missing_ok);ISTM that the first block would best belong into new files like
access/rel[ation].h and access/common/rel[ation].h. I think the second
set should be renamed to be table_open() (with backward compat
redirects, there's way way too many references) and should go into
access/table.h access/table/table.c alongside tableam.[ch], but I could
also see just putting them into relation.[ch].Comments?
Yes, that will be good.
Regards,
Haribabu Kommi
Fujitsu Australia
On Thu, Jan 10, 2019 at 7:05 PM Andres Freund <andres@anarazel.de> wrote:
ISTM that the first block would best belong into new files like
access/rel[ation].h and access/common/rel[ation].h.
+1.
I think the second
set should be renamed to be table_open() (with backward compat
redirects, there's way way too many references) and should go into
access/table.h access/table/table.c alongside tableam.[ch],
Sounds reasonable.
but I could
also see just putting them into relation.[ch].
I would view that as a less-preferred alternative, but not crazy.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2019-01-11 12:01:36 -0500, Robert Haas wrote:
On Thu, Jan 10, 2019 at 7:05 PM Andres Freund <andres@anarazel.de> wrote:
ISTM that the first block would best belong into new files like
access/rel[ation].h and access/common/rel[ation].h.+1.
I think the second
set should be renamed to be table_open() (with backward compat
redirects, there's way way too many references) and should go into
access/table.h access/table/table.c alongside tableam.[ch],Sounds reasonable.
but I could
also see just putting them into relation.[ch].I would view that as a less-preferred alternative, but not crazy.
Here's a set of patches. Not commit quality, but enough to discuss.
The first patch, the only really interesting one, splits out
relation_(open|openrv|openrv_extended|close) into access/relation.h and access/common/relation.c
and
heap_(open|openrv|openrv_extended|close) into access/table.h and access/table/table.c
It's worthwhile to note that there's another header named
nodes/relation.h. But there's also utils/rel.h, so I couldn't think of a
another good name.
I'm basically thinking that table.h, even in the post pluggable storage
world, should not contain lower level functionality like dispatching
into table-am (that'll reside in tableam.h). But e.g. a
simple_table_(insert|update|delete) could live there, as well as
potentially some other heap_ functionality strewn around the backend.
I made table.h not include relation.h, which means that a few files
might need both. I'm not sure that's the right choice, but it seems
easier to extend that later if shows to be painful, than to do the
reverse.
I've left the following in table.h:
/*
* heap_ used to be the prefix for these routines, and a lot of code will just
* continue to work without adaptions after the introduction of pluggable
* storage, therefore just map these names.
*/
#define heap_open(r, l) table_open(r, l)
#define heap_openrv(r, l) table_openrv(r, l)
#define heap_openrv_extended(r, l, m) table_openrv_extended(r, l, m)
#define heap_close(r, l) table_close(r, l)
and I think we should leave that in there for the forseeable future.
Patches 0002 replaces includes of heapam.h with relation.h / table.h
where appropriate. 0003 renames all in-core references of
heap_(open|openrv|openrv_extended|close) with the table_ variant.
Does this seem basically sensible? Different ideas?
Greetings,
Andres Freund
Attachments:
v1-0001-WIP-Introduce-access-table.h-access-relation.h.patchtext/x-diff; charset=us-asciiDownload+428-311
v1-0002-Replace-heapam.h-includes-with-relation.h-table.h.patchtext/x-diff; charset=us-asciiDownload+115-98
v1-0003-Replace-uses-of-heap_open-et-al-with-table_open-e.patchtext/x-diff; charset=us-asciiDownload+1240-1241
On Tue, Jan 15, 2019 at 1:43 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-01-11 12:01:36 -0500, Robert Haas wrote:
On Thu, Jan 10, 2019 at 7:05 PM Andres Freund <andres@anarazel.de>
wrote:
ISTM that the first block would best belong into new files like
access/rel[ation].h and access/common/rel[ation].h.+1.
I think the second
set should be renamed to be table_open() (with backward compat
redirects, there's way way too many references) and should go into
access/table.h access/table/table.c alongside tableam.[ch],Sounds reasonable.
but I could
also see just putting them into relation.[ch].I would view that as a less-preferred alternative, but not crazy.
Here's a set of patches. Not commit quality, but enough to discuss.
The first patch, the only really interesting one, splits out
relation_(open|openrv|openrv_extended|close) into access/relation.h and
access/common/relation.c
and
heap_(open|openrv|openrv_extended|close) into access/table.h and
access/table/table.cIt's worthwhile to note that there's another header named
nodes/relation.h. But there's also utils/rel.h, so I couldn't think of a
another good name.
access/relation.[c|h] name is fine. Or how about access/rel.[c|h],
because nodes/relation.h is related to planner. utils/rel.h is some how
related to relation caches.
I'm basically thinking that table.h, even in the post pluggable storage
world, should not contain lower level functionality like dispatching
into table-am (that'll reside in tableam.h). But e.g. a
simple_table_(insert|update|delete) could live there, as well as
potentially some other heap_ functionality strewn around the backend.I made table.h not include relation.h, which means that a few files
might need both. I'm not sure that's the right choice, but it seems
easier to extend that later if shows to be painful, than to do the
reverse.
The need of both relation.h and table.h into a single file is because of
use of both relation_open table_open functions. when I checked one
of the file where both headers are included, I found that it simply used
the relation_open function even the type of the relation is known.
I didn't check whether it is possible or not? In case if the kind of the
relation
is known at every caller of relation_open, it can be replaced with either
table_open or index_open or composite_type_open functions. So that
there may not need any of the relation functions needs to be exposed.
I've left the following in table.h:
/*
* heap_ used to be the prefix for these routines, and a lot of code will
just
* continue to work without adaptions after the introduction of pluggable
* storage, therefore just map these names.
*/
#define heap_open(r, l) table_open(r, l)
#define heap_openrv(r, l) table_openrv(r, l)
#define heap_openrv_extended(r, l, m) table_openrv_extended(r, l, m)
#define heap_close(r, l) table_close(r, l)and I think we should leave that in there for the forseeable future.
Above typedefs are good. They are useful to avoid any third party
extensions.
Regards,
Haribabu Kommi
Fujitsu Australia
On Tue, Jan 15, 2019 at 10:23 PM Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
access/relation.[c|h] name is fine. Or how about access/rel.[c|h],
because nodes/relation.h is related to planner. utils/rel.h is some how
related to relation caches.
Insofar as we can reasonably do so, I'd rather pick unique names for
header files. I know that there's no law that rules out having both
nodes/relation.h and access/relation.h, or likewise utils/rel.h and
access/rel.h; the computer won't be confused. But it might create
some confusion among human beings, so my vote is for avoiding that
sort of thing if we can.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Jan 15, 2019 at 10:23 PM Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:access/relation.[c|h] name is fine. Or how about access/rel.[c|h],
because nodes/relation.h is related to planner. utils/rel.h is some how
related to relation caches.
Insofar as we can reasonably do so, I'd rather pick unique names for
header files.
+1
regards, tom lane
On January 16, 2019 8:08:09 AM PST, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jan 15, 2019 at 10:23 PM Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:access/relation.[c|h] name is fine. Or how about access/rel.[c|h],
because nodes/relation.h is related to planner. utils/rel.h is somehow
related to relation caches.
Insofar as we can reasonably do so, I'd rather pick unique names for
header files. I know that there's no law that rules out having both
nodes/relation.h and access/relation.h, or likewise utils/rel.h and
access/rel.h; the computer won't be confused. But it might create
some confusion among human beings, so my vote is for avoiding that
sort of thing if we can.
I prefer that too - if the new name isn't worse enough to make it hard to remember. I'd welcome suggestions that don't conflict...
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Hi,
On 2019-01-16 08:20:37 -0800, Andres Freund wrote:
On January 16, 2019 8:08:09 AM PST, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jan 15, 2019 at 10:23 PM Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:access/relation.[c|h] name is fine. Or how about access/rel.[c|h],
because nodes/relation.h is related to planner. utils/rel.h is somehow
related to relation caches.
Insofar as we can reasonably do so, I'd rather pick unique names for
header files. I know that there's no law that rules out having both
nodes/relation.h and access/relation.h, or likewise utils/rel.h and
access/rel.h; the computer won't be confused. But it might create
some confusion among human beings, so my vote is for avoiding that
sort of thing if we can.I prefer that too - if the new name isn't worse enough to make it hard
to remember. I'd welcome suggestions that don't conflict...
Unless somebody comes up with a better suggestion I'm planning to press
ahead with this one. It's large enough to be a bit of a pain to maintain
over time... I'm absolutely not wedded to access/relation.h, so I'm
happy with another good suggestion, or somebody revising it subsequently.
Greetings,
Andres Freund