What is an item pointer, anyway?

Started by Peter Geogheganalmost 7 years ago11 messageshackers
Jump to latest

itemid.h introduces the struct ItemIdData as follows:

/*
* An item pointer (also called line pointer) on a buffer page

Meanwhile, itemptr.h introduces the struct ItemPointerData as follows:

/*
* ItemPointer:
*
* This is a pointer to an item within a disk page of a known file
* (for example, a cross-link from an index to its parent table).

It doesn't seem reasonable to assume that you should know the
difference based on context. The two concepts are closely related. An
ItemPointerData points to a block, as well as the, uh, item pointer
within that block.

This ambiguity is avoidable, and should be avoided. ISTM that the
least confusing way of removing the ambiguity would be to no longer
refer to ItemIds as item pointers, without changing anything else.

--
Peter Geoghegan

#2Ashwin Agrawal
aagrawal@pivotal.io
In reply to: Peter Geoghegan (#1)
Re: What is an item pointer, anyway?

On Fri, Apr 26, 2019 at 2:19 PM Peter Geoghegan <pg@bowt.ie> wrote:

itemid.h introduces the struct ItemIdData as follows:

/*
* An item pointer (also called line pointer) on a buffer page

Meanwhile, itemptr.h introduces the struct ItemPointerData as follows:

/*
* ItemPointer:
*
* This is a pointer to an item within a disk page of a known file
* (for example, a cross-link from an index to its parent table).

It doesn't seem reasonable to assume that you should know the
difference based on context. The two concepts are closely related. An
ItemPointerData points to a block, as well as the, uh, item pointer
within that block.

This ambiguity is avoidable, and should be avoided.

Agree.

ISTM that the
least confusing way of removing the ambiguity would be to no longer
refer to ItemIds as item pointers, without changing anything else.

How about we rename ItemPointerData to TupleIdentifier or ItemIdentifier
instead and leave ItemPointer or Item confined to AM term, where item can
be tuple, datum or anything else ?

In reply to: Ashwin Agrawal (#2)
Re: What is an item pointer, anyway?

On Fri, Apr 26, 2019 at 4:23 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:

How about we rename ItemPointerData to TupleIdentifier or ItemIdentifier instead and leave ItemPointer or Item confined to AM term, where item can be tuple, datum or anything else ?

I'm not a fan of that idea, because the reality is that an
ItemPointerData is quite explicitly supposed to be a physiological
identifier (TID) used by heapam, or a similar heap-like access method
such as zheap. This is baked into a number of things.

The limitation that pluggable storage engines have to work in terms of
item pointers is certainly a problem, especially for things like the
Zedstore column store project you're working on. However, I suspect
that that problem is best solved by accommodating other types of
identifiers that don't work like TIDs.

I understand why you've adopted ItemPointerData as a fully-logical
identifier in your prototype, but it's not a great long term solution.

--
Peter Geoghegan

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ashwin Agrawal (#2)
Re: What is an item pointer, anyway?

Ashwin Agrawal <aagrawal@pivotal.io> writes:

On Fri, Apr 26, 2019 at 2:19 PM Peter Geoghegan <pg@bowt.ie> wrote:

ISTM that the
least confusing way of removing the ambiguity would be to no longer
refer to ItemIds as item pointers, without changing anything else.

How many places would we be changing to clean that up?

How about we rename ItemPointerData to TupleIdentifier or ItemIdentifier
instead and leave ItemPointer or Item confined to AM term, where item can
be tuple, datum or anything else ?

There's half a thousand references to ItemPointer[Data] in our
sources, and probably tons more in external modules. I'm *not*
in favor of renaming it.

ItemId[Data] is somewhat less widely referenced, but I'm still not
much in favor of renaming that type. I think fixing comments to
uniformly call it an item ID would be more reasonable. (We should
leave the "line pointer" terminology in place, too; if memory serves,
an awful lot of variables of the type are named "lp" or variants.
Renaming all of those is to nobody's benefit.)

regards, tom lane

In reply to: Tom Lane (#4)
Re: What is an item pointer, anyway?

On Fri, Apr 26, 2019 at 4:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

ItemId[Data] is somewhat less widely referenced, but I'm still not
much in favor of renaming that type. I think fixing comments to
uniformly call it an item ID would be more reasonable. (We should
leave the "line pointer" terminology in place, too; if memory serves,
an awful lot of variables of the type are named "lp" or variants.
Renaming all of those is to nobody's benefit.)

I was proposing that we not rename any struct at all, and continue to
call ItemId[Data]s "line pointers" only. This would involve removing
the comment in itemid.h that confusingly refers to line pointers as
"item pointers" (plus any other comments that fail to make a clear
distinction).

I think that the total number of comments that would be affected by
this approach is quite low.

--
Peter Geoghegan

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#5)
Re: What is an item pointer, anyway?

Peter Geoghegan <pg@bowt.ie> writes:

I was proposing that we not rename any struct at all, and continue to
call ItemId[Data]s "line pointers" only.

Yeah, I'd be fine with that, although the disconnect between the type
name and the comment terminology might confuse some people.

regards, tom lane

In reply to: Tom Lane (#6)
Re: What is an item pointer, anyway?

On Fri, Apr 26, 2019 at 5:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, I'd be fine with that, although the disconnect between the type
name and the comment terminology might confuse some people.

Maybe, but the fact that the ItemIdData struct consists of bit fields
that are all named "lp_*" offers a hint. Plus you have the LP_*
constants that get stored in ItemIdData.lp_flags.

I wouldn't call the struct ItemIdData if I was in a green field
situation, but it doesn't seem too bad under the present
circumstances. I'd rather not change the struct's name, because that
would probably cause problems without any real benefit. OTOH, calling
two closely related but distinct things by the same name is atrocious.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#7)
Re: What is an item pointer, anyway?

On Fri, Apr 26, 2019 at 5:13 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Fri, Apr 26, 2019 at 5:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, I'd be fine with that, although the disconnect between the type
name and the comment terminology might confuse some people.

Maybe, but the fact that the ItemIdData struct consists of bit fields
that are all named "lp_*" offers a hint. Plus you have the LP_*
constants that get stored in ItemIdData.lp_flags.

Attached draft patch adjusts code comments and error messages where a
line pointer is referred to as an item pointer. It turns out that this
practice isn't all that prevalent. Here are some specific concerns
that I had to think about when writing the patch, though:

* I ended up removing a big indexam.c "old comments" comment paragraph
from the Berkeley days, because it used the term item pointer in what
seemed like the wrong way, but also because AFAICT it's totally
obsolete.

* Someone should confirm that I have preserved the original intent of
the changes within README.HOT, and the heapam changes that relate to
pruning. It's possible that I changed "item pointer" to "line pointer"
in one or two places where I should have changed "item pointer" to
"tuple" instead.

* I changed a few long standing "can't happen" error messages that
concern corruption, most of which also relate to pruning. Maybe that's
a cost that needs to be considered.

* I changed a lazy_scan_heap() log message of long-standing. Another
downside that needs to be considered.

* I expanded a little on the advantages of using line pointers within
bufpage.h. That seemed in scope to me, because it drives home the
distinction between item pointers and line pointers.

--
Peter Geoghegan

Attachments:

v1-0001-Standardize-line-pointers-item-pointer-terminolog.patchapplication/octet-stream; name=v1-0001-Standardize-line-pointers-item-pointer-terminolog.patchDownload+53-75
In reply to: Peter Geoghegan (#8)
Re: What is an item pointer, anyway?

On Sun, May 5, 2019 at 1:14 PM Peter Geoghegan <pg@bowt.ie> wrote:

Attached draft patch adjusts code comments and error messages where a
line pointer is referred to as an item pointer. It turns out that this
practice isn't all that prevalent. Here are some specific concerns
that I had to think about when writing the patch, though:

Ping? Any objections to pushing ahead with this?

--
Peter Geoghegan

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#9)
Re: What is an item pointer, anyway?

Peter Geoghegan <pg@bowt.ie> writes:

On Sun, May 5, 2019 at 1:14 PM Peter Geoghegan <pg@bowt.ie> wrote:

Attached draft patch adjusts code comments and error messages where a
line pointer is referred to as an item pointer. It turns out that this
practice isn't all that prevalent. Here are some specific concerns
that I had to think about when writing the patch, though:

Ping? Any objections to pushing ahead with this?

Patch looks fine to me. One minor quibble: in pruneheap.c you have

 /*
- * Prune specified item pointer or a HOT chain originating at that item.
+ * Prune specified line pointer or a HOT chain originating at that item.
  *
  * If the item is an index-referenced tuple (i.e. not a heap-only tuple),

Should "that item" also be re-worded, for consistency?

regards, tom lane

In reply to: Tom Lane (#10)
Re: What is an item pointer, anyway?

On Mon, May 13, 2019 at 12:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

/*
- * Prune specified item pointer or a HOT chain originating at that item.
+ * Prune specified line pointer or a HOT chain originating at that item.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),

Should "that item" also be re-worded, for consistency?

Yes, it should be -- I'll fix it.

I'm going to backpatch the storage.sgml change on its own, while
pushing everything else in a separate HEAD-only commit.

Thanks
--
Peter Geoghegan