cmax docs seem misleading

Started by Paul Jungwirth | 2 months ago | 4 messages | docs
#1 Paul Jungwirth
pj@illuminatedcomputing.com

The docs for cmax say:[0]

The command identifier within the deleting transaction, or zero.

This was true once upon a time, I think. But nowadays cmax and cmin
are the same physical field, and the user-facing system columns don't
seem to be trying to interpret it. For example:

[v19devel:5432][334102] regression=# create table pj (a int);
CREATE TABLE
[v19devel:5432][334102] regression=# begin; insert into pj values (1);
insert into pj values (2); commit;
BEGIN
INSERT 0 1
INSERT 0 1
COMMIT
[v19devel:5432][334102] regression=# select ctid, xmin, xmax, cmin,
cmax, * from pj;
 ctid  | xmin  | xmax | cmin | cmax | a
-------+-------+------+------+------+---
 (0,1) | 22424 |    0 |    0 |    0 | 1
 (0,2) | 22424 |    0 |    1 |    1 | 2

So here you have a non-zero cmax for a not-deleted row.

The converse isn't true either. "Or zero" hints that deleted rows
might always have a non-zero value, but 0 is also just the first
command in the transaction. (Null would be a meaningful signal, but I
assume we don't want to do that.)

As far as I can tell, it is impossible to observe cmin <> cmax. From
heap_getsysattr (access/common/heaptuple.c):

        case MinCommandIdAttributeNumber:
        case MaxCommandIdAttributeNumber:

            /*
             * cmin and cmax are now both aliases for the same field, which
             * can in fact also be a combo command id. XXX perhaps we should
             * return the "real" cmin or cmax if possible, that is if we are
             * inside the originating transaction?
             */
            result =
                CommandIdGetDatum(HeapTupleHeaderGetRawCommandId(tup->t_data));
            break;

So it looks like these system columns also don't look up combocids.
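
To illustrate the behavior of the quoted switch, here is a minimal sketch in C. It assumes a heavily simplified tuple header with a single shared command-id field. The names ToyTupleHeader, ToySysAttr, and toy_getsysattr are hypothetical and are not part of PostgreSQL; only the fall-through shape of the two cases is taken from the snippet above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a heap tuple header.
 * This is NOT PostgreSQL's real struct; it only models the one
 * shared command-id field discussed in the thread. */
typedef struct ToyTupleHeader {
    uint32_t xmin;     /* inserting transaction id */
    uint32_t xmax;     /* deleting transaction id, 0 if none */
    uint32_t raw_cid;  /* insert cid, delete cid, or combo cid */
} ToyTupleHeader;

/* Hypothetical attribute selector standing in for the two
 * system-column attribute numbers. */
typedef enum ToySysAttr { TOY_ATTR_CMIN, TOY_ATTR_CMAX } ToySysAttr;

/* Both attributes fall through to the same raw field, with no
 * attempt to interpret it, so cmin and cmax always read alike. */
uint32_t toy_getsysattr(const ToyTupleHeader *tup, ToySysAttr attr)
{
    switch (attr) {
    case TOY_ATTR_CMIN:
    case TOY_ATTR_CMAX:
        return tup->raw_cid;
    }
    assert(0 && "unknown attribute");
    return 0;
}
```

In this model a row inserted by the second command of a transaction reads 1 for both columns even though xmax is 0, which matches the (0,2) row in the session above.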

I'm not interested in changing any of this, but I think we could clean
up the docs a little. The description for cmin is questionable too:

The command identifier (starting at zero) within the inserting transaction.

That's true as long as the row hasn't been deleted, but a delete overwrites the field.

Here is a patch to make both of these fields a little clearer, I
think. It could be improved further by some glossary entries
explaining what a command id is (and a combocid). Or maybe that's too
much information? And maybe we should be more drastic: combine cmin &
cmax into one entry, and explain that they are two names for the same
value, which might signify the insert cid, the delete cid, or a
combocid.
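
The three possible meanings of the shared value can be sketched with a toy model in C. This is an assumption-laden illustration, not PostgreSQL's combo command id code: the table size, the is_combo flag, and every toy_ name are hypothetical, and the model assumes each tuple is deleted at most once. The idea shown is that when the insert and the delete happen in the same transaction, the field stores an index into a backend-local table of (cmin, cmax) pairs instead of either raw command id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_COMBOS 16u  /* arbitrary capacity for this sketch */

/* One remembered (insert cid, delete cid) pair. */
typedef struct ToyComboPair {
    uint32_t cmin;
    uint32_t cmax;
} ToyComboPair;

/* Local table: the index of a pair serves as its combo cid. */
static ToyComboPair toy_combos[TOY_MAX_COMBOS];
static uint32_t toy_combo_count = 0;

/* Hypothetical tuple: one shared field plus a flag saying how to read it. */
typedef struct ToyTuple {
    uint32_t raw_cid;
    bool     is_combo;
} ToyTuple;

/* Return the combo cid for (cmin, cmax), reusing an existing entry if any. */
uint32_t toy_get_combo_cid(uint32_t cmin, uint32_t cmax)
{
    for (uint32_t i = 0; i < toy_combo_count; i++) {
        if (toy_combos[i].cmin == cmin && toy_combos[i].cmax == cmax)
            return i;
    }
    assert(toy_combo_count < TOY_MAX_COMBOS);
    toy_combos[toy_combo_count].cmin = cmin;
    toy_combos[toy_combo_count].cmax = cmax;
    return toy_combo_count++;
}

/* Insert: the field holds the inserting command's id. */
void toy_insert(ToyTuple *t, uint32_t cid)
{
    t->raw_cid = cid;
    t->is_combo = false;
}

/* Delete: the field is overwritten. If the deleter is the inserting
 * transaction, both cids are still needed, so store a combo cid. */
void toy_delete(ToyTuple *t, uint32_t cid, bool same_xact)
{
    if (same_xact) {
        t->raw_cid = toy_get_combo_cid(t->raw_cid, cid);
        t->is_combo = true;
    } else {
        t->raw_cid = cid;
        t->is_combo = false;
    }
}
```

A reader that returns raw_cid without consulting the flag or the table, as the system columns appear to do, cannot tell these three cases apart.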

[0]: https://www.postgresql.org/docs/current/ddl-system-columns.html#DDL-SYSTEM-COLUMNS-CMAX

Yours,

--
Paul ~{:-)
pj@illuminatedcomputing.com

Attachments:

v1-0001-docs-Clarify-cmin-and-cmax-system-columns.patch (text/x-patch; charset=US-ASCII; +10 -3)

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Paul Jungwirth (#1)
Re: cmax docs seem misleading

Paul A Jungwirth <pj@illuminatedcomputing.com> writes:

> The docs for cmax say:[0]
>
> The command identifier within the deleting transaction, or zero.
>
> This was true once upon a time, I think. But nowadays cmax and cmin
> are the same physical field, and the user-facing system columns don't
> seem to be trying to interpret it.

Yeah, this is a mess. Nobody ever updated this text when we decided we
could pack those fields into one. I think it would be better to do
what you suggest:

> ... And maybe we should be more drastic: combine cmin &
> cmax into one entry, and explain that they are two names for the same
> value, which might signify the insert cid, the delete cid, or a
> combocid.

I'm not sure about good wording, but maybe like

cmin, cmax:

Originally, cmin and cmax were separate fields. cmin was the
inserting command's command identifier within the inserting
transaction, while cmax was the updating or deleting command's
command identifier within the updating/deleting transaction, or
zero if no update or delete attempt had occurred yet. Nowadays
these system columns refer to the same field and will always read
as the same value. That might be the inserting command's command
identifier, or the deleting command's command identifier, or a
"combocid" that reflects both actions when those happened in the
same transaction.

I don't know if we want to go into any more detail than that.

regards, tom lane

#3 Paul Jungwirth
pj@illuminatedcomputing.com
In reply to: Tom Lane (#2)
Re: cmax docs seem misleading

On Sun, Mar 29, 2026 at 12:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Paul A Jungwirth <pj@illuminatedcomputing.com> writes:
>> The docs for cmax say:[0]
>>
>> The command identifier within the deleting transaction, or zero.
>>
>> This was true once upon a time, I think. But nowadays cmax and cmin
>> are the same physical field, and the user-facing system columns don't
>> seem to be trying to interpret it.
>
> Yeah, this is a mess. Nobody ever updated this text when we decided we
> could pack those fields into one. I think it would be better to do
> what you suggest:
>
>> ... And maybe we should be more drastic: combine cmin &
>> cmax into one entry, and explain that they are two names for the same
>> value, which might signify the insert cid, the delete cid, or a
>> combocid.
>
> I'm not sure about good wording, but maybe like
>
> cmin, cmax:
>
> Originally, cmin and cmax were separate fields. cmin was the
> inserting command's command identifier within the inserting
> transaction, while cmax was the updating or deleting command's
> command identifier within the updating/deleting transaction, or
> zero if no update or delete attempt had occurred yet. Nowadays
> these system columns refer to the same field and will always read
> as the same value. That might be the inserting command's command
> identifier, or the deleting command's command identifier, or a
> "combocid" that reflects both actions when those happened in the
> same transaction.
>
> I don't know if we want to go into any more detail than that.

I agree that is plenty of detail for user-facing documentation. I
think your suggested text is a big improvement.

Yours,

--
Paul ~{:-)
pj@illuminatedcomputing.com

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Paul Jungwirth (#3)
Re: cmax docs seem misleading

Paul A Jungwirth <pj@illuminatedcomputing.com> writes:

> On Sun, Mar 29, 2026 at 12:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, this is a mess. Nobody ever updated this text when we decided we
>> could pack those fields into one. I think it would be better to do
>> what you suggest:
>>
>>> ... And maybe we should be more drastic: combine cmin &
>>> cmax into one entry, and explain that they are two names for the same
>>> value, which might signify the insert cid, the delete cid, or a
>>> combocid.
>
> I agree that is plenty of detail for user-facing documentation. I
> think your suggested text is a big improvement.

Done like that, then.

regards, tom lane