Using CTID system column as a "temporary" primary key

Started by Sebastien Flaeschabout 3 years ago28 messagesgeneral

Jump to latest

Sebastien Flaesch

sebastien.flaesch@4js.com

about 3 years ago

Hello!

We are looking for an equivalent of Informix ROWID or Oracle's ROWNUM.

Is the CTID a good choice?

I assume it must be used in a specific context, and of course not considered as permanent primary key.

I understand that if the row is updated, the CTID may change.

Where can we find details about the validity and lifetime of the value such column?

Will CTID be supported long term or is there any plan to remove it or hide it some day?

Of course, one should use a real primary key definition. However, we have legacy code to adapt to PostgreSQL, and in some cases, tables have a composite primary key. A first SELECT uses that primary key, but it also fetches the ROWID, and will use that one in a subsequent SELECT, UPDATE or DELETE, instead of carrying the composite pkey values.

Seb

Sebastien Flaesch

sebastien.flaesch@4js.com

about 3 years ago

In reply to: Sebastien Flaesch (#1)

Re: Using CTID system column as a "temporary" primary key

I mean Oracle's ROWID of course, not ROWNUM.
ROWNUM is temporary in the context of the SELECT, so it cannot be used in subsequent SQL statements.
Seb
________________________________
From: Sebastien Flaesch <sebastien.flaesch@4js.com>
Sent: Tuesday, March 28, 2023 11:28 AM
To: pgsql-general <pgsql-general@lists.postgresql.org>
Subject: Using CTID system column as a "temporary" primary key

EXTERNAL: Do not click links or open attachments if you do not recognize the sender.

Hello!

We are looking for an equivalent of Informix ROWID or Oracle's ROWNUM.

Is the CTID a good choice?

I assume it must be used in a specific context, and of course not considered as permanent primary key.

I understand that if the row is updated, the CTID may change.

Where can we find details about the validity and lifetime of the value such column?

Will CTID be supported long term or is there any plan to remove it or hide it some day?

Seb

Geoff Winkless

pgsqladmin@geoff.dj

about 3 years ago

In reply to: Sebastien Flaesch (#1)

Re: Using CTID system column as a "temporary" primary key

On Tue, 28 Mar 2023 at 10:28, Sebastien Flaesch <sebastien.flaesch@4js.com>
wrote:

Is the CTID a good choice?

I think if you're honest with yourself you already know the answer to this
question. The only real solution is to update the legacy code to use the
primary key, or (if that's not possible) change the table definition to add
your own indexed BIGSERIAL value called "ROWID" to the rows and use that
instead (assuming it will be large enough).

Geoff

Sebastien Flaesch

sebastien.flaesch@4js.com

about 3 years ago

In reply to: Sebastien Flaesch (#2)

Re: Using CTID system column as a "temporary" primary key

How large can the string representation of a CTID (TID type) be?

This page mentions 6 bytes for t_ctid / ItemPointerData...
=> how can I deduce the max size of its string representation?

https://www.postgresql.org/docs/14/storage-page-layout.html

Seb
________________________________
From: Sebastien Flaesch <sebastien.flaesch@4js.com>
Sent: Tuesday, March 28, 2023 11:57 AM
To: pgsql-general <pgsql-general@lists.postgresql.org>
Subject: Re: Using CTID system column as a "temporary" primary key

EXTERNAL: Do not click links or open attachments if you do not recognize the sender.

Hello!

We are looking for an equivalent of Informix ROWID or Oracle's ROWNUM.

Is the CTID a good choice?

I assume it must be used in a specific context, and of course not considered as permanent primary key.

I understand that if the row is updated, the CTID may change.

Where can we find details about the validity and lifetime of the value such column?

Will CTID be supported long term or is there any plan to remove it or hide it some day?

Seb

Sebastien Flaesch

sebastien.flaesch@4js.com

about 3 years ago

In reply to: Geoff Winkless (#3)

Re: Using CTID system column as a "temporary" primary key

Hi Geoff,

Your remark makes total sense, and this is what should be done.

However, we have to deal with quite complex legacy 4GL code that we prefer to not touch, and we can adapt the SQL statements on the fly with our solution (kind of Java compiler/runtime system).

Next question:

How can I UPDATE or DELETE a row, with the CTID column?

When I bind a string parameter, I get this error:

SQLSTATE = 42883
MESSAGE: operator does not exist: tid = character varying

Do I have to cast() ?

Seb

________________________________
From: Geoff Winkless <pgsqladmin@geoff.dj>
Sent: Tuesday, March 28, 2023 12:20 PM
To: Sebastien Flaesch <sebastien.flaesch@4js.com>
Cc: pgsql-general <pgsql-general@lists.postgresql.org>
Subject: Re: Using CTID system column as a "temporary" primary key

EXTERNAL: Do not click links or open attachments if you do not recognize the sender.

On Tue, 28 Mar 2023 at 10:28, Sebastien Flaesch <sebastien.flaesch@4js.com<mailto:sebastien.flaesch@4js.com>> wrote:
Is the CTID a good choice?

I think if you're honest with yourself you already know the answer to this question. The only real solution is to update the legacy code to use the primary key, or (if that's not possible) change the table definition to add your own indexed BIGSERIAL value called "ROWID" to the rows and use that instead (assuming it will be large enough).

Geoff

Christophe Pettus

xof@thebuild.com

about 3 years ago

In reply to: Sebastien Flaesch (#5)

Re: Using CTID system column as a "temporary" primary key

On Mar 28, 2023, at 03:39, Sebastien Flaesch <sebastien.flaesch@4js.com> wrote:
Do I have to cast() ?

Yes:

select * from t where ctid='(0,1)'::tid;

The string representation can be up to 17 characters: 10 for the page number, 4 for the tuple number, and three for the delimiters.

Remember that updating a row changes its CTID.

Sebastien Flaesch

sebastien.flaesch@4js.com

about 3 years ago

In reply to: Christophe Pettus (#6)

Re: Using CTID system column as a "temporary" primary key

CAST seems to do the job so that's a good solution here.

Seb
________________________________
From: Christophe Pettus <xof@thebuild.com>
Sent: Tuesday, March 28, 2023 2:39 PM
To: Sebastien Flaesch <sebastien.flaesch@4js.com>
Cc: Geoff Winkless <pgsqladmin@geoff.dj>; pgsql-general <pgsql-general@lists.postgresql.org>
Subject: Re: Using CTID system column as a "temporary" primary key

EXTERNAL: Do not click links or open attachments if you do not recognize the sender.

On Mar 28, 2023, at 03:39, Sebastien Flaesch <sebastien.flaesch@4js.com> wrote:
Do I have to cast() ?

Yes:

select * from t where ctid='(0,1)'::tid;

The string representation can be up to 17 characters: 10 for the page number, 4 for the tuple number, and three for the delimiters.

Remember that updating a row changes its CTID.

Maciek Sakrejda

m.sakrejda@gmail.com

about 3 years ago

In reply to: Sebastien Flaesch (#7)

Re: Using CTID system column as a "temporary" primary key

Note that VACUUM FULL and CLUSTER can update ctids. I don't believe regular
VACUUM can, so you should be safe from autovacuum interfering in this
scheme, but the ctid colum documentation [1]https://www.postgresql.org/docs/current/ddl-system-columns.html states "A primary key should
be used to identify logical rows," so this is not exactly intended usage.

Thanks,
Maciek

[1]: https://www.postgresql.org/docs/current/ddl-system-columns.html

Kirk Wolak

wolakk@gmail.com

about 3 years ago

In reply to: Sebastien Flaesch (#5)

Re: Using CTID system column as a "temporary" primary key

On Tue, Mar 28, 2023 at 6:40 AM Sebastien Flaesch <sebastien.flaesch@4js.com>
wrote:

...

I think if you're honest with yourself you already know the answer to this
question. The only real solution is to update the legacy code to use the
primary key, or (if that's not possible) change the table definition to add
your own indexed BIGSERIAL value called "ROWID" to the rows and use that
instead (assuming it will be large enough).

Geoff

I have to second this... Why not, during conversion, create a ROWID
BIGSERIAL column in the PG only version. (And if not large enough, it's
easy enough to use a NUMERIC field, and a sequence)
Then the code would have access to this field, and when it uses it to
delete/update it should work.

I cringe at the thought of using CTID. And while it's probably "safe
enough" inside a single transaction. I doubt that there is much "testing"
of this concept.

Having been through this process (Oracle to PG), I wonder how far you are
into the process... Because Packages/Package Variables, Global Temp
Tables, and Autonomous Transactions all consumed significant time in our
process, as well as variable/field naming problems... If you pull off
converting this to PG without changing the source. Let me know...

Regards, Kirk

#10

Laurenz Albe

laurenz.albe@cybertec.at

about 3 years ago

In reply to: Kirk Wolak (#9)

Re: Using CTID system column as a "temporary" primary key

On Tue, 2023-03-28 at 14:24 -0400, Kirk Wolak wrote:

I cringe at the thought of using CTID. And while it's probably "safe enough"
inside a single transaction. I doubt that there is much "testing" of this concept.

It is safe to assume that the CTID is stable within a single transaction
only if you use REPEATABLE READ or better transaction isolation level.

With READ COMMITTED, you see updated rows (and consequently changed CTID)
within a single transaction. And if you use SELECT ... FOR UPDATE, you
could even see a changed CTID within a single statement.

So don't use CTID to identify rows unless you use REPEATABLE READ or better.

Yours,
Laurenz Albe

#11

Sebastien Flaesch

sebastien.flaesch@4js.com

about 3 years ago

In reply to: Kirk Wolak (#9)

Re: Using CTID system column as a "temporary" primary key

Hello Kirk,

We are pushing our customers to use only "pure" SQL without writing triggers or stored procedures, to not be stuck with a specific DB vendor.

We have a quite good vision of what is SQL portable and what is not SQL portable.

Concurrent data access is one these topic, especially when using old-style Informix pessimistic locking where you declare a cursor FOR UPDATE, fetch the row to set an exclusive lock, until the end user has finished to modify the record in the form, then do the UPDATE and close the cursor or commit the TX to release the lock. Involves all concepts of concurrent data access (isolation level, lock wait mode, locks and locking granularity, transactions) - best solution I found so far is: Committed read isolation level, wait for locks to the released (with timeout like 10 seconds), do short transaction to hold locks only for a fraction of seconds.

For sure the application code needs to be modified.

Adding a ROWID BIGSERIAL is an option we consider, but then it has other constraints.

INSERT statements must not use the serial column, so you have to list all columns of the table and provide only the values of the non-serial columns. With Informix you could just specific a zero to get a new generated serial, but seems this has never been considered with PostgreSQL.

SELECT * FROM table will return all column, user-defined ROWID included...
This is not the case with Informix or Oracle ROWID columns.
So, either you specify all columns except user-def ROWID or you add the rowid field to the program variable structure that receives the row.

...

Seb
________________________________
From: Kirk Wolak <wolakk@gmail.com>
Sent: Tuesday, March 28, 2023 8:24 PM
To: Sebastien Flaesch <sebastien.flaesch@4js.com>
Cc: Geoff Winkless <pgsqladmin@geoff.dj>; pgsql-general <pgsql-general@lists.postgresql.org>
Subject: Re: Using CTID system column as a "temporary" primary key

EXTERNAL: Do not click links or open attachments if you do not recognize the sender.

On Tue, Mar 28, 2023 at 6:40 AM Sebastien Flaesch <sebastien.flaesch@4js.com<mailto:sebastien.flaesch@4js.com>> wrote:
...

I think if you're honest with yourself you already know the answer to this question. The only real solution is to update the legacy code to use the primary key, or (if that's not possible) change the table definition to add your own indexed BIGSERIAL value called "ROWID" to the rows and use that instead (assuming it will be large enough).

Geoff

I have to second this... Why not, during conversion, create a ROWID BIGSERIAL column in the PG only version. (And if not large enough, it's easy enough to use a NUMERIC field, and a sequence)
Then the code would have access to this field, and when it uses it to delete/update it should work.

I cringe at the thought of using CTID. And while it's probably "safe enough" inside a single transaction. I doubt that there is much "testing" of this concept.

Having been through this process (Oracle to PG), I wonder how far you are into the process... Because Packages/Package Variables, Global Temp Tables, and Autonomous Transactions all consumed significant time in our process, as well as variable/field naming problems... If you pull off converting this to PG without changing the source. Let me know...

Regards, Kirk

#12

Sebastien Flaesch

sebastien.flaesch@4js.com

about 3 years ago

In reply to: Laurenz Albe (#10)

Re: Using CTID system column as a "temporary" primary key

Laurent,

Thanks for the advice about REPEATABLE READ isolation level!

Seb
________________________________
From: Laurenz Albe <laurenz.albe@cybertec.at>
Sent: Wednesday, March 29, 2023 1:08 PM
To: Kirk Wolak <wolakk@gmail.com>; Sebastien Flaesch <sebastien.flaesch@4js.com>
Cc: Geoff Winkless <pgsqladmin@geoff.dj>; pgsql-general <pgsql-general@lists.postgresql.org>
Subject: Re: Using CTID system column as a "temporary" primary key

EXTERNAL: Do not click links or open attachments if you do not recognize the sender.

On Tue, 2023-03-28 at 14:24 -0400, Kirk Wolak wrote:

I cringe at the thought of using CTID. And while it's probably "safe enough"
inside a single transaction. I doubt that there is much "testing" of this concept.

It is safe to assume that the CTID is stable within a single transaction
only if you use REPEATABLE READ or better transaction isolation level.

So don't use CTID to identify rows unless you use REPEATABLE READ or better.

Yours,
Laurenz Albe

#13

Adrian Klaver

adrian.klaver@aklaver.com

about 3 years ago

In reply to: Sebastien Flaesch (#11)

Re: Using CTID system column as a "temporary" primary key

On 3/29/23 07:19, Sebastien Flaesch wrote:

Hello Kirk,

INSERT statements must not use the serial column, so you have to list
all columns of the table and provide only the values of the non-serial
columns. With Informix you could just specific a zero to get a new
generated serial, but seems this has never been considered with PostgreSQL.

Yes it has:

insert into seq_test values(default, 'test');

select * from seq_test;
id | fld_1
----+-------
1 | test

SELECT * FROM table will return all column, user-defined ROWID included...
This is not the case with Informix or Oracle ROWID columns.
So, either you specify all columns except user-def ROWID or you add the
rowid field to the program variable structure that receives the row.

...

Seb
------------------------------------------------------------------------

--
Adrian Klaver
adrian.klaver@aklaver.com

#14

Laurenz Albe

laurenz.albe@cybertec.at

about 3 years ago

In reply to: Sebastien Flaesch (#12)

Re: Using CTID system column as a "temporary" primary key

On Wed, 2023-03-29 at 14:23 +0000, Sebastien Flaesch wrote:

From: Laurenz Albe <laurenz.albe@cybertec.at>

It is safe to assume that the CTID is stable within a single transaction
only if you use REPEATABLE READ or better transaction isolation level.

With READ COMMITTED, you see updated rows (and consequently changed CTID)
within a single transaction. And if you use SELECT ... FOR UPDATE, you
could even see a changed CTID within a single statement.

So don't use CTID to identify rows unless you use REPEATABLE READ or better.

Thanks for the advice about REPEATABLE READ isolation level!

... but that is only useful in a read-only scenario.

If you try to UPDATE the row in a REPEATABLE READ transaction, you
will get a serialization error if there was a concurrent update.

In short: don't use the CTID to identify a row.

Yours,
Laurenz Albe

#15

Peter J. Holzer

hjp-pgsql@hjp.at

about 3 years ago

In reply to: Adrian Klaver (#13)

Re: Using CTID system column as a "temporary" primary key

On 2023-03-29 07:59:54 -0700, Adrian Klaver wrote:

On 3/29/23 07:19, Sebastien Flaesch wrote:

INSERT statements must not use the serial column, so you have to list
all columns of the table and provide only the values of the non-serial
columns. With Informix you could just specific a zero to get a new
generated serial, but seems this has never been considered with
PostgreSQL.

Yes it has:

[...]

insert into seq_test values(default, 'test');

Default is not the same as zero.

#16

Sebastien Flaesch

sebastien.flaesch@4js.com

about 3 years ago

In reply to: Laurenz Albe (#14)

Re: Using CTID system column as a "temporary" primary key

Ok... sounds not good all in all.
Appreciate your help!
Thanks!
________________________________
From: Laurenz Albe <laurenz.albe@cybertec.at>
Sent: Wednesday, March 29, 2023 5:53 PM
To: Sebastien Flaesch <sebastien.flaesch@4js.com>; Kirk Wolak <wolakk@gmail.com>
Cc: Geoff Winkless <pgsqladmin@geoff.dj>; pgsql-general <pgsql-general@lists.postgresql.org>
Subject: Re: Using CTID system column as a "temporary" primary key

EXTERNAL: Do not click links or open attachments if you do not recognize the sender.

On Wed, 2023-03-29 at 14:23 +0000, Sebastien Flaesch wrote:

From: Laurenz Albe <laurenz.albe@cybertec.at>

It is safe to assume that the CTID is stable within a single transaction
only if you use REPEATABLE READ or better transaction isolation level.

With READ COMMITTED, you see updated rows (and consequently changed CTID)
within a single transaction. And if you use SELECT ... FOR UPDATE, you
could even see a changed CTID within a single statement.

So don't use CTID to identify rows unless you use REPEATABLE READ or better.

Thanks for the advice about REPEATABLE READ isolation level!

... but that is only useful in a read-only scenario.

If you try to UPDATE the row in a REPEATABLE READ transaction, you
will get a serialization error if there was a concurrent update.

In short: don't use the CTID to identify a row.

Yours,
Laurenz Albe

#17

Sebastien Flaesch

sebastien.flaesch@4js.com

about 3 years ago

In reply to: Adrian Klaver (#13)

Re: Using CTID system column as a "temporary" primary key

Oh the use of default keyword is new to me, thanks for that.

But to make PostgreSQL more Informix-compatible, zero should have been considered as well.

Informix:

sf@toro:/tmp$ dbaccess test1 -
Database selected.

create table mytable ( pkey serial not null primary key, name varchar(50) );

Table created.

insert into mytable values ( 0, 'aaaaa' );

1 row(s) inserted.

select * from mytable;

pkey name
1 aaaaa
1 row(s) retrieved.

PostgreSQL:

sf@toro:/tmp$ psql test1 --host=localhost --port=5436 --user=pgsuser
psql (14.1)
Type "help" for help.

test1=> create table mytable ( pkey serial not null primary key, name varchar(50) );
CREATE TABLE

test1=> insert into mytable values ( 0, 'aaaaa' );
INSERT 0 1

test1=> select * from mytable;
pkey | name
------+-------
0 | aaaaa
(1 row)

So, I would rather say : no, using zero was not considered.

😉
Seb

________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Wednesday, March 29, 2023 4:59 PM
To: Sebastien Flaesch <sebastien.flaesch@4js.com>; Kirk Wolak <wolakk@gmail.com>
Cc: Geoff Winkless <pgsqladmin@geoff.dj>; pgsql-general <pgsql-general@lists.postgresql.org>
Subject: Re: Using CTID system column as a "temporary" primary key

EXTERNAL: Do not click links or open attachments if you do not recognize the sender.

On 3/29/23 07:19, Sebastien Flaesch wrote:

Hello Kirk,

INSERT statements must not use the serial column, so you have to list
all columns of the table and provide only the values of the non-serial
columns. With Informix you could just specific a zero to get a new
generated serial, but seems this has never been considered with PostgreSQL.

Yes it has:

insert into seq_test values(default, 'test');

select * from seq_test;
id | fld_1
----+-------
1 | test

SELECT * FROM table will return all column, user-defined ROWID included...
This is not the case with Informix or Oracle ROWID columns.
So, either you specify all columns except user-def ROWID or you add the
rowid field to the program variable structure that receives the row.

...

Seb
------------------------------------------------------------------------

--
Adrian Klaver
adrian.klaver@aklaver.com

#18

Adrian Klaver

adrian.klaver@aklaver.com

about 3 years ago

In reply to: Peter J. Holzer (#15)

Re: Using CTID system column as a "temporary" primary key

On 3/29/23 09:43, Peter J. Holzer wrote:

On 2023-03-29 07:59:54 -0700, Adrian Klaver wrote:

On 3/29/23 07:19, Sebastien Flaesch wrote:

INSERT statements must not use the serial column, so you have to list
all columns of the table and provide only the values of the non-serial
columns. With Informix you could just specific a zero to get a new
generated serial, but seems this has never been considered with
PostgreSQL.

Yes it has:

[...]

insert into seq_test values(default, 'test');

Default is not the same as zero.

It accomplishes the same thing, a place holder value can be used to fire
the sequence without column qualifying the insert/update. Furthermore it
works over all columns. So I would say it has been considered by
Postgres as a way to 'to get a new generated serial'.

hp

--
Adrian Klaver
adrian.klaver@aklaver.com

#19

Christophe Pettus

xof@thebuild.com

about 3 years ago

In reply to: Sebastien Flaesch (#17)

Re: Using CTID system column as a "temporary" primary key

On Mar 29, 2023, at 12:11, Sebastien Flaesch <sebastien.flaesch@4js.com> wrote:
But to make PostgreSQL more Informix-compatible, zero should have been considered as well.

There is an infinite family of strange features that various databases have (DUAL from Oracle, anyone?); PostgreSQL will rapidly become unusable if it tried to adopt them all. This one in particular seems particularly hacky and misguided (as well as non-standard-compliant).

#20

Adrian Klaver

adrian.klaver@aklaver.com

about 3 years ago

In reply to: Sebastien Flaesch (#17)

Re: Using CTID system column as a "temporary" primary key

On 3/29/23 12:11, Sebastien Flaesch wrote:

Oh the use of default keyword is new to me, thanks for that.

But to make PostgreSQL more Informix-compatible, zero should have been
considered as well.

1) Why? Down the road to compatibility with some undetermined group of
databases lies mayhem.

2) 0 can be a valid sequence value:

test(5432)=# create sequence zero_test start 0 minvalue 0;
CREATE SEQUENCE
test(5432)=# select * from zero_test ;
last_value | log_cnt | is_called
------------+---------+-----------
0 | 0 | f

Then what do you do?

--
Adrian Klaver
adrian.klaver@aklaver.com

#21