existing row not found by SELECT ... WHERE CTID = ?

Started by Matthias Apitzalmost 4 years ago18 messagesgeneral
Jump to latest
#1Matthias Apitz
guru@unixarea.de

We have a C-written program, written in ESQL/C, of our LMS where the logic
crawls with FETCH through a hit list and does UPDATE on some rows which
match certain condition. This works fine for thousands of rows every night,
but magically sometimes it fails. I have here the part of ESQL/C logs
when I was able to catch such a case:

The hit list for the FETCHes is built with:

[29858]: [23.05.2022 23:21:21:844]: ecpg_free_params on line 2526: parameter 1 = (668486,20)
[29858]: [23.05.2022 23:21:21:844]: ecpg_free_params on line 2526: parameter 1 = (668486,20)

then the CURSOR d01buch_scr is FETCHed some 59537 times, some of the
rows were updated and here is the failing situation with a good FETCH of
the row:

...
[29858]: [23.05.2022 23:21:21:844]: ecpg_free_params on line 2526: parameter 1 = (668486,20)
[29858]: [23.05.2022 23:21:21:844]: ecpg_free_params on line 2526: parameter 1 = (668486,20)
[29858]: [23.05.2022 23:21:21:844]: ecpg_free_params on line 2526: parameter 1 = (668486,20)
...

but when a new CURSOR should be build to UPDATE the row based
on its CTID = (668486,20), the row could not be found:

[29858]: [23.05.2022 23:21:21:844]: ecpg_free_params on line 2526: parameter 1 = (668486,20)
[29858]: [23.05.2022 23:21:21:844]: ecpg_free_params on line 2526: parameter 1 = (668486,20)
[29858]: [23.05.2022 23:21:21:844]: ecpg_free_params on line 2526: parameter 1 = (668486,20)
[29858]: [23.05.2022 23:21:21:844]: ecpg_free_params on line 2526: parameter 1 = (668486,20)
[29858]: [23.05.2022 23:21:21:844]: ecpg_free_params on line 2526: parameter 1 = (668486,20)

...

and the first FETCH in the CURSOR hc_d01buch could not see any row:

104 [29858][23.05.2022 23:21:21:844]: ecpg_free_params on line 2526: parameter 1 = (668486,20) [23.05.2022 23:21:21:844]: raising sqlcode 100 on line 2531: no data found on line 2531

Why is this? Thanks

matthias

--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matthias Apitz (#1)
Re: existing row not found by SELECT ... WHERE CTID = ?

Matthias Apitz <guru@unixarea.de> writes:

We have a C-written program, written in ESQL/C, of our LMS where the logic
crawls with FETCH through a hit list and does UPDATE on some rows which
match certain condition. This works fine for thousands of rows every night,
but magically sometimes it fails.

Maybe something else already updated the row since the cursor was opened?
That would change its CTID.

regards, tom lane

#3Matthias Apitz
guru@unixarea.de
In reply to: Tom Lane (#2)
Re: existing row not found by SELECT ... WHERE CTID = ?

El día martes, mayo 24, 2022 a las 10:47:11 -0400, Tom Lane escribió:

Matthias Apitz <guru@unixarea.de> writes:

We have a C-written program, written in ESQL/C, of our LMS where the logic
crawls with FETCH through a hit list and does UPDATE on some rows which
match certain condition. This works fine for thousands of rows every night,
but magically sometimes it fails.

Maybe something else already updated the row since the cursor was opened?
That would change its CTID.

If you compare the time when the CTID was read:

[29858]: [23.05.2022 23:21:21:844]: raising sqlcode 100 on line 2531: no data found on line 2531
19; array: no

with the time when it was not found:

[29858]: [23.05.2022 23:21:21:844]: raising sqlcode 100 on line 2531: no data found on line 2531
line 2531

it's nearly imposible that our software could have done this. Maybe
VACUUM hits exactly this moment in time and row, but this sounds also like 5
good numbers in the lotery jackpot :-)

Thanks

matthias
--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Matthias Apitz (#1)
Re: existing row not found by SELECT ... WHERE CTID = ?

On Tue, 2022-05-24 at 16:44 +0200, Matthias Apitz wrote:

We have a C-written program, written in ESQL/C, of our LMS where the logic
crawls with FETCH through a hit list and does UPDATE on some rows which
match certain condition. This works fine for thousands of rows every night,
but magically sometimes it fails. I have here the part of ESQL/C logs
when I was able to catch such a case:

The hit list for the FETCHes is built with:

[29858] [23.05.2022 23:11:06:419]: prepare_common on line 1825: name sid_d01buch; query: "SELECT ctid, * from d01buch WHERE d01status = 4 "
[29858] [23.05.2022 23:11:06:419]: ecpg_execute on line 2026: query: declare d01buch_scr scroll cursor with hold for SELECT ctid, * from d01buch WHERE d01status = 4 ; with 0 parameter(s) on
connection sisis

then the CURSOR d01buch_scr is FETCHed some 59537 times, some of the
rows were updated and here is the failing situation with a good FETCH of
the row:

I cannot understand many of your log messages, but it seems quite clear that
you are declaring a WITH HOLD cursor.

Such cursors are materialized when the transactoin that creates the cursor
commits, so the result set is "frozen" and does no longer reflect the current
state of the table.

It may well be that somebody deleted or updated a few rows between the time
the cursor was materialized and the time the 50000th row was fetched.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Laurenz Albe (#4)
Re: existing row not found by SELECT ... WHERE CTID = ?

Laurenz Albe <laurenz.albe@cybertec.at> writes:

It may well be that somebody deleted or updated a few rows between the time
the cursor was materialized and the time the 50000th row was fetched.

Even without HOLD, a cursor will return a view of the data as it stood
when the cursor was opened, just as a plain SELECT does. There is
*plenty* of time for another session to get in there if you've been
groveling through 50K records one at a time.

regards, tom lane

#6Matthias Apitz
guru@unixarea.de
In reply to: Tom Lane (#5)
Re: existing row not found by SELECT ... WHERE CTID = ?

El día martes, mayo 24, 2022 a las 12:11:49 -0400, Tom Lane escribió:

Laurenz Albe <laurenz.albe@cybertec.at> writes:

It may well be that somebody deleted or updated a few rows between the time
the cursor was materialized and the time the 50000th row was fetched.

Even without HOLD, a cursor will return a view of the data as it stood
when the cursor was opened, just as a plain SELECT does. There is
*plenty* of time for another session to get in there if you've been
groveling through 50K records one at a time.

Tom, Thanks for pointing us in the right direction where to look for a
solution. The CURSOR was opened around 23:11 pm and the CTID not found
at 23:21 pm, i.e. ten minutes later. This piece of software does every
night some housekeeping work in the circulation area of our LMS (Library
Management System) and is meant to run as a standalone job (only one
process after the other). We're trying to figure out with the customer if something
else was started/running at this time between 23:11 and 23:21, to shut this
off in the future. Is it possible that the PostgreSQL 13.1 server does
something by its own to invalidate the rowid?

matthias
--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

#7Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Matthias Apitz (#6)
Re: existing row not found by SELECT ... WHERE CTID = ?

On Wed, 2022-05-25 at 11:21 +0200, Matthias Apitz wrote:

Is it possible that the PostgreSQL 13.1 server does something by its own to invalidate the rowid?

No. PostgreSQL may remove a dead row, but a dead row is by definition
no longer visible, so it wouldn't be found by a query.

Yours,
Laurenz Albe

#8Matthias Apitz
guru@unixarea.de
In reply to: Laurenz Albe (#7)
Re: existing row not found by SELECT ... WHERE CTID = ?

El día Mittwoch, Mai 25, 2022 a las 12:51:02 +0200, Laurenz Albe escribió:

On Wed, 2022-05-25 at 11:21 +0200, Matthias Apitz wrote:

Is it possible that the PostgreSQL 13.1 server does something by its own to invalidate the rowid?

No. PostgreSQL may remove a dead row, but a dead row is by definition
no longer visible, so it wouldn't be found by a query.

We will solve the problem now with setting the session after connect to

SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ;

(with an appropriate ESQL/C call). Any comments?

Thanks

matthias
--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

#9Christoph Moench-Tegeder
cmt@burggraben.net
In reply to: Matthias Apitz (#8)
Re: existing row not found by SELECT ... WHERE CTID = ?

## Matthias Apitz (guru@unixarea.de):

We will solve the problem now with setting the session after connect to

SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ;

(with an appropriate ESQL/C call). Any comments?

Maybe the real question is whether it is wise to use an implementation
artifact (ctid) to identify rows?
The textbook approach could be row locks (SELECT ... FOR SHARE/UPDATE and
variants) to prevent concurrent changes or optimistic locking (and a
primary key in any case) - but maybe you already investigated those options?

Regards,
Christoph

--
Spare Space

#10Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Christoph Moench-Tegeder (#9)
Re: existing row not found by SELECT ... WHERE CTID = ?

On Wed, 2022-05-25 at 14:27 +0200, Christoph Moench-Tegeder wrote:

## Matthias Apitz (guru@unixarea.de):

We will solve the problem now with setting the session after connect to

   SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ;

(with an appropriate ESQL/C call). Any comments?

Maybe the real question is whether it is wise to use an implementation
artifact (ctid) to identify rows?
The textbook approach could be row locks (SELECT ... FOR SHARE/UPDATE and
variants) to prevent concurrent changes or optimistic locking  (and a
primary key in any case) - but maybe you already investigated those options?

Right.

REPEATABLE READ won't help you there. True, you will see a stable snapshot
of the database inside a single transaction, but if a concurrent session has
modified the row, you will get a serialization error.
So that is not a solution.

Yours,
Laurenz Albe

#11Ravi Krishna
srkrishna@vivaldi.net
In reply to: Laurenz Albe (#7)
Re: existing row not found by SELECT ... WHERE CTID = ?

No. PostgreSQL may remove a dead row, but a dead row is by definition
no longer visible, so it wouldn't be found by a query.

I am wondering whether it is a good practice to use CTID in a where
clause. years ago when I use to code in Informix, using ROWID as a
generic substitute for primary key was discouraged precisely for the
same reason as described here for CTID. Sometimes rowid can change under
concurrent updates.

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ravi Krishna (#11)
Re: existing row not found by SELECT ... WHERE CTID = ?

Ravi Krishna <srkrishna@vivaldi.net> writes:

No. PostgreSQL may remove a dead row, but a dead row is by definition
no longer visible, so it wouldn't be found by a query.

I am wondering whether it is a good practice to use CTID in a where
clause.

It's fine if part of your business logic is that you don't want to allow
concurrent updates. In this case, the OP seems to want to prevent rather
than tolerate the concurrent update, so I don't think he needs to revisit
the app's use of CTID.

If you do need to support concurrent updates, then yeah relying on CTID
is likely to be problematic.

regards, tom lane

#13Matthias Apitz
guru@unixarea.de
In reply to: Matthias Apitz (#6)
Re: existing row not found by SELECT ... WHERE CTID = ?

El día Wednesday, May 25, 2022 a las 11:21:44AM +0200, Matthias Apitz escribió:

El día martes, mayo 24, 2022 a las 12:11:49 -0400, Tom Lane escribió:

Laurenz Albe <laurenz.albe@cybertec.at> writes:

It may well be that somebody deleted or updated a few rows between the time
the cursor was materialized and the time the 50000th row was fetched.

Even without HOLD, a cursor will return a view of the data as it stood
when the cursor was opened, just as a plain SELECT does. There is
*plenty* of time for another session to get in there if you've been
groveling through 50K records one at a time.

Tom, Thanks for pointing us in the right direction where to look for a
solution. The CURSOR was opened around 23:11 pm and the CTID not found
at 23:21 pm, i.e. ten minutes later. This piece of software does every
night some housekeeping work in the circulation area of our LMS (Library
Management System) and is meant to run as a standalone job (only one
process after the other). We're trying to figure out with the customer if something
else was started/running at this time between 23:11 and 23:21, to shut this
off in the future. ...

Is there any way to get with the old CTID to the row, for example with
the old CTID to the new one which the row now has after the update of the row?

matthias

--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

#14David G. Johnston
david.g.johnston@gmail.com
In reply to: Matthias Apitz (#13)
Re: existing row not found by SELECT ... WHERE CTID = ?

On Thursday, May 26, 2022, Matthias Apitz <guru@unixarea.de> wrote:

Is there any way to get with the old CTID to the row, for example with
the old CTID to the new one which the row now has after the update of the
row?

No, there is no link between old and new in the main table.

David J.

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#14)
Re: existing row not found by SELECT ... WHERE CTID = ?

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Thursday, May 26, 2022, Matthias Apitz <guru@unixarea.de> wrote:

Is there any way to get with the old CTID to the row, for example with
the old CTID to the new one which the row now has after the update of the
row?

No, there is no link between old and new in the main table.

There is a forward link from the old CTID to the new, but we don't
provide any user-accessible way to use it. I wonder though if the
OP should be considering using SELECT FOR UPDATE in his cursor,
so that it'd automatically chain up to the newest row version.

regards, tom lane

#16Matthias Apitz
guru@unixarea.de
In reply to: Tom Lane (#15)
Re: existing row not found by SELECT ... WHERE CTID = ?

El día jueves, mayo 26, 2022 a las 12:02:35 -0400, Tom Lane escribió:

I wonder though if the
OP should be considering using SELECT FOR UPDATE in his cursor,
so that it'd automatically chain up to the newest row version.

I diged into this but the CURSOR are all declared WITH HOLD and this
conflicts with FOR UPDATE. And this is not easy to change in our generic
generated DB-layer for all the ~400 tables.

matthias

--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

#17Matthias Apitz
guru@unixarea.de
In reply to: Tom Lane (#12)
Re: existing row not found by SELECT ... WHERE CTID = ?

El día miércoles, mayo 25, 2022 a las 10:38:24a. m. -0400, Tom Lane escribió:

Ravi Krishna <srkrishna@vivaldi.net> writes:

No. PostgreSQL may remove a dead row, but a dead row is by definition
no longer visible, so it wouldn't be found by a query.

I am wondering whether it is a good practice to use CTID in a where
clause.

It's fine if part of your business logic is that you don't want to allow
concurrent updates. In this case, the OP seems to want to prevent rather
than tolerate the concurrent update, so I don't think he needs to revisit
the app's use of CTID.

If you do need to support concurrent updates, then yeah relying on CTID
is likely to be problematic.

Tom, we detected another issue of missing a row by its CTID in another table
where fees are stored which one(!) process cumulates in the night. The time
window between creating the CURSOR and missing the CTID is only 42
seconds and I can not imagine that any other concurrent process is updating
such fee rows at midnight. Could exist any other reason why a row changes
its CTID? Full VACUUM is not used either.

Thanks

matthias
--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

#18Matthias Apitz
guru@unixarea.de
In reply to: Matthias Apitz (#17)
Re: existing row not found by SELECT ... WHERE CTID = ?

Analizing our logs in more detail, we got to know that there are cases
where, after updating a row with something like:

EXEC SQL UPDATE d01buch SET
d01gsi =:d01gsi,
d01ex =:d01ex,
...
d01vldate =:d01vldate
WHERE CURRENT OF hc_d01buch;

we look-up the same row later again with its old CTID in :rowid; one solution
of it could be right after the UPDATE look-up the new CTID with:

EXEC SQL SELECT ctid INTO :rowid FROM d01buch WHERE CURRENT OF hc_d01buch;

i.e. using the same CURSOR which was used for the UPDATE again for a
SELECT for the CTID. Can this work?

matthias
--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub