[PATCH] Improve error message when trying to lock virtual tuple.

Started by Sven Klemmover 1 year ago5 messages
#1Sven Klemm
sven@timescale.com
1 attachment(s)

Hello,

When currently trying to lock a virtual tuple the returned error
will be a misleading `could not read block 0`. This patch adds a
check for the tuple table slot being virtual to produce a clearer
error.

This can be triggered by extensions returning virtual tuples.
While this is of course an error in those extensions the resulting
error is very misleading.

--
Regards, Sven Klemm

Attachments:

0001-Improve-error-message-when-trying-to-lock-virtual-tu.patchtext/x-patch; charset=US-ASCII; name=0001-Improve-error-message-when-trying-to-lock-virtual-tu.patchDownload
From c5ccbf2e9cb401a7e63e93cd643b4dfec8d92ab8 Mon Sep 17 00:00:00 2001
From: Sven Klemm <sven@timescale.com>
Date: Mon, 17 Jun 2024 17:38:27 +0200
Subject: [PATCH] Improve error message when trying to lock virtual tuple.

When currently trying to lock a virtual tuple the returned error
will be a misleading `could not read block 0`. This patch adds a
check for the tuple table slot being virtual to produce a clearer
error.
---
 src/backend/executor/nodeLockRows.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 41754ddfea..cb3abbd348 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -65,6 +65,13 @@ lnext:
 		return NULL;
 	}
 
+	/*
+	 * If the slot is virtual, we can't lock it. This should never happen, but
+	 * this will lead to a misleading could not read block error later otherwise.
+	 */
+	if (TTS_IS_VIRTUAL(slot))
+		elog(ERROR, "cannot lock virtual tuple");
+
 	/* We don't need EvalPlanQual unless we get updated tuple version(s) */
 	epq_needed = false;
 
-- 
2.45.2

#2Aleksander Alekseev
aleksander@timescale.com
In reply to: Sven Klemm (#1)
Re: [PATCH] Improve error message when trying to lock virtual tuple.

Hi,

When currently trying to lock a virtual tuple the returned error
will be a misleading `could not read block 0`. This patch adds a
check for the tuple table slot being virtual to produce a clearer
error.

This can be triggered by extensions returning virtual tuples.
While this is of course an error in those extensions the resulting
error is very misleading.

```
+    /*
+     * If the slot is virtual, we can't lock it. This should never happen, but
+     * this will lead to a misleading could not read block error
later otherwise.
+     */
```

I suggest dropping or rephrasing the "this should never happen" part.
If this never happened we didn't need this check. Maybe "If the slot
is virtual, we can't lock it. Fail early in order to provide an
appropriate error message", or just "If the slot is virtual, we can't
lock it".

```
elog(ERROR, "cannot lock virtual tuple");
```

For some reason I thought that ereport() is the preferred way of
throwing errors, but I see elog() used many times in ExecLockRows() so
this is probably fine.

--
Best regards,
Aleksander Alekseev

#3Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Sven Klemm (#1)
Re: [PATCH] Improve error message when trying to lock virtual tuple.

(now send a copy to -hackers, too)

On Mon, 17 Jun 2024 at 17:55, Sven Klemm <sven@timescale.com> wrote:

Hello,

When currently trying to lock a virtual tuple the returned error
will be a misleading `could not read block 0`. This patch adds a
check for the tuple table slot being virtual to produce a clearer
error.

This can be triggered by extensions returning virtual tuples.
While this is of course an error in those extensions the resulting
error is very misleading.

I think you're solving the wrong problem here, as I can't think of a
place where both virtual tuple slots and tuple locking are allowed at
the same time in core code.

I mean, in which kind of situation could we get a Relation's table
slot which is not lockable by said relation's AM? Assuming the "could
not read block 0" error comes from the heap code, why does the
assertion in heapam_tuple_lock that checks for a
BufferHeapTupleTableSlot not fire before this `block 0` error? If the
error is not in the heapam code, could you show an example of the code
that breaks with that error code?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#4Sven Klemm
sven@timescale.com
In reply to: Matthias van de Meent (#3)
Re: [PATCH] Improve error message when trying to lock virtual tuple.

On Mon, Jun 17, 2024 at 10:25 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

I think you're solving the wrong problem here, as I can't think of a
place where both virtual tuple slots and tuple locking are allowed at
the same time in core code.

I mean, in which kind of situation could we get a Relation's table
slot which is not lockable by said relation's AM? Assuming the "could
not read block 0" error comes from the heap code, why does the
assertion in heapam_tuple_lock that checks for a
BufferHeapTupleTableSlot not fire before this `block 0` error? If the
error is not in the heapam code, could you show an example of the code
that breaks with that error code?

In assertion enabled builds this will be stopped much earlier and not return
the misleading error message. But most packaged postgres versions don't have
assertions enabled and will produce the misleading `could not read block 0`
error.
I am aware that this not a postgres bug, but i think this error message
is an improvement over the current situation.

--
Regards, Sven Klemm

#5Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Sven Klemm (#4)
Re: [PATCH] Improve error message when trying to lock virtual tuple.

On Tue, 18 Jun 2024 at 09:32, Sven Klemm <sven@timescale.com> wrote:

On Mon, Jun 17, 2024 at 10:25 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

I think you're solving the wrong problem here, as I can't think of a
place where both virtual tuple slots and tuple locking are allowed at
the same time in core code.

I mean, in which kind of situation could we get a Relation's table
slot which is not lockable by said relation's AM? Assuming the "could
not read block 0" error comes from the heap code, why does the
assertion in heapam_tuple_lock that checks for a
BufferHeapTupleTableSlot not fire before this `block 0` error? If the
error is not in the heapam code, could you show an example of the code
that breaks with that error code?

In assertion enabled builds this will be stopped much earlier and not return
the misleading error message. But most packaged postgres versions don't have
assertions enabled and will produce the misleading `could not read block 0`
error.
I am aware that this not a postgres bug, but i think this error message
is an improvement over the current situation.

Extensions shouldn't cause assertions to trigger, IMO, and I don't
think that this check in ExecLockRows is a good way to solve that
issue. In my opinion, authors should test their extension on
assert-enabled PostgreSQL, so that they're certain they're not doing

If you're dead-set on having users see less confusing error messages
when assertions should have triggered (but are not enabled, and thus
don't), I think the better place to add additional checks & error
messages is in the actual heapam_tuple_lock method, just after the
assertion, rather than in the AM-agnostic ExecLockRows: If or when a
tableAM decides that VirtualTableTupleSlot is the slot type they want
to use for passing tuples around, then that shouldn't be broken by
code in ExecLockRows that was put there to mimick an assert in the
heap AM.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)