Is element access after HASH_REMOVE ever OK?
Hi,
After hearing from a couple of directions about systems spending too
much time scanning the local lock hash table, I wrote the trivial
patch to put them in a linked list, before learning that people have
considered that before, so I should probably go and read some history
on that and find out why it hasn't been done...
However, I noticed in passing that RemoveLocalLock() accesses
*locallock after removing it from the hash table (in assertion builds
only). So one question I have is whether it's actually a programming
rule that you can't do that (at most you can compare the pointer
against NULL), or whether it's supposed to be
safe-if-you-know-what-you're-doing, as the existing comments hints.
Here also is a patch that does wipe_mem on removed elements, as
threatened last time this topic came up[1]/messages/by-id/CAHut+Ps-pL++f6CJwPx2+vUqXuew=Xt-9Bi-6kCyxn+Fwi2M7w@mail.gmail.com, which reveals the problem.
I'm also not exactly sure why it's only a WARNING if your local lock
table is out of sync, but perhaps that's in the archives too.
[1]: /messages/by-id/CAHut+Ps-pL++f6CJwPx2+vUqXuew=Xt-9Bi-6kCyxn+Fwi2M7w@mail.gmail.com
Attachments:
0001-Clobber-memory-released-by-HASH_REMOVE.patchtext/x-patch; charset=US-ASCII; name=0001-Clobber-memory-released-by-HASH_REMOVE.patchDownload+6-4
0002-Don-t-access-a-local-locks-after-freeing-them.patchtext/x-patch; charset=US-ASCII; name=0002-Don-t-access-a-local-locks-after-freeing-them.patchDownload+5-6
Thomas Munro <thomas.munro@gmail.com> writes:
However, I noticed in passing that RemoveLocalLock() accesses
*locallock after removing it from the hash table (in assertion builds
only). So one question I have is whether it's actually a programming
rule that you can't do that (at most you can compare the pointer
against NULL), or whether it's supposed to be
safe-if-you-know-what-you're-doing, as the existing comments hints.
I'd say it's, at best, unwarranted familiarity with the dynahash
implementation ...
Here also is a patch that does wipe_mem on removed elements, as
threatened last time this topic came up[1], which reveals the problem.
... one good reason being that it'll fail under this sort of
entirely-reasonable debugging aid. Can we get rid of the unsafe
access easily?
regards, tom lane
I wrote:
... Can we get rid of the unsafe
access easily?
Oh, shoulda read your second patch first. Looking at that,
I fear it might not be quite that simple, because the
comment on CheckAndSetLockHeld says very clearly
* It is callers responsibility that this function is called after
* acquiring/releasing the relation extension/page lock.
so your proposed patch violates that specification.
I'm inclined to think that this API spec is very poorly thought out
and should be changed --- why is it that the flags should change
*after* the lock change in both directions? But we'd have to take
a look at the usage of these flags to understand what's going on
exactly.
regards, tom lane
Hi,
On 2021-05-10 20:15:41 -0400, Tom Lane wrote:
I wrote:
... Can we get rid of the unsafe
access easily?Oh, shoulda read your second patch first. Looking at that,
I fear it might not be quite that simple, because the
comment on CheckAndSetLockHeld says very clearly* It is callers responsibility that this function is called after
* acquiring/releasing the relation extension/page lock.so your proposed patch violates that specification.
It wouldn't be too hard to fix this though - we can just copy the
locktag into a local variable. Or use one of the existing local copies,
higher in the stack.
But:
I'm inclined to think that this API spec is very poorly thought out
and should be changed --- why is it that the flags should change
*after* the lock change in both directions? But we'd have to take
a look at the usage of these flags to understand what's going on
exactly.
I can't see a need to do it after the HASH_REMOVE at least - as we don't
return early if that fails, there's no danger getting out of sync if we
reverse the order. I think the comment could just be changed to say
that the function has to be called after it is inevitable that the lock
is acquired/released.
Greetings,
Andres Freund