Is element access after HASH_REMOVE ever OK?

Started by Thomas Munroalmost 5 years ago4 messageshackers
Jump to latest
#1Thomas Munro
thomas.munro@gmail.com

Hi,

After hearing from a couple of directions about systems spending too
much time scanning the local lock hash table, I wrote the trivial
patch to put them in a linked list, before learning that people have
considered that before, so I should probably go and read some history
on that and find out why it hasn't been done...

However, I noticed in passing that RemoveLocalLock() accesses
*locallock after removing it from the hash table (in assertion builds
only). So one question I have is whether it's actually a programming
rule that you can't do that (at most you can compare the pointer
against NULL), or whether it's supposed to be
safe-if-you-know-what-you're-doing, as the existing comments hints.
Here also is a patch that does wipe_mem on removed elements, as
threatened last time this topic came up[1]/messages/by-id/CAHut+Ps-pL++f6CJwPx2+vUqXuew=Xt-9Bi-6kCyxn+Fwi2M7w@mail.gmail.com, which reveals the problem.
I'm also not exactly sure why it's only a WARNING if your local lock
table is out of sync, but perhaps that's in the archives too.

[1]: /messages/by-id/CAHut+Ps-pL++f6CJwPx2+vUqXuew=Xt-9Bi-6kCyxn+Fwi2M7w@mail.gmail.com

Attachments:

0001-Clobber-memory-released-by-HASH_REMOVE.patchtext/x-patch; charset=US-ASCII; name=0001-Clobber-memory-released-by-HASH_REMOVE.patchDownload+6-4
0002-Don-t-access-a-local-locks-after-freeing-them.patchtext/x-patch; charset=US-ASCII; name=0002-Don-t-access-a-local-locks-after-freeing-them.patchDownload+5-6
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#1)
Re: Is element access after HASH_REMOVE ever OK?

Thomas Munro <thomas.munro@gmail.com> writes:

However, I noticed in passing that RemoveLocalLock() accesses
*locallock after removing it from the hash table (in assertion builds
only). So one question I have is whether it's actually a programming
rule that you can't do that (at most you can compare the pointer
against NULL), or whether it's supposed to be
safe-if-you-know-what-you're-doing, as the existing comments hints.

I'd say it's, at best, unwarranted familiarity with the dynahash
implementation ...

Here also is a patch that does wipe_mem on removed elements, as
threatened last time this topic came up[1], which reveals the problem.

... one good reason being that it'll fail under this sort of
entirely-reasonable debugging aid. Can we get rid of the unsafe
access easily?

regards, tom lane

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
Re: Is element access after HASH_REMOVE ever OK?

I wrote:

... Can we get rid of the unsafe
access easily?

Oh, shoulda read your second patch first. Looking at that,
I fear it might not be quite that simple, because the
comment on CheckAndSetLockHeld says very clearly

* It is callers responsibility that this function is called after
* acquiring/releasing the relation extension/page lock.

so your proposed patch violates that specification.

I'm inclined to think that this API spec is very poorly thought out
and should be changed --- why is it that the flags should change
*after* the lock change in both directions? But we'd have to take
a look at the usage of these flags to understand what's going on
exactly.

regards, tom lane

#4Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#3)
Re: Is element access after HASH_REMOVE ever OK?

Hi,

On 2021-05-10 20:15:41 -0400, Tom Lane wrote:

I wrote:

... Can we get rid of the unsafe
access easily?

Oh, shoulda read your second patch first. Looking at that,
I fear it might not be quite that simple, because the
comment on CheckAndSetLockHeld says very clearly

* It is callers responsibility that this function is called after
* acquiring/releasing the relation extension/page lock.

so your proposed patch violates that specification.

It wouldn't be too hard to fix this though - we can just copy the
locktag into a local variable. Or use one of the existing local copies,
higher in the stack.

But:

I'm inclined to think that this API spec is very poorly thought out
and should be changed --- why is it that the flags should change
*after* the lock change in both directions? But we'd have to take
a look at the usage of these flags to understand what's going on
exactly.

I can't see a need to do it after the HASH_REMOVE at least - as we don't
return early if that fails, there's no danger getting out of sync if we
reverse the order. I think the comment could just be changed to say
that the function has to be called after it is inevitable that the lock
is acquired/released.

Greetings,

Andres Freund