pgsql: dshash: Add sequential scan support.

Started by Andres Freundalmost 4 years ago11 messages
#1Andres Freund
Andres Freund
andres@anarazel.de

dshash: Add sequential scan support.

Add ability to scan all entries sequentially to dshash. The interface is
similar but a bit different both from that of dynahash and simple dshash
search functions. The most significant differences is that dshash's interfac
always needs a call to dshash_seq_term when scan ends. Another is
locking. Dshash holds partition lock when returning an entry,
dshash_seq_next() also holds lock when returning an entry but callers
shouldn't release it, since the lock is essential to continue a scan. The
seqscan interface allows entry deletion while a scan is in progress using
dshash_delete_current().

Reviewed-By: Andres Freund <andres@anarazel.de>
Author: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/352d297dc74feb0bf0dcb255cc0dfaaed2b96c1e

Modified Files
--------------
src/backend/lib/dshash.c | 163 ++++++++++++++++++++++++++++++++++++++-
src/include/lib/dshash.h | 23 ++++++
src/tools/pgindent/typedefs.list | 1 +
3 files changed, 186 insertions(+), 1 deletion(-)

#2Tom Lane
Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#1)
Re: pgsql: dshash: Add sequential scan support.

Andres Freund <andres@anarazel.de> writes:

dshash: Add sequential scan support.
Add ability to scan all entries sequentially to dshash. The interface is
similar but a bit different both from that of dynahash and simple dshash
search functions. The most significant differences is that dshash's interfac
always needs a call to dshash_seq_term when scan ends.

Umm ... what about error recovery? Or have you just cemented the
proposition that long-lived dshashes are unsafe?

regards, tom lane

#3Andres Freund
Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#2)
Re: pgsql: dshash: Add sequential scan support.

Hi,

On 2022-03-10 20:09:56 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

dshash: Add sequential scan support.
Add ability to scan all entries sequentially to dshash. The interface is
similar but a bit different both from that of dynahash and simple dshash
search functions. The most significant differences is that dshash's interfac
always needs a call to dshash_seq_term when scan ends.

Umm ... what about error recovery? Or have you just cemented the
proposition that long-lived dshashes are unsafe?

I don't think this commit made it worse. dshash_seq_term() releases an lwlock
(which will be released in case of an error) and unsets
hash_table->find_[exclusively_]locked. The latter weren't introduced by this
patch, and are also set by dshash_find().

I agree that ->find_[exclusively_]locked are problematic from an error
recovery perspective.

It's per-backend state at least and just used for assertions. We could remove
it. Or stop checking it in places where it could be set wrongly: dshash_find()
and dshash_detach() couldn't check anymore, but the rest of the assertions
would still be valid afaics?

Greetings,

Andres Freund

#4Thomas Munro
Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#3)
1 attachment(s)
Re: pgsql: dshash: Add sequential scan support.

[Re-directing to -hackers]

On Fri, Mar 11, 2022 at 2:27 PM Andres Freund <andres@anarazel.de> wrote:

On 2022-03-10 20:09:56 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

dshash: Add sequential scan support.
Add ability to scan all entries sequentially to dshash. The interface is
similar but a bit different both from that of dynahash and simple dshash
search functions. The most significant differences is that dshash's interfac
always needs a call to dshash_seq_term when scan ends.

Umm ... what about error recovery? Or have you just cemented the
proposition that long-lived dshashes are unsafe?

I don't think this commit made it worse. dshash_seq_term() releases an lwlock
(which will be released in case of an error) and unsets
hash_table->find_[exclusively_]locked. The latter weren't introduced by this
patch, and are also set by dshash_find().

I agree that ->find_[exclusively_]locked are problematic from an error
recovery perspective.

Right, as seen in the build farm at [1]/messages/by-id/20220701232009.jcwxpl45bptaxv5n@alap3.anarazel.de. Also reproducible with something like:

@@ -269,6 +269,14 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size
request_size,
return false;
}

+       /* XXX random fault injection */
+       if (op == DSM_OP_ATTACH && random() < RAND_MAX / 8)
+       {
+               close(fd);
+               elog(ERROR, "chaos");
+               return false;
+       }
+

I must have thought that it was easy and practical to write no-throw
straight-line code and be sure to reach dshash_release_lock(), but I
concede that it was a bad idea: even dsa_get_address() can throw*, and
you're often likely to need to call that while accessing dshash
elements. For example, in lookup_rowtype_tupdesc_internal(), there is
a sequence dshash_find(), ..., dsa_get_address(), ...,
dshash_release_lock(), and I must have considered the range of code
between find and release to be no-throw, but now I know that it is
not.

It's per-backend state at least and just used for assertions. We could remove
it. Or stop checking it in places where it could be set wrongly: dshash_find()
and dshash_detach() couldn't check anymore, but the rest of the assertions
would still be valid afaics?

Yeah, it's all for assertions... let's just remove it. Those
assertions were useful to me at some stage in development but won't
hold as well as I thought, at least without widespread PG_FINALLY(),
which wouldn't be nice.

*dsa_get_address() might need to adjust the memory map with system
calls, which might fail. If you think of DSA as not only an allocator
but also a poor man's user level virtual memory scheme to tide us over
until we get threads, then this is a pretty low level kind of
should-not-happen failure that is analogous on some level to SIGBUS or
SIGSEGV or something like that, and we should PANIC. Then we could
claim that dsa_get_address() is no-throw. At least, that was one
argument I had with myself while investigating that strange Solaris
shm_open() failure, but ... I lost the argument. It's quite an
extreme position to take just to support these assertions, which are
of pretty limited value.

[1]: /messages/by-id/20220701232009.jcwxpl45bptaxv5n@alap3.anarazel.de

Attachments:

0001-Remove-spurious-assertions-from-dshash.c.patchtext/x-patch; charset=US-ASCII; name=0001-Remove-spurious-assertions-from-dshash.c.patch
#5Kyotaro Horiguchi
Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Thomas Munro (#4)
Re: pgsql: dshash: Add sequential scan support.

At Mon, 4 Jul 2022 14:55:43 +1200, Thomas Munro <thomas.munro@gmail.com> wrote in

[Re-directing to -hackers]

On Fri, Mar 11, 2022 at 2:27 PM Andres Freund <andres@anarazel.de> wrote:

It's per-backend state at least and just used for assertions. We could remove
it. Or stop checking it in places where it could be set wrongly: dshash_find()
and dshash_detach() couldn't check anymore, but the rest of the assertions
would still be valid afaics?

Yeah, it's all for assertions... let's just remove it. Those
assertions were useful to me at some stage in development but won't
hold as well as I thought, at least without widespread PG_FINALLY(),
which wouldn't be nice.

*dsa_get_address() might need to adjust the memory map with system
calls, which might fail. If you think of DSA as not only an allocator
but also a poor man's user level virtual memory scheme to tide us over
until we get threads, then this is a pretty low level kind of
should-not-happen failure that is analogous on some level to SIGBUS or
SIGSEGV or something like that, and we should PANIC. Then we could
claim that dsa_get_address() is no-throw. At least, that was one
argument I had with myself while investigating that strange Solaris
shm_open() failure, but ... I lost the argument. It's quite an
extreme position to take just to support these assertions, which are
of pretty limited value.

[1] /messages/by-id/20220701232009.jcwxpl45bptaxv5n@alap3.anarazel.de

FWIW, the discussion above is convincing to me and the patch looks good.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#6Zhihong Yu
Zhihong Yu
zyu@yugabyte.com
In reply to: Thomas Munro (#4)
Re: pgsql: dshash: Add sequential scan support.

On Sun, Jul 3, 2022 at 7:56 PM Thomas Munro <thomas.munro@gmail.com> wrote:

[Re-directing to -hackers]

On Fri, Mar 11, 2022 at 2:27 PM Andres Freund <andres@anarazel.de> wrote:

On 2022-03-10 20:09:56 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

dshash: Add sequential scan support.
Add ability to scan all entries sequentially to dshash. The

interface is

similar but a bit different both from that of dynahash and simple

dshash

search functions. The most significant differences is that dshash's

interfac

always needs a call to dshash_seq_term when scan ends.

Umm ... what about error recovery? Or have you just cemented the
proposition that long-lived dshashes are unsafe?

I don't think this commit made it worse. dshash_seq_term() releases an

lwlock

(which will be released in case of an error) and unsets
hash_table->find_[exclusively_]locked. The latter weren't introduced by

this

patch, and are also set by dshash_find().

I agree that ->find_[exclusively_]locked are problematic from an error
recovery perspective.

Right, as seen in the build farm at [1]. Also reproducible with something
like:

@@ -269,6 +269,14 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size
request_size,
return false;
}

+       /* XXX random fault injection */
+       if (op == DSM_OP_ATTACH && random() < RAND_MAX / 8)
+       {
+               close(fd);
+               elog(ERROR, "chaos");
+               return false;
+       }
+

I must have thought that it was easy and practical to write no-throw
straight-line code and be sure to reach dshash_release_lock(), but I
concede that it was a bad idea: even dsa_get_address() can throw*, and
you're often likely to need to call that while accessing dshash
elements. For example, in lookup_rowtype_tupdesc_internal(), there is
a sequence dshash_find(), ..., dsa_get_address(), ...,
dshash_release_lock(), and I must have considered the range of code
between find and release to be no-throw, but now I know that it is
not.

It's per-backend state at least and just used for assertions. We could

remove

it. Or stop checking it in places where it could be set wrongly:

dshash_find()

and dshash_detach() couldn't check anymore, but the rest of the

assertions

would still be valid afaics?

Yeah, it's all for assertions... let's just remove it. Those
assertions were useful to me at some stage in development but won't
hold as well as I thought, at least without widespread PG_FINALLY(),
which wouldn't be nice.

*dsa_get_address() might need to adjust the memory map with system
calls, which might fail. If you think of DSA as not only an allocator
but also a poor man's user level virtual memory scheme to tide us over
until we get threads, then this is a pretty low level kind of
should-not-happen failure that is analogous on some level to SIGBUS or
SIGSEGV or something like that, and we should PANIC. Then we could
claim that dsa_get_address() is no-throw. At least, that was one
argument I had with myself while investigating that strange Solaris
shm_open() failure, but ... I lost the argument. It's quite an
extreme position to take just to support these assertions, which are
of pretty limited value.

[1]
/messages/by-id/20220701232009.jcwxpl45bptaxv5n@alap3.anarazel.de

Hi,
In the description,

`new shared memory stats system in 15`

It would be clearer to add `release` before `15`.

Cheers

#7Andres Freund
Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#4)
Re: pgsql: dshash: Add sequential scan support.

Hi,

On 2022-07-04 14:55:43 +1200, Thomas Munro wrote:

Right, as seen in the build farm at [1]. Also reproducible with something like:

@@ -269,6 +269,14 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size
request_size,
return false;
}

+       /* XXX random fault injection */
+       if (op == DSM_OP_ATTACH && random() < RAND_MAX / 8)
+       {
+               close(fd);
+               elog(ERROR, "chaos");
+               return false;
+       }
+

I must have thought that it was easy and practical to write no-throw
straight-line code and be sure to reach dshash_release_lock(), but I
concede that it was a bad idea: even dsa_get_address() can throw*, and
you're often likely to need to call that while accessing dshash
elements. For example, in lookup_rowtype_tupdesc_internal(), there is
a sequence dshash_find(), ..., dsa_get_address(), ...,
dshash_release_lock(), and I must have considered the range of code
between find and release to be no-throw, but now I know that it is
not.

Yea - I'd go as far as saying that it's almost never feasible.

It's per-backend state at least and just used for assertions. We could remove
it. Or stop checking it in places where it could be set wrongly: dshash_find()
and dshash_detach() couldn't check anymore, but the rest of the assertions
would still be valid afaics?

Yeah, it's all for assertions... let's just remove it. Those
assertions were useful to me at some stage in development but won't
hold as well as I thought, at least without widespread PG_FINALLY(),
which wouldn't be nice.

Hm. I'd be inclined to at least add a few more
Assert(!LWLockHeldByMe[InMode]()) style assertions. E.g. to
dshash_find_or_insert().

@@ -572,13 +552,8 @@ dshash_release_lock(dshash_table *hash_table, void *entry)
size_t partition_index = PARTITION_FOR_HASH(item->hash);

Assert(hash_table->control->magic == DSHASH_MAGIC);
-	Assert(hash_table->find_locked);
-	Assert(LWLockHeldByMeInMode(PARTITION_LOCK(hash_table, partition_index),
-								hash_table->find_exclusively_locked
-								? LW_EXCLUSIVE : LW_SHARED));
+	Assert(LWLockHeldByMe(PARTITION_LOCK(hash_table, partition_index)));

- hash_table->find_locked = false;
- hash_table->find_exclusively_locked = false;
LWLockRelease(PARTITION_LOCK(hash_table, partition_index));
}

This LWLockHeldByMe() doesn't add much - the LWLockRelease() will error out if
we don't hold the lock.

Greetings,

Andres Freund

#8Thomas Munro
Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#7)
1 attachment(s)
Re: pgsql: dshash: Add sequential scan support.

On Tue, Jul 5, 2022 at 8:54 AM Andres Freund <andres@anarazel.de> wrote:

On 2022-07-04 14:55:43 +1200, Thomas Munro wrote:

It's per-backend state at least and just used for assertions. We could remove
it. Or stop checking it in places where it could be set wrongly: dshash_find()
and dshash_detach() couldn't check anymore, but the rest of the assertions
would still be valid afaics?

Yeah, it's all for assertions... let's just remove it. Those
assertions were useful to me at some stage in development but won't
hold as well as I thought, at least without widespread PG_FINALLY(),
which wouldn't be nice.

Hm. I'd be inclined to at least add a few more
Assert(!LWLockHeldByMe[InMode]()) style assertions. E.g. to
dshash_find_or_insert().

Yeah, I was wondering about that, but it needs to check the whole 128
element lock array. Hmm, yeah that seems OK for assertion builds.
Since there were 6 places with I-hold-no-lock assertions, I shoved the
loop into a function so I could do:

-               Assert(!status->hash_table->find_locked);
+               assert_no_lock_held_by_me(hash_table);

+ Assert(LWLockHeldByMe(PARTITION_LOCK(hash_table, partition_index)));

- hash_table->find_locked = false;
- hash_table->find_exclusively_locked = false;
LWLockRelease(PARTITION_LOCK(hash_table, partition_index));

This LWLockHeldByMe() doesn't add much - the LWLockRelease() will error out if
we don't hold the lock.

Duh. Removed.

Attachments:

v2-0001-Fix-lock-assertions-in-dshash.c.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Fix-lock-assertions-in-dshash.c.patch
#9Andres Freund
Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#8)
Re: pgsql: dshash: Add sequential scan support.

Hi,

On 2022-07-05 11:20:54 +1200, Thomas Munro wrote:

On Tue, Jul 5, 2022 at 8:54 AM Andres Freund <andres@anarazel.de> wrote:

Yeah, it's all for assertions... let's just remove it. Those
assertions were useful to me at some stage in development but won't
hold as well as I thought, at least without widespread PG_FINALLY(),
which wouldn't be nice.

Hm. I'd be inclined to at least add a few more
Assert(!LWLockHeldByMe[InMode]()) style assertions. E.g. to
dshash_find_or_insert().

Yeah, I was wondering about that, but it needs to check the whole 128
element lock array.

I think it'd be ok to just check the current partition - yes, it'd not catch
cases where we're still holding a lock on another partition, but that's imo
not too bad?

Hmm, yeah that seems OK for assertion builds.
Since there were 6 places with I-hold-no-lock assertions, I shoved the
loop into a function so I could do:

-               Assert(!status->hash_table->find_locked);
+               assert_no_lock_held_by_me(hash_table);

I am a *bit* wary about the costs of that, even in assert builds - each of the
partition checks in the loop will in turn need to iterate through
held_lwlocks. But I guess we can also just later weaken them if it turns out
to be a problem.

Greetings,

Andres Freund

#10Thomas Munro
Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#9)
1 attachment(s)
Re: pgsql: dshash: Add sequential scan support.

On Tue, Jul 5, 2022 at 11:25 AM Andres Freund <andres@anarazel.de> wrote:

On 2022-07-05 11:20:54 +1200, Thomas Munro wrote:

Since there were 6 places with I-hold-no-lock assertions, I shoved the
loop into a function so I could do:

-               Assert(!status->hash_table->find_locked);
+               assert_no_lock_held_by_me(hash_table);

I am a *bit* wary about the costs of that, even in assert builds - each of the
partition checks in the loop will in turn need to iterate through
held_lwlocks. But I guess we can also just later weaken them if it turns out
to be a problem.

Maybe we should add assertion support for arrays of locks, so we don't
need two levels of loop? Something like the attached?

Attachments:

v3-0001-Fix-lock-assertions-in-dshash.c.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Fix-lock-assertions-in-dshash.c.patch
#11Thomas Munro
Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#10)
Re: pgsql: dshash: Add sequential scan support.

On Tue, Jul 5, 2022 at 3:21 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Tue, Jul 5, 2022 at 11:25 AM Andres Freund <andres@anarazel.de> wrote:

On 2022-07-05 11:20:54 +1200, Thomas Munro wrote:

Since there were 6 places with I-hold-no-lock assertions, I shoved the
loop into a function so I could do:

-               Assert(!status->hash_table->find_locked);
+               assert_no_lock_held_by_me(hash_table);

I am a *bit* wary about the costs of that, even in assert builds - each of the
partition checks in the loop will in turn need to iterate through
held_lwlocks. But I guess we can also just later weaken them if it turns out
to be a problem.

Maybe we should add assertion support for arrays of locks, so we don't
need two levels of loop? Something like the attached?

Pushed.