Our naming of wait events is a disaster.

Started by Tom Lanealmost 6 years ago30 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

I've been trying to reformat table 27.4 (wait events) to fit
into PDF output, which has caused me to study its contents
more than I ever had before. The lack of consistency, or
even any weak attempt at consistency, is not just glaring;
it's embarrassing.

We have a lot of wait event names like these:

XidGenLock
ProcArrayLock
SInvalReadLock
SInvalWriteLock
WALBufMappingLock
WALWriteLock

which are more or less fine; maybe one could wish for having
just one way of capitalizing acronyms not two, but I'll let
that pass. But could we be satisfied with handling all multi
word names in that style? Nope:

commit_timestamp
multixact_offset
multixact_member
wal_insert

(and in case you are wondering, yes, "WAL" is also spelled "Wal"
in yet other places.)

And then somebody else, unwilling to use either of those styles,
thought it would be cute to do

Hash/Batch/Allocating
Hash/Batch/Electing
Hash/Batch/Loading
Hash/GrowBatches/Allocating

and all alone in the remotest stretch of left field, we've got

speculative token

(yes, with a space in it).

Also, while the average length of these names exceeds 16 characters,
with such gems as SerializablePredicateLockListLock, think not that
prolixity is the uniform rule:

oldserxid
proc
tbm

Is it unreasonable of me to think that there should be *some*
amount of editorial control over these user-visible names?
At the rock bottom minimum, shouldn't we insist that they all
be legal identifiers?

I'm not sure what our stance is on version-to-version consistency
of these names, but I'd like to think that we are not stuck for
all time with the results of these random coin tosses.

My inclination is to propose that we settle on the first style
shown above, which is the majority case now, and rename the
other events to fit that. As long as we're breaking compatibility
anyway, I'd also like to shorten one or two of the very longest
names, because they're just giving me fits in fixing the PDF
rendering. (They would make a mess of the display of
pg_stat_activity, too, anytime they come up in the field.)

Thoughts?

regards, tom lane

#2Jonah H. Harris
jonah.harris@gmail.com
In reply to: Tom Lane (#1)
Re: Our naming of wait events is a disaster.

On Tue, May 12, 2020 at 11:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

My inclination is to propose that we settle on the first style
shown above, which is the majority case now, and rename the
other events to fit that. As long as we're breaking compatibility
anyway, I'd also like to shorten one or two of the very longest
names, because they're just giving me fits in fixing the PDF
rendering. (They would make a mess of the display of
pg_stat_activity, too, anytime they come up in the field.)

Thoughts?

+1

--
Jonah H. Harris

#3Andrey Borodin
amborodin@acm.org
In reply to: Tom Lane (#1)
Re: Our naming of wait events is a disaster.

12 мая 2020 г., в 20:16, Tom Lane <tgl@sss.pgh.pa.us> написал(а):

Thoughts?

I've been coping with cognitive load of these names recently. 2 cents of my impressions:
1. Names are somewhat recognisable and seem to have some meaning. But there is not so much information about them in the Internet. But I did not try to Google them all, just a small subset.
2. Anyway, names should be grepable and googlable, i.e. unique amid identifiers.
3. I think names observed in wait_event and wait_event_type should not duplicate information. i.e. "XidGenLock" is already "LWLock".
4. It's hard to tell the difference between "buffer_content", "buffer_io", "buffer_mapping", "BufferPin", "BufFileRead", "BufFileWrite" and some others. "CLogControlLock" vs "clog"? I'm not sure good DBA can tell the difference without looking up into the code.
I hope some thoughts will be useful.

Best regards, Andrey Borodin.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrey Borodin (#3)
Re: Our naming of wait events is a disaster.

"Andrey M. Borodin" <x4mmm@yandex-team.ru> writes:

3. I think names observed in wait_event and wait_event_type should not duplicate information. i.e. "XidGenLock" is already "LWLock".

Yeah, I'd been wondering about that too: we could strip the "Lock" suffix
from all the names in the LWLock category, and make pg_stat_activity
output a bit narrower.

There are a lot of other things that seem inconsistent, but I'm not sure
how much patience people would have for judgment-call renamings. An
example is that "ProcSignalBarrier" is under IO, but why? Shouldn't it
be reclassified as IPC? Other than that, *almost* all the IO events
are named SomethingRead, SomethingWrite, or SomethingSync, which
makes sense to me ... should we insist they all follow that pattern?

Anyway, I was just throwing this idea out to see if there would be
howls of "you can't rename anything" anguish. Since there haven't
been so far, I'll spend a bit more time and try to create a concrete
list of possible changes.

regards, tom lane

#5Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#4)
Re: Our naming of wait events is a disaster.

On Tue, 12 May 2020 at 19:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Anyway, I was just throwing this idea out to see if there would be
howls of "you can't rename anything" anguish. Since there haven't
been so far, I'll spend a bit more time and try to create a concrete
list of possible changes.

If we add in extensions and lwlocks, will they show up as well?

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
Mission Critical Databases

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#5)
Re: Our naming of wait events is a disaster.

Simon Riggs <simon@2ndquadrant.com> writes:

On Tue, 12 May 2020 at 19:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Anyway, I was just throwing this idea out to see if there would be
howls of "you can't rename anything" anguish. Since there haven't
been so far, I'll spend a bit more time and try to create a concrete
list of possible changes.

If we add in extensions and lwlocks, will they show up as well?

Yeah, I was just looking into that. Part of the reason for the
inconsistency is that we've exposed names that are passed to,
eg, SimpleLruInit that previously were strictly internal debugging
identifiers, so that approximately zero thought was put into them.

We're going to have to document SimpleLruInit and similar functions
along the lines of "The name you give here will be user-visible as
a wait event. Choose it with an eye to consistency with existing
wait event names, and add it to the user-facing documentation."
But that requirement isn't something I just invented, it was
effectively created by whoever implemented things this way.

Said user-facing documentation largely fails to explain that the
set of wait events can be enlarged by extensions; that needs to
be fixed, too.

There isn't a lot we can do to force extensions to pick consistent
names, but on the other hand we won't be documenting such names
anyway, so for my immediate purposes it doesn't matter ;-)

regards, tom lane

#7Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
Re: Our naming of wait events is a disaster.

There are a lot of other things that seem inconsistent, but I'm not sure
how much patience people would have for judgment-call renamings. An
example is that "ProcSignalBarrier" is under IO, but why? Shouldn't it
be reclassified as IPC?

Hmm, that seems like a goof.

Other than that, *almost* all the IO events
are named SomethingRead, SomethingWrite, or SomethingSync, which
makes sense to me ... should we insist they all follow that pattern?

Maybe, but sometimes module X does more than one kind of
read/write/sync, and I'm not necessarily keen on merging things
together. The whole point of this is to be able to tell where you're
stuck in the code, and the more you merge related things together, the
less you can actually tell that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#6)
Re: Our naming of wait events is a disaster.

On Tue, May 12, 2020 at 4:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Said user-facing documentation largely fails to explain that the
set of wait events can be enlarged by extensions; that needs to
be fixed, too.

Is that true? How can they do that? I thought they were stuck with
PG_WAIT_EXTENSION.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#1)
Re: Our naming of wait events is a disaster.

On Tue, May 12, 2020 at 11:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I've been trying to reformat table 27.4 (wait events) to fit
into PDF output, which has caused me to study its contents
more than I ever had before.

That reminds me that it might be easier to maintain that table if we
broke it up into one table per major category - that is, one table for
lwlocks, one table for IPC, one table for IO, etc. - instead of a
single table with a row-span number that is large and frequently
updated incorrectly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#9)
Re: Our naming of wait events is a disaster.

On 2020-May-12, Robert Haas wrote:

That reminds me that it might be easier to maintain that table if we
broke it up into one table per major category - that is, one table for
lwlocks, one table for IPC, one table for IO, etc. - instead of a
single table with a row-span number that is large and frequently
updated incorrectly.

(Didn't we have a patch to generate the table programmatically?)

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In reply to: Tom Lane (#1)
Re: Our naming of wait events is a disaster.

On Tue, May 12, 2020 at 8:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm not sure what our stance is on version-to-version consistency
of these names, but I'd like to think that we are not stuck for
all time with the results of these random coin tosses.

These names are fundamentally implementation details, and
implementation details are subject to change without too much warning.
I think it's okay to change the names for consistency along the lines
you propose. ISTM that it's worth going to a little bit of effort to
preserve any existing names. But not too much.

--
Peter Geoghegan

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#9)
Re: Our naming of wait events is a disaster.

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, May 12, 2020 at 11:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I've been trying to reformat table 27.4 (wait events) to fit
into PDF output, which has caused me to study its contents
more than I ever had before.

That reminds me that it might be easier to maintain that table if we
broke it up into one table per major category - that is, one table for
lwlocks, one table for IPC, one table for IO, etc. - instead of a
single table with a row-span number that is large and frequently
updated incorrectly.

Yeah, see my last attempt at

/messages/by-id/26961.1589260206@sss.pgh.pa.us

I'm probably going to go with that, but as given that patch conflicts
with my other pending patch to change the catalog description tables,
so I want to push that other one first and then clean up the wait-
event one. In the meantime, I'm going to look at these naming issues,
which will also be changing that patch ...

regards, tom lane

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#8)
Re: Our naming of wait events is a disaster.

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, May 12, 2020 at 4:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Said user-facing documentation largely fails to explain that the
set of wait events can be enlarged by extensions; that needs to
be fixed, too.

Is that true? How can they do that? I thought they were stuck with
PG_WAIT_EXTENSION.

Extensions can definitely add new LWLock tranches, and thereby
enlarge the set of names in that category. I haven't figured out
whether there are other avenues.

regards, tom lane

#14Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#6)
Re: Our naming of wait events is a disaster.

On Tue, 12 May 2020 at 21:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

On Tue, 12 May 2020 at 19:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Anyway, I was just throwing this idea out to see if there would be
howls of "you can't rename anything" anguish. Since there haven't
been so far, I'll spend a bit more time and try to create a concrete
list of possible changes.

If we add in extensions and lwlocks, will they show up as well?

Yeah, I was just looking into that. Part of the reason for the
inconsistency is that we've exposed names that are passed to,
eg, SimpleLruInit that previously were strictly internal debugging
identifiers, so that approximately zero thought was put into them.

We're going to have to document SimpleLruInit and similar functions
along the lines of "The name you give here will be user-visible as
a wait event. Choose it with an eye to consistency with existing
wait event names, and add it to the user-facing documentation."
But that requirement isn't something I just invented, it was
effectively created by whoever implemented things this way.

Said user-facing documentation largely fails to explain that the
set of wait events can be enlarged by extensions; that needs to
be fixed, too.

There isn't a lot we can do to force extensions to pick consistent
names, but on the other hand we won't be documenting such names
anyway, so for my immediate purposes it doesn't matter ;-)

I think we need to plan the namespace with extensions in mind.

There are now dozens; some of them even help you view wait events...

We don't want the equivalent of the Dewey decimal system: 300 categories of
Exaggeration and one small corner for Science.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
Mission Critical Databases

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#10)
Re: Our naming of wait events is a disaster.

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

(Didn't we have a patch to generate the table programmatically?)

Having now looked around a bit at where the names come from, I think
such a patch would be impossible as things stand, which is a pity
because as-things-stand is a totally unmaintainable situation.
Anybody at all can call LWLockRegisterTranche and thereby create a new
name that ought to be listed in the SGML table. Apparently, somebody
grepped for such calls and put all the ones that existed at the time
into the table; unsurprisingly, the results are already out of date.
Several of the hard-wired calls in RegisterLWLockTranches() are not
reflected in the SGML table AFAICS.

Or, if you don't want to call LWLockRegisterTranche, you can instead
call RequestNamedLWLockTranche. Whee. At least there are none of
those in the core code. However, we do have both autoprewarm and
pg_stat_statements calling these things from contrib.

That raises a policy question, or really two of them: should contrib
code be held to the standards we're going to set forth for tranche
names chosen by core code? And should contrib-created tranche names
be listed in chapter 27's table? (If not, should the individual
contrib modules document their additions?)

We could make things a little better perhaps if we got rid of all the
cowboy calls to LWLockRegisterTranche and had RegisterLWLockTranches
make all of them using a single table of names, as it already does
with MainLWLockNames[] but not the other ones. Then it'd be possible
to have documentation beside that table warning people to add entries
to the SGML docs; and even for the people who can't be bothered to
read comments, at least they'd be adding names to a list of names that
would give them some precedent and context for how to choose a new name.
I think 50% of the problem right now is that if you just write a
random new call to LWLockRegisterTranche in a random new place, you
have no context about what the tranche name should look like.

Even with all the names declared in some reasonably centralized
place(s), we'd be a long way from making the SGML tables programmatically,
because we'd not have text descriptions for the wait events. I can
imagine extending the source-code conventions a bit to include those
strings there, but I'm doubtful that it's worth the trouble.

regards, tom lane

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#15)
Re: Our naming of wait events is a disaster.

For the specific category of the heavyweight lock types, I'm
now thinking that we can't change the event names very much, because
those are also exposed in pg_locks' locktype column. We can be
darn certain, for example, that changing the spelling of "relation"
in that column would break a lot of user queries. Conceivably we
could decouple the wait event names from the locktype column, but
on the whole that doesn't seem like a great plan.

However, having said that, I remain on the warpath about "speculative
token". That's an utterly horrid choice for both locktype and wait
event. I also notice, with no amusement, that "speculative token"
is not documented in the pg_locks documentation. So I think we should
change it ... but to what, exactly? Looking at the other existing names:

const char *const LockTagTypeNames[] = {
"relation",
"extend",
"page",
"tuple",
"transactionid",
"virtualxid",
"speculative token",
"object",
"userlock",
"advisory"
};

I'm inclined to propose "spectoken". I'd be okay with "spec_token" as
well, but there are not underscores in the longer-established names.

(Needless to say, this array is going to gain a comment noting that
there are two places to document any changes. Also, if we split up
the wait_event table as discussed earlier, it might make sense for
pg_locks' documentation to cross-reference the sub-table for heavyweight
lock events, since that has some explanation of what the codes mean.)

regards, tom lane

#17Isaac Morland
isaac.morland@gmail.com
In reply to: Tom Lane (#15)
Re: Our naming of wait events is a disaster.

On Tue, 12 May 2020 at 18:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

(Didn't we have a patch to generate the table programmatically?)

Having now looked around a bit at where the names come from, I think
such a patch would be impossible as things stand, which is a pity
because as-things-stand is a totally unmaintainable situation.
Anybody at all can call LWLockRegisterTranche and thereby create a new
name that ought to be listed in the SGML table. Apparently, somebody
grepped for such calls and put all the ones that existed at the time
into the table; unsurprisingly, the results are already out of date.
Several of the hard-wired calls in RegisterLWLockTranches() are not
reflected in the SGML table AFAICS.

I expect there is a reason why this hasn’t been suggested, but just in case
it is at all helpful:

When do these names get created? That is, do all the wait types get created
and registered on startup, or is it more like whenever something needs to
do something the name gets passed in ad hoc? I'm wondering because it seems
like it might be helpful to have a system view which gives all the wait
event types, names, and descriptions. Maybe even add a column for which
extension (or core) it came from. The documentation could then just explain
the general situation and point people at the system view to see exactly
which wait types exist in their system. This would require every instance
where a type is registered to pass an additional parameter — the
description, as currently seen in the table in the documentation.

Of course if the names get passed in ad hoc then such a view could only
show the types that happen to have been created up to the moment it is
queried, which would defeat the purpose. And I can think of a few potential
reasons why this might not work at all, starting with the need to re-write
every example of registering a new type to provide the documentation string
for the view.

Inspiration due to pg_setting, pg_config, pg_available_extensions and
pg_get_keywords ().

#18Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#1)
Re: Our naming of wait events is a disaster.

On Wed, May 13, 2020 at 3:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

And then somebody else, unwilling to use either of those styles,
thought it would be cute to do

Hash/Batch/Allocating
Hash/Batch/Electing
Hash/Batch/Loading
Hash/GrowBatches/Allocating

Perhaps we should also drop the 'ing' from the verbs, to be more like
...Read etc.

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#18)
Re: Our naming of wait events is a disaster.

Thomas Munro <thomas.munro@gmail.com> writes:

On Wed, May 13, 2020 at 3:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hash/Batch/Allocating
Hash/Batch/Electing
Hash/Batch/Loading
Hash/GrowBatches/Allocating

Perhaps we should also drop the 'ing' from the verbs, to be more like
...Read etc.

Yeah, that aspect was bothering me too. Comparing these to other
wait event names, you could make a case for either "Allocate" or
"Allocation"; but there are no other names with -ing.

regards, tom lane

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Isaac Morland (#17)
Re: Our naming of wait events is a disaster.

Isaac Morland <isaac.morland@gmail.com> writes:

... I'm wondering because it seems
like it might be helpful to have a system view which gives all the wait
event types, names, and descriptions. Maybe even add a column for which
extension (or core) it came from. The documentation could then just explain
the general situation and point people at the system view to see exactly
which wait types exist in their system.

There's certainly an argument for doing things like that, but I think it'd
be a net negative in terms of quality and consistency of documentation.
We'd basically be deciding those are non-goals.

Of course, ripping out table 27.4 altogether would be a simple solution
to the formatting problem I started with ;-). But it doesn't really
seem like the answer we want.

Of course if the names get passed in ad hoc then such a view could only
show the types that happen to have been created up to the moment it is
queried, which would defeat the purpose.

Yes, exactly.

I don't actually understand why the LWLock tranche mechanism is designed
the way it is. It seems to be intended to support different backends
having different sets of LWLocks, but I fail to see why that's a good idea,
or even useful at all. In any case, dynamically-created LWLocks are
clearly out of scope for the documentation. The problem that I'm trying
to deal with right now is that even LWLocks that are hard-wired into the
backend code are difficult to enumerate. That wasn't a problem before
we decided we needed to expose them all to user view; but now it is.

regards, tom lane

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#21)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#20)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#25)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#26)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#21)
#29Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#19)