Converting tab-complete.c's else-if chain to a switch
Per discussion elsewhere [1]/messages/by-id/1203570.1720470608@sss.pgh.pa.us, I've been looking at $SUBJECT.
I have a very crude Perl hack (too ugly to show yet ;-)) that
produces output along the lines of
Existing code:
/* CREATE STATISTICS <name> */
else if (Matches("CREATE", "STATISTICS", MatchAny))
COMPLETE_WITH("(", "ON");
else if (Matches("CREATE", "STATISTICS", MatchAny, "("))
COMPLETE_WITH("ndistinct", "dependencies", "mcv");
else if (Matches("CREATE", "STATISTICS", MatchAny, "(*)"))
COMPLETE_WITH("ON");
else if (HeadMatches("CREATE", "STATISTICS", MatchAny) &&
TailMatches("FROM"))
COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables);
Generated tables:
enum TCPatternID
{
...
M_CREATE_STATISTICS_ANY,
M_CREATE_STATISTICS_ANY_LPAREN,
M_CREATE_STATISTICS_ANY_PARENSTARPAREN,
HM_CREATE_STATISTICS_ANY,
...
};
enum TCPatternKind
{
Match,
MatchCS,
HeadMatch,
HeadMatchCS,
TailMatch,
TailMatchCS,
};
typedef struct
{
enum TCPatternID id;
enum TCPatternKind kind;
int nwords;
const char *const *words;
} TCPattern;
#define TCPAT(id, kind, ...) \
{ (id), (kind), VA_ARGS_NARGS(__VA_ARGS__), \
(const char * const []) { __VA_ARGS__ } }
static const TCPattern tcpatterns[] =
{
...
TCPAT(M_CREATE_STATISTICS_ANY,
Match, "CREATE", "STATISTICS", MatchAny),
TCPAT(M_CREATE_STATISTICS_ANY_LPAREN,
Match, "CREATE", "STATISTICS", MatchAny, "("),
TCPAT(M_CREATE_STATISTICS_ANY_PARENSTARPAREN,
Match, "CREATE", "STATISTICS", MatchAny, "(*)"),
TCPAT(HM_CREATE_STATISTICS_ANY,
HeadMatch, "CREATE", "STATISTICS", MatchAny),
...
};
Converted code:
/* CREATE STATISTICS <name> */
case M_CREATE_STATISTICS_ANY:
COMPLETE_WITH("(", "ON");
break;
case M_CREATE_STATISTICS_ANY_LPAREN:
COMPLETE_WITH("ndistinct", "dependencies", "mcv");
break;
case M_CREATE_STATISTICS_ANY_PARENSTARPAREN:
COMPLETE_WITH("ON");
break;
case HM_CREATE_STATISTICS_ANY:
if (TailMatches("FROM"))
COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables);
break;
The idea (I've not actually written this part yet) is that an
outer loop would iterate through the table entries and invoke
the appropriate switch case for any successful match. As soon
as the switch code sets "matches" non-NULL (it might not, as in
the last case in the example), we can exit.
While this clearly can be made to work, it doesn't seem like the
result will be very maintainable. You have to invent a single-use
enum label, and the actual definition of the primary match pattern
is far away from the code using it, and we have two fundamentally
different ways of writing the same pattern test (since we'll still
need some instances of direct calls to TailMatches and friends,
as in the last example case).
What I'm thinking about doing instead of pursuing this exact
implementation idea is that we should create a preprocessor
to deal with building the table. I'm envisioning that the
new source code will look like
/* CREATE STATISTICS <name> */
case Matches("CREATE", "STATISTICS", MatchAny):
COMPLETE_WITH("(", "ON");
break;
case Matches("CREATE", "STATISTICS", MatchAny, "("):
COMPLETE_WITH("ndistinct", "dependencies", "mcv");
break;
case Matches("CREATE", "STATISTICS", MatchAny, "(*)"):
COMPLETE_WITH("ON");
break;
case HeadMatches("CREATE", "STATISTICS", MatchAny):
if (TailMatches("FROM"))
COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables);
break;
Of course this isn't legal C, since the case labels are not
constant expressions. The preprocessing script would look
for case labels that look like this, generate the appropriate
table entries, and replace the case labels with mechanically-
generated ID codes that don't need to be particularly human
readable. On the downside, YA preprocessor and mini-language
isn't much fun; but at least this is a lot closer to real C
than some of the other things we've invented.
(Further down the line, we can look into improvements such as
avoiding duplicate string comparisons; but that can be done behind
the scenes, without messing with this source-code notation.)
Does anyone particularly hate this plan, or have a better idea?
BTW, we have quite a few instances of code like the aforementioned
else if (HeadMatches("CREATE", "STATISTICS", MatchAny) &&
TailMatches("FROM"))
I'm thinking we should invent a Matches() option "MatchAnyN",
which could appear at most once and would represent an automatic
match to zero or more words appearing between the head part and
the tail part. Then this could be transformed to
else if (Matches("CREATE", "STATISTICS", MatchAny, MatchAnyN, "FROM"))
the advantage being that more of the pattern can be embedded in the
table and we need fewer bits of ad-hoc logic. Maybe this'd be worth
doing even if we don't go forward with the switch conversion.
regards, tom lane
I wrote:
Per discussion elsewhere [1], I've been looking at $SUBJECT.
Here's a finished patchset to perform this change. I don't
believe it has any material effect on runtime behavior or
performance, but it does achieve what Andres anticipated
would happen: the compile time for tab-complete.c drops
considerably. With gcc 8.5.0 I see the time drop from
about 3 seconds to about 0.7. The win is less noticeable
with clang version 15.0 on a Mac M1: from about 0.7s to 0.45s.
Of course the main point is the hope that it will work at all
with MSVC, but I'm not in a position to test that.
The most painful part of this is all the changes like
COMPLETE_WITH("ADD", "DROP", "OWNER TO", "RENAME", "SET",
"VALIDATE CONSTRAINT");
+ break;
/* ALTER DOMAIN <sth> DROP */
- else if (Matches("ALTER", "DOMAIN", MatchAny, "DROP"))
+ case Matches("ALTER", "DOMAIN", MatchAny, "DROP"):
COMPLETE_WITH("CONSTRAINT", "DEFAULT", "NOT NULL");
+ break;
/* ALTER DOMAIN <sth> DROP|RENAME|VALIDATE CONSTRAINT */
I despaired of doing that accurately by hand, so I wrote
a Perl script to do it. The script can cope with all but
about three of the existing tests; those have to be manually
modified before running the script, and then the actual
"switch()" has to be inserted afterwards. There won't be
any need to commit the script, since it's a one-time tool,
but I've included it for documentation's sake, along with
the "pre-convert.diff" and "post-convert.diff" manual patch
steps that are required to construct the complete 0004 patch.
I'm going to add this to the September CF, but I'd really
rather get it reviewed and committed pretty quickly.
Even with the helper script, I'm not eager to have to
rebase this a bunch of times.
regards, tom lane
Attachments:
v1-0001-Invent-MatchAnyN-option-for-tab-complete.c-s-Matc.patchtext/x-diff; charset=us-ascii; name*0=v1-0001-Invent-MatchAnyN-option-for-tab-complete.c-s-Matc.p; name*1=atchDownload+258-143
v1-0002-Split-out-most-of-psql_completion-into-a-separate.patchtext/x-diff; charset=us-ascii; name*0=v1-0002-Split-out-most-of-psql_completion-into-a-separate.p; name*1=atchDownload+97-67
v1-0003-Install-preprocessor-infrastructure-for-tab-compl.patchtext/x-diff; charset=us-ascii; name*0=v1-0003-Install-preprocessor-infrastructure-for-tab-compl.p; name*1=atchDownload+202-4
v1-0004-Convert-tab-complete-s-long-else-if-chain-to-a-sw.patchtext/x-diff; charset=us-ascii; name*0=v1-0004-Convert-tab-complete-s-long-else-if-chain-to-a-sw.p; name*1=atchDownload+1726-1125
v1-0005-Run-pgindent-on-tab-complete.in.c.patchtext/x-diff; charset=us-ascii; name=v1-0005-Run-pgindent-on-tab-complete.in.c.patchDownload+1942-1839
convert_tabcomplete.pltext/plain; charset=us-ascii; name=convert_tabcomplete.plDownload
pre-convert.diff.txttext/x-diff; charset=us-ascii; name=pre-convert.diff.txtDownload+49-11
post-convert.diff.txttext/x-diff; charset=us-ascii; name=post-convert.diff.txtDownload+75-12
Hi,
On 2024-07-13 13:16:14 -0400, Tom Lane wrote:
I wrote:
Per discussion elsewhere [1], I've been looking at $SUBJECT.
Here's a finished patchset to perform this change.
Awesome!
With gcc 8.5.0 I see the time drop from about 3 seconds to about 0.7. The
win is less noticeable with clang version 15.0 on a Mac M1: from about 0.7s
to 0.45s.
Nice!
With gcc 15 I see:
before after
-O0 1.2s 0.6s
-O2 10.5s 2.6s
-O3 10.8s 2.8s
Quite a nice win.
Of course the main point is the hope that it will work at all with MSVC, but
I'm not in a position to test that.
It does work with just your patches applied - very nice.
The only obvious thing that doesn't is that ctrl-c without a running query
terminates psql - interrupting a running query works without terminating psql,
which is a bit weird. But that seems independent of this series.
Subject: [PATCH v1 3/5] Install preprocessor infrastructure for
tab-complete.c.
Ah - I was wondering how the above would actually work when I read your intro :)
+tab-complete.c: gen_tabcomplete.pl tab-complete.in.c + $(PERL) $^ --outfile $@ +
When I first built this with make, I got this error:
make: *** No rule to make target '/home/andres/src/postgresql/src/bin/psql/tab-complete.c', needed by 'tab-complete.o'. Stop.
but that's just a a consequence of the make dependency handling... Removing
the .deps directory fixed it.
+# The input is a C file that contains some case labels that are not +# constants, but look like function calls, for example: +# case Matches("A", "B"): +# The function name can be any one of Matches, HeadMatches, TailMatches, +# MatchesCS, HeadMatchesCS, or TailMatchesCS. The argument(s) must be +# string literals or macros that expand to string literals or NULL.
Hm, the fact that we are continuing to use the same macros in the switch makes
it a bit painful to edit the .in.c in an editor with compiler-warnings
integration - I see a lot of reported errors ("Expression is not an integer
constant expression") due to case statements not being something that the
normal C switch can handle.
Perhaps we could use a distinct macro for use in the switch, which generates
valid (but nonsensical) code, so we avoid warnings?
+# These lines are replaced by "case N:" where N is a unique number +# for each such case label. (For convenience, we use the line number +# of the case label, although other values would work just as well.)
Hm, using the line number seems to make it a bit harder to compare the output
of the script after making changes to the input. Not the end of the world, but
...
+tabcomplete = custom_target('tabcomplete', + input: 'tab-complete.in.c', + output: 'tab-complete.c', + command: [ + perl, files('gen_tabcomplete.pl'), files('tab-complete.in.c'), + '--outfile', '@OUTPUT@', + '@INPUT@', + ], + install: true, + install_dir: dir_include_server / 'utils', +)
I don't think we want to install tab-complete.c?
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2024-07-13 13:16:14 -0400, Tom Lane wrote:
Of course the main point is the hope that it will work at all with MSVC, but
I'm not in a position to test that.
It does work with just your patches applied - very nice.
Thanks for testing!
The only obvious thing that doesn't is that ctrl-c without a running query
terminates psql - interrupting a running query works without terminating psql,
which is a bit weird. But that seems independent of this series.
Yeah, whatever's going on there is surely orthogonal to this.
+# The input is a C file that contains some case labels that are not +# constants, but look like function calls, for example: +# case Matches("A", "B"):
Hm, the fact that we are continuing to use the same macros in the switch makes
it a bit painful to edit the .in.c in an editor with compiler-warnings
integration - I see a lot of reported errors ("Expression is not an integer
constant expression") due to case statements not being something that the
normal C switch can handle.
Ugh, good point.
Perhaps we could use a distinct macro for use in the switch, which generates
valid (but nonsensical) code, so we avoid warnings?
Dunno. I'd explicitly wanted to not have different notations for
patterns implemented in switch labels and those used in ad-hoc tests.
Thinking a bit further outside the box ... maybe the original source
code could be like it is now (or more likely, like it is after 0002,
with the switch-to-be stuff all in a separate function), and we
could make the preprocessor perform the else-to-switch conversion
every time instead of viewing that as a one-time conversion?
That would make it a bit more fragile, but perhaps not impossibly so.
+# These lines are replaced by "case N:" where N is a unique number +# for each such case label. (For convenience, we use the line number +# of the case label, although other values would work just as well.)
Hm, using the line number seems to make it a bit harder to compare the output
of the script after making changes to the input.
OK. I'd thought this choice might be helpful for debugging, but I'm
not wedded to it. The obvious alternative is to use sequential
numbers for case labels, but wouldn't that also be a bit problematic
if "making changes" includes adding or removing rules?
I don't think we want to install tab-complete.c?
Good point --- I borrowed that rule from somewhere else, and
failed to study it closely enough.
regards, tom lane
I wrote:
Andres Freund <andres@anarazel.de> writes:
Hm, the fact that we are continuing to use the same macros in the switch makes
it a bit painful to edit the .in.c in an editor with compiler-warnings
integration - I see a lot of reported errors ("Expression is not an integer
constant expression") due to case statements not being something that the
normal C switch can handle.
Ugh, good point.
Thinking a bit further outside the box ... maybe the original source
code could be like it is now (or more likely, like it is after 0002,
with the switch-to-be stuff all in a separate function), and we
could make the preprocessor perform the else-to-switch conversion
every time instead of viewing that as a one-time conversion?
That would make it a bit more fragile, but perhaps not impossibly so.
I modified the preprocessor to work like that, and I like the results
better than what I had. This version of the patch is set up so that
both the preprocessor input and output files are legal, functional C:
without preprocessing it runs through the else-if chain, but after
preprocessing it uses a loop around the switch. Maybe that's
overkill, but to do anything less you have to make assumptions about
how smart the editor's compiler helper is. That doesn't sound like
an arms race that I want to engage in.
0001 attached is the same as the v1 version, but 0002 does a little
more than before, and then 0003 adds the preprocessor. (I fixed the
bogus install rule, too.)
regards, tom lane
Attachments:
v2-0001-Invent-MatchAnyN-option-for-tab-complete.c-s-Matc.patchtext/x-diff; charset=us-ascii; name*0=v2-0001-Invent-MatchAnyN-option-for-tab-complete.c-s-Matc.p; name*1=atchDownload+258-143
v2-0002-Prepare-tab-complete.c-for-preprocessing.patchtext/x-diff; charset=us-ascii; name=v2-0002-Prepare-tab-complete.c-for-preprocessing.patchDownload+268-77
v2-0003-Convert-tab-complete-s-long-else-if-chain-to-a-sw.patchtext/x-diff; charset=us-ascii; name*0=v2-0003-Convert-tab-complete-s-long-else-if-chain-to-a-sw.p; name*1=atchDownload+324-4
I wrote:
I modified the preprocessor to work like that, and I like the results
better than what I had.
I thought this version of the patch would be less subject to merge
conflicts than v1, but it didn't take long for somebody to break it.
Rebased v3 attached -- no nontrivial changes from v2.
I'd like to get this merged soon ...
regards, tom lane
Attachments:
v3-0001-Invent-MatchAnyN-option-for-tab-complete.c-s-Matc.patchtext/x-diff; charset=us-ascii; name*0=v3-0001-Invent-MatchAnyN-option-for-tab-complete.c-s-Matc.p; name*1=atchDownload+258-143
v3-0002-Prepare-tab-complete.c-for-preprocessing.patchtext/x-diff; charset=us-ascii; name=v3-0002-Prepare-tab-complete.c-for-preprocessing.patchDownload+269-77
v3-0003-Convert-tab-complete-s-long-else-if-chain-to-a-sw.patchtext/x-diff; charset=us-ascii; name*0=v3-0003-Convert-tab-complete-s-long-else-if-chain-to-a-sw.p; name*1=atchDownload+323-4
On Sat, Jul 27, 2024 at 8:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
I modified the preprocessor to work like that, and I like the results
better than what I had.I thought this version of the patch would be less subject to merge
conflicts than v1, but it didn't take long for somebody to break it.
Rebased v3 attached -- no nontrivial changes from v2.I'd like to get this merged soon ...
+1
This works nicely for me and I couldn't find anything that doesn't
work as expected, the mechanism seems reasonable, etc. I like how
it's moving slowly towards a grammar DSL.
Should we try to make sure that tab-complete.in.c remains compilable
directly with a build mode that does that with 'cp' instead of the
perl script?
One very trivial thing, where you say:
+# where N is the replacement case label, "HeadMatch" is the original
+# function name shorn of "es", and the rest are the function arguments.
... maybe simpler language: 'with "es" removed'? (Thinking of
non-native speakers; I understood that perfectly but I'm anglophone in
a country overrun with sheep.)
Thomas Munro <thomas.munro@gmail.com> writes:
On Sat, Jul 27, 2024 at 8:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I thought this version of the patch would be less subject to merge
conflicts than v1, but it didn't take long for somebody to break it.
Rebased v3 attached -- no nontrivial changes from v2.
+1
Thanks for looking at it!
Should we try to make sure that tab-complete.in.c remains compilable
directly with a build mode that does that with 'cp' instead of the
perl script?
Hmm, I should think that if you break anything in tab-complete.in.c,
the breakage would propagate to tab-complete.c without difficulty.
Do you have an example of something that the preprocessor would mask?
One very trivial thing, where you say: +# where N is the replacement case label, "HeadMatch" is the original +# function name shorn of "es", and the rest are the function arguments. ... maybe simpler language: 'with "es" removed'? (Thinking of non-native speakers; I understood that perfectly but I'm anglophone in a country overrun with sheep.)
Sure, that's changeable.
regards, tom lane
On Mon, Oct 7, 2024 at 12:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hmm, I should think that if you break anything in tab-complete.in.c,
the breakage would propagate to tab-complete.c without difficulty.
Do you have an example of something that the preprocessor would mask?
Fair point, and nope. Thought withdrawn.
Thomas Munro <thomas.munro@gmail.com> writes:
On Mon, Oct 7, 2024 at 12:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hmm, I should think that if you break anything in tab-complete.in.c,
the breakage would propagate to tab-complete.c without difficulty.
Do you have an example of something that the preprocessor would mask?
Fair point, and nope. Thought withdrawn.
Pushed, then.
The original motivation for all this was to try to get tab completion
working in MSVC builds. I'm not planning to do anything more towards
that myself, but maybe somebody else would like to have a go at it.
regards, tom lane
Hi,
bd1276a3c9 seems to have introduced a segfault when trying to complete
a word that doesn't have any match. For example, 'postgres=# z\t' will
yield the following backtrace:
#0: psql`pg_strcasecmp(s1="", s2="ACCESS METHOD") at pgstrcasecmp.c:40:39
#1: psql`psql_completion(text=":pgss-", start=0, end=6) at
tab-complete.in.c:2033:8
The matching code in tab-complete.in.c:
if (matches == NULL)
{
const pgsql_thing_t *wac;
for (wac = words_after_create; wac->name != NULL; wac++)
{
if (pg_strcasecmp(prev_wd, wac->name) == 0)
{
psql tries to read the previous word when matches are NULL. However,
there's no previous word set here, leading to the segfault.
I've attached a patch that checks if the previous word does exist
before trying to use it, along with an additional test for this
codepath.
Attachments:
v1-0001-Fix-psql-segfault-on-word-completion-without-matc.patchapplication/octet-stream; name=v1-0001-Fix-psql-segfault-on-word-completion-without-matc.patchDownload+6-2
On 10 Oct 2024, at 09:22, Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> wrote:
psql tries to read the previous word when matches are NULL. However,
there's no previous word set here, leading to the segfault.I've attached a patch that checks if the previous word does exist
before trying to use it, along with an additional test for this
codepath.
Nice, I was just typing up an email with the same fix when your landed in my
inbox after having ended up with a segfaulting psql testing a patch. (I didn't
include a test though so yours is better.) From reading the code I agree that
we must check previous_words_count here so +1 on this patch.
--
Daniel Gustafsson
Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> writes:
bd1276a3c9 seems to have introduced a segfault when trying to complete
a word that doesn't have any match. For example, 'postgres=# z\t' will
yield the following backtrace:
Interesting --- it seems to depend on which readline version you're
using, because I see a crash with current MacPorts' libreadline
but not with RHEL8's. Thanks for the report and patch!
regards, tom lane
Hi Tom,
Meson's coverage generation is unhappy with the new hardcoded #line
references emitted by gen_tabcomplete.pl:
genhtml: ERROR: cannot read
/home/jacob/src/postgres/worktree-oauth/build-dev/tab-complete.in.c
I've attached a potential fix which references $ARGV[0] instead, but
it's not really tested beyond "works for me".
While I was root-causing that, I also applied the following hunk to
align Meson's custom_target with the Makefile side:
@@ -27,8 +27,8 @@ tabcomplete = custom_target('tabcomplete',
input: 'tab-complete.in.c',
output: 'tab-complete.c',
command: [
- perl, files('gen_tabcomplete.pl'), files('tab-complete.in.c'),
- '--outfile', '@OUTPUT@', '@INPUT@',
+ perl, files('gen_tabcomplete.pl'), '@INPUT@',
+ '--outfile', '@OUTPUT@',
],
)
--Jacob
Attachments:
cov.diffapplication/octet-stream; name=cov.diffDownload+6-6
Jacob Champion <jacob.champion@enterprisedb.com> writes:
Meson's coverage generation is unhappy with the new hardcoded #line
references emitted by gen_tabcomplete.pl:
genhtml: ERROR: cannot read
/home/jacob/src/postgres/worktree-oauth/build-dev/tab-complete.in.c
I've attached a potential fix which references $ARGV[0] instead, but
it's not really tested beyond "works for me".
I tried this with meson and with a VPATH configure build, and it
seemed to work all right for both. (The coverage report in a
VPATH configure build is kind of wonky, but that was true before,
and I'm not sure that anyone wants to put effort into making it
cleaner.) Pushed.
regards, tom lane