Portal->commandTag as an enum

Started by Mark Dilgerabout 6 years ago36 messageshackers
Jump to latest
#1Mark Dilger
mark.dilger@enterprisedb.com

Hackers,

I have implemented $subject, attached.

While reviewing the "New SQL counter statistics view (pg_stat_sql)” thread [1]/messages/by-id/CAJrrPGeY4xujjoR=z=KoyRMHEK_pSjjp=7VBhOAHq9rfgpV7QQ@mail.gmail.com, I came across Andres’ comment

That's not really something in this patch, but a lot of this would be
better if we didn't internally have command tags as strings, but as an
enum. We should really only convert to a string with needed. That
we're doing string comparisons on Portal->commandTag is just plain bad.

If so, we could also make this whole patch a lot cheaper - have a fixed
size array that has an entry for every possible tag (possibly even in
shared memory, and then use atomics there).

I put the CommandTag enum in src/common because there wasn’t any reason not to do so. It seems plausible that frontend test frameworks might want access to this enum. I don’t have any frontend code using it yet, nor any concrete plans for that. I’m indifferent about this, and will move it into src/backend if folks think that’s better.

In commands/event_trigger.c, I changed the separation between EVENT_TRIGGER_COMMAND_TAG_NOT_SUPPORTED and EVENT_TRIGGER_COMMAND_TAG_NOT_RECOGNIZED. It used to claim not to recognize command tags that are indeed recognized elsewhere in the system, but simply not expected here. It now returns “not supported” for them, and only returns “not recognized” for special enum values COMMANDTAG_NULL and COMMANDTAG_UNKNOWN, as well as values outside the recognized range of the enum. I’m happy to change my implementation to preserve the old behavior if necessary. Is there a backward compatibility issue here? It does not impact regression test output for me to change this, but that’s not definitive….

I have extended the event_trigger.sql regression test, with new expected output, and when applying that change to master, the test fails due to the “not supported” vs. “not recognized” distinction. I have kept this regression test change in its own patch file, 0002. The differences when applied to master look like:

create event trigger regress_event_trigger_ALTER_SYSTEM on ddl_command_start
when tag in ('ALTER SYSTEM')
execute procedure test_event_trigger2();
-ERROR:  event triggers are not supported for ALTER SYSTEM
+ERROR:  filter value "ALTER SYSTEM" not recognized for filter variable "tag"

PreventCommandIfReadOnly and PreventCommandIfParallelMode sometimes take a commandTag, but in commands/sequence.c they take strings “nextval()” and “setval()”. Likewise, PreventCommandDuringRecovery takes "txid_current()” in adt/txid.c. I had to work around this a little, which was not hard to do, but it made me wonder if command tags and these sorts of functions shouldn’t be unified a bit more. They don’t really look consistent with all the other values in the CommandTag enum, so I left them out. I’m open to opinions about this.

There was some confusion in the code between a commandTag and a completionTag, with the commandTag getting overwritten with the completionTag over the course of execution. I’ve split that out into two distinctly separate concepts, which I think makes the code easier to grok. I’ve added a portal->completionTag field that is a fixed size buffer rather than a palloc’d string, to match how completionTag works elsewhere. But the old code that was overwriting the commandTag (a palloc’d string) with a completionTag (a char[] buffer) was using pstrdup for that purpose. I’m now using strlcpy. I don’t care much which way to go here (buffer vs. palloc’d string). Let me know if using a fixed sized buffer as I’ve done bothers anybody.

There were some instances of things like:

strcpy(completionTag, portal->commandTag);

which should have more properly been

strlcpy(completionTag, portal->commandTag, COMPLETION_TAG_BUFSIZE);

I don’t know if any of these were live bugs, but they seemed like traps for the future, should any new commandTag length overflow the buffer size. I think this patch fixes all of those cases.

Generating CommandTag enum values from user queries and then converting those back to string for printing or use in set_ps_display results in normalization of the commandTag, by which I mean that it becomes all uppercase. I don’t know of any situations where this would matter, but I can’t say for sure that it doesn’t. Anybody have thoughts on that?

[1]: /messages/by-id/CAJrrPGeY4xujjoR=z=KoyRMHEK_pSjjp=7VBhOAHq9rfgpV7QQ@mail.gmail.com

Attachments:

v1-0001-Migrating-commandTag-from-string-to-enum.patchapplication/octet-stream; name=v1-0001-Migrating-commandTag-from-string-to-enum.patch; x-unix-mode=0644Download+1298-438
v1-0002-Extending-the-event_trigger-regression-test.patchapplication/octet-stream; name=v1-0002-Extending-the-event_trigger-regression-test.patch; x-unix-mode=0644Download+1549-1
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Dilger (#1)
Re: Portal->commandTag as an enum

Mark Dilger <mark.dilger@enterprisedb.com> writes:

I put the CommandTag enum in src/common because there wasn’t any reason
not to do so. It seems plausible that frontend test frameworks might want
access to this enum.

Au contraire, that's an absolutely fundamental mistake. There is
zero chance of this enum holding still across PG versions, so if
we expose it to frontend code, we're going to have big problems
with cross-version compatibility. See our historical problems with
assuming the enum for character set encodings was the same between
frontend and backend ... and that set is orders of magnitude more
stable than this one.

As I recall, the hardest problem with de-string-ifying this is the fact
that for certain tags we include a rowcount in the string. I'd like to
see that undone --- we have to keep it like that on-the-wire to avoid a
protocol break, but it'd be best if noplace inside the backend did it that
way, and we converted at the last moment before sending a CommandComplete
to the client. Your references to "completion tags" make it sound like
you've only half done the conversion (though I've not read the patch
in enough detail to verify).

BTW, the size of the patch is rather depressing, especially if it's
only half done. Unlike Andres, I'm not even a little bit convinced
that this is worth the amount of code churn it'll cause. Have you
done any code-size or speed comparisons?

regards, tom lane

#3Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Tom Lane (#2)
Re: Portal->commandTag as an enum

On Feb 2, 2020, at 6:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Mark Dilger <mark.dilger@enterprisedb.com> writes:

I put the CommandTag enum in src/common because there wasn’t any reason
not to do so. It seems plausible that frontend test frameworks might want
access to this enum.

Thanks for looking!

Au contraire, that's an absolutely fundamental mistake. There is
zero chance of this enum holding still across PG versions, so if
we expose it to frontend code, we're going to have big problems
with cross-version compatibility. See our historical problems with
assuming the enum for character set encodings was the same between
frontend and backend ... and that set is orders of magnitude more
stable than this one.

I completely agree that this enum cannot be expected to remain stable across versions.

For the purposes of this patch, which has nothing to do with frontend tools, this issue doesn’t matter to me. I’m happy to move this into src/backend.

Is there no place to put code which would be useful for frontend tools without implying stability? Sure, psql and friends can’t use it, because they need to be able to connect to servers of other versions. But why couldn’t a test framework tool use something like this? Could we have someplace like src/common/volatile for this sort of thing?

As I recall, the hardest problem with de-string-ifying this is the fact
that for certain tags we include a rowcount in the string. I'd like to
see that undone --- we have to keep it like that on-the-wire to avoid a
protocol break, but it'd be best if noplace inside the backend did it that
way, and we converted at the last moment before sending a CommandComplete
to the client. Your references to "completion tags" make it sound like
you've only half done the conversion (though I've not read the patch
in enough detail to verify).

In v1, I stayed closer to the existing code structure than you are requesting. I like the direction you’re suggesting that I go, and I’ve begun that transition in anticipation of posting a v2 patch set soon.

BTW, the size of the patch is rather depressing, especially if it's
only half done. Unlike Andres, I'm not even a little bit convinced
that this is worth the amount of code churn it'll cause. Have you
done any code-size or speed comparisons?

A fair amount of the code churn is replacing strings with their enum equivalent, creating the enum itself, and creating a data table mapping enums to strings. The churn doesn’t look too bad to me when viewing the original vs new code diff side-by-side.

The second file (v1-0002…) is entirely an extension of the regression tests. Applying v1-0001… doesn’t entail needing to apply v1-0002… as the code being tested exists before and after the patch. If you don’t want to apply that regression test change, that’s fine. It just provides more extensive coverage of event_triggers over different command tags.

There will be a bit more churn in v2, since I’m changing the code flow a bit more to avoid generating the strings until they are about to get sent to the client, per your comments above. That has the advantage that multiple places in the old code where the completionTag was parsed to get the nprocessed count back out now doesn’t need any parsing.

I’ll include stats about code-size and speed when I post v2.

Thanks again for reviewing my patch idea!


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Mark Dilger (#3)
Re: Portal->commandTag as an enum

On Feb 3, 2020, at 9:41 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:

In v1, I stayed closer to the existing code structure than you are requesting. I like the direction you’re suggesting that I go, and I’ve begun that transition in anticipation of posting a v2 patch set soon.

Ok, here is v2, attached.

In master, a number of functions pass a char *completionTag argument (really a char completionTag[COMPLETION_TAG_BUFSIZE]) which gets filled in with the string to return to the client from EndCommand. I have removed that kind of logic:

- /* save the rowcount if we're given a completionTag to fill */
- if (completionTag)
- snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
- "SELECT " UINT64_FORMAT,
- queryDesc->estate->es_processed);

In the patch, this is replaced with a new struct, QueryCompletionData. That bit of code above is replaced with:

+               /* save the rowcount if we're given a qc to fill */
+               if (qc)
+                       SetQC(qc, COMMANDTAG_SELECT, queryDesc->estate->es_processed, DISPLAYFORMAT_NPROCESSED);

For wire protocol compatibility, we have to track the display format. When this gets to EndCommand, the display format allows the string to be written exactly as the client will expect. If we ever get to the point where we can break with that compatibility, the third member of this struct, display_format, can be removed.

Where string parsing is being done in master to get the count back out, it changes to look like this:

-                                       if (strncmp(completionTag, "SELECT ", 7) == 0)
-                                               _SPI_current->processed =
-                                                       pg_strtouint64(completionTag + 7, NULL, 10);
+                                       if (qcdata.commandTag == COMMANDTAG_SELECT)
+                                               _SPI_current->processed = qcdata.nprocessed;

One of the advantages to the patch is that the commandTag for a portal is not overwritten by the commandTag in the QueryCompletionData, meaning for example that if an EXECUTE command returns the string “UPDATE 0”, the portal->commandTag remains COMMANDTAG_EXECUTE while the qcdata.commandTag becomes COMMANDTAG_UPDATE. This could be helpful to code trying to track how many operations of a given type have run.

In event_trigger.c, in master there are ad-hoc comparisons against c-strings:

- /*
- * Handle some idiosyncratic special cases.
- */
- if (pg_strcasecmp(tag, "CREATE TABLE AS") == 0 ||
- pg_strcasecmp(tag, "SELECT INTO") == 0 ||
- pg_strcasecmp(tag, "REFRESH MATERIALIZED VIEW") == 0 ||
- pg_strcasecmp(tag, "ALTER DEFAULT PRIVILEGES") == 0 ||
- pg_strcasecmp(tag, "ALTER LARGE OBJECT") == 0 ||
- pg_strcasecmp(tag, "COMMENT") == 0 ||
- pg_strcasecmp(tag, "GRANT") == 0 ||
- pg_strcasecmp(tag, "REVOKE") == 0 ||
- pg_strcasecmp(tag, "DROP OWNED") == 0 ||
- pg_strcasecmp(tag, "IMPORT FOREIGN SCHEMA") == 0 ||
- pg_strcasecmp(tag, "SECURITY LABEL") == 0)

These are replaced by switch() case statements over the possible commandTags:

+       switch (commandTag)
+       {
+                       /*
+                        * Supported idiosyncratic special cases.
+                        */
+               case COMMANDTAG_ALTER_DEFAULT_PRIVILEGES:
+               case COMMANDTAG_ALTER_LARGE_OBJECT:
+               case COMMANDTAG_COMMENT:
+               case COMMANDTAG_CREATE_TABLE_AS:
+               case COMMANDTAG_DROP_OWNED:
+               case COMMANDTAG_GRANT:
+               case COMMANDTAG_IMPORT_FOREIGN_SCHEMA:
+               case COMMANDTAG_REFRESH_MATERIALIZED_VIEW:
+               case COMMANDTAG_REVOKE:
+               case COMMANDTAG_SECURITY_LABEL:
+               case COMMANDTAG_SELECT_INTO:

I think this is easier to read, verify, and maintain. The compiler can help if you leave a command tag out of the list, which the compiler cannot help discover in master as it is currently written. But I also think all those pg_strcasecmp calls are likely more expensive at runtime.

In master, EventTriggerCacheItem tracks a sorted array of palloc’d cstrings. In the patch, that becomes a Bitmapset over the enum:

typedef struct
 {
        Oid                     fnoid;                  /* function to be called */
        char            enabled;                /* as SESSION_REPLICATION_ROLE_* */
-       int                     ntags;                  /* number of command tags */
-       char      **tag;                        /* command tags in SORTED order */
+       Bitmapset  *tagset;                     /* command tags, or NULL if empty */
 } EventTriggerCacheItem;

The code in evtcache.c is shorter and, in my opinion, easier to read. In filter_event_trigger, rather than running bsearch through a sorted array of strings, it just runs bms_is_member.

I’ve kept this change to the event trigger code in its own separate patch file, to make the change easier to review in isolation.

I’ll include stats about code-size and speed when I post v2.

The benchmarks are from tpc-b_96.sql. I think I’ll need to adjust the benchmark to put more emphasis on the particular code that I’m changing, but I have run this standard benchmark for this email:

For master (1fd687a035):

postgresql % find src -type f -name "*.c" -or -name "*.h" | xargs cat | wc
1482117 5690660 45256959

postgresql % find src -type f -name "*.o" | xargs cat | wc
38283 476264 18999164

Averages for test set 1 by scale:
set scale tps avg_latency 90%< max_latency
1 1 3741 1.734 3.162 133.718
1 9 6124 0.904 1.05 230.547
1 81 5921 0.931 1.015 67.023

Averages for test set 1 by clients:
set clients tps avg_latency 90%< max_latency
1 1 2163 0.461 0.514 24.414
1 4 5968 0.675 0.791 40.354
1 16 7655 2.433 3.922 366.519

For command tag patch (branched from 1fd687a035):

postgresql % find src -type f -name "*.c" -or -name "*.h" | xargs cat | wc
1482969 5691908 45281399

postgresql % find src -type f -name "*.o" | xargs cat | wc
38209 476243 18999752

Averages for test set 1 by scale:
set scale tps avg_latency 90%< max_latency
1 1 3877 1.645 3.066 24.973
1 9 6383 0.859 1.032 64.566
1 81 5945 0.925 1.023 162.9

Averages for test set 1 by clients:
set clients tps avg_latency 90%< max_latency
1 1 2141 0.466 0.522 11.531
1 4 5967 0.673 0.783 136.882
1 16 8096 2.292 3.817 104.026

Attachments:

v2-0001-Migrating-commandTag-from-string-to-enum.patchapplication/octet-stream; name=v2-0001-Migrating-commandTag-from-string-to-enum.patch; x-unix-mode=0644Download+1439-586
v2-0002-Using-a-Bitmapset-of-tags-rather-than-a-string-ar.patchapplication/octet-stream; name=v2-0002-Using-a-Bitmapset-of-tags-rather-than-a-string-ar.patch; x-unix-mode=0644Download+21-26
v2-0003-Extending-the-event_trigger-regression-test.patchapplication/octet-stream; name=v2-0003-Extending-the-event_trigger-regression-test.patch; x-unix-mode=0644Download+1549-1
#5Andres Freund
andres@anarazel.de
In reply to: Mark Dilger (#4)
Re: Portal->commandTag as an enum

Hi,

On 2020-02-04 18:18:52 -0800, Mark Dilger wrote:

In master, a number of functions pass a char *completionTag argument (really a char completionTag[COMPLETION_TAG_BUFSIZE]) which gets filled in with the string to return to the client from EndCommand. I have removed that kind of logic:

- /* save the rowcount if we're given a completionTag to fill */
- if (completionTag)
- snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
- "SELECT " UINT64_FORMAT,
- queryDesc->estate->es_processed);

In the patch, this is replaced with a new struct, QueryCompletionData. That bit of code above is replaced with:

+               /* save the rowcount if we're given a qc to fill */
+               if (qc)
+                       SetQC(qc, COMMANDTAG_SELECT, queryDesc->estate->es_processed, DISPLAYFORMAT_NPROCESSED);

For wire protocol compatibility, we have to track the display format.
When this gets to EndCommand, the display format allows the string to
be written exactly as the client will expect. If we ever get to the
point where we can break with that compatibility, the third member of
this struct, display_format, can be removed.

Hm. While I like not having this as strings a lot, I wish we could get
rid of this displayformat stuff.

These are replaced by switch() case statements over the possible commandTags:

+       switch (commandTag)
+       {
+                       /*
+                        * Supported idiosyncratic special cases.
+                        */
+               case COMMANDTAG_ALTER_DEFAULT_PRIVILEGES:
+               case COMMANDTAG_ALTER_LARGE_OBJECT:
+               case COMMANDTAG_COMMENT:
+               case COMMANDTAG_CREATE_TABLE_AS:
+               case COMMANDTAG_DROP_OWNED:
+               case COMMANDTAG_GRANT:
+               case COMMANDTAG_IMPORT_FOREIGN_SCHEMA:
+               case COMMANDTAG_REFRESH_MATERIALIZED_VIEW:
+               case COMMANDTAG_REVOKE:
+               case COMMANDTAG_SECURITY_LABEL:
+               case COMMANDTAG_SELECT_INTO:

The number of these makes me wonder if we should just have a metadata
table in one place, instead of needing to edit multiple
locations. Something like

const ... CommandTagBehaviour[] = {
[COMMANDTAG_INSERT] = {
.display_processed = true, .display_oid = true, ...},
[COMMANDTAG_CREATE_TABLE_AS] = {
.event_trigger = true, ...},

with the zero initialized defaults being the common cases.

Not sure if it's worth going there. But it's maybe worth thinking about
for a minute?

Averages for test set 1 by scale:
set scale tps avg_latency 90%< max_latency
1 1 3741 1.734 3.162 133.718
1 9 6124 0.904 1.05 230.547
1 81 5921 0.931 1.015 67.023

Averages for test set 1 by clients:
set clients tps avg_latency 90%< max_latency
1 1 2163 0.461 0.514 24.414
1 4 5968 0.675 0.791 40.354
1 16 7655 2.433 3.922 366.519

For command tag patch (branched from 1fd687a035):

postgresql % find src -type f -name "*.c" -or -name "*.h" | xargs cat | wc
1482969 5691908 45281399

postgresql % find src -type f -name "*.o" | xargs cat | wc
38209 476243 18999752

Averages for test set 1 by scale:
set scale tps avg_latency 90%< max_latency
1 1 3877 1.645 3.066 24.973
1 9 6383 0.859 1.032 64.566
1 81 5945 0.925 1.023 162.9

Averages for test set 1 by clients:
set clients tps avg_latency 90%< max_latency
1 1 2141 0.466 0.522 11.531
1 4 5967 0.673 0.783 136.882
1 16 8096 2.292 3.817 104.026

Not bad.

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 9aa2b61600..5322c14ce4 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -594,7 +594,7 @@ pg_notify(PG_FUNCTION_ARGS)
payload = text_to_cstring(PG_GETARG_TEXT_PP(1));
/* For NOTIFY as a statement, this is checked in ProcessUtility */
-	PreventCommandDuringRecovery("NOTIFY");
+	PreventCommandDuringRecovery(COMMANDTAG_NOTIFY);

Async_Notify(channel, payload);

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 40a8ec1abd..4828e75bd5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1063,7 +1063,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
/* check read-only transaction and parallel mode */
if (XactReadOnly && !rel->rd_islocaltemp)
-			PreventCommandIfReadOnly("COPY FROM");
+			PreventCommandIfReadOnly(COMMANDTAG_COPY_FROM);

cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);

I'm not sure this really ought to be part of this change - seems like a
somewhat independent change to me. With less obvious benefits.

static event_trigger_command_tag_check_result
-check_ddl_tag(const char *tag)
+check_ddl_tag(CommandTag commandTag)
{
-	const char *obtypename;
-	const event_trigger_support_data *etsd;
+	switch (commandTag)
+	{
+			/*
+			 * Supported idiosyncratic special cases.
+			 */
+		case COMMANDTAG_ALTER_DEFAULT_PRIVILEGES:
+		case COMMANDTAG_ALTER_LARGE_OBJECT:
+		case COMMANDTAG_COMMENT:
+		case COMMANDTAG_CREATE_TABLE_AS:
+		case COMMANDTAG_DROP_OWNED:
+		case COMMANDTAG_GRANT:
+		case COMMANDTAG_IMPORT_FOREIGN_SCHEMA:
+		case COMMANDTAG_REFRESH_MATERIALIZED_VIEW:
+		case COMMANDTAG_REVOKE:
+		case COMMANDTAG_SECURITY_LABEL:
+		case COMMANDTAG_SELECT_INTO:
-	/*
-	 * Handle some idiosyncratic special cases.
-	 */
-	if (pg_strcasecmp(tag, "CREATE TABLE AS") == 0 ||
-		pg_strcasecmp(tag, "SELECT INTO") == 0 ||
-		pg_strcasecmp(tag, "REFRESH MATERIALIZED VIEW") == 0 ||
-		pg_strcasecmp(tag, "ALTER DEFAULT PRIVILEGES") == 0 ||
-		pg_strcasecmp(tag, "ALTER LARGE OBJECT") == 0 ||
-		pg_strcasecmp(tag, "COMMENT") == 0 ||
-		pg_strcasecmp(tag, "GRANT") == 0 ||
-		pg_strcasecmp(tag, "REVOKE") == 0 ||
-		pg_strcasecmp(tag, "DROP OWNED") == 0 ||
-		pg_strcasecmp(tag, "IMPORT FOREIGN SCHEMA") == 0 ||
-		pg_strcasecmp(tag, "SECURITY LABEL") == 0)
-		return EVENT_TRIGGER_COMMAND_TAG_OK;
+			/*
+			 * Supported CREATE commands
+			 */
+		case COMMANDTAG_CREATE_ACCESS_METHOD:
+		case COMMANDTAG_CREATE_AGGREGATE:
+		case COMMANDTAG_CREATE_CAST:
+		case COMMANDTAG_CREATE_COLLATION:
+		case COMMANDTAG_CREATE_CONSTRAINT:
+		case COMMANDTAG_CREATE_CONVERSION:
+		case COMMANDTAG_CREATE_DOMAIN:
+		case COMMANDTAG_CREATE_EXTENSION:
+		case COMMANDTAG_CREATE_FOREIGN_DATA_WRAPPER:
+		case COMMANDTAG_CREATE_FOREIGN_TABLE:
+		case COMMANDTAG_CREATE_FUNCTION:
+		case COMMANDTAG_CREATE_INDEX:
+		case COMMANDTAG_CREATE_LANGUAGE:
+		case COMMANDTAG_CREATE_MATERIALIZED_VIEW:
+		case COMMANDTAG_CREATE_OPERATOR:
+		case COMMANDTAG_CREATE_OPERATOR_CLASS:
+		case COMMANDTAG_CREATE_OPERATOR_FAMILY:
+		case COMMANDTAG_CREATE_POLICY:
+		case COMMANDTAG_CREATE_PROCEDURE:
+		case COMMANDTAG_CREATE_PUBLICATION:
+		case COMMANDTAG_CREATE_ROUTINE:
+		case COMMANDTAG_CREATE_RULE:
+		case COMMANDTAG_CREATE_SCHEMA:
+		case COMMANDTAG_CREATE_SEQUENCE:
+		case COMMANDTAG_CREATE_SERVER:
+		case COMMANDTAG_CREATE_STATISTICS:
+		case COMMANDTAG_CREATE_SUBSCRIPTION:
+		case COMMANDTAG_CREATE_TABLE:
+		case COMMANDTAG_CREATE_TEXT_SEARCH_CONFIGURATION:
+		case COMMANDTAG_CREATE_TEXT_SEARCH_DICTIONARY:
+		case COMMANDTAG_CREATE_TEXT_SEARCH_PARSER:
+		case COMMANDTAG_CREATE_TEXT_SEARCH_TEMPLATE:
+		case COMMANDTAG_CREATE_TRANSFORM:
+		case COMMANDTAG_CREATE_TRIGGER:
+		case COMMANDTAG_CREATE_TYPE:
+		case COMMANDTAG_CREATE_USER_MAPPING:
+		case COMMANDTAG_CREATE_VIEW:
-	/*
-	 * Otherwise, command should be CREATE, ALTER, or DROP.
-	 */
-	if (pg_strncasecmp(tag, "CREATE ", 7) == 0)
-		obtypename = tag + 7;
-	else if (pg_strncasecmp(tag, "ALTER ", 6) == 0)
-		obtypename = tag + 6;
-	else if (pg_strncasecmp(tag, "DROP ", 5) == 0)
-		obtypename = tag + 5;
-	else
-		return EVENT_TRIGGER_COMMAND_TAG_NOT_RECOGNIZED;
+			/*
+			 * Supported ALTER commands
+			 */
+		case COMMANDTAG_ALTER_ACCESS_METHOD:
+		case COMMANDTAG_ALTER_AGGREGATE:
+		case COMMANDTAG_ALTER_CAST:
+		case COMMANDTAG_ALTER_COLLATION:
+		case COMMANDTAG_ALTER_CONSTRAINT:
+		case COMMANDTAG_ALTER_CONVERSION:
+		case COMMANDTAG_ALTER_DOMAIN:
+		case COMMANDTAG_ALTER_EXTENSION:
+		case COMMANDTAG_ALTER_FOREIGN_DATA_WRAPPER:
+		case COMMANDTAG_ALTER_FOREIGN_TABLE:
+		case COMMANDTAG_ALTER_FUNCTION:
+		case COMMANDTAG_ALTER_INDEX:
+		case COMMANDTAG_ALTER_LANGUAGE:
+		case COMMANDTAG_ALTER_MATERIALIZED_VIEW:
+		case COMMANDTAG_ALTER_OPERATOR:
+		case COMMANDTAG_ALTER_OPERATOR_CLASS:
+		case COMMANDTAG_ALTER_OPERATOR_FAMILY:
+		case COMMANDTAG_ALTER_POLICY:
+		case COMMANDTAG_ALTER_PROCEDURE:
+		case COMMANDTAG_ALTER_PUBLICATION:
+		case COMMANDTAG_ALTER_ROUTINE:
+		case COMMANDTAG_ALTER_RULE:
+		case COMMANDTAG_ALTER_SCHEMA:
+		case COMMANDTAG_ALTER_SEQUENCE:
+		case COMMANDTAG_ALTER_SERVER:
+		case COMMANDTAG_ALTER_STATISTICS:
+		case COMMANDTAG_ALTER_SUBSCRIPTION:
+		case COMMANDTAG_ALTER_TABLE:
+		case COMMANDTAG_ALTER_TEXT_SEARCH_CONFIGURATION:
+		case COMMANDTAG_ALTER_TEXT_SEARCH_DICTIONARY:
+		case COMMANDTAG_ALTER_TEXT_SEARCH_PARSER:
+		case COMMANDTAG_ALTER_TEXT_SEARCH_TEMPLATE:
+		case COMMANDTAG_ALTER_TRANSFORM:
+		case COMMANDTAG_ALTER_TRIGGER:
+		case COMMANDTAG_ALTER_TYPE:
+		case COMMANDTAG_ALTER_USER_MAPPING:
+		case COMMANDTAG_ALTER_VIEW:
-	/*
-	 * ...and the object type should be something recognizable.
-	 */
-	for (etsd = event_trigger_support; etsd->obtypename != NULL; etsd++)
-		if (pg_strcasecmp(etsd->obtypename, obtypename) == 0)
+			/*
+			 * Supported DROP commands
+			 */
+		case COMMANDTAG_DROP_ACCESS_METHOD:
+		case COMMANDTAG_DROP_AGGREGATE:
+		case COMMANDTAG_DROP_CAST:
+		case COMMANDTAG_DROP_COLLATION:
+		case COMMANDTAG_DROP_CONSTRAINT:
+		case COMMANDTAG_DROP_CONVERSION:
+		case COMMANDTAG_DROP_DOMAIN:
+		case COMMANDTAG_DROP_EXTENSION:
+		case COMMANDTAG_DROP_FOREIGN_DATA_WRAPPER:
+		case COMMANDTAG_DROP_FOREIGN_TABLE:
+		case COMMANDTAG_DROP_FUNCTION:
+		case COMMANDTAG_DROP_INDEX:
+		case COMMANDTAG_DROP_LANGUAGE:
+		case COMMANDTAG_DROP_MATERIALIZED_VIEW:
+		case COMMANDTAG_DROP_OPERATOR:
+		case COMMANDTAG_DROP_OPERATOR_CLASS:
+		case COMMANDTAG_DROP_OPERATOR_FAMILY:
+		case COMMANDTAG_DROP_POLICY:
+		case COMMANDTAG_DROP_PROCEDURE:
+		case COMMANDTAG_DROP_PUBLICATION:
+		case COMMANDTAG_DROP_ROUTINE:
+		case COMMANDTAG_DROP_RULE:
+		case COMMANDTAG_DROP_SCHEMA:
+		case COMMANDTAG_DROP_SEQUENCE:
+		case COMMANDTAG_DROP_SERVER:
+		case COMMANDTAG_DROP_STATISTICS:
+		case COMMANDTAG_DROP_SUBSCRIPTION:
+		case COMMANDTAG_DROP_TABLE:
+		case COMMANDTAG_DROP_TEXT_SEARCH_CONFIGURATION:
+		case COMMANDTAG_DROP_TEXT_SEARCH_DICTIONARY:
+		case COMMANDTAG_DROP_TEXT_SEARCH_PARSER:
+		case COMMANDTAG_DROP_TEXT_SEARCH_TEMPLATE:
+		case COMMANDTAG_DROP_TRANSFORM:
+		case COMMANDTAG_DROP_TRIGGER:
+		case COMMANDTAG_DROP_TYPE:
+		case COMMANDTAG_DROP_USER_MAPPING:
+		case COMMANDTAG_DROP_VIEW:
+			return EVENT_TRIGGER_COMMAND_TAG_OK;
+
+			/*
+			 * Unsupported CREATE commands
+			 */
+		case COMMANDTAG_CREATE_DATABASE:
+		case COMMANDTAG_CREATE_EVENT_TRIGGER:
+		case COMMANDTAG_CREATE_ROLE:
+		case COMMANDTAG_CREATE_TABLESPACE:
+
+			/*
+			 * Unsupported ALTER commands
+			 */
+		case COMMANDTAG_ALTER_DATABASE:
+		case COMMANDTAG_ALTER_EVENT_TRIGGER:
+		case COMMANDTAG_ALTER_ROLE:
+		case COMMANDTAG_ALTER_TABLESPACE:
+
+			/*
+			 * Unsupported DROP commands
+			 */
+		case COMMANDTAG_DROP_DATABASE:
+		case COMMANDTAG_DROP_EVENT_TRIGGER:
+		case COMMANDTAG_DROP_ROLE:
+		case COMMANDTAG_DROP_TABLESPACE:
+
+			/*
+			 * Other unsupported commands.  These used to return
+			 * EVENT_TRIGGER_COMMAND_TAG_NOT_RECOGNIZED prior to the
+			 * conversion of commandTag from string to enum.
+			 */
+		case COMMANDTAG_ALTER_SYSTEM:
+		case COMMANDTAG_ANALYZE:
+		case COMMANDTAG_BEGIN:
+		case COMMANDTAG_CALL:
+		case COMMANDTAG_CHECKPOINT:
+		case COMMANDTAG_CLOSE:
+		case COMMANDTAG_CLOSE_CURSOR:
+		case COMMANDTAG_CLOSE_CURSOR_ALL:
+		case COMMANDTAG_CLUSTER:
+		case COMMANDTAG_COMMIT:
+		case COMMANDTAG_COMMIT_PREPARED:
+		case COMMANDTAG_COPY:
+		case COMMANDTAG_COPY_FROM:
+		case COMMANDTAG_DEALLOCATE:
+		case COMMANDTAG_DEALLOCATE_ALL:
+		case COMMANDTAG_DECLARE_CURSOR:
+		case COMMANDTAG_DELETE:
+		case COMMANDTAG_DISCARD:
+		case COMMANDTAG_DISCARD_ALL:
+		case COMMANDTAG_DISCARD_PLANS:
+		case COMMANDTAG_DISCARD_SEQUENCES:
+		case COMMANDTAG_DISCARD_TEMP:
+		case COMMANDTAG_DO:
+		case COMMANDTAG_DROP_REPLICATION_SLOT:
+		case COMMANDTAG_EXECUTE:
+		case COMMANDTAG_EXPLAIN:
+		case COMMANDTAG_FETCH:
+		case COMMANDTAG_GRANT_ROLE:
+		case COMMANDTAG_INSERT:
+		case COMMANDTAG_LISTEN:
+		case COMMANDTAG_LOAD:
+		case COMMANDTAG_LOCK_TABLE:
+		case COMMANDTAG_MOVE:
+		case COMMANDTAG_NOTIFY:
+		case COMMANDTAG_PREPARE:
+		case COMMANDTAG_PREPARE_TRANSACTION:
+		case COMMANDTAG_REASSIGN_OWNED:
+		case COMMANDTAG_REINDEX:
+		case COMMANDTAG_RELEASE:
+		case COMMANDTAG_RESET:
+		case COMMANDTAG_REVOKE_ROLE:
+		case COMMANDTAG_ROLLBACK:
+		case COMMANDTAG_ROLLBACK_PREPARED:
+		case COMMANDTAG_SAVEPOINT:
+		case COMMANDTAG_SELECT:
+		case COMMANDTAG_SELECT_FOR_KEY_SHARE:
+		case COMMANDTAG_SELECT_FOR_NO_KEY_UPDATE:
+		case COMMANDTAG_SELECT_FOR_SHARE:
+		case COMMANDTAG_SELECT_FOR_UPDATE:
+		case COMMANDTAG_SET:
+		case COMMANDTAG_SET_CONSTRAINTS:
+		case COMMANDTAG_SHOW:
+		case COMMANDTAG_START_TRANSACTION:
+		case COMMANDTAG_TRUNCATE_TABLE:
+		case COMMANDTAG_UNLISTEN:
+		case COMMANDTAG_UPDATE:
+		case COMMANDTAG_VACUUM:
+			return EVENT_TRIGGER_COMMAND_TAG_NOT_SUPPORTED;
+		case COMMANDTAG_UNKNOWN:
break;
-	if (etsd->obtypename == NULL)
-		return EVENT_TRIGGER_COMMAND_TAG_NOT_RECOGNIZED;
-	if (!etsd->supported)
-		return EVENT_TRIGGER_COMMAND_TAG_NOT_SUPPORTED;
-	return EVENT_TRIGGER_COMMAND_TAG_OK;
+	}
+	return EVENT_TRIGGER_COMMAND_TAG_NOT_RECOGNIZED;
}

This is pretty painful.

@@ -745,7 +902,7 @@ EventTriggerCommonSetup(Node *parsetree,
return NIL;

/* Get the command tag. */
-	tag = CreateCommandTag(parsetree);
+	tag = GetCommandTagName(CreateCommandTag(parsetree));
/*
* Filter list of event triggers by command tag, and copy them into our
@@ -2136,7 +2293,7 @@ pg_event_trigger_ddl_commands(PG_FUNCTION_ARGS)
/* objsubid */
values[i++] = Int32GetDatum(addr.objectSubId);
/* command tag */
-					values[i++] = CStringGetTextDatum(CreateCommandTag(cmd->parsetree));
+					values[i++] = CStringGetTextDatum(GetCommandTagName(CreateCommandTag(cmd->parsetree)));
/* object_type */
values[i++] = CStringGetTextDatum(type);
/* schema */
@@ -2161,7 +2318,7 @@ pg_event_trigger_ddl_commands(PG_FUNCTION_ARGS)
/* objsubid */
nulls[i++] = true;
/* command tag */
-				values[i++] = CStringGetTextDatum(CreateCommandTag(cmd->parsetree));
+				values[i++] = CStringGetTextDatum(GetCommandTagName(CreateCommandTag(cmd->parsetree)));
/* object_type */
values[i++] = CStringGetTextDatum(stringify_adefprivs_objtype(cmd->d.defprivs.objtype));
/* schema */

So GetCommandTagName we commonly do twice for some reason? Once in
EventTriggerCommonSetup() and then again in
pg_event_trigger_ddl_commands()? Why is EventTriggerData.tag still the
string?

Assert(list_length(plan->plancache_list) == 1);
@@ -1469,7 +1469,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/* translator: %s is a SQL statement name */
errmsg("%s is not allowed in a non-volatile function",
-								CreateCommandTag((Node *) pstmt))));
+								GetCommandTagName(CreateCommandTag((Node *) pstmt)))));

Probably worth having a wrapper for this - these lines are pretty long,
and there quite a number of cases like it in the patch.

@@ -172,11 +175,38 @@ EndCommand(const char *commandTag, CommandDest dest)
case DestRemoteSimple:

/*
-			 * We assume the commandTag is plain ASCII and therefore requires
-			 * no encoding conversion.
+			 * We assume the tagname is plain ASCII and therefore
+			 * requires no encoding conversion.
*/
-			pq_putmessage('C', commandTag, strlen(commandTag) + 1);
-			break;
+			tagname = GetCommandTagName(qc->commandTag);
+			switch (qc->display_format)
+			{
+				case DISPLAYFORMAT_PLAIN:
+					pq_putmessage('C', tagname, strlen(tagname) + 1);
+					break;
+				case DISPLAYFORMAT_LAST_OID:
+					/*
+					 * We no longer display LastOid, but to preserve the wire protocol,
+					 * we write InvalidOid where the LastOid used to be written.  For
+					 * efficiency in the snprintf(), hard-code InvalidOid as zero.
+					 */
+					Assert(InvalidOid == 0);
+					snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
+								"%s 0 " UINT64_FORMAT,
+								tagname,
+								qc->nprocessed);
+					pq_putmessage('C', completionTag, strlen(completionTag) + 1);
+					break;
+				case DISPLAYFORMAT_NPROCESSED:
+					snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
+							"%s " UINT64_FORMAT,
+							tagname,
+							qc->nprocessed);
+					pq_putmessage('C', completionTag, strlen(completionTag) + 1);
+					break;
+				default:
+					elog(ERROR, "Invalid display_format in EndCommand");
+			}

Imo there should only be one pq_putmessage(). Also think this type of
default: is a bad idea, just prevents the compiler from warning if we
were to ever introduce a new variant of DISPLAYFORMAT_*, without
providing any meaningful additional security.

@@ -855,7 +889,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,

case T_DiscardStmt:
/* should we allow DISCARD PLANS? */
-			CheckRestrictedOperation("DISCARD");
+			CheckRestrictedOperation(COMMANDTAG_DISCARD);
DiscardCommand((DiscardStmt *) parsetree, isTopLevel);
break;

@@ -974,7 +1008,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objtype))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecuteGrantStmt(stmt);
}
@@ -987,7 +1021,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->removeType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecDropStmt(stmt, isTopLevel);
}
@@ -1000,7 +1034,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->renameType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecRenameStmt(stmt);
}
@@ -1013,7 +1047,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objectType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecAlterObjectDependsStmt(stmt, NULL);
}
@@ -1026,7 +1060,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objectType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecAlterObjectSchemaStmt(stmt, NULL);
}
@@ -1039,7 +1073,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objectType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecAlterOwnerStmt(stmt);
}
@@ -1052,7 +1086,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objtype))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
CommentObject(stmt);
break;
@@ -1065,7 +1099,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objtype))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);

Not this patch's fault or task. But I hate this type of code - needing
to touch a dozen places for new type of statement is just
insane. utility.c should long have been rewritten to just have one
metadata table for nearly all of this. Perhaps with a few callbacks for
special cases.

+static const char * tag_names[] = {
+	"???",
+	"ALTER ACCESS METHOD",
+	"ALTER AGGREGATE",
+	"ALTER CAST",

This seems problematic to maintain, because the order needs to match
between this and something defined in a header - and there's no
guarantee a misordering is immediately noticeable. We should either go
for my metadata table idea, or at least rewrite this, even if more
verbose, to something like

static const char * tag_names[] = {
[COMMAND_TAG_ALTER_ACCESS_METHOD] = "ALTER ACCESS METHOD",
...

I think the fact that this would show up in a grep for
COMMAND_TAG_ALTER_ACCESS_METHOD is good too.

+/*
+ * Search CommandTag by name
+ *
+ * Returns CommandTag, or COMMANDTAG_UNKNOWN if not recognized
+ */
+CommandTag
+GetCommandTagEnum(const char *commandname)
+{
+	const char **base, **last, **position;
+	int		 result;
+
+	OPTIONALLY_CHECK_COMMAND_TAGS();
+	if (commandname == NULL || *commandname == '\0')
+		return COMMANDTAG_UNKNOWN;
+
+	base = tag_names;
+	last = tag_names + tag_name_length - 1;
+	while (last >= base)
+	{
+		position = base + ((last - base) >> 1);
+		result = pg_strcasecmp(commandname, *position);
+		if (result == 0)
+			return (CommandTag) (position - tag_names);
+		else if (result < 0)
+			last = position - 1;
+		else
+			base = position + 1;
+	}
+	return COMMANDTAG_UNKNOWN;
+}

This seems pretty grotty - but you get rid of it later. See my comments there.

+#ifdef COMMANDTAG_CHECKING
+bool
+CheckCommandTagEnum()
+{
+	CommandTag	i, j;
+
+	if (FIRST_COMMAND_TAG < 0 || LAST_COMMAND_TAG < 0 || LAST_COMMAND_TAG < FIRST_COMMAND_TAG)
+	{
+		elog(ERROR, "FIRST_COMMAND_TAG (%u), LAST_COMMAND_TAG (%u) not reasonable",
+			 (unsigned int) FIRST_COMMAND_TAG, (unsigned int) LAST_COMMAND_TAG);
+		return false;
+	}
+	if (FIRST_COMMAND_TAG != (CommandTag)0)
+	{
+		elog(ERROR, "FIRST_COMMAND_TAG (%u) != 0", (unsigned int) FIRST_COMMAND_TAG);
+		return false;
+	}
+	if (LAST_COMMAND_TAG != (CommandTag)(tag_name_length - 1))
+	{
+		elog(ERROR, "LAST_COMMAND_TAG (%u) != tag_name_length (%u)",
+			 (unsigned int) LAST_COMMAND_TAG, (unsigned int) tag_name_length);
+		return false;
+	}

These all seem to want to be static asserts.

+	for (i = FIRST_COMMAND_TAG; i < LAST_COMMAND_TAG; i++)
+	{
+		for (j = i+1; j < LAST_COMMAND_TAG; j++)
+		{
+			int cmp = strcmp(tag_names[i], tag_names[j]);
+			if (cmp == 0)
+			{
+				elog(ERROR, "Found duplicate tag_name: \"%s\"",
+					tag_names[i]);
+				return false;
+			}
+			if (cmp > 0)
+			{
+				elog(ERROR, "Found commandnames out of order: \"%s\" before \"%s\"",
+					tag_names[i], tag_names[j]);
+				return false;
+			}
+		}
+	}
+	return true;
+}

And I think we could get rid of this with my earlier suggestions?

+/*
+ * BEWARE: These are in sorted order, but ordered by their printed
+ *         values in the tag_name list (see common/commandtag.c).
+ *         In particular it matters because the sort ordering changes
+ *         when you replace a space with an underscore.  To wit:
+ *
+ *             "CREATE TABLE"
+ *             "CREATE TABLE AS"
+ *             "CREATE TABLESPACE"
+ *
+ *         but...
+ *
+ *             CREATE_TABLE
+ *             CREATE_TABLESPACE
+ *             CREATE_TABLE_AS
+ *
+ *         It also matters that COMMANDTAG_UNKNOWN is written "???".
+ *
+ * If you add a value here, add it in common/commandtag.c also, and
+ * be careful to get the ordering right.  You can build with
+ * COMMANDTAG_CHECKING to have this automatically checked
+ * at runtime, but that adds considerable overhead, so do so sparingly.
+ */
+typedef enum CommandTag
+{

This seems pretty darn nightmarish.

+#define FIRST_COMMAND_TAG COMMANDTAG_UNKNOWN
+	COMMANDTAG_UNKNOWN,
+	COMMANDTAG_ALTER_ACCESS_METHOD,
+	COMMANDTAG_ALTER_AGGREGATE,
+	COMMANDTAG_ALTER_CAST,
+	COMMANDTAG_ALTER_COLLATION,
+	COMMANDTAG_ALTER_CONSTRAINT,
+	COMMANDTAG_ALTER_CONVERSION,
+	COMMANDTAG_ALTER_DATABASE,
+	COMMANDTAG_ALTER_DEFAULT_PRIVILEGES,
+	COMMANDTAG_ALTER_DOMAIN,
[...]

I'm a bit worried that this basically duplicates a good portion of NodeTag, without having otherwise much of a point?

From a70b0cadc1142e92b2354a0ca3cd47aaeb0c148e Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 4 Feb 2020 12:25:05 -0800
Subject: [PATCH v2 2/3] Using a Bitmapset of tags rather than a string array.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

EventTriggerCacheItem no longer holds an array of palloc’d tag strings
in sorted order, but rather just a Bitmapset over the CommandTags. This
makes the code a little simpler and easier to read, in my opinion. In
filter_event_trigger, rather than running bsearch through a sorted array
of strings, it just runs bms_is_member.
---

It seems weird to add the bsearch just to remove it immediately again a
patch later. This probably should just go first?

diff --git a/src/test/regress/sql/event_trigger.sql b/src/test/regress/sql/event_trigger.sql
index 346168673d..cad02212ad 100644
--- a/src/test/regress/sql/event_trigger.sql
+++ b/src/test/regress/sql/event_trigger.sql
@@ -10,6 +10,13 @@ BEGIN
END
$$ language plpgsql;
+-- OK
+create function test_event_trigger2() returns event_trigger as $$
+BEGIN
+	RAISE NOTICE 'test_event_trigger2: % %', tg_event, tg_tag;
+END
+$$ LANGUAGE plpgsql;
+
-- should fail, event triggers cannot have declared arguments
create function test_event_trigger_arg(name text)
returns event_trigger as $$ BEGIN RETURN 1; END $$ language plpgsql;
@@ -82,6 +89,783 @@ create event trigger regress_event_trigger2 on ddl_command_start
-- OK
comment on event trigger regress_event_trigger is 'test comment';
+-- These are all unsupported
+create event trigger regress_event_triger_NULL on ddl_command_start
+   when tag in ('')
+   execute procedure test_event_trigger2();
+
+create event trigger regress_event_triger_UNKNOWN on ddl_command_start
+   when tag in ('???')
+   execute procedure test_event_trigger2();
+
+create event trigger regress_event_trigger_ALTER_DATABASE on ddl_command_start
+   when tag in ('ALTER DATABASE')
+   execute procedure test_event_trigger2();

[...]

There got to be a more maintainable way to write this.

Greetings,

Andres Freund

#6Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andres Freund (#5)
Re: Portal->commandTag as an enum

On Feb 4, 2020, at 7:34 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

Thanks for reviewing! I am pretty much in agreement with your comments, below.

On 2020-02-04 18:18:52 -0800, Mark Dilger wrote:

In master, a number of functions pass a char *completionTag argument (really a char completionTag[COMPLETION_TAG_BUFSIZE]) which gets filled in with the string to return to the client from EndCommand. I have removed that kind of logic:

- /* save the rowcount if we're given a completionTag to fill */
- if (completionTag)
- snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
- "SELECT " UINT64_FORMAT,
- queryDesc->estate->es_processed);

In the patch, this is replaced with a new struct, QueryCompletionData. That bit of code above is replaced with:

+               /* save the rowcount if we're given a qc to fill */
+               if (qc)
+                       SetQC(qc, COMMANDTAG_SELECT, queryDesc->estate->es_processed, DISPLAYFORMAT_NPROCESSED);

For wire protocol compatibility, we have to track the display format.
When this gets to EndCommand, the display format allows the string to
be written exactly as the client will expect. If we ever get to the
point where we can break with that compatibility, the third member of
this struct, display_format, can be removed.

Hm. While I like not having this as strings a lot, I wish we could get
rid of this displayformat stuff.

Agreed, but I don’t know how.

These are replaced by switch() case statements over the possible commandTags:

+       switch (commandTag)
+       {
+                       /*
+                        * Supported idiosyncratic special cases.
+                        */
+               case COMMANDTAG_ALTER_DEFAULT_PRIVILEGES:
+               case COMMANDTAG_ALTER_LARGE_OBJECT:
+               case COMMANDTAG_COMMENT:
+               case COMMANDTAG_CREATE_TABLE_AS:
+               case COMMANDTAG_DROP_OWNED:
+               case COMMANDTAG_GRANT:
+               case COMMANDTAG_IMPORT_FOREIGN_SCHEMA:
+               case COMMANDTAG_REFRESH_MATERIALIZED_VIEW:
+               case COMMANDTAG_REVOKE:
+               case COMMANDTAG_SECURITY_LABEL:
+               case COMMANDTAG_SELECT_INTO:

The number of these makes me wonder if we should just have a metadata
table in one place, instead of needing to edit multiple
locations. Something like

const ... CommandTagBehaviour[] = {
[COMMANDTAG_INSERT] = {
.display_processed = true, .display_oid = true, ...},
[COMMANDTAG_CREATE_TABLE_AS] = {
.event_trigger = true, ...},

with the zero initialized defaults being the common cases.

Not sure if it's worth going there. But it's maybe worth thinking about
for a minute?

Yes, I was thinking about something like this, only I had in mind a Bitmapset for these. It just so happens that there are 192 enum values, 0..191, which happens to fit in 3 64bit words plus a varlena header. Mind you, that nice property would be immediately blown if we added another entry to the enum. Has anybody made a compile-time static version of Bitmapset? We could store this information in either 24 bytes or 32 bytes, depending on whether we add another enum value.

Getting a little off topic, I was also thinking about having a counting Bitmapset that would store one bit per enum that is included, and then a sparse array of counts, perhaps with one byte counts for [0..127] and 8 byte counts for [128..huge] that we could use in shared memory for the pg_stat_tag work. Is there anything like that?

Anyway, I don’t think we should invent lots of different structures for CommandTag tracking, so something that serves double duty might keep the code tighter. I’m already using ordinary Bitmapset over CommandTags in event_trigger, so naturally that comes to mind for this, too.

Averages for test set 1 by scale:
set scale tps avg_latency 90%< max_latency
1 1 3741 1.734 3.162 133.718
1 9 6124 0.904 1.05 230.547
1 81 5921 0.931 1.015 67.023

Averages for test set 1 by clients:
set clients tps avg_latency 90%< max_latency
1 1 2163 0.461 0.514 24.414
1 4 5968 0.675 0.791 40.354
1 16 7655 2.433 3.922 366.519

For command tag patch (branched from 1fd687a035):

postgresql % find src -type f -name "*.c" -or -name "*.h" | xargs cat | wc
1482969 5691908 45281399

postgresql % find src -type f -name "*.o" | xargs cat | wc
38209 476243 18999752

Averages for test set 1 by scale:
set scale tps avg_latency 90%< max_latency
1 1 3877 1.645 3.066 24.973
1 9 6383 0.859 1.032 64.566
1 81 5945 0.925 1.023 162.9

Averages for test set 1 by clients:
set clients tps avg_latency 90%< max_latency
1 1 2141 0.466 0.522 11.531
1 4 5967 0.673 0.783 136.882
1 16 8096 2.292 3.817 104.026

Not bad.

I still need to get a benchmark more targeted at this codepath.

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 9aa2b61600..5322c14ce4 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -594,7 +594,7 @@ pg_notify(PG_FUNCTION_ARGS)
payload = text_to_cstring(PG_GETARG_TEXT_PP(1));
/* For NOTIFY as a statement, this is checked in ProcessUtility */
-	PreventCommandDuringRecovery("NOTIFY");
+	PreventCommandDuringRecovery(COMMANDTAG_NOTIFY);

Async_Notify(channel, payload);

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 40a8ec1abd..4828e75bd5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1063,7 +1063,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
/* check read-only transaction and parallel mode */
if (XactReadOnly && !rel->rd_islocaltemp)
-			PreventCommandIfReadOnly("COPY FROM");
+			PreventCommandIfReadOnly(COMMANDTAG_COPY_FROM);

cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);

I'm not sure this really ought to be part of this change - seems like a
somewhat independent change to me. With less obvious benefits.

I don’t think I care too much either way. I had some vague ideas about consolidating all of these strings in the backend into one place.

static event_trigger_command_tag_check_result
-check_ddl_tag(const char *tag)
+check_ddl_tag(CommandTag commandTag)
{
-	const char *obtypename;
-	const event_trigger_support_data *etsd;
+	switch (commandTag)
+	{
+			/*
+			 * Supported idiosyncratic special cases.
+			 */
+		case COMMANDTAG_ALTER_DEFAULT_PRIVILEGES:
+		case COMMANDTAG_ALTER_LARGE_OBJECT:
+		case COMMANDTAG_COMMENT:
+		case COMMANDTAG_CREATE_TABLE_AS:
+		case COMMANDTAG_DROP_OWNED:
+		case COMMANDTAG_GRANT:
+		case COMMANDTAG_IMPORT_FOREIGN_SCHEMA:
+		case COMMANDTAG_REFRESH_MATERIALIZED_VIEW:
+		case COMMANDTAG_REVOKE:
+		case COMMANDTAG_SECURITY_LABEL:
+		case COMMANDTAG_SELECT_INTO:
-	/*
-	 * Handle some idiosyncratic special cases.
-	 */
-	if (pg_strcasecmp(tag, "CREATE TABLE AS") == 0 ||
-		pg_strcasecmp(tag, "SELECT INTO") == 0 ||
-		pg_strcasecmp(tag, "REFRESH MATERIALIZED VIEW") == 0 ||
-		pg_strcasecmp(tag, "ALTER DEFAULT PRIVILEGES") == 0 ||
-		pg_strcasecmp(tag, "ALTER LARGE OBJECT") == 0 ||
-		pg_strcasecmp(tag, "COMMENT") == 0 ||
-		pg_strcasecmp(tag, "GRANT") == 0 ||
-		pg_strcasecmp(tag, "REVOKE") == 0 ||
-		pg_strcasecmp(tag, "DROP OWNED") == 0 ||
-		pg_strcasecmp(tag, "IMPORT FOREIGN SCHEMA") == 0 ||
-		pg_strcasecmp(tag, "SECURITY LABEL") == 0)
-		return EVENT_TRIGGER_COMMAND_TAG_OK;
+			/*
+			 * Supported CREATE commands
+			 */
+		case COMMANDTAG_CREATE_ACCESS_METHOD:
+		case COMMANDTAG_CREATE_AGGREGATE:
+		case COMMANDTAG_CREATE_CAST:
+		case COMMANDTAG_CREATE_COLLATION:
+		case COMMANDTAG_CREATE_CONSTRAINT:
+		case COMMANDTAG_CREATE_CONVERSION:
+		case COMMANDTAG_CREATE_DOMAIN:
+		case COMMANDTAG_CREATE_EXTENSION:
+		case COMMANDTAG_CREATE_FOREIGN_DATA_WRAPPER:
+		case COMMANDTAG_CREATE_FOREIGN_TABLE:
+		case COMMANDTAG_CREATE_FUNCTION:
+		case COMMANDTAG_CREATE_INDEX:
+		case COMMANDTAG_CREATE_LANGUAGE:
+		case COMMANDTAG_CREATE_MATERIALIZED_VIEW:
+		case COMMANDTAG_CREATE_OPERATOR:
+		case COMMANDTAG_CREATE_OPERATOR_CLASS:
+		case COMMANDTAG_CREATE_OPERATOR_FAMILY:
+		case COMMANDTAG_CREATE_POLICY:
+		case COMMANDTAG_CREATE_PROCEDURE:
+		case COMMANDTAG_CREATE_PUBLICATION:
+		case COMMANDTAG_CREATE_ROUTINE:
+		case COMMANDTAG_CREATE_RULE:
+		case COMMANDTAG_CREATE_SCHEMA:
+		case COMMANDTAG_CREATE_SEQUENCE:
+		case COMMANDTAG_CREATE_SERVER:
+		case COMMANDTAG_CREATE_STATISTICS:
+		case COMMANDTAG_CREATE_SUBSCRIPTION:
+		case COMMANDTAG_CREATE_TABLE:
+		case COMMANDTAG_CREATE_TEXT_SEARCH_CONFIGURATION:
+		case COMMANDTAG_CREATE_TEXT_SEARCH_DICTIONARY:
+		case COMMANDTAG_CREATE_TEXT_SEARCH_PARSER:
+		case COMMANDTAG_CREATE_TEXT_SEARCH_TEMPLATE:
+		case COMMANDTAG_CREATE_TRANSFORM:
+		case COMMANDTAG_CREATE_TRIGGER:
+		case COMMANDTAG_CREATE_TYPE:
+		case COMMANDTAG_CREATE_USER_MAPPING:
+		case COMMANDTAG_CREATE_VIEW:
-	/*
-	 * Otherwise, command should be CREATE, ALTER, or DROP.
-	 */
-	if (pg_strncasecmp(tag, "CREATE ", 7) == 0)
-		obtypename = tag + 7;
-	else if (pg_strncasecmp(tag, "ALTER ", 6) == 0)
-		obtypename = tag + 6;
-	else if (pg_strncasecmp(tag, "DROP ", 5) == 0)
-		obtypename = tag + 5;
-	else
-		return EVENT_TRIGGER_COMMAND_TAG_NOT_RECOGNIZED;
+			/*
+			 * Supported ALTER commands
+			 */
+		case COMMANDTAG_ALTER_ACCESS_METHOD:
+		case COMMANDTAG_ALTER_AGGREGATE:
+		case COMMANDTAG_ALTER_CAST:
+		case COMMANDTAG_ALTER_COLLATION:
+		case COMMANDTAG_ALTER_CONSTRAINT:
+		case COMMANDTAG_ALTER_CONVERSION:
+		case COMMANDTAG_ALTER_DOMAIN:
+		case COMMANDTAG_ALTER_EXTENSION:
+		case COMMANDTAG_ALTER_FOREIGN_DATA_WRAPPER:
+		case COMMANDTAG_ALTER_FOREIGN_TABLE:
+		case COMMANDTAG_ALTER_FUNCTION:
+		case COMMANDTAG_ALTER_INDEX:
+		case COMMANDTAG_ALTER_LANGUAGE:
+		case COMMANDTAG_ALTER_MATERIALIZED_VIEW:
+		case COMMANDTAG_ALTER_OPERATOR:
+		case COMMANDTAG_ALTER_OPERATOR_CLASS:
+		case COMMANDTAG_ALTER_OPERATOR_FAMILY:
+		case COMMANDTAG_ALTER_POLICY:
+		case COMMANDTAG_ALTER_PROCEDURE:
+		case COMMANDTAG_ALTER_PUBLICATION:
+		case COMMANDTAG_ALTER_ROUTINE:
+		case COMMANDTAG_ALTER_RULE:
+		case COMMANDTAG_ALTER_SCHEMA:
+		case COMMANDTAG_ALTER_SEQUENCE:
+		case COMMANDTAG_ALTER_SERVER:
+		case COMMANDTAG_ALTER_STATISTICS:
+		case COMMANDTAG_ALTER_SUBSCRIPTION:
+		case COMMANDTAG_ALTER_TABLE:
+		case COMMANDTAG_ALTER_TEXT_SEARCH_CONFIGURATION:
+		case COMMANDTAG_ALTER_TEXT_SEARCH_DICTIONARY:
+		case COMMANDTAG_ALTER_TEXT_SEARCH_PARSER:
+		case COMMANDTAG_ALTER_TEXT_SEARCH_TEMPLATE:
+		case COMMANDTAG_ALTER_TRANSFORM:
+		case COMMANDTAG_ALTER_TRIGGER:
+		case COMMANDTAG_ALTER_TYPE:
+		case COMMANDTAG_ALTER_USER_MAPPING:
+		case COMMANDTAG_ALTER_VIEW:
-	/*
-	 * ...and the object type should be something recognizable.
-	 */
-	for (etsd = event_trigger_support; etsd->obtypename != NULL; etsd++)
-		if (pg_strcasecmp(etsd->obtypename, obtypename) == 0)
+			/*
+			 * Supported DROP commands
+			 */
+		case COMMANDTAG_DROP_ACCESS_METHOD:
+		case COMMANDTAG_DROP_AGGREGATE:
+		case COMMANDTAG_DROP_CAST:
+		case COMMANDTAG_DROP_COLLATION:
+		case COMMANDTAG_DROP_CONSTRAINT:
+		case COMMANDTAG_DROP_CONVERSION:
+		case COMMANDTAG_DROP_DOMAIN:
+		case COMMANDTAG_DROP_EXTENSION:
+		case COMMANDTAG_DROP_FOREIGN_DATA_WRAPPER:
+		case COMMANDTAG_DROP_FOREIGN_TABLE:
+		case COMMANDTAG_DROP_FUNCTION:
+		case COMMANDTAG_DROP_INDEX:
+		case COMMANDTAG_DROP_LANGUAGE:
+		case COMMANDTAG_DROP_MATERIALIZED_VIEW:
+		case COMMANDTAG_DROP_OPERATOR:
+		case COMMANDTAG_DROP_OPERATOR_CLASS:
+		case COMMANDTAG_DROP_OPERATOR_FAMILY:
+		case COMMANDTAG_DROP_POLICY:
+		case COMMANDTAG_DROP_PROCEDURE:
+		case COMMANDTAG_DROP_PUBLICATION:
+		case COMMANDTAG_DROP_ROUTINE:
+		case COMMANDTAG_DROP_RULE:
+		case COMMANDTAG_DROP_SCHEMA:
+		case COMMANDTAG_DROP_SEQUENCE:
+		case COMMANDTAG_DROP_SERVER:
+		case COMMANDTAG_DROP_STATISTICS:
+		case COMMANDTAG_DROP_SUBSCRIPTION:
+		case COMMANDTAG_DROP_TABLE:
+		case COMMANDTAG_DROP_TEXT_SEARCH_CONFIGURATION:
+		case COMMANDTAG_DROP_TEXT_SEARCH_DICTIONARY:
+		case COMMANDTAG_DROP_TEXT_SEARCH_PARSER:
+		case COMMANDTAG_DROP_TEXT_SEARCH_TEMPLATE:
+		case COMMANDTAG_DROP_TRANSFORM:
+		case COMMANDTAG_DROP_TRIGGER:
+		case COMMANDTAG_DROP_TYPE:
+		case COMMANDTAG_DROP_USER_MAPPING:
+		case COMMANDTAG_DROP_VIEW:
+			return EVENT_TRIGGER_COMMAND_TAG_OK;
+
+			/*
+			 * Unsupported CREATE commands
+			 */
+		case COMMANDTAG_CREATE_DATABASE:
+		case COMMANDTAG_CREATE_EVENT_TRIGGER:
+		case COMMANDTAG_CREATE_ROLE:
+		case COMMANDTAG_CREATE_TABLESPACE:
+
+			/*
+			 * Unsupported ALTER commands
+			 */
+		case COMMANDTAG_ALTER_DATABASE:
+		case COMMANDTAG_ALTER_EVENT_TRIGGER:
+		case COMMANDTAG_ALTER_ROLE:
+		case COMMANDTAG_ALTER_TABLESPACE:
+
+			/*
+			 * Unsupported DROP commands
+			 */
+		case COMMANDTAG_DROP_DATABASE:
+		case COMMANDTAG_DROP_EVENT_TRIGGER:
+		case COMMANDTAG_DROP_ROLE:
+		case COMMANDTAG_DROP_TABLESPACE:
+
+			/*
+			 * Other unsupported commands.  These used to return
+			 * EVENT_TRIGGER_COMMAND_TAG_NOT_RECOGNIZED prior to the
+			 * conversion of commandTag from string to enum.
+			 */
+		case COMMANDTAG_ALTER_SYSTEM:
+		case COMMANDTAG_ANALYZE:
+		case COMMANDTAG_BEGIN:
+		case COMMANDTAG_CALL:
+		case COMMANDTAG_CHECKPOINT:
+		case COMMANDTAG_CLOSE:
+		case COMMANDTAG_CLOSE_CURSOR:
+		case COMMANDTAG_CLOSE_CURSOR_ALL:
+		case COMMANDTAG_CLUSTER:
+		case COMMANDTAG_COMMIT:
+		case COMMANDTAG_COMMIT_PREPARED:
+		case COMMANDTAG_COPY:
+		case COMMANDTAG_COPY_FROM:
+		case COMMANDTAG_DEALLOCATE:
+		case COMMANDTAG_DEALLOCATE_ALL:
+		case COMMANDTAG_DECLARE_CURSOR:
+		case COMMANDTAG_DELETE:
+		case COMMANDTAG_DISCARD:
+		case COMMANDTAG_DISCARD_ALL:
+		case COMMANDTAG_DISCARD_PLANS:
+		case COMMANDTAG_DISCARD_SEQUENCES:
+		case COMMANDTAG_DISCARD_TEMP:
+		case COMMANDTAG_DO:
+		case COMMANDTAG_DROP_REPLICATION_SLOT:
+		case COMMANDTAG_EXECUTE:
+		case COMMANDTAG_EXPLAIN:
+		case COMMANDTAG_FETCH:
+		case COMMANDTAG_GRANT_ROLE:
+		case COMMANDTAG_INSERT:
+		case COMMANDTAG_LISTEN:
+		case COMMANDTAG_LOAD:
+		case COMMANDTAG_LOCK_TABLE:
+		case COMMANDTAG_MOVE:
+		case COMMANDTAG_NOTIFY:
+		case COMMANDTAG_PREPARE:
+		case COMMANDTAG_PREPARE_TRANSACTION:
+		case COMMANDTAG_REASSIGN_OWNED:
+		case COMMANDTAG_REINDEX:
+		case COMMANDTAG_RELEASE:
+		case COMMANDTAG_RESET:
+		case COMMANDTAG_REVOKE_ROLE:
+		case COMMANDTAG_ROLLBACK:
+		case COMMANDTAG_ROLLBACK_PREPARED:
+		case COMMANDTAG_SAVEPOINT:
+		case COMMANDTAG_SELECT:
+		case COMMANDTAG_SELECT_FOR_KEY_SHARE:
+		case COMMANDTAG_SELECT_FOR_NO_KEY_UPDATE:
+		case COMMANDTAG_SELECT_FOR_SHARE:
+		case COMMANDTAG_SELECT_FOR_UPDATE:
+		case COMMANDTAG_SET:
+		case COMMANDTAG_SET_CONSTRAINTS:
+		case COMMANDTAG_SHOW:
+		case COMMANDTAG_START_TRANSACTION:
+		case COMMANDTAG_TRUNCATE_TABLE:
+		case COMMANDTAG_UNLISTEN:
+		case COMMANDTAG_UPDATE:
+		case COMMANDTAG_VACUUM:
+			return EVENT_TRIGGER_COMMAND_TAG_NOT_SUPPORTED;
+		case COMMANDTAG_UNKNOWN:
break;
-	if (etsd->obtypename == NULL)
-		return EVENT_TRIGGER_COMMAND_TAG_NOT_RECOGNIZED;
-	if (!etsd->supported)
-		return EVENT_TRIGGER_COMMAND_TAG_NOT_SUPPORTED;
-	return EVENT_TRIGGER_COMMAND_TAG_OK;
+	}
+	return EVENT_TRIGGER_COMMAND_TAG_NOT_RECOGNIZED;
}

This is pretty painful.

I think it is painful in a different way. The existing code on master is a mess of parsing logic that is harder to reason through, but fewer lines. There are other places in the backend that have long switch statements, so I didn’t feel I was breaking with project style to do this. I also made the switch longer than I had too, by including all enumerated values rather than just the ones that are supported. We could remove the extra cases, but I think that’s only a half measure. Something more like a consolidated table or bitmap seems better.

@@ -745,7 +902,7 @@ EventTriggerCommonSetup(Node *parsetree,
return NIL;

/* Get the command tag. */
-	tag = CreateCommandTag(parsetree);
+	tag = GetCommandTagName(CreateCommandTag(parsetree));
/*
* Filter list of event triggers by command tag, and copy them into our
@@ -2136,7 +2293,7 @@ pg_event_trigger_ddl_commands(PG_FUNCTION_ARGS)
/* objsubid */
values[i++] = Int32GetDatum(addr.objectSubId);
/* command tag */
-					values[i++] = CStringGetTextDatum(CreateCommandTag(cmd->parsetree));
+					values[i++] = CStringGetTextDatum(GetCommandTagName(CreateCommandTag(cmd->parsetree)));
/* object_type */
values[i++] = CStringGetTextDatum(type);
/* schema */
@@ -2161,7 +2318,7 @@ pg_event_trigger_ddl_commands(PG_FUNCTION_ARGS)
/* objsubid */
nulls[i++] = true;
/* command tag */
-				values[i++] = CStringGetTextDatum(CreateCommandTag(cmd->parsetree));
+				values[i++] = CStringGetTextDatum(GetCommandTagName(CreateCommandTag(cmd->parsetree)));
/* object_type */
values[i++] = CStringGetTextDatum(stringify_adefprivs_objtype(cmd->d.defprivs.objtype));
/* schema */

So GetCommandTagName we commonly do twice for some reason? Once in
EventTriggerCommonSetup() and then again in
pg_event_trigger_ddl_commands()? Why is EventTriggerData.tag still the
string?

It is not a string after applying v2-0003…. The main issue I see in the code you are quoting is that CreateCommandTag(cmd->parsetree) is called more than once, and that’s not the consequence of this patch. That’s pre-existing. I didn’t look into it, though I can if you think it is relevant to this patch set. The name of the function, CreateCommandTag, sounds like something I invented as part of this patch, but it pre-exists this patch. I only changed it’s return value from char * to CommandTag.

Assert(list_length(plan->plancache_list) == 1);
@@ -1469,7 +1469,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/* translator: %s is a SQL statement name */
errmsg("%s is not allowed in a non-volatile function",
-								CreateCommandTag((Node *) pstmt))));
+								GetCommandTagName(CreateCommandTag((Node *) pstmt)))));

Probably worth having a wrapper for this - these lines are pretty long,
and there quite a number of cases like it in the patch.

Actually, I looked at that. The number of them seemed right on the line between making a wrapper and not. I thought the counter-argument was that by making a wrapper that only got used in a few places, I was creating more lines of code and obfuscating what happens. I’m happy to do it your way, if consensus emerges around that.

@@ -172,11 +175,38 @@ EndCommand(const char *commandTag, CommandDest dest)
case DestRemoteSimple:

/*
-			 * We assume the commandTag is plain ASCII and therefore requires
-			 * no encoding conversion.
+			 * We assume the tagname is plain ASCII and therefore
+			 * requires no encoding conversion.
*/
-			pq_putmessage('C', commandTag, strlen(commandTag) + 1);
-			break;
+			tagname = GetCommandTagName(qc->commandTag);
+			switch (qc->display_format)
+			{
+				case DISPLAYFORMAT_PLAIN:
+					pq_putmessage('C', tagname, strlen(tagname) + 1);
+					break;
+				case DISPLAYFORMAT_LAST_OID:
+					/*
+					 * We no longer display LastOid, but to preserve the wire protocol,
+					 * we write InvalidOid where the LastOid used to be written.  For
+					 * efficiency in the snprintf(), hard-code InvalidOid as zero.
+					 */
+					Assert(InvalidOid == 0);
+					snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
+								"%s 0 " UINT64_FORMAT,
+								tagname,
+								qc->nprocessed);
+					pq_putmessage('C', completionTag, strlen(completionTag) + 1);
+					break;
+				case DISPLAYFORMAT_NPROCESSED:
+					snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
+							"%s " UINT64_FORMAT,
+							tagname,
+							qc->nprocessed);
+					pq_putmessage('C', completionTag, strlen(completionTag) + 1);
+					break;
+				default:
+					elog(ERROR, "Invalid display_format in EndCommand");
+			}

Imo there should only be one pq_putmessage(). Also think this type of
default: is a bad idea, just prevents the compiler from warning if we
were to ever introduce a new variant of DISPLAYFORMAT_*, without
providing any meaningful additional security.

Ok.

@@ -855,7 +889,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,

case T_DiscardStmt:
/* should we allow DISCARD PLANS? */
-			CheckRestrictedOperation("DISCARD");
+			CheckRestrictedOperation(COMMANDTAG_DISCARD);
DiscardCommand((DiscardStmt *) parsetree, isTopLevel);
break;

@@ -974,7 +1008,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objtype))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecuteGrantStmt(stmt);
}
@@ -987,7 +1021,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->removeType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecDropStmt(stmt, isTopLevel);
}
@@ -1000,7 +1034,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->renameType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecRenameStmt(stmt);
}
@@ -1013,7 +1047,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objectType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecAlterObjectDependsStmt(stmt, NULL);
}
@@ -1026,7 +1060,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objectType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecAlterObjectSchemaStmt(stmt, NULL);
}
@@ -1039,7 +1073,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objectType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecAlterOwnerStmt(stmt);
}
@@ -1052,7 +1086,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objtype))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
CommentObject(stmt);
break;
@@ -1065,7 +1099,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objtype))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);

Not this patch's fault or task. But I hate this type of code - needing
to touch a dozen places for new type of statement is just
insane. utility.c should long have been rewritten to just have one
metadata table for nearly all of this. Perhaps with a few callbacks for
special cases.

No objection from me, though I’d have to see the alternative and what it does to performance.

+static const char * tag_names[] = {
+	"???",
+	"ALTER ACCESS METHOD",
+	"ALTER AGGREGATE",
+	"ALTER CAST",

This seems problematic to maintain, because the order needs to match
between this and something defined in a header - and there's no
guarantee a misordering is immediately noticeable. We should either go
for my metadata table idea, or at least rewrite this, even if more
verbose, to something like

static const char * tag_names[] = {
[COMMAND_TAG_ALTER_ACCESS_METHOD] = "ALTER ACCESS METHOD",
...

I think the fact that this would show up in a grep for
COMMAND_TAG_ALTER_ACCESS_METHOD is good too.

I had something closer to what you’re asking for as part of the v1 patch and ripped it out to get the code size down. Avoiding code bloat was one of Tom's concerns. What you are suggesting is admittedly better than what I ripped out, though.

+/*
+ * Search CommandTag by name
+ *
+ * Returns CommandTag, or COMMANDTAG_UNKNOWN if not recognized
+ */
+CommandTag
+GetCommandTagEnum(const char *commandname)
+{
+	const char **base, **last, **position;
+	int		 result;
+
+	OPTIONALLY_CHECK_COMMAND_TAGS();
+	if (commandname == NULL || *commandname == '\0')
+		return COMMANDTAG_UNKNOWN;
+
+	base = tag_names;
+	last = tag_names + tag_name_length - 1;
+	while (last >= base)
+	{
+		position = base + ((last - base) >> 1);
+		result = pg_strcasecmp(commandname, *position);
+		if (result == 0)
+			return (CommandTag) (position - tag_names);
+		else if (result < 0)
+			last = position - 1;
+		else
+			base = position + 1;
+	}
+	return COMMANDTAG_UNKNOWN;
+}

This seems pretty grotty - but you get rid of it later. See my comments there.

+#ifdef COMMANDTAG_CHECKING
+bool
+CheckCommandTagEnum()
+{
+	CommandTag	i, j;
+
+	if (FIRST_COMMAND_TAG < 0 || LAST_COMMAND_TAG < 0 || LAST_COMMAND_TAG < FIRST_COMMAND_TAG)
+	{
+		elog(ERROR, "FIRST_COMMAND_TAG (%u), LAST_COMMAND_TAG (%u) not reasonable",
+			 (unsigned int) FIRST_COMMAND_TAG, (unsigned int) LAST_COMMAND_TAG);
+		return false;
+	}
+	if (FIRST_COMMAND_TAG != (CommandTag)0)
+	{
+		elog(ERROR, "FIRST_COMMAND_TAG (%u) != 0", (unsigned int) FIRST_COMMAND_TAG);
+		return false;
+	}
+	if (LAST_COMMAND_TAG != (CommandTag)(tag_name_length - 1))
+	{
+		elog(ERROR, "LAST_COMMAND_TAG (%u) != tag_name_length (%u)",
+			 (unsigned int) LAST_COMMAND_TAG, (unsigned int) tag_name_length);
+		return false;
+	}

These all seem to want to be static asserts.

+	for (i = FIRST_COMMAND_TAG; i < LAST_COMMAND_TAG; i++)
+	{
+		for (j = i+1; j < LAST_COMMAND_TAG; j++)
+		{
+			int cmp = strcmp(tag_names[i], tag_names[j]);
+			if (cmp == 0)
+			{
+				elog(ERROR, "Found duplicate tag_name: \"%s\"",
+					tag_names[i]);
+				return false;
+			}
+			if (cmp > 0)
+			{
+				elog(ERROR, "Found commandnames out of order: \"%s\" before \"%s\"",
+					tag_names[i], tag_names[j]);
+				return false;
+			}
+		}
+	}
+	return true;
+}

And I think we could get rid of this with my earlier suggestions?

+/*
+ * BEWARE: These are in sorted order, but ordered by their printed
+ *         values in the tag_name list (see common/commandtag.c).
+ *         In particular it matters because the sort ordering changes
+ *         when you replace a space with an underscore.  To wit:
+ *
+ *             "CREATE TABLE"
+ *             "CREATE TABLE AS"
+ *             "CREATE TABLESPACE"
+ *
+ *         but...
+ *
+ *             CREATE_TABLE
+ *             CREATE_TABLESPACE
+ *             CREATE_TABLE_AS
+ *
+ *         It also matters that COMMANDTAG_UNKNOWN is written "???".
+ *
+ * If you add a value here, add it in common/commandtag.c also, and
+ * be careful to get the ordering right.  You can build with
+ * COMMANDTAG_CHECKING to have this automatically checked
+ * at runtime, but that adds considerable overhead, so do so sparingly.
+ */
+typedef enum CommandTag
+{

This seems pretty darn nightmarish.

+#define FIRST_COMMAND_TAG COMMANDTAG_UNKNOWN
+	COMMANDTAG_UNKNOWN,
+	COMMANDTAG_ALTER_ACCESS_METHOD,
+	COMMANDTAG_ALTER_AGGREGATE,
+	COMMANDTAG_ALTER_CAST,
+	COMMANDTAG_ALTER_COLLATION,
+	COMMANDTAG_ALTER_CONSTRAINT,
+	COMMANDTAG_ALTER_CONVERSION,
+	COMMANDTAG_ALTER_DATABASE,
+	COMMANDTAG_ALTER_DEFAULT_PRIVILEGES,
+	COMMANDTAG_ALTER_DOMAIN,
[...]

I'm a bit worried that this basically duplicates a good portion of NodeTag, without having otherwise much of a point?

I never quite came up with a one-size-fits-all enumeration. There are lots of places where these enumerations seem to almost map onto each other, but with special cases that don’t line up. I’m open to suggestions.

From a70b0cadc1142e92b2354a0ca3cd47aaeb0c148e Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 4 Feb 2020 12:25:05 -0800
Subject: [PATCH v2 2/3] Using a Bitmapset of tags rather than a string array.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

EventTriggerCacheItem no longer holds an array of palloc’d tag strings
in sorted order, but rather just a Bitmapset over the CommandTags. This
makes the code a little simpler and easier to read, in my opinion. In
filter_event_trigger, rather than running bsearch through a sorted array
of strings, it just runs bms_is_member.
---

It seems weird to add the bsearch just to remove it immediately again a
patch later. This probably should just go first?

I’m not sure what you mean. That bsearch is pre-existing, not mine.

diff --git a/src/test/regress/sql/event_trigger.sql b/src/test/regress/sql/event_trigger.sql
index 346168673d..cad02212ad 100644
--- a/src/test/regress/sql/event_trigger.sql
+++ b/src/test/regress/sql/event_trigger.sql
@@ -10,6 +10,13 @@ BEGIN
END
$$ language plpgsql;
+-- OK
+create function test_event_trigger2() returns event_trigger as $$
+BEGIN
+	RAISE NOTICE 'test_event_trigger2: % %', tg_event, tg_tag;
+END
+$$ LANGUAGE plpgsql;
+
-- should fail, event triggers cannot have declared arguments
create function test_event_trigger_arg(name text)
returns event_trigger as $$ BEGIN RETURN 1; END $$ language plpgsql;
@@ -82,6 +89,783 @@ create event trigger regress_event_trigger2 on ddl_command_start
-- OK
comment on event trigger regress_event_trigger is 'test comment';
+-- These are all unsupported
+create event trigger regress_event_triger_NULL on ddl_command_start
+   when tag in ('')
+   execute procedure test_event_trigger2();
+
+create event trigger regress_event_triger_UNKNOWN on ddl_command_start
+   when tag in ('???')
+   execute procedure test_event_trigger2();
+
+create event trigger regress_event_trigger_ALTER_DATABASE on ddl_command_start
+   when tag in ('ALTER DATABASE')
+   execute procedure test_event_trigger2();

[...]

There got to be a more maintainable way to write this.

Yeah, I already conceded to Tom in his review that I’m not wedded to committing this test in any form, let alone in this form. That’s part of why I kept it as a separate patch file. But if you like what it is doing, and just don’t like the verbosity, I can try harder to compress it.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andres Freund (#5)
Re: Portal->commandTag as an enum

Andres,

The previous patch set seemed to cause confusion, having separated changes into multiple files. The latest patch, heavily influenced by your review, is all in one file, attached.

On Feb 4, 2020, at 7:34 PM, Andres Freund <andres@anarazel.de> wrote:

These are replaced by switch() case statements over the possible commandTags:

+       switch (commandTag)
+       {
+                       /*
+                        * Supported idiosyncratic special cases.
+                        */
+               case COMMANDTAG_ALTER_DEFAULT_PRIVILEGES:
+               case COMMANDTAG_ALTER_LARGE_OBJECT:
+               case COMMANDTAG_COMMENT:
+               case COMMANDTAG_CREATE_TABLE_AS:
+               case COMMANDTAG_DROP_OWNED:
+               case COMMANDTAG_GRANT:
+               case COMMANDTAG_IMPORT_FOREIGN_SCHEMA:
+               case COMMANDTAG_REFRESH_MATERIALIZED_VIEW:
+               case COMMANDTAG_REVOKE:
+               case COMMANDTAG_SECURITY_LABEL:
+               case COMMANDTAG_SELECT_INTO:

The number of these makes me wonder if we should just have a metadata
table in one place, instead of needing to edit multiple
locations. Something like

const ... CommandTagBehaviour[] = {
[COMMANDTAG_INSERT] = {
.display_processed = true, .display_oid = true, ...},
[COMMANDTAG_CREATE_TABLE_AS] = {
.event_trigger = true, ...},

with the zero initialized defaults being the common cases.

Not sure if it's worth going there. But it's maybe worth thinking about
for a minute?

The v3 patch does something like you suggest.

The only gotcha I came across while reorganizing the code this way is that exec_replication_command(…) outputs “SELECT” rather than “SELECT <ROWCOUNT>” as is done everywhere else. Strangely, exec_replication_command(…) does output the rowcount for “COPY <ROWCOUNT>”, which matches how COPY is handled elsewhere. I can’t see any logic in this. I’m concerned that outputting “SELECT 0” from exec_replication_command rather than “SELECT” as is currently done will break some client somewhere, though none that I can find.

Putting the display information into the CommandTag behavior table forces the behavior per tag to be the same everywhere, which forces this change on exec_replication_command.

To get around this, I’ve added an extremely bogus extra boolean argument to EndCommand, force_undecorated_output, that is false from all callers except exec_replication_command(…) in the one spot I described.

I don’t know whether the code should be committed this way, but I need something as a placeholder until I get a better understanding of why exec_replication_command(…) behaves as it does and what I should do about it in the patch.

Averages for test set 1 by scale:
set scale tps avg_latency 90%< max_latency
1 1 3741 1.734 3.162 133.718
1 9 6124 0.904 1.05 230.547
1 81 5921 0.931 1.015 67.023

Averages for test set 1 by clients:
set clients tps avg_latency 90%< max_latency
1 1 2163 0.461 0.514 24.414
1 4 5968 0.675 0.791 40.354
1 16 7655 2.433 3.922 366.519

For command tag patch (branched from 1fd687a035):

postgresql % find src -type f -name "*.c" -or -name "*.h" | xargs cat | wc
1482969 5691908 45281399

postgresql % find src -type f -name "*.o" | xargs cat | wc
38209 476243 18999752

Averages for test set 1 by scale:
set scale tps avg_latency 90%< max_latency
1 1 3877 1.645 3.066 24.973
1 9 6383 0.859 1.032 64.566
1 81 5945 0.925 1.023 162.9

Averages for test set 1 by clients:
set clients tps avg_latency 90%< max_latency
1 1 2141 0.466 0.522 11.531
1 4 5967 0.673 0.783 136.882
1 16 8096 2.292 3.817 104.026

Not bad.

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 9aa2b61600..5322c14ce4 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -594,7 +594,7 @@ pg_notify(PG_FUNCTION_ARGS)
payload = text_to_cstring(PG_GETARG_TEXT_PP(1));
/* For NOTIFY as a statement, this is checked in ProcessUtility */
-	PreventCommandDuringRecovery("NOTIFY");
+	PreventCommandDuringRecovery(COMMANDTAG_NOTIFY);

Async_Notify(channel, payload);

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 40a8ec1abd..4828e75bd5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1063,7 +1063,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
/* check read-only transaction and parallel mode */
if (XactReadOnly && !rel->rd_islocaltemp)
-			PreventCommandIfReadOnly("COPY FROM");
+			PreventCommandIfReadOnly(COMMANDTAG_COPY_FROM);

cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);

I'm not sure this really ought to be part of this change - seems like a
somewhat independent change to me. With less obvious benefits.

This is changed back in v3 to be more like how it was before.

static event_trigger_command_tag_check_result
-check_ddl_tag(const char *tag)
+check_ddl_tag(CommandTag commandTag)
{
-	const char *obtypename;
-	const event_trigger_support_data *etsd;
+	switch (commandTag)
+	{
+			/*
+			 * Supported idiosyncratic special cases.
+			 */
+		case COMMANDTAG_ALTER_DEFAULT_PRIVILEGES:
+		case COMMANDTAG_ALTER_LARGE_OBJECT:
+		case COMMANDTAG_COMMENT:
+		case COMMANDTAG_CREATE_TABLE_AS:
+		case COMMANDTAG_DROP_OWNED:
+		case COMMANDTAG_GRANT:
+		case COMMANDTAG_IMPORT_FOREIGN_SCHEMA:
+		case COMMANDTAG_REFRESH_MATERIALIZED_VIEW:
+		case COMMANDTAG_REVOKE:
+		case COMMANDTAG_SECURITY_LABEL:
+		case COMMANDTAG_SELECT_INTO:
-	/*
-	 * Handle some idiosyncratic special cases.
-	 */
-	if (pg_strcasecmp(tag, "CREATE TABLE AS") == 0 ||
-		pg_strcasecmp(tag, "SELECT INTO") == 0 ||
-		pg_strcasecmp(tag, "REFRESH MATERIALIZED VIEW") == 0 ||
-		pg_strcasecmp(tag, "ALTER DEFAULT PRIVILEGES") == 0 ||
-		pg_strcasecmp(tag, "ALTER LARGE OBJECT") == 0 ||
-		pg_strcasecmp(tag, "COMMENT") == 0 ||
-		pg_strcasecmp(tag, "GRANT") == 0 ||
-		pg_strcasecmp(tag, "REVOKE") == 0 ||
-		pg_strcasecmp(tag, "DROP OWNED") == 0 ||
-		pg_strcasecmp(tag, "IMPORT FOREIGN SCHEMA") == 0 ||
-		pg_strcasecmp(tag, "SECURITY LABEL") == 0)
-		return EVENT_TRIGGER_COMMAND_TAG_OK;
+			/*
+			 * Supported CREATE commands
+			 */
+		case COMMANDTAG_CREATE_ACCESS_METHOD:
+		case COMMANDTAG_CREATE_AGGREGATE:
+		case COMMANDTAG_CREATE_CAST:
+		case COMMANDTAG_CREATE_COLLATION:
+		case COMMANDTAG_CREATE_CONSTRAINT:
+		case COMMANDTAG_CREATE_CONVERSION:
+		case COMMANDTAG_CREATE_DOMAIN:
+		case COMMANDTAG_CREATE_EXTENSION:
+		case COMMANDTAG_CREATE_FOREIGN_DATA_WRAPPER:
+		case COMMANDTAG_CREATE_FOREIGN_TABLE:
+		case COMMANDTAG_CREATE_FUNCTION:
+		case COMMANDTAG_CREATE_INDEX:
+		case COMMANDTAG_CREATE_LANGUAGE:
+		case COMMANDTAG_CREATE_MATERIALIZED_VIEW:
+		case COMMANDTAG_CREATE_OPERATOR:
+		case COMMANDTAG_CREATE_OPERATOR_CLASS:
+		case COMMANDTAG_CREATE_OPERATOR_FAMILY:
+		case COMMANDTAG_CREATE_POLICY:
+		case COMMANDTAG_CREATE_PROCEDURE:
+		case COMMANDTAG_CREATE_PUBLICATION:
+		case COMMANDTAG_CREATE_ROUTINE:
+		case COMMANDTAG_CREATE_RULE:
+		case COMMANDTAG_CREATE_SCHEMA:
+		case COMMANDTAG_CREATE_SEQUENCE:
+		case COMMANDTAG_CREATE_SERVER:
+		case COMMANDTAG_CREATE_STATISTICS:
+		case COMMANDTAG_CREATE_SUBSCRIPTION:
+		case COMMANDTAG_CREATE_TABLE:
+		case COMMANDTAG_CREATE_TEXT_SEARCH_CONFIGURATION:
+		case COMMANDTAG_CREATE_TEXT_SEARCH_DICTIONARY:
+		case COMMANDTAG_CREATE_TEXT_SEARCH_PARSER:
+		case COMMANDTAG_CREATE_TEXT_SEARCH_TEMPLATE:
+		case COMMANDTAG_CREATE_TRANSFORM:
+		case COMMANDTAG_CREATE_TRIGGER:
+		case COMMANDTAG_CREATE_TYPE:
+		case COMMANDTAG_CREATE_USER_MAPPING:
+		case COMMANDTAG_CREATE_VIEW:
-	/*
-	 * Otherwise, command should be CREATE, ALTER, or DROP.
-	 */
-	if (pg_strncasecmp(tag, "CREATE ", 7) == 0)
-		obtypename = tag + 7;
-	else if (pg_strncasecmp(tag, "ALTER ", 6) == 0)
-		obtypename = tag + 6;
-	else if (pg_strncasecmp(tag, "DROP ", 5) == 0)
-		obtypename = tag + 5;
-	else
-		return EVENT_TRIGGER_COMMAND_TAG_NOT_RECOGNIZED;
+			/*
+			 * Supported ALTER commands
+			 */
+		case COMMANDTAG_ALTER_ACCESS_METHOD:
+		case COMMANDTAG_ALTER_AGGREGATE:
+		case COMMANDTAG_ALTER_CAST:
+		case COMMANDTAG_ALTER_COLLATION:
+		case COMMANDTAG_ALTER_CONSTRAINT:
+		case COMMANDTAG_ALTER_CONVERSION:
+		case COMMANDTAG_ALTER_DOMAIN:
+		case COMMANDTAG_ALTER_EXTENSION:
+		case COMMANDTAG_ALTER_FOREIGN_DATA_WRAPPER:
+		case COMMANDTAG_ALTER_FOREIGN_TABLE:
+		case COMMANDTAG_ALTER_FUNCTION:
+		case COMMANDTAG_ALTER_INDEX:
+		case COMMANDTAG_ALTER_LANGUAGE:
+		case COMMANDTAG_ALTER_MATERIALIZED_VIEW:
+		case COMMANDTAG_ALTER_OPERATOR:
+		case COMMANDTAG_ALTER_OPERATOR_CLASS:
+		case COMMANDTAG_ALTER_OPERATOR_FAMILY:
+		case COMMANDTAG_ALTER_POLICY:
+		case COMMANDTAG_ALTER_PROCEDURE:
+		case COMMANDTAG_ALTER_PUBLICATION:
+		case COMMANDTAG_ALTER_ROUTINE:
+		case COMMANDTAG_ALTER_RULE:
+		case COMMANDTAG_ALTER_SCHEMA:
+		case COMMANDTAG_ALTER_SEQUENCE:
+		case COMMANDTAG_ALTER_SERVER:
+		case COMMANDTAG_ALTER_STATISTICS:
+		case COMMANDTAG_ALTER_SUBSCRIPTION:
+		case COMMANDTAG_ALTER_TABLE:
+		case COMMANDTAG_ALTER_TEXT_SEARCH_CONFIGURATION:
+		case COMMANDTAG_ALTER_TEXT_SEARCH_DICTIONARY:
+		case COMMANDTAG_ALTER_TEXT_SEARCH_PARSER:
+		case COMMANDTAG_ALTER_TEXT_SEARCH_TEMPLATE:
+		case COMMANDTAG_ALTER_TRANSFORM:
+		case COMMANDTAG_ALTER_TRIGGER:
+		case COMMANDTAG_ALTER_TYPE:
+		case COMMANDTAG_ALTER_USER_MAPPING:
+		case COMMANDTAG_ALTER_VIEW:
-	/*
-	 * ...and the object type should be something recognizable.
-	 */
-	for (etsd = event_trigger_support; etsd->obtypename != NULL; etsd++)
-		if (pg_strcasecmp(etsd->obtypename, obtypename) == 0)
+			/*
+			 * Supported DROP commands
+			 */
+		case COMMANDTAG_DROP_ACCESS_METHOD:
+		case COMMANDTAG_DROP_AGGREGATE:
+		case COMMANDTAG_DROP_CAST:
+		case COMMANDTAG_DROP_COLLATION:
+		case COMMANDTAG_DROP_CONSTRAINT:
+		case COMMANDTAG_DROP_CONVERSION:
+		case COMMANDTAG_DROP_DOMAIN:
+		case COMMANDTAG_DROP_EXTENSION:
+		case COMMANDTAG_DROP_FOREIGN_DATA_WRAPPER:
+		case COMMANDTAG_DROP_FOREIGN_TABLE:
+		case COMMANDTAG_DROP_FUNCTION:
+		case COMMANDTAG_DROP_INDEX:
+		case COMMANDTAG_DROP_LANGUAGE:
+		case COMMANDTAG_DROP_MATERIALIZED_VIEW:
+		case COMMANDTAG_DROP_OPERATOR:
+		case COMMANDTAG_DROP_OPERATOR_CLASS:
+		case COMMANDTAG_DROP_OPERATOR_FAMILY:
+		case COMMANDTAG_DROP_POLICY:
+		case COMMANDTAG_DROP_PROCEDURE:
+		case COMMANDTAG_DROP_PUBLICATION:
+		case COMMANDTAG_DROP_ROUTINE:
+		case COMMANDTAG_DROP_RULE:
+		case COMMANDTAG_DROP_SCHEMA:
+		case COMMANDTAG_DROP_SEQUENCE:
+		case COMMANDTAG_DROP_SERVER:
+		case COMMANDTAG_DROP_STATISTICS:
+		case COMMANDTAG_DROP_SUBSCRIPTION:
+		case COMMANDTAG_DROP_TABLE:
+		case COMMANDTAG_DROP_TEXT_SEARCH_CONFIGURATION:
+		case COMMANDTAG_DROP_TEXT_SEARCH_DICTIONARY:
+		case COMMANDTAG_DROP_TEXT_SEARCH_PARSER:
+		case COMMANDTAG_DROP_TEXT_SEARCH_TEMPLATE:
+		case COMMANDTAG_DROP_TRANSFORM:
+		case COMMANDTAG_DROP_TRIGGER:
+		case COMMANDTAG_DROP_TYPE:
+		case COMMANDTAG_DROP_USER_MAPPING:
+		case COMMANDTAG_DROP_VIEW:
+			return EVENT_TRIGGER_COMMAND_TAG_OK;
+
+			/*
+			 * Unsupported CREATE commands
+			 */
+		case COMMANDTAG_CREATE_DATABASE:
+		case COMMANDTAG_CREATE_EVENT_TRIGGER:
+		case COMMANDTAG_CREATE_ROLE:
+		case COMMANDTAG_CREATE_TABLESPACE:
+
+			/*
+			 * Unsupported ALTER commands
+			 */
+		case COMMANDTAG_ALTER_DATABASE:
+		case COMMANDTAG_ALTER_EVENT_TRIGGER:
+		case COMMANDTAG_ALTER_ROLE:
+		case COMMANDTAG_ALTER_TABLESPACE:
+
+			/*
+			 * Unsupported DROP commands
+			 */
+		case COMMANDTAG_DROP_DATABASE:
+		case COMMANDTAG_DROP_EVENT_TRIGGER:
+		case COMMANDTAG_DROP_ROLE:
+		case COMMANDTAG_DROP_TABLESPACE:
+
+			/*
+			 * Other unsupported commands.  These used to return
+			 * EVENT_TRIGGER_COMMAND_TAG_NOT_RECOGNIZED prior to the
+			 * conversion of commandTag from string to enum.
+			 */
+		case COMMANDTAG_ALTER_SYSTEM:
+		case COMMANDTAG_ANALYZE:
+		case COMMANDTAG_BEGIN:
+		case COMMANDTAG_CALL:
+		case COMMANDTAG_CHECKPOINT:
+		case COMMANDTAG_CLOSE:
+		case COMMANDTAG_CLOSE_CURSOR:
+		case COMMANDTAG_CLOSE_CURSOR_ALL:
+		case COMMANDTAG_CLUSTER:
+		case COMMANDTAG_COMMIT:
+		case COMMANDTAG_COMMIT_PREPARED:
+		case COMMANDTAG_COPY:
+		case COMMANDTAG_COPY_FROM:
+		case COMMANDTAG_DEALLOCATE:
+		case COMMANDTAG_DEALLOCATE_ALL:
+		case COMMANDTAG_DECLARE_CURSOR:
+		case COMMANDTAG_DELETE:
+		case COMMANDTAG_DISCARD:
+		case COMMANDTAG_DISCARD_ALL:
+		case COMMANDTAG_DISCARD_PLANS:
+		case COMMANDTAG_DISCARD_SEQUENCES:
+		case COMMANDTAG_DISCARD_TEMP:
+		case COMMANDTAG_DO:
+		case COMMANDTAG_DROP_REPLICATION_SLOT:
+		case COMMANDTAG_EXECUTE:
+		case COMMANDTAG_EXPLAIN:
+		case COMMANDTAG_FETCH:
+		case COMMANDTAG_GRANT_ROLE:
+		case COMMANDTAG_INSERT:
+		case COMMANDTAG_LISTEN:
+		case COMMANDTAG_LOAD:
+		case COMMANDTAG_LOCK_TABLE:
+		case COMMANDTAG_MOVE:
+		case COMMANDTAG_NOTIFY:
+		case COMMANDTAG_PREPARE:
+		case COMMANDTAG_PREPARE_TRANSACTION:
+		case COMMANDTAG_REASSIGN_OWNED:
+		case COMMANDTAG_REINDEX:
+		case COMMANDTAG_RELEASE:
+		case COMMANDTAG_RESET:
+		case COMMANDTAG_REVOKE_ROLE:
+		case COMMANDTAG_ROLLBACK:
+		case COMMANDTAG_ROLLBACK_PREPARED:
+		case COMMANDTAG_SAVEPOINT:
+		case COMMANDTAG_SELECT:
+		case COMMANDTAG_SELECT_FOR_KEY_SHARE:
+		case COMMANDTAG_SELECT_FOR_NO_KEY_UPDATE:
+		case COMMANDTAG_SELECT_FOR_SHARE:
+		case COMMANDTAG_SELECT_FOR_UPDATE:
+		case COMMANDTAG_SET:
+		case COMMANDTAG_SET_CONSTRAINTS:
+		case COMMANDTAG_SHOW:
+		case COMMANDTAG_START_TRANSACTION:
+		case COMMANDTAG_TRUNCATE_TABLE:
+		case COMMANDTAG_UNLISTEN:
+		case COMMANDTAG_UPDATE:
+		case COMMANDTAG_VACUUM:
+			return EVENT_TRIGGER_COMMAND_TAG_NOT_SUPPORTED;
+		case COMMANDTAG_UNKNOWN:
break;
-	if (etsd->obtypename == NULL)
-		return EVENT_TRIGGER_COMMAND_TAG_NOT_RECOGNIZED;
-	if (!etsd->supported)
-		return EVENT_TRIGGER_COMMAND_TAG_NOT_SUPPORTED;
-	return EVENT_TRIGGER_COMMAND_TAG_OK;
+	}
+	return EVENT_TRIGGER_COMMAND_TAG_NOT_RECOGNIZED;
}

This is pretty painful.

Yeah, and it’s gone in v3. This sort of logic now lives in the behavior table in src/backend/utils/misc/commandtag.c.

@@ -745,7 +902,7 @@ EventTriggerCommonSetup(Node *parsetree,
return NIL;

/* Get the command tag. */
-	tag = CreateCommandTag(parsetree);
+	tag = GetCommandTagName(CreateCommandTag(parsetree));
/*
* Filter list of event triggers by command tag, and copy them into our
@@ -2136,7 +2293,7 @@ pg_event_trigger_ddl_commands(PG_FUNCTION_ARGS)
/* objsubid */
values[i++] = Int32GetDatum(addr.objectSubId);
/* command tag */
-					values[i++] = CStringGetTextDatum(CreateCommandTag(cmd->parsetree));
+					values[i++] = CStringGetTextDatum(GetCommandTagName(CreateCommandTag(cmd->parsetree)));
/* object_type */
values[i++] = CStringGetTextDatum(type);
/* schema */
@@ -2161,7 +2318,7 @@ pg_event_trigger_ddl_commands(PG_FUNCTION_ARGS)
/* objsubid */
nulls[i++] = true;
/* command tag */
-				values[i++] = CStringGetTextDatum(CreateCommandTag(cmd->parsetree));
+				values[i++] = CStringGetTextDatum(GetCommandTagName(CreateCommandTag(cmd->parsetree)));
/* object_type */
values[i++] = CStringGetTextDatum(stringify_adefprivs_objtype(cmd->d.defprivs.objtype));
/* schema */

So GetCommandTagName we commonly do twice for some reason? Once in
EventTriggerCommonSetup() and then again in
pg_event_trigger_ddl_commands()? Why is EventTriggerData.tag still the
string?

EventTriggerCommonSetup() gets the command tag enum, not the string, at least in v3.

Assert(list_length(plan->plancache_list) == 1);
@@ -1469,7 +1469,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/* translator: %s is a SQL statement name */
errmsg("%s is not allowed in a non-volatile function",
-								CreateCommandTag((Node *) pstmt))));
+								GetCommandTagName(CreateCommandTag((Node *) pstmt)))));

Probably worth having a wrapper for this - these lines are pretty long,
and there quite a number of cases like it in the patch.

I was having some trouble figuring out what to name the wrapper. “CreateCommandTagAndGetName" is nearly as long as the two function names it replaces. “CreateCommandTagName” sounds like you’re creating a name rather than a CommandTag, which is misleading. But then I realized this function was poorly named to begin with. “Create” is an entirely inappropriate verb for what this function does. Even before this patch, it wasn’t creating anything. I was looking up a constant string name. Now it is looking up an enum.

I went with CreateCommandName(…), but I think this leaves a lot to be desired. Thoughts?

@@ -172,11 +175,38 @@ EndCommand(const char *commandTag, CommandDest dest)
case DestRemoteSimple:

/*
-			 * We assume the commandTag is plain ASCII and therefore requires
-			 * no encoding conversion.
+			 * We assume the tagname is plain ASCII and therefore
+			 * requires no encoding conversion.
*/
-			pq_putmessage('C', commandTag, strlen(commandTag) + 1);
-			break;
+			tagname = GetCommandTagName(qc->commandTag);
+			switch (qc->display_format)
+			{
+				case DISPLAYFORMAT_PLAIN:
+					pq_putmessage('C', tagname, strlen(tagname) + 1);
+					break;
+				case DISPLAYFORMAT_LAST_OID:
+					/*
+					 * We no longer display LastOid, but to preserve the wire protocol,
+					 * we write InvalidOid where the LastOid used to be written.  For
+					 * efficiency in the snprintf(), hard-code InvalidOid as zero.
+					 */
+					Assert(InvalidOid == 0);
+					snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
+								"%s 0 " UINT64_FORMAT,
+								tagname,
+								qc->nprocessed);
+					pq_putmessage('C', completionTag, strlen(completionTag) + 1);
+					break;
+				case DISPLAYFORMAT_NPROCESSED:
+					snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
+							"%s " UINT64_FORMAT,
+							tagname,
+							qc->nprocessed);
+					pq_putmessage('C', completionTag, strlen(completionTag) + 1);
+					break;
+				default:
+					elog(ERROR, "Invalid display_format in EndCommand");
+			}

Imo there should only be one pq_putmessage(). Also think this type of
default: is a bad idea, just prevents the compiler from warning if we
were to ever introduce a new variant of DISPLAYFORMAT_*, without
providing any meaningful additional security.

This is fixed in v3.

@@ -855,7 +889,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,

case T_DiscardStmt:
/* should we allow DISCARD PLANS? */
-			CheckRestrictedOperation("DISCARD");
+			CheckRestrictedOperation(COMMANDTAG_DISCARD);
DiscardCommand((DiscardStmt *) parsetree, isTopLevel);
break;

@@ -974,7 +1008,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objtype))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecuteGrantStmt(stmt);
}
@@ -987,7 +1021,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->removeType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecDropStmt(stmt, isTopLevel);
}
@@ -1000,7 +1034,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->renameType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecRenameStmt(stmt);
}
@@ -1013,7 +1047,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objectType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecAlterObjectDependsStmt(stmt, NULL);
}
@@ -1026,7 +1060,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objectType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecAlterObjectSchemaStmt(stmt, NULL);
}
@@ -1039,7 +1073,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objectType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
ExecAlterOwnerStmt(stmt);
}
@@ -1052,7 +1086,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objtype))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);
else
CommentObject(stmt);
break;
@@ -1065,7 +1099,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
if (EventTriggerSupportsObjectType(stmt->objtype))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
- dest, completionTag);
+ dest, qc);

Not this patch's fault or task. But I hate this type of code - needing
to touch a dozen places for new type of statement is just
insane. utility.c should long have been rewritten to just have one
metadata table for nearly all of this. Perhaps with a few callbacks for
special cases.

I’ve decided not to touch this issue. There are no changes here from how it was done in v2.

+static const char * tag_names[] = {
+	"???",
+	"ALTER ACCESS METHOD",
+	"ALTER AGGREGATE",
+	"ALTER CAST",

This seems problematic to maintain, because the order needs to match
between this and something defined in a header - and there's no
guarantee a misordering is immediately noticeable. We should either go
for my metadata table idea, or at least rewrite this, even if more
verbose, to something like

static const char * tag_names[] = {
[COMMAND_TAG_ALTER_ACCESS_METHOD] = "ALTER ACCESS METHOD",
...

I think the fact that this would show up in a grep for
COMMAND_TAG_ALTER_ACCESS_METHOD is good too.

Rewriting this as you suggest does not prevent tag names from being out of sorted order. Version 3 of the patch adds a perl script that reads commandtag.h and commandtag.c during the build process and stops the build with a brief error message if they don’t match, are malformed, or if the sorting order is wrong. The script does not modify the code. It just reviews it for correctness. As such, it probably doesn’t matter whether it runs on all platforms. I did not look into whether this runs on Windows, but if there is any difficulty there, it could simply be disabled on that platform.

It also doesn’t matter if this perl script gets committed. There is a trade-off here between maintaining the script vs. manually maintaining the enum ordering.

+/*
+ * Search CommandTag by name
+ *
+ * Returns CommandTag, or COMMANDTAG_UNKNOWN if not recognized
+ */
+CommandTag
+GetCommandTagEnum(const char *commandname)
+{
+	const char **base, **last, **position;
+	int		 result;
+
+	OPTIONALLY_CHECK_COMMAND_TAGS();
+	if (commandname == NULL || *commandname == '\0')
+		return COMMANDTAG_UNKNOWN;
+
+	base = tag_names;
+	last = tag_names + tag_name_length - 1;
+	while (last >= base)
+	{
+		position = base + ((last - base) >> 1);
+		result = pg_strcasecmp(commandname, *position);
+		if (result == 0)
+			return (CommandTag) (position - tag_names);
+		else if (result < 0)
+			last = position - 1;
+		else
+			base = position + 1;
+	}
+	return COMMANDTAG_UNKNOWN;
+}

This seems pretty grotty - but you get rid of it later. See my comments there.

I kept a form of GetCommandTagEnum.

+#ifdef COMMANDTAG_CHECKING
+bool
+CheckCommandTagEnum()
+{
+	CommandTag	i, j;
+
+	if (FIRST_COMMAND_TAG < 0 || LAST_COMMAND_TAG < 0 || LAST_COMMAND_TAG < FIRST_COMMAND_TAG)
+	{
+		elog(ERROR, "FIRST_COMMAND_TAG (%u), LAST_COMMAND_TAG (%u) not reasonable",
+			 (unsigned int) FIRST_COMMAND_TAG, (unsigned int) LAST_COMMAND_TAG);
+		return false;
+	}
+	if (FIRST_COMMAND_TAG != (CommandTag)0)
+	{
+		elog(ERROR, "FIRST_COMMAND_TAG (%u) != 0", (unsigned int) FIRST_COMMAND_TAG);
+		return false;
+	}
+	if (LAST_COMMAND_TAG != (CommandTag)(tag_name_length - 1))
+	{
+		elog(ERROR, "LAST_COMMAND_TAG (%u) != tag_name_length (%u)",
+			 (unsigned int) LAST_COMMAND_TAG, (unsigned int) tag_name_length);
+		return false;
+	}

These all seem to want to be static asserts.

This is all gone now, either to the perl script or to a StaticAssert, or to a bit of both.

+	for (i = FIRST_COMMAND_TAG; i < LAST_COMMAND_TAG; i++)
+	{
+		for (j = i+1; j < LAST_COMMAND_TAG; j++)
+		{
+			int cmp = strcmp(tag_names[i], tag_names[j]);
+			if (cmp == 0)
+			{
+				elog(ERROR, "Found duplicate tag_name: \"%s\"",
+					tag_names[i]);
+				return false;
+			}
+			if (cmp > 0)
+			{
+				elog(ERROR, "Found commandnames out of order: \"%s\" before \"%s\"",
+					tag_names[i], tag_names[j]);
+				return false;
+			}
+		}
+	}
+	return true;
+}

And I think we could get rid of this with my earlier suggestions?

This is now handled by the perl script, also.

+/*
+ * BEWARE: These are in sorted order, but ordered by their printed
+ *         values in the tag_name list (see common/commandtag.c).
+ *         In particular it matters because the sort ordering changes
+ *         when you replace a space with an underscore.  To wit:
+ *
+ *             "CREATE TABLE"
+ *             "CREATE TABLE AS"
+ *             "CREATE TABLESPACE"
+ *
+ *         but...
+ *
+ *             CREATE_TABLE
+ *             CREATE_TABLESPACE
+ *             CREATE_TABLE_AS
+ *
+ *         It also matters that COMMANDTAG_UNKNOWN is written "???".
+ *
+ * If you add a value here, add it in common/commandtag.c also, and
+ * be careful to get the ordering right.  You can build with
+ * COMMANDTAG_CHECKING to have this automatically checked
+ * at runtime, but that adds considerable overhead, so do so sparingly.
+ */
+typedef enum CommandTag
+{

This seems pretty darn nightmarish.

Well, it does get automatically checked for you.

+#define FIRST_COMMAND_TAG COMMANDTAG_UNKNOWN
+	COMMANDTAG_UNKNOWN,
+	COMMANDTAG_ALTER_ACCESS_METHOD,
+	COMMANDTAG_ALTER_AGGREGATE,
+	COMMANDTAG_ALTER_CAST,
+	COMMANDTAG_ALTER_COLLATION,
+	COMMANDTAG_ALTER_CONSTRAINT,
+	COMMANDTAG_ALTER_CONVERSION,
+	COMMANDTAG_ALTER_DATABASE,
+	COMMANDTAG_ALTER_DEFAULT_PRIVILEGES,
+	COMMANDTAG_ALTER_DOMAIN,
[...]

I'm a bit worried that this basically duplicates a good portion of NodeTag, without having otherwise much of a point?

There is not enough overlap between NodeTag and CommandTag for any obvious consolidation. Feel free to recommend something specific.

From a70b0cadc1142e92b2354a0ca3cd47aaeb0c148e Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 4 Feb 2020 12:25:05 -0800
Subject: [PATCH v2 2/3] Using a Bitmapset of tags rather than a string array.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

EventTriggerCacheItem no longer holds an array of palloc’d tag strings
in sorted order, but rather just a Bitmapset over the CommandTags. This
makes the code a little simpler and easier to read, in my opinion. In
filter_event_trigger, rather than running bsearch through a sorted array
of strings, it just runs bms_is_member.
---

It seems weird to add the bsearch just to remove it immediately again a
patch later. This probably should just go first?

I still don’t know what this comment means.

diff --git a/src/test/regress/sql/event_trigger.sql b/src/test/regress/sql/event_trigger.sql
index 346168673d..cad02212ad 100644
--- a/src/test/regress/sql/event_trigger.sql
+++ b/src/test/regress/sql/event_trigger.sql
@@ -10,6 +10,13 @@ BEGIN
END
$$ language plpgsql;
+-- OK
+create function test_event_trigger2() returns event_trigger as $$
+BEGIN
+	RAISE NOTICE 'test_event_trigger2: % %', tg_event, tg_tag;
+END
+$$ LANGUAGE plpgsql;
+
-- should fail, event triggers cannot have declared arguments
create function test_event_trigger_arg(name text)
returns event_trigger as $$ BEGIN RETURN 1; END $$ language plpgsql;
@@ -82,6 +89,783 @@ create event trigger regress_event_trigger2 on ddl_command_start
-- OK
comment on event trigger regress_event_trigger is 'test comment';
+-- These are all unsupported
+create event trigger regress_event_triger_NULL on ddl_command_start
+   when tag in ('')
+   execute procedure test_event_trigger2();
+
+create event trigger regress_event_triger_UNKNOWN on ddl_command_start
+   when tag in ('???')
+   execute procedure test_event_trigger2();
+
+create event trigger regress_event_trigger_ALTER_DATABASE on ddl_command_start
+   when tag in ('ALTER DATABASE')
+   execute procedure test_event_trigger2();

[...]

There got to be a more maintainable way to write this.

This has all been removed from version 3 of the patch set.

Attachments:

v3-0001-Migrating-commandTag-from-string-to-enum.patchapplication/octet-stream; name=v3-0001-Migrating-commandTag-from-string-to-enum.patch; x-unix-mode=0644Download+1865-603
#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Mark Dilger (#7)
Re: Portal->commandTag as an enum

On 2020-Feb-07, Mark Dilger wrote:

Andres,

The previous patch set seemed to cause confusion, having separated
changes into multiple files. The latest patch, heavily influenced by
your review, is all in one file, attached.

Cool stuff.

I think is a little confused about what is source and what is generated.
I'm not clear why commandtag.c is a C file at all; wouldn't it be
simpler to have it as a Data::Dumper file or some sort of Perl struct,
so that it can be read easily by the Perl file? Similar to the
src/include/catalog/pg_*.dat files. That script can then generate all
the needed .c and .h files, which are not going to be part of the source
tree, and will always be in-sync and won't have the formatting
strictness about it. And you won't have the Martian syntax you had to
use in the commandtag.c file.

As for API, please don't shorten things such as SetQC, just use
SetQueryCompletion. Perhaps call it SetCompletionTag or SetCommandTag?.
(I'm not sure about the name "QueryCompletionData"; maybe CommandTag or
QueryTag would work better for that struct. There seems to be a lot of
effort in separating QueryCompletion from CommandTag; is that really
necessary?) Lastly, we have a convention that we have a struct called
FooData, and a typedef FooData *Foo, then use the latter in the API.
We don't adhere to that 100%, and some people dislike it, but I'd rather
be consistent and not be passing "FooData *" around; it's just noisier.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Alvaro Herrera (#8)
Re: Portal->commandTag as an enum

On Feb 11, 2020, at 11:09 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Feb-07, Mark Dilger wrote:

Andres,

The previous patch set seemed to cause confusion, having separated
changes into multiple files. The latest patch, heavily influenced by
your review, is all in one file, attached.

Cool stuff.

Thanks for the review!

I think is a little confused about what is source and what is generated.

The perl file generates nothing. It merely checks that the .h and .c files are valid and consistent with each other.

I'm not clear why commandtag.c is a C file at all; wouldn't it be
simpler to have it as a Data::Dumper file or some sort of Perl struct,
so that it can be read easily by the Perl file? Similar to the
src/include/catalog/pg_*.dat files. That script can then generate all
the needed .c and .h files, which are not going to be part of the source
tree, and will always be in-sync and won't have the formatting
strictness about it. And you won't have the Martian syntax you had to
use in the commandtag.c file.

I thought about generating the files rather than merely checking them. I could see arguments both ways. I wasn’t sure whether there would be broad support for having yet another perl script generating source files, nor for the maintenance burden of having to do that on all platforms. Having a perl script that merely sanity checks the source files has the advantage that there is no requirement for it to function on all platforms. There’s not even a requirement for it to be committed to the tree, since you could also argue that the maintenance burden of the script outweighs the burden of getting the source files right by hand.

As for API, please don't shorten things such as SetQC, just use
SetQueryCompletion. Perhaps call it SetCompletionTag or SetCommandTag?.
(I'm not sure about the name "QueryCompletionData"; maybe CommandTag or
QueryTag would work better for that struct.

I am happy to rename it as SetQueryCompletion.

There seems to be a lot of
effort in separating QueryCompletion from CommandTag; is that really
necessary?)

For some code paths, prior to this patch, the commandTag gets changed before returning, and I’m not just talking about the change where the rowcount gets written into the commandTag string. I have a work-in-progress patch to provide system views to track the number of commands executed of each type, and that patch includes the ability to distinguish between what the command started as and what it completed as, so I do want to keep those concepts separate. I rejected the “SetCommandTag” naming suggestion above because we’re really setting information about the completion (what it finished as) and not the command (what it started as). I rejected the “SetCompletionTag” naming because it’s not just the tag that is being set, but both the tag and the row count. I am happy with “SetQueryCompletion” because that naming is consistent with setting the pair of values.

Lastly, we have a convention that we have a struct called
FooData, and a typedef FooData *Foo, then use the latter in the API.
We don't adhere to that 100%, and some people dislike it, but I'd rather
be consistent and not be passing "FooData *" around; it's just noisier.

I’m familiar with the convention, and don’t like it, so I’ll have to look at a better way of naming this. I specifically don’t like it because it makes a mess of using the const qualifier.

Once again, thanks for the review! I will work to get another version of this patch posted around the time I post (separately) the command stats patch.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Mark Dilger (#9)
Re: Portal->commandTag as an enum

On 2020-Feb-11, Mark Dilger wrote:

I thought about generating the files rather than merely checking them.
I could see arguments both ways. I wasn’t sure whether there would be
broad support for having yet another perl script generating source
files, nor for the maintenance burden of having to do that on all
platforms. Having a perl script that merely sanity checks the source
files has the advantage that there is no requirement for it to
function on all platforms. There’s not even a requirement for it to
be committed to the tree, since you could also argue that the
maintenance burden of the script outweighs the burden of getting the
source files right by hand.

No thanks.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Alvaro Herrera (#10)
Re: Portal->commandTag as an enum

On Feb 11, 2020, at 12:50 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Feb-11, Mark Dilger wrote:

I thought about generating the files rather than merely checking them.
I could see arguments both ways. I wasn’t sure whether there would be
broad support for having yet another perl script generating source
files, nor for the maintenance burden of having to do that on all
platforms. Having a perl script that merely sanity checks the source
files has the advantage that there is no requirement for it to
function on all platforms. There’s not even a requirement for it to
be committed to the tree, since you could also argue that the
maintenance burden of the script outweighs the burden of getting the
source files right by hand.

No thanks.

I’m not sure which option you are voting for:

(Option #1) Have the perl script generate the .c and .h file from a .dat file

(Option #2) Have the perl script validate but not generate the .c and .h files

(Option #3) Have no perl script, with all burden on the programmer to get the .c and .h files right by hand.

I think you’re voting against #3, and I’m guessing you’re voting for #1, but I’m not certain.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Mark Dilger (#11)
Re: Portal->commandTag as an enum

On 2020-Feb-11, Mark Dilger wrote:

No thanks.

I’m not sure which option you are voting for:

(Option #1) Have the perl script generate the .c and .h file from a .dat file

(Option #2) Have the perl script validate but not generate the .c and .h files

(Option #3) Have no perl script, with all burden on the programmer to get the .c and .h files right by hand.

I think you’re voting against #3, and I’m guessing you’re voting for #1, but I’m not certain.

I was voting against #2 (burden the programmer with consistency checks
that must be fixed by hand, without actually doing the programmatically-
doable work), but I don't like #3 either. I do like #1.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Alvaro Herrera (#12)
Re: Portal->commandTag as an enum

On Feb 11, 2020, at 1:02 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Feb-11, Mark Dilger wrote:

No thanks.

I’m not sure which option you are voting for:

(Option #1) Have the perl script generate the .c and .h file from a .dat file

(Option #2) Have the perl script validate but not generate the .c and .h files

(Option #3) Have no perl script, with all burden on the programmer to get the .c and .h files right by hand.

I think you’re voting against #3, and I’m guessing you’re voting for #1, but I’m not certain.

I was voting against #2 (burden the programmer with consistency checks
that must be fixed by hand, without actually doing the programmatically-
doable work), but I don't like #3 either. I do like #1.

Option #1 works for me. If I don’t see any contrary votes before I get back to this patch, I’ll implement it that way for the next version.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Mark Dilger (#13)
Re: Portal->commandTag as an enum

On Feb 11, 2020, at 1:05 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:

On Feb 11, 2020, at 1:02 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Feb-11, Mark Dilger wrote:

No thanks.

I’m not sure which option you are voting for:

(Option #1) Have the perl script generate the .c and .h file from a .dat file

(Option #2) Have the perl script validate but not generate the .c and .h files

(Option #3) Have no perl script, with all burden on the programmer to get the .c and .h files right by hand.

I think you’re voting against #3, and I’m guessing you’re voting for #1, but I’m not certain.

I was voting against #2 (burden the programmer with consistency checks
that must be fixed by hand, without actually doing the programmatically-
doable work), but I don't like #3 either. I do like #1.

Option #1 works for me. If I don’t see any contrary votes before I get back to this patch, I’ll implement it that way for the next version.

While investigating how best to implement option #1, I took a look at how Catalog.pm does it.

Catalog.pm reads data files and eval()s chunks of them to vivify perl data.

# We're treating the input line as a piece of Perl, so we
# need to use string eval here. Tell perlcritic we know what
# we're doing.
eval '$hash_ref = ' . $_; ## no critic (ProhibitStringyEval)

This would only make sense to me if the string held in $_ had already been checked for safety, but Catalog.pm does very little to verify that the string is safe to eval. The assumption here, so far as I can infer, is that we don’t embed anything dangerous in our .dat files, so this should be ok. That may be true for the moment, but I can imagine a day when we start embedding perl functions as quoted text inside a data file, or shell commands as text which look enough like perl for eval() to be able to execute them. Developers who edit these .dat files and mess up the quoting, and then rerun ‘make’ to get the new .c and .h files generated, may not like the side effects. Perhaps I’m being overly paranoid….

Rather than add more code generation logic based on the design of Catalog.pm, I wrote a perl based data file parser that parses .dat files and returns vivified perl data, as Catalog.pm does, but with much stricter parsing logic to make certain nothing dangerous gets eval()ed. I put the new module in DataFile.pm. The commandtag data has been consolidated into a single .dat file. A new perl script, gencommandtag.pl, parses the commandtag.dat file and generates the .c and .h files. So far, only gencommandtag.pl uses DataFile.pm, but I’ve checked that it can parse all the .dat files currently in the source tree.

The new parser is more flexible about the structure of the data, which seems good to me for making it easier to add or modify data files in the future. The new parser does not yet have a means of hacking up the data to add autogenerated fields and such that Catalog.pm does, but I think a more clean break between parsing and autovivifying fields would be good anyway. If I get generally favorable reviews of DataFile.pm, I might go refactor Catalog.pm. For now, I’m just leaving Catalog.pm alone.

That script can then generate all
the needed .c and .h files, which are not going to be part of the source
tree, and will always be in-sync and won't have the formatting
strictness about it. And you won't have the Martian syntax you had to
use in the commandtag.c file.

I still have the “Martian syntax”, though now it is generated by the perl script. I can get rid of it, but I think Andres liked the Martian stuff.

We don't adhere to that 100%, and some people dislike it, but I'd rather
be consistent and not be passing "FooData *" around; it's just noisier.

I renamed QueryCompletionData to just QueryCompletion, and I’m passing pointers to that.

Attachments:

v4-0001-Migrating-commandTag-from-string-to-enum.patchapplication/octet-stream; name=v4-0001-Migrating-commandTag-from-string-to-enum.patch; x-unix-mode=0644Download+2340-615
#15John Naylor
john.naylor@enterprisedb.com
In reply to: Mark Dilger (#14)
Re: Portal->commandTag as an enum

Hi Mark,

On Wed, Feb 19, 2020 at 10:40 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

This would only make sense to me if the string held in $_ had already been checked for safety, but Catalog.pm does very little to verify that the string is safe to eval. The assumption here, so far as I can infer, is that we don’t embed anything dangerous in our .dat files, so this should be ok. That may be true for the moment, but I can imagine a day when we start embedding perl functions as quoted text inside a data file, or shell commands as text which look enough like perl for eval() to be able to execute them. Developers who edit these .dat files and mess up the quoting, and then rerun ‘make’ to get the new .c and .h files generated, may not like the side effects. Perhaps I’m being overly paranoid….

The use case for that seems slim. However, at a brief glance your
module seems more robust in other ways.

Rather than add more code generation logic based on the design of Catalog.pm, I wrote a perl based data file parser that parses .dat files and returns vivified perl data, as Catalog.pm does, but with much stricter parsing logic to make certain nothing dangerous gets eval()ed. I put the new module in DataFile.pm.
[...]
The new parser is more flexible about the structure of the data, which seems good to me for making it easier to add or modify data files in the future. The new parser does not yet have a means of hacking up the data to add autogenerated fields and such that Catalog.pm does, but I think a more clean break between parsing and autovivifying fields would be good anyway.

Separation of concerns sounds like a good idea, but I've not fully
thought it through. For one advantage, I think it might be nicer to
have indexing.dat and toasting.dat instead of having to dig the info
out of C macros. This was evident while recently experimenting with
generating catcache control data.

As for the patch, I have not done a full review, but I have some
comments based on a light read-through:

utils/Makefile:

+# location of commandtag.dat
+headerdir = $(top_srcdir)/src/include/utils

This variable name is too generic for what the comment says it is. A
better abstraction, if we want one, would be the full path to the
commandtag input file. The other script invocations in this Makefile
don't do it this way, but that's a separate patch.

+# location to write generated headers
+sourcedir = $(top_srcdir)/src/backend/utils

Calling the output the source is bound to confuse people. The comment
implies all generated headers, not just the ones introduced by the
patch. I would just output to the current directory (i.e. have an
--output option with a default empty string). Also, if we want to
output somewhere else, I would imagine it'd be under the top builddir,
not srcdir.

+$(PERL) -I $(top_srcdir)/src/include/utils $<
--headerdir=$(headerdir) --sourcedir=$(sourcedir)
--inputfile=$(headerdir)/commandtag.dat

1. headerdir is entirely unused by the script
2. We can default to working dir for the output as mentioned above
3. -I $(top_srcdir)/src/include/utils is supposed to point to the dir
containing DataFile.pm, but since gencommandtag.pl has "use lib..."
it's probably not needed here. I don't recall why we keep the "-I"
elsewhere. (ditto in Solution.pm)

I'm thinking it would look something like this:

+$(PERL) $< --inputfile=$(top_srcdir)/src/include/utils/commandtag.dat

--
utils/misc/Makefile

+all: distprep
+
 # Note: guc-file.c is not deleted by 'make clean',
 # since we want to ship it in distribution tarballs.
 clean:
  @rm -f lex.yy.c
+
+maintainer-clean: clean

Seems non-functional.

--
DataFiles.pm

I haven't studied this in detail, but I'll suggest that if this meant
to have wider application, maybe it should live in src/tools ?

I'm not familiar with using different IO routines depending on the OS
-- what's the benefit of that?

--
gencommandtag.pl

slurp_without_comments() is unused.

sanity_check_data() seems longer than the main body of the script
(minus header boilerplate), and I wonder if we can pare it down some.
For one, I have difficulty imagining anyone would accidentally type an
unprintable or non-ascii character in a command tag and somehow not
realize it. For another, duplicating checks that were done earlier
seems like a maintenance headache.

dataerror() is defined near the top, but other functions are defined
at the bottom.

+# Generate all output internally before outputting anything, to avoid
+# partially overwriting generated files under error conditions

My personal preference is, having this as a design requirement
sacrifices readability for unclear gain, especially since a "chunk"
also includes things like header boilerplate. That said, the script is
also short enough that it doesn't make a huge difference either way.
Speaking of boilerplate, it's better for readability to separate that
from actual code such as:

typedef enum CommandTag
{
#define FIRST_COMMANDTAG COMMANDTAG_$sorted[0]->{taglabel})

--
tcop/dest.c

+ * We no longer display LastOid, but to preserve the wire protocol,
+ * we write InvalidOid where the LastOid used to be written.  For
+ * efficiency in the snprintf(), hard-code InvalidOid as zero.

Hmm, is hard-coding zero going to make any difference here?

--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#16Mark Dilger
mark.dilger@enterprisedb.com
In reply to: John Naylor (#15)
Re: Portal->commandTag as an enum

On Feb 19, 2020, at 3:31 AM, John Naylor <john.naylor@2ndquadrant.com> wrote:

Hi Mark,

Hi John, thanks for reviewing!

On Wed, Feb 19, 2020 at 10:40 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

This would only make sense to me if the string held in $_ had already been checked for safety, but Catalog.pm does very little to verify that the string is safe to eval. The assumption here, so far as I can infer, is that we don’t embed anything dangerous in our .dat files, so this should be ok. That may be true for the moment, but I can imagine a day when we start embedding perl functions as quoted text inside a data file, or shell commands as text which look enough like perl for eval() to be able to execute them. Developers who edit these .dat files and mess up the quoting, and then rerun ‘make’ to get the new .c and .h files generated, may not like the side effects. Perhaps I’m being overly paranoid….

The use case for that seems slim. However, at a brief glance your
module seems more robust in other ways.

Rather than add more code generation logic based on the design of Catalog.pm, I wrote a perl based data file parser that parses .dat files and returns vivified perl data, as Catalog.pm does, but with much stricter parsing logic to make certain nothing dangerous gets eval()ed. I put the new module in DataFile.pm.
[...]
The new parser is more flexible about the structure of the data, which seems good to me for making it easier to add or modify data files in the future. The new parser does not yet have a means of hacking up the data to add autogenerated fields and such that Catalog.pm does, but I think a more clean break between parsing and autovivifying fields would be good anyway.

Separation of concerns sounds like a good idea, but I've not fully
thought it through. For one advantage, I think it might be nicer to
have indexing.dat and toasting.dat instead of having to dig the info
out of C macros. This was evident while recently experimenting with
generating catcache control data.

I guess you mean macros DECLARE_UNIQUE_INDEX and DECLARE_TOAST. I don’t mind converting that to .dat files, though I’m mindful of Tom’s concern expressed early in this thread about the amount of code churn. Is there sufficient demand for refactoring this stuff? There are more reasons in the conversation below to refactor the perl modules and code generation scripts.

As for the patch, I have not done a full review, but I have some
comments based on a light read-through:

utils/Makefile:

+# location of commandtag.dat
+headerdir = $(top_srcdir)/src/include/utils

This variable name is too generic for what the comment says it is. A
better abstraction, if we want one, would be the full path to the
commandtag input file. The other script invocations in this Makefile
don't do it this way, but that's a separate patch.

+# location to write generated headers
+sourcedir = $(top_srcdir)/src/backend/utils

Calling the output the source is bound to confuse people. The comment
implies all generated headers, not just the ones introduced by the
patch. I would just output to the current directory (i.e. have an
--output option with a default empty string). Also, if we want to
output somewhere else, I would imagine it'd be under the top builddir,
not srcdir.

+$(PERL) -I $(top_srcdir)/src/include/utils $<
--headerdir=$(headerdir) --sourcedir=$(sourcedir)
--inputfile=$(headerdir)/commandtag.dat

1. headerdir is entirely unused by the script
2. We can default to working dir for the output as mentioned above
3. -I $(top_srcdir)/src/include/utils is supposed to point to the dir
containing DataFile.pm, but since gencommandtag.pl has "use lib..."
it's probably not needed here. I don't recall why we keep the "-I"
elsewhere. (ditto in Solution.pm)

I'm thinking it would look something like this:

+$(PERL) $< --inputfile=$(top_srcdir)/src/include/utils/commandtag.dat\

I have taken all this advice in v5 of the patch. --inputfile and --outputdir (previously named --sourcedir) are now optional with the defaults as you suggested.

--
utils/misc/Makefile

+all: distprep
+
# Note: guc-file.c is not deleted by 'make clean',
# since we want to ship it in distribution tarballs.
clean:
@rm -f lex.yy.c
+
+maintainer-clean: clean

Seems non-functional.

Yeah, I also had an unnecessary addition to .gitignore in that directory. I had originally placed the commandtag stuff here before moving it one directory up. Thanks for catching that.

--
DataFiles.pm

I haven't studied this in detail, but I'll suggest that if this meant
to have wider application, maybe it should live in src/tools ?

We don’t seem to have a standard place for perl modules. src/test/perl has some that are specifically for tap testing, and src/backend/catalog has some for catalog data file processing. I put DataFiles.pm in src/backend/catalog because that’s where most data file processing currently is located. src/tools has PerfectHash.pm, and a bunch of Windows specific modules under src/tools/msvc.

I'm not familiar with using different IO routines depending on the OS
-- what's the benefit of that?

I think you are talking about the slurp_file routine. That came directly from the TestLib.pm module. I don't have enough perl-on-windows experience to comment about why it does things that way. I was reluctant to have DataFile.pm 'use TestLib', since DataFile has absolutely nothing to do with regression testing. I don't like copying the function, either, though I chose that as the lesser evil. Which is more evil is debateable.

src/test/perl/ contains SimpleTee.pm and RecursiveCopy.pm, neither of which contain functionality limited to just testing. I think they could be moved to src/tools. src/test/perl/TestLib.pm contains a mixture of testing specific functions and more general purpose functions. For instance, TestLib.pm contains functions to read in a file or directory (slurp_file(filepath) and slurp_dir(dirpath), respectively). I think we should have just one implementation of those in just one place. Neither TestLib nor DataFile seem appropriate, nor does src/test/perl seem right. I checked whether Perl ships with core module support for this and didn't find anything. There is a cpan module named File::Slurp, but it is not a core module so far as I can tell, and it does more than we want.

Should I submit a separate patch refactoring the location of perl modules and functions of general interest into src/tools? src/tools/perl?

I am not changing DataFile.pm's duplicate copy of slurp_file in v5 of the patch, as I don't yet know the best way to approach the problem. I expect there will have to be a v6 once this has been adequately debated.

--
gencommandtag.pl

slurp_without_comments() is unused.

Right. An earlier version of gencommandtag.pl didn't use DataFile.pm, and I neglected to remove this function when I transitioned to using DataFile.pm. Thanks for noticing!

sanity_check_data() seems longer than the main body of the script
(minus header boilerplate), and I wonder if we can pare it down some.
For one, I have difficulty imagining anyone would accidentally type an
unprintable or non-ascii character in a command tag and somehow not
realize it.

I'm uncertain about that. There is logic in EndCommand in tcop/dest.c that specifically warns that no encoding conversion will be performed due to the assumption that command tags contain only 7-bit ascii. I think that's a perfectly reasonable assumption in the C-code, but it needs to be checked by gencommandtag.pl because the bugs that might ensue from inserting an accent character or whatever could be subtle enough to not be caught right away. Such mistakes only get easier as time goes by, as the tendency for editors to change your quotes into "smart quotes" and such gets more common, and potentially as the assumption that PostgreSQL has been internationalized gets more common. Hopefully, we're moving more and more towards supporting non-ascii in more and more places. It might be less obvious to a contributor some years hence that they cannot stick an accented character into a command tag. (Compare, for example, that it used to be widely accepted that you shouldn't stick spaces and hyphens into file names, but now a fair number of programmers will do that without flinching.)

As for checking for unprintable characters, the case is weaker. I'm not too motivated to remove the check, though.

For another, duplicating checks that were done earlier
seems like a maintenance headache.

Hmmm. As long as gencommandtag.pl is the only user of DataFile.pm, I'm inclined to agree that we're double-checking some things. The code comments I wrote certainly say so. But if DataFile.pm gets wider adoption, it might start to accept more varied input, and then gencommandtag.pl will need to assert its own set of validation. There is also the distinction between checking that the input data file meets the syntax requirements of the *parser* vs. making certain that the vivified perl structures meet the semantic requirements of the *code generator*. You may at this point be able to assert that meeting the first guarantees meeting the second, but that can't be expected to hold indefinitely.

It would be easier to decide these matters if we knew whether commandtag logic will ever be removed and whether DataFile will ever gain wider adoption for code generation purposes....

dataerror() is defined near the top, but other functions are defined
at the bottom.

Moved.

+# Generate all output internally before outputting anything, to avoid
+# partially overwriting generated files under error conditions

My personal preference is, having this as a design requirement
sacrifices readability for unclear gain, especially since a "chunk"
also includes things like header boilerplate. That said, the script is
also short enough that it doesn't make a huge difference either way.

Catalog.pm writes a temporary file and then moves it to the final file name at the end. DataFile buffers the output and only writes it after all the code generation has succeeded. There is no principled basis for these two modules tackling the same problem in two different ways. Perhaps that's another argument for pulling this kind of functionality out of random places and consolidating it in one or more modules in src/tools.

Speaking of boilerplate, it's better for readability to separate that
from actual code such as:

typedef enum CommandTag
{
#define FIRST_COMMANDTAG COMMANDTAG_$sorted[0]->{taglabel})

Good idea. While I was doing this, I also consolidated the duplicated boilerplate into just one function. I think this function, too, should go in just one perl module somewhere. See boilerplate_header() for details.

--
tcop/dest.c

+ * We no longer display LastOid, but to preserve the wire protocol,
+ * we write InvalidOid where the LastOid used to be written.  For
+ * efficiency in the snprintf(), hard-code InvalidOid as zero.

Hmm, is hard-coding zero going to make any difference here?

Part of the value of refactoring the commandtag logic is to make it easier to remove the whole ugly mess later. Having snprintf write the Oid into the string obfuscates the stupidity of what is really being done here. Putting the zero directly into the format string makes it clearer, to my eyes, that nothing clever is afoot.

I have removed the sentence about efficiency. Thanks for mentioning it.

Attachments:

v5-0001-Migrating-commandTag-from-string-to-enum.patchapplication/octet-stream; name=v5-0001-Migrating-commandTag-from-string-to-enum.patch; x-unix-mode=0644Download+2311-615
#17John Naylor
john.naylor@enterprisedb.com
In reply to: Mark Dilger (#16)
Re: Portal->commandTag as an enum

On Thu, Feb 20, 2020 at 5:16 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

On Feb 19, 2020, at 3:31 AM, John Naylor <john.naylor@2ndquadrant.com> wrote:
thought it through. For one advantage, I think it might be nicer to
have indexing.dat and toasting.dat instead of having to dig the info
out of C macros. This was evident while recently experimenting with
generating catcache control data.

I guess you mean macros DECLARE_UNIQUE_INDEX and DECLARE_TOAST. I don’t mind converting that to .dat files, though I’m mindful of Tom’s concern expressed early in this thread about the amount of code churn. Is there sufficient demand for refactoring this stuff? There are more reasons in the conversation below to refactor the perl modules and code generation scripts.

Yes, I was referring to those macros, but I did not mean to imply you
should convert them, either now or ever. I was just thinking out loud
about the possibility. :-)

That said, if we ever want Catalog.pm to delegate vivification to
DataFile.pm, there will eventually need to be a way to optionally
preserve comments and whitespace.

I have taken all this advice in v5 of the patch. --inputfile and --outputdir (previously named --sourcedir) are now optional with the defaults as you suggested.

+my $inputfile = '../../include/utils/commandtag.dat';

I think we should skip the default for the input file, since the
relative path is fragile and every such script I've seen requires it
to be passed in.

DataFiles.pm
[...]
I'm not familiar with using different IO routines depending on the OS
-- what's the benefit of that?

I think you are talking about the slurp_file routine. That came directly from the TestLib.pm module. I don't have enough perl-on-windows experience to comment about why it does things that way.

Yes, that one, sorry I wasn't explicit.

Should I submit a separate patch refactoring the location of perl modules and functions of general interest into src/tools? src/tools/perl?

We may have accumulated enough disparate functionality by now to
consider this, but it sounds like PG14 material in any case.

I expect there will have to be a v6 once this has been adequately debated.

So far I haven't heard any justification for why it should remain in
src/backend/catalog, when it has nothing to do with catalogs. We don't
have to have a standard other-place for there to be a better
other-place.

--
gencommandtag.pl

sanity_check_data() seems longer than the main body of the script
(minus header boilerplate), and I wonder if we can pare it down some.
For one, I have difficulty imagining anyone would accidentally type an
unprintable or non-ascii character in a command tag and somehow not
realize it.
[...]
For another, duplicating checks that were done earlier
seems like a maintenance headache.

[ detailed rebuttals about the above points ]

Those were just examples that stuck out at me, so even if I were
convinced to your side on those, my broader point was: the sanity
check seems way over-engineered for something that spits out an enum
and an array. Maybe I'm not imaginative enough. I found it hard to
read in any case.

Catalog.pm writes a temporary file and then moves it to the final file name at the end. DataFile buffers the output and only writes it after all the code generation has succeeded. There is no principled basis for these two modules tackling the same problem in two different ways.

Not the same problem. The temp files were originally for parallel Make
hazards, and now to prevent large portions of the backend to be
rebuilt. I actually think partially written files can be helpful for
debugging, so I don't even think it's a problem. But as I said, it
doesn't matter terribly much either way.

Speaking of boilerplate, it's better for readability to separate that
from actual code such as:

typedef enum CommandTag
{
#define FIRST_COMMANDTAG COMMANDTAG_$sorted[0]->{taglabel})

Good idea. While I was doing this, I also consolidated the duplicated boilerplate into just one function. I think this function, too, should go in just one perl module somewhere. See boilerplate_header() for details.

I like this a lot.

While thinking, I wonder if it makes sense to have a CodeGen module,
which would contain e.g. the new ParseData function,
FindDefinedSymbol, and functions for writing boilerplate.

--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#18John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#17)
Re: Portal->commandTag as an enum

Thinking about this some more, would it be possible to treat these
like we do parser/kwlist.h? Something like this:

commandtag_list.h:
PG_COMMANDTAG(ALTER_ACCESS_METHOD, "ALTER ACCESS METHOD", true, false,
false, false)
...

then, just:

#define PG_COMMANDTAG(taglabel, tagname, event_trigger, table_rewrite,
display_rowcount, display_oid) label,

typedef enum CommandTag
{
#include "commandtag_list.h"
}

#undef PG_COMMANDTAG

...and then:

#define PG_COMMANDTAG(taglabel, tagname, event_trigger, table_rewrite,
display_rowcount, display_oid) \
{ tagname, event_trigger, table_rewrite, display_rowcount, display_oid },

const CommandTagBehavior tag_behavior[] =
{
#include "commandtag_list.h"
}

#undef PG_COMMANDTAG

I'm hand-waving a bit, and it doesn't have the flexibility of a .dat
file, but it's a whole lot simpler.

--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: John Naylor (#18)
Re: Portal->commandTag as an enum

On 2020-Feb-21, John Naylor wrote:

Thinking about this some more, would it be possible to treat these
like we do parser/kwlist.h? Something like this:

commandtag_list.h:
PG_COMMANDTAG(ALTER_ACCESS_METHOD, "ALTER ACCESS METHOD", true, false,
false, false)
...

I liked this idea, so I'm halfway on it now.

I'm hand-waving a bit, and it doesn't have the flexibility of a .dat
file, but it's a whole lot simpler.

Yeah, I for one don't want to maintain the proposed DataFile.pm.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#20Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#19)
Re: Portal->commandTag as an enum

On 2020-Feb-28, Alvaro Herrera wrote:

On 2020-Feb-21, John Naylor wrote:

Thinking about this some more, would it be possible to treat these
like we do parser/kwlist.h? Something like this:

commandtag_list.h:
PG_COMMANDTAG(ALTER_ACCESS_METHOD, "ALTER ACCESS METHOD", true, false,
false, false)
...

I liked this idea, so I'm halfway on it now.

Here.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v6-0001-Migrating-commandTag-from-string-to-enum.patchtext/x-diff; charset=utf-8Download+882-609
#21Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#21)
#23Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#22)
#24Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Tom Lane (#22)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Dilger (#24)
#26Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Tom Lane (#25)
#27Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Mark Dilger (#26)
#28Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Alvaro Herrera (#27)
#29Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Mark Dilger (#28)
#30Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#29)
#31Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#30)
#32Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Alvaro Herrera (#31)
#33Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Mark Dilger (#32)
#34Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Alvaro Herrera (#33)
#35Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Alvaro Herrera (#33)
#36Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Mark Dilger (#35)