Enumize logical replication message actions

Started by Ashutosh Bapat over 5 years ago | 33 messages | hackers
#1Ashutosh Bapat
ashutosh.bapat@enterprisedb.com

Hi All,
The logical replication protocol uses a single-byte character to identify
the different kinds of logical replication messages. The code uses
character literals for these, and the literals appear as bare
constants throughout. That's true for almost all the code that
deals with the wire protocol. This makes it difficult to identify
the code that deals with a particular message, for example the code that
handles message type 'B'. 'B' has a different meaning in each protocol,
so it is difficult and time consuming to tell one usage from another
and to find all the places that deal with one usage. Here's
a patch simplifying that for top-level logical replication messages.

I think I have covered the places that need change. But I might have
missed something, given that these literals are used at several other
places (a problem this patch tries to fix :)).

Initially I had used #define for the same, but Peter E suggested using
Enums so that switch cases can detect any remaining items along with
stronger type checks.
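
To illustrate the point about switch statements, here is a minimal sketch; it is not taken from the patch. It assumes only four of the message types, with the byte values mentioned in this thread, and a hypothetical helper logicalrep_msg_name(). With GCC or Clang and -Wall (which enables -Wswitch), a switch over an enum-typed expression that has no default: label produces a warning when an enumerator is left unhandled. With #define constants the compiler has no closed set of values to check against.

```c
#include <stddef.h>

/* Subset of message types; each value is the byte sent on the wire. */
typedef enum LogicalRepMsgType
{
    LOGICAL_REP_MSG_BEGIN = 'B',
    LOGICAL_REP_MSG_ORIGIN = 'O',
    LOGICAL_REP_MSG_UPDATE = 'U',
    LOGICAL_REP_MSG_STREAM_ABORT = 'A'
} LogicalRepMsgType;

/*
 * Hypothetical helper: map a message type to a printable name.
 * There is deliberately no default: label. Deleting one of the cases
 * below makes -Wswitch report the missing enumerator at compile time.
 */
const char *
logicalrep_msg_name(LogicalRepMsgType type)
{
    switch (type)
    {
        case LOGICAL_REP_MSG_BEGIN:
            return "BEGIN";
        case LOGICAL_REP_MSG_ORIGIN:
            return "ORIGIN";
        case LOGICAL_REP_MSG_UPDATE:
            return "UPDATE";
        case LOGICAL_REP_MSG_STREAM_ABORT:
            return "STREAM ABORT";
    }

    /* Reached only when the value matches no enumerator. */
    return NULL;
}
```

The warning only fires when the switch expression itself has the enum type, so a switch on a plain char or int variable would not benefit.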

Pavan suggested off-list creating a wrapper
pg_send_logical_rep_message() on top of pq_sendbyte(), and similarly for
pq_getmsgbyte(). I wanted to see first whether this change is acceptable. If so,
I will change that as well. Comments/suggestions welcome.

--
Best Wishes,
Ashutosh Bapat

Attachments:

0001-Enumize-top-level-logical-replication-actions.patch (text/x-patch; charset=US-ASCII, +58 -40)

#2Japin Li
japinli@hotmail.com
In reply to: Ashutosh Bapat (#1)
Re: Enumize logical replication message actions

What about 'N' for new tuples, 'O' for "old tuple follows", and 'K' for "old key follows"?
Those are also logical replication protocol messages, I think.

--
Best regards
Japin Li

#3Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Japin Li (#2)
Re: Enumize logical replication message actions

At Fri, 16 Oct 2020 08:08:40 +0000, Li Japin <japinli@hotmail.com> wrote in

What about 'N' for new tuples, 'O' for "old tuple follows", and 'K' for "old key follows"?
Those are also logical replication protocol messages, I think.

They are flags stored inside a message, so they can be seen as different
from the message type letters.

Anyway, if the values were chosen to carry some meaning, I'm not sure
whether turning them into enums is a good thing. In other words, 'U' conveys
almost the same amount of information as LOGICAL_REP_MSG_UPDATE in the
context of the logical replication protocol.

We have the same code pattern in PostgresMain and we are probably not
going to change those into enums.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#4Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Kyotaro Horiguchi (#3)
Re: Enumize logical replication message actions

On Fri, 16 Oct 2020 at 14:06, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Fri, 16 Oct 2020 08:08:40 +0000, Li Japin <japinli@hotmail.com> wrote in

On Oct 16, 2020, at 3:25 PM, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:

What about 'N' for new tuples, 'O' for "old tuple follows", and 'K' for "old key follows"?
Those are also logical replication protocol messages, I think.

They are flags stored inside a message, so they can be seen as different
from the message type letters.

I think converting those into macros/enums will help, but for now I have
tackled only the top-level message types.

Anyway, if the values were chosen to carry some meaning, I'm not sure
whether turning them into enums is a good thing. In other words, 'U' conveys
almost the same amount of information as LOGICAL_REP_MSG_UPDATE in the
context of the logical replication protocol.

We have the same code pattern in PostgresMain and we are probably not
going to change those into enums.

That's exactly the problem I am trying to solve. Take for example 'B', as I
have mentioned before. That character literal appears in 64 different places
in the master branch. Which of those relate to a "BEGIN"
message in the logical replication protocol is not clear unless I go
through each of them. In PostgresMain it's used to indicate a BIND
message. Which of those 64 instances also use 'B' to mean a bind
message? Using enums or macros makes it clear: just look
up LOGICAL_REP_MSG_BEGIN. Converting every 'B' to its respective macro
would help but might be problematic for back-patching, so that's arguable.
But doing it in something as new as logical replication will be helpful,
before it gets too late to change.

Furthermore, the logical replication protocol uses the same literal, e.g. 'O', to
mean origin in some places and old tuple in others. While the comments
there help, it's not easy to locate all the code that deals with one
meaning or the other. This change will help with that, which is another
reason to start with logical replication.
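
As a rough sketch of the 'O' ambiguity described above: the byte 'O' can mean "origin" as a top-level message type and "old tuple" as a flag inside a message. Giving each meaning its own symbol makes the two searchable separately. The names LogicalRepTupleKind, LOGICAL_REP_TUPLE_* and the two describe_* helpers are hypothetical and only for illustration; the posted patch covers top-level message types only.

```c
/* Top-level message types (subset; values as mentioned in this thread). */
typedef enum LogicalRepMsgType
{
    LOGICAL_REP_MSG_BEGIN = 'B',
    LOGICAL_REP_MSG_ORIGIN = 'O'
} LogicalRepMsgType;

/* Hypothetical names for the per-tuple flags; not part of the posted patch. */
typedef enum LogicalRepTupleKind
{
    LOGICAL_REP_TUPLE_NEW = 'N',
    LOGICAL_REP_TUPLE_OLD = 'O',
    LOGICAL_REP_TUPLE_KEY = 'K'
} LogicalRepTupleKind;

/* Describe a top-level message type. */
const char *
describe_msg_type(LogicalRepMsgType type)
{
    switch (type)
    {
        case LOGICAL_REP_MSG_BEGIN:
            return "begin";
        case LOGICAL_REP_MSG_ORIGIN:
            return "origin";
    }
    return "unknown";
}

/* Describe a tuple flag; the same byte 'O' means something else here. */
const char *
describe_tuple_kind(LogicalRepTupleKind kind)
{
    switch (kind)
    {
        case LOGICAL_REP_TUPLE_NEW:
            return "new tuple";
        case LOGICAL_REP_TUPLE_OLD:
            return "old tuple";
        case LOGICAL_REP_TUPLE_KEY:
            return "old key";
    }
    return "unknown";
}
```

C does not stop a caller from passing one enum where the other is expected, so the gain here is mainly readability and the ability to search for each symbol by name.
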
--
Best Wishes,
Ashutosh

#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#1)
Re: Enumize logical replication message actions

On Fri, Oct 16, 2020 at 12:55 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Hi All,
Logical replication protocol uses single byte character to identify
different chunks of logical replication messages. The code uses
character literals for the same. These literals are used as bare
constants in code as well. That's true for almost all the code that
deals with wire protocol. With that it becomes difficult to identify
the code which deals with a particular message. For example code that
deals with message type 'B'. In various protocol 'B' has different
meaning and it gets difficult and time consuming to differentiate one
usage from other and find all places which deal with one usage. Here's
a patch simplifying that for top level logical replication messages.

+1. I think this will make the code easier to read and understand. I
think it would be good to do this in some other parts as well but
starting with logical replication is a good idea as that area is still
evolving.

--
With Regards,
Amit Kapila.

#6Andres Freund
andres@anarazel.de
In reply to: Ashutosh Bapat (#1)
Re: Enumize logical replication message actions

Hi,

On 2020-10-16 12:55:26 +0530, Ashutosh Bapat wrote:

Here's a patch simplifying that for top level logical replication
messages.

I think that's a good plan. One big benefit for me is that it's much
easier to search for an enum than for a single letter
constant. Including searching for all the places that deal with any sort
of logical rep message type.

void
logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
{
-	pq_sendbyte(out, 'B');		/* BEGIN */
+	pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);		/* BEGIN */

I think if we have the LOGICAL_REP_MSG_BEGIN we don't need the /* BEGIN */.

Greetings,

Andres Freund

#7Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Andres Freund (#6)
Re: Enumize logical replication message actions

Thanks Andres for your review. Thanks Li, Horiguchi-san and Amit for your
comments.

On Tue, 20 Oct 2020 at 04:57, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2020-10-16 12:55:26 +0530, Ashutosh Bapat wrote:

Here's a patch simplifying that for top level logical replication
messages.

I think that's a good plan. One big benefit for me is that it's much
easier to search for an enum than for a single letter
constant. Including searching for all the places that deal with any sort
of logical rep message type.

void
logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
{
-     pq_sendbyte(out, 'B');          /* BEGIN */
+     pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);                /* BEGIN */

I think if we have the LOGICAL_REP_MSG_BEGIN we don't need the /* BEGIN */.

Yes. Fixed all places.

I have attached two patches. 0001 is the previous 0001 patch with your
comments addressed.

0002 adds wrappers on top of pq_sendbyte() and pq_getmsgbyte() to send and
receive a logical replication message type respectively. These wrappers add
more protection to make sure that the enum definitions fit one byte. This
also removes the default case from apply_dispatch() so that we can detect
any LogicalRepMsgType not handled by that function.

These two patches are intended to be committed together as a single commit.
For now the second one is separate so that it's easy to remove the changes
if they are not acceptable.

--
Best Wishes,
Ashutosh

Attachments:

0001-Enumize-top-level-logical-replication-actions.patch (text/x-patch; charset=US-ASCII, +65 -41)
0002-Functions-to-send-and-receive-LogicalRepMsgType.patch (text/x-patch; charset=US-ASCII, +59 -20)

#8Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Ashutosh Bapat (#7)
Re: Enumize logical replication message actions

At Thu, 22 Oct 2020 12:13:40 +0530, Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com> wrote in

Thanks Andres for your review. Thanks Li, Horiguchi-san and Amit for your
comments.

On Tue, 20 Oct 2020 at 04:57, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2020-10-16 12:55:26 +0530, Ashutosh Bapat wrote:

Here's a patch simplifying that for top level logical replication
messages.

I think that's a good plan. One big benefit for me is that it's much
easier to search for an enum than for a single letter
constant. Including searching for all the places that deal with any sort
of logical rep message type.

void
logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
{
-     pq_sendbyte(out, 'B');          /* BEGIN */
+     pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);                /* BEGIN */

I think if we have the LOGICAL_REP_MSG_BEGIN we don't need the /* BEGIN */.

Yes. Fixed all places.

I have attached two patches. 0001 is the previous 0001 patch with your
comments addressed.

We shouldn't have the default: in the switch() block in
apply_dispatch(). That prevents compilers from checking
completeness. The content of the default: should be moved out to after
the switch() block.

apply_dispatch()
{
    switch (action)
    {
        ....
        case LOGICAL_REP_MSG_STREAM_COMMIT:
            apply_handle_stream_commit(s);
            return;
    }

    ereport(ERROR, ...);
}

0002 adds wrappers on top of pq_sendbyte() and pq_getmsgbyte() to send and
receive a logical replication message type respectively. These wrappers add
more protection to make sure that the enum definitions fit one byte. This
also removes the default case from apply_dispatch() so that we can detect
any LogicalRepMsgType not handled by that function.

pg_send_logicalrep_msg_type() looks like somewhat too much. If we need
something like that, we shouldn't do this refactoring, I think.

pg_get_logicalrep_msg_type() seems to do the same check (the
value is compared against every keyword value) as
apply_dispatch(). Why do we need that function separately from
apply_dispatch()?

These two patches are intended to be committed together as a single commit.
For now the second one is separate so that it's easy to remove the changes
if they are not acceptable.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#9Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Kyotaro Horiguchi (#8)
Re: Enumize logical replication message actions

On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
wrote:

We shouldn't have the default: in the switch() block in
apply_dispatch(). That prevents compilers from checking
completeness. The content of the default: should be moved out to after
the switch() block.

apply_dispatch()
{
    switch (action)
    {
        ....
        case LOGICAL_REP_MSG_STREAM_COMMIT:
            apply_handle_stream_commit(s);
            return;
    }

    ereport(ERROR, ...);
}

0002 adds wrappers on top of pq_sendbyte() and pq_getmsgbyte() to send and
receive a logical replication message type respectively. These wrappers add
more protection to make sure that the enum definitions fit one byte. This
also removes the default case from apply_dispatch() so that we can detect
any LogicalRepMsgType not handled by that function.

pg_send_logicalrep_msg_type() looks somewhat too-much. If we need
something like that we shouldn't do this refactoring, I think.

An enum is an integer, and we want to send a byte. The function asserts that the
enum value fits in a byte. If there were a way to declare byte-sized enums I would
use that, but I didn't find one.

pg_get_logicalrep_msg_type() seems doing the same check (that the
value is compared against every keyword value) with
apply_dispatch(). Why do we need that function separately from
apply_dispatch()?

The second patch removes the default case you quoted above. I think that's
important to detect any unhandled case at compile time rather than at run
time. But we need some way to detect whether the values we get from wire
are legit. pg_get_logicalrep_msg_type() does that. Further that function
can be used at places other than apply_dispatch() if required without each
of those places having their own validation.

--
Best Wishes,
Ashutosh

#10Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Ashutosh Bapat (#9)
Re: Enumize logical replication message actions

At Thu, 22 Oct 2020 16:37:18 +0530, Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com> wrote in

On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
wrote:

We shouldn't have the default: in the switch() block in
apply_dispatch(). That prevents compilers from checking
completeness. The content of the default: should be moved out to after
the switch() block.

apply_dispatch()
{
    switch (action)
    {
        ....
        case LOGICAL_REP_MSG_STREAM_COMMIT:
            apply_handle_stream_commit(s);
            return;
    }

    ereport(ERROR, ...);
}

0002 adds wrappers on top of pq_sendbyte() and pq_getmsgbyte() to send and
receive a logical replication message type respectively. These wrappers add
more protection to make sure that the enum definitions fit one byte. This
also removes the default case from apply_dispatch() so that we can detect
any LogicalRepMsgType not handled by that function.

pg_send_logicalrep_msg_type() looks somewhat too-much. If we need
something like that we shouldn't do this refactoring, I think.

Enum is an integer, and we want to send byte. The function asserts that the
enum fits a byte. If there's a way to declare byte long enums I would use
that. But I didn't find a way to do that.

That way of defining enums also allows two different symbols to have the
same value. If we need to check that the values are actually in the range
of char, checking for duplicate values is more important, given how
likely each kind of mistake is.

AFAICS there are two existing enums of this kind, CoercionMethod
and TypeCat. Neither of them is checked for width or duplicates
where it is used.

Even if we need such a check, it shouldn't be a wrapper
function that adds cost in non-assertion builds, but a replacement for
pq_sendbyte() used only under USE_ASSERT_CHECKING.

pg_get_logicalrep_msg_type() seems doing the same check (that the
value is compared against every keyword value) with
apply_dispatch(). Why do we need that function separately from
apply_dispatch()?

The second patch removes the default case you quoted above. I think that's
important to detect any unhandled case at compile time rather than at run
time. But we need some way to detect whether the values we get from wire
are legit. pg_get_logicalrep_msg_type() does that. Further that function
can be used at places other than apply_dispatch() if required without each
of those places having their own validation.

Even if that enum contains out-of-range values, the "command" is sent
truncated to uint8, and on the receiver side apply_dispatch()
doesn't recognize the command and raises an error. That is equivalent
to what pq_send_logicalrep_msg_type() does. (It is also equivalent in
that symbols not exercised by the regression tests are not checked.)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#11Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#10)
Re: Enumize logical replication message actions

At Fri, 23 Oct 2020 10:08:44 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Thu, 22 Oct 2020 16:37:18 +0530, Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com> wrote in

On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
wrote:

pg_get_logicalrep_msg_type() seems doing the same check (that the
value is compared against every keyword value) with
apply_dispatch(). Why do we need that function separately from
apply_dispatch()?

The second patch removes the default case you quoted above. I think that's
important to detect any unhandled case at compile time rather than at run
time. But we need some way to detect whether the values we get from wire
are legit. pg_get_logicalrep_msg_type() does that. Further that function
can be used at places other than apply_dispatch() if required without each
of those places having their own validation.

Even if that enum contains out-of-range values, that "command" is sent
having truncated to uint8 and on the receiver side apply_dispatch()
doesn't identify the command and raises an error. That is equivalent
to what pq_send_logicalrep_msg_type() does. (Also equivalent on the
point that symbols that are not used in regression are not checked.)

Sorry, this is about pg_send_logicalrep_msg_type(), not
pg_get..(). And I forgot to mention pg_get_logicalrep_msg_type().

As for pg_get_logicalrep_msg_type(), it is just a repetition of what
apply_dispatch() does in its switch().

If I flatten the code, it looks like:

apply_dispatch(s)
{
    LogicalRepMsgType msgtype = pq_getmsgtype(s);
    bool pass = false;

    switch (msgtype)
    {
        case LOGICAL_REP_MSG_BEGIN:
        ...
        case LOGICAL_REP_MSG_STREAM_COMMIT:
            pass = true;
    }
    if (!pass)
        ereport(ERROR, (errmsg("invalid logical replication message type"..

    switch (msgtype)
    {
        case LOGICAL_REP_MSG_BEGIN:
            apply_handle_begin();
            break;
        ...
        case LOGICAL_REP_MSG_STREAM_COMMIT:
            apply_handle_stream_commit();
            break;
    }
}

Those two switch()es are apparently redundant. That code is exactly
equivalent to:

apply_dispatch(s)
{
    LogicalRepMsgType msgtype = pq_getmsgtype(s);

    switch (msgtype)
    {
        case LOGICAL_REP_MSG_BEGIN:
            apply_handle_begin();
!           return;
        ...
        case LOGICAL_REP_MSG_STREAM_COMMIT:
            apply_handle_stream_commit();
!           return;
    }

    ereport(ERROR, (errmsg("invalid logical replication message type"..
}

which is smaller and faster.
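
A compilable sketch of the single-switch form might look like the following. It is a stand-in under stated assumptions, not the patch itself: only two message types are included, the handlers are stubs that count calls, the byte is passed in directly instead of being read from a StringInfo, and a return value of -1 plus a message on stderr stands in for ereport(ERROR, ...), which in the real code would not return.

```c
#include <stdio.h>

/* Subset of message types; values as mentioned in this thread. */
typedef enum LogicalRepMsgType
{
    LOGICAL_REP_MSG_BEGIN = 'B',
    LOGICAL_REP_MSG_UPDATE = 'U'
} LogicalRepMsgType;

/* Stub handlers; the real ones take the message buffer and apply it. */
static int begin_calls = 0;
static int update_calls = 0;

static void apply_handle_begin(void)  { begin_calls++; }
static void apply_handle_update(void) { update_calls++; }

/*
 * Dispatch one message. Every case returns, and there is no default:
 * label, so the compiler can still check the switch for completeness.
 * Returns 0 when the message was handled, -1 for an unknown type.
 */
int
apply_dispatch(int byte_from_wire)
{
    LogicalRepMsgType action = (LogicalRepMsgType) byte_from_wire;

    switch (action)
    {
        case LOGICAL_REP_MSG_BEGIN:
            apply_handle_begin();
            return 0;
        case LOGICAL_REP_MSG_UPDATE:
            apply_handle_update();
            return 0;
    }

    /* Only bytes that match no enumerator reach this point. */
    fprintf(stderr, "invalid logical replication message type \"%c\"\n",
            byte_from_wire);
    return -1;
}
```

Validation of the incoming byte and dispatch happen in the same switch, which is the redundancy argument made above.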

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#9)
Re: Enumize logical replication message actions

On 2020-Oct-22, Ashutosh Bapat wrote:

On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
wrote:

pg_send_logicalrep_msg_type() looks somewhat too-much. If we need
something like that we shouldn't do this refactoring, I think.

Enum is an integer, and we want to send byte. The function asserts that the
enum fits a byte. If there's a way to declare byte long enums I would use
that. But I didn't find a way to do that.

I didn't look at the code, but maybe it's sufficient to add a
StaticAssert?

#13Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Alvaro Herrera (#12)
Re: Enumize logical replication message actions

At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in

On 2020-Oct-22, Ashutosh Bapat wrote:

On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
wrote:

pg_send_logicalrep_msg_type() looks somewhat too-much. If we need
something like that we shouldn't do this refactoring, I think.

Enum is an integer, and we want to send byte. The function asserts that the
enum fits a byte. If there's a way to declare byte long enums I would use
that. But I didn't find a way to do that.

I didn't look at the code, but maybe it's sufficient to add a
StaticAssert?

That check needs to visit all symbols in an enum and confirm that each
of them is in a certain range.

I thought of StaticAssert, but it cannot run code, and I don't know
of a syntax that loops through all symbols in an enumeration, so I think
we would need to write a static assertion for every symbol in the
enumeration, which seems kind of stupid.

enum hoge
{
a = '1',
b = '2',
c = '3'
};

StaticAssertDecl((unsigned int)(a | b | c ...) <= 0xff, "too large symbol value");

I didn't come up with a way to apply a static assertion on each symbol
definition line.
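
For comparison, the two options discussed here can be written with the standard C11 _Static_assert (used below in place of PostgreSQL's StaticAssertDecl so that the sketch compiles on its own). The helper hoge_to_byte() is hypothetical. The combined form relies on the values being non-negative: the bitwise OR of non-negative integers has a bit set above bit 7 exactly when at least one operand does.

```c
typedef enum hoge
{
    a = '1',
    b = '2',
    c = '3'
} hoge;

/* Option 1: one assertion per enumerator. Precise, but verbose. */
_Static_assert(a <= 0xff, "a does not fit in one byte");
_Static_assert(b <= 0xff, "b does not fit in one byte");
_Static_assert(c <= 0xff, "c does not fit in one byte");

/* Option 2: a single assertion over the OR of all enumerators. */
_Static_assert((unsigned int) (a | b | c) <= 0xff, "too large symbol value");

/* Hypothetical helper: narrow an enumerator to the byte sent on the wire. */
unsigned char
hoge_to_byte(hoge value)
{
    return (unsigned char) value;
}
```

Both options still require every new enumerator to be added to the assertion by hand, so neither removes the maintenance burden raised above, and neither detects duplicate values.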

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#14Peter Smith
smithpb2250@gmail.com
In reply to: Kyotaro Horiguchi (#13)
Re: Enumize logical replication message actions

On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in

On 2020-Oct-22, Ashutosh Bapat wrote:

On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
wrote:

pg_send_logicalrep_msg_type() looks somewhat too-much. If we need
something like that we shouldn't do this refactoring, I think.

Enum is an integer, and we want to send byte. The function asserts that the
enum fits a byte. If there's a way to declare byte long enums I would use
that. But I didn't find a way to do that.

The pq_send_logicalrep_msg_type() function seemed a bit overkill to me.

The comment in the LogicalRepMsgType enum will sufficiently ensure
nobody is going to accidentally add any bad replication message codes.
And it's not like these are going to be changed often.

Why not simply downcast your enums when calling pq_sendbyte?
There are only a few of them.

e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT);

Kind Regards.
Peter Smith
Fujitsu Australia.

#15Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Peter Smith (#14)
Re: Enumize logical replication message actions

At Fri, 23 Oct 2020 19:53:00 +1100, Peter Smith <smithpb2250@gmail.com> wrote in

On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in

On 2020-Oct-22, Ashutosh Bapat wrote:

On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
wrote:

pg_send_logicalrep_msg_type() looks somewhat too-much. If we need
something like that we shouldn't do this refactoring, I think.

Enum is an integer, and we want to send byte. The function asserts that the
enum fits a byte. If there's a way to declare byte long enums I would use
that. But I didn't find a way to do that.

The pq_send_logicalrep_msg_type() function seemed a bit overkill to me.

Ah, yes, it is what I meant. I didn't come up with the word "overkill".

The comment in the LogicalRepMsgType enum will sufficiently ensure
nobody is going to accidentally add any bad replication message codes.
And it's not like these are going to be changed often.

Agreed.

Why not simply downcast your enums when calling pq_sendbyte?
There are only a few of them.

e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT);

If you are worried about a compiler warning, that explicit cast is not
required. Even if the symbol is larger than 0xff, the upper bytes are
silently truncated off.
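
The truncation behaviour can be seen in a small standalone sketch. It assumes the byte parameter is an 8-bit unsigned type; demo_sendbyte() is a hypothetical stand-in for pq_sendbyte() that simply returns the byte it would send, and DemoMsgType with its deliberately oversized enumerator is invented for the demonstration.

```c
#include <stdint.h>

typedef enum DemoMsgType
{
    DEMO_MSG_BEGIN = 'B',       /* 0x42, fits in one byte */
    DEMO_MSG_TOO_WIDE = 0x142   /* deliberately does not fit */
} DemoMsgType;

/* Stand-in for pq_sendbyte(): returns the byte that would be sent. */
uint8_t
demo_sendbyte(uint8_t byt)
{
    return byt;
}

/*
 * No explicit cast: the enum value is implicitly converted to uint8_t,
 * which keeps only the low 8 bits (value modulo 256).
 */
uint8_t
demo_send_msg_type(DemoMsgType type)
{
    return demo_sendbyte(type);
}
```

Here DEMO_MSG_TOO_WIDE (0x142) would go out as 0x42, the same byte as DEMO_MSG_BEGIN, so an oversized enumerator could end up aliasing a valid message type instead of being rejected.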

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#16Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Kyotaro Horiguchi (#11)
Re: Enumize logical replication message actions

On Fri, 23 Oct 2020 at 06:50, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
wrote:

Those two switch()es are apparently redundant. That code is exactly
equivalent to:

apply_dispatch(s)
{
    LogicalRepMsgType msgtype = pq_getmsgtype(s);

    switch (msgtype)
    {
        case LOGICAL_REP_MSG_BEGIN:
            apply_handle_begin();
!           return;
        ...
        case LOGICAL_REP_MSG_STREAM_COMMIT:
            apply_handle_stream_commit();
!           return;
    }

    ereport(ERROR, (errmsg("invalid logical replication message type"..
}

which is smaller and faster.

Good idea. Implemented in the latest patch posted with the next mail.

--
Best Wishes,
Ashutosh

#17Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Kyotaro Horiguchi (#15)
Re: Enumize logical replication message actions

On Fri, 23 Oct 2020 at 17:02, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Fri, 23 Oct 2020 19:53:00 +1100, Peter Smith <smithpb2250@gmail.com> wrote in

On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in

On 2020-Oct-22, Ashutosh Bapat wrote:

On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
wrote:

pg_send_logicalrep_msg_type() looks somewhat too-much. If we need
something like that we shouldn't do this refactoring, I think.

Enum is an integer, and we want to send byte. The function asserts that the
enum fits a byte. If there's a way to declare byte long enums I would use
that. But I didn't find a way to do that.

The pq_send_logicalrep_msg_type() function seemed a bit overkill to me.

Ah, yes, it is what I meant. I didn't come up with the word "overkill".

The comment in the LogicalRepMsgType enum will sufficiently ensure
nobody is going to accidentally add any bad replication message codes.
And it's not like these are going to be changed often.

Agreed.

Why not simply downcast your enums when calling pq_sendbyte?
There are only a few of them.

e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT);

If you are worried about compiler warning, that explicit cast is not
required. Even if the symbol is larger than 0xff, the upper bytes are
silently truncated off.

I agree with Peter that the prologue of LogicalRepMsgType is enough.

I also agree with Kyotaro that the explicit cast is unnecessary.

All of this together makes the second patch unnecessary, so I have removed it
and instead used Kyotaro's idea from the previous mail.

PFA the updated patch.

--
Best Wishes,
Ashutosh

Attachments:

0001-Enumize-top-level-logical-replication-actions.v2.patch (text/x-patch; charset=US-ASCII, +81 -58)

#18Amit Kapila
amit.kapila16@gmail.com
In reply to: Kyotaro Horiguchi (#13)
Re: Enumize logical replication message actions

On Fri, Oct 23, 2020 at 11:50 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in

On 2020-Oct-22, Ashutosh Bapat wrote:

On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
wrote:

pg_send_logicalrep_msg_type() looks somewhat too-much. If we need
something like that we shouldn't do this refactoring, I think.

Enum is an integer, and we want to send byte. The function asserts that the
enum fits a byte. If there's a way to declare byte long enums I would use
that. But I didn't find a way to do that.

I didn't look at the code, but maybe it's sufficient to add a
StaticAssert?

That check needs to visit all symbols in a enum and confirm that each
of them is in a certain range.

Can we define something like LOGICAL_REP_MSG_LAST (also add a comment
indicating this is a fake message and must be the last one) as the
last and just check that?

--
With Regards,
Amit Kapila.

#19Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Kapila (#18)
Re: Enumize logical replication message actions

On Fri, 23 Oct 2020 at 18:23, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Oct 23, 2020 at 11:50 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in

On 2020-Oct-22, Ashutosh Bapat wrote:

On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
wrote:

pg_send_logicalrep_msg_type() looks somewhat too-much. If we need
something like that we shouldn't do this refactoring, I think.

Enum is an integer, and we want to send byte. The function asserts that the
enum fits a byte. If there's a way to declare byte long enums I would use
that. But I didn't find a way to do that.

I didn't look at the code, but maybe it's sufficient to add a
StaticAssert?

That check needs to visit all symbols in a enum and confirm that each
of them is in a certain range.

Can we define something like LOGICAL_REP_MSG_LAST (also add a comment
indicating this is a fake message and must be the last one) as the
last and just check that?

I don't think that's required once I applied suggestions from Kyotaro and
Peter. Please check the latest patch.
Usually LAST is added to an enum when we need to cap the number of symbols
or want to find the number of symbols. None of that is necessary here. Do
you see any other use?

--
Best Wishes,
Ashutosh

#20Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#19)
Re: Enumize logical replication message actions

On Fri, Oct 23, 2020 at 6:26 PM Ashutosh Bapat
<ashutosh.bapat@2ndquadrant.com> wrote:

On Fri, 23 Oct 2020 at 18:23, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Oct 23, 2020 at 11:50 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in

On 2020-Oct-22, Ashutosh Bapat wrote:

On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
wrote:

pg_send_logicalrep_msg_type() looks somewhat too-much. If we need
something like that we shouldn't do this refactoring, I think.

Enum is an integer, and we want to send byte. The function asserts that the
enum fits a byte. If there's a way to declare byte long enums I would use
that. But I didn't find a way to do that.

I didn't look at the code, but maybe it's sufficient to add a
StaticAssert?

That check needs to visit all symbols in a enum and confirm that each
of them is in a certain range.

Can we define something like LOGICAL_REP_MSG_LAST (also add a comment
indicating this is a fake message and must be the last one) as the
last and just check that?

I don't think that's required once I applied suggestions from Kyotaro and Peter. Please check the latest patch.
Usually LAST is added to an enum when we need to cap the number of symbols or want to find the number of symbols. None of that is necessary here. Do you see any other use?

You mentioned at the beginning that you prefer an enum over #define
so that switch statements can detect any unhandled items. However, I
tried adding an extra enum value at the end without handling it in the
switch, and I didn't get any compilation warning or error. Do we
need something else to detect that at compile time?

Some comments assuming we want to use enum either because I am missing
something or due to some other reason we have not discussed yet.

1.
+ LOGICAL_REP_MSG_STREAM_ABORT = 'A',
+} LogicalRepMsgType;

There is no need for a comma after the last message.

2.
+/*
+ * Logical message types
+ *
+ * Used by logical replication wire protocol.
+ *
+ * Note: though this is an enum it should fit a single byte and should be a
+ * printable character.
+ */
+typedef enum
+{

I think we can expand the comments to say why these need
to fit in a single byte, or what problem it can cause if that rule is
disobeyed. This would make it clear to the next person why we are imposing
such a rule.

3.
+typedef enum
+{
..
+} LogicalRepMsgType;

There are places in code where we use the enum name
(LogicalRepMsgType) both in the start and end. See TypeCat,
CoercionMethod, CoercionCodes, etc. I see places where we use the way
you have in the code. I would prefer the way we have used at places
like TypeCat as that makes it easier to read.

4.
  switch (action)
  {
- /* BEGIN */
- case 'B':
+ case LOGICAL_REP_MSG_BEGIN:
  apply_handle_begin(s);
- break;
- /* COMMIT */
- case 'C':
+ return;

I think we can simply use 'return apply_handle_begin;' instead of
adding return on another line. Again, I think we changed this handling
in apply_dispatch() so that a missing enum value can be detected at
compile time, but at this stage it is not clear to me
whether that actually works.

--
With Regards,
Amit Kapila.

#21Peter Smith
smithpb2250@gmail.com
In reply to: Amit Kapila (#20)
#22Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#21)
#23Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#22)
#24Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Kapila (#20)
#25Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Kapila (#23)
#26Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#24)
#27Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Kapila (#26)
#28Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#27)
#29Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Kapila (#28)
#30Peter Smith
smithpb2250@gmail.com
In reply to: Ashutosh Bapat (#29)
#31Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Smith (#30)
#32Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#31)
#33Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Kapila (#32)