logicalrep_message_type throws an error

Started by Ashutosh Bapatover 2 years ago32 messages
#1Ashutosh Bapat
ashutosh.bapat.oss@gmail.com

Hi All,
logicalrep_message_type() is used to convert logical message type code
into name while prepared error context or details. Thus when this
function is called probably an ERROR is already raised. If
logicalrep_message_type() gets an unknown message type, it will throw
an error, which will suppress the error for which we are building
context or details. That's not useful. I think instead
logicalrep_message_type() should return "unknown" when it encounters
an unknown message type and let the original error message be thrown
as is.

--
Best Wishes,
Ashutosh Bapat

#2Euler Taveira
euler@eulerto.com
In reply to: Ashutosh Bapat (#1)
1 attachment(s)
Re: logicalrep_message_type throws an error

On Mon, Jul 3, 2023, at 7:30 AM, Ashutosh Bapat wrote:

logicalrep_message_type() is used to convert logical message type code
into name while prepared error context or details. Thus when this
function is called probably an ERROR is already raised. If
logicalrep_message_type() gets an unknown message type, it will throw
an error, which will suppress the error for which we are building
context or details. That's not useful. I think instead
logicalrep_message_type() should return "unknown" when it encounters
an unknown message type and let the original error message be thrown
as is.

Hmm. Good catch. The current behavior is:

ERROR: invalid logical replication message type "X"
LOG: background worker "logical replication worker" (PID 71800) exited with exit code 1

... that hides the details. After providing a default message type:

ERROR: invalid logical replication message type "X"
CONTEXT: processing remote data for replication origin "pg_16638" during message type "???" in transaction 796, finished at 0/16266F8

Masahiko, since abc0910e2e0 is your patch maybe you want to take a look at it.

--
Euler Taveira
EDB https://www.enterprisedb.com/

Attachments:

0001-uncover-logical-change-details.patchtext/x-patch; name=0001-uncover-logical-change-details.patchDownload
From aaeb2d7474a30a363b69441397b2d7dd91bfba30 Mon Sep 17 00:00:00 2001
From: Euler Taveira <euler.taveira@enterprisedb.com>
Date: Mon, 3 Jul 2023 08:54:10 -0300
Subject: [PATCH] uncover logical change details

The commit abc0910e2e0 adds logical change details to error context.
However, the function logicalrep_message_type() introduces an
elog(ERROR) that can hide these details. Instead, avoid elog() and use
??? (that is a synonym for unknown). Spotted by Ashutosh Bapat.

Discussion: https://postgr.es/m/CAExHW5suAEDW-mBZt_qu4RVxWZ1vL54-L%2Bci2zreYWebpzxYsA%40mail.gmail.com
---
 src/backend/replication/logical/proto.c | 8 +++-----
 src/include/replication/logicalproto.h  | 2 +-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index f308713275..572ef0a1aa 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -1213,7 +1213,7 @@ logicalrep_read_stream_abort(StringInfo in,
 /*
  * Get string representing LogicalRepMsgType.
  */
-char *
+const char *
 logicalrep_message_type(LogicalRepMsgType action)
 {
 	switch (action)
@@ -1256,9 +1256,7 @@ logicalrep_message_type(LogicalRepMsgType action)
 			return "STREAM ABORT";
 		case LOGICAL_REP_MSG_STREAM_PREPARE:
 			return "STREAM PREPARE";
+		default:
+			return "???";
 	}
-
-	elog(ERROR, "invalid logical replication message type \"%c\"", action);
-
-	return NULL;				/* keep compiler quiet */
 }
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 0ea2df5088..c5be981eae 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -269,6 +269,6 @@ extern void logicalrep_write_stream_abort(StringInfo out, TransactionId xid,
 extern void logicalrep_read_stream_abort(StringInfo in,
 										 LogicalRepStreamAbortData *abort_data,
 										 bool read_abort_info);
-extern char *logicalrep_message_type(LogicalRepMsgType action);
+extern const char *logicalrep_message_type(LogicalRepMsgType action);
 
 #endif							/* LOGICAL_PROTO_H */
-- 
2.30.2

#3Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Euler Taveira (#2)
Re: logicalrep_message_type throws an error

Thanks Euler for the patch.

On Mon, Jul 3, 2023 at 6:32 PM Euler Taveira <euler@eulerto.com> wrote:

Masahiko, since abc0910e2e0 is your patch maybe you want to take a look at it.

A couple of comments.

-char *
+const char *

Nice improvement.

 logicalrep_message_type(LogicalRepMsgType action)
 {
     switch (action)
@@ -1256,9 +1256,7 @@ logicalrep_message_type(LogicalRepMsgType action)
             return "STREAM ABORT";
         case LOGICAL_REP_MSG_STREAM_PREPARE:
             return "STREAM PREPARE";
+        default:
+            return "???";
     }
-
-    elog(ERROR, "invalid logical replication message type \"%c\"", action);
-
-    return NULL;                /* keep compiler quiet */

The switch is on action which is an enum. So without default it will
provide a compilation warning for missing enums. Adding "default" case
defeats that purpose. I think we should just return "???" from outside
switch block.

--
Best Wishes,
Ashutosh Bapat

#4Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Ashutosh Bapat (#3)
1 attachment(s)
Re: logicalrep_message_type throws an error

On Mon, Jul 3, 2023 at 6:52 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Thanks Euler for the patch.

On Mon, Jul 3, 2023 at 6:32 PM Euler Taveira <euler@eulerto.com> wrote:

Masahiko, since abc0910e2e0 is your patch maybe you want to take a look at it.

A couple of comments.

-char *
+const char *

Nice improvement.

logicalrep_message_type(LogicalRepMsgType action)
{
switch (action)
@@ -1256,9 +1256,7 @@ logicalrep_message_type(LogicalRepMsgType action)
return "STREAM ABORT";
case LOGICAL_REP_MSG_STREAM_PREPARE:
return "STREAM PREPARE";
+        default:
+            return "???";
}
-
-    elog(ERROR, "invalid logical replication message type \"%c\"", action);
-
-    return NULL;                /* keep compiler quiet */

The switch is on action which is an enum. So without default it will
provide a compilation warning for missing enums. Adding "default" case
defeats that purpose. I think we should just return "???" from outside
switch block.

PFA patch.

--
Best Wishes,
Ashutosh Bapat

Attachments:

0001-uncover-logical-change-details-20230703.patchtext/x-patch; charset=US-ASCII; name=0001-uncover-logical-change-details-20230703.patchDownload
From e117a1ea78327ebcb85d55fedb3b5e7e3b5c7b27 Mon Sep 17 00:00:00 2001
From: Euler Taveira <euler.taveira@enterprisedb.com>
Date: Mon, 3 Jul 2023 08:54:10 -0300
Subject: [PATCH] uncover logical change details

The commit abc0910e2e0 adds logical change details to error context.
However, the function logicalrep_message_type() introduces an
elog(ERROR) that can hide these details. Instead, avoid elog() and use
??? (that is a synonym for unknown). Spotted by Ashutosh Bapat.

Discussion: https://postgr.es/m/CAExHW5suAEDW-mBZt_qu4RVxWZ1vL54-L%2Bci2zreYWebpzxYsA%40mail.gmail.com
---
 src/backend/replication/logical/proto.c | 11 +++++++----
 src/include/replication/logicalproto.h  |  2 +-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index f308713275..fe6ed9f94b 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -1213,7 +1213,7 @@ logicalrep_read_stream_abort(StringInfo in,
 /*
  * Get string representing LogicalRepMsgType.
  */
-char *
+const char *
 logicalrep_message_type(LogicalRepMsgType action)
 {
 	switch (action)
@@ -1258,7 +1258,10 @@ logicalrep_message_type(LogicalRepMsgType action)
 			return "STREAM PREPARE";
 	}
 
-	elog(ERROR, "invalid logical replication message type \"%c\"", action);
-
-	return NULL;				/* keep compiler quiet */
+	/*
+	 * This function is called to provide context in the error raised when
+	 * applying a logical message. So we can't throw an error here. Return an
+	 * unknown indicator value so that the original error is still reported.
+	 */
+	return "???";
 }
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 0ea2df5088..c5be981eae 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -269,6 +269,6 @@ extern void logicalrep_write_stream_abort(StringInfo out, TransactionId xid,
 extern void logicalrep_read_stream_abort(StringInfo in,
 										 LogicalRepStreamAbortData *abort_data,
 										 bool read_abort_info);
-extern char *logicalrep_message_type(LogicalRepMsgType action);
+extern const char *logicalrep_message_type(LogicalRepMsgType action);
 
 #endif							/* LOGICAL_PROTO_H */
-- 
2.25.1

#5Euler Taveira
euler@eulerto.com
In reply to: Ashutosh Bapat (#4)
Re: logicalrep_message_type throws an error

On Mon, Jul 3, 2023, at 10:57 AM, Ashutosh Bapat wrote:

On Mon, Jul 3, 2023 at 6:52 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

The switch is on action which is an enum. So without default it will
provide a compilation warning for missing enums. Adding "default" case
defeats that purpose. I think we should just return "???" from outside
switch block.

Yeah. I don't think any gcc -Wswitch options have effect if a default is used
so your suggestion is good for wrong/missing message types in the future.

PFA patch.

WFM.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#6Amit Kapila
amit.kapila16@gmail.com
In reply to: Euler Taveira (#2)
Re: logicalrep_message_type throws an error

On Mon, Jul 3, 2023 at 6:32 PM Euler Taveira <euler@eulerto.com> wrote:

On Mon, Jul 3, 2023, at 7:30 AM, Ashutosh Bapat wrote:

logicalrep_message_type() is used to convert logical message type code
into name while prepared error context or details. Thus when this
function is called probably an ERROR is already raised. If
logicalrep_message_type() gets an unknown message type, it will throw
an error, which will suppress the error for which we are building
context or details. That's not useful. I think instead
logicalrep_message_type() should return "unknown" when it encounters
an unknown message type and let the original error message be thrown
as is.

Hmm. Good catch. The current behavior is:

ERROR: invalid logical replication message type "X"
LOG: background worker "logical replication worker" (PID 71800) exited with exit code 1

... that hides the details. After providing a default message type:

ERROR: invalid logical replication message type "X"
CONTEXT: processing remote data for replication origin "pg_16638" during message type "???" in transaction 796, finished at 0/16266F8

I think after returning "???" from logicalrep_message_type(), the
above is possible when we get the error: "invalid logical replication
message type "X"" from apply_dispatch(), right? If so, then what about
the case when we forgot to handle some message in
logicalrep_message_type() but handled it in apply_dispatch()? Isn't it
better to return the 'action' from the function
logicalrep_message_type() for unknown type? That way the information
could be a bit better and we can easily catch the code bug as well.

--
With Regards,
Amit Kapila.

#7Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Kapila (#6)
Re: logicalrep_message_type throws an error

On 2023-Jul-05, Amit Kapila wrote:

I think after returning "???" from logicalrep_message_type(), the
above is possible when we get the error: "invalid logical replication
message type "X"" from apply_dispatch(), right? If so, then what about
the case when we forgot to handle some message in
logicalrep_message_type() but handled it in apply_dispatch()? Isn't it
better to return the 'action' from the function
logicalrep_message_type() for unknown type? That way the information
could be a bit better and we can easily catch the code bug as well.

Are you suggesting that logicalrep_message_type should include the
numerical value of 'action' in the ??? message? Something like this:

ERROR: invalid logical replication message type "X"
CONTEXT: processing remote data for replication origin "pg_16638" during message type "??? (123)" in transaction 796, finished at 0/16266F8

I don't see why not -- seems easy enough, and might help somebody.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Tom: There seems to be something broken here.
Teodor: I'm in sackcloth and ashes... Fixed.
/messages/by-id/482D1632.8010507@sigaev.ru

#8Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#7)
Re: logicalrep_message_type throws an error

On Wed, Jul 5, 2023 at 4:26 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Jul-05, Amit Kapila wrote:

I think after returning "???" from logicalrep_message_type(), the
above is possible when we get the error: "invalid logical replication
message type "X"" from apply_dispatch(), right? If so, then what about
the case when we forgot to handle some message in
logicalrep_message_type() but handled it in apply_dispatch()? Isn't it
better to return the 'action' from the function
logicalrep_message_type() for unknown type? That way the information
could be a bit better and we can easily catch the code bug as well.

Are you suggesting that logicalrep_message_type should include the
numerical value of 'action' in the ??? message? Something like this:

ERROR: invalid logical replication message type "X"
CONTEXT: processing remote data for replication origin "pg_16638" during message type "??? (123)" in transaction 796, finished at 0/16266F8

Yes, something like that would be better.

--
With Regards,
Amit Kapila.

#9Euler Taveira
euler@eulerto.com
In reply to: Alvaro Herrera (#7)
Re: logicalrep_message_type throws an error

On Wed, Jul 5, 2023, at 7:56 AM, Alvaro Herrera wrote:

On 2023-Jul-05, Amit Kapila wrote:

I think after returning "???" from logicalrep_message_type(), the
above is possible when we get the error: "invalid logical replication
message type "X"" from apply_dispatch(), right? If so, then what about
the case when we forgot to handle some message in
logicalrep_message_type() but handled it in apply_dispatch()? Isn't it
better to return the 'action' from the function
logicalrep_message_type() for unknown type? That way the information
could be a bit better and we can easily catch the code bug as well.

Are you suggesting that logicalrep_message_type should include the
numerical value of 'action' in the ??? message? Something like this:

ERROR: invalid logical replication message type "X"
CONTEXT: processing remote data for replication origin "pg_16638" during message type "??? (123)" in transaction 796, finished at 0/16266F8

Isn't this numerical value already exposed in the error message (X = 88)?
In this example, it is:

ERROR: invalid logical replication message type "X"
CONTEXT: processing remote data for replication origin "pg_16638" during message type "??? (88)" in transaction 796, finished at 0/1626698

IMO it could be confusing if we provide two representations of the same data (X
and 88). Since we already provide "X" I don't think we need "88". Another
option, is to remove "X" from apply_dispatch() and add "??? (88)" to
logicalrep_message_type().

Opinions?

--
Euler Taveira
EDB https://www.enterprisedb.com/

#10Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Euler Taveira (#9)
Re: logicalrep_message_type throws an error

On Wed, Jul 5, 2023 at 7:07 PM Euler Taveira <euler@eulerto.com> wrote:

On Wed, Jul 5, 2023, at 7:56 AM, Alvaro Herrera wrote:

On 2023-Jul-05, Amit Kapila wrote:

I think after returning "???" from logicalrep_message_type(), the
above is possible when we get the error: "invalid logical replication
message type "X"" from apply_dispatch(), right? If so, then what about
the case when we forgot to handle some message in
logicalrep_message_type() but handled it in apply_dispatch()?

apply_dispatch() has a default case in switch() so it can
theoretically forget to handle some message type. I think we should
avoid default case in that function to catch missing message type in
that function. But if logicalrep_message_type() doesn't use default
case, it won't forget to handle a known message type. So what you are
suggesting is not possible.

It might happen that the upstream may send an unknown message type
that both apply_dispatch() and logicalrep_message_type() can not
handle.

ERROR: invalid logical replication message type "X"
CONTEXT: processing remote data for replication origin "pg_16638" during message type "??? (88)" in transaction 796, finished at 0/1626698

IMO it could be confusing if we provide two representations of the same data (X
and 88). Since we already provide "X" I don't think we need "88". Another
option, is to remove "X" from apply_dispatch() and add "??? (88)" to
logicalrep_message_type().

I think we don't need message type to be mentioned in the context for
an error about invalid message type. I think what needs to be done is
to set
apply_error_callback_arg.command = 0 before calling ereport in the
default case of apply_dispatch().
apply_error_callback() will just return without providing a context.
If we need a context and have all the other necessary fields, we can
improve apply_error_callback() to provide context when
apply_error_callback_arg.command = 0 .

--
Best Wishes,
Ashutosh Bapat

#11Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Euler Taveira (#9)
Re: logicalrep_message_type throws an error

On 2023-Jul-05, Euler Taveira wrote:

Isn't this numerical value already exposed in the error message (X = 88)?
In this example, it is:

ERROR: invalid logical replication message type "X"
CONTEXT: processing remote data for replication origin "pg_16638" during message type "??? (88)" in transaction 796, finished at 0/1626698

IMO it could be confusing if we provide two representations of the same data (X
and 88). Since we already provide "X" I don't think we need "88". Another
option, is to remove "X" from apply_dispatch() and add "??? (88)" to
logicalrep_message_type().

Opinions?

The CONTEXT message could theoretically appear in other error throws,
not just in "invalid logical replication message". So the duplicity is
not really a problem.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Now I have my system running, not a byte was off the shelf;
It rarely breaks and when it does I fix the code myself.
It's stable, clean and elegant, and lightning fast as well,
And it doesn't cost a nickel, so Bill Gates can go to hell."

#12Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#11)
Re: logicalrep_message_type throws an error

On 2023-Jul-05, Alvaro Herrera wrote:

On 2023-Jul-05, Euler Taveira wrote:

Isn't this numerical value already exposed in the error message (X = 88)?
In this example, it is:

ERROR: invalid logical replication message type "X"
CONTEXT: processing remote data for replication origin "pg_16638" during message type "??? (88)" in transaction 796, finished at 0/1626698

IMO it could be confusing if we provide two representations of the same data (X
and 88). Since we already provide "X" I don't think we need "88".

The CONTEXT message could theoretically appear in other error throws,
not just in "invalid logical replication message". So the duplicity is
not really a problem.

Ah, but you're thinking that if the message type is invalid, then it
will have been rejected in the "invalid logical replication message"
stage, so no invalid message type will be reported. I guess there's a
point to that argument as well.

However, I don't see that having the numerical ASCII value there causes
any harm, even if the char value is already exposed in the other
message. (I'm sure you'll agree that this is quite a minor issue.)

I doubt that each of these two prints of the enum value showing
different formats is confusing. Yes, the enum is defined in terms of
char literals, but if an actually invalid message shows up with an
uint32 value outside the printable ASCII range, the printout might
be ugly or useless.

Another option, is to remove "X" from apply_dispatch() and add "???
(88)" to logicalrep_message_type().

Now *this* would be an actively bad idea IMO.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#13Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#12)
Re: logicalrep_message_type throws an error

On Wed, Jul 5, 2023 at 10:45 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Jul-05, Alvaro Herrera wrote:

On 2023-Jul-05, Euler Taveira wrote:

Isn't this numerical value already exposed in the error message (X = 88)?
In this example, it is:

ERROR: invalid logical replication message type "X"
CONTEXT: processing remote data for replication origin "pg_16638" during message type "??? (88)" in transaction 796, finished at 0/1626698

IMO it could be confusing if we provide two representations of the same data (X
and 88). Since we already provide "X" I don't think we need "88".

The CONTEXT message could theoretically appear in other error throws,
not just in "invalid logical replication message". So the duplicity is
not really a problem.

Ah, but you're thinking that if the message type is invalid, then it
will have been rejected in the "invalid logical replication message"
stage, so no invalid message type will be reported.

Yeah, but it would still be displayed both in context and the actual message.

I guess there's a
point to that argument as well.

One point to note is that the user may also get confused if the actual
ERROR says message type as 'X' and the context says '???'. I feel in
this case duplicate information is better than different information.

--
With Regards,
Amit Kapila.

#14Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#10)
Re: logicalrep_message_type throws an error

On Wed, Jul 5, 2023 at 7:54 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Wed, Jul 5, 2023 at 7:07 PM Euler Taveira <euler@eulerto.com> wrote:

On Wed, Jul 5, 2023, at 7:56 AM, Alvaro Herrera wrote:

On 2023-Jul-05, Amit Kapila wrote:

I think after returning "???" from logicalrep_message_type(), the
above is possible when we get the error: "invalid logical replication
message type "X"" from apply_dispatch(), right? If so, then what about
the case when we forgot to handle some message in
logicalrep_message_type() but handled it in apply_dispatch()?

apply_dispatch() has a default case in switch() so it can
theoretically forget to handle some message type. I think we should
avoid default case in that function to catch missing message type in
that function. But if logicalrep_message_type() doesn't use default
case, it won't forget to handle a known message type. So what you are
suggesting is not possible.

Right, but still I feel it would be better to return actual action.

It might happen that the upstream may send an unknown message type
that both apply_dispatch() and logicalrep_message_type() can not
handle.

ERROR: invalid logical replication message type "X"
CONTEXT: processing remote data for replication origin "pg_16638" during message type "??? (88)" in transaction 796, finished at 0/1626698

IMO it could be confusing if we provide two representations of the same data (X
and 88). Since we already provide "X" I don't think we need "88". Another
option, is to remove "X" from apply_dispatch() and add "??? (88)" to
logicalrep_message_type().

I think we don't need message type to be mentioned in the context for
an error about invalid message type. I think what needs to be done is
to set
apply_error_callback_arg.command = 0 before calling ereport in the
default case of apply_dispatch().
apply_error_callback() will just return without providing a context.

Hmm, this looks a bit hacky to me.

--
With Regards,
Amit Kapila.

#15Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#13)
Re: logicalrep_message_type throws an error

On Thu, Jul 6, 2023 at 6:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

One point to note is that the user may also get confused if the actual
ERROR says message type as 'X' and the context says '???'. I feel in
this case duplicate information is better than different information.

I agree. I think it would be better to show the same string like:

ERROR: invalid logical replication message type "??? (88)"
CONTEXT: processing remote data for replication origin "pg_16638"
during message type "??? (88)" in transaction 796, finished at
0/1626698

Since the numerical value is important only in invalid message type
cases, how about using a format like "??? (88)" in unknown message
type cases, in both error and context messages?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#16Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Masahiko Sawada (#15)
Re: logicalrep_message_type throws an error

On 2023-Jul-11, Masahiko Sawada wrote:

Since the numerical value is important only in invalid message type
cases, how about using a format like "??? (88)" in unknown message
type cases, in both error and context messages?

+1

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Las navajas y los monos deben estar siempre distantes" (Germán Poo)

#17Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#15)
1 attachment(s)
Re: logicalrep_message_type throws an error

On Tue, Jul 11, 2023 at 1:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jul 6, 2023 at 6:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

One point to note is that the user may also get confused if the actual
ERROR says message type as 'X' and the context says '???'. I feel in
this case duplicate information is better than different information.

I agree. I think it would be better to show the same string like:

ERROR: invalid logical replication message type "??? (88)"
CONTEXT: processing remote data for replication origin "pg_16638"
during message type "??? (88)" in transaction 796, finished at
0/1626698

Since the numerical value is important only in invalid message type
cases, how about using a format like "??? (88)" in unknown message
type cases, in both error and context messages?

Do you have something like attached in mind?

--
With Regards,
Amit Kapila.

Attachments:

v2_fix_apply_err_inconsistency.patchapplication/octet-stream; name=v2_fix_apply_err_inconsistency.patchDownload
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index f308713275..a038f80dbd 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -1213,9 +1213,11 @@ logicalrep_read_stream_abort(StringInfo in,
 /*
  * Get string representing LogicalRepMsgType.
  */
-char *
+const char *
 logicalrep_message_type(LogicalRepMsgType action)
 {
+	char	*err_unknown;
+
 	switch (action)
 	{
 		case LOGICAL_REP_MSG_BEGIN:
@@ -1258,7 +1260,12 @@ logicalrep_message_type(LogicalRepMsgType action)
 			return "STREAM PREPARE";
 	}
 
-	elog(ERROR, "invalid logical replication message type \"%c\"", action);
+	/*
+	 * This function is called to provide context in the error raised when
+	 * applying a logical message. So we can't throw an error here. Return an
+	 * unknown indicator value so that the original error is still reported.
+	 */
+	err_unknown = psprintf("??? (%d)", action);
 
-	return NULL;				/* keep compiler quiet */
+	return err_unknown;
 }
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index dd353fd1cb..cb6659fc61 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -3367,7 +3367,7 @@ apply_dispatch(StringInfo s)
 		default:
 			ereport(ERROR,
 					(errcode(ERRCODE_PROTOCOL_VIOLATION),
-					 errmsg("invalid logical replication message type \"%c\"", action)));
+					 errmsg("invalid logical replication message type \"??? (%d)\"", action)));
 	}
 
 	/* Reset the current command */
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 0ea2df5088..c5be981eae 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -269,6 +269,6 @@ extern void logicalrep_write_stream_abort(StringInfo out, TransactionId xid,
 extern void logicalrep_read_stream_abort(StringInfo in,
 										 LogicalRepStreamAbortData *abort_data,
 										 bool read_abort_info);
-extern char *logicalrep_message_type(LogicalRepMsgType action);
+extern const char *logicalrep_message_type(LogicalRepMsgType action);
 
 #endif							/* LOGICAL_PROTO_H */
#18Euler Taveira
euler@eulerto.com
In reply to: Amit Kapila (#17)
Re: logicalrep_message_type throws an error

On Sat, Jul 15, 2023, at 4:27 AM, Amit Kapila wrote:

Do you have something like attached in mind?

WFM. I would change the comment that says

This function is called to provide context in the error ...

to

This message provides context in the error ...

because this comment is not at the beginning of the function but *before* the
message.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#19Amit Kapila
amit.kapila16@gmail.com
In reply to: Euler Taveira (#18)
Re: logicalrep_message_type throws an error

On Sat, Jul 15, 2023 at 7:16 PM Euler Taveira <euler@eulerto.com> wrote:

On Sat, Jul 15, 2023, at 4:27 AM, Amit Kapila wrote:

Do you have something like attached in mind?

WFM. I would change the comment that says

This function is called to provide context in the error ...

to

This message provides context in the error ...

because this comment is not at the beginning of the function but *before* the
message.

Sounds reasonable to me. I'll make this modification before pushing
unless there are more comments/suggestions.

--
With Regards,
Amit Kapila.

#20Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Amit Kapila (#17)
Re: logicalrep_message_type throws an error

On Sat, Jul 15, 2023 at 12:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Since the numerical value is important only in invalid message type
cases, how about using a format like "??? (88)" in unknown message
type cases, in both error and context messages?

Do you have something like attached in mind?

Prologue of psprintf() says

* Errors are not returned to the caller, but are reported via elog(ERROR)
* in the backend, or printf-to-stderr-and-exit() in frontend builds.
* One should therefore think twice about using this in libpq.

If an error occurs in psprintf(), it will throw an error which will
override the original error. I think we should avoid any stuff that
throws further errors.

--
Best Wishes,
Ashutosh Bapat

#21Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Ashutosh Bapat (#20)
Re: logicalrep_message_type throws an error

On 2023-Jul-17, Ashutosh Bapat wrote:

Prologue of psprintf() says

* Errors are not returned to the caller, but are reported via elog(ERROR)
* in the backend, or printf-to-stderr-and-exit() in frontend builds.
* One should therefore think twice about using this in libpq.

If an error occurs in psprintf(), it will throw an error which will
override the original error. I think we should avoid any stuff that
throws further errors.

Ooh, yeah, this is an excellent point. I agree it would be better to
avoid psprintf() here and anything that adds more failure modes. Let's
just do the thing in the original patch you submitted, to ensure all
these strings are compile-time constants; that's likely the most robust.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)

#22Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#21)
Re: logicalrep_message_type throws an error

On Mon, Jul 17, 2023 at 7:54 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Jul-17, Ashutosh Bapat wrote:

Prologue of psprintf() says

* Errors are not returned to the caller, but are reported via elog(ERROR)
* in the backend, or printf-to-stderr-and-exit() in frontend builds.
* One should therefore think twice about using this in libpq.

If an error occurs in psprintf(), it will throw an error which will
override the original error. I think we should avoid any stuff that
throws further errors.

Ooh, yeah, this is an excellent point. I agree it would be better to
avoid psprintf() here and anything that adds more failure modes.

I have tried to check whether we have such usage in any other error
callbacks. Though I haven't scrutinized each and every error callback,
I found a few of them where an error can be raised. For example,

rm_redo_error_callback()->initStringInfo()
CopyFromErrorCallback()->limit_printout_length()
shared_buffer_write_error_callback()->relpathperm()->relpathbackend()->GetRelationPath()->psprintf()

Let's
just do the thing in the original patch you submitted, to ensure all
these strings are compile-time constants; that's likely the most robust.

So will we be okay with something like the below:

ERROR: invalid logical replication message type "??? (88)"
CONTEXT: processing remote data for replication origin "pg_16638"
during message type "???" in transaction 796, finished at
0/1626698

--
With Regards,
Amit Kapila.

#23Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#22)
Re: logicalrep_message_type throws an error

On Tue, Jul 18, 2023 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 17, 2023 at 7:54 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Jul-17, Ashutosh Bapat wrote:

Prologue of psprintf() says

* Errors are not returned to the caller, but are reported via elog(ERROR)
* in the backend, or printf-to-stderr-and-exit() in frontend builds.
* One should therefore think twice about using this in libpq.

If an error occurs in psprintf(), it will throw an error which will
override the original error. I think we should avoid any stuff that
throws further errors.

Ooh, yeah, this is an excellent point. I agree it would be better to
avoid psprintf() here and anything that adds more failure modes.

I have tried to check whether we have such usage in any other error
callbacks. Though I haven't scrutinized each and every error callback,
I found a few of them where an error can be raised. For example,

rm_redo_error_callback()->initStringInfo()
CopyFromErrorCallback()->limit_printout_length()
shared_buffer_write_error_callback()->relpathperm()->relpathbackend()->GetRelationPath()->psprintf()

Let's
just do the thing in the original patch you submitted, to ensure all
these strings are compile-time constants; that's likely the most robust.

Or can we use snprintf() writing "??? (%d)" to a fixed length char[8 +
11] allocated on the stack instead?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#24Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#23)
Re: logicalrep_message_type throws an error

On Tue, Jul 18, 2023 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jul 18, 2023 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 17, 2023 at 7:54 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I have tried to check whether we have such usage in any other error
callbacks. Though I haven't scrutinized each and every error callback,
I found a few of them where an error can be raised. For example,

rm_redo_error_callback()->initStringInfo()
CopyFromErrorCallback()->limit_printout_length()
shared_buffer_write_error_callback()->relpathperm()->relpathbackend()->GetRelationPath()->psprintf()

Let's
just do the thing in the original patch you submitted, to ensure all
these strings are compile-time constants; that's likely the most robust.

Or can we use snprintf() writing "??? (%d)" to a fixed length char[8 +
11] allocated on the stack instead?

In the above size calculation, shouldn't it be 7 + 11 where 7 is for
(3 (???) + 1 for space + 2 for () + 1 for terminating null char) and
11 is for %d? BTW, this avoids dynamic allocation of the err string in
logicalrep_message_type() but we can't return a locally allocated
string, so do you think we should change the prototype of the function
to get this as an argument and then use it both for valid and invalid
cases? I think if there is some simpler way to achieve this then fine,
otherwise, let's return a constant string like "???" from
logicalrep_message_type() for the invalid action.

--
With Regards,
Amit Kapila.

#25Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Amit Kapila (#24)
Re: logicalrep_message_type throws an error

On Wed, Jul 19, 2023 at 9:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jul 18, 2023 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jul 18, 2023 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 17, 2023 at 7:54 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I have tried to check whether we have such usage in any other error
callbacks. Though I haven't scrutinized each and every error callback,
I found a few of them where an error can be raised. For example,

rm_redo_error_callback()->initStringInfo()
CopyFromErrorCallback()->limit_printout_length()
shared_buffer_write_error_callback()->relpathperm()->relpathbackend()->GetRelationPath()->psprintf()

Let's
just do the thing in the original patch you submitted, to ensure all
these strings are compile-time constants; that's likely the most robust.

Or can we use snprintf() writing "??? (%d)" to a fixed length char[8 +
11] allocated on the stack instead?

In the above size calculation, shouldn't it be 7 + 11 where 7 is for
(3 (???) + 1 for space + 2 for () + 1 for terminating null char) and
11 is for %d? BTW, this avoids dynamic allocation of the err string in
logicalrep_message_type() but we can't return a locally allocated
string, so do you think we should change the prototype of the function
to get this as an argument and then use it both for valid and invalid
cases?

There are other places in the code which do something similar by using
statically allocated buffers like static char xya[SIZE]. We could do
that here. The caller may decide whether to pstrdup this buffer
further or just use it one time e.g. as an elog or printf argument.

As I said before, we should not even print message type in the error
context because it's unknown. Repeating that twice is useless. That
will need some changes to apply_error_callback() though.
But I am fine with "???" as well.

--
Best Wishes,
Ashutosh Bapat

#26Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#25)
Re: logicalrep_message_type throws an error

On Wed, Jul 19, 2023 at 10:08 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Wed, Jul 19, 2023 at 9:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jul 18, 2023 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Or can we use snprintf() writing "??? (%d)" to a fixed length char[8 +
11] allocated on the stack instead?

In the above size calculation, shouldn't it be 7 + 11 where 7 is for
(3 (???) + 1 for space + 2 for () + 1 for terminating null char) and
11 is for %d? BTW, this avoids dynamic allocation of the err string in
logicalrep_message_type() but we can't return a locally allocated
string, so do you think we should change the prototype of the function
to get this as an argument and then use it both for valid and invalid
cases?

There are other places in the code which do something similar by using
statically allocated buffers like static char xya[SIZE]. We could do
that here. The caller may decide whether to pstrdup this buffer
further or just use it one time e.g. as an elog or printf argument.

Okay, changed it accordingly. Currently, we call it only from
errcontext, so it looks reasonable to me to use static here.

As I said before, we should not even print message type in the error
context because it's unknown. Repeating that twice is useless. That
will need some changes to apply_error_callback() though.
But I am fine with "???" as well.

I think in the end it won't make a big difference. So, I would like to
go with Sawada-San's suggestion to keep the message type consistent in
actual error and error context unless that requires bigger changes.

--
With Regards,
Amit Kapila.

#27Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#26)
1 attachment(s)
Re: logicalrep_message_type throws an error

On Thu, Jul 20, 2023 at 9:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 19, 2023 at 10:08 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Wed, Jul 19, 2023 at 9:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jul 18, 2023 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Or can we use snprintf() writing "??? (%d)" to a fixed length char[8 +
11] allocated on the stack instead?

In the above size calculation, shouldn't it be 7 + 11 where 7 is for
(3 (???) + 1 for space + 2 for () + 1 for terminating null char) and
11 is for %d? BTW, this avoids dynamic allocation of the err string in
logicalrep_message_type() but we can't return a locally allocated
string, so do you think we should change the prototype of the function
to get this as an argument and then use it both for valid and invalid
cases?

There are other places in the code which do something similar by using
statically allocated buffers like static char xya[SIZE]. We could do
that here. The caller may decide whether to pstrdup this buffer
further or just use it one time e.g. as an elog or printf argument.

Okay, changed it accordingly.

oops, forgot to attach the patch.

--
With Regards,
Amit Kapila.

Attachments:

v3-0001-Fix-the-display-of-UNKNOWN-message-type-in-apply-.patchapplication/octet-stream; name=v3-0001-Fix-the-display-of-UNKNOWN-message-type-in-apply-.patchDownload
From 09ffcdcfb942ba6b2f55b6c8ee3422e3f9222901 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Thu, 20 Jul 2023 08:10:55 +0530
Subject: [PATCH v3] Fix the display of UNKNOWN message type in apply worker.

We include the message type while displaying an error context in the
apply worker. Now, while retrieving the message type string if the
message type is unknown we throw an error that will hide the original
error. So, instead, we need to simply return the string indicating an
unknown message type.
---
 src/backend/replication/logical/proto.c  | 13 ++++++++++---
 src/backend/replication/logical/worker.c |  2 +-
 src/include/replication/logicalproto.h   |  2 +-
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index f308713275..504f94d4a7 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -1213,9 +1213,11 @@ logicalrep_read_stream_abort(StringInfo in,
 /*
  * Get string representing LogicalRepMsgType.
  */
-char *
+const char *
 logicalrep_message_type(LogicalRepMsgType action)
 {
+	static char	err_unknown[20];
+
 	switch (action)
 	{
 		case LOGICAL_REP_MSG_BEGIN:
@@ -1258,7 +1260,12 @@ logicalrep_message_type(LogicalRepMsgType action)
 			return "STREAM PREPARE";
 	}
 
-	elog(ERROR, "invalid logical replication message type \"%c\"", action);
+	/*
+	 * This message provides context in the error raised when applying a
+	 * logical message. So we can't throw an error here. Return an unknown
+	 * indicator value so that the original error is still reported.
+	 */
+	snprintf(err_unknown, sizeof(err_unknown), "??? (%d)", action);
 
-	return NULL;				/* keep compiler quiet */
+	return err_unknown;
 }
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index dd353fd1cb..cb6659fc61 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -3367,7 +3367,7 @@ apply_dispatch(StringInfo s)
 		default:
 			ereport(ERROR,
 					(errcode(ERRCODE_PROTOCOL_VIOLATION),
-					 errmsg("invalid logical replication message type \"%c\"", action)));
+					 errmsg("invalid logical replication message type \"??? (%d)\"", action)));
 	}
 
 	/* Reset the current command */
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 0ea2df5088..c5be981eae 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -269,6 +269,6 @@ extern void logicalrep_write_stream_abort(StringInfo out, TransactionId xid,
 extern void logicalrep_read_stream_abort(StringInfo in,
 										 LogicalRepStreamAbortData *abort_data,
 										 bool read_abort_info);
-extern char *logicalrep_message_type(LogicalRepMsgType action);
+extern const char *logicalrep_message_type(LogicalRepMsgType action);
 
 #endif							/* LOGICAL_PROTO_H */
-- 
2.28.0.windows.1

#28Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Amit Kapila (#27)
Re: logicalrep_message_type throws an error

On Thu, Jul 20, 2023 at 9:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, changed it accordingly.

oops, forgot to attach the patch.

WFM

@@ -3367,7 +3367,7 @@ apply_dispatch(StringInfo s)
         default:
             ereport(ERROR,
                     (errcode(ERRCODE_PROTOCOL_VIOLATION),
-                     errmsg("invalid logical replication message type
\"%c\"", action)));
+                     errmsg("invalid logical replication message type
\"??? (%d)\"", action)));

I think we should report character here since that's what is visible
in the code and also the message types are communicated as characters
not integers. Makes debugging easier. Context may report integer as an
additional data point.

--
Best Wishes,
Ashutosh Bapat

#29Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Ashutosh Bapat (#28)
Re: logicalrep_message_type throws an error

On Fri, Jul 21, 2023 at 1:39 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Thu, Jul 20, 2023 at 9:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, changed it accordingly.

oops, forgot to attach the patch.

WFM

@@ -3367,7 +3367,7 @@ apply_dispatch(StringInfo s)
default:
ereport(ERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
-                     errmsg("invalid logical replication message type
\"%c\"", action)));
+                     errmsg("invalid logical replication message type
\"??? (%d)\"", action)));

I think we should report character here since that's what is visible
in the code and also the message types are communicated as characters
not integers. Makes debugging easier. Context may report integer as an
additional data point.

I think it could confuse us if an invalid message is not a printable character.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#30Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#29)
Re: logicalrep_message_type throws an error

On Fri, Jul 21, 2023 at 6:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jul 21, 2023 at 1:39 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Thu, Jul 20, 2023 at 9:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, changed it accordingly.

oops, forgot to attach the patch.

WFM

@@ -3367,7 +3367,7 @@ apply_dispatch(StringInfo s)
default:
ereport(ERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
-                     errmsg("invalid logical replication message type
\"%c\"", action)));
+                     errmsg("invalid logical replication message type
\"??? (%d)\"", action)));

I think we should report character here since that's what is visible
in the code and also the message types are communicated as characters
not integers. Makes debugging easier. Context may report integer as an
additional data point.

I think it could confuse us if an invalid message is not a printable character.

Right. I'll push and backpatch this till 15 by Tuesday unless you guys
think otherwise.

--
With Regards,
Amit Kapila.

#31Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Amit Kapila (#30)
Re: logicalrep_message_type throws an error

On Sat, Jul 22, 2023 at 10:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think it could confuse us if an invalid message is not a printable character.

That's a good point.

Right. I'll push and backpatch this till 15 by Tuesday unless you guys
think otherwise.

WFM.

--
Best Wishes,
Ashutosh Bapat

#32Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#31)
Re: logicalrep_message_type throws an error

On Mon, Jul 24, 2023 at 6:14 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Sat, Jul 22, 2023 at 10:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Right. I'll push and backpatch this till 15 by Tuesday unless you guys
think otherwise.

WFM.

Pushed.

--
With Regards,
Amit Kapila.