Bugs during ProcessCatchupEvent()

Started by Simon Riggs about 17 years ago, 5 messages
#1 Simon Riggs
simon@2ndQuadrant.com
1 attachment(s)

I notice that if an ERROR occurs during ProcessCatchupEvent() then the
messages back to client get out of sync with each other. I've inserted
an optional error into ProcessCatchupEvent() to show what happens
(attached).

postgres=# begin;
BEGIN
postgres=# d;
ERROR: an error occurred while processing catchup event
postgres=# commit;
ERROR: syntax error at or near "d"
LINE 1: commit;
^
postgres=# commit;
ROLLBACK
postgres=# begin;
WARNING: there is no transaction in progress
COMMIT

Notice how "commit" has been issued twice... and that there is no "d" in
commit. LOL, but :-(

This issue happens to be exactly the same as the one I have while trying
to make SIGINT cancel an idle-in-transaction session. I was looking at
the catchup interrupt to try to learn more about this area of code, only
to find the same problem exists there also. Perhaps there is no
possibility of an ERROR happening during catchup processing, but looking
at the rest of ProcessCatchupEvent(), I doubt it.

(The attached patch allows behaviour to be turned on/off using
synchronous_commit but that has *nothing* to do with this issue and was
chosen to avoid inventing a new switch based on what was in miscadmin.h)

It looks to me that generating a single error message while idle causes
the server to provide ErrorResponse, which the client assumes is the end
of the processing of that statement as defined in FE/BE protocol. Yet the
server continues processing anyway and sends a second response later.

This also behaves differently on some tests, generating an infinite loop
of messages to the log and on the psql client like this:

ERROR: an error occurred while processing catchup event
message type 0x5a arrived from server while idle
ERROR: an error occurred while processing catchup event
message type 0x5a arrived from server while idle
...

having used over 8 minutes of CPU as I post this, with 1 CPU at 100%,
even after the client disconnects.
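
Editorial note, not part of the original mail: the "message type 0x5a" that psql complains about is the ASCII byte 'Z', which in FE/BE protocol 3.0 is the type byte of ReadyForQuery. A minimal sketch of decoding the type bytes relevant to this thread follows; the function name backend_message_name is hypothetical and only a handful of message types are covered.

```c
/*
 * Hypothetical helper: map a backend message type byte to its
 * protocol 3.0 name. Only the types discussed in this thread are
 * listed; everything else falls through to the default case.
 */
static const char *
backend_message_name(unsigned char type)
{
    switch (type)
    {
        case 'C':               /* 0x43 */
            return "CommandComplete";
        case 'E':               /* 0x45 */
            return "ErrorResponse";
        case 'N':               /* 0x4e */
            return "NoticeResponse";
        case 'Z':               /* 0x5a */
            return "ReadyForQuery";
        default:
            return "(not covered by this sketch)";
    }
}
```

Read this way, the looping output above is consistent with the server repeatedly sending an ErrorResponse followed by a ReadyForQuery that the idle client did not ask for.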

Thoughts, go-look-theres or other comments welcome.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachments:

c2 (text/x-patch; charset=UTF-8; name=c2)
Index: src/backend/storage/ipc/sinval.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/storage/ipc/sinval.c,v
retrieving revision 1.89
diff -c -r1.89 sinval.c
*** src/backend/storage/ipc/sinval.c	1 Jan 2009 17:23:47 -0000	1.89
--- src/backend/storage/ipc/sinval.c	5 Jan 2009 19:03:45 -0000
***************
*** 303,308 ****
--- 303,311 ----
  	/* Must prevent SIGUSR2 interrupt while I am running */
  	notify_enabled = DisableNotifyInterrupt();
  
+ 	if (!XactSyncCommit)
+ 		elog(ERROR, "an error occurred while processing catchup event");
+ 
  	/*
  	 * What we need to do here is cause ReceiveSharedInvalidMessages() to run,
  	 * which will do the necessary work and also reset the
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#1)
Re: Bugs during ProcessCatchupEvent()

Simon Riggs <simon@2ndQuadrant.com> writes:

> It looks to me that generating a single error message while idle causes
> the server to provide ErrorResponse, which the client assumes is the end
> of the processing of that statement as defined in FE/BE protocol.

Yeah. I think this is actually a client-side issue: it should keep
reading till it gets a 'Z' message. Not clear how that fits into the
libpq-to-app API though.

regards, tom lane
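
Editorial sketch, not part of the original mail: the "keep reading till it gets a 'Z' message" rule can be illustrated on a raw byte buffer. In protocol 3.0 each backend message is a one-byte type followed by a four-byte big-endian length that counts itself but not the type byte. The names read_be32, consume_until_ready and sample_stream are hypothetical, and this is not how libpq is implemented; it only shows the framing rule under those assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the 4-byte big-endian length field of a message header. */
static uint32_t
read_be32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/*
 * Hypothetical helper: skip whole messages in buf until a ReadyForQuery
 * ('Z') message has been consumed. Returns the number of bytes that make
 * up this response cycle, or 0 if the buffer is malformed or does not yet
 * hold a complete cycle. *saw_error is set to 1 if an ErrorResponse ('E')
 * was seen along the way, otherwise 0.
 */
static size_t
consume_until_ready(const unsigned char *buf, size_t len, int *saw_error)
{
    size_t pos = 0;

    *saw_error = 0;
    while (len - pos >= 5)      /* type byte + length field */
    {
        unsigned char type = buf[pos];
        uint32_t msglen = read_be32(buf + pos + 1);

        if (msglen < 4 || msglen > len - pos - 1)
            return 0;           /* malformed, or message not fully buffered */
        if (type == 'E')
            *saw_error = 1;     /* an error does NOT end the cycle */
        pos += 1 + (size_t) msglen;
        if (type == 'Z')
            return pos;         /* only ReadyForQuery ends the cycle */
    }
    return 0;
}

/*
 * Made-up sample traffic: ErrorResponse + ReadyForQuery, then
 * CommandComplete("COMMIT") + ReadyForQuery.
 */
static const unsigned char sample_stream[] = {
    'E', 0, 0, 0, 18, 'S', 'E', 'R', 'R', 'O', 'R', 0,
    'M', 'b', 'o', 'o', 'm', 0, 0,
    'Z', 0, 0, 0, 5, 'T',
    'C', 0, 0, 0, 11, 'C', 'O', 'M', 'M', 'I', 'T', 0,
    'Z', 0, 0, 0, 5, 'I'
};
```

A client that stopped at the 'E' message would treat the first 'Z' as the start of the answer to its next query, which matches the off-by-one responses in the psql session at the top of the thread. Calling consume_until_ready once per query keeps the two cycles apart.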

#3 Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#2)
Re: Bugs during ProcessCatchupEvent()

On Tue, 2009-01-06 at 09:44 -0500, Tom Lane wrote:

> Simon Riggs <simon@2ndQuadrant.com> writes:

> > It looks to me that generating a single error message while idle causes
> > the server to provide ErrorResponse, which the client assumes is the end
> > of the processing of that statement as defined in FE/BE protocol.

> Yeah. I think this is actually a client-side issue: it should keep
> reading till it gets a 'Z' message. Not clear how that fits into the
> libpq-to-app API though.

That makes sense. I'll dig around there.

The infinite loop error seems server-side though.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#4 Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#3)
Re: Bugs during ProcessCatchupEvent()

Simon Riggs wrote:

> On Tue, 2009-01-06 at 09:44 -0500, Tom Lane wrote:

> > Simon Riggs <simon@2ndQuadrant.com> writes:

> > > It looks to me that generating a single error message while idle causes
> > > the server to provide ErrorResponse, which the client assumes is the end
> > > of the processing of that statement as defined in FE/BE protocol.

> > Yeah. I think this is actually a client-side issue: it should keep
> > reading till it gets a 'Z' message. Not clear how that fits into the
> > libpq-to-app API though.

> That makes sense. I'll dig around there.

> The infinite loop error seems server-side though.

Any progress on this?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#5 Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#4)
Re: Bugs during ProcessCatchupEvent()

On Wed, 2009-02-04 at 14:53 -0500, Bruce Momjian wrote:

> Simon Riggs wrote:

> > On Tue, 2009-01-06 at 09:44 -0500, Tom Lane wrote:

> > > Simon Riggs <simon@2ndQuadrant.com> writes:

> > > > It looks to me that generating a single error message while idle causes
> > > > the server to provide ErrorResponse, which the client assumes is the end
> > > > of the processing of that statement as defined in FE/BE protocol.

> > > Yeah. I think this is actually a client-side issue: it should keep
> > > reading till it gets a 'Z' message. Not clear how that fits into the
> > > libpq-to-app API though.

> > That makes sense. I'll dig around there.

> > The infinite loop error seems server-side though.

> Any progress on this?

No, not had time in recent days.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support