write_pipe_chunks patch messes up early error message output

Started by Tom Laneover 18 years ago16 messages
#1Tom Lane
tgl@sss.pgh.pa.us

[ ok, let's try this again --- apparently there's something in my mail
software that dislikes embedded nulls ]

I was experimenting with what happened if I made the pg_log directory
unwritable, and found out that this comes out on stderr:

<NUL><NUL><NUL>_<NUL><NUL>J3tFATAL: could not create log file "pg_log/postgresql-2007-07-15_210716.log": Permission denied

The problem is that elog.c thinks that the GUC variable Redirect_stderr
tells it whether to use the pipe chunk protocol on stderr, but in fact
that should *not* happen until we have successfully launched the syslogger
and repointed stderr to the pipe.

I suspect that error messages coming out of the syslogger itself (and
directed to the original stderr destination) will be similarly broken.

So that patch still needs work. I think we probably need a flag
variable separate from the GUC variable to tell when to send using
the chunk protocol.

regards, tom lane

#2Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#1)
Re: write_pipe_chunks patch messes up early error message output

Tom Lane wrote:

I suspect that error messages coming out of the syslogger itself (and
directed to the original stderr destination) will be similarly broken.

I thought we had that case handled, but I could be wrong.

So that patch still needs work.

Yes, darnit.

I think we probably need a flag
variable separate from the GUC variable to tell when to send using
the chunk protocol.

Sounds reasonable. I will try to get to it this week.

cheers

andrew

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#1)
Re: write_pipe_chunks patch messes up early error message output

Tom Lane wrote:

I think we probably need a flag
variable separate from the GUC variable to tell when to send using
the chunk protocol.

Is there any reason we can't just use a check on whether SysLoggerPID is
not 0? It should only be set if the syslogger has in fact started.

cheers

andrew

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#3)
Re: write_pipe_chunks patch messes up early error message output

Andrew Dunstan <andrew@dunslane.net> writes:

Is there any reason we can't just use a check on whether SysLoggerPID is
not 0?

(a) that really shouldn't be exported out of postmaster.c, and (b) it is
not readily available to child backends is it?

regards, tom lane

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#4)
Re: write_pipe_chunks patch messes up early error message output

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Is there any reason we can't just use a check on whether SysLoggerPID is
not 0?

(a) that really shouldn't be exported out of postmaster.c, and (b) it is
not readily available to child backends is it?

It's already used in elog.c in Win32 code:

if ((!Redirect_stderr || am_syslogger ||
(!IsUnderPostmaster && SysLoggerPID==0)) &&
pgwin32_is_service())
write_eventlog(edata->elevel, buf.data);

Child backends might have an out of date version if we restart the
Syslogger, but would that matter in this case? For current purposes all
we need is to know that the syslogger has in fact started, ISTM.

If that makes you puke we can do something more elegant, but I suspect
it will amount to the same thing.

cheers

andrew

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#5)
Re: write_pipe_chunks patch messes up early error message output

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

(a) that really shouldn't be exported out of postmaster.c, and (b) it is
not readily available to child backends is it?

It's already used in elog.c in Win32 code:

if ((!Redirect_stderr || am_syslogger ||
(!IsUnderPostmaster && SysLoggerPID==0)) &&
pgwin32_is_service())
write_eventlog(edata->elevel, buf.data);

Child backends might have an out of date version if we restart the
Syslogger, but would that matter in this case?

This code is already too ugly to live :-(.

But aside from esthetics, there is a functional reason to have a
separate flag variable. Consider the transient state where the
syslogger has failed and we are trying to start a new one. If the
postmaster wishes to elog anything (like, say, the log entry about
the syslogger having failed) in this interval, then it *should*
use the chunk protocol, because we expect that the data will
eventually be eaten by the new syslogger.

I think offhand that the correct semantics of the flag are "we have
redirected our original stderr into a pipe for syslogger", and in
fact that we should transition the output format exactly at the
instant where we do that; the starting of the child process happens
at a slightly different time, and restarting of the child (if needed)
is yet a different issue.

Another thing that tracking such a flag would help us clean up is
the syslogger's own elogging behavior. IIRC the "original" syslogger
is launched with its stderr pointing to the original stderr, and so
it's useful for any messages generated by syslogger itself to be copied
onto that stderr. After a relaunch, though, this is no longer possible
and it'd probably be best if syslogger doesn't even try writing to its
stderr.

regards, tom lane

#7Zeugswetter Andreas ADI SD
Andreas.Zeugswetter@s-itsolutions.at
In reply to: Tom Lane (#4)
Re: write_pipe_chunks patch messes up early error message output

Is there any reason we can't just use a check on whether

SysLoggerPID

is not 0?

(a) that really shouldn't be exported out of postmaster.c,
and (b) it is not readily available to child backends is it?

Should there be child backends when the logger did not start ?
I'd think startup would be aborted if that happed ?

Andreas

#8Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#6)
Re: write_pipe_chunks patch messes up early error message output

Tom Lane wrote:

I think offhand that the correct semantics of the flag are "we have
redirected our original stderr into a pipe for syslogger", and in
fact that we should transition the output format exactly at the
instant where we do that; the starting of the child process happens
at a slightly different time, and restarting of the child (if needed)
is yet a different issue.

We could expose syslogger's redirection_done flag, which I think has the
semantics you want. AFAICS it is never unset once set. (I should note
that the distance between the point where this is set and where
SysLoggerPID is set on return from SysLogger_Start is negligible, and no
logging statements are called there, but I take your point about
continuing to use chunking during a syslogger restart when SysLoggerPID
might be 0.)

cheers

andrew

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#8)
Re: write_pipe_chunks patch messes up early error message output

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

I think offhand that the correct semantics of the flag are "we have
redirected our original stderr into a pipe for syslogger",

We could expose syslogger's redirection_done flag, which I think has the
semantics you want.

Yeah, that would work. You'd have to get rid of the current ad-hoc
method by which it is propagated to the syslogger child process
(EXEC_BACKEND case), because now it will have to be propagated to all
children; so postmaster.c should handle it in BackendParameters.

regards, tom lane

#10Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#9)
Re: write_pipe_chunks patch messes up early error message output

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

I think offhand that the correct semantics of the flag are "we have
redirected our original stderr into a pipe for syslogger",

We could expose syslogger's redirection_done flag, which I think has the
semantics you want.

Yeah, that would work. You'd have to get rid of the current ad-hoc
method by which it is propagated to the syslogger child process
(EXEC_BACKEND case), because now it will have to be propagated to all
children; so postmaster.c should handle it in BackendParameters.

The problem with this as it stands is that the syslogger itself is
forked before the redirection is done. I guess we need to make sure the
syslogger itself never calls write_pipe_chunks() - which makes sense
anyway - should probably call write_syslogger_file() directly, I think.

cheers

andrew

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#10)
Re: write_pipe_chunks patch messes up early error message output

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

Yeah, that would work. You'd have to get rid of the current ad-hoc
method by which it is propagated to the syslogger child process
(EXEC_BACKEND case), because now it will have to be propagated to all
children; so postmaster.c should handle it in BackendParameters.

The problem with this as it stands is that the syslogger itself is
forked before the redirection is done.

Which is entirely correct. Re-read what I said about first launch vs
relaunch of the syslogger. Its stderr will be connected differently in
the two cases, and should be handled differently --- we want the first
launch to try to report problems on its own stderr, but there's no point
after a relaunch. That's why we pass down redirection_done to it.

regards, tom lane

#12Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#11)
Re: write_pipe_chunks patch messes up early error message output

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

Yeah, that would work. You'd have to get rid of the current ad-hoc
method by which it is propagated to the syslogger child process
(EXEC_BACKEND case), because now it will have to be propagated to all
children; so postmaster.c should handle it in BackendParameters.

The problem with this as it stands is that the syslogger itself is
forked before the redirection is done.

Which is entirely correct. Re-read what I said about first launch vs
relaunch of the syslogger. Its stderr will be connected differently in
the two cases, and should be handled differently --- we want the first
launch to try to report problems on its own stderr, but there's no point
after a relaunch. That's why we pass down redirection_done to it.

What I was trying to nut out was how to handle logging from the first
launched syslogger after redirection is done.

Or, looking at it another way, why would we ever want the syslogger to
use the chunking protocol at all?

cheers

andrew

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#12)
Re: write_pipe_chunks patch messes up early error message output

Andrew Dunstan <andrew@dunslane.net> writes:

Or, looking at it another way, why would we ever want the syslogger to
use the chunking protocol at all?

Ah, I misunderstood you. Yeah, I think you are right: if we are
special-casing the syslogger process anyway, then it need only have
these two behaviors:

not redirection_done: write to own stderr (not chunked) and directly to
file

redirection_done: write directly to file

One thing to watch out for is infinite recursion if the write-to-file
gets an error. I don't remember if we have a defense against that
in there now, but we probably should.

regards, tom lane

#14Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#13)
1 attachment(s)
Re: write_pipe_chunks patch messes up early error message output

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Or, looking at it another way, why would we ever want the syslogger to
use the chunking protocol at all?

Ah, I misunderstood you. Yeah, I think you are right: if we are
special-casing the syslogger process anyway, then it need only have
these two behaviors:

not redirection_done: write to own stderr (not chunked) and directly to
file

redirection_done: write directly to file

One thing to watch out for is infinite recursion if the write-to-file
gets an error. I don't remember if we have a defense against that
in there now, but we probably should.

I think we do. write_syslogger_file says:

/* can't use ereport here because of possible recursion */

Anyway, I think the attached patch will do what we need.

cheers

andrew

Attachments:

chunkfix.patchtext/x-patch; name=chunkfix.patchDownload
Index: src/backend/postmaster/postmaster.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/postmaster.c,v
retrieving revision 1.532
diff -c -r1.532 postmaster.c
*** src/backend/postmaster/postmaster.c	11 Jul 2007 08:27:33 -0000	1.532
--- src/backend/postmaster/postmaster.c	19 Jul 2007 15:53:39 -0000
***************
*** 203,210 ****
  			BgWriterPID = 0,
  			AutoVacPID = 0,
  			PgArchPID = 0,
! 			PgStatPID = 0;
! pid_t			SysLoggerPID = 0; /* Needs to be accessed from elog.c */
  
  /* Startup/shutdown state */
  #define			NoShutdown		0
--- 203,210 ----
  			BgWriterPID = 0,
  			AutoVacPID = 0,
  			PgArchPID = 0,
!         	PgStatPID = 0,
!     		SysLoggerPID = 0;
  
  /* Startup/shutdown state */
  #define			NoShutdown		0
***************
*** 218,223 ****
--- 218,225 ----
  bool		ClientAuthInProgress = false;		/* T during new-client
  												 * authentication */
  
+ bool redirection_done = false; 
+ 
  /* received START_AUTOVAC_LAUNCHER signal */
  static bool start_autovac_launcher = false;
  
***************
*** 332,337 ****
--- 334,340 ----
  	InheritableSocket pgStatSock;
  	pid_t		PostmasterPid;
  	TimestampTz PgStartTime;
+ 	bool        redirection_done;
  #ifdef WIN32
  	HANDLE		PostmasterHandle;
  	HANDLE		initial_signal_pipe;
***************
*** 3953,3958 ****
--- 3956,3963 ----
  	param->PostmasterPid = PostmasterPid;
  	param->PgStartTime = PgStartTime;
  
+ 	param->redirection_done = redirection_done;
+ 
  #ifdef WIN32
  	param->PostmasterHandle = PostmasterHandle;
  	write_duplicated_handle(&param->initial_signal_pipe,
***************
*** 4156,4161 ****
--- 4161,4168 ----
  	PostmasterPid = param->PostmasterPid;
  	PgStartTime = param->PgStartTime;
  
+ 	redirection_done = param->redirection_done;
+ 
  #ifdef WIN32
  	PostmasterHandle = param->PostmasterHandle;
  	pgwin32_initial_signal_pipe = param->initial_signal_pipe;
Index: src/backend/postmaster/syslogger.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/syslogger.c,v
retrieving revision 1.32
diff -c -r1.32 syslogger.c
*** src/backend/postmaster/syslogger.c	14 Jun 2007 01:48:51 -0000	1.32
--- src/backend/postmaster/syslogger.c	19 Jul 2007 15:53:39 -0000
***************
*** 79,89 ****
   */
  bool		am_syslogger = false;
  
  /*
   * Private state
   */
  static pg_time_t next_rotation_time;
- static bool redirection_done = false;
  static bool pipe_eof_seen = false;
  static FILE *syslogFile = NULL;
  static char *last_file_name = NULL;
--- 79,90 ----
   */
  bool		am_syslogger = false;
  
+ extern bool redirection_done;
+ 
  /*
   * Private state
   */
  static pg_time_t next_rotation_time;
  static bool pipe_eof_seen = false;
  static FILE *syslogFile = NULL;
  static char *last_file_name = NULL;
***************
*** 582,595 ****
  		snprintf(numbuf[bufc++], 32, "%d", fileno(syslogFile));
  	else
  		strcpy(numbuf[bufc++], "-1");
- 	snprintf(numbuf[bufc++], 32, "%d", (int) redirection_done);
  #else							/* WIN32 */
  	if (syslogFile != NULL)
  		snprintf(numbuf[bufc++], 32, "%ld",
  				 _get_osfhandle(_fileno(syslogFile)));
  	else
  		strcpy(numbuf[bufc++], "0");
- 	snprintf(numbuf[bufc++], 32, "%d", (int) redirection_done);
  #endif   /* WIN32 */
  
  	/* Add to the arg list */
--- 583,594 ----
***************
*** 623,629 ****
  		syslogFile = fdopen(fd, "a");
  		setvbuf(syslogFile, NULL, LBF_MODE, 0);
  	}
- 	redirection_done = (bool) atoi(*argv++);
  #else							/* WIN32 */
  	fd = atoi(*argv++);
  	if (fd != 0)
--- 622,627 ----
***************
*** 635,641 ****
  			setvbuf(syslogFile, NULL, LBF_MODE, 0);
  		}
  	}
- 	redirection_done = (bool) atoi(*argv++);
  #endif   /* WIN32 */
  }
  #endif   /* EXEC_BACKEND */
--- 633,638 ----
Index: src/backend/utils/error/elog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/error/elog.c,v
retrieving revision 1.187
diff -c -r1.187 elog.c
*** src/backend/utils/error/elog.c	14 Jun 2007 01:48:51 -0000	1.187
--- src/backend/utils/error/elog.c	19 Jul 2007 15:53:41 -0000
***************
*** 76,82 ****
  
  sigjmp_buf *PG_exception_stack = NULL;
  
! extern pid_t SysLoggerPID;
  
  /* GUC parameters */
  PGErrorVerbosity Log_error_verbosity = PGERROR_VERBOSE;
--- 76,82 ----
  
  sigjmp_buf *PG_exception_stack = NULL;
  
! extern bool redirection_done;
  
  /* GUC parameters */
  PGErrorVerbosity Log_error_verbosity = PGERROR_VERBOSE;
***************
*** 1780,1790 ****
  		 * that's really a pipe to the syslogger process. Unless we're in the
  		 * postmaster, and the syslogger process isn't started yet.
  		 */
! 		if ((!Redirect_stderr || am_syslogger || (!IsUnderPostmaster && SysLoggerPID==0)) && pgwin32_is_service())
  			write_eventlog(edata->elevel, buf.data);
  		else
  #endif
! 			if (Redirect_stderr)
  				write_pipe_chunks(fileno(stderr), buf.data, buf.len);
  			else
  				write(fileno(stderr), buf.data, buf.len);
--- 1780,1795 ----
  		 * that's really a pipe to the syslogger process. Unless we're in the
  		 * postmaster, and the syslogger process isn't started yet.
  		 */
! 		if (pgwin32_is_service && 
! 			(!Redirect_stderr || !redirection_done || am_syslogger))
  			write_eventlog(edata->elevel, buf.data);
  		else
  #endif
! 			/* only use the chunking protocol if we know the syslogger should
! 			 * be catching stderr output, and we are not ourselves the
! 			 * syslogger. Otherwise, go directly to stderr.
! 			 */
! 			if (Redirect_stderr && redirection_done && !am_syslogger)
  				write_pipe_chunks(fileno(stderr), buf.data, buf.len);
  			else
  				write(fileno(stderr), buf.data, buf.len);
#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#14)
Re: write_pipe_chunks patch messes up early error message output

Andrew Dunstan <andrew@dunslane.net> writes:

Anyway, I think the attached patch will do what we need.

I think you could leave Redirect_stderr out of the elog.c tests
entirely, since redirection_done can never become set without it.

Also, you introduced a bug: pgwin32_is_service is a function no?
Needs invocation parens.

Otherwise looks reasonable.

regards, tom lane

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#15)
Re: write_pipe_chunks patch messes up early error message output

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Anyway, I think the attached patch will do what we need.

I think you could leave Redirect_stderr out of the elog.c tests
entirely, since redirection_done can never become set without it.

Also, you introduced a bug: pgwin32_is_service is a function no?
Needs invocation parens.

Otherwise looks reasonable.

Thanks. Applied with those fixes, back as far as 8.0.

cheers

andrew