COPYable logs status

Started by Andrew Dunstan over 18 years ago, 20 messages

andrew@dunslane.net

over 18 years ago

[summarising discussion on -patches]

The situation with this patch is that I now have it in a state where I
think it could be applied, but there is one blocker, namely that we do
not have a way of preventing the interleaving of log messages from
different backends, which leads to garbled logs. This is an existing
issue about which we have had complaints, but it becomes critical for a
facility the whole purpose of which is to provide logs in a format
guaranteed to work with our COPY command.

Unfortunately, there is no solution in sight for this problem, certainly
not one which I think can be devised and implemented simply at this
stage of the cycle. The solution we'd like to use, LWLocks, is not
workable in this context. In consequence, I don't think we have any
option but to shelve this item for the time being.

A couple of bugs have been found and fixes identified, during the review
process, so it's not a total loss, but it is nevertheless a pity that we
can't deliver this feature in 8.3.

cheers

andrew

Alvaro Herrera

alvherre@commandprompt.com

over 18 years ago

In reply to: Andrew Dunstan (#1)

Re: COPYable logs status

Andrew Dunstan wrote:

Unfortunately, there is no solution in sight for this problem, certainly
not one which I think can be devised and implemented simply at this
stage of the cycle. The solution we'd like to use, LWLocks, is not
workable in this context. In consequence, I don't think we have any
option but to shelve this item for the time being.

The idea of one pipe per process is not really workable, because it
would mean having as many pipes as backends which does not sound very
good. But how about a mixed approach -- like have all the backends
share a pipe, controlled by an LWLock, and the auxiliary processes have a
separate pipe each?

One thing I haven't understood yet is how having multiple pipes helps on
this issue. Is the logger reading from the pipe and then writing to a
file? (I haven't read the logger code).

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"Endurecerse, pero jamás perder la ternura" (E. Guevara)

Martijn van Oosterhout

kleptog@svana.org

over 18 years ago

In reply to: Andrew Dunstan (#1)

Re: COPYable logs status

On Fri, Jun 08, 2007 at 08:31:54AM -0400, Andrew Dunstan wrote:

The situation with this patch is that I now have it in a state where I
think it could be applied, but there is one blocker, namely that we do
not have a way of preventing the interleaving of log messages from
different backends, which leads to garbled logs. This is an existing
issue about which we have had complaints, but it becomes critical for a
facility the whole purpose of which is to provide logs in a format
guaranteed to work with our COPY command.

The whole semantics of PIPE_BUF should prevent garbling, as long as each
write is a complete set of lines and no more than PIPE_BUF bytes long.
Have we determined the actual cause of the garbling?
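The PIPE_BUF rule mentioned here can be illustrated with a minimal sketch. It assumes POSIX behaviour (a write() of at most PIPE_BUF bytes to a pipe is not interleaved with data from other writers); the helper name write_entry_atomically is hypothetical and is not taken from the PostgreSQL source.

```c
#include <assert.h>
#include <limits.h>		/* PIPE_BUF, where the platform defines it */
#include <string.h>
#include <unistd.h>		/* write(), pipe(), read() */

#ifndef PIPE_BUF
#define PIPE_BUF 512	/* assumption: fall back to the POSIX minimum */
#endif

/*
 * Hypothetical helper: emit one complete log entry with a single write().
 * Entries longer than PIPE_BUF are rejected, because the kernel may then
 * split the write and interleave it with output from other processes.
 * Returns 0 on success, -1 if the entry is empty, too long, or the write
 * was short or failed.
 */
static int
write_entry_atomically(int fd, const char *entry)
{
	size_t		len = strlen(entry);
	ssize_t		rc;

	assert(entry != NULL);
	if (len == 0 || len > PIPE_BUF)
		return -1;

	rc = write(fd, entry, len);
	return (rc == (ssize_t) len) ? 0 : -1;
}
```

Note that the limit applies to the whole entry, including every line of a multi-line message, not to each line separately.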

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

From each according to his ability. To each according to his ability to litigate.

Alvaro Herrera

alvherre@commandprompt.com

over 18 years ago

In reply to: Martijn van Oosterhout (#3)

Re: COPYable logs status

Martijn van Oosterhout wrote:

On Fri, Jun 08, 2007 at 08:31:54AM -0400, Andrew Dunstan wrote:

The situation with this patch is that I now have it in a state where I
think it could be applied, but there is one blocker, namely that we do
not have a way of preventing the interleaving of log messages from
different backends, which leads to garbled logs. This is an existing
issue about which we have had complaints, but it becomes critical for a
facility the whole purpose of which is to provide logs in a format
guaranteed to work with our COPY command.

The whole semantics of PIPE_BUF should prevent garbling, as long as each
write is a complete set of lines and no more than PIPE_BUF bytes long.
Have we determined the actual cause of the garbling?

No, that's the main problem -- but it has been reported to happen on
entries shorter than PIPE_BUF chars.

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"La persona que no quería pecar / estaba obligada a sentarse
en duras y empinadas sillas / desprovistas, por cierto
de blandos atenuantes" (Patricio Vogel)

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Alvaro Herrera (#4)

Re: COPYable logs status

Alvaro Herrera <alvherre@commandprompt.com> writes:

Martijn van Oosterhout wrote:

The whole semantics of PIPE_BUF should prevent garbling, as long as each
write is a complete set of lines and no more than PIPE_BUF bytes long.
Have we determined the actual cause of the garbling?

No, that's the main problem -- but it has been reported to happen on
entries shorter than PIPE_BUF chars.

It's not entirely clear to me whether there's been proven cases of
interpolation *into* a message shorter than PIPE_BUF (and remember
you've got to count all the lines when determining the length).
The message intruding into the other could certainly be shorter.

If there have been such cases, then our theories about what's going on
are all wet, or else there are some rather nasty bugs in some kernels'
pipe handling. So it would be good to pin this down.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Alvaro Herrera (#2)

Re: COPYable logs status

Alvaro Herrera <alvherre@commandprompt.com> writes:

The idea of one pipe per process is not really workable, because it
would mean having as many pipes as backends which does not sound very
good. But how about a mixed approach -- like have all the backends
share a pipe, controlled by an LWLock, and the auxiliary processes have a
separate pipe each?

Multiple pipes seem like a mess, and in any case the above still doesn't
work for stderr output produced by non-cooperative software (dynamic
loader for instance).

The only solution that I can see is to invent some sort of simple
protocol for the syslogger pipe. Assume that the kernel honors PIPE_BUF
(this assumption may need proving, see other message). We could imagine
having elog.c divvy up its writes to the pipe into chunks of less than
PIPE_BUF bytes, where each chunk carries info sufficient to let it be
reassembled. Perhaps something on the order of

\0 \0 2-byte-length source-PID end-flag text...

The syslogger reassembles these by joining messages with the same
origination PID, until it gets one with the end-flag set. It would need
enough code to track multiple in-progress messages.
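As a rough illustration of the sending side of such a protocol, the sketch below splits one message into framed chunks. The byte layout (two NULs, 2-byte length, 4-byte PID, 1-byte end flag), the 512-byte chunk limit standing in for PIPE_BUF, and all names (send_in_chunks, emit_fn, tally_chunk) are assumptions made for the example, not the layout of any actual patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HDR_SIZE	9		/* 2 NULs + 2-byte length + 4-byte PID + 1-byte end flag */
#define CHUNK_MAX	512		/* assumed per-chunk limit, standing in for PIPE_BUF */
#define PAYLOAD_MAX	(CHUNK_MAX - HDR_SIZE)

/* Callback that receives each framed chunk; real code would write() it to the pipe. */
typedef void (*emit_fn) (const unsigned char *chunk, size_t len, void *arg);

/* Split msg into framed chunks and hand each one to emit(); returns the chunk count. */
static size_t
send_in_chunks(int32_t pid, const char *msg, size_t len, emit_fn emit, void *arg)
{
	unsigned char chunk[CHUNK_MAX];
	size_t		nchunks = 0;

	assert(len > 0);
	while (len > 0)
	{
		uint16_t	n = (uint16_t) (len > PAYLOAD_MAX ? PAYLOAD_MAX : len);

		chunk[0] = chunk[1] = '\0';				/* protocol marker */
		memcpy(chunk + 2, &n, sizeof n);		/* payload length */
		memcpy(chunk + 4, &pid, sizeof pid);	/* originating PID */
		chunk[8] = (n == len);					/* end flag: set on the final chunk */
		memcpy(chunk + HDR_SIZE, msg, n);
		emit(chunk, HDR_SIZE + n, arg);
		msg += n;
		len -= n;
		nchunks++;
	}
	return nchunks;
}

/* Example emitter: tallies what was sent instead of writing to a pipe. */
struct tally
{
	size_t		chunks;
	size_t		payload;
	int			last_flag;
};

static void
tally_chunk(const unsigned char *chunk, size_t len, void *arg)
{
	struct tally *t = arg;

	t->chunks++;
	t->payload += len - HDR_SIZE;
	t->last_flag = chunk[8];
}
```

Each emitted chunk, header included, stays within the assumed limit, so a single write() per chunk would remain atomic if the kernel honors PIPE_BUF.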

The logger would have to also be able to deal with random text coming
down the pipe (due to aforesaid non-cooperative software). I would be
inclined to say just take any text not preceded by \0\0 as a standalone
message, up to the next \0\0. Long chunks of non-protocol text would
risk getting treated as multiple messages, but there's probably not a
lot of harm in that.
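The rule for telling protocol chunks apart from stray text can be sketched as a small scanner. This is only an illustration under assumptions: it reuses a hypothetical 9-byte header (two NULs, 2-byte length, 4-byte PID, 1-byte end flag), and the names next_unit and unit_kind are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HDR_SIZE 9	/* assumed: 2 NULs + 2-byte length + 4-byte PID + 1-byte end flag */

typedef enum
{
	UNIT_NEED_MORE,		/* incomplete header or chunk: wait for more input */
	UNIT_CHUNK,			/* a complete protocol chunk */
	UNIT_RAW_TEXT		/* non-protocol text, treated as a standalone message */
} unit_kind;

/*
 * Classify the next unit in buf[0..len) and store its size in *used.
 * Raw text runs up to the next \0\0 marker; a single trailing NUL is left
 * unconsumed because it might be the first half of a marker.
 */
static unit_kind
next_unit(const unsigned char *buf, size_t len, size_t *used)
{
	size_t		i;

	*used = 0;
	if (len == 0)
		return UNIT_NEED_MORE;

	if (buf[0] == '\0' && (len < 2 || buf[1] == '\0'))
	{
		uint16_t	n;

		if (len < HDR_SIZE)
			return UNIT_NEED_MORE;
		memcpy(&n, buf + 2, sizeof n);
		if (len < (size_t) HDR_SIZE + n)
			return UNIT_NEED_MORE;
		*used = (size_t) HDR_SIZE + n;
		return UNIT_CHUNK;
	}

	for (i = 1; i < len; i++)
	{
		if (buf[i] == '\0' && (i + 1 == len || buf[i + 1] == '\0'))
			break;				/* a header starts, or may start, here */
	}
	*used = i;
	return UNIT_RAW_TEXT;
}
```

A reader loop would call this repeatedly, keeping any UNIT_NEED_MORE remainder at the front of its buffer before the next read.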

BTW, exactly what is the COPYable-logs code going to do with random
text? I trust the answer is not "throw it away".

regards, tom lane

Andrew Dunstan

andrew@dunslane.net

over 18 years ago

In reply to: Tom Lane (#5)

Re: COPYable logs status

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Martijn van Oosterhout wrote:

The whole semantics of PIPE_BUF should prevent garbling, as long as each
write is a complete set of lines and no more than PIPE_BUF bytes long.
Have we determined the actual cause of the garbling?

No, that's the main problem -- but it has been reported to happen on
entries shorter than PIPE_BUF chars.

It's not entirely clear to me whether there's been proven cases of
interpolation *into* a message shorter than PIPE_BUF (and remember
you've got to count all the lines when determining the length).
The message intruding into the other could certainly be shorter.

If there have been such cases, then our theories about what's going on
are all wet, or else there are some rather nasty bugs in some kernels'
pipe handling. So it would be good to pin this down.

Right. But we don't split lines into PIPE_BUF sized chunks. And doing so
would make loadable logs possibly rather less pleasant. Ideally we
should be able to deal with this despite the PIPE_BUF restriction on
atomic writes.

cheers

andrew

Andrew Sullivan

ajs@crankycanuck.ca

over 18 years ago

In reply to: Tom Lane (#6)

Re: COPYable logs status

On Fri, Jun 08, 2007 at 10:29:03AM -0400, Tom Lane wrote:

The only solution that I can see is to invent some sort of simple
protocol for the syslogger pipe.

Perhaps having a look at the current IETF syslog discussion will be
helpful in that case? (I know it's not directly relevant, but maybe
others have thought about some of these things. I haven't read the
draft, note.)

http://tools.ietf.org/html/draft-ietf-syslog-protocol-20

There's also the discussion of reliability in RFC 3195:

ftp://ftp.rfc-editor.org/in-notes/rfc3195.txt

--
Andrew Sullivan | ajs@crankycanuck.ca
The whole tendency of modern prose is away from concreteness.
--George Orwell

Matthew T. O'Connor

matthew@zeut.net

over 18 years ago

In reply to: Andrew Dunstan (#1)

Re: COPYable logs status

Andrew Dunstan wrote:

The situation with this patch is that I now have it in a state where I
think it could be applied, but there is one blocker, namely that we do
not have a way of preventing the interleaving of log messages from
different backends, which leads to garbled logs. This is an existing
issue about which we have had complaints, but it becomes critical for a
facility the whole purpose of which is to provide logs in a format
guaranteed to work with our COPY command.

Unfortunately, there is no solution in sight for this problem, certainly
not one which I think can be devised and implemented simply at this
stage of the cycle. The solution we'd like to use, LWLocks, is not
workable in this context. In consequence, I don't think we have any
option but to shelve this item for the time being.

I think this will get shot down, but here goes anyway...

How about creating a log-writing-process? Postmaster could write to the
log files directly until the log-writer is up and running, then all
processes can send their log output through the log-writer.

#10

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Matthew T. O'Connor (#9)

Re: COPYable logs status

"Matthew T. O'Connor" <matthew@zeut.net> writes:

How about creating a log-writing-process? Postmaster could write to the
log files directly until the log-writer is up and running, then all
processes can send their log output through the log-writer.

We *have* a log-writing process. The problem is in getting the data to it.

regards, tom lane

#11

Matthew T. O'Connor

matthew@zeut.net

over 18 years ago

In reply to: Tom Lane (#10)

Re: COPYable logs status

Tom Lane wrote:

"Matthew T. O'Connor" <matthew@zeut.net> writes:

How about creating a log-writing-process? Postmaster could write to the
log files directly until the log-writer is up and running, then all
processes can send their log output through the log-writer.

We *have* a log-writing process. The problem is in getting the data to it.

By that I assume you mean the bgwriter. I thought that was for WAL data;
I didn't think it could, or perhaps should, be used for normal log file
writing. But I also know I'm way outside my comfort area in talking
about this, so excuse the noise if this is way off base.

#12

Markus Schiltknecht

markus@bluegap.ch

over 18 years ago

In reply to: Tom Lane (#10)

Re: COPYable logs status

Hi,

Tom Lane wrote:

We *have* a log-writing process. The problem is in getting the data to it.

Remember the imessages approach I'm using for Postgres-R? It passes
messages around using shared memory and signals the receiver on incoming
data. It's not perfect, sure, but it's a general solution to a common
problem.

Maybe it's worth a thought, instead of fiddling with signals, special
shmem areas and possible races every time the 'getting data to another
process'-problem comes up?

Regards

Markus

#13

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Markus Schiltknecht (#12)

Re: COPYable logs status

Markus Schiltknecht <markus@bluegap.ch> writes:

Tom Lane wrote:

We *have* a log-writing process. The problem is in getting the data to it.

Remember the imessages approach I'm using for Postgres-R? It passes
messages around using shared memory and signals the receiver on incoming
data. It's not perfect, sure, but it's a general solution to a common
problem.

Uh-huh. And how will you get libc's dynamic-link code to buy into
issuing link error messages this way? Not to mention every other bit
of code that might get linked into the backend?

Trapping what comes out of stderr is simply too useful a behavior to lose.

regards, tom lane

#14

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Matthew T. O'Connor (#11)

Re: COPYable logs status

"Matthew T. O'Connor" <matthew@zeut.net> writes:

Tom Lane wrote:

We *have* a log-writing process. The problem is in getting the data to it.

By that I assume you mean the bgwriter, I thought that was for WAL data,

No, I'm talking about src/backend/postmaster/syslogger.c

regards, tom lane

#15

Markus Schiltknecht

markus@bluegap.ch

over 18 years ago

In reply to: Tom Lane (#13)

Re: COPYable logs status

Tom Lane wrote:

Markus Schiltknecht <markus@bluegap.ch> writes:

Tom Lane wrote:

We *have* a log-writing process. The problem is in getting the data to it.

Remember the imessages approach I'm using for Postgres-R? It passes
messages around using shared memory and signals the receiver on incoming
data. It's not perfect, sure, but it's a general solution to a common
problem.

Uh-huh. And how will you get libc's dynamic-link code to buy into
issuing link error messages this way? Not to mention every other bit
of code that might get linked into the backend?

I was referring to the 'getting data to another process' problem. If
that's the problem (as you said upthread) imessages might be a solution.

Trapping what comes out of stderr is simply too useful a behavior to lose.

Sure. I've never said anything against that.

Regards

Markus

#16

Andrew Dunstan

andrew@dunslane.net

over 18 years ago

In reply to: Tom Lane (#6)

Re: COPYable logs status

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

The idea of one pipe per process is not really workable, because it
would mean having as many pipes as backends which does not sound very
good. But how about a mixed approach -- like have all the backends
share a pipe, controlled by an LWLock, and the auxiliary processes have a
separate pipe each?

Multiple pipes seem like a mess, and in any case the above still doesn't
work for stderr output produced by non-cooperative software (dynamic
loader for instance).

The only solution that I can see is to invent some sort of simple
protocol for the syslogger pipe. Assume that the kernel honors PIPE_BUF
(this assumption may need proving, see other message). We could imagine
having elog.c divvy up its writes to the pipe into chunks of less than
PIPE_BUF bytes, where each chunk carries info sufficient to let it be
reassembled. Perhaps something on the order of

\0 \0 2-byte-length source-PID end-flag text...

The syslogger reassembles these by joining messages with the same
origination PID, until it gets one with the end-flag set. It would need
enough code to track multiple in-progress messages.

The logger would have to also be able to deal with random text coming
down the pipe (due to aforesaid non-cooperative software). I would be
inclined to say just take any text not preceded by \0\0 as a standalone
message, up to the next \0\0. Long chunks of non-protocol text would
risk getting treated as multiple messages, but there's probably not a
lot of harm in that.

BTW, exactly what is the COPYable-logs code going to do with random
text? I trust the answer is not "throw it away".

The CSVlog pipe is a separate pipe from the stderr pipe. Anything that
goes to stderr now will continue to go to stderr, wherever that is.

I like this scheme for a couple of reasons:
. it will include the ability to tell the real end of a message
. it will let us handle non-protocol messages (although there shouldn't
be any in the CSVlog pipe).

I'll try to get a patch out for just the stderr case, which should be
back-patchable, then adjust the CSVlog patch to use it.

I'm thinking of handling the partial lines with a small dynahash of
StringInfo buffers, which get discarded whenever we don't have a partial
line for the PID.

cheers

andrew

#17

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Andrew Dunstan (#16)

Re: COPYable logs status

Andrew Dunstan <andrew@dunslane.net> writes:

I'll try to get a patch out for just the stderr case, which should be
back-patchable, then adjust the CSVlog patch to use it.

Sounds like a plan.

I'm thinking of handling the partial lines with a small dynahash of
StringInfo buffers, which get discarded whenever we don't have a partial
line for the PID.

A hashtable might be overkill --- based on reports so far, it's unlikely
you'd have more than two or three messages being received concurrently,
so a simple list or array might be quicker to search.
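The array alternative suggested here might look roughly like the following sketch. It uses fixed-size buffers rather than StringInfo so that it stands alone; the slot count, the capacity, and the names pending_msg, find_slot and add_chunk are all assumptions for illustration, and PIDs are assumed to be nonzero.

```c
#include <assert.h>
#include <string.h>

#define NSLOTS		4		/* assumed: only a few messages are in flight at once */
#define SLOT_CAP	8192	/* assumed maximum size of a reassembled message */

typedef struct
{
	int			pid;		/* 0 means the slot is free */
	size_t		len;
	char		data[SLOT_CAP];
} pending_msg;

static pending_msg slots[NSLOTS];

/* Linear search: the slot already used by pid, else a free one, else NULL. */
static pending_msg *
find_slot(int pid)
{
	pending_msg *free_slot = NULL;
	int			i;

	for (i = 0; i < NSLOTS; i++)
	{
		if (slots[i].pid == pid)
			return &slots[i];
		if (slots[i].pid == 0 && free_slot == NULL)
			free_slot = &slots[i];
	}
	if (free_slot != NULL)
	{
		free_slot->pid = pid;
		free_slot->len = 0;
	}
	return free_slot;
}

/*
 * Append one chunk for pid.  Returns 1 when the message is complete (*msg and
 * *len then describe it, and the slot is released), 0 while more chunks are
 * expected, and -1 when there is no room (a caller would then fall back to
 * writing the chunk out as it is).
 */
static int
add_chunk(int pid, const char *text, size_t n, int is_last,
		  const char **msg, size_t *len)
{
	pending_msg *s = find_slot(pid);

	assert(pid != 0);
	if (s == NULL || s->len + n > SLOT_CAP)
		return -1;
	memcpy(s->data + s->len, text, n);
	s->len += n;
	if (!is_last)
		return 0;
	*msg = s->data;
	*len = s->len;
	s->pid = 0;				/* release the slot for reuse */
	return 1;
}
```

With only a handful of slots the linear scan touches a few integers per chunk, which is the trade-off against a hash table that the reply points at.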

regards, tom lane

#18

FAST PostgreSQL

fastpgs@fast.fujitsu.com.au

over 18 years ago

In reply to: Andrew Dunstan (#16)

Re: COPYable logs status

Andrew Dunstan wrote:

The CSVlog pipe is a separate pipe from the stderr pipe. Anything that
goes to stderr now will continue to go to stderr, wherever that is.

I like this scheme for a couple of reasons:
. it will include the ability to tell the real end of a message
. it will let us handle non-protocol messages (although there shouldn't
be any in the CSVlog pipe).

Another important reason I went for two separate pipes is that on
Windows the pipe calls are blocking calls, so performance really
deteriorates unless we dramatically increase the buffer allocated to
the pipes.

On a rather decent machine, simply running the regression tests would
consume a lot of resources, especially when it comes to the errors tests.

Rgds,
Arul Shaji

#19

Andrew Dunstan

andrew@dunslane.net

over 18 years ago

In reply to: Tom Lane (#17)

1 attachment(s)

Re: [HACKERS] COPYable logs status

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

I'll try to get a patch out for just the stderr case, which should be
back-patchable, then adjust the CSVlog patch to use it.

Sounds like a plan.

I'm thinking of handling the partial lines with a small dynahash of
StringInfo buffers, which get discarded whenever we don't have a partial
line for the PID.

A hashtable might be overkill --- based on reports so far, it's unlikely
you'd have more than two or three messages being received concurrently,
so a simple list or array might be quicker to search.

Attached is a WIP patch ... I still have some debugging to do but I
think the basic logic is there. Comments welcome.

ATM it gets stuck in running installcheck and gdb shows the logger
hanging here:

enlargeStringInfo (str=0x9a91c8, needed=4085) at stringinfo.c:263
263 newlen = 2 * newlen;

Can I not use a StringInfo in the syslogger?

cheers

andrew

Attachments:

logpipe.patch (text/x-patch)

Index: src/backend/postmaster/syslogger.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/syslogger.c,v
retrieving revision 1.31
diff -c -r1.31 syslogger.c
*** src/backend/postmaster/syslogger.c	4 Jun 2007 22:21:42 -0000	1.31
--- src/backend/postmaster/syslogger.c	12 Jun 2007 23:23:38 -0000
***************
*** 42,47 ****
--- 42,48 ----
  #include "utils/guc.h"
  #include "utils/ps_status.h"
  #include "utils/timestamp.h"
+ #include "lib/stringinfo.h"
  
  /*
   * We really want line-buffered mode for logfile output, but Windows does
***************
*** 54,59 ****
--- 55,76 ----
  #define LBF_MODE	_IOLBF
  #endif
  
+ #if PIPE_BUF > 1024
+ #define READ_SIZE PIPE_BUF
+ #else
+ #define READ_SIZE 1024
+ #endif
+ 
+ /* 
+  * we use a buffer twice as big as a read so that if there is a fragment left
+  * after processing what is read we can save it and copy it back before the next
+  * read.
+  */
+ #define READ_BUF_SIZE (2 * READ_SIZE)
+ 
+ /* buffer to keep any partial chunks read between calls to read()/ReadFile() */
+ static char read_fragment[READ_SIZE];
+ static int read_fragment_len = 0;
  
  /*
   * GUC parameters.	Redirect_stderr cannot be changed after postmaster
***************
*** 75,89 ****
   * Private state
   */
  static pg_time_t next_rotation_time;
- 
  static bool redirection_done = false;
- 
  static bool pipe_eof_seen = false;
- 
  static FILE *syslogFile = NULL;
- 
  static char *last_file_name = NULL;
  
  /* These must be exported for EXEC_BACKEND case ... annoying */
  #ifndef WIN32
  int			syslogPipe[2] = {-1, -1};
--- 92,110 ----
   * Private state
   */
  static pg_time_t next_rotation_time;
  static bool redirection_done = false;
  static bool pipe_eof_seen = false;
  static FILE *syslogFile = NULL;
  static char *last_file_name = NULL;
  
+ typedef struct
+ {
+ 	pid_t pid;
+     StringInfoData data;
+ } save_buffer;
+ #define CHUNK_SLOTS 20
+ static save_buffer saved_chunks[CHUNK_SLOTS];
+ 
  /* These must be exported for EXEC_BACKEND case ... annoying */
  #ifndef WIN32
  int			syslogPipe[2] = {-1, -1};
***************
*** 117,123 ****
  static void set_next_rotation_time(void);
  static void sigHupHandler(SIGNAL_ARGS);
  static void sigUsr1Handler(SIGNAL_ARGS);
! 
  
  /*
   * Main entry point for syslogger process
--- 138,144 ----
  static void set_next_rotation_time(void);
  static void sigHupHandler(SIGNAL_ARGS);
  static void sigUsr1Handler(SIGNAL_ARGS);
! static void write_chunk(const char * buffer, int count);
  
  /*
   * Main entry point for syslogger process
***************
*** 244,250 ****
  		bool		time_based_rotation = false;
  
  #ifndef WIN32
! 		char		logbuffer[1024];
  		int			bytesRead;
  		int			rc;
  		fd_set		rfds;
--- 265,271 ----
  		bool		time_based_rotation = false;
  
  #ifndef WIN32
! 		char		logbuffer[READ_BUF_SIZE];
  		int			bytesRead;
  		int			rc;
  		fd_set		rfds;
***************
*** 325,332 ****
  		}
  		else if (rc > 0 && FD_ISSET(syslogPipe[0], &rfds))
  		{
  			bytesRead = piperead(syslogPipe[0],
! 								 logbuffer, sizeof(logbuffer));
  
  			if (bytesRead < 0)
  			{
--- 346,354 ----
  		}
  		else if (rc > 0 && FD_ISSET(syslogPipe[0], &rfds))
  		{
+ 			memcpy(logbuffer, read_fragment, read_fragment_len);
  			bytesRead = piperead(syslogPipe[0],
! 								 logbuffer + read_fragment_len, READ_SIZE);
  
  			if (bytesRead < 0)
  			{
***************
*** 337,343 ****
  			}
  			else if (bytesRead > 0)
  			{
! 				write_syslogger_file(logbuffer, bytesRead);
  				continue;
  			}
  			else
--- 359,365 ----
  			}
  			else if (bytesRead > 0)
  			{
! 				write_syslogger_file(logbuffer, bytesRead + read_fragment_len);
  				continue;
  			}
  			else
***************
*** 349,354 ****
--- 371,380 ----
  				 * and all backends are shut down, and we are done.
  				 */
  				pipe_eof_seen = true;
+ 
+ 				/* if there's a fragment left then force it out now */
+ 				if (read_fragment_len)
+ 					write_chunk(read_fragment, read_fragment_len);
  			}
  		}
  #else							/* WIN32 */
***************
*** 626,631 ****
--- 652,785 ----
  void
  write_syslogger_file(const char *buffer, int count)
  {
+ 	char *cursor = (char *) buffer;
+ 	int  chunklen;
+ 	PipeProto p;
+ 	while (count > 0)
+ 	{
+ 		/* not enough data even for a header? save it until we get more */
+ 		if (count < sizeof(PipeProto))
+ 		{
+ 			memcpy(read_fragment, cursor, count);
+ 			read_fragment_len = count;
+ 			return;
+ 		}
+ 		/* process protocol chunks */
+ 		if ( cursor[0] == '\0' && cursor[1] == '\0' )
+ 		{
+ 			memcpy(&p,cursor,sizeof(PipeProto));
+ 			/* save a partial chunk in the fragment buffer */
+ 			if (p.len + PIPE_DATA_OFFSET > count)
+ 			{
+ 				memcpy(read_fragment, cursor, count);
+ 				read_fragment_len = count;
+ 				return;
+ 			}
+ 			/* 
+ 			 * save a complete non-final chunk in the per-pid buffer 
+ 			 * if possible - if not just write it out.
+ 			 */
+ 			else if ( ! p.is_last )
+ 			{
+ 				int free_slot = -1, existing_slot = -1;
+ 				int i;
+ 				for (i = 0; i < CHUNK_SLOTS; i++)
+ 				{
+ 					if (saved_chunks[i].pid == 0 && free_slot < 0)
+ 						free_slot = i;
+ 					if (saved_chunks[i].pid == p.pid)
+ 					{
+ 						existing_slot = i;
+ 						break;
+ 					}
+ 				}
+ 				if (existing_slot > -1)
+ 				{
+ 					appendBinaryStringInfo(&saved_chunks[existing_slot].data,
+ 										   cursor + PIPE_DATA_OFFSET, p.len);
+ 				}
+ 				else if (free_slot > -1)
+ 				{
+ 					saved_chunks[free_slot].pid = p.pid;
+ 					initStringInfo(&saved_chunks[free_slot].data);
+ 					appendBinaryStringInfo(&saved_chunks[free_slot].data,
+ 										   cursor + PIPE_DATA_OFFSET, p.len);
+ 				}
+ 				else
+ 				{
+ 					/* 
+ 					 * if there is no existing or free slot we'll just have to
+ 					 * take our chances and write out a part message and hope
+ 					 * that it's not followed by something from another pid.
+ 					 */
+ 					write_chunk(cursor + PIPE_DATA_OFFSET, p.len);
+ 				}
+ 				count -= PIPE_DATA_OFFSET + p.len;
+ 				cursor += PIPE_DATA_OFFSET + p.len;
+ 			}
+ 			/* 
+ 			 * add a final chunk to anything saved for that pid, and either way
+ 			 * write the whole thing out.
+ 			 */			   
+ 			else
+ 			{
+ 				int existing_slot = -1;
+ 				int i;
+ 				for (i = 0; i < CHUNK_SLOTS; i++)
+ 				{
+ 					if (saved_chunks[i].pid == p.pid)
+ 					{
+ 						existing_slot = i;
+ 						break;
+ 					}
+ 				}
+ 				if (existing_slot > -1)
+ 				{
+ 					appendBinaryStringInfo(&saved_chunks[existing_slot].data,
+ 										   cursor + PIPE_DATA_OFFSET, p.len);
+ 					write_chunk(saved_chunks[existing_slot].data.data,
+ 								saved_chunks[existing_slot].data.len);
+ 					saved_chunks[existing_slot].pid = 0;
+ 					pfree(saved_chunks[existing_slot].data.data);
+ 				}
+ 				else
+ 				{
+ 					/* the whole message was one chunk, probably. */
+ 					write_chunk(cursor + PIPE_DATA_OFFSET, p.len);
+ 				}
+ 				count -= PIPE_DATA_OFFSET + p.len;
+ 				cursor += PIPE_DATA_OFFSET + p.len;
+ 			}
+ 			
+ 		}
+ 		else /* process non-protocol chunks */
+ 		{
+ 			/* look for the start of a protocol header */
+ 			for(chunklen = 1; chunklen + 1 < count; chunklen++)
+ 			{
+ 				if (cursor[chunklen] == '\0' && cursor[chunklen + 1] == '\0')
+ 				{
+ 					write_chunk(cursor, chunklen);
+ 					cursor += chunklen;
+ 					count -= chunklen;
+ 					break;
+ 				}
+ 			}
+ 			/* if no protocol header, write out the whole remaining buffer */
+ 			if (chunklen + 1 >= count)
+ 			{
+ 				write_chunk(cursor, count);
+ 				read_fragment_len = 0;
+ 				return;
+ 			}
+ 		}
+ 	}
+ 	
+ }
+ 
+ void
+ write_chunk(const char *buffer, int count)
+ {
  	int			rc;
  
  #ifndef WIN32
***************
*** 654,664 ****
  pipeThread(void *arg)
  {
  	DWORD		bytesRead;
! 	char		logbuffer[1024];
  
  	for (;;)
  	{
! 		if (!ReadFile(syslogPipe[0], logbuffer, sizeof(logbuffer),
  					  &bytesRead, 0))
  		{
  			DWORD		error = GetLastError();
--- 808,819 ----
  pipeThread(void *arg)
  {
  	DWORD		bytesRead;
! 	char		logbuffer[READ_BUF_SIZE];
  
  	for (;;)
  	{
! 		memcpy(logbuffer, read_fragment, read_fragment_len);
! 		if (!ReadFile(syslogPipe[0], logbuffer + read_fragment_len, READ_SIZE,
  					  &bytesRead, 0))
  		{
  			DWORD		error = GetLastError();
***************
*** 672,682 ****
  					 errmsg("could not read from logger pipe: %m")));
  		}
  		else if (bytesRead > 0)
! 			write_syslogger_file(logbuffer, bytesRead);
  	}
  
  	/* We exit the above loop only upon detecting pipe EOF */
  	pipe_eof_seen = true;
  	_endthread();
  	return 0;
  }
--- 827,842 ----
  					 errmsg("could not read from logger pipe: %m")));
  		}
  		else if (bytesRead > 0)
! 			write_syslogger_file(logbuffer, bytesRead + read_fragment_len);
  	}
  
  	/* We exit the above loop only upon detecting pipe EOF */
  	pipe_eof_seen = true;
+ 
+ 	/* if there's a fragment left then force it out now */
+ 	if (read_fragment_len)
+ 		write_chunk(read_fragment, read_fragment_len);
+ 
  	_endthread();
  	return 0;
  }
Index: src/backend/utils/error/elog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/error/elog.c,v
retrieving revision 1.186
diff -c -r1.186 elog.c
*** src/backend/utils/error/elog.c	7 Jun 2007 21:45:59 -0000	1.186
--- src/backend/utils/error/elog.c	12 Jun 2007 23:23:43 -0000
***************
*** 56,61 ****
--- 56,62 ----
  #ifdef HAVE_SYSLOG
  #include <syslog.h>
  #endif
+ #include <limits.h>
  
  #include "access/transam.h"
  #include "access/xact.h"
***************
*** 71,76 ****
--- 72,78 ----
  #include "utils/ps_status.h"
  
  
+ 
  /* Global variables */
  ErrorContextCallback *error_context_stack = NULL;
  
***************
*** 124,129 ****
--- 126,135 ----
  static const char *error_severity(int elevel);
  static void append_with_tabs(StringInfo buf, const char *str);
  static bool is_log_level_output(int elevel, int log_min_level);
+ static void write_pipe_chunks(int fd, char * data, int len);
+ 
+ /* allow space for preamble plus a little head room */
+ #define MAX_CHUNK (sizeof(PipeChunk) - sizeof(PipeProto))
  
  
  /*
***************
*** 1783,1789 ****
  			write_eventlog(edata->elevel, buf.data);
  		else
  #endif
! 			fprintf(stderr, "%s", buf.data);
  	}
  
  	/* If in the syslogger process, try to write messages direct to file */
--- 1789,1798 ----
  			write_eventlog(edata->elevel, buf.data);
  		else
  #endif
! 		if (Redirect_stderr)
! 			write_pipe_chunks(fileno(stderr),buf.data, buf.len);
! 		else
! 			write(fileno(stderr), buf.data, buf.len);
  	}
  
  	/* If in the syslogger process, try to write messages direct to file */
***************
*** 1794,1799 ****
--- 1803,1838 ----
  }
  
  
+ static void
+ write_pipe_chunks(int fd, char * data, int len)
+ {
+ 	PipeChunk p;
+ 
+ 	Assert(len > 0);
+ 
+ 	p.proto.nuls[0] = p.proto.nuls[1] = '\0';
+ 	p.proto.pid = MyProcPid;
+ 	p.proto.is_last = false;
+ 	p.proto.len = MAX_CHUNK;
+ 
+ 	write_stderr("total len is %d\n",len);
+ 
+ 	/* write all but the last chunk */
+ 	while (len > MAX_CHUNK)
+ 	{
+ 		memcpy(p.proto.data, data, MAX_CHUNK);
+ 		write(fd, &p, PIPE_DATA_OFFSET + MAX_CHUNK );
+ 		data += MAX_CHUNK;
+ 		len -= MAX_CHUNK;
+ 	}
+ 
+ 	/* write the last chunk */
+ 	p.proto.is_last = true;
+ 	p.proto.len = len;
+ 	memcpy(p.proto.data, data, len);
+ 	write(fd, &p, PIPE_DATA_OFFSET + len);
+ }
+ 
  /*
   * Write error report to client
   */
Index: src/include/postmaster/syslogger.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/postmaster/syslogger.h,v
retrieving revision 1.8
diff -c -r1.8 syslogger.h
*** src/include/postmaster/syslogger.h	5 Jan 2007 22:19:57 -0000	1.8
--- src/include/postmaster/syslogger.h	12 Jun 2007 23:23:48 -0000
***************
*** 37,40 ****
--- 37,58 ----
  extern void SysLoggerMain(int argc, char *argv[]);
  #endif
  
+ /* primitive protocol structure for writing to syslogger pipe(s) */
+ typedef struct 
+ {
+ 	char      nuls[2];    /* always \0\0 */
+ 	uint16    len;        /* size of this chunk */
+ 	pid_t     pid;        /* our pid */
+ 	bool      is_last;    /* is this the last chunk? */
+ 	char      data[1];  
+ } PipeProto;
+ 
+ typedef union 
+ {
+ 	PipeProto    proto;
+ 	char         data[PIPE_BUF];
+ }  PipeChunk;
+ 
+ #define PIPE_DATA_OFFSET offsetof(PipeProto, data) /* 9 usually */
+ 
  #endif   /* _SYSLOGGER_H */

#20

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Andrew Dunstan (#19)

Re: [HACKERS] COPYable logs status

Andrew Dunstan <andrew@dunslane.net> writes:

Can I not use a StringInfo in the syslogger?

Should work, elog.c expects it will --- I'd wonder about premature pfree
or something like that. Are you testing with --enable-cassert?

regards, tom lane