log chunking broken with large queries under load

Started by Andrew Dunstan · almost 14 years ago · 13 messages
#1 Andrew Dunstan
andrew@dunslane.net

Some of my PostgreSQL Experts colleagues have been complaining to me
that servers under load with very large queries produce corrupted CSV
log files, because lines from different backends are apparently
interleaved. The log chunking protocol between the elog routines and
the syslogger is supposed to prevent that, so I did a little work to
try to reproduce it in a controlled way. On my dual quad-core Xeon
setup, this script:

#!/bin/sh
par=$1
seq=$2

sed 2000q /usr/share/dict/words > words

psql -q -c 'drop table if exists foo'
psql -q -c 'create table foo (t text)'

echo '\set words `cat words`' > wordsin.sql
echo 'prepare fooplan (text) as insert into foo values ($1);' >> wordsin.sql

for i in `seq 1 $seq`; do
    echo "execute fooplan(:'words');" >> wordsin.sql
done

for i in `seq 1 $par`; do
    psql -q -t -f wordsin.sql &
done
wait

called with parameters of 100 and 50 (i.e. 100 simultaneous clients each
doing 50 very large inserts) is enough to cause CSV log corruption quite
reliably on PostgreSQL 9.1.
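
For anyone who hasn't looked at this code lately: a backend writes each
message to the logging pipe in chunks of at most PIPE_MAX_PAYLOAD
bytes, and the syslogger reassembles them by pid, so each of these
multi-kilobyte INSERTs arrives as a long run of chunks. Roughly,
paraphrasing the protocol header in src/include/postmaster/syslogger.h
(the exact layout may vary by release):

typedef struct
{
    char        nuls[2];        /* always \0\0, to help detect garbage */
    uint16      len;            /* size of this chunk's payload */
    int32       pid;            /* writer's pid: the reassembly key */
    char        is_last;        /* 'f'/'F' = more chunks follow, 't'/'T' =
                                 * final chunk; uppercase means CSV log */
    char        data[1];        /* payload starts here */
} PipeProtoChunk;

#define PIPE_CHUNK_SIZE     512
#define PIPE_HEADER_SIZE    offsetof(PipeProtoChunk, data)
#define PIPE_MAX_PAYLOAD    ((int) (PIPE_CHUNK_SIZE - PIPE_HEADER_SIZE))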

This is a serious bug. I'm going to investigate, but it's causing major
pain, so anyone else who has any ideas is welcome to chime in.

cheers

andrew

#2 Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#1)
1 attachment
Re: log chunking broken with large queries under load

On 04/01/2012 06:34 PM, Andrew Dunstan wrote:

Some of my PostgreSQL Experts colleagues have been complaining to me
that servers under load with very large queries produce corrupted CSV
log files, because lines from different backends are apparently
interleaved. The log chunking protocol between the elog routines and
the syslogger is supposed to prevent that, so I did a little work to
try to reproduce it in a controlled way.

Well, a little further digging jogged my memory a bit. It looks like we
fairly badly underestimated the number of messages that might arrive as
more than one chunk. With a hundred busy clients each sending
multi-chunk messages, far more than 20 pids can have partial messages
outstanding at once.

We could just increase CHUNK_SLOTS in syslogger.c, but I opted instead
to stripe the slots across a two-dimensional array, so that we don't
have to search a larger number of slots for any given message. See the
attached patch.

I'm not sure how far we want to scale this up. I set CHUNK_STRIPES to
20 to start with, and I've asked some colleagues with very heavy log
loads and very large queries to test it out if possible. If anyone else
has a similar load, I'd appreciate similar testing.

cheers

andrew

Attachments:

pgsyslogger.patch (text/x-patch)
*** a/src/backend/postmaster/syslogger.c
--- b/src/backend/postmaster/syslogger.c
*************** static char *last_file_name = NULL;
*** 93,101 ****
  static char *last_csv_file_name = NULL;
  
  /*
!  * Buffers for saving partial messages from different backends. We don't expect
!  * that there will be very many outstanding at one time, so 20 seems plenty of
!  * leeway. If this array gets full we won't lose messages, but we will lose
   * the protocol protection against them being partially written or interleaved.
   *
   * An inactive buffer has pid == 0 and undefined contents of data.
--- 93,108 ----
  static char *last_csv_file_name = NULL;
  
  /*
!  * Buffers for saving partial messages from different backends.
!  *
!  * Under heavy load we can get quite a few of them, so we stripe them across
!  * an array based on the mod of the pid, which seems an adequate hash function
!  * for pids. We do this rather than just keeping a one-dimensional array so
!  * we don't have to probe too many slots for any given pid.
!  * 20 stripes of 20 slots each seems likely to be adequate leeway, but if there
!  * are lots of overruns we might need to increase CHUNK_STRIPES a bit, or look
!  * at some alternative scheme.
!  * If a stripe gets full we won't lose messages, but we will lose
   * the protocol protection against them being partially written or interleaved.
   *
   * An inactive buffer has pid == 0 and undefined contents of data.
*************** typedef struct
*** 107,113 ****
  } save_buffer;
  
  #define CHUNK_SLOTS 20
! static save_buffer saved_chunks[CHUNK_SLOTS];
  
  /* These must be exported for EXEC_BACKEND case ... annoying */
  #ifndef WIN32
--- 114,121 ----
  } save_buffer;
  
  #define CHUNK_SLOTS 20
! #define CHUNK_STRIPES 20
! static save_buffer saved_chunks[CHUNK_STRIPES][CHUNK_SLOTS];
  
  /* These must be exported for EXEC_BACKEND case ... annoying */
  #ifndef WIN32
*************** process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
*** 725,730 ****
--- 733,739 ----
  	{
  		PipeProtoHeader p;
  		int			chunklen;
+ 		int         stripe;
  
  		/* Do we have a valid header? */
  		memcpy(&p, cursor, sizeof(PipeProtoHeader));
*************** process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
*** 743,748 ****
--- 752,759 ----
  			dest = (p.is_last == 'T' || p.is_last == 'F') ?
  				LOG_DESTINATION_CSVLOG : LOG_DESTINATION_STDERR;
  
+ 			stripe = p.pid % CHUNK_STRIPES;
+ 
  			if (p.is_last == 'f' || p.is_last == 'F')
  			{
  				/*
*************** process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
*** 756,780 ****
  
  				for (i = 0; i < CHUNK_SLOTS; i++)
  				{
! 					if (saved_chunks[i].pid == p.pid)
  					{
  						existing_slot = i;
  						break;
  					}
! 					if (free_slot < 0 && saved_chunks[i].pid == 0)
  						free_slot = i;
  				}
  				if (existing_slot >= 0)
  				{
! 					str = &(saved_chunks[existing_slot].data);
  					appendBinaryStringInfo(str,
  										   cursor + PIPE_HEADER_SIZE,
  										   p.len);
  				}
  				else if (free_slot >= 0)
  				{
! 					saved_chunks[free_slot].pid = p.pid;
! 					str = &(saved_chunks[free_slot].data);
  					initStringInfo(str);
  					appendBinaryStringInfo(str,
  										   cursor + PIPE_HEADER_SIZE,
--- 767,791 ----
  
  				for (i = 0; i < CHUNK_SLOTS; i++)
  				{
! 					if (saved_chunks[stripe][i].pid == p.pid)
  					{
  						existing_slot = i;
  						break;
  					}
! 					if (free_slot < 0 && saved_chunks[stripe][i].pid == 0)
  						free_slot = i;
  				}
  				if (existing_slot >= 0)
  				{
! 					str = &(saved_chunks[stripe][existing_slot].data);
  					appendBinaryStringInfo(str,
  										   cursor + PIPE_HEADER_SIZE,
  										   p.len);
  				}
  				else if (free_slot >= 0)
  				{
! 					saved_chunks[stripe][free_slot].pid = p.pid;
! 					str = &(saved_chunks[stripe][free_slot].data);
  					initStringInfo(str);
  					appendBinaryStringInfo(str,
  										   cursor + PIPE_HEADER_SIZE,
*************** process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
*** 803,809 ****
  
  				for (i = 0; i < CHUNK_SLOTS; i++)
  				{
! 					if (saved_chunks[i].pid == p.pid)
  					{
  						existing_slot = i;
  						break;
--- 814,820 ----
  
  				for (i = 0; i < CHUNK_SLOTS; i++)
  				{
! 					if (saved_chunks[stripe][i].pid == p.pid)
  					{
  						existing_slot = i;
  						break;
*************** process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
*** 811,822 ****
  				}
  				if (existing_slot >= 0)
  				{
! 					str = &(saved_chunks[existing_slot].data);
  					appendBinaryStringInfo(str,
  										   cursor + PIPE_HEADER_SIZE,
  										   p.len);
  					write_syslogger_file(str->data, str->len, dest);
! 					saved_chunks[existing_slot].pid = 0;
  					pfree(str->data);
  				}
  				else
--- 822,833 ----
  				}
  				if (existing_slot >= 0)
  				{
! 					str = &(saved_chunks[stripe][existing_slot].data);
  					appendBinaryStringInfo(str,
  										   cursor + PIPE_HEADER_SIZE,
  										   p.len);
  					write_syslogger_file(str->data, str->len, dest);
! 					saved_chunks[stripe][existing_slot].pid = 0;
  					pfree(str->data);
  				}
  				else
*************** process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
*** 871,890 ****
  static void
  flush_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
  {
! 	int			i;
  	StringInfo	str;
  
  	/* Dump any incomplete protocol messages */
! 	for (i = 0; i < CHUNK_SLOTS; i++)
! 	{
! 		if (saved_chunks[i].pid != 0)
  		{
! 			str = &(saved_chunks[i].data);
! 			write_syslogger_file(str->data, str->len, LOG_DESTINATION_STDERR);
! 			saved_chunks[i].pid = 0;
! 			pfree(str->data);
  		}
- 	}
  
  	/*
  	 * Force out any remaining pipe data as-is; we don't bother trying to
--- 882,902 ----
  static void
  flush_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
  {
! 	int			i, stripe;
  	StringInfo	str;
  
  	/* Dump any incomplete protocol messages */
! 	for (stripe = 0; stripe < CHUNK_STRIPES; stripe++)
! 		for (i = 0; i < CHUNK_SLOTS; i++)
  		{
! 			if (saved_chunks[stripe][i].pid != 0)
! 			{
! 				str = &(saved_chunks[stripe][i].data);
! 				write_syslogger_file(str->data, str->len, LOG_DESTINATION_STDERR);
! 				saved_chunks[stripe][i].pid = 0;
! 				pfree(str->data);
! 			}
  		}
  
  	/*
  	 * Force out any remaining pipe data as-is; we don't bother trying to
#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#2)
Re: log chunking broken with large queries under load

Andrew Dunstan <andrew@dunslane.net> writes:

On 04/01/2012 06:34 PM, Andrew Dunstan wrote:

Some of my PostgreSQL Experts colleagues have been complaining to me
that servers under load with very large queries produce corrupted CSV
log files,

We could just increase CHUNK_SLOTS in syslogger.c, but I opted instead
to stripe the slots across a two-dimensional array, so that we don't
have to search a larger number of slots for any given message. See the
attached patch.

This seems like it isn't actually fixing the problem, only pushing out
the onset of trouble a bit. Should we not replace the fixed-size array
with a dynamic data structure?

regards, tom lane

#4 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#3)
Re: log chunking broken with large queries under load

On 04/02/2012 12:00 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 04/01/2012 06:34 PM, Andrew Dunstan wrote:

Some of my PostgreSQL Experts colleagues have been complaining to me
that servers under load with very large queries produce corrupted CSV
log files,

We could just increase CHUNK_SLOTS in syslogger.c, but I opted instead
to stripe the slots across a two-dimensional array, so that we don't
have to search a larger number of slots for any given message. See the
attached patch.

This seems like it isn't actually fixing the problem, only pushing out
the onset of trouble a bit. Should we not replace the fixed-size array
with a dynamic data structure?

"A bit" = 10 to 20 times - more if we set CHUNK_STRIPES higher. :-)

But maybe you're right. If we do that and stick with my two-dimensional
scheme to keep the number of probes per chunk down, we'd need to reorg
the array every time we increased it. That might be a bit messy, but
might be ok. Or maybe linearly searching an array of several hundred
slots for our pid for every log chunk that comes in would be fast enough.

cheers

andrew

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#4)
Re: log chunking broken with large queries under load

Andrew Dunstan <andrew@dunslane.net> writes:

On 04/02/2012 12:00 PM, Tom Lane wrote:

This seems like it isn't actually fixing the problem, only pushing out
the onset of trouble a bit. Should we not replace the fixed-size array
with a dynamic data structure?

But maybe you're right. If we do that and stick with my two-dimensional
scheme to keep the number of probes per chunk down, we'd need to reorg
the array every time we increased it. That might be a bit messy, but
might be ok. Or maybe linearly searching an array of several hundred
slots for our pid for every log chunk that comes in would be fast enough.

You could do something like having a list of pending chunks for each
value of (pid mod 256). The length of each such list ought to be plenty
short under ordinary circumstances.
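
In sketch form, the lookup could look about like this (hypothetical
names, ordinary pg_list.h API; an illustration, not a patch):

#define NBUCKETS 256
static List *pending_chunks[NBUCKETS];  /* each list holds save_buffer *'s */

static save_buffer *
find_pending(int32 pid)
{
    ListCell   *cell;

    foreach(cell, pending_chunks[pid % NBUCKETS])
    {
        save_buffer *buf = (save_buffer *) lfirst(cell);

        if (buf->pid == pid)
            return buf;         /* partial message already pending */
    }
    return NULL;                /* caller lappend()s a fresh entry */
}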

regards, tom lane

#6 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#5)
Re: log chunking broken with large queries under load

On 04/02/2012 12:44 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 04/02/2012 12:00 PM, Tom Lane wrote:

This seems like it isn't actually fixing the problem, only pushing out
the onset of trouble a bit. Should we not replace the fixed-size array
with a dynamic data structure?

But maybe you're right. If we do that and stick with my two-dimensional
scheme to keep the number of probes per chunk down, we'd need to reorg
the array every time we increased it. That might be a bit messy, but
might be ok. Or maybe linearly searching an array of several hundred
slots for our pid for every log chunk that comes in would be fast enough.

You could do something like having a list of pending chunks for each
value of (pid mod 256). The length of each such list ought to be plenty
short under ordinary circumstances.

Yeah, ok, that should work. How big would we make each list to start
with? Still 20, or smaller?

cheers

andrew

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#6)
Re: log chunking broken with large queries under load

Andrew Dunstan <andrew@dunslane.net> writes:

On 04/02/2012 12:44 PM, Tom Lane wrote:

You could do something like having a list of pending chunks for each
value of (pid mod 256). The length of each such list ought to be plenty
short under ordinary circumstances.

Yeah, ok, that should work. How big would we make each list to start
with? Still 20, or smaller?

When I said "list", I meant a "List *". No fixed size.

regards, tom lane

#8 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#7)
1 attachment
Re: log chunking broken with large queries under load

On 04/02/2012 01:03 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 04/02/2012 12:44 PM, Tom Lane wrote:

You could do something like having a list of pending chunks for each
value of (pid mod 256). The length of each such list ought to be plenty
short under ordinary circumstances.

Yeah, ok, that should work. How big would we make each list to start
with? Still 20, or smaller?

When I said "list", I meant a "List *". No fixed size.

Ok, like this? Do we consider this a bug fix, to be backpatched?

cheers

andrew

Attachments:

pgsyslogger2.patch (text/x-patch)
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index a603ab4..2e4d437 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -35,6 +35,7 @@
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgtime.h"
+#include "nodes/pg_list.h"
 #include "postmaster/fork_process.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
@@ -93,10 +94,11 @@ static char *last_file_name = NULL;
 static char *last_csv_file_name = NULL;
 
 /*
- * Buffers for saving partial messages from different backends. We don't expect
- * that there will be very many outstanding at one time, so 20 seems plenty of
- * leeway. If this array gets full we won't lose messages, but we will lose
- * the protocol protection against them being partially written or interleaved.
+ * Buffers for saving partial messages from different backends.
+ *
+ * Keep 256 lists of these, the list for a given pid being pid mod 256,
+ * so that for any message we don't have to search a long list looking
+ * for a match and/or an open slot.
  *
  * An inactive buffer has pid == 0 and undefined contents of data.
  */
@@ -106,8 +108,8 @@ typedef struct
 	StringInfoData data;		/* accumulated data, as a StringInfo */
 } save_buffer;
 
-#define CHUNK_SLOTS 20
-static save_buffer saved_chunks[CHUNK_SLOTS];
+#define CHUNK_STRIPES 256
+static List *buffer_stripes[CHUNK_STRIPES];
 
 /* These must be exported for EXEC_BACKEND case ... annoying */
 #ifndef WIN32
@@ -725,6 +727,7 @@ process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
 	{
 		PipeProtoHeader p;
 		int			chunklen;
+		List         *buffer_stripe;
 
 		/* Do we have a valid header? */
 		memcpy(&p, cursor, sizeof(PipeProtoHeader));
@@ -743,53 +746,55 @@ process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
 			dest = (p.is_last == 'T' || p.is_last == 'F') ?
 				LOG_DESTINATION_CSVLOG : LOG_DESTINATION_STDERR;
 
+			buffer_stripe = buffer_stripes[p.pid % CHUNK_STRIPES];
+
 			if (p.is_last == 'f' || p.is_last == 'F')
 			{
 				/*
-				 * Save a complete non-final chunk in the per-pid buffer if
-				 * possible - if not just write it out.
+				 * Save a complete non-final chunk in the per-pid buffer
 				 */
-				int			free_slot = -1,
-							existing_slot = -1;
-				int			i;
+				ListCell    *cell;
+				save_buffer *existing_slot = NULL, 
+					*free_slot = NULL;
 				StringInfo	str;
 
-				for (i = 0; i < CHUNK_SLOTS; i++)
+				foreach (cell, buffer_stripe)
 				{
-					if (saved_chunks[i].pid == p.pid)
+					save_buffer *buffnode = cell->data.ptr_value;
+					if (buffnode->pid == p.pid)
 					{
-						existing_slot = i;
+						existing_slot = buffnode;
 						break;
 					}
-					if (free_slot < 0 && saved_chunks[i].pid == 0)
-						free_slot = i;
+					if (free_slot == NULL && buffnode->pid == 0)
+						free_slot = buffnode;
 				}
-				if (existing_slot >= 0)
+				if (existing_slot != NULL)
 				{
-					str = &(saved_chunks[existing_slot].data);
+					str = &(existing_slot->data);
 					appendBinaryStringInfo(str,
 										   cursor + PIPE_HEADER_SIZE,
 										   p.len);
 				}
-				else if (free_slot >= 0)
+				else 
 				{
-					saved_chunks[free_slot].pid = p.pid;
-					str = &(saved_chunks[free_slot].data);
+					if (free_slot == NULL)
+					{
+						/* 
+						 * Need a free slot, but there isn't one in the list,
+						 * so create a new one and extend the list with it.
+						 */
+						free_slot = palloc(sizeof(save_buffer));
+						buffer_stripe = lappend(buffer_stripe, free_slot);
+						buffer_stripes[p.pid % CHUNK_STRIPES] = buffer_stripe;
+					}
+					free_slot->pid = p.pid;
+					str = &(free_slot->data);
 					initStringInfo(str);
 					appendBinaryStringInfo(str,
 										   cursor + PIPE_HEADER_SIZE,
 										   p.len);
 				}
-				else
-				{
-					/*
-					 * If there is no free slot we'll just have to take our
-					 * chances and write out a partial message and hope that
-					 * it's not followed by something from another pid.
-					 */
-					write_syslogger_file(cursor + PIPE_HEADER_SIZE, p.len,
-										 dest);
-				}
 			}
 			else
 			{
@@ -797,26 +802,28 @@ process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
 				 * Final chunk --- add it to anything saved for that pid, and
 				 * either way write the whole thing out.
 				 */
-				int			existing_slot = -1;
-				int			i;
+				ListCell     *cell;
+				save_buffer *existing_slot = NULL;
 				StringInfo	str;
 
-				for (i = 0; i < CHUNK_SLOTS; i++)
+				foreach (cell, buffer_stripe)
 				{
-					if (saved_chunks[i].pid == p.pid)
+					save_buffer  *buffnode = cell->data.ptr_value;
+					if (buffnode->pid == p.pid)
 					{
-						existing_slot = i;
+						existing_slot = buffnode;
 						break;
 					}
 				}
-				if (existing_slot >= 0)
+
+				if (existing_slot != NULL)
 				{
-					str = &(saved_chunks[existing_slot].data);
+					str = &(existing_slot->data);
 					appendBinaryStringInfo(str,
 										   cursor + PIPE_HEADER_SIZE,
 										   p.len);
 					write_syslogger_file(str->data, str->len, dest);
-					saved_chunks[existing_slot].pid = 0;
+					existing_slot->pid = 0;
 					pfree(str->data);
 				}
 				else
@@ -871,18 +878,26 @@ process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
 static void
 flush_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
 {
-	int			i;
+	int			stripe;
+	List       *list;
+	ListCell   *cell;
 	StringInfo	str;
 
 	/* Dump any incomplete protocol messages */
-	for (i = 0; i < CHUNK_SLOTS; i++)
+	for (stripe = 0; stripe < CHUNK_STRIPES; stripe++)
 	{
-		if (saved_chunks[i].pid != 0)
+		list = buffer_stripes[stripe];
+
+		foreach (cell, list)
 		{
-			str = &(saved_chunks[i].data);
-			write_syslogger_file(str->data, str->len, LOG_DESTINATION_STDERR);
-			saved_chunks[i].pid = 0;
-			pfree(str->data);
+			save_buffer *buf = cell->data.ptr_value;
+			if (buf->pid != 0)
+			{
+				str = &(buf->data);
+				write_syslogger_file(str->data, str->len, LOG_DESTINATION_STDERR);
+				buf->pid = 0;
+				pfree(str->data);
+			}
 		}
 	}
 
#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#8)
Re: log chunking broken with large queries under load

Andrew Dunstan <andrew@dunslane.net> writes:

On 04/02/2012 01:03 PM, Tom Lane wrote:

When I said "list", I meant a "List *". No fixed size.

Ok, like this?

I think this could use a bit of editorialization (I don't think the
"stripe" terminology is still applicable, in particular), but the
general idea seems OK.

Does anyone feel that it's a bad idea that list entries are never
reclaimed? In the worst case a transient load peak could result in
a long list that permanently adds search overhead. Not sure if it's
worth the extra complexity to delete a list cell that's no longer
needed, rather than leaving it present and empty.

Do we consider this a bug fix, to be backpatched?

Yes, definitely.

I think I'd like to have a go at coding it the other way (with
release of list entries), just to see if that comes out cleaner
or uglier than this way. If you don't mind I'll pick this up
and commit whichever way turns out better.

regards, tom lane

#10 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#9)
Re: log chunking broken with large queries under load

On 04/04/2012 12:13 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 04/02/2012 01:03 PM, Tom Lane wrote:

When I said "list", I meant a "List *". No fixed size.

Ok, like this?

I think this could use a bit of editorialization (I don't think the
"stripe" terminology is still applicable, in particular), but the
general idea seems OK.

Does anyone feel that it's a bad idea that list entries are never
reclaimed? In the worst case a transient load peak could result in
a long list that permanently adds search overhead. Not sure if it's
worth the extra complexity to delete a list cell that's no longer
needed, rather than leaving it present and empty.

Me neither. The logic could be something fairly simple that runs when
we free a node, like "while the list tail is an unused node, prune the
tail". But as you say, it might not be worth it.
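
Purely as a hypothetical sketch against the posted patch's
buffer_stripes array (I haven't coded this), the pruning might look
like:

static void
prune_free_tail(int stripe)
{
    List       *list = buffer_stripes[stripe];

    /* trim unused (pid == 0) nodes off the end of the list */
    while (list != NIL)
    {
        save_buffer *tail = (save_buffer *) llast(list);

        if (tail->pid != 0)
            break;              /* tail is still in use; stop pruning */
        list = list_delete_ptr(list, tail);
        pfree(tail);
    }
    buffer_stripes[stripe] = list;
}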

Do we consider this a bug fix, to be backpatched?

Yes, definitely.

I think I'd like to have a go at coding it the other way (with
release of list entries), just to see if that comes out cleaner
or uglier than this way. If you don't mind I'll pick this up
and commit whichever way turns out better.

Go for it.

cheers

andrew

#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#10)
Re: log chunking broken with large queries under load

Andrew Dunstan <andrew@dunslane.net> writes:

On 04/04/2012 12:13 PM, Tom Lane wrote:

Does anyone feel that it's a bad idea that list entries are never
reclaimed? In the worst case a transient load peak could result in
a long list that permanently adds search overhead. Not sure if it's
worth the extra complexity to delete a list cell that's no longer
needed, rather than leaving it present and empty.

Me neither. The logic could be something fairly simple that runs when
we free a node, like "while the list tail is an unused node, prune the
tail". But as you say, it might not be worth it.

The idea I had in mind was to compensate for adding list-removal logic
by getting rid of the concept of an unused entry. If the removal is
conditional then you can't do that and you end up with the complications
of both methods. Anyway I've not tried to code it yet.
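
In rough, hypothetical sketch form (again, not something I've coded; it
borrows the save_buffer type and buffer_stripes array from the posted
patch):

static save_buffer *
start_partial_message(int32 pid)
{
    int         stripe = pid % CHUNK_STRIPES;
    save_buffer *buf = palloc(sizeof(save_buffer)); /* one palloc per message */

    buf->pid = pid;
    initStringInfo(&buf->data);
    buffer_stripes[stripe] = lappend(buffer_stripes[stripe], buf);
    return buf;
}

static void
finish_partial_message(save_buffer *buf, int dest)
{
    int         stripe = buf->pid % CHUNK_STRIPES;

    buffer_stripes[stripe] = list_delete_ptr(buffer_stripes[stripe], buf);
    write_syslogger_file(buf->data.data, buf->data.len, dest);
    pfree(buf->data.data);
    pfree(buf);                 /* one pfree per message, contended or not */
}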

regards, tom lane

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#11)
Re: log chunking broken with large queries under load

I wrote:

The idea I had in mind was to compensate for adding list-removal logic
by getting rid of the concept of an unused entry. If the removal is
conditional then you can't do that and you end up with the complications
of both methods. Anyway I've not tried to code it yet.

I concluded this would probably be a loser performance-wise, because it
would add a couple of palloc/pfree cycles to the processing of each
multi-chunk message, whether there was any contention or not. So I
committed the patch with just some cosmetic cleanups.

regards, tom lane

#13 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#12)
Re: log chunking broken with large queries under load

On 04/04/2012 03:09 PM, Tom Lane wrote:

I wrote:

The idea I had in mind was to compensate for adding list-removal logic
by getting rid of the concept of an unused entry. If the removal is
conditional then you can't do that and you end up with the complications
of both methods. Anyway I've not tried to code it yet.

I concluded this would probably be a loser performance-wise, because it
would add a couple of palloc/pfree cycles to the processing of each
multi-chunk message, whether there was any contention or not. So I
committed the patch with just some cosmetic cleanups.

OK, thanks for doing this.

cheers

andrew