Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

Started by Amit Kapila, about 13 years ago, 49 messages
#1 Amit Kapila
amit.kapila@huawei.com
3 attachment(s)

Tue, 26 Jun 2012 17:04:42 -0400 Robert Haas wrote:

I see you posted up a follow-up email asking Tom what he had in mind.
Personally, I don't think this needs incredibly complicated testing.
I think you should just test a workload involving inserting and/or
updating rows with lots of trailing NULL columns, and then another
workload with a table of similar width that... doesn't. If we can't
find a regression - or, better, we find a win in one or both cases -
then I think we're done here.

As per the last discussion of this patch, performance data needs to be provided before its review can proceed further.

So, as per your suggestion and the discussions about this patch, I have collected the performance data below:

Results were taken with the following configuration:
1. Schema - UNLOGGED TABLE with 2,000,000 records, all columns of INT type.
2. shared_buffers = 10GB
3. All performance results were taken with a single connection.
4. Performance was measured for the INSERT operation (insert into temptable select * from inittable).

Platform details:
Operating System: Suse-Linux 10.2 x86_64
Hardware : 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz)
RAM : 24GB

Documents attached:
init.sh : creates the schema
sql_used.sql : SQL statements used for taking the results
Trim_Nulls_Perf_Report.html : performance data

Observations from Performance Results
------------------------------------------------

1. There is no performance change for columns that all have valid (non-NULL) values.

2. There is a visible performance increase when more than 60~70% of the columns contain NULLs, in tables with a large number of columns.

3. There are visible space savings when more than 60~70% of the columns contain NULLs, in tables with a large number of columns.

Let me know if more performance data needs to be collected for this patch.

With Regards,

Amit Kapila.

Attachments:

init.sh (application/octet-stream)
sql_used.sql (text/plain)
Trim_Nulls_Perf_Report.html (text/html)
#2 Amit Kapila
amit.kapila@huawei.com
In reply to: Amit kapila (#1)
1 attachment(s)

On Saturday, October 13, 2012 1:24 PM Amit kapila wrote:
Tue, 26 Jun 2012 17:04:42 -0400 Robert Haas wrote:

I see you posted up a follow-up email asking Tom what he had in mind.
Personally, I don't think this needs incredibly complicated testing.
I think you should just test a workload involving inserting and/or
updating rows with lots of trailing NULL columns, and then another
workload with a table of similar width that... doesn't. If we can't
find a regression - or, better, we find a win in one or both cases -
then I think we're done here.

As per the last discussion of this patch, performance data needs to be provided before its review can proceed further.
So, as per your suggestion and the discussions about this patch, I have collected the performance data below:

Results were taken with the following configuration:
1. Schema - UNLOGGED TABLE with 2,000,000 records, all columns of INT type.
2. shared_buffers = 10GB
3. All performance results were taken with a single connection.
4. Performance was measured for the INSERT operation (insert into temptable select * from inittable).

Platform details:
Operating System: Suse-Linux 10.2 x86_64
Hardware : 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz)
RAM : 24GB

Further to Performance data, I have completed the review of the Patch.

Basic stuff:
------------
- Rebase of the patch is required, as the heap_fill_tuple function prototype has moved to a different file [htup.h to htup_details.h].
- Compiles cleanly without any errors/warnings
- Regression tests pass.

Code Review comments:
---------------------
1. There is a possibility of memory growth in the TOAST case if trailing toasted columns are updated to NULLs;
i.e. in the function toast_insert_or_update, when the 'need_change' variable is true, numAttrs is reduced to the index of the last non-null column plus one,
and the de-toasted columns of the old tuple are then not freed. If this repeats for a large number of tuples, there is a chance of running out of memory. (A minimal sketch of the pattern follows these comments.)

if (need_change)
{
    numAttrs = lastNonNullValOffset + 1;
    ....
}

if (need_delold)
    for (i = 0; i < numAttrs; i++)   <== trailing toasted values won't be freed here, since they were updated to NULL and numAttrs was reduced
        if (toast_delold[i])
            toast_delete_datum(rel, toast_oldvalues[i]);

2. Comments need to be updated in the following functions to describe how trailing null columns are skipped in the header:
heap_fill_tuple - function header
heap_form_tuple, heap_form_minimal_tuple.

3. Why is the following change required in the function toast_flatten_tuple_attribute?
        -        numAttrs = tupleDesc->natts; 
      +        numAttrs = HeapTupleHeaderGetNatts(olddata); 
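To make the point in comment 1 concrete, here is a minimal, self-contained sketch of the cleanup pattern involved (generic C for illustration only, not PostgreSQL code; the array names and counts are made up):

#include <stdlib.h>
#include <stdbool.h>

#define FULL_ATTRS 5                    /* plays the role of tupleDesc->natts */

int
main(void)
{
    char   *old_values[FULL_ATTRS];     /* stand-ins for de-toasted old values */
    bool    need_free_old[FULL_ATTRS];
    int     i;

    for (i = 0; i < FULL_ATTRS; i++)
    {
        old_values[i] = malloc(16);
        need_free_old[i] = true;
    }

    /*
     * Suppose the last two attributes were updated to NULL, so the patched
     * code reduces its attribute count to 3 before forming the new tuple.
     * If this cleanup loop used that reduced count, old_values[3] and
     * old_values[4] would never be freed.  Iterating over the full count
     * (or reducing the count only after cleanup) avoids the leak.
     */
    for (i = 0; i < FULL_ATTRS; i++)
        if (need_free_old[i])
            free(old_values[i]);

    return 0;
}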

A detailed performance report for INSERT and UPDATE operations is attached to this mail.

Observations from Performance Results
------------------------------------------------
1. There is no performance change for columns that all have valid (non-NULL) values.
2. There is a visible performance increase when more than 60~70% of the columns contain NULLs, in tables with a large number of columns.
3. There are visible space savings when more than 60~70% of the columns contain NULLs, in tables with a large number of columns.

With Regards,
Amit Kapila.

Attachments:

Trim_Tailing_Nulls_Perf_Report.html (text/html)
#3 Amit Kapila
amit.kapila@huawei.com
In reply to: Amit kapila (#2)
1 attachment(s)

From: pgsql-hackers-owner@postgresql.org [pgsql-hackers-owner@postgresql.org] on behalf of Amit kapila [amit.kapila@huawei.com]
Sent: Monday, October 15, 2012 7:28 PM
To: robertmhaas@gmail.com; josh@agliodbs.com
Cc: pgsql-hackers@postgresql.org
Subject: [HACKERS] Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On Monday, October 15, 2012 7:28 PM Amit kapila wrote:
On Saturday, October 13, 2012 1:24 PM Amit kapila wrote:
Tue, 26 Jun 2012 17:04:42 -0400 Robert Haas wrote:

I see you posted up a follow-up email asking Tom what he had in mind.
Personally, I don't think this needs incredibly complicated testing.
I think you should just test a workload involving inserting and/or
updating rows with lots of trailing NULL columns, and then another
workload with a table of similar width that... doesn't. If we can't
find a regression - or, better, we find a win in one or both cases -
then I think we're done here.

As per the last discussion of this patch, performance data needs to be provided before its review can proceed further.
So, as per your suggestion and the discussions about this patch, I have collected the performance data below:

Results were taken with the following configuration:
1. Schema - UNLOGGED TABLE with 2,000,000 records, all columns of INT type.
2. shared_buffers = 10GB
3. All performance results were taken with a single connection.
4. Performance was measured for the INSERT operation (insert into temptable select * from inittable).

Platform details:
Operating System: Suse-Linux 10.2 x86_64
Hardware : 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz)
RAM : 24GB

Further to Performance data, I have completed the review of the Patch.

Please find the patch addressing the review comments attached to this mail.

IMO, it is now ready for a committer.

With Regards,
Amit Kapila.

Attachments:

Truncate-trailing-null-columns-from-heap-rows.v2.patch (application/octet-stream)
*** a/src/backend/access/common/heaptuple.c
--- b/src/backend/access/common/heaptuple.c
***************
*** 127,154 **** heap_compute_data_size(TupleDesc tupleDesc,
   * We also fill the null bitmap (if any) and set the infomask bits
   * that reflect the tuple's data contents.
   *
   * NOTE: it is now REQUIRED that the caller have pre-zeroed the data area.
   */
  void
  heap_fill_tuple(TupleDesc tupleDesc,
  				Datum *values, bool *isnull,
  				char *data, Size data_size,
! 				uint16 *infomask, bits8 *bit)
  {
  	bits8	   *bitP;
  	int			bitmask;
  	int			i;
- 	int			numberOfAttributes = tupleDesc->natts;
  	Form_pg_attribute *att = tupleDesc->attrs;
  
  #ifdef USE_ASSERT_CHECKING
  	char	   *start = data;
  #endif
  
  	if (bit != NULL)
  	{
  		bitP = &bit[-1];
  		bitmask = HIGHBIT;
  	}
  	else
  	{
--- 127,160 ----
   * We also fill the null bitmap (if any) and set the infomask bits
   * that reflect the tuple's data contents.
   *
+  * For truncating trailing nulls in the tuple pass the last non-null column
+  * value as numberOfAttributes. Also make sure that same number of columns
+  * are set in the tuple header formation.
+  *
   * NOTE: it is now REQUIRED that the caller have pre-zeroed the data area.
   */
  void
  heap_fill_tuple(TupleDesc tupleDesc,
  				Datum *values, bool *isnull,
  				char *data, Size data_size,
! 				uint16 *infomask, bits8 *bit,
! 				int numberOfAttributes)
  {
  	bits8	   *bitP;
  	int			bitmask;
  	int			i;
  	Form_pg_attribute *att = tupleDesc->attrs;
  
  #ifdef USE_ASSERT_CHECKING
  	char	   *start = data;
  #endif
  
+ 	*infomask &= ~(HEAP_HASNULL | HEAP_HASVARWIDTH | HEAP_HASEXTERNAL);
  	if (bit != NULL)
  	{
  		bitP = &bit[-1];
  		bitmask = HIGHBIT;
+ 		*infomask |= HEAP_HASNULL;
  	}
  	else
  	{
***************
*** 157,163 **** heap_fill_tuple(TupleDesc tupleDesc,
  		bitmask = 0;
  	}
  
- 	*infomask &= ~(HEAP_HASNULL | HEAP_HASVARWIDTH | HEAP_HASEXTERNAL);
  
  	for (i = 0; i < numberOfAttributes; i++)
  	{
--- 163,168 ----
***************
*** 176,182 **** heap_fill_tuple(TupleDesc tupleDesc,
  
  			if (isnull[i])
  			{
- 				*infomask |= HEAP_HASNULL;
  				continue;
  			}
  
--- 181,186 ----
***************
*** 636,649 **** heap_form_tuple(TupleDesc tupleDescriptor,
  	int			hoff;
  	bool		hasnull = false;
  	Form_pg_attribute *att = tupleDescriptor->attrs;
! 	int			numberOfAttributes = tupleDescriptor->natts;
  	int			i;
  
! 	if (numberOfAttributes > MaxTupleAttributeNumber)
  		ereport(ERROR,
  				(errcode(ERRCODE_TOO_MANY_COLUMNS),
  				 errmsg("number of columns (%d) exceeds limit (%d)",
! 						numberOfAttributes, MaxTupleAttributeNumber)));
  
  	/*
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
--- 640,659 ----
  	int			hoff;
  	bool		hasnull = false;
  	Form_pg_attribute *att = tupleDescriptor->attrs;
! 	int			maxNumberOfAttributes = tupleDescriptor->natts;
  	int			i;
+ 	int			lastNonNullAttribute = -1;
+ 	int			numberOfAttributes;		/* The actual number of attributes for
+ 										 * this particular row. trailing nulls
+ 										 * aren't stored so this may be
+ 										 * smaller than maxNumberOfAttributes */
  
! 
! 	if (maxNumberOfAttributes > MaxTupleAttributeNumber)
  		ereport(ERROR,
  				(errcode(ERRCODE_TOO_MANY_COLUMNS),
  				 errmsg("number of columns (%d) exceeds limit (%d)",
! 						maxNumberOfAttributes, MaxTupleAttributeNumber)));
  
  	/*
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
***************
*** 656,675 **** heap_form_tuple(TupleDesc tupleDescriptor,
  	 * if an attribute is already toasted, it must have been sent to disk
  	 * already and so cannot contain toasted attributes.
  	 */
! 	for (i = 0; i < numberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else if (att[i]->attlen == -1 &&
  				 att[i]->attalign == 'd' &&
  				 att[i]->attndims == 0 &&
  				 !VARATT_IS_EXTENDED(DatumGetPointer(values[i])))
! 		{
! 			values[i] = toast_flatten_tuple_attribute(values[i],
  													  att[i]->atttypid,
  													  att[i]->atttypmod);
  		}
  	}
  
  	/*
  	 * Determine total space needed
--- 666,691 ----
  	 * if an attribute is already toasted, it must have been sent to disk
  	 * already and so cannot contain toasted attributes.
  	 */
! 	for (i = 0; i < maxNumberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else
! 		{
! 			lastNonNullAttribute = i;
! 
! 			if (att[i]->attlen == -1 &&
  				 att[i]->attalign == 'd' &&
  				 att[i]->attndims == 0 &&
  				 !VARATT_IS_EXTENDED(DatumGetPointer(values[i])))
! 			{
! 				values[i] = toast_flatten_tuple_attribute(values[i],
  													  att[i]->atttypid,
  													  att[i]->atttypmod);
+ 			}
  		}
  	}
+ 	numberOfAttributes = lastNonNullAttribute + 1;
  
  	/*
  	 * Determine total space needed
***************
*** 719,725 **** heap_form_tuple(TupleDesc tupleDescriptor,
  					(char *) td + hoff,
  					data_len,
  					&td->t_infomask,
! 					(hasnull ? td->t_bits : NULL));
  
  	return tuple;
  }
--- 735,742 ----
  					(char *) td + hoff,
  					data_len,
  					&td->t_infomask,
! 					(hasnull ? td->t_bits : NULL),
! 					numberOfAttributes);
  
  	return tuple;
  }
***************
*** 1390,1403 **** heap_form_minimal_tuple(TupleDesc tupleDescriptor,
  	int			hoff;
  	bool		hasnull = false;
  	Form_pg_attribute *att = tupleDescriptor->attrs;
! 	int			numberOfAttributes = tupleDescriptor->natts;
  	int			i;
  
! 	if (numberOfAttributes > MaxTupleAttributeNumber)
  		ereport(ERROR,
  				(errcode(ERRCODE_TOO_MANY_COLUMNS),
  				 errmsg("number of columns (%d) exceeds limit (%d)",
! 						numberOfAttributes, MaxTupleAttributeNumber)));
  
  	/*
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
--- 1407,1425 ----
  	int			hoff;
  	bool		hasnull = false;
  	Form_pg_attribute *att = tupleDescriptor->attrs;
! 	int			maxNumberOfAttributes = tupleDescriptor->natts;
  	int			i;
+ 	int			lastNonNullAttribute = -1;
+ 	int			numberOfAttributes;		/* The actual number of attributes for
+ 										 * this particular row. trailing nulls
+ 										 * aren't stored so this may be
+ 										 * smaller than maxNumberOfAttributes */
  
! 	if (maxNumberOfAttributes > MaxTupleAttributeNumber)
  		ereport(ERROR,
  				(errcode(ERRCODE_TOO_MANY_COLUMNS),
  				 errmsg("number of columns (%d) exceeds limit (%d)",
! 						maxNumberOfAttributes, MaxTupleAttributeNumber)));
  
  	/*
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
***************
*** 1410,1429 **** heap_form_minimal_tuple(TupleDesc tupleDescriptor,
  	 * if an attribute is already toasted, it must have been sent to disk
  	 * already and so cannot contain toasted attributes.
  	 */
! 	for (i = 0; i < numberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else if (att[i]->attlen == -1 &&
  				 att[i]->attalign == 'd' &&
  				 att[i]->attndims == 0 &&
  				 !VARATT_IS_EXTENDED(values[i]))
! 		{
! 			values[i] = toast_flatten_tuple_attribute(values[i],
  													  att[i]->atttypid,
  													  att[i]->atttypmod);
  		}
  	}
  
  	/*
  	 * Determine total space needed
--- 1432,1456 ----
  	 * if an attribute is already toasted, it must have been sent to disk
  	 * already and so cannot contain toasted attributes.
  	 */
! 	for (i = 0; i < maxNumberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else
! 		{
! 			lastNonNullAttribute = i;
! 			if (att[i]->attlen == -1 &&
  				 att[i]->attalign == 'd' &&
  				 att[i]->attndims == 0 &&
  				 !VARATT_IS_EXTENDED(values[i]))
! 			{
! 				values[i] = toast_flatten_tuple_attribute(values[i],
  													  att[i]->atttypid,
  													  att[i]->atttypmod);
+ 			}
  		}
  	}
+ 	numberOfAttributes = lastNonNullAttribute + 1;
  
  	/*
  	 * Determine total space needed
***************
*** 1463,1469 **** heap_form_minimal_tuple(TupleDesc tupleDescriptor,
  					(char *) tuple + hoff,
  					data_len,
  					&tuple->t_infomask,
! 					(hasnull ? tuple->t_bits : NULL));
  
  	return tuple;
  }
--- 1490,1497 ----
  					(char *) tuple + hoff,
  					data_len,
  					&tuple->t_infomask,
! 					(hasnull ? tuple->t_bits : NULL),
! 					numberOfAttributes);
  
  	return tuple;
  }
*** a/src/backend/access/common/indextuple.c
--- b/src/backend/access/common/indextuple.c
***************
*** 29,34 ****
--- 29,41 ----
  /* ----------------
   *		index_form_tuple
   * ----------------
+  *
+  * Note that as opposed to heap rows we don't bother truncating off trailing
+  * null attributes. This is because the maximum number of attributes in an
+  * index (32) is well bounded and there isn't much to be gained by trying to
+  * reduce the bitmap that tracks null attributes since we don't expect index
+  * keys have a lot of trailing nulls. Also, the logic for tracking the null
+  * bitmap is hard-wired at 4 bytes (see IndexAttributeBitMapData).
   */
  IndexTuple
  index_form_tuple(TupleDesc tupleDescriptor,
***************
*** 139,145 **** index_form_tuple(TupleDesc tupleDescriptor,
  					(char *) tp + hoff,
  					data_size,
  					&tupmask,
! 					(hasnull ? (bits8 *) tp + sizeof(IndexTupleData) : NULL));
  
  #ifdef TOAST_INDEX_HACK
  	for (i = 0; i < numberOfAttributes; i++)
--- 146,153 ----
  					(char *) tp + hoff,
  					data_size,
  					&tupmask,
! 					(hasnull ? (bits8 *) tp + sizeof(IndexTupleData) : NULL),
! 					numberOfAttributes);
  
  #ifdef TOAST_INDEX_HACK
  	for (i = 0; i < numberOfAttributes; i++)
*** a/src/backend/access/heap/tuptoaster.c
--- b/src/backend/access/heap/tuptoaster.c
***************
*** 425,430 **** toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
--- 425,431 ----
  	bool		need_free = false;
  	bool		need_delold = false;
  	bool		has_nulls = false;
+ 	int			lastNonNullValOffset = -1;
  
  	Size		maxDataLen;
  	Size		hoff;
***************
*** 534,539 **** toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
--- 535,544 ----
  			has_nulls = true;
  			continue;
  		}
+ 		else
+ 		{
+ 			lastNonNullValOffset = i;
+ 		}
  
  		/*
  		 * Now look at varlena attributes
***************
*** 593,600 **** toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
  
  	/* compute header overhead --- this should match heap_form_tuple() */
  	hoff = offsetof(HeapTupleHeaderData, t_bits);
! 	if (has_nulls)
! 		hoff += BITMAPLEN(numAttrs);
  	if (newtup->t_data->t_infomask & HEAP_HASOID)
  		hoff += sizeof(Oid);
  	hoff = MAXALIGN(hoff);
--- 598,605 ----
  
  	/* compute header overhead --- this should match heap_form_tuple() */
  	hoff = offsetof(HeapTupleHeaderData, t_bits);
! 	if (has_nulls && lastNonNullValOffset > -1)
! 		hoff += BITMAPLEN(lastNonNullValOffset + 1);
  	if (newtup->t_data->t_infomask & HEAP_HASOID)
  		hoff += sizeof(Oid);
  	hoff = MAXALIGN(hoff);
***************
*** 869,874 **** toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
--- 874,886 ----
  		int32		new_tuple_len;
  
  		/*
+ 		 * Only store up to the last non-null attribute in side tuple i.e
+ 		 * truncate the trailing null columns. Once the tuple is formed reset
+ 		 * numAttrs to previous value
+ 		 */
+ 		numAttrs = lastNonNullValOffset + 1;
+ 
+ 		/*
  		 * Calculate the new size of the tuple.
  		 *
  		 * Note: we used to assume here that the old tuple's t_hoff must equal
***************
*** 879,885 **** toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
  		 * whether there needs to be one at all.
  		 */
  		new_header_len = offsetof(HeapTupleHeaderData, t_bits);
! 		if (has_nulls)
  			new_header_len += BITMAPLEN(numAttrs);
  		if (olddata->t_infomask & HEAP_HASOID)
  			new_header_len += sizeof(Oid);
--- 891,897 ----
  		 * whether there needs to be one at all.
  		 */
  		new_header_len = offsetof(HeapTupleHeaderData, t_bits);
! 		if (has_nulls && numAttrs > 0)
  			new_header_len += BITMAPLEN(numAttrs);
  		if (olddata->t_infomask & HEAP_HASOID)
  			new_header_len += sizeof(Oid);
***************
*** 914,920 **** toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
  						(char *) new_data + new_header_len,
  						new_data_len,
  						&(new_data->t_infomask),
! 						has_nulls ? new_data->t_bits : NULL);
  	}
  	else
  		result_tuple = newtup;
--- 926,935 ----
  						(char *) new_data + new_header_len,
  						new_data_len,
  						&(new_data->t_infomask),
! 						has_nulls ? new_data->t_bits : NULL,
! 						numAttrs);
! 
! 		numAttrs = tupleDesc->natts;
  	}
  	else
  		result_tuple = newtup;
***************
*** 1147,1153 **** toast_flatten_tuple_attribute(Datum value,
  					(char *) new_data + new_header_len,
  					new_data_len,
  					&(new_data->t_infomask),
! 					has_nulls ? new_data->t_bits : NULL);
  
  	/*
  	 * Free allocated temp values
--- 1162,1169 ----
  					(char *) new_data + new_header_len,
  					new_data_len,
  					&(new_data->t_infomask),
! 					has_nulls ? new_data->t_bits : NULL,
! 					numAttrs);
  
  	/*
  	 * Free allocated temp values
*** a/src/include/access/htup_details.h
--- b/src/include/access/htup_details.h
***************
*** 618,624 **** extern Size heap_compute_data_size(TupleDesc tupleDesc,
  extern void heap_fill_tuple(TupleDesc tupleDesc,
  				Datum *values, bool *isnull,
  				char *data, Size data_size,
! 				uint16 *infomask, bits8 *bit);
  extern bool heap_attisnull(HeapTuple tup, int attnum);
  extern Datum nocachegetattr(HeapTuple tup, int attnum,
  			   TupleDesc att);
--- 618,624 ----
  extern void heap_fill_tuple(TupleDesc tupleDesc,
  				Datum *values, bool *isnull,
  				char *data, Size data_size,
! 				uint16 *infomask, bits8 *bit, int lastNonNullAttributeOffset);
  extern bool heap_attisnull(HeapTuple tup, int attnum);
  extern Datum nocachegetattr(HeapTuple tup, int attnum,
  			   TupleDesc att);
*** /dev/null
--- b/src/test/regress/expected/trailing_nulls.out
***************
*** 0 ****
--- 1,210 ----
+ --
+ -- The row bitmap and row length don't track trailing nulls. Verify that this works correctly.
+ --
+ -- simple case
+ CREATE TABLE trailing_null (
+ 	c1 varchar(10), 
+ 	c2 varchar(10), 
+ 	c3 varchar(10), 
+ 	c4 varchar(10), 
+ 	c5 varchar(10) 
+ 	);
+ 	
+ INSERT INTO trailing_null VALUES (null, null, null, null, null);
+ INSERT INTO trailing_null VALUES (null, null, null, null, '5');
+ INSERT INTO trailing_null VALUES (null, null, null, '4', null);
+ INSERT INTO trailing_null VALUES (null, null, '3', null, null);
+ INSERT INTO trailing_null VALUES (null, '2', null, null, null);
+ INSERT INTO trailing_null VALUES ('1', null, null, null, null);
+ SELECT * FROM trailing_null ORDER BY c1, c2, c3, c4, c5;
+  c1 | c2 | c3 | c4 | c5 
+ ----+----+----+----+----
+  1  |    |    |    | 
+     | 2  |    |    | 
+     |    | 3  |    | 
+     |    |    | 4  | 
+     |    |    |    | 5
+     |    |    |    | 
+ (6 rows)
+ 
+ CREATE INDEX trailing_nulli ON trailing_null (c1, c2, c3, c4, c5);
+ INSERT INTO trailing_null VALUES ('1', null, null, null, null);
+ SELECT * FROM trailing_null ORDER BY c1, c2, c3, c4, c5;
+  c1 | c2 | c3 | c4 | c5 
+ ----+----+----+----+----
+  1  |    |    |    | 
+  1  |    |    |    | 
+     | 2  |    |    | 
+     |    | 3  |    | 
+     |    |    | 4  | 
+     |    |    |    | 5
+     |    |    |    | 
+ (7 rows)
+ 
+ DROP TABLE trailing_null;
+ -- corner case, no columns
+ CREATE TABLE trailing_null ();
+ DROP TABLE trailing_null;
+ -- toastable
+ CREATE TABLE trailing_null (
+ 	c1 varchar(500), 
+ 	c2 varchar(500), 
+ 	c3 varchar(500), 
+ 	c4 varchar(500), 
+ 	c5 varchar(500), 
+ 	c6 varchar(500), 
+ 	c7 varchar(500), 
+ 	c8 varchar(500), 
+ 	c9 varchar(500), 
+ 	c10 varchar(500), 
+ 	c11 varchar(500), 
+ 	c12 varchar(500), 
+ 	c13 varchar(500), 
+ 	c14 varchar(500), 
+ 	c15 varchar(500), 
+ 	c16 varchar(500), 
+ 	c17 varchar(500)
+ 	);
+ -- toast: easily compressed
+ INSERT INTO trailing_null VALUES (
+ 	rpad('c1', 450, 'x'),
+ 	rpad('c2', 450, 'x'),
+ 	rpad('c3', 450, 'x'),
+ 	rpad('c4', 450, 'x'),
+ 	rpad('c5', 450, 'x'),
+ 	rpad('c6', 450, 'x'),
+ 	rpad('c7', 450, 'x'),
+ 	rpad('c8', 450, 'x'),
+ 	rpad('c9', 450, 'x'),
+ 	rpad('c10', 450, 'x'),
+ 	rpad('c11', 450, 'x'),
+ 	rpad('c12', 450, 'x'),
+ 	rpad('c13', 450, 'x'),
+ 	rpad('c14', 450, 'x'),
+ 	rpad('c15', 450, 'x'),
+ 	rpad('c16', 450, 'x'),
+ 	rpad('c17', 450, 'x')
+ 	);
+ -- toast: off row storage
+ INSERT INTO trailing_null VALUES (
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3)
+ );
+ -- verify that we have rows in our toast storage
+ VACUUM FULL trailing_null;
+ VACUUM ANALYZE trailing_null;
+ SELECT CAST(reltuples AS INTEGER) FROM pg_class
+ 	WHERE oid = (SELECT reltoastrelid FROM pg_class where relname = 'trailing_null');
+  reltuples 
+ -----------
+          4
+ (1 row)
+ 
+ 	
+ -- non-toast, separate codepath
+ UPDATE trailing_null SET c17 = null;
+ SELECT length(c1), length(c16), c17 FROM trailing_null;
+  length | length | c17 
+ --------+--------+-----
+     450 |    450 | 
+     336 |    336 | 
+ (2 rows)
+ 
+ VACUUM FULL trailing_null;
+ VACUUM ANALYZE trailing_null;
+ SELECT CAST(reltuples AS INTEGER) FROM pg_class
+ 	WHERE oid = (SELECT reltoastrelid FROM pg_class where relname = 'trailing_null');
+  reltuples 
+ -----------
+          4
+ (1 row)
+ 
+ UPDATE trailing_null SET c16 = null;
+ SELECT length(c1), c16, c17 FROM trailing_null;
+  length | c16 | c17 
+ --------+-----+-----
+     450 |     | 
+     336 |     | 
+ (2 rows)
+ 
+ VACUUM FULL trailing_null;
+ VACUUM ANALYZE trailing_null;
+ SELECT CAST(reltuples AS INTEGER) FROM pg_class
+ 	WHERE oid = (SELECT reltoastrelid FROM pg_class where relname = 'trailing_null');
+  reltuples 
+ -----------
+          4
+ (1 row)
+ 
+ UPDATE trailing_null SET
+ 	c1 = null,
+ 	c2 = null,
+ 	c3 = null,
+ 	c4 = null,
+ 	c5 = null,
+ 	c6 = null,
+ 	c7 = null,
+ 	c8 = null,
+ 	c9 = null,
+ 	c10 = null,
+ 	c11 = null,
+ 	c12 = null,
+ 	c13 = null,
+ 	c14 = null,
+ 	c15 = null,
+ 	c16 = null,
+ 	c17 = null;
+ -- verify that we got rid of our toasted data
+ VACUUM FULL trailing_null;
+ VACUUM ANALYZE trailing_null;
+ SELECT CAST(reltuples AS INTEGER) FROM pg_class
+ 	WHERE oid = (SELECT reltoastrelid FROM pg_class where relname = 'trailing_null');
+  reltuples 
+ -----------
+          4
+ (1 row)
+ 
+ 	
+ DROP TABLE trailing_null;
+ -- Let's hit the detoasting logic related to composite objects. Borrowed
+ -- from rowtypes.sql
+ -- The object here is to ensure that toasted references inside
+ -- composite values don't cause problems.  The large f1 value will
+ -- be toasted inside pp, it must still work after being copied to people.
+ CREATE TYPE fullname as (first text, last text, extra text);
+ CREATE temp TABLE pp (f1 text, c1 varchar(10) null);
+ INSERT INTO pp VALUES (repeat('abcdefghijkl', 100000), null);
+ CREATE temp TABLE people (fn fullname, c1 varchar(10), c2 varchar(10));
+ -- trailing null in our nested object 
+ INSERT into people SELECT('Jim', f1, null)::fullname, c1, null from pp;
+ SELECT (fn).first, length((fn).last), c1, c2 from people;
+  first | length  | c1 | c2 
+ -------+---------+----+----
+  Jim   | 1200000 |    | 
+ (1 row)
+ 
+ ALTER TABLE people ADD COLUMN c3 varchar(10);
+ SELECT (fn).first, length((fn).last), c1, c2, c3 from people;
+  first | length  | c1 | c2 | c3 
+ -------+---------+----+----+----
+  Jim   | 1200000 |    |    | 
+ (1 row)
+ 
+ DROP TABLE pp;
+ DROP TABLE people;
+ DROP TYPE fullname;
*** a/src/test/regress/parallel_schedule
--- b/src/test/regress/parallel_schedule
***************
*** 98,104 **** test: event_trigger
  # ----------
  # Another group of parallel tests
  # ----------
! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock json
  
  # ----------
  # Another group of parallel tests
--- 98,104 ----
  # ----------
  # Another group of parallel tests
  # ----------
! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock json trailing_nulls 
  
  # ----------
  # Another group of parallel tests
*** /dev/null
--- b/src/test/regress/sql/trailing_nulls.sql
***************
*** 0 ****
--- 1,166 ----
+ --
+ -- The row bitmap and row length don't track trailing nulls. Verify that this works correctly.
+ --
+ 
+ -- simple case
+ CREATE TABLE trailing_null (
+ 	c1 varchar(10), 
+ 	c2 varchar(10), 
+ 	c3 varchar(10), 
+ 	c4 varchar(10), 
+ 	c5 varchar(10) 
+ 	);
+ 	
+ INSERT INTO trailing_null VALUES (null, null, null, null, null);
+ INSERT INTO trailing_null VALUES (null, null, null, null, '5');
+ INSERT INTO trailing_null VALUES (null, null, null, '4', null);
+ INSERT INTO trailing_null VALUES (null, null, '3', null, null);
+ INSERT INTO trailing_null VALUES (null, '2', null, null, null);
+ INSERT INTO trailing_null VALUES ('1', null, null, null, null);
+ 
+ SELECT * FROM trailing_null ORDER BY c1, c2, c3, c4, c5;
+ 
+ CREATE INDEX trailing_nulli ON trailing_null (c1, c2, c3, c4, c5);
+ 
+ INSERT INTO trailing_null VALUES ('1', null, null, null, null);
+ 
+ SELECT * FROM trailing_null ORDER BY c1, c2, c3, c4, c5;
+ 
+ DROP TABLE trailing_null;
+ 
+ -- corner case, no columns
+ CREATE TABLE trailing_null ();
+ DROP TABLE trailing_null;
+ 
+ 
+ -- toastable
+ CREATE TABLE trailing_null (
+ 	c1 varchar(500), 
+ 	c2 varchar(500), 
+ 	c3 varchar(500), 
+ 	c4 varchar(500), 
+ 	c5 varchar(500), 
+ 	c6 varchar(500), 
+ 	c7 varchar(500), 
+ 	c8 varchar(500), 
+ 	c9 varchar(500), 
+ 	c10 varchar(500), 
+ 	c11 varchar(500), 
+ 	c12 varchar(500), 
+ 	c13 varchar(500), 
+ 	c14 varchar(500), 
+ 	c15 varchar(500), 
+ 	c16 varchar(500), 
+ 	c17 varchar(500)
+ 	);
+ 
+ -- toast: easily compressed
+ INSERT INTO trailing_null VALUES (
+ 	rpad('c1', 450, 'x'),
+ 	rpad('c2', 450, 'x'),
+ 	rpad('c3', 450, 'x'),
+ 	rpad('c4', 450, 'x'),
+ 	rpad('c5', 450, 'x'),
+ 	rpad('c6', 450, 'x'),
+ 	rpad('c7', 450, 'x'),
+ 	rpad('c8', 450, 'x'),
+ 	rpad('c9', 450, 'x'),
+ 	rpad('c10', 450, 'x'),
+ 	rpad('c11', 450, 'x'),
+ 	rpad('c12', 450, 'x'),
+ 	rpad('c13', 450, 'x'),
+ 	rpad('c14', 450, 'x'),
+ 	rpad('c15', 450, 'x'),
+ 	rpad('c16', 450, 'x'),
+ 	rpad('c17', 450, 'x')
+ 	);
+ 
+ -- toast: off row storage
+ INSERT INTO trailing_null VALUES (
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3)
+ );
+ 
+ -- verify that we have rows in our toast storage
+ VACUUM FULL trailing_null;
+ VACUUM ANALYZE trailing_null;
+ SELECT CAST(reltuples AS INTEGER) FROM pg_class
+ 	WHERE oid = (SELECT reltoastrelid FROM pg_class where relname = 'trailing_null');
+ 	
+ -- non-toast, separate codepath
+ UPDATE trailing_null SET c17 = null;
+ SELECT length(c1), length(c16), c17 FROM trailing_null;
+ VACUUM FULL trailing_null;
+ VACUUM ANALYZE trailing_null;
+ SELECT CAST(reltuples AS INTEGER) FROM pg_class
+ 	WHERE oid = (SELECT reltoastrelid FROM pg_class where relname = 'trailing_null');
+ 
+ UPDATE trailing_null SET c16 = null;
+ SELECT length(c1), c16, c17 FROM trailing_null;
+ VACUUM FULL trailing_null;
+ VACUUM ANALYZE trailing_null;
+ SELECT CAST(reltuples AS INTEGER) FROM pg_class
+ 	WHERE oid = (SELECT reltoastrelid FROM pg_class where relname = 'trailing_null');
+ 
+ UPDATE trailing_null SET
+ 	c1 = null,
+ 	c2 = null,
+ 	c3 = null,
+ 	c4 = null,
+ 	c5 = null,
+ 	c6 = null,
+ 	c7 = null,
+ 	c8 = null,
+ 	c9 = null,
+ 	c10 = null,
+ 	c11 = null,
+ 	c12 = null,
+ 	c13 = null,
+ 	c14 = null,
+ 	c15 = null,
+ 	c16 = null,
+ 	c17 = null;
+ 
+ -- verify that we got rid of our toasted data
+ VACUUM FULL trailing_null;
+ VACUUM ANALYZE trailing_null;
+ SELECT CAST(reltuples AS INTEGER) FROM pg_class
+ 	WHERE oid = (SELECT reltoastrelid FROM pg_class where relname = 'trailing_null');
+ 	
+ DROP TABLE trailing_null;
+ 
+ -- Let's hit the detoasting logic related to composite objects. Borrowed
+ -- from rowtypes.sql
+ 
+ -- The object here is to ensure that toasted references inside
+ -- composite values don't cause problems.  The large f1 value will
+ -- be toasted inside pp, it must still work after being copied to people.
+ CREATE TYPE fullname as (first text, last text, extra text);
+ CREATE temp TABLE pp (f1 text, c1 varchar(10) null);
+ INSERT INTO pp VALUES (repeat('abcdefghijkl', 100000), null);
+ CREATE temp TABLE people (fn fullname, c1 varchar(10), c2 varchar(10));
+ 
+ -- trailing null in our nested object 
+ INSERT into people SELECT('Jim', f1, null)::fullname, c1, null from pp;
+ SELECT (fn).first, length((fn).last), c1, c2 from people;
+ ALTER TABLE people ADD COLUMN c3 varchar(10);
+ SELECT (fn).first, length((fn).last), c1, c2, c3 from people;
+ 
+ DROP TABLE pp;
+ DROP TABLE people;
+ DROP TYPE fullname;
#4 Simon Riggs
simon@2ndQuadrant.com
In reply to: Amit kapila (#1)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On 13 October 2012 08:54, Amit kapila <amit.kapila@huawei.com> wrote:

As per the last discussion for this patch, performance data needs to be
provided before this patch's Review can proceed further.

So as per your suggestion and from the discussions about this patch, I have
collected the performance data as below:

Results are taken with following configuration.
1. Schema - UNLOGGED TABLE with 2,000,000 records having all columns are INT
type.
2. shared_buffers = 10GB
3. All the performance result are taken with single connection.
4. Performance is collected for INSERT operation (insert into temptable
select * from inittable)

Platform details:
Operating System: Suse-Linux 10.2 x86_64
Hardware : 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz)
RAM : 24GB

Documents Attached:
init.sh : Which will create the schema
sql_used.sql : sql's used for taking results

Trim_Nulls_Perf_Report.html : Performance data

Observations from Performance Results

------------------------------------------------

1. There is no performance change for columns that all have valid (non-NULL) values.

2. There is a visible performance increase when more than 60~70% of the columns contain NULLs, in tables with a large number of columns.

3. There are visible space savings when more than 60~70% of the columns contain NULLs, in tables with a large number of columns.

Let me know if more performance data needs to be collected for this patch.

I can't make sense of your performance report. Because of that I can't
derive the same conclusions from it you do.

Can you explain the performance results in more detail, so we can see
what they mean? Like which are the patched, which are the unpatched
results? Which results are comparable, what the percentages mean, etc.

We might then move quickly towards commit, or at least more tests.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#5 Amit Kapila
amit.kapila@huawei.com
In reply to: Simon Riggs (#4)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On Thursday, December 20, 2012 5:46 PM Simon Riggs wrote:

On 13 October 2012 08:54, Amit kapila <amit.kapila@huawei.com> wrote:

As per the last discussion of this patch, performance data needs to be provided before its review can proceed further.

So, as per your suggestion and the discussions about this patch, I have collected the performance data below:

------------------------------------------------

1. There is no performance change for columns that all have valid (non-NULL) values.

2. There is a visible performance increase when more than 60~70% of the columns contain NULLs, in tables with a large number of columns.

3. There are visible space savings when more than 60~70% of the columns contain NULLs, in tables with a large number of columns.

Let me know if more performance data needs to be collected for this patch.

I can't make sense of your performance report. Because of that I can't
derive the same conclusions from it you do.

Can you explain the performance results in more detail, so we can see
what they mean? Like which are the patched, which are the unpatched
results?

At the extreme left it is mentioned Original Code / Trim Trailing Nulls Patch.
In any case, I have framed the results again as below:
1. Table with 800 columns
A. INSERT tuples with 600 trailing nulls
B. UPDATE last column value to "non-null"
C. UPDATE last column value to "null"
---------------------+---------------------+---------------------
Original Code        | Trim Trailing NULLs | Improvement (%)
TPS   space (pages)  | TPS   space (pages) | Results
---------------------+---------------------+---------------------
1A: 0.2068 250000 | 0.2302 222223 | 10.1% tps, 11.1% space
1B: 0.0448 500000 | 0.0481 472223 | 6.8% tps, 5.6% space
1C: 0.0433 750000 | 0.0493 694445 | 12.2% tps, 7.4% space

2. Table with 800 columns
A. INSERT tuples with 300 trailing nulls
B. UPDATE last column value to "non-null"
C. UPDATE last column value to "null"
---------------------+---------------------+---------------------
Original Code        | Trim Trailing NULLs | Improvement (%)
TPS   space (pages)  | TPS   space (pages) | Results
---------------------+---------------------+---------------------
2A: 0.0280 666667 | 0.0287 666667 | 2.3% tps, 0% space
2B: 0.0143 1333334 | 0.0152 1333334 | 5.3% tps, 0% space
2C: 0.0145 2000000 | 0.0149 2000000 | 2.9% tps, 0% space

3. Table with 300 columns
A. INSERT tuples with 150 trailing nulls
B. UPDATE last column value to "non-null"
C. UPDATE last column value to "null"
---------------------+---------------------+---------------------
Original Code        | Trim Trailing NULLs | Improvement (%)
TPS   space (pages)  | TPS   space (pages) | Results
---------------------+---------------------+---------------------
3A: 0.2815 166667 | 0.2899 166667 | 2.9% tps, 0% space
3B: 0.0851 333334 | 0.0870 333334 | 2.2% tps, 0% space
3C: 0.0846 500000 | 0.0852 500000 | 0.7% tps, 0% space

4. Table with 300 columns
A. INSERT tuples with 250 trailing nulls
B. UPDATE last column value to "non-null"
C. UPDATE last column value to "null"
---------------------+---------------------+---------------------
Original Code        | Trim Trailing NULLs | Improvement (%)
TPS   space (pages)  | TPS   space (pages) | Results
---------------------+---------------------+---------------------
4A: 0.5447 66667 | 0.5996 58824 | 09.2% tps, 11.8% space
4B: 0.1251 135633 | 0.1232 127790 | -01.5% tps, 5.8% space
4C: 0.1223 202299 | 0.1361 186613 | 10.1% tps, 7.5% space

Please let me know if it is still not clear.

With Regards,
Amit Kapila.


#6 Simon Riggs
simon@2ndQuadrant.com
In reply to: Amit kapila (#1)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On 20 December 2012 14:56, Amit Kapila <amit.kapila@huawei.com> wrote:

1. There is no performance change for columns that all have valid (non-NULL) values.

I don't see any tests (at all) that measure this.

I'm particularly interested in lower numbers of columns, so we can
show no regression for the common case.

2. There is a visible performance increase when more than 60~70% of the columns contain NULLs, in tables with a large number of columns.

3. There are visible space savings when more than 60~70% of the columns contain NULLs, in tables with a large number of columns.

Agreed.

I would call that quite disappointing though and was expecting better.
Are we sure the patch works and the tests are correct?

The lack of any space saving for lower % values is strange and
somewhat worrying. There should be a 36? byte saving for 300 null
columns in an 800 column table - how does that not show up at all?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#6)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

Simon Riggs <simon@2ndQuadrant.com> writes:

The lack of any space saving for lower % values is strange and
somewhat worrying. There should be a 36? byte saving for 300 null
columns in an 800 column table - how does that not show up at all?

You could only fit about 4 such rows in an 8K page (assuming the columns
are all int4s). Unless the savings is enough to allow 5 rows to fit in
a page, the effective savings will be zilch.

This may well mean that the whole thing is a waste of time in most
scenarios --- the more likely it is to save anything, the more likely
that the savings will be lost anyway due to page alignment
considerations, because wider rows inherently pack less efficiently.

regards, tom lane


#8 Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#7)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On 23 December 2012 17:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

The lack of any space saving for lower % values is strange and
somewhat worrying. There should be a 36? byte saving for 300 null
columns in an 800 column table - how does that not show up at all?

You could only fit about 4 such rows in an 8K page (assuming the columns
are all int4s). Unless the savings is enough to allow 5 rows to fit in
a page, the effective savings will be zilch.

If that's the case, the use case is tiny, especially considering how
sensitive the saving is to the exact location of the NULLs.

This may well mean that the whole thing is a waste of time in most
scenarios --- the more likely it is to save anything, the more likely
that the savings will be lost anyway due to page alignment
considerations, because wider rows inherently pack less efficiently.

ISTM that we'd get a better gain and a wider use case by compressing
the whole block, with some bits masked out to allow updates/deletes.
The string of zeroes in the null bitmap would compress easily, but so
would other aspects also.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#9 Amit Kapila
amit.kapila@huawei.com
In reply to: Simon Riggs (#6)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On Sunday, December 23, 2012 8:11 PM Simon Riggs wrote:

On 20 December 2012 14:56, Amit Kapila <amit.kapila@huawei.com> wrote:

1. There is no performance change for columns that all have valid (non-NULL) values.

I don't see any tests (at all) that measure this.

I'm particularly interested in lower numbers of columns, so we can
show no regression for the common case.

For now I have taken readings for 300 columns; I can take readings for 10~30 columns as well if required.

1. Table with 300 columns (all integer columns)
A. INSERT tuples without trailing nulls
B. UPDATE last column value to "null"
----------------+---------------------+------------------
Original Code | Trim Trailing NULLs | Improvement (%)
TPS | TPS | Results
----------------+---------------------+------------------
1A: 0.1348 | 0.1352 | 0.3%
1B: 0.0495 | 0.0495 | 0.0%

2. There is a visible performance increase when more than 60~70% of the columns contain NULLs, in tables with a large number of columns.

3. There are visible space savings when more than 60~70% of the columns contain NULLs, in tables with a large number of columns.

Agreed.

I would call that quite disappointing though and was expecting better.
Are we sure the patch works and the tests are correct?

The lack of any space saving for lower % values is strange and
somewhat worrying. There should be a 36? byte saving for 300 null
columns in an 800 column table - how does that not show up at all?

The 300-NULLs case will save approximately 108 bytes per page, as 3 tuples are accommodated per page in that case.
So the total space left in the page will be approximately 1900 bytes (including the 108 bytes saved by the optimization).
The point is that in the existing test case all rows are the same size (approx 2100 bytes), so no space saving shows up; but if the last row were small enough to fit in the available space (remaining space of the page + space saved due to the NULLs optimization), then it would show space savings as well.

In any case there is a performance gain for the 300-NULLs case as well.
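For reference, the arithmetic can be sketched as a small standalone program (an illustration under assumptions, not code from the patch: it assumes a 64-bit build with 8-byte MAXALIGN and 23 bytes of tuple header before t_bits, so the exact per-tuple figure may differ slightly from the numbers above):

#include <stdio.h>

#define MAXALIGN(x)   (((x) + 7) & ~((long) 7))  /* 8-byte alignment assumed */
#define BITMAPLEN(n)  (((n) + 7) / 8)            /* null bitmap size in bytes */

int
main(void)
{
    long    header = 23;        /* bytes before the null bitmap (assumed) */
    long    data = 500 * 4;     /* 500 non-NULL int4 columns */
    long    unpatched = MAXALIGN(header + BITMAPLEN(800)) + data;  /* bitmap covers 800 attrs */
    long    patched = MAXALIGN(header + BITMAPLEN(500)) + data;    /* bitmap covers 500 attrs */
    long    usable = 8192 - 24; /* rough usable space per 8K page */

    printf("per tuple: %ld vs %ld bytes\n", unpatched, patched);
    printf("tuples per page: %ld vs %ld\n",
           usable / (unpatched + 4), usable / (patched + 4));  /* +4 for line pointer */
    return 0;
}

On these assumptions the per-tuple saving is real (a few tens of bytes), but both variants still fit only 3 tuples per 8K page, which matches the 0% space figures above.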

Apart from the above, the performance data for a smaller number of columns (where the trailing nulls cross a word boundary) also shows similar gains.

The cases below (2 & 3) can benefit, as they save 4 bytes per tuple.

2. Table with 12 columns (first 3 integer followed by 9 Boolean columns)
A. INSERT tuples with 9 trailing nulls
B. UPDATE last column value to "non-null"
C. UPDATE last column value to "null"
---------------------+---------------------+---------------------
Original Code        | Trim Trailing NULLs | Improvement (%)
TPS   space (pages)  | TPS   space (pages) | Results
---------------------+---------------------+---------------------
2A: 0.8485 12739 | 0.8524 10811 | 0.4% 15.1%
2B: 0.5847 25478 | 0.5749 23550 | -1.5% 7.5%
2C: 0.5591 38217 | 0.5545 34361 | 0.8% 10.0%

3. Table with 12 columns (first 3 integer followed by 9 Boolean columns)
A. INSERT tuples with 4 trailing nulls
B. UPDATE last column value to "non-null"
C. UPDATE last column value to "null"
---------------------+---------------------+---------------------
Original Code        | Trim Trailing NULLs | Improvement (%)
TPS   space (pages)  | TPS   space (pages) | Results
---------------------+---------------------+---------------------
3A: 0.8443 14706 | 0.8626 12739 | 2.3% 13.3%
3B: 0.5307 29412 | 0.5272 27445 | -0.6% 6.7%
3C: 0.5102 44118 | 0.5218 40184 | 2.2% 8.9%

In conclusion, I would like to say that this patch doesn't have a performance regression for the most common scenarios,
and it gives a benefit in some of the trailing-nulls cases.

With Regards,
Amit Kapila.


#10 Simon Riggs
simon@2ndQuadrant.com
In reply to: Amit kapila (#1)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On 24 December 2012 13:13, Amit Kapila <amit.kapila@huawei.com> wrote:

Apart from the above, the performance data for a smaller number of columns (where the trailing nulls cross a word boundary) also shows similar gains.

The cases below (2 & 3) can benefit, as they save 4 bytes per tuple.

2. Table with 12 columns (first 3 integer followed by 9 Boolean columns)
A. INSERT tuples with 9 trailing nulls
B. UPDATE last column value to "non-null"
C. UPDATE last column value to "null"
---------------------+---------------------+---------------------
Original Code        | Trim Trailing NULLs | Improvement (%)
TPS   space (pages)  | TPS   space (pages) | Results
---------------------+---------------------+---------------------
2A: 0.8485 12739 | 0.8524 10811 | 0.4% 15.1%
2B: 0.5847 25478 | 0.5749 23550 | -1.5% 7.5%
2C: 0.5591 38217 | 0.5545 34361 | 0.8% 10.0%

3. Table with 12 columns (first 3 integer followed by 9 Boolean columns)
A. INSERT tuples with 4 trailing nulls
B. UPDATE last column value to "non-null"
C. UPDATE last column value to "null"
---------------------+---------------------+---------------------
Original Code        | Trim Trailing NULLs | Improvement (%)
TPS   space (pages)  | TPS   space (pages) | Results
---------------------+---------------------+---------------------
3A: 0.8443 14706 | 0.8626 12739 | 2.3% 13.3%
3B: 0.5307 29412 | 0.5272 27445 | -0.6% 6.7%
3C: 0.5102 44118 | 0.5218 40184 | 2.2% 8.9%

In conclusion, I would like to say that this patch doesn't have a performance regression for the most common scenarios,
and it gives a benefit in some of the trailing-nulls cases.

Not really sure about the 100s of columns use case.

But showing gain in useful places in these more common cases wins my vote.

Thanks for testing. Barring objections, will commit.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#11 Kevin Grittner
kgrittn@mail.com
In reply to: Simon Riggs (#10)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

Simon Riggs wrote:

Not really sure about the 100s of columns use case.

But showing gain in useful places in these more common cases wins
my vote.

Thanks for testing. Barring objections, will commit.

Do we have any results on just a plain, old stock pgbench run, with
the default table definitions?

That would be a reassuring complement to the other tests.

-Kevin


#12 Amit Kapila
amit.kapila@huawei.com
In reply to: Kevin Grittner (#11)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On Monday, December 24, 2012 7:58 PM Kevin Grittner wrote:

Simon Riggs wrote:

Not really sure about the 100s of columns use case.

But showing gain in useful places in these more common cases wins
my vote.

Thanks for testing. Barring objections, will commit.

Do we have any results on just a plain, old stock pgbench run, with
the default table definitions?

That would be a reassuring complement to the other tests.

I shall run pgbench TPC-B with scale factor 50 for 10 minutes with and without the patch and post the results.
Kindly let me know if you feel any other test needs to be run.

With Regards,
Amit Kapila.


#13 Amit Kapila
amit.kapila@huawei.com
In reply to: Kevin Grittner (#11)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On Monday, December 24, 2012 7:58 PM Kevin Grittner wrote:

Simon Riggs wrote:

Not really sure about the 100s of columns use case.

But showing gain in useful places in these more common cases wins
my vote.

Thanks for testing. Barring objections, will commit.

Do we have any results on just a plain, old stock pgbench run, with
the default table definitions?

That would be a reassuring complement to the other tests.

Server Configuration:
The database cluster will be initialized with locales
COLLATE: C
CTYPE: C
MESSAGES: en_US.UTF-8
MONETARY: en_US.UTF-8
NUMERIC: en_US.UTF-8
TIME: en_US.UTF-8

shared_buffers = 1GB
checkpoint_segments = 255
checkpoint_timeout = 15min

pgbench:
transaction type: TPC-B (sort of)
scaling factor: 75
query mode: simple
number of clients: 8
number of threads: 8
duration: 600 s

Performance: Average of 3 runs of pgbench in tps
9.3devel | with trailing null patch
----------+--------------------------
578.9872 | 573.4980

With Regards,
Amit Kapila.


#14 Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#11)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On 24 December 2012 16:57, Amit Kapila <amit.kapila@huawei.com> wrote:

Performance: Average of 3 runs of pgbench in tps
9.3devel | with trailing null patch
----------+--------------------------
578.9872 | 573.4980

On balance, it would seem optimizing for this special case would
affect everybody negatively; not much, but enough. Which means we
should reject this patch.

Do you have a reason why a different view might be taken?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#15Amit Kapila
amit.kapila@huawei.com
In reply to: Simon Riggs (#14)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On Wednesday, January 09, 2013 4:52 PM Simon Riggs wrote:

On 24 December 2012 16:57, Amit Kapila <amit.kapila@huawei.com> wrote:

Performance: Average of 3 runs of pgbench in tps
9.3devel | with trailing null patch
----------+--------------------------
578.9872 | 573.4980

On balance, it would seem optimizing for this special case would
affect everybody negatively; not much, but enough. Which means we
should reject this patch.

Do you have a reason why a different view might be taken?

I have tried to dig into why this gap is appearing. I have observed that
there is very little change in the normal path.
I wanted to give it some more time to find out exactly whether something
can be done to avoid the performance dip in normal execution.

Right now I am busy with certain other work, but definitely in the coming
week or so I shall spare time to work on it again.

With Regards,
Amit Kapila.


#16Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#11)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On 9 January 2013 12:06, Amit Kapila <amit.kapila@huawei.com> wrote:

On Wednesday, January 09, 2013 4:52 PM Simon Riggs wrote:

On 24 December 2012 16:57, Amit Kapila <amit.kapila@huawei.com> wrote:

Performance: Average of 3 runs of pgbench in tps
9.3devel | with trailing null patch
----------+--------------------------
578.9872 | 573.4980

On balance, it would seem optimizing for this special case would
affect everybody negatively; not much, but enough. Which means we
should reject this patch.

Do you have a reason why a different view might be taken?

I have tried to dig into why this gap is appearing. I have observed that
there is very little change in the normal path.
I wanted to give it some more time to find out exactly whether something
can be done to avoid the performance dip in normal execution.

Right now I am busy with certain other work, but definitely in the coming
week or so I shall spare time to work on it again.

Perhaps. Not every idea produces useful outcomes. Even after your
excellent research, it appears we haven't made this work yet. It's a
shame. Should we invest more time? It's considered rude to advise
others how to spend their time, but let me say this: we simply don't
have enough time to do everything and we need to be selective,
prioritising our time on to the things that look to give the best
benefit.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#17Amit Kapila
amit.kapila@huawei.com
In reply to: Simon Riggs (#16)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On Wednesday, January 09, 2013 5:45 PM Simon Riggs wrote:

On 9 January 2013 12:06, Amit Kapila <amit.kapila@huawei.com> wrote:

On Wednesday, January 09, 2013 4:52 PM Simon Riggs wrote:

On 24 December 2012 16:57, Amit Kapila <amit.kapila@huawei.com> wrote:

Performance: Average of 3 runs of pgbench in tps
9.3devel | with trailing null patch
----------+--------------------------
578.9872 | 573.4980

On balance, it would seem optimizing for this special case would
affect everybody negatively; not much, but enough. Which means we
should reject this patch.

Do you have a reason why a different view might be taken?

I have tried to dig into why this gap is appearing. I have observed that
there is very little change in the normal path.
I wanted to give it some more time to find out exactly whether something
can be done to avoid the performance dip in normal execution.

Right now I am busy with certain other work, but definitely in the coming
week or so I shall spare time to work on it again.

Perhaps. Not every idea produces useful outcomes. Even after your
excellent research, it appears we haven't made this work yet. It's a
shame. Should we invest more time? It's considered rude to advise
others how to spend their time, but let me say this: we simply don't
have enough time to do everything and we need to be selective,
prioritising our time on to the things that look to give the best
benefit.

I think we can reject this patch.

With Regards,
Amit Kapila.


#18Noah Misch
noah@leadboat.com
In reply to: Simon Riggs (#14)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On Wed, Jan 09, 2013 at 11:22:06AM +0000, Simon Riggs wrote:

On 24 December 2012 16:57, Amit Kapila <amit.kapila@huawei.com> wrote:

Performance: Average of 3 runs of pgbench in tps
9.3devel | with trailing null patch
----------+--------------------------
578.9872 | 573.4980

On balance, it would seem optimizing for this special case would
affect everybody negatively; not much, but enough. Which means we
should reject this patch.

Do you have a reason why a different view might be taken?

We've mostly ignored performance changes of a few percentage points, because
the global consequences of adding or removing code to the binary image can
easily change performance that much. It would be great to get to the point
where we can reason about 1% performance changes, but we're not there. If a
measured +1% performance change would have yielded a different decision, we
should rethink the decision based on more-robust criteria.

(Most of this was said in initial April 2012 discussion.) This patch is a
tough one, because it will rarely help the most-common workloads. If it can
reduce the tuple natts from 9 to 8, the tuple shrinks by a respectable eight
bytes. But if it reduces natts from 72 to 9, we save nothing. It pays off
nicely for the widest, sparsest tables: say, a table with 1000 columns, all
but ten are NULL, and those non-NULL columns are near the front of the table.
I've never seen such a design, but a few folks seemed to find it credible.
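
(To make that arithmetic explicit: the tuple header is the fixed part plus the
null bitmap, rounded up to MAXALIGN. A back-of-the-envelope sketch, assuming
the usual 23-byte fixed header and 8-byte alignment of a 64-bit build; the
query and its numbers are illustrative, not output from the patch.)

-- header_bytes = MAXALIGN(23 + BITMAPLEN(natts)), using integer division
SELECT natts,
       ((23 + (natts + 7) / 8 + 7) / 8) * 8 AS header_bytes
FROM (VALUES (8), (9), (72)) AS v(natts)
ORDER BY natts;
--  natts | header_bytes
-- -------+--------------
--      8 |           24
--      9 |           32
--     72 |           32
-- Trimming natts from 9 to 8 crosses an alignment boundary and saves 8 bytes;
-- trimming from 72 to 9 does not, so it saves nothing.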

I've failed to come up with a non-arbitrary reason to recommend for or against
this patch, so I profess neutrality on the ultimate issue.

Thanks,
nm


#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#18)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

Noah Misch <noah@leadboat.com> writes:

On Wed, Jan 09, 2013 at 11:22:06AM +0000, Simon Riggs wrote:

On balance, it would seem optimizing for this special case would
affect everybody negatively; not much, but enough. Which means we
should reject this patch.

I've failed to come up with a non-arbitrary reason to recommend for or against
this patch, so I profess neutrality on the ultimate issue.

I think if we can't convincingly show an attractive performance gain,
we should reject the patch on the grounds of added complexity. Now,
the amount of added code surely isn't much, but nonetheless this patch
eliminates an invariant that's always been there (ie, that a tuple's
natts matches the tupdesc it was built with). That will at the very
least complicate forensic work, and it might well break third-party
code, or corners of the core system that we haven't noticed yet.

I'd be okay with this patch if it had more impressive performance
results, but as it is I think we're better off without it.

regards, tom lane


#20Amit Kapila
amit.kapila@huawei.com
In reply to: Noah Misch (#18)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On Thursday, January 24, 2013 7:42 AM Noah Misch wrote:

On Wed, Jan 09, 2013 at 11:22:06AM +0000, Simon Riggs wrote:

On 24 December 2012 16:57, Amit Kapila <amit.kapila@huawei.com> wrote:

Performance: Average of 3 runs of pgbench in tps
9.3devel | with trailing null patch
----------+--------------------------
578.9872 | 573.4980

On balance, it would seem optimizing for this special case would
affect everybody negatively; not much, but enough. Which means we
should reject this patch.

Do you have a reason why a different view might be taken?

We've mostly ignored performance changes of a few percentage points, because
the global consequences of adding or removing code to the binary image can
easily change performance that much. It would be great to get to the point
where we can reason about 1% performance changes, but we're not there. If a
measured +1% performance change would have yielded a different decision, we
should rethink the decision based on more-robust criteria.

Today I took one more set of readings of the original pgbench, which are
below in this mail. The difference is that this set of readings is on Suse
11 and with shared_buffers = 4GB.
IMO, the changes in this patch are not causing any regression; however, it
gives a performance/size reduction in some of the cases, as shown by the data
in previous mails.
So the point to decide is whether the use cases in which it gives a benefit
are worth having this patch for.
As Tom has already raised some concerns about the code, I think the patch
can only have merit if the use case makes sense to people.

System Configuration:
Hardware : 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz) & RAM : 24GB
Operating System: Suse-Linux 11.2 x86_64

Server Configuration:
The database cluster will be initialized with locales
COLLATE: C
CTYPE: C
MESSAGES: en_US.UTF-8
MONETARY: en_US.UTF-8
NUMERIC: en_US.UTF-8
TIME: en_US.UTF-8

shared_buffers = 4GB
checkpoint_segments = 255
checkpoint_timeout = 10min

pgbench:
transaction type: TPC-B (sort of)
scaling factor: 75
query mode: simple
number of clients: 4
number of threads: 4
duration: 300 s

         original | with patch
Run-1:        311 |        312
Run-2:        307 |        302
Run-3:        300 |        312
Run-4:        285 |        295
Run-5:        308 |        305

Avg  :      302.2 |      305.2

With Regards,
Amit Kapila.


#21Jameison Martin
jameisonb@yahoo.com
In reply to: Tom Lane (#19)
1 attachment(s)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

there have been a lot of different threads on this patch. i'm going to take a stab at teasing them out, summarizing them, and addressing them individually.

here is my categorization of the issues that have been raised about this patch:
1. is there really anyone that would actually benefit from this change?
2. what is the performance impact on more canonical use-cases (e.g. no trailing nulls)?
   2.1. look at the impact on a micro benchmark of insert performance (Tom's original suggestion)
   2.2. look at the impact on pgbench (the recent thread on this)
   2.3. what is statistically relevant when interpreting benchmarks (Noah's thoughtful comments)?
3. what is the impact on the use-case this is meant to address?
   3.1. space impact
   3.2. performance impact
   3.3. is this impact relevant ("attractive" in Tom's latest email)?
4. is the proposed change itself problematic?
   4.1. is it overly complicated?
   4.2. is there a more generalized approach that would be better (e.g. compress the bitmap)?
   4.3. does it violate a principle in the code (Tom's latest email)?

1) is there really anyone (other than me) that would benefit from this change?

this is a tough one since it is so subjective. i mentioned a particular real world case of mine where this patch as is makes a very significant improvement. others have chimed in with examples where they have seen this in the wild (Josh Berkus and his telemetry example, and Greg Stark mentioned seeing patterns where users have wide schemas and trailing nulls though i'm not sure about the context). that said, it is hard to turn these examples into a generalized notion of how applicable it might be.

i think the primary beneficiaries of this change are enterprise application vendors (though there are clearly other users that would benefit like Josh). enterprise applications often need to deal with customizable fields in their object model. they typically model this with very wide schemas with a lot of generic nullable columns that have no semantic meaning until the customer decides to 'create' a custom field. i honestly suspect that this is actually a fairly common pattern with enterprise applications, though it is hard to say how big of a constituency this may be in the Postgres community. application developers at enterprise software vendors are often very savvy database users and generally know exactly how the database lays out rows and will consequently optimize their use of their wide schemas to minimize the space and performance impact of their custom fields. on other commercial databases trailing nulls are 'free' so enterprise application
developers carefully make use of generic columns designated as custom fields by using the first ones first. they may even 'recycle' columns in crafty ways to ensure they are using early columns as much as possible.
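
to make that concrete, here is a purely illustrative sketch of the kind of schema i'm describing (the table and column names are invented for this example, they are not from any real product):

-- a handful of real columns up front, then many generic nullable "custom
-- field" columns that individual customers may or may not configure
CREATE TABLE account (
    id        bigint PRIMARY KEY,
    name      text NOT NULL,
    custom_1  text,
    custom_2  text,
    custom_3  text,
    custom_4  text
    -- ... in the real schemas this continues for hundreds of columns ...
);

-- a typical row only populates the first custom field or two, so everything
-- after that is a trailing null whose bitmap space the patch would reclaim
INSERT INTO account (id, name, custom_1) VALUES (1, 'example tenant', 'gold tier');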

i can give more detail if folks would like to hear more about this.

also, i know there have been some suggestions about different ways to approach this issue of customized columns without using such wide schemas. i can assure you that they are all even more problematic (generally in terms of performance). we can discuss this further as well. but i guess the main point here isn't whether this is the 'right' approach to application development, but that it is what people in fact are doing.

in summary, it is difficult to quantify how often Postgres users have wide schemas with trailing nulls, but i believe it might be a fairly common pattern with enterprise software vendors that support customization, and it may even show up a bit in end-user applications as well (e.g. Josh's example).

2)  what is the performance impact on more canonical use-cases (e.g. no trailing nulls)?

a number of people (Tom, Simon, Kevin etc) have expressed a concern that any improvement to the particular use-case this patch is meant to address shouldn't negatively impact the performance of the more canonical use cases (e.g. no trailing nulls). i think there is a broad consensus on this (i agree as well). there are 3 sets of benchmark data addressing this concern. i generated a bunch of numbers doing a very specific micro-benchmark that attempts to isolate insert performance as suggested by Tom. that showed no performance degradation. Amit did some additional benchmarking. and finally there was a recent run of pgbench by Amit that originally showed a 1% degradation and a more recent run with 5 runs of both that showed a 0.2% degradation. [i'm very grateful to Amit for all of the work he has done on behalf of this, i'm sure it was more than he bargained for] 

since the pgbench numbers were the ones that caused the committed patch to be backed out it makes sense to address this at length. i agree with Noah's comments that maybe 1% performance changes shouldn't necessarily be treated as significant because of the inherent variance in test runs (stdev), because of the necessarily stricter and potentially onerous requirements this would likely impose on benchmarking in general (e.g. potentially require the benchmarker to flush the filesystem across runs, isolate the machine from the network, follow some rules for proper statistical analysis of results, etc), and because even the changing size of a binary can have a demonstrable impact on some benchmarks. [parenthetically it is probably a good idea for someone to take a stab at formalizing the interpretation of pgbench results (what degree of isolation is required, thoughts about variance, and so forth). if folks think this is a good idea a colleague is
willing to take a stab at this]

nonetheless, it seemed reasonable to me to look into Amit's original numbers more closely. Amit's more recent 5-iteration run showed a ~0.2% performance degradation with the patch. that is arguably statistically irrelevant but it did seem fairly repeatable, so a colleague of mine was kind enough to look into it. instead of trying to formally explain or eliminate this degradation he made a slight adjustment to the order of if/else in the patch and now the patch seems marginally faster in pgbench (though arguably statistically irrelevant). i have attached this latest version of the patch and have included his pgbench numbers below.

so, i personally believe that the patch doesn't have a negative performance impact on the more canonical use-cases. there is a pretty significant amount of benchmarking to show this, and this makes intuitive sense if you examine the patch. so i believe the original reason for backing out the patch has been remedied by more extensive pgbench runs to remove a bit more noise and by a tiny adjustment in the code to make a minuscule performance improvement.

and, just to be clear, i'm not claiming that the patch actually improves the performance of canonical use cases. and i don't believe that anyone suggested that improving the performance of canonical use cases is a necessary requirement for this patch, but i might be mistaken about that.

3) what is the impact on the use-case this is meant to address?

i have produced some numbers that show significant space savings and performance improvements for the real world use-cases that i'm seeking to address. Amit and i have also demonstrated the impact of this change across a number of other scenarios (maybe 10 different scenarios in total).  in the particular real world use-case i'm trying to address -- a table with 800+ nullable columns of which only 50 are set on average -- the impact is  significant. there is a ~20% performance improvement on insert microbenchmarks and, more importantly to me, ~100 bytes saved per row on average. that is certainly relevant to me. given the challenges of achieving high page utilization with the mvcc/vacuuming model it might be argued that 100 bytes per row isn't important but it adds up in my case. 
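
for anyone wanting to gauge the effect on their own data, a rough way to see the on-disk difference is to load the same sparse rows on a patched and an unpatched server and compare relation sizes. this is only a sketch (the table is invented, and the exact per-row saving depends on header alignment):

-- wide, mostly-null table; only the first two columns are ever populated
CREATE TABLE wide_sparse (c1 int, c2 int, c3 int /* ... out to c800 ... */, c800 int);
INSERT INTO wide_sparse (c1, c2)
    SELECT g, g FROM generate_series(1, 100000) AS g;
VACUUM wide_sparse;
SELECT pg_relation_size('wide_sparse') AS table_bytes;
-- dividing the difference between the two servers by the row count should
-- land near BITMAPLEN(800) - BITMAPLEN(2), i.e. roughly 100 bytes per row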

but of course many people have pointed out that the impact of this change is roughly proportional to the number of trailing nulls we are talking about. this comes back to the original question of whether the benefit is general enough. it is hard for me to assess whether the impact is relevant for other use cases without a clear statement of those other use cases. Josh may have an opinion on his particular use case.

4) is the proposed change itself problematic?
4.1) complexity

the change complexity vs. the original code complexity is pretty subjective. the change basically just formats the row header using the position of the last non-null attribute in the row instead of the tuple descriptor's natts. i obviously don't find the change excessively complicated but i am biased. i think the only thing to do is for individuals to look at the patch and see whether the code has grown more complicated, or too complicated.

4.2) is there a more generalized approach that would be better (e.g. compress the bitmap)

i think it was Simon that suggested perhaps a more general approach such as compressing the bitmap might be better because it might be more widely applicable. this is certainly a valid point. ironically i took the simplistic approach of truncating the bitmap expressly because of its simplicity, and because i thought there might be a decent chance that a more complicated change might be rejected on the basis of the complexity alone (if nothing else there would be two bitmap formats that would need to be supported assuming no upgrade). that said, it would be interesting to prototype a bitmap compression approach and see how much it complicates the code and how it impacts performance. the added complexity vs. the potential gain of this approach may not make that much sense for my particular use-case, but it may for the community as a whole.

4.3) does it violate a principle in the code (Tom's latest email)

from my perspective, the code has to deal with the fact that natts on the persistent row is different than the tupdesc already, this is the foundation upon which ALTER TABLE ADD NULL and DROP COLUMN as a metadata operation are built. so i believe that this patch strengthens the code's ability to handle the ALTER TABLE ADD NULL case by generalizing it: now the code will more frequently need to deal with the disparity between natts and tupdesc rather than only after someone did an ALTER TABLE. i'm an advocate of making corner cases go through generalized code where possible.
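
the existing behavior i'm referring to is easy to see from plain SQL: a row written before an ALTER TABLE ADD COLUMN keeps its smaller stored natts, and the attribute it never stored simply reads back as null (an illustrative example, not taken from the regression tests):

CREATE TABLE t (a int);
INSERT INTO t VALUES (1);          -- row is stored with natts = 1
ALTER TABLE t ADD COLUMN b int;    -- metadata-only change, no table rewrite
SELECT a, b, b IS NULL AS b_is_null FROM t;
--  a | b | b_is_null
-- ---+---+-----------
--  1 |   | t
-- (1 row)
-- with the patch, ordinary inserts produce the same natts-smaller-than-tupdesc shape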

however, in all honesty i hadn't considered Tom's point that the patch has created a situation where natts on a row can deviate from the tupdesc that it was built with (as opposed to the current tupdesc, from which it could always deviate due to a subsequent ALTER TABLE). this is a subtle point and i don't think i have enough experience in the Postgres code to argue one way or another about it. so i'll have to leave that determination up to people who are more familiar with the code.

thanks for reading all of this.

cheers.

-jamie

here are the performance numbers generated by my colleague against the current version of this patch (which is attached). to me this demonstrates that pgbench is not degraded, which was the reason the patch was backed out.

| | 20130121 | 20130121 |
| | vanilla | Modified patch |
|-------+----------+----------------|
| | 5213.30 | 5168.70 |
| | 5229.80 | 5158.28 |
| | 5183.47 | 5183.01 |
| | 5140.82 | 5217.11 |
| | 5180.93 | 5194.44 |
| | 5169.55 | 5202.02 |
| | 5199.78 | 5197.82 |
| | 5228.76 | 5175.75 |
| | 5187.35 | 5233.96 |
| | 5118.66 | 5221.44 |
| | 5176.20 | 5148.09 |
| | 5215.89 | 5192.26 |
| | 5221.13 | 5126.39 |
| | 5232.44 | 5164.58 |
| | 5188.14 | 5210.91 |
| | 5176.77 | 5200.41 |
| | 5218.18 | 5189.85 |
| | 5207.82 | 5158.01 |
| | 5203.77 | 5144.67 |
| | 5213.74 | 5171.70 |
| | 5154.11 | 5171.24 |
| | 5165.22 | 5174.00 |
| | 5196.96 | 5136.03 |
| | 5166.75 | 5176.05 |
| | 5116.69 | 5188.85 |
| | 5170.82 | 5197.51 |
| | 5124.63 | 5145.85 |
| | 5162.05 | 5190.66 |
| | 5198.82 | 5187.08 |
| | 5155.55 | 5199.11 |
| | 5166.95 | 5195.08 |
| | 5197.23 | 5170.88 |
| | 5145.07 | 5152.88 |
| | 5178.24 | 5170.48 |
| | 5128.73 | 5228.55 |
| | 5201.38 | 5189.90 |
| | 5161.96 | 5163.39 |
| | 5191.68 | 5164.13 |
| | 5193.44 | 5172.01 |
| | 5150.87 | 5140.21 |
| | 5118.95 | 5163.93 |
| | 5184.59 | 5145.37 |
| | 5135.52 | 5183.75 |
| | 5197.49 | 5173.54 |
| | 5186.67 | 5207.20 |
| | 5176.33 | 5183.88 |
| | 5185.09 | 5210.38 |
| | 5124.34 | 5182.11 |
|-------+----------+----------------|
| avg | 5178.50 | 5180.27 |
| stdev | 31.51 | 24.53 |
|-------+----------+----------------|

Here is the output of pgbench:

transaction type: TPC-B (sort of)
scaling factor: 75
query mode: simple
number of clients: 8
number of threads: 8
duration: 300 s

I'm running with the database in /run/shm, and the following changes to the standard postgresql.conf:

shared_buffers = 1024MB
checkpoint_segments = 255
checkpoint_timeout = 15min


Attachments:

1-24-2013-trailing-nulls.patchtext/x-patch; name=1-24-2013-trailing-nulls.patchDownload
*** a/src/backend/access/common/heaptuple.c
--- b/src/backend/access/common/heaptuple.c
***************
*** 133,154 **** void
  heap_fill_tuple(TupleDesc tupleDesc,
  				Datum *values, bool *isnull,
  				char *data, Size data_size,
! 				uint16 *infomask, bits8 *bit)
  {
  	bits8	   *bitP;
  	int			bitmask;
  	int			i;
- 	int			numberOfAttributes = tupleDesc->natts;
  	Form_pg_attribute *att = tupleDesc->attrs;
  
  #ifdef USE_ASSERT_CHECKING
  	char	   *start = data;
  #endif
  
  	if (bit != NULL)
  	{
  		bitP = &bit[-1];
  		bitmask = HIGHBIT;
  	}
  	else
  	{
--- 133,156 ----
  heap_fill_tuple(TupleDesc tupleDesc,
  				Datum *values, bool *isnull,
  				char *data, Size data_size,
! 				uint16 *infomask, bits8 *bit,
! 				int numberOfAttributes)
  {
  	bits8	   *bitP;
  	int			bitmask;
  	int			i;
  	Form_pg_attribute *att = tupleDesc->attrs;
  
  #ifdef USE_ASSERT_CHECKING
  	char	   *start = data;
  #endif
  
+ 	*infomask &= ~(HEAP_HASNULL | HEAP_HASVARWIDTH | HEAP_HASEXTERNAL);
  	if (bit != NULL)
  	{
  		bitP = &bit[-1];
  		bitmask = HIGHBIT;
+ 		*infomask |= HEAP_HASNULL;
  	}
  	else
  	{
***************
*** 157,163 **** heap_fill_tuple(TupleDesc tupleDesc,
  		bitmask = 0;
  	}
  
- 	*infomask &= ~(HEAP_HASNULL | HEAP_HASVARWIDTH | HEAP_HASEXTERNAL);
  
  	for (i = 0; i < numberOfAttributes; i++)
  	{
--- 159,164 ----
***************
*** 176,182 **** heap_fill_tuple(TupleDesc tupleDesc,
  
  			if (isnull[i])
  			{
- 				*infomask |= HEAP_HASNULL;
  				continue;
  			}
  
--- 177,182 ----
***************
*** 636,649 **** heap_form_tuple(TupleDesc tupleDescriptor,
  	int			hoff;
  	bool		hasnull = false;
  	Form_pg_attribute *att = tupleDescriptor->attrs;
! 	int			numberOfAttributes = tupleDescriptor->natts;
  	int			i;
  
! 	if (numberOfAttributes > MaxTupleAttributeNumber)
  		ereport(ERROR,
  				(errcode(ERRCODE_TOO_MANY_COLUMNS),
  				 errmsg("number of columns (%d) exceeds limit (%d)",
! 						numberOfAttributes, MaxTupleAttributeNumber)));
  
  	/*
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
--- 636,655 ----
  	int			hoff;
  	bool		hasnull = false;
  	Form_pg_attribute *att = tupleDescriptor->attrs;
! 	int			maxNumberOfAttributes = tupleDescriptor->natts;
  	int			i;
+ 	int 		lastNonNullAttribute = -1;
+ 	int 		numberOfAttributes; /* The actual number of attributes for this particular row.
+ 	 	 	 	 	 	 	 	 	 * trailing nulls aren't stored so this may be smaller than
+ 	 	 	 	 	 	 	 	 	 * maxNumberOfAttributes
+ 	 	 	 	 	 	 	 	 	 */
  
! 
! 	if (maxNumberOfAttributes > MaxTupleAttributeNumber)
  		ereport(ERROR,
  				(errcode(ERRCODE_TOO_MANY_COLUMNS),
  				 errmsg("number of columns (%d) exceeds limit (%d)",
! 						maxNumberOfAttributes, MaxTupleAttributeNumber)));
  
  	/*
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
***************
*** 656,675 **** heap_form_tuple(TupleDesc tupleDescriptor,
  	 * if an attribute is already toasted, it must have been sent to disk
  	 * already and so cannot contain toasted attributes.
  	 */
! 	for (i = 0; i < numberOfAttributes; i++)
  	{
! 		if (isnull[i])
! 			hasnull = true;
! 		else if (att[i]->attlen == -1 &&
  				 att[i]->attalign == 'd' &&
  				 att[i]->attndims == 0 &&
  				 !VARATT_IS_EXTENDED(DatumGetPointer(values[i])))
! 		{
! 			values[i] = toast_flatten_tuple_attribute(values[i],
  													  att[i]->atttypid,
  													  att[i]->atttypmod);
  		}
  	}
  
  	/*
  	 * Determine total space needed
--- 662,687 ----
  	 * if an attribute is already toasted, it must have been sent to disk
  	 * already and so cannot contain toasted attributes.
  	 */
! 	for (i = 0; i < maxNumberOfAttributes; i++)
  	{
! 		if (!isnull[i])
! 		{
! 			lastNonNullAttribute = i;
! 
! 			if (att[i]->attlen == -1 &&
  				 att[i]->attalign == 'd' &&
  				 att[i]->attndims == 0 &&
  				 !VARATT_IS_EXTENDED(DatumGetPointer(values[i])))
! 			{
! 				values[i] = toast_flatten_tuple_attribute(values[i],
  													  att[i]->atttypid,
  													  att[i]->atttypmod);
+ 			}
  		}
+ 		else
+ 			hasnull = true;
  	}
+ 	numberOfAttributes = lastNonNullAttribute + 1;
  
  	/*
  	 * Determine total space needed
***************
*** 719,725 **** heap_form_tuple(TupleDesc tupleDescriptor,
  					(char *) td + hoff,
  					data_len,
  					&td->t_infomask,
! 					(hasnull ? td->t_bits : NULL));
  
  	return tuple;
  }
--- 731,738 ----
  					(char *) td + hoff,
  					data_len,
  					&td->t_infomask,
! 					(hasnull ? td->t_bits : NULL),
! 					numberOfAttributes);
  
  	return tuple;
  }
***************
*** 1390,1403 **** heap_form_minimal_tuple(TupleDesc tupleDescriptor,
  	int			hoff;
  	bool		hasnull = false;
  	Form_pg_attribute *att = tupleDescriptor->attrs;
! 	int			numberOfAttributes = tupleDescriptor->natts;
  	int			i;
  
! 	if (numberOfAttributes > MaxTupleAttributeNumber)
  		ereport(ERROR,
  				(errcode(ERRCODE_TOO_MANY_COLUMNS),
  				 errmsg("number of columns (%d) exceeds limit (%d)",
! 						numberOfAttributes, MaxTupleAttributeNumber)));
  
  	/*
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
--- 1403,1421 ----
  	int			hoff;
  	bool		hasnull = false;
  	Form_pg_attribute *att = tupleDescriptor->attrs;
! 	int			maxNumberOfAttributes = tupleDescriptor->natts;
  	int			i;
+ 	int 		lastNonNullAttribute = -1;
+ 	int 		numberOfAttributes; /* The actual number of attributes for this particular row.
+ 	 	 	 	 	 	 	 	 	 * trailing nulls aren't stored so this may be smaller than
+ 	 	 	 	 	 	 	 	 	 * maxNumberOfAttributes
+ 	 	 	 	 	 	 	 	 	 */
  
! 	if (maxNumberOfAttributes > MaxTupleAttributeNumber)
  		ereport(ERROR,
  				(errcode(ERRCODE_TOO_MANY_COLUMNS),
  				 errmsg("number of columns (%d) exceeds limit (%d)",
! 						maxNumberOfAttributes, MaxTupleAttributeNumber)));
  
  	/*
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
***************
*** 1410,1429 **** heap_form_minimal_tuple(TupleDesc tupleDescriptor,
  	 * if an attribute is already toasted, it must have been sent to disk
  	 * already and so cannot contain toasted attributes.
  	 */
! 	for (i = 0; i < numberOfAttributes; i++)
  	{
! 		if (isnull[i])
! 			hasnull = true;
! 		else if (att[i]->attlen == -1 &&
  				 att[i]->attalign == 'd' &&
  				 att[i]->attndims == 0 &&
  				 !VARATT_IS_EXTENDED(values[i]))
! 		{
! 			values[i] = toast_flatten_tuple_attribute(values[i],
  													  att[i]->atttypid,
  													  att[i]->atttypmod);
  		}
  	}
  
  	/*
  	 * Determine total space needed
--- 1428,1452 ----
  	 * if an attribute is already toasted, it must have been sent to disk
  	 * already and so cannot contain toasted attributes.
  	 */
! 	for (i = 0; i < maxNumberOfAttributes; i++)
  	{
! 		if (!isnull[i])
! 		{
! 			lastNonNullAttribute = i;
! 			if (att[i]->attlen == -1 &&
  				 att[i]->attalign == 'd' &&
  				 att[i]->attndims == 0 &&
  				 !VARATT_IS_EXTENDED(values[i]))
! 			{
! 				values[i] = toast_flatten_tuple_attribute(values[i],
  													  att[i]->atttypid,
  													  att[i]->atttypmod);
+ 			}
  		}
+ 		else
+ 			hasnull = true;
  	}
+ 	numberOfAttributes = lastNonNullAttribute + 1;
  
  	/*
  	 * Determine total space needed
***************
*** 1463,1469 **** heap_form_minimal_tuple(TupleDesc tupleDescriptor,
  					(char *) tuple + hoff,
  					data_len,
  					&tuple->t_infomask,
! 					(hasnull ? tuple->t_bits : NULL));
  
  	return tuple;
  }
--- 1486,1493 ----
  					(char *) tuple + hoff,
  					data_len,
  					&tuple->t_infomask,
! 					(hasnull ? tuple->t_bits : NULL),
! 					numberOfAttributes);
  
  	return tuple;
  }
*** a/src/backend/access/common/indextuple.c
--- b/src/backend/access/common/indextuple.c
***************
*** 29,34 ****
--- 29,41 ----
  /* ----------------
   *		index_form_tuple
   * ----------------
+  *
+  * Note that as opposed to heap rows we don't bother truncating off trailing
+  * null attributes. This is because the maximum number of attributes in an
+  * index (32) is well bounded and there isn't much to be gained by trying to
+  * reduce the bitmap that tracks null attributes since we don't expect index
+  * keys have a lot of trailing nulls. Also, the logic for tracking the null
+  * bitmap is hard-wired at 4 bytes (see IndexAttributeBitMapData).
   */
  IndexTuple
  index_form_tuple(TupleDesc tupleDescriptor,
***************
*** 139,145 **** index_form_tuple(TupleDesc tupleDescriptor,
  					(char *) tp + hoff,
  					data_size,
  					&tupmask,
! 					(hasnull ? (bits8 *) tp + sizeof(IndexTupleData) : NULL));
  
  #ifdef TOAST_INDEX_HACK
  	for (i = 0; i < numberOfAttributes; i++)
--- 146,153 ----
  					(char *) tp + hoff,
  					data_size,
  					&tupmask,
! 					(hasnull ? (bits8 *) tp + sizeof(IndexTupleData) : NULL),
! 					numberOfAttributes);
  
  #ifdef TOAST_INDEX_HACK
  	for (i = 0; i < numberOfAttributes; i++)
*** a/src/backend/access/heap/tuptoaster.c
--- b/src/backend/access/heap/tuptoaster.c
***************
*** 425,430 **** toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
--- 425,431 ----
  	bool		need_free = false;
  	bool		need_delold = false;
  	bool		has_nulls = false;
+ 	int			lastNonNullValOffset = -1;
  
  	Size		maxDataLen;
  	Size		hoff;
***************
*** 534,539 **** toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
--- 535,544 ----
  			has_nulls = true;
  			continue;
  		}
+ 		else
+ 		{
+ 			lastNonNullValOffset = i;
+ 		}
  
  		/*
  		 * Now look at varlena attributes
***************
*** 593,600 **** toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
  
  	/* compute header overhead --- this should match heap_form_tuple() */
  	hoff = offsetof(HeapTupleHeaderData, t_bits);
! 	if (has_nulls)
! 		hoff += BITMAPLEN(numAttrs);
  	if (newtup->t_data->t_infomask & HEAP_HASOID)
  		hoff += sizeof(Oid);
  	hoff = MAXALIGN(hoff);
--- 598,605 ----
  
  	/* compute header overhead --- this should match heap_form_tuple() */
  	hoff = offsetof(HeapTupleHeaderData, t_bits);
! 	if (has_nulls && lastNonNullValOffset > -1)
! 		hoff += BITMAPLEN(lastNonNullValOffset+1);
  	if (newtup->t_data->t_infomask & HEAP_HASOID)
  		hoff += sizeof(Oid);
  	hoff = MAXALIGN(hoff);
***************
*** 867,872 **** toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
--- 872,878 ----
  		int32		new_header_len;
  		int32		new_data_len;
  		int32		new_tuple_len;
+ 		numAttrs = lastNonNullValOffset + 1; /* only store up to the last non-null attribute */
  
  		/*
  		 * Calculate the new size of the tuple.
***************
*** 879,885 **** toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
  		 * whether there needs to be one at all.
  		 */
  		new_header_len = offsetof(HeapTupleHeaderData, t_bits);
! 		if (has_nulls)
  			new_header_len += BITMAPLEN(numAttrs);
  		if (olddata->t_infomask & HEAP_HASOID)
  			new_header_len += sizeof(Oid);
--- 885,891 ----
  		 * whether there needs to be one at all.
  		 */
  		new_header_len = offsetof(HeapTupleHeaderData, t_bits);
! 		if (has_nulls && numAttrs > 0)
  			new_header_len += BITMAPLEN(numAttrs);
  		if (olddata->t_infomask & HEAP_HASOID)
  			new_header_len += sizeof(Oid);
***************
*** 914,920 **** toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
  						(char *) new_data + new_header_len,
  						new_data_len,
  						&(new_data->t_infomask),
! 						has_nulls ? new_data->t_bits : NULL);
  	}
  	else
  		result_tuple = newtup;
--- 920,927 ----
  						(char *) new_data + new_header_len,
  						new_data_len,
  						&(new_data->t_infomask),
! 						has_nulls ? new_data->t_bits : NULL,
! 						numAttrs);
  	}
  	else
  		result_tuple = newtup;
***************
*** 1060,1071 **** toast_flatten_tuple_attribute(Datum value,
  		return value;			/* not a composite type */
  
  	att = tupleDesc->attrs;
- 	numAttrs = tupleDesc->natts;
  
  	/*
  	 * Break down the tuple into fields.
  	 */
  	olddata = DatumGetHeapTupleHeader(value);
  	Assert(typeId == HeapTupleHeaderGetTypeId(olddata));
  	Assert(typeMod == HeapTupleHeaderGetTypMod(olddata));
  	/* Build a temporary HeapTuple control structure */
--- 1067,1078 ----
  		return value;			/* not a composite type */
  
  	att = tupleDesc->attrs;
  
  	/*
  	 * Break down the tuple into fields.
  	 */
  	olddata = DatumGetHeapTupleHeader(value);
+ 	numAttrs = HeapTupleHeaderGetNatts(olddata);
  	Assert(typeId == HeapTupleHeaderGetTypeId(olddata));
  	Assert(typeMod == HeapTupleHeaderGetTypMod(olddata));
  	/* Build a temporary HeapTuple control structure */
***************
*** 1147,1153 **** toast_flatten_tuple_attribute(Datum value,
  					(char *) new_data + new_header_len,
  					new_data_len,
  					&(new_data->t_infomask),
! 					has_nulls ? new_data->t_bits : NULL);
  
  	/*
  	 * Free allocated temp values
--- 1154,1161 ----
  					(char *) new_data + new_header_len,
  					new_data_len,
  					&(new_data->t_infomask),
! 					has_nulls ? new_data->t_bits : NULL,
! 			        numAttrs);
  
  	/*
  	 * Free allocated temp values
*** a/src/include/access/htup_details.h
--- b/src/include/access/htup_details.h
***************
*** 618,624 **** extern Size heap_compute_data_size(TupleDesc tupleDesc,
  extern void heap_fill_tuple(TupleDesc tupleDesc,
  				Datum *values, bool *isnull,
  				char *data, Size data_size,
! 				uint16 *infomask, bits8 *bit);
  extern bool heap_attisnull(HeapTuple tup, int attnum);
  extern Datum nocachegetattr(HeapTuple tup, int attnum,
  			   TupleDesc att);
--- 618,624 ----
  extern void heap_fill_tuple(TupleDesc tupleDesc,
  				Datum *values, bool *isnull,
  				char *data, Size data_size,
! 				uint16 *infomask, bits8 *bit, int lastNonNullAttributeOffset);
  extern bool heap_attisnull(HeapTuple tup, int attnum);
  extern Datum nocachegetattr(HeapTuple tup, int attnum,
  			   TupleDesc att);
*** /dev/null
--- b/src/test/regress/expected/trailing_nulls.out
***************
*** 0 ****
--- 1,210 ----
+ --
+ -- The row bitmap and row length don't track trailing nulls. Verify that this works correctly.
+ --
+ -- simple case
+ CREATE TABLE trailing_null (
+ 	c1 varchar(10), 
+ 	c2 varchar(10), 
+ 	c3 varchar(10), 
+ 	c4 varchar(10), 
+ 	c5 varchar(10) 
+ 	);
+ 	
+ INSERT INTO trailing_null VALUES (null, null, null, null, null);
+ INSERT INTO trailing_null VALUES (null, null, null, null, '5');
+ INSERT INTO trailing_null VALUES (null, null, null, '4', null);
+ INSERT INTO trailing_null VALUES (null, null, '3', null, null);
+ INSERT INTO trailing_null VALUES (null, '2', null, null, null);
+ INSERT INTO trailing_null VALUES ('1', null, null, null, null);
+ SELECT * FROM trailing_null ORDER BY c1, c2, c3, c4, c5;
+  c1 | c2 | c3 | c4 | c5 
+ ----+----+----+----+----
+  1  |    |    |    | 
+     | 2  |    |    | 
+     |    | 3  |    | 
+     |    |    | 4  | 
+     |    |    |    | 5
+     |    |    |    | 
+ (6 rows)
+ 
+ CREATE INDEX trailing_nulli ON trailing_null (c1, c2, c3, c4, c5);
+ INSERT INTO trailing_null VALUES ('1', null, null, null, null);
+ SELECT * FROM trailing_null ORDER BY c1, c2, c3, c4, c5;
+  c1 | c2 | c3 | c4 | c5 
+ ----+----+----+----+----
+  1  |    |    |    | 
+  1  |    |    |    | 
+     | 2  |    |    | 
+     |    | 3  |    | 
+     |    |    | 4  | 
+     |    |    |    | 5
+     |    |    |    | 
+ (7 rows)
+ 
+ DROP TABLE trailing_null;
+ -- corner case, no columns
+ CREATE TABLE trailing_null ();
+ DROP TABLE trailing_null;
+ -- toastable
+ CREATE TABLE trailing_null (
+ 	c1 varchar(500), 
+ 	c2 varchar(500), 
+ 	c3 varchar(500), 
+ 	c4 varchar(500), 
+ 	c5 varchar(500), 
+ 	c6 varchar(500), 
+ 	c7 varchar(500), 
+ 	c8 varchar(500), 
+ 	c9 varchar(500), 
+ 	c10 varchar(500), 
+ 	c11 varchar(500), 
+ 	c12 varchar(500), 
+ 	c13 varchar(500), 
+ 	c14 varchar(500), 
+ 	c15 varchar(500), 
+ 	c16 varchar(500), 
+ 	c17 varchar(500)
+ 	);
+ -- toast: easily compressed
+ INSERT INTO trailing_null VALUES (
+ 	rpad('c1', 450, 'x'),
+ 	rpad('c2', 450, 'x'),
+ 	rpad('c3', 450, 'x'),
+ 	rpad('c4', 450, 'x'),
+ 	rpad('c5', 450, 'x'),
+ 	rpad('c6', 450, 'x'),
+ 	rpad('c7', 450, 'x'),
+ 	rpad('c8', 450, 'x'),
+ 	rpad('c9', 450, 'x'),
+ 	rpad('c10', 450, 'x'),
+ 	rpad('c11', 450, 'x'),
+ 	rpad('c12', 450, 'x'),
+ 	rpad('c13', 450, 'x'),
+ 	rpad('c14', 450, 'x'),
+ 	rpad('c15', 450, 'x'),
+ 	rpad('c16', 450, 'x'),
+ 	rpad('c17', 450, 'x')
+ 	);
+ -- toast: off row storage
+ INSERT INTO trailing_null VALUES (
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3)
+ );
+ -- verify that we have rows in our toast storage
+ VACUUM FULL trailing_null;
+ VACUUM ANALYZE trailing_null;
+ SELECT CAST(reltuples AS INTEGER) FROM pg_class
+ 	WHERE oid = (SELECT reltoastrelid FROM pg_class where relname = 'trailing_null');
+  reltuples 
+ -----------
+          4
+ (1 row)
+ 
+ 	
+ -- non-toast, separate codepath
+ UPDATE trailing_null SET c17 = null;
+ SELECT length(c1), length(c16), c17 FROM trailing_null;
+  length | length | c17 
+ --------+--------+-----
+     450 |    450 | 
+     336 |    336 | 
+ (2 rows)
+ 
+ VACUUM FULL trailing_null;
+ VACUUM ANALYZE trailing_null;
+ SELECT CAST(reltuples AS INTEGER) FROM pg_class
+ 	WHERE oid = (SELECT reltoastrelid FROM pg_class where relname = 'trailing_null');
+  reltuples 
+ -----------
+          4
+ (1 row)
+ 
+ UPDATE trailing_null SET c16 = null;
+ SELECT length(c1), c16, c17 FROM trailing_null;
+  length | c16 | c17 
+ --------+-----+-----
+     450 |     | 
+     336 |     | 
+ (2 rows)
+ 
+ VACUUM FULL trailing_null;
+ VACUUM ANALYZE trailing_null;
+ SELECT CAST(reltuples AS INTEGER) FROM pg_class
+ 	WHERE oid = (SELECT reltoastrelid FROM pg_class where relname = 'trailing_null');
+  reltuples 
+ -----------
+          4
+ (1 row)
+ 
+ UPDATE trailing_null SET
+ 	c1 = null,
+ 	c2 = null,
+ 	c3 = null,
+ 	c4 = null,
+ 	c5 = null,
+ 	c6 = null,
+ 	c7 = null,
+ 	c8 = null,
+ 	c9 = null,
+ 	c10 = null,
+ 	c11 = null,
+ 	c12 = null,
+ 	c13 = null,
+ 	c14 = null,
+ 	c15 = null,
+ 	c16 = null,
+ 	c17 = null;
+ -- verify that we got rid of our toasted data
+ VACUUM FULL trailing_null;
+ VACUUM ANALYZE trailing_null;
+ SELECT CAST(reltuples AS INTEGER) FROM pg_class
+ 	WHERE oid = (SELECT reltoastrelid FROM pg_class where relname = 'trailing_null');
+  reltuples 
+ -----------
+          4
+ (1 row)
+ 
+ 	
+ DROP TABLE trailing_null;
+ -- Let's hit the detoasting logic related to composite objects. Borrowed
+ -- from rowtypes.sql
+ -- The object here is to ensure that toasted references inside
+ -- composite values don't cause problems.  The large f1 value will
+ -- be toasted inside pp, it must still work after being copied to people.
+ CREATE TYPE fullname as (first text, last text, extra text);
+ CREATE temp TABLE pp (f1 text, c1 varchar(10) null);
+ INSERT INTO pp VALUES (repeat('abcdefghijkl', 100000), null);
+ CREATE temp TABLE people (fn fullname, c1 varchar(10), c2 varchar(10));
+ -- trailing null in our nested object 
+ INSERT into people SELECT('Jim', f1, null)::fullname, c1, null from pp;
+ SELECT (fn).first, length((fn).last), c1, c2 from people;
+  first | length  | c1 | c2 
+ -------+---------+----+----
+  Jim   | 1200000 |    | 
+ (1 row)
+ 
+ ALTER TABLE people ADD COLUMN c3 varchar(10);
+ SELECT (fn).first, length((fn).last), c1, c2, c3 from people;
+  first | length  | c1 | c2 | c3 
+ -------+---------+----+----+----
+  Jim   | 1200000 |    |    | 
+ (1 row)
+ 
+ DROP TABLE pp;
+ DROP TABLE people;
+ DROP TYPE fullname;
*** a/src/test/regress/parallel_schedule
--- b/src/test/regress/parallel_schedule
***************
*** 98,104 **** test: event_trigger
  # ----------
  # Another group of parallel tests
  # ----------
! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock json
  
  # ----------
  # Another group of parallel tests
--- 98,104 ----
  # ----------
  # Another group of parallel tests
  # ----------
! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock json trailing_nulls 
  
  # ----------
  # Another group of parallel tests
*** /dev/null
--- b/src/test/regress/sql/trailing_nulls.sql
***************
*** 0 ****
--- 1,166 ----
+ --
+ -- The row bitmap and row length don't track trailing nulls. Verify that this works correctly.
+ --
+ 
+ -- simple case
+ CREATE TABLE trailing_null (
+ 	c1 varchar(10), 
+ 	c2 varchar(10), 
+ 	c3 varchar(10), 
+ 	c4 varchar(10), 
+ 	c5 varchar(10) 
+ 	);
+ 	
+ INSERT INTO trailing_null VALUES (null, null, null, null, null);
+ INSERT INTO trailing_null VALUES (null, null, null, null, '5');
+ INSERT INTO trailing_null VALUES (null, null, null, '4', null);
+ INSERT INTO trailing_null VALUES (null, null, '3', null, null);
+ INSERT INTO trailing_null VALUES (null, '2', null, null, null);
+ INSERT INTO trailing_null VALUES ('1', null, null, null, null);
+ 
+ SELECT * FROM trailing_null ORDER BY c1, c2, c3, c4, c5;
+ 
+ CREATE INDEX trailing_nulli ON trailing_null (c1, c2, c3, c4, c5);
+ 
+ INSERT INTO trailing_null VALUES ('1', null, null, null, null);
+ 
+ SELECT * FROM trailing_null ORDER BY c1, c2, c3, c4, c5;
+ 
+ DROP TABLE trailing_null;
+ 
+ -- corner case, no columns
+ CREATE TABLE trailing_null ();
+ DROP TABLE trailing_null;
+ 
+ 
+ -- toastable
+ CREATE TABLE trailing_null (
+ 	c1 varchar(500), 
+ 	c2 varchar(500), 
+ 	c3 varchar(500), 
+ 	c4 varchar(500), 
+ 	c5 varchar(500), 
+ 	c6 varchar(500), 
+ 	c7 varchar(500), 
+ 	c8 varchar(500), 
+ 	c9 varchar(500), 
+ 	c10 varchar(500), 
+ 	c11 varchar(500), 
+ 	c12 varchar(500), 
+ 	c13 varchar(500), 
+ 	c14 varchar(500), 
+ 	c15 varchar(500), 
+ 	c16 varchar(500), 
+ 	c17 varchar(500)
+ 	);
+ 
+ -- toast: easily compressed
+ INSERT INTO trailing_null VALUES (
+ 	rpad('c1', 450, 'x'),
+ 	rpad('c2', 450, 'x'),
+ 	rpad('c3', 450, 'x'),
+ 	rpad('c4', 450, 'x'),
+ 	rpad('c5', 450, 'x'),
+ 	rpad('c6', 450, 'x'),
+ 	rpad('c7', 450, 'x'),
+ 	rpad('c8', 450, 'x'),
+ 	rpad('c9', 450, 'x'),
+ 	rpad('c10', 450, 'x'),
+ 	rpad('c11', 450, 'x'),
+ 	rpad('c12', 450, 'x'),
+ 	rpad('c13', 450, 'x'),
+ 	rpad('c14', 450, 'x'),
+ 	rpad('c15', 450, 'x'),
+ 	rpad('c16', 450, 'x'),
+ 	rpad('c17', 450, 'x')
+ 	);
+ 
+ -- toast: off row storage
+ INSERT INTO trailing_null VALUES (
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3),
+  repeat('1234567890abcdefghijklmnopqrstuvwzyzABCDEFGHIJKLMNOPQRSTUVWZYZ0987654321zywvutsrqponmlkjihgfedcba1b2c3e5f6g7h9i0', 3)
+ );
+ 
+ -- verify that we have rows in our toast storage
+ VACUUM FULL trailing_null;
+ VACUUM ANALYZE trailing_null;
+ SELECT CAST(reltuples AS INTEGER) FROM pg_class
+ 	WHERE oid = (SELECT reltoastrelid FROM pg_class where relname = 'trailing_null');
+ 	
+ -- non-toast, separate codepath
+ UPDATE trailing_null SET c17 = null;
+ SELECT length(c1), length(c16), c17 FROM trailing_null;
+ VACUUM FULL trailing_null;
+ VACUUM ANALYZE trailing_null;
+ SELECT CAST(reltuples AS INTEGER) FROM pg_class
+ 	WHERE oid = (SELECT reltoastrelid FROM pg_class where relname = 'trailing_null');
+ 
+ UPDATE trailing_null SET c16 = null;
+ SELECT length(c1), c16, c17 FROM trailing_null;
+ VACUUM FULL trailing_null;
+ VACUUM ANALYZE trailing_null;
+ SELECT CAST(reltuples AS INTEGER) FROM pg_class
+ 	WHERE oid = (SELECT reltoastrelid FROM pg_class where relname = 'trailing_null');
+ 
+ UPDATE trailing_null SET
+ 	c1 = null,
+ 	c2 = null,
+ 	c3 = null,
+ 	c4 = null,
+ 	c5 = null,
+ 	c6 = null,
+ 	c7 = null,
+ 	c8 = null,
+ 	c9 = null,
+ 	c10 = null,
+ 	c11 = null,
+ 	c12 = null,
+ 	c13 = null,
+ 	c14 = null,
+ 	c15 = null,
+ 	c16 = null,
+ 	c17 = null;
+ 
+ -- verify that we got rid of our toasted data
+ VACUUM FULL trailing_null;
+ VACUUM ANALYZE trailing_null;
+ SELECT CAST(reltuples AS INTEGER) FROM pg_class
+ 	WHERE oid = (SELECT reltoastrelid FROM pg_class where relname = 'trailing_null');
+ 	
+ DROP TABLE trailing_null;
+ 
+ -- Let's hit the detoasting logic related to composite objects. Borrowed
+ -- from rowtypes.sql
+ 
+ -- The object here is to ensure that toasted references inside
+ -- composite values don't cause problems.  The large f1 value will
+ -- be toasted inside pp, it must still work after being copied to people.
+ CREATE TYPE fullname as (first text, last text, extra text);
+ CREATE temp TABLE pp (f1 text, c1 varchar(10) null);
+ INSERT INTO pp VALUES (repeat('abcdefghijkl', 100000), null);
+ CREATE temp TABLE people (fn fullname, c1 varchar(10), c2 varchar(10));
+ 
+ -- trailing null in our nested object 
+ INSERT into people SELECT('Jim', f1, null)::fullname, c1, null from pp;
+ SELECT (fn).first, length((fn).last), c1, c2 from people;
+ ALTER TABLE people ADD COLUMN c3 varchar(10);
+ SELECT (fn).first, length((fn).last), c1, c2, c3 from people;
+ 
+ DROP TABLE pp;
+ DROP TABLE people;
+ DROP TYPE fullname;
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jameison Martin (#21)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

Jameison Martin <jameisonb@yahoo.com> writes:

there have been a lot of different threads on this patch. i'm going to take a stab at teasing them out, summarizing them, and addressing them individually.

Thanks for reviewing the bidding so carefully.

4.3) does it violate a principle in the code (Tom's latest email)

from my perspective, the code has to deal with the fact that natts on the persistent row is different than the tupdesc already, this is the foundation upon which ALTER TABLE ADD NULL and DROP COLUMN as a metadata operation are built. so i believe that this patch strengthens the code's ability to handle the ALTER TABLE ADD NULL case by generalizing it: now the code will more frequently need to deal with the disparity between natts and tupdesc rather than only after someone did an ALTER TABLE. i'm an advocate of making corner cases go through generalized code where possible.

I think we're already pretty convinced that that case works, since ALTER
TABLE ADD COLUMN has been around for a very long time.

however, in all honesty i hadn't considered Tom's point that the patch has created a situation where natts on a row can deviate from the tupdesc that it was built with (as opposed to the current tupdesc, from which it could always deviate due to a subsequent ALTER TABLE). this is a subtle point and i don't think i have enough experience in the Postgres code to argue one way or another about it. so i'll have to leave that determination up to people who are more familiar with the code.

To be a bit more concrete, the thing that is worrying me is that the
executor frequently transforms tuples between "virtual" and HeapTuple
formats (look at the TupleTableSlot infrastructure, junk filters, and
tuplesort/tuplestore, for example). Up to now such transformation could
be counted on not to affect the apparent number of columns in the tuple;
but with this patch, a tuple materialized as a HeapTuple might well show
an natts smaller than what you'd conclude from looking at it in virtual
slot format. Now it's surely true that any code that's broken by that
would be broken anyway if it were fed a tuple direct-from-disk from a
relation that had suffered a later ADD COLUMN, but there are lots of
code paths where raw disk tuples would never appear. So IMO there's a
pretty fair chance of exposing latent bugs with this.
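
To illustrate the kind of assumption at stake, here is a toy C sketch
(invented types and names, not PostgreSQL code): reading code must treat any
attribute at or beyond the tuple's own natts as NULL rather than trusting the
descriptor's column count.

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins for a tuple descriptor and a materialized tuple. */
typedef struct { int natts; } ToyTupleDesc;
typedef struct { int natts; bool isnull[8]; int values[8]; } ToyTuple;

/*
 * Any attribute at or beyond the tuple's own natts is treated as NULL,
 * no matter how many columns the descriptor claims to have.
 */
static bool
toy_getattr(const ToyTuple *tup, int attnum, int *value)
{
    if (attnum >= tup->natts || tup->isnull[attnum])
        return false;               /* NULL */
    *value = tup->values[attnum];
    return true;
}

int
main(void)
{
    ToyTupleDesc desc = {4};                             /* descriptor says four columns */
    ToyTuple     tup = {2, {false, false}, {10, 20}};    /* trailing nulls trimmed: two stored */
    int          v;

    for (int i = 0; i < desc.natts; i++)
    {
        if (toy_getattr(&tup, i, &v))
            printf("att %d = %d\n", i, v);
        else
            printf("att %d = NULL\n", i);
    }
    return 0;
}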

As an example that this type of concern isn't hypothetical, and that
identifying such bugs can be very difficult, see commit 039680aff.
That bug had been latent for years before it was exposed by an unrelated
change, and it took several more years to track it down.

As I said, I'd be willing to take this risk if the patch showed more
attractive performance benefits ... but it still seems mighty marginal
from here.

regards, tom lane


#23Greg Stark
stark@mit.edu
In reply to: Tom Lane (#22)

On Thu, Jan 24, 2013 at 8:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

As I said, I'd be willing to take this risk if the patch showed more
attractive performance benefits ... but it still seems mighty marginal
from here.

I think the benchmarks given so far are actually barking up the wrong
tree though. There are usage patterns that some people do engage in
that involve dozens or hundreds of columns that are mostly NULL. If
they're saving 10-20 bytes per row that's not insignificant. And they
could be saving much more than that.

That said I don't know just how common that usage pattern is. And I'm
not sure how many of those people would be able to arrange for the
null columns to land at the end of the row.

It's a bit frustrating because it does seem like if it had been
written this way to begin with nobody would ever have questioned
whether it was a good idea and nobody would ever have suggested
ripping it out.

--
greg


#24Craig Ringer
craig@2ndQuadrant.com
In reply to: Jameison Martin (#21)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On 01/25/2013 03:43 AM, Jameison Martin wrote:

there have been a lot of different threads on this patch. i'm going to
take a stab at teasing them out, summarizing them, and addressing them
individually.

Is this patch on the CF app? I can't seem to find it in 2013-01 or
2013-next, and I wanted to add your summary.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#23)

Greg Stark <stark@mit.edu> writes:

That said I don't know just how common that usage pattern is. And I'm
not sure how many of those people would be able to arrange for the
null columns to land at the end of the row.

Obviously, they can't arrange that all the time (else their trailing
columns are unnecessary). It'd have to be something like ordering the
columns in descending probability of being not-null --- and unless the
later columns *all* have very small probability of being not-null,
you're not gonna win much. Much as I hate to suggest that somebody
consider EAV-like storage, one does have to wonder whether a schema
like that doesn't need redesign more than it needs an implementation
tweak.

It's a bit frustrating because it does seem like if it had been
written this way to begin with nobody would ever have questioned
whether it was a good idea and nobody would ever have suggested
ripping it out.

True, but we'd also have a much higher probability that there aren't
latent bugs because of expectations that t_natts is trustworthy.
It's the cost of getting from here to there that's scaring me.

regards, tom lane


#26Amit Kapila
amit.kapila@huawei.com
In reply to: Craig Ringer (#24)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On 01/25/2013 03:43 AM, Jameison Martin wrote:

there have been a lot of different threads on this patch. i'm going to
take a stab at teasing them out, summarizing them, and addressing them
individually.

Is this patch on the CF app? I can't seem to find it in 2013-01 or
2013-next, and I wanted to add your summary.

It is in the previous CF (2012-11), in the Returned with Feedback section
https://commitfest.postgresql.org/action/commitfest_view?id=16

With Regards,
Amit Kapila.


#27Andres Freund
andres@2ndquadrant.com
In reply to: Amit Kapila (#26)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On 2013-01-25 09:06:12 +0530, Amit Kapila wrote:

On 01/25/2013 03:43 AM, Jameison Martin wrote:

there have been a lot of different threads on this patch. i'm going to
take a stab at teasing them out, summarizing them, and addressing them
individually.

Is this patch on the CF app? I can't seem to find it in 2013-01 or
2013-next, and I wanted to add your summary.

It is in previous CF (2012-11) in Returned with Feedback section
https://commitfest.postgresql.org/action/commitfest_view?id=16

I'd argue that the patch ought to be still marked as "Returned with
Feedback" instead of again being marked as "Needs Review". I don't
really see anything new that has come up?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#28Amit Kapila
amit.kapila@huawei.com
In reply to: Andres Freund (#27)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On Monday, June 24, 2013 5:48 PM Andres Freund wrote:

On 2013-01-25 09:06:12 +0530, Amit Kapila wrote:

On 01/25/2013 03:43 AM, Jameison Martin wrote:

there have been a lot of different threads on this patch. i'm going to
take a stab at teasing them out, summarizing them, and addressing them
individually.

Is this patch on the CF app? I can't seem to find it in 2013-01 or
2013-next, and I wanted to add your summary.

It is in previous CF (2012-11) in Returned with Feedback section
https://commitfest.postgresql.org/action/commitfest_view?id=16

I'd argue that the patch ought to be still marked as "Returned with
Feedback" instead of again being marked as "Needs Review". I don't
really see anything new that has come up?

You are right that nothing new has come up. The major reason why this patch
could not go into 9.3 was that it is not clearly evident whether the
performance gains of this patch are sufficient to justify the risks of
committing it (Tom points out that bugs caused by such changes can be
difficult to identify).

I will summarize the results, and if most of us feel that they are not good
enough, then we can return this patch.

Observations from Performance Results for tables with few columns
--------------------------------------------------------------------
1. Approximately 13% space savings for a table with 12 columns, of which the
last 4 are NULLs.
   This table's schema is such that the first 3 columns are integers and the
last 9 are booleans.

Refer to the below link for detailed results:
/messages/by-id/00b401cde1d8$746c90f0$5d45b2d0$@kapila@huawei.com

Observations from Performance Results for tables with many columns
--------------------------------------------------------------------
1. There is a visible performance increase when more than 60~70% of the
columns contain NULLs in tables with a large number of columns.
   Approximately 12% for a table having 800 columns, of which 600 are NULLs,
OR having 300 columns, of which 250 are NULLs.
2. There are visible space savings when more than 60~70% of the columns
contain NULLs in tables with a large number of columns.
   Approximately 11% for a table having 800 columns, of which 600 are NULLs,
OR having 300 columns, of which 250 are NULLs.

Refer to the below link for detailed results:
/messages/by-id/6C0B27F7206C9E4CA54AE035729E9C382853AA40@szxeml509-mbs

Observation from the original pgbench
----------------------------------

1. There is a < 1% performance dip for the original pgbench; this could be
due to fluctuation in readings or some consequence of the added code that is
difficult to reason out.
   Refer to the below link for detailed results:
/messages/by-id/00bc01cde1f7$be4b4cb0$3ae1e610$@kapila@huawei.com

With Regards,
Amit Kapila.


#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Kapila (#28)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

Amit Kapila <amit.kapila@huawei.com> writes:

I will summarize the results, and if most of us feel that they are not good
enough, then we can return this patch.

Aside from the question of whether there's really any generally-useful
performance improvement from this patch, it strikes me that this change
forecloses other games that we might want to play with interpretation of
the value of a tuple's natts.

In particular, when I was visiting Salesforce last week, the point came
up that they'd really like ALTER TABLE ADD COLUMN to be "free" even when
the column had a non-null DEFAULT. It's not too difficult to imagine
how we might support that, at least when the default expression is a
constant: decree that if the requested attribute number is beyond natts,
look aside at the tuple descriptor to see if the column is marked as
having a magic default value, and if so, substitute that rather than
just returning NULL. (This has to be a "magic" value separate from
the current default, else a subsequent DROP or SET DEFAULT would do
the wrong thing.)
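
A minimal sketch of that lookaside, purely for illustration (invented C types
and names; nothing here is from the patch or the real headers):

#include <stdbool.h>

/* Invented stand-ins: a per-column descriptor entry and a stored row. */
typedef struct
{
    bool has_magic_default;     /* column added later with a constant DEFAULT */
    int  magic_default;         /* value to substitute for pre-existing rows */
} ToyAttr;

typedef struct { int natts; ToyAttr atts[16]; } ToyDesc;
typedef struct { int natts; bool isnull[16]; int values[16]; } ToyRow;

/*
 * If the requested column lies beyond the row's stored natts, consult the
 * descriptor for a remembered ("magic") default instead of just saying NULL.
 */
static bool
toy_getattr(const ToyRow *row, const ToyDesc *desc, int attnum, int *value)
{
    if (attnum >= row->natts)
    {
        if (desc->atts[attnum].has_magic_default)
        {
            *value = desc->atts[attnum].magic_default;
            return true;
        }
        return false;           /* column added with no (or NULL) default */
    }
    if (row->isnull[attnum])
        return false;           /* explicitly stored NULL */
    *value = row->values[attnum];
    return true;
}

int
main(void)
{
    ToyDesc desc = {3, {{false, 0}, {false, 0}, {true, 42}}};
    ToyRow  row = {2, {false, true}, {7, 0}};   /* written before column 3 existed */
    int     v;

    return toy_getattr(&row, &desc, 2, &v) && v == 42 ? 0 : 1;
}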

However, this idea conflicts with an optimization that supposes it can
reduce natts to suppress null columns: if the column was actually stored
as null, you'd lose that information, and would incorrectly return the
magic default on subsequent reads.

I think it might be possible to make both ideas play together, by
not reducing natts further than the last column with a magic default.
However, that means extra complexity in heap_form_tuple, which would
invalidate the performance measurements done in support of this patch.

In short, there's no free lunch ...

regards, tom lane


#30Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#29)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On 24 June 2013 15:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kapila <amit.kapila@huawei.com> writes:

I will summarize the results, and if most of us feel that they are not good
enough, then we can return this patch.

Aside from the question of whether there's really any generally-useful
performance improvement from this patch, it strikes me that this change
forecloses other games that we might want to play with interpretation of
the value of a tuple's natts.

In particular, when I was visiting Salesforce last week, the point came
up that they'd really like ALTER TABLE ADD COLUMN to be "free" even when
the column had a non-null DEFAULT. It's not too difficult to imagine
how we might support that, at least when the default expression is a
constant: decree that if the requested attribute number is beyond natts,
look aside at the tuple descriptor to see if the column is marked as
having a magic default value, and if so, substitute that rather than
just returning NULL. (This has to be a "magic" value separate from
the current default, else a subsequent DROP or SET DEFAULT would do
the wrong thing.)

Now that is a mighty fine idea.

However, this idea conflicts with an optimization that supposes it can
reduce natts to suppress null columns: if the column was actually stored
as null, you'd lose that information, and would incorrectly return the
magic default on subsequent reads.

I think it might be possible to make both ideas play together, by
not reducing natts further than the last column with a magic default.
However, that means extra complexity in heap_form_tuple, which would
invalidate the performance measurements done in support of this patch.

In short, there's no free lunch ...

Agreed.

I think it's rather a shame that the proponents of this patch have
tried so hard to push this through without working variations on the
theme. Please guys, go away and get creative; rethink the approach so
that you can have your lunch without anybody else losing theirs. I
would add that I have seen the use case and want to support it, but
we're looking to add to the codebase not just steal small bites of
performance from existing use cases.

As a practical suggestion, why not change the macro so it doesn't even
try to do anything different unless the number of columns is > 72. A
constant comparison should go very quickly and isolate the code better
from the more typical code path. If not, let's try some other ideas,
like Tom just did.
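
Such a guard might look roughly like this (an illustrative sketch only; the
threshold, names, and helper are invented, not taken from the patch):

#include <stdbool.h>

/* Hypothetical cutoff: leave ordinary-width tables entirely alone. */
#define TRIM_NULLS_MIN_ATTS 72

/*
 * Illustrative helper: decide how many attributes to actually store,
 * trimming trailing NULLs only when the table is wide enough to care.
 */
static int
effective_natts(int natts, const bool *isnull)
{
    if (natts <= TRIM_NULLS_MIN_ATTS)
        return natts;           /* one cheap constant comparison on the typical path */

    while (natts > 0 && isnull[natts - 1])
        natts--;
    return natts;
}

int
main(void)
{
    bool isnull[100] = {false};

    for (int i = 10; i < 100; i++)      /* columns 10..99 left NULL */
        isnull[i] = true;
    return effective_natts(100, isnull) == 10 ? 0 : 1;
}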

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#31Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#30)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On Mon, Jun 24, 2013 at 4:05 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

I think its rather a shame that the proponents of this patch have
tried so hard to push this through without working variations on the
theme. Please guys, go away and get creative; rethink the approach so
that you can have your lunch without anybody else losing theirs. I
would add that I have seen the use case and want to support it, but
we're looking to add to the codebase not just steal small bites of
performance from existing use cases.

If there's an actual performance consequence of applying this patch,
then I think that's a good reason for rejecting it. But if the best
argument we can come up with is that we might someday try to do even
more clever things with the tuple's natts value, I guess I'm not very
impressed. The reason why we have to rewrite the table when someone
adds a column with a non-NULL default is because we need to store the
new value in every row. Sure, we could defer that in this particular
case. But users might be mighty dismayed to see CLUSTER or VACUUM
FULL -- or a dump-and-reload! -- cause the table to become much LARGER
than it was before. Having some kind of column-oriented compression
would be darn nifty, but this particular path doesn't seem
particularly promising to me.

I think the larger and more fundamental problem with natts is that
there's just not very much bit space available there. Even as it is,
someone who adds and drops columns with any regularity will eventually
hit the wall, and there aren't any good alternatives at that point.
Were there more bit space available here, we could even do things like
allow some special cases of ALTER TABLE .. SET DATA TYPE not to
rewrite the table; natts could become a sort of tuple version number,
and we'd remember how to convert to newer versions on the fly. But
with only 2048 values available, it's not really feasible to start
consuming them for any more operations than we already do. I'm
generally a big fan of the principle that no single patch should be
allowed to crowd out room for future extensibility in this particular
area, but in this case I think the bit space available is *already* so
limited that we're not likely to get much further with it without
changing the storage format.
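
To put the bit-space point in concrete terms, here is a simplified
illustration of packing an attribute count into an 11-bit field next to flag
bits (names and layout are stand-ins, not the real t_infomask2 definitions):

#include <stdint.h>
#include <stdio.h>

/*
 * Simplified stand-in for a 16-bit infomask2-style field: the low 11 bits
 * hold the attribute count (0..2047), the high bits are left for flags.
 */
#define TOY_NATTS_MASK 0x07FF

static uint16_t
toy_set_natts(uint16_t infomask2, int natts)
{
    return (uint16_t) ((infomask2 & ~TOY_NATTS_MASK) | (natts & TOY_NATTS_MASK));
}

static int
toy_get_natts(uint16_t infomask2)
{
    return infomask2 & TOY_NATTS_MASK;
}

int
main(void)
{
    uint16_t mask = 0x8000;             /* pretend one flag bit is already set */

    mask = toy_set_natts(mask, 1600);   /* the current column limit still fits */
    printf("natts = %d, flag bits = 0x%04X\n",
           toy_get_natts(mask), (unsigned) (mask & ~TOY_NATTS_MASK));
    return 0;
}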

So, Tom, how's that pluggable storage format coming? :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#32Josh Berkus
josh@agliodbs.com
In reply to: Kevin Grittner (#11)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

Simon,

I think its rather a shame that the proponents of this patch have
tried so hard to push this through without working variations on the
theme. Please guys, go away and get creative; rethink the approach so
that you can have your lunch without anybody else losing theirs. I
would add that I have seen the use case and want to support it, but
we're looking to add to the codebase not just steal small bites of
performance from existing use cases.

Do we really have ironclad numbers which show that the patch affects
performance negatively? I didn't think the previous performance test
was comprehensive; I was planning to develop one myself this week.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#31)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

Robert Haas <robertmhaas@gmail.com> writes:

If there's an actual performance consequence of applying this patch,
then I think that's a good reason for rejecting it. But if the best
argument we can come up with is that we might someday try to do even
more clever things with the tuple's natts value, I guess I'm not very
impressed. The reason why we have to rewrite the table when someone
adds a column with a non-NULL default is because we need to store the
new value in every row. Sure, we could defer that in this particular
case. But users might be mighty dismayed to see CLUSTER or VACUUM
FULL -- or a dump-and-reload! -- cause the table to become much LARGER
than it was before. Having some kind of column-oriented compression
would be darn nifty, but this particular path doesn't seem
particularly promising to me.

The point of what I was suggesting isn't to conserve storage, but to
reduce downtime during a schema change. Remember that a rewriting ALTER
TABLE locks everyone out of that table for a long time.

So, Tom, how's that pluggable storage format coming? :-)

Well, actually, it's looking to me like heap_form_tuple will be
underneath the API cut, because alternate storage managers will probably
have other tuple storage formats --- column stores being the poster
child here, but in any case the tuple header format is very unlikely
to be universal.

Which means that whether this patch gets applied to mainline is going
to be moot for Salesforce's purposes; they will certainly want the
equivalent logic in their storage code, because they've got tables with
many hundreds of mostly-null columns, but whether heap_form_tuple acts
this way or not won't affect them.

So unless we consider that many-hundreds-of-columns is a design center
for general purpose use of Postgres, we should be evaluating this patch
strictly on its usefulness for more typical table widths. And my take
on that is that (1) lots of columns isn't our design center (for the
reasons you mentioned among others), and (2) the case for the patch
looks pretty weak otherwise.

regards, tom lane


#34Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#33)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On 24 June 2013 21:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So, Tom, how's that pluggable storage format coming? :-)

Well, actually, it's looking to me like heap_form_tuple will be
underneath the API cut, because alternate storage managers will probably
have other tuple storage formats --- column stores being the poster
child here, but in any case the tuple header format is very unlikely
to be universal.

Hopefully we would allow multiple storage managers to be active at once,
one per table?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#35Jameison Martin
jameisonb@yahoo.com
In reply to: Josh Berkus (#32)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

i believe the last submission of the patch had no negative performance impact on any of the tested use cases, but you'd have to go back and see the last exchange on that. 

i think it was the concern about potentially regressing the codeline in unforeseen ways without a clear benefit to all but one use case (wide tables) that stalled things.


#36Josh Berkus
josh@agliodbs.com
In reply to: Kevin Grittner (#11)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On 06/24/2013 01:50 PM, Tom Lane wrote:

The point of what I was suggesting isn't to conserve storage, but to
reduce downtime during a schema change. Remember that a rewriting ALTER
TABLE locks everyone out of that table for a long time.

Right, but I'm worried about the "surprise!" factor. That is, if we
make the first default "free" by using a magic value, then a SET DEFAULT
on that column is going to have some very surprising results as suddenly
the whole table needs to get written out for the old default. In many
use cases, this would still be a net win, since 80% of the time users
don't change defaults after column creation. But we'd have to make it
much less surprising somehow. Also for the reason Tom pointed out, the
optimization would only work with NOT NULL columns ... leading to
another potential unexpected surprise when the column is marked NULLable.

So unless we consider that many-hundreds-of-columns is a design center
for general purpose use of Postgres, we should be evaluating this patch
strictly on its usefulness for more typical table widths. And my take
on that is that (1) lots of columns isn't our design center (for the
reasons you mentioned among others), and (2) the case for the patch
looks pretty weak otherwise.

Well, actually, hundreds of columns is reasonably common for a certain
user set (ERP, CRM, etc.). If we could handle that use case very
efficiently, then it would win us some users, since other RDBMSes don't.
However, there are multiple issues with having hundreds of columns, of
which storage optimization is only one ... and probably the smallest one
at that.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#36)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

Josh Berkus <josh@agliodbs.com> writes:

On 06/24/2013 01:50 PM, Tom Lane wrote:

The point of what I was suggesting isn't to conserve storage, but to
reduce downtime during a schema change. Remember that a rewriting ALTER
TABLE locks everyone out of that table for a long time.

Right, but I'm worried about the "surprise!" factor. That is, if we
make the first default "free" by using a magic value, then a SET DEFAULT
on that column is going to have some very surprising results as suddenly
the whole table needs to get written out for the old default.

No, that's why we'd store the magic default separately. That will be
permanent and unaffected by later SET DEFAULT operations. (This
requires that every subsequently created tuple store the column
explicitly so that the magic default doesn't affect it; which is exactly
why there's a conflict with the currently-proposed patch.)

... Also for the reason Tom pointed out, the
optimization would only work on with NOT NULL columns ... leading to
another potential unexpected surprise when the column is marked NULLable.

Huh? We already have the case of null default handled.

Well, actually, hundreds of columns is reasonably common for a certain
user set (ERP, CRM, etc.). If we could handle that use case very
efficiently, then it would win us some users, since other RDBMSes don't.
However, there are multiple issues with having hundreds of columns, of
which storage optimization is only one ... and probably the smallest one
at that.

Agreed; there are a lot of things we'd have to address if we really
wanted to claim this is a domain we work well in. (I suspect Salesforce
will be chipping away at some of those issues, but as I said,
heap_form_tuple is not in their critical path.)

regards, tom lane


#38Josh Berkus
josh@agliodbs.com
In reply to: Kevin Grittner (#11)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On 06/24/2013 02:21 PM, Tom Lane wrote:

Right, but I'm worried about the "surprise!" factor. That is, if we
make the first default "free" by using a magic value, then a SET DEFAULT
on that column is going to have some very surprising results as suddenly
the whole table needs to get written out for the old default.

No, that's why we'd store the magic default separately. That will be
permanent and unaffected by later SET DEFAULT operations. (This
requires that every subsequently created tuple store the column
explicitly so that the magic default doesn't affect it; which is exactly
why there's a conflict with the currently-proposed patch.)

Yah. So how likely is this to get done sometime in the next 2 releases?
It's only a conflict if someone is going to write the code ...

Agreed; there are a lot of things we'd have to address if we really
wanted to claim this is a domain we work well in. (I suspect Salesforce
will be chipping away at some of those issues, but as I said,
heap_form_tuple is not in their critical path.)

Well, doing the trailing column truncation as part of a general plan to
make having 600 columns less painful would make more sense than doing it
on its own. For my personal use case(s), I don't really care about the
null bitmap that much; that amount of space simply isn't that
significant compared to the other performance issues involved. I
started evaluating this patch just because I'm one of the few people
with a realistic test case.

Anyway, let's decide if acceptance of this patch hinges on performance
or not. It'll take me a whole evening to set up a good performance test,
so I don't want to do it if it's going to be a moot point.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


#39Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#33)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On Mon, Jun 24, 2013 at 4:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The point of what I was suggesting isn't to conserve storage, but to
reduce downtime during a schema change. Remember that a rewriting ALTER
TABLE locks everyone out of that table for a long time.

Sure, I understand. As Josh says, though, it'd still be potentially
quite surprising.

So, Tom, how's that pluggable storage format coming? :-)

Well, actually, it's looking to me like heap_form_tuple will be
underneath the API cut, because alternate storage managers will probably
have other tuple storage formats --- column stores being the poster
child here, but in any case the tuple header format is very unlikely
to be universal.

I've had the same thought. Unfortunately, there are a lot of things -
like the TupleTableSlot abstraction - that are fairly deeply in bed
with the format used by our current heap AM; so we might be imposing a
performance overhead for anyone who uses some other representation.
Unless you have some idea how to avoid that?

Which means that whether this patch gets applied to mainline is going
to be moot for Salesforce's purposes; they will certainly want the
equivalent logic in their storage code, because they've got tables with
many hundreds of mostly-null columns, but whether heap_form_tuple acts
this way or not won't affect them.

OK.

So unless we consider that many-hundreds-of-columns is a design center
for general purpose use of Postgres, we should be evaluating this patch
strictly on its usefulness for more typical table widths. And my take
on that is that (1) lots of columns isn't our design center (for the
reasons you mentioned among others), and (2) the case for the patch
looks pretty weak otherwise.

I guess I have yet to be convinced that it really hurts anything.
Your example seems fairly hypothetical and could easily be something
no one ever implements. I don't feel terribly strongly that we need
to take this patch, and I don't really care that much whether we do or
not; I'm just not really convinced there's any actual evidence that we
shouldn't. The standard isn't that it's got to make something really
important better; it's just got to make something better without
making anything more important worse.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#40Noah Misch
noah@leadboat.com
In reply to: Josh Berkus (#32)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On Mon, Jun 24, 2013 at 01:32:41PM -0700, Josh Berkus wrote:

Do we really have ironclad numbers which show that the patch affects
performance negatively? I didn't think the previous performance test
was comprehensive; I was planning to develop one myself this week.

Not ironclad, no:
/messages/by-id/20130124021205.GB29954@tornado.leadboat.com

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


#41Amit Kapila
amit.kapila@huawei.com
In reply to: Tom Lane (#29)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On Monday, June 24, 2013 8:20 PM Tom Lane wrote:

Amit Kapila <amit.kapila@huawei.com> writes:

I will summarize the results, and if most of us feel that they are

not good

enough, then we can return this patch.

Aside from the question of whether there's really any generally-useful
performance improvement from this patch, it strikes me that this change
forecloses other games that we might want to play with interpretation
of
the value of a tuple's natts.

In particular, when I was visiting Salesforce last week, the point came
up that they'd really like ALTER TABLE ADD COLUMN to be "free" even
when
the column had a non-null DEFAULT. It's not too difficult to imagine
how we might support that, at least when the default expression is a
constant: decree that if the requested attribute number is beyond
natts,
look aside at the tuple descriptor to see if the column is marked as
having a magic default value, and if so, substitute that rather than
just returning NULL. (This has to be a "magic" value separate from
the current default, else a subsequent DROP or SET DEFAULT would do
the wrong thing.)

However, this idea conflicts with an optimization that supposes it can
reduce natts to suppress null columns: if the column was actually
stored
as null, you'd lose that information, and would incorrectly return the
magic default on subsequent reads.

I think it might be possible to make both ideas play together, by
not reducing natts further than the last column with a magic default.
However, that means extra complexity in heap_form_tuple, which would
invalidate the performance measurements done in support of this patch.

It can have a slight impact on normal scenarios, but I am not sure how much,
because the change will be very small (maybe one extra if check and one
assignment).

For this patch's scenario, I think the major performance benefit is in
heap_fill_tuple(), where the for loop is shortened. Adding some logic to
heap_form_tuple() could reduce that performance improvement, but there can
still be a space-saving benefit.
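
To make the shortened loop a bit more concrete, a toy sketch of the idea
(invented names, not the patch's actual heap_fill_tuple() change) could look
like this:

#include <stdbool.h>
#include <stdio.h>

/*
 * Toy fill loop: once trailing NULLs are trimmed, both the null bitmap and
 * the per-attribute copy loop stop at the trimmed count instead of the
 * descriptor's full attribute count.
 */
static int
toy_fill_row(int desc_natts, const bool *isnull, const int *values,
             int *out, bool *outnull)
{
    int natts = desc_natts;

    while (natts > 0 && isnull[natts - 1])
        natts--;                        /* trim trailing NULLs */

    for (int i = 0; i < natts; i++)     /* loop shortened by the trim */
    {
        outnull[i] = isnull[i];
        out[i] = isnull[i] ? 0 : values[i];
    }
    return natts;                       /* the attribute count actually stored */
}

int
main(void)
{
    bool isnull[6] = {false, true, false, true, true, true};
    int  values[6] = {1, 0, 3, 0, 0, 0};
    int  out[6];
    bool outnull[6];

    printf("stored natts = %d\n", toy_fill_row(6, isnull, values, out, outnull));
    return 0;
}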

With Regards,
Amit Kapila.


#42Jamie Martin
jameisonb@yahoo.com
In reply to: Amit Kapila (#41)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

FYI I submitted a slightly modified patch since Amit's measurements that is slightly faster.

On Jun 25, 2013, at 1:26 PM, Amit Kapila <amit.kapila@huawei.com> wrote:

On Monday, June 24, 2013 8:20 PM Tom Lane wrote:

Amit Kapila <amit.kapila@huawei.com> writes:

I will summarize the results, and if most of us feel that they are

not good

enough, then we can return this patch.

Aside from the question of whether there's really any generally-useful
performance improvement from this patch, it strikes me that this change
forecloses other games that we might want to play with interpretation
of
the value of a tuple's natts.

In particular, when I was visiting Salesforce last week, the point came
up that they'd really like ALTER TABLE ADD COLUMN to be "free" even
when
the column had a non-null DEFAULT. It's not too difficult to imagine
how we might support that, at least when the default expression is a
constant: decree that if the requested attribute number is beyond
natts,
look aside at the tuple descriptor to see if the column is marked as
having a magic default value, and if so, substitute that rather than
just returning NULL. (This has to be a "magic" value separate from
the current default, else a subsequent DROP or SET DEFAULT would do
the wrong thing.)

However, this idea conflicts with an optimization that supposes it can
reduce natts to suppress null columns: if the column was actually
stored
as null, you'd lose that information, and would incorrectly return the
magic default on subsequent reads.

I think it might be possible to make both ideas play together, by
not reducing natts further than the last column with a magic default.
However, that means extra complexity in heap_form_tuple, which would
invalidate the performance measurements done in support of this patch.

It can have slight impact on normal scenario's, but I am not sure how much
because
the change will be very less(may be one extra if check and one assignment)

For this Patch's scenario, I think the major benefit for Performance is in
heap_fill_tuple() where the
For loop is reduced. However added some logic in heap_form_tuple can
reduce the performance improvement,
but there can still be space saving benefit.

With Regards,
Amit Kapila.


#43Josh Berkus
josh@agliodbs.com
In reply to: Kevin Grittner (#11)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

All,

So, is this patch currently depending on performance testing, or not?
Like I said, it'll be a chunk of time to set up what I believe is a
realistic performance test, so I don't want to do it if the patch is
likely to be bounced for other reasons.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#43)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

Josh Berkus <josh@agliodbs.com> writes:

So, is this patch currently depending on performance testing, or not?
Like I said, it'll be a chunk of time to set up what I believe is a
realistic performance test, so I don't want to do it if the patch is
likely to be bounced for other reasons.

AFAICS, the threshold question here is whether the patch helps usefully
for tables with typical numbers of columns (ie, not 800 ;-)), and
doesn't hurt materially in any common scenario. If it does, I think
we'd take it. I've not read the patch, so I don't know if there are
minor cosmetic or correctness issues, but at bottom this seems to be a
very straightforward performance tradeoff.

regards, tom lane


#45Josh Berkus
josh@agliodbs.com
In reply to: Kevin Grittner (#11)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On 06/27/2013 11:11 AM, Tom Lane wrote:

AFAICS, the threshold question here is whether the patch helps usefully
for tables with typical numbers of columns (ie, not 800 ;-)), and

I wouldn't expect it to help at all for < 50 columns, and probably not
measurably below 200.

doesn't hurt materially in any common scenario. If it does, I think

Yeah, I think that's the bigger question. Ok, I'll start working on a
new test case. Here's my thinking on performance tests:

1. run pgbench 10 times both with and without the patch. See if there's
any measurable difference. There should not be.

2. Run with/without comparisons for the following scenarios:

Each table would have a SERIAL pk, a timestamp, and:

Chains of booleans:
  # attributes    NULL probability
            16    0%   50%   90%
           128    0%   50%   90%
           512    0%   50%   90%

Chains of text:
            16    0%   50%   90%
           256    0%   50%   90%

... for 100,000 rows each.

Comparisons would include:

1) table size before and after testing
2) time required to read 1000 rows, by key
3) time required to read rows with each of
100 random column positions = some value
4) time required to insert 1000 rows
5) time required to update 1000 rows

Geez, I feel tired just thinking about it. Jamie, can your team do any
of this?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


#46Josh Berkus
josh@agliodbs.com
In reply to: Kevin Grittner (#11)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On 06/26/2013 07:05 AM, Jamie Martin wrote:

FYI I submitted a slightly modified patch since Amit's measurements that is slightly faster.

Yes. My perspective is that this is a worthwhile optimization for a
minority, but fairly well-known, use case, provided that it doesn't
negatively impact any other, more common use case. Potential cases
where I can see negative impact are:

A) normal table with a few, mostly non-null columns (recent pgbench
testing seems to have validated no measurable impact).

B) table with many (400+) mostly non-null columns

C) table with many (400+) mostly null columns, where column #390 was
null and gets updated to a not null value

I don't *think* that Jamie's performance tests have really addressed the
above cases. However, do people agree that if performance on the patch
passes for all of A, B and C, then it's OK to apply?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


#47Greg Stark
stark@mit.edu
In reply to: Josh Berkus (#45)
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On Thu, Jun 27, 2013 at 10:12 PM, Josh Berkus <josh@agliodbs.com> wrote:

Yeah, I think that's the bigger question. Ok, I'll start working on a
new test case. Here's my thinking on performance tests:

1. run pgbench 10 times both with and without the patch. See if there's
any measurable difference. There should not be.

I don't even see the point in extensive empirical results. Empirical
results are somewhat useful for measuring the cpu cycle cost when the
optimization doesn't give any benefit. That should be one fairly
simple test and my understanding is that it's been done and shown no
significant cost.

When the optimization does kick in, it saves space. Saving space is
something whose effect we can calculate precisely, without needing empirical
tests to validate it. I would still want to check that it actually works as
expected, of course, but that's a matter of measuring the row sizes, not
timing lengthy pgbench runs.
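
The bitmap arithmetic can indeed be worked out on paper. For example,
assuming the usual 23-byte heap tuple header, one null-bitmap bit per column,
and 8-byte MAXALIGN padding, a quick illustrative calculation (not code from
the patch) gives:

#include <stdio.h>

#define TUPLE_HEADER_BYTES 23   /* fixed heap tuple header, before the bitmap */
#define MAXALIGN_BYTES      8   /* typical alignment on 64-bit platforms */

/* Header plus null bitmap, rounded up to the platform alignment. */
static int
header_size(int natts_in_bitmap)
{
    int raw = TUPLE_HEADER_BYTES + (natts_in_bitmap + 7) / 8;

    return (raw + MAXALIGN_BYTES - 1) / MAXALIGN_BYTES * MAXALIGN_BYTES;
}

int
main(void)
{
    /* e.g. 800 columns, of which everything past the first 200 is NULL */
    printf("full bitmap:    %d bytes\n", header_size(800));   /* 123 -> 128 */
    printf("trimmed bitmap: %d bytes\n", header_size(200));   /*  48 ->  48 */
    printf("saved per row:  %d bytes\n", header_size(800) - header_size(200));
    return 0;
}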

Neither of these address Tom's concerns about API changes and future
flexibility. I was assigned this patch in the reviewers list and my
inclination would be to take it but I wasn't about to
overrule Tom. If he says he's ok with it then I'm fine going ahead and
reviewing the code. If I still have commit bits I could even commit
it.

--
greg


#48Josh Berkus
josh@agliodbs.com
In reply to: Kevin Grittner (#11)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

On 07/11/2013 09:28 AM, Greg Stark wrote:

Neither of these address Tom's concerns about API changes and future
flexibility. I was assigned this patch in the reviewers list and my
inclination would be to take it but I wasn't about to
overrule Tom. If he says he's ok with it then I'm fine going ahead and
reviewing the code. If I still have commit bits I could even commit
it.

API changes? I can't find that issue in the discussion.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


#49Peter Eisentraut
peter_e@gmx.net
In reply to: Jameison Martin (#21)
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

This patch is in the 2013-09 commitfest but needs a rebase.
