logical column ordering

Started by Alvaro Herreraabout 11 years ago136 messages

alvherre@2ndquadrant.com

about 11 years ago

1 attachment(s)

So I've been updating my very old patch to allow logical and physical
column reordering. Here's a WIP first cut for examination. There are
plenty of rough edges here; most importantly there is no UI at all for
column reordering other than direct UPDATEs of pg_attribute, which most
likely falls afoul of cache invalidation problems. For now I'm focusing
on correct processing when columns are moved logically; I haven't yet
started to see how to move columns physically, but I think that part is
much more localized than the logical one.

Just as a reminder, this effort is an implementation of ideas that have
been discussed previously; in particular, see these threads:

/messages/by-id/20414.1166719407@sss.pgh.pa.us (2006)
/messages/by-id/6843.1172126270@sss.pgh.pa.us (2007)
/messages/by-id/23035.1227659434@sss.pgh.pa.us (2008)

To recap, this is based on the idea of having three numbers for each
attribute rather than a single attnum; the first of these is attnum (a
number that uniquely identifies an attribute since its inception and may
or may not have any relationship to its storage position and the place
it expands to through user interaction). The second is attphysnum,
which indicates where it is stored in the physical structure. The third
is attlognum, which indicates where it expands in "*", where must its
values be placed in COPY or VALUES lists, etc --- the logical position
as the user sees it.

The first thing where this matters is tuple descriptor expansion in
parse analysis; at this stage, things such as "*" (in "select *") are
turned into a target list, which must be sorted according to attlognum.
To achieve this I added a new routine to tupledescs,
TupleDescGetSortedAttrs() which computes a new Attribute array and
caches it in the TupleDesc for later uses; this array points to the
same elements in the normal attribute list but is order by attlognum.

Additionally there are a number of places that iterate on such target
lists and use the iterator as the attribute number; those were modified
to have a separate attribute number as attnum within the loop.

Another place that needs tweaking is heapam.c, which must construct a
physical tuple from Datum/nulls arrays (heap_form_tuple). In some cases
the input arrays are sorted in logical column order. I have opted to
add a flag that indicates whether the array is in logical order; if it
is the routines compute the correct physical order. (Actually as I
mentioned above I still haven't made any effort to make sure they work
in the case that attnum differs from attphysnum, but this should be
reasonably contained changes.)

The part where I stopped just before sending the current state is this
error message:

alvherre=# select * from quux where (a,c) in ( select a,c from quux );
select * from quux where (a,c) in ( select a,c from quux );
ERROR: failed to find unique expression in subplan tlist

I'm going to see about it while I get feedback on the rest of this patch; in
particular, extra test cases that fail to work when columns have been
moved around are welcome, so that I can add them to the regress test.
What I have now is the basics I'm building as I go along. The
regression tests show examples of some logical column renumbering (which
can be done after the table already contains some data) but none of
physical column renumbering (which can only be done when the table is
completely empty.) My hunch is that the sample foo, bar, baz, quux
tables should present plenty of opportunities to display brokenness in
the planner and executor.

PS: Phil Currier allegedly had a patch back in 2007-2008 that did this,
or something very similar ... though he never posted a single bit of it,
and then he vanished without a trace. If he's still available it would
be nice to see his WIP patch, even if outdated, as it might serve as
inspiration and let us know what other places need tweaking.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

columns-logical.patchtext/x-diff; charset=us-asciiDownload

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 18ae318..54473be 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2368,6 +2368,9 @@ get_attnum_pk_pos(int *pkattnums, int pknumatts, int key)
 	return -1;
 }
 
+/*
+ * FIXME this probably needs to be tweaked.
+ */
 static HeapTuple
 get_tuple_of_interest(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals)
 {
diff --git a/contrib/spi/timetravel.c b/contrib/spi/timetravel.c
index a37cbee..d7e6e50 100644
--- a/contrib/spi/timetravel.c
+++ b/contrib/spi/timetravel.c
@@ -314,6 +314,7 @@ timetravel(PG_FUNCTION_ARGS)
 		Oid		   *ctypes;
 		char		sql[8192];
 		char		separ = ' ';
+		Form_pg_attribute *attrs;
 
 		/* allocate ctypes for preparation */
 		ctypes = (Oid *) palloc(natts * sizeof(Oid));
@@ -322,10 +323,11 @@ timetravel(PG_FUNCTION_ARGS)
 		 * Construct query: INSERT INTO _relation_ VALUES ($1, ...)
 		 */
 		snprintf(sql, sizeof(sql), "INSERT INTO %s VALUES (", relname);
+		attrs = TupleDescGetSortedAttrs(tupdesc);
 		for (i = 1; i <= natts; i++)
 		{
-			ctypes[i - 1] = SPI_gettypeid(tupdesc, i);
-			if (!(tupdesc->attrs[i - 1]->attisdropped)) /* skip dropped columns */
+			ctypes[i - 1] = SPI_gettypeid(tupdesc, attrs[i - 1]->attnum);
+			if (!(attrs[i - 1]->attisdropped)) /* skip dropped columns */
 			{
 				snprintf(sql + strlen(sql), sizeof(sql) - strlen(sql), "%c$%d", separ, i);
 				separ = ',';
diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index af649c0..137446a 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -159,7 +159,7 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple,
 	len = hoff = MAXALIGN(len);
 
 	data_len = heap_compute_data_size(brtuple_disk_tupdesc(brdesc),
-									  values, nulls);
+									  values, nulls, false);
 
 	len += data_len;
 
@@ -177,6 +177,7 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple,
 	heap_fill_tuple(brtuple_disk_tupdesc(brdesc),
 					values,
 					nulls,
+					false,
 					(char *) rettuple + hoff,
 					data_len,
 					&phony_infomask,
diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index 009ebe7..5dfbed5 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -79,11 +79,15 @@
 /*
  * heap_compute_data_size
  *		Determine size of the data area of a tuple to be constructed
+ *
+ * logical_order means that the values and isnull arrays are sorted
+ * following attlognum.
  */
 Size
 heap_compute_data_size(TupleDesc tupleDesc,
 					   Datum *values,
-					   bool *isnull)
+					   bool *isnull,
+					   bool logical_order)
 {
 	Size		data_length = 0;
 	int			i;
@@ -93,11 +97,14 @@ heap_compute_data_size(TupleDesc tupleDesc,
 	for (i = 0; i < numberOfAttributes; i++)
 	{
 		Datum		val;
+		int			idx;
 
-		if (isnull[i])
+		idx = logical_order ? att[i]->attlognum - 1 : i;
+
+		if (isnull[idx])
 			continue;
 
-		val = values[i];
+		val = values[idx];
 
 		if (ATT_IS_PACKABLE(att[i]) &&
 			VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
@@ -124,6 +131,9 @@ heap_compute_data_size(TupleDesc tupleDesc,
  * heap_fill_tuple
  *		Load data portion of a tuple from values/isnull arrays
  *
+ * logical_order means that the values and isnull arrays are sorted
+ * following attlognum.
+ *
  * We also fill the null bitmap (if any) and set the infomask bits
  * that reflect the tuple's data contents.
  *
@@ -132,6 +142,7 @@ heap_compute_data_size(TupleDesc tupleDesc,
 void
 heap_fill_tuple(TupleDesc tupleDesc,
 				Datum *values, bool *isnull,
+				bool logical_order,
 				char *data, Size data_size,
 				uint16 *infomask, bits8 *bit)
 {
@@ -162,6 +173,13 @@ heap_fill_tuple(TupleDesc tupleDesc,
 	for (i = 0; i < numberOfAttributes; i++)
 	{
 		Size		data_length;
+		int			idx;
+		Datum		value;
+		bool		thisnull;
+
+		idx = logical_order ? att[i]->attlognum - 1 : i;
+
+		thisnull = isnull[idx];
 
 		if (bit != NULL)
 		{
@@ -174,7 +192,7 @@ heap_fill_tuple(TupleDesc tupleDesc,
 				bitmask = 1;
 			}
 
-			if (isnull[i])
+			if (thisnull)
 			{
 				*infomask |= HEAP_HASNULL;
 				continue;
@@ -183,6 +201,8 @@ heap_fill_tuple(TupleDesc tupleDesc,
 			*bitP |= bitmask;
 		}
 
+		value = values[idx];
+
 		/*
 		 * XXX we use the att_align macros on the pointer value itself, not on
 		 * an offset.  This is a bit of a hack.
@@ -192,13 +212,13 @@ heap_fill_tuple(TupleDesc tupleDesc,
 		{
 			/* pass-by-value */
 			data = (char *) att_align_nominal(data, att[i]->attalign);
-			store_att_byval(data, values[i], att[i]->attlen);
+			store_att_byval(data, value, att[i]->attlen);
 			data_length = att[i]->attlen;
 		}
 		else if (att[i]->attlen == -1)
 		{
 			/* varlena */
-			Pointer		val = DatumGetPointer(values[i]);
+			Pointer		val = DatumGetPointer(value);
 
 			*infomask |= HEAP_HASVARWIDTH;
 			if (VARATT_IS_EXTERNAL(val))
@@ -236,8 +256,8 @@ heap_fill_tuple(TupleDesc tupleDesc,
 			/* cstring ... never needs alignment */
 			*infomask |= HEAP_HASVARWIDTH;
 			Assert(att[i]->attalign == 'c');
-			data_length = strlen(DatumGetCString(values[i])) + 1;
-			memcpy(data, DatumGetPointer(values[i]), data_length);
+			data_length = strlen(DatumGetCString(value)) + 1;
+			memcpy(data, DatumGetPointer(value), data_length);
 		}
 		else
 		{
@@ -245,7 +265,7 @@ heap_fill_tuple(TupleDesc tupleDesc,
 			data = (char *) att_align_nominal(data, att[i]->attalign);
 			Assert(att[i]->attlen > 0);
 			data_length = att[i]->attlen;
-			memcpy(data, DatumGetPointer(values[i]), data_length);
+			memcpy(data, DatumGetPointer(value), data_length);
 		}
 
 		data += data_length;
@@ -660,9 +680,10 @@ heap_copy_tuple_as_datum(HeapTuple tuple, TupleDesc tupleDesc)
  * The result is allocated in the current memory context.
  */
 HeapTuple
-heap_form_tuple(TupleDesc tupleDescriptor,
-				Datum *values,
-				bool *isnull)
+heap_form_tuple_extended(TupleDesc tupleDescriptor,
+						 Datum *values,
+						 bool *isnull,
+						 int flags)
 {
 	HeapTuple	tuple;			/* return tuple */
 	HeapTupleHeader td;			/* tuple data */
@@ -672,6 +693,7 @@ heap_form_tuple(TupleDesc tupleDescriptor,
 	bool		hasnull = false;
 	int			numberOfAttributes = tupleDescriptor->natts;
 	int			i;
+	bool		logical_order = (flags & HTOPT_LOGICAL_ORDER) != 0;
 
 	if (numberOfAttributes > MaxTupleAttributeNumber)
 		ereport(ERROR,
@@ -704,7 +726,8 @@ heap_form_tuple(TupleDesc tupleDescriptor,
 
 	hoff = len = MAXALIGN(len); /* align user data safely */
 
-	data_len = heap_compute_data_size(tupleDescriptor, values, isnull);
+	data_len = heap_compute_data_size(tupleDescriptor, values, isnull,
+									  logical_order);
 
 	len += data_len;
 
@@ -737,6 +760,7 @@ heap_form_tuple(TupleDesc tupleDescriptor,
 	heap_fill_tuple(tupleDescriptor,
 					values,
 					isnull,
+					logical_order,
 					(char *) td + hoff,
 					data_len,
 					&td->t_infomask,
@@ -887,7 +911,7 @@ heap_modifytuple(HeapTuple tuple,
 }
 
 /*
- * heap_deform_tuple
+ * heap_deform_tuple_extended
  *		Given a tuple, extract data into values/isnull arrays; this is
  *		the inverse of heap_form_tuple.
  *
@@ -904,8 +928,8 @@ heap_modifytuple(HeapTuple tuple,
  *		noncacheable attribute offsets are involved.
  */
 void
-heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc,
-				  Datum *values, bool *isnull)
+heap_deform_tuple_extended(HeapTuple tuple, TupleDesc tupleDesc,
+						   Datum *values, bool *isnull, int flags)
 {
 	HeapTupleHeader tup = tuple->t_data;
 	bool		hasnulls = HeapTupleHasNulls(tuple);
@@ -917,6 +941,7 @@ heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc,
 	long		off;			/* offset in tuple data */
 	bits8	   *bp = tup->t_bits;		/* ptr to null bitmap in tuple */
 	bool		slow = false;	/* can we use/set attcacheoff? */
+	bool		logical_order = (flags & HTOPT_LOGICAL_ORDER) != 0;
 
 	natts = HeapTupleHeaderGetNatts(tup);
 
@@ -934,16 +959,18 @@ heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc,
 	for (attnum = 0; attnum < natts; attnum++)
 	{
 		Form_pg_attribute thisatt = att[attnum];
+		int		fillattnum = logical_order ?
+			thisatt->attlognum - 1 : thisatt->attnum - 1;
 
-		if (hasnulls && att_isnull(attnum, bp))
+		if (hasnulls && att_isnull(thisatt->attnum - 1, bp))
 		{
-			values[attnum] = (Datum) 0;
-			isnull[attnum] = true;
+			values[fillattnum] = (Datum) 0;
+			isnull[fillattnum] = true;
 			slow = true;		/* can't use attcacheoff anymore */
 			continue;
 		}
 
-		isnull[attnum] = false;
+		isnull[fillattnum] = false;
 
 		if (!slow && thisatt->attcacheoff >= 0)
 			off = thisatt->attcacheoff;
@@ -974,7 +1001,7 @@ heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc,
 				thisatt->attcacheoff = off;
 		}
 
-		values[attnum] = fetchatt(thisatt, tp + off);
+		values[fillattnum] = fetchatt(thisatt, tp + off);
 
 		off = att_addlength_pointer(off, thisatt->attlen, tp + off);
 
@@ -985,6 +1012,8 @@ heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc,
 	/*
 	 * If tuple doesn't have all the atts indicated by tupleDesc, read the
 	 * rest as null
+	 *
+	 * FIXME -- this is wrong if HTOPT_LOGICAL_ORDER
 	 */
 	for (; attnum < tdesc_natts; attnum++)
 	{
@@ -1398,12 +1427,17 @@ heap_freetuple(HeapTuple htup)
  * "minimal" tuple lacking a HeapTupleData header as well as room for system
  * columns.
  *
+ * If the HTOPT_LOGICAL_ORDER flag is set, the values and isnull arrays are
+ * sorted in logical order, so we re-sort them to build the tuple in correct
+ * physical order.
+ *
  * The result is allocated in the current memory context.
  */
 MinimalTuple
 heap_form_minimal_tuple(TupleDesc tupleDescriptor,
 						Datum *values,
-						bool *isnull)
+						bool *isnull,
+						int flags)
 {
 	MinimalTuple tuple;			/* return tuple */
 	Size		len,
@@ -1412,6 +1446,7 @@ heap_form_minimal_tuple(TupleDesc tupleDescriptor,
 	bool		hasnull = false;
 	int			numberOfAttributes = tupleDescriptor->natts;
 	int			i;
+	bool		logical_order = (flags & HTOPT_LOGICAL_ORDER) != 0;
 
 	if (numberOfAttributes > MaxTupleAttributeNumber)
 		ereport(ERROR,
@@ -1444,7 +1479,7 @@ heap_form_minimal_tuple(TupleDesc tupleDescriptor,
 
 	hoff = len = MAXALIGN(len); /* align user data safely */
 
-	data_len = heap_compute_data_size(tupleDescriptor, values, isnull);
+	data_len = heap_compute_data_size(tupleDescriptor, values, isnull, logical_order);
 
 	len += data_len;
 
@@ -1466,6 +1501,7 @@ heap_form_minimal_tuple(TupleDesc tupleDescriptor,
 	heap_fill_tuple(tupleDescriptor,
 					values,
 					isnull,
+					logical_order,
 					(char *) tuple + hoff,
 					data_len,
 					&tuple->t_infomask,
diff --git a/src/backend/access/common/indextuple.c b/src/backend/access/common/indextuple.c
index 8d9a893..16a45c2 100644
--- a/src/backend/access/common/indextuple.c
+++ b/src/backend/access/common/indextuple.c
@@ -121,10 +121,10 @@ index_form_tuple(TupleDesc tupleDescriptor,
 	hoff = IndexInfoFindDataOffset(infomask);
 #ifdef TOAST_INDEX_HACK
 	data_size = heap_compute_data_size(tupleDescriptor,
-									   untoasted_values, isnull);
+									   untoasted_values, isnull, false);
 #else
 	data_size = heap_compute_data_size(tupleDescriptor,
-									   values, isnull);
+									   values, isnull, false);
 #endif
 	size = hoff + data_size;
 	size = MAXALIGN(size);		/* be conservative */
@@ -139,6 +139,7 @@ index_form_tuple(TupleDesc tupleDescriptor,
 					values,
 #endif
 					isnull,
+					false,
 					(char *) tp + hoff,
 					data_size,
 					&tupmask,
diff --git a/src/backend/access/common/printtup.c b/src/backend/access/common/printtup.c
index c7fa727..4c74df3 100644
--- a/src/backend/access/common/printtup.c
+++ b/src/backend/access/common/printtup.c
@@ -199,6 +199,10 @@ SendRowDescriptionMessage(TupleDesc typeinfo, List *targetlist, int16 *formats)
 	pq_beginmessage(&buf, 'T'); /* tuple descriptor message type */
 	pq_sendint(&buf, natts, 2); /* # of attrs in tuples */
 
+	/*
+	 * The attributes in the slot's descriptor are already in logical order;
+	 * we don't editorialize on the ordering here.
+	 */
 	for (i = 0; i < natts; ++i)
 	{
 		Oid			atttypid = attrs[i]->atttypid;
@@ -327,7 +331,8 @@ printtup(TupleTableSlot *slot, DestReceiver *self)
 	pq_sendint(&buf, natts, 2);
 
 	/*
-	 * send the attributes of this tuple
+	 * Send the attributes of this tuple.  Note the attributes of the slot's
+	 * descriptor are already in logical order.
 	 */
 	for (i = 0; i < natts; ++i)
 	{
@@ -430,7 +435,8 @@ printtup_20(TupleTableSlot *slot, DestReceiver *self)
 		pq_sendint(&buf, j, 1);
 
 	/*
-	 * send the attributes of this tuple
+	 * Send the attributes of this tuple.  Note the attributes of the slot's
+	 * descriptor are already in logical order.
 	 */
 	for (i = 0; i < natts; ++i)
 	{
@@ -517,7 +523,8 @@ debugStartup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	int			i;
 
 	/*
-	 * show the return type of the tuples
+	 * Show the return type of the tuples.  Note the attributes of the slot's
+	 * descriptor are already in logical order.
 	 */
 	for (i = 0; i < natts; ++i)
 		printatt((unsigned) i + 1, attinfo[i], NULL);
@@ -540,6 +547,10 @@ debugtup(TupleTableSlot *slot, DestReceiver *self)
 	Oid			typoutput;
 	bool		typisvarlena;
 
+	/*
+	 * Send the attributes of this tuple.  Note the attributes of the slot's
+	 * descriptor are already in logical order.
+	 */
 	for (i = 0; i < natts; ++i)
 	{
 		attr = slot_getattr(slot, i + 1, &isnull);
@@ -612,7 +623,8 @@ printtup_internal_20(TupleTableSlot *slot, DestReceiver *self)
 		pq_sendint(&buf, j, 1);
 
 	/*
-	 * send the attributes of this tuple
+	 * Send the attributes of this tuple.  Note the attributes of the slot's
+	 * descriptor are already in logical order.
 	 */
 	for (i = 0; i < natts; ++i)
 	{
diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index f3b3689..1f4c86a 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -25,6 +25,7 @@
 #include "parser/parse_type.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/resowner_private.h"
 #include "utils/syscache.h"
 
@@ -87,6 +88,7 @@ CreateTemplateTupleDesc(int natts, bool hasoid)
 	 * Initialize other fields of the tupdesc.
 	 */
 	desc->natts = natts;
+	desc->logattrs = NULL;
 	desc->constr = NULL;
 	desc->tdtypeid = RECORDOID;
 	desc->tdtypmod = -1;
@@ -120,6 +122,7 @@ CreateTupleDesc(int natts, bool hasoid, Form_pg_attribute *attrs)
 	desc = (TupleDesc) palloc(sizeof(struct tupleDesc));
 	desc->attrs = attrs;
 	desc->natts = natts;
+	desc->logattrs = NULL;
 	desc->constr = NULL;
 	desc->tdtypeid = RECORDOID;
 	desc->tdtypmod = -1;
@@ -154,6 +157,8 @@ CreateTupleDescCopy(TupleDesc tupdesc)
 	desc->tdtypeid = tupdesc->tdtypeid;
 	desc->tdtypmod = tupdesc->tdtypmod;
 
+	Assert(desc->logattrs == NULL);
+
 	return desc;
 }
 
@@ -251,6 +256,7 @@ TupleDescCopyEntry(TupleDesc dst, AttrNumber dstAttno,
 	 * bit to avoid a useless O(N^2) penalty.
 	 */
 	dst->attrs[dstAttno - 1]->attnum = dstAttno;
+	dst->attrs[dstAttno - 1]->attlognum = dstAttno;
 	dst->attrs[dstAttno - 1]->attcacheoff = -1;
 
 	/* since we're not copying constraints or defaults, clear these */
@@ -301,6 +307,9 @@ FreeTupleDesc(TupleDesc tupdesc)
 		pfree(tupdesc->constr);
 	}
 
+	if (tupdesc->logattrs)
+		pfree(tupdesc->logattrs);
+
 	pfree(tupdesc);
 }
 
@@ -345,7 +354,7 @@ DecrTupleDescRefCount(TupleDesc tupdesc)
  * Note: we deliberately do not check the attrelid and tdtypmod fields.
  * This allows typcache.c to use this routine to see if a cached record type
  * matches a requested type, and is harmless for relcache.c's uses.
- * We don't compare tdrefcount, either.
+ * We don't compare tdrefcount nor logattrs, either.
  */
 bool
 equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
@@ -386,6 +395,13 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
 			return false;
 		if (attr1->attlen != attr2->attlen)
 			return false;
+		if (attr1->attphysnum != attr2->attphysnum)
+			return false;
+		/* intentionally do not compare attlognum */
+#if 0
+		if (attr1->attlognum != attr2->attlognum)
+			return false;
+#endif
 		if (attr1->attndims != attr2->attndims)
 			return false;
 		if (attr1->atttypmod != attr2->atttypmod)
@@ -529,6 +545,8 @@ TupleDescInitEntry(TupleDesc desc,
 	att->atttypmod = typmod;
 
 	att->attnum = attributeNumber;
+	att->attphysnum = attributeNumber;
+	att->attlognum = attributeNumber;
 	att->attndims = attdim;
 
 	att->attnotnull = false;
@@ -574,6 +592,27 @@ TupleDescInitEntryCollation(TupleDesc desc,
 	desc->attrs[attributeNumber - 1]->attcollation = collationid;
 }
 
+/*
+ * TupleDescInitEntryLognum
+ *
+ * Assign a nondefault lognum to a previously initialized tuple descriptor
+ * entry.
+ */
+void
+TupleDescInitEntryLognum(TupleDesc desc,
+						 AttrNumber attributeNumber,
+						 int attlognum)
+{
+	/*
+	 * sanity checks
+	 */
+	AssertArg(PointerIsValid(desc));
+	AssertArg(attributeNumber >= 1);
+	AssertArg(attributeNumber <= desc->natts);
+
+	desc->attrs[attributeNumber - 1]->attlognum = attlognum;
+}
+
 
 /*
  * BuildDescForRelation
@@ -666,6 +705,8 @@ BuildDescForRelation(List *schema)
 		desc->constr = NULL;
 	}
 
+	Assert(desc->logattrs == NULL);
+
 	return desc;
 }
 
@@ -726,5 +767,49 @@ BuildDescFromLists(List *names, List *types, List *typmods, List *collations)
 		TupleDescInitEntryCollation(desc, attnum, attcollation);
 	}
 
+	Assert(desc->logattrs == NULL);
 	return desc;
 }
+
+/*
+ * qsort callback for TupleDescGetSortedAttrs
+ */
+static int
+cmplognum(const void *attr1, const void *attr2)
+{
+	Form_pg_attribute	att1 = *(Form_pg_attribute *) attr1;
+	Form_pg_attribute	att2 = *(Form_pg_attribute *) attr2;
+
+	if (att1->attlognum < att2->attlognum)
+		return -1;
+	if (att1->attlognum > att2->attlognum)
+		return 1;
+	return 0;
+}
+
+/*
+ * Return the array of attrs sorted by logical position
+ */
+Form_pg_attribute *
+TupleDescGetSortedAttrs(TupleDesc desc)
+{
+	if (desc->logattrs == NULL)
+	{
+		Form_pg_attribute *attrs;
+
+		/*
+		 * logattrs must be allocated in the same memcxt as the tupdesc it
+		 * belongs to, so that it isn't reset ahead of time.
+		 */
+		attrs = MemoryContextAlloc(GetMemoryChunkContext(desc),
+								   sizeof(Form_pg_attribute) * desc->natts);
+		memcpy(attrs, desc->attrs,
+			   sizeof(Form_pg_attribute) * desc->natts);
+
+		qsort(attrs, desc->natts, sizeof(Form_pg_attribute), cmplognum);
+
+		desc->logattrs = attrs;
+	}
+
+	return desc->logattrs;
+}
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index ce44bbd..017713d 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -658,8 +658,8 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 	 * Look for attributes with attstorage 'x' to compress.  Also find large
 	 * attributes with attstorage 'x' or 'e', and store them external.
 	 */
-	while (heap_compute_data_size(tupleDesc,
-								  toast_values, toast_isnull) > maxDataLen)
+	while (heap_compute_data_size(tupleDesc, toast_values, toast_isnull,
+								  false) > maxDataLen)
 	{
 		int			biggest_attno = -1;
 		int32		biggest_size = MAXALIGN(TOAST_POINTER_SIZE);
@@ -748,8 +748,8 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 	 * Second we look for attributes of attstorage 'x' or 'e' that are still
 	 * inline.  But skip this if there's no toast table to push them to.
 	 */
-	while (heap_compute_data_size(tupleDesc,
-								  toast_values, toast_isnull) > maxDataLen &&
+	while (heap_compute_data_size(tupleDesc, toast_values, toast_isnull,
+								  false) > maxDataLen &&
 		   rel->rd_rel->reltoastrelid != InvalidOid)
 	{
 		int			biggest_attno = -1;
@@ -799,8 +799,8 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 	 * Round 3 - this time we take attributes with storage 'm' into
 	 * compression
 	 */
-	while (heap_compute_data_size(tupleDesc,
-								  toast_values, toast_isnull) > maxDataLen)
+	while (heap_compute_data_size(tupleDesc, toast_values, toast_isnull,
+								  false) > maxDataLen)
 	{
 		int			biggest_attno = -1;
 		int32		biggest_size = MAXALIGN(TOAST_POINTER_SIZE);
@@ -862,8 +862,8 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 	 */
 	maxDataLen = TOAST_TUPLE_TARGET_MAIN - hoff;
 
-	while (heap_compute_data_size(tupleDesc,
-								  toast_values, toast_isnull) > maxDataLen &&
+	while (heap_compute_data_size(tupleDesc, toast_values, toast_isnull,
+								  false) > maxDataLen &&
 		   rel->rd_rel->reltoastrelid != InvalidOid)
 	{
 		int			biggest_attno = -1;
@@ -937,8 +937,8 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 		if (olddata->t_infomask & HEAP_HASOID)
 			new_header_len += sizeof(Oid);
 		new_header_len = MAXALIGN(new_header_len);
-		new_data_len = heap_compute_data_size(tupleDesc,
-											  toast_values, toast_isnull);
+		new_data_len = heap_compute_data_size(tupleDesc, toast_values,
+											  toast_isnull, false);
 		new_tuple_len = new_header_len + new_data_len;
 
 		/*
@@ -964,6 +964,7 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 		heap_fill_tuple(tupleDesc,
 						toast_values,
 						toast_isnull,
+						false,
 						(char *) new_data + new_header_len,
 						new_data_len,
 						&(new_data->t_infomask),
@@ -1170,8 +1171,8 @@ toast_flatten_tuple_to_datum(HeapTupleHeader tup,
 	if (tup->t_infomask & HEAP_HASOID)
 		new_header_len += sizeof(Oid);
 	new_header_len = MAXALIGN(new_header_len);
-	new_data_len = heap_compute_data_size(tupleDesc,
-										  toast_values, toast_isnull);
+	new_data_len = heap_compute_data_size(tupleDesc, toast_values,
+										  toast_isnull, false);
 	new_tuple_len = new_header_len + new_data_len;
 
 	new_data = (HeapTupleHeader) palloc0(new_tuple_len);
@@ -1194,6 +1195,7 @@ toast_flatten_tuple_to_datum(HeapTupleHeader tup,
 	heap_fill_tuple(tupleDesc,
 					toast_values,
 					toast_isnull,
+					false,
 					(char *) new_data + new_header_len,
 					new_data_len,
 					&(new_data->t_infomask),
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 4a542e6..a3464e1 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -705,7 +705,9 @@ DefineAttr(char *name, char *type, int attnum)
 
 	namestrcpy(&attrtypes[attnum]->attname, name);
 	elog(DEBUG4, "column %s %s", NameStr(attrtypes[attnum]->attname), type);
-	attrtypes[attnum]->attnum = attnum + 1;		/* fillatt */
+	attrtypes[attnum]->attnum = attnum + 1;
+	attrtypes[attnum]->attphysnum = attnum + 1;
+	attrtypes[attnum]->attlognum = attnum + 1;
 
 	typeoid = gettype(type);
 
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index ca89879..85a46a3 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -198,6 +198,8 @@ foreach my $catname (@{ $catalogs->{names} })
 				$attnum++;
 				my $row = emit_pgattr_row($table_name, $attr, $priornotnull);
 				$row->{attnum}        = $attnum;
+				$row->{attphysnum}    = $attnum;
+				$row->{attlognum}     = $attnum;
 				$row->{attstattarget} = '-1';
 				$priornotnull &= ($row->{attnotnull} eq 't');
 
@@ -235,6 +237,8 @@ foreach my $catname (@{ $catalogs->{names} })
 					$attnum--;
 					my $row = emit_pgattr_row($table_name, $attr, 1);
 					$row->{attnum}        = $attnum;
+					$row->{attphysnum}    = $attnum;
+					$row->{attlognum}     = $attnum;
 					$row->{attstattarget} = '0';
 
 					# some catalogs don't have oids
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index e523ee9..93c182c 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -136,37 +136,49 @@ static List *insert_ordered_unique_oid(List *list, Oid datum);
 
 static FormData_pg_attribute a1 = {
 	0, {"ctid"}, TIDOID, 0, sizeof(ItemPointerData),
-	SelfItemPointerAttributeNumber, 0, -1, -1,
+	SelfItemPointerAttributeNumber, SelfItemPointerAttributeNumber,
+	SelfItemPointerAttributeNumber,
+	0, -1, -1,
 	false, 'p', 's', true, false, false, true, 0
 };
 
 static FormData_pg_attribute a2 = {
 	0, {"oid"}, OIDOID, 0, sizeof(Oid),
-	ObjectIdAttributeNumber, 0, -1, -1,
+	ObjectIdAttributeNumber, ObjectIdAttributeNumber,
+	ObjectIdAttributeNumber,
+	0, -1, -1,
 	true, 'p', 'i', true, false, false, true, 0
 };
 
 static FormData_pg_attribute a3 = {
 	0, {"xmin"}, XIDOID, 0, sizeof(TransactionId),
-	MinTransactionIdAttributeNumber, 0, -1, -1,
+	MinTransactionIdAttributeNumber, MinTransactionIdAttributeNumber,
+	MinTransactionIdAttributeNumber,
+	0, -1, -1,
 	true, 'p', 'i', true, false, false, true, 0
 };
 
 static FormData_pg_attribute a4 = {
 	0, {"cmin"}, CIDOID, 0, sizeof(CommandId),
-	MinCommandIdAttributeNumber, 0, -1, -1,
+	MinCommandIdAttributeNumber, MinCommandIdAttributeNumber,
+	MinCommandIdAttributeNumber,
+	0, -1, -1,
 	true, 'p', 'i', true, false, false, true, 0
 };
 
 static FormData_pg_attribute a5 = {
 	0, {"xmax"}, XIDOID, 0, sizeof(TransactionId),
-	MaxTransactionIdAttributeNumber, 0, -1, -1,
+	MaxTransactionIdAttributeNumber, MaxTransactionIdAttributeNumber,
+	MaxTransactionIdAttributeNumber,
+	0, -1, -1,
 	true, 'p', 'i', true, false, false, true, 0
 };
 
 static FormData_pg_attribute a6 = {
 	0, {"cmax"}, CIDOID, 0, sizeof(CommandId),
-	MaxCommandIdAttributeNumber, 0, -1, -1,
+	MaxCommandIdAttributeNumber, MaxCommandIdAttributeNumber,
+	MaxCommandIdAttributeNumber,
+	0, -1, -1,
 	true, 'p', 'i', true, false, false, true, 0
 };
 
@@ -178,7 +190,9 @@ static FormData_pg_attribute a6 = {
  */
 static FormData_pg_attribute a7 = {
 	0, {"tableoid"}, OIDOID, 0, sizeof(Oid),
-	TableOidAttributeNumber, 0, -1, -1,
+	TableOidAttributeNumber, TableOidAttributeNumber,
+	TableOidAttributeNumber,
+	0, -1, -1,
 	true, 'p', 'i', true, false, false, true, 0
 };
 
@@ -615,6 +629,8 @@ InsertPgAttributeTuple(Relation pg_attribute_rel,
 	values[Anum_pg_attribute_attstattarget - 1] = Int32GetDatum(new_attribute->attstattarget);
 	values[Anum_pg_attribute_attlen - 1] = Int16GetDatum(new_attribute->attlen);
 	values[Anum_pg_attribute_attnum - 1] = Int16GetDatum(new_attribute->attnum);
+	values[Anum_pg_attribute_attphysnum - 1] = Int16GetDatum(new_attribute->attphysnum);
+	values[Anum_pg_attribute_attlognum - 1] = Int16GetDatum(new_attribute->attlognum);
 	values[Anum_pg_attribute_attndims - 1] = Int32GetDatum(new_attribute->attndims);
 	values[Anum_pg_attribute_attcacheoff - 1] = Int32GetDatum(new_attribute->attcacheoff);
 	values[Anum_pg_attribute_atttypmod - 1] = Int32GetDatum(new_attribute->atttypmod);
@@ -2174,6 +2190,7 @@ AddRelationNewConstraints(Relation rel,
 	foreach(cell, newColDefaults)
 	{
 		RawColumnDefault *colDef = (RawColumnDefault *) lfirst(cell);
+		/* FIXME -- does this need to change? apparently not, but it's suspicious */
 		Form_pg_attribute atp = rel->rd_att->attrs[colDef->attnum - 1];
 
 		expr = cookDefault(pstate, colDef->raw_default,
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 844d413..1b62405 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -348,6 +348,8 @@ ConstructTupleDescriptor(Relation heapRelation,
 			 * attr
 			 */
 			to->attnum = i + 1;
+			to->attlognum = i + 1;
+			to->attphysnum = i + 1;
 
 			to->attstattarget = -1;
 			to->attcacheoff = -1;
@@ -382,6 +384,8 @@ ConstructTupleDescriptor(Relation heapRelation,
 			 * Assign some of the attributes values. Leave the rest as 0.
 			 */
 			to->attnum = i + 1;
+			to->attlognum = i + 1;
+			to->attphysnum = i + 1;
 			to->atttypid = keyType;
 			to->attlen = typeTup->typlen;
 			to->attbyval = typeTup->typbyval;
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 08abe14..764ce77 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -158,7 +158,7 @@ typedef struct CopyStateData
 	bool		file_has_oids;
 	FmgrInfo	oid_in_function;
 	Oid			oid_typioparam;
-	FmgrInfo   *in_functions;	/* array of input functions for each attrs */
+	FmgrInfo   *in_functions;	/* array of input functions for each attr */
 	Oid		   *typioparams;	/* array of element types for in_functions */
 	int		   *defmap;			/* array of default att numbers */
 	ExprState **defexprs;		/* array of default att expressions */
@@ -4296,7 +4296,7 @@ CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist)
 	if (attnamelist == NIL)
 	{
 		/* Generate default column list */
-		Form_pg_attribute *attr = tupDesc->attrs;
+		Form_pg_attribute *attr = TupleDescGetSortedAttrs(tupDesc);
 		int			attr_count = tupDesc->natts;
 		int			i;
 
@@ -4304,7 +4304,7 @@ CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist)
 		{
 			if (attr[i]->attisdropped)
 				continue;
-			attnums = lappend_int(attnums, i + 1);
+			attnums = lappend_int(attnums, attr[i]->attnum);
 		}
 	}
 	else
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1e737a0..ea9490f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1485,7 +1485,8 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 		TupleDesc	tupleDesc;
 		TupleConstr *constr;
 		AttrNumber *newattno;
-		AttrNumber	parent_attno;
+		AttrNumber	parent_colctr;
+		Form_pg_attribute *parent_attrs;
 
 		/*
 		 * A self-exclusive lock is needed here.  If two backends attempt to
@@ -1542,6 +1543,7 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 			parentsWithOids++;
 
 		tupleDesc = RelationGetDescr(relation);
+		parent_attrs = TupleDescGetSortedAttrs(tupleDesc);
 		constr = tupleDesc->constr;
 
 		/*
@@ -1552,10 +1554,17 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 		newattno = (AttrNumber *)
 			palloc0(tupleDesc->natts * sizeof(AttrNumber));
 
-		for (parent_attno = 1; parent_attno <= tupleDesc->natts;
-			 parent_attno++)
+		/*
+		 * parent_colctr is the index into the logical-ordered array of parent
+		 * columns; parent_attno is the attnum of each column.  The newattno
+		 * map entries must use the latter for numbering; the former is a loop
+		 * counter only.
+		 */
+		for (parent_colctr = 1; parent_colctr <= tupleDesc->natts;
+			 parent_colctr++)
 		{
-			Form_pg_attribute attribute = tupleDesc->attrs[parent_attno - 1];
+			Form_pg_attribute attribute = parent_attrs[parent_colctr - 1];
+			AttrNumber	parent_attno = attribute->attnum;
 			char	   *attributeName = NameStr(attribute->attname);
 			int			exist_attno;
 			ColumnDef  *def;
@@ -4727,6 +4736,8 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	attribute.attcacheoff = -1;
 	attribute.atttypmod = typmod;
 	attribute.attnum = newattnum;
+	attribute.attlognum = newattnum;
+	attribute.attphysnum = newattnum;
 	attribute.attbyval = tform->typbyval;
 	attribute.attndims = list_length(colDef->typeName->arrayBounds);
 	attribute.attstorage = tform->typstorage;
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 88af735..6896098 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -1189,6 +1189,9 @@ ExecEvalParamExtern(ExprState *exprstate, ExprContext *econtext,
  *		to use these.  Ex: overpaid(EMP) might call GetAttributeByNum().
  *		Note: these are actually rather slow because they do a typcache
  *		lookup on each call.
+ *
+ *	FIXME -- probably these functions should consider attrno a logical column
+ *	number
  */
 Datum
 GetAttributeByNum(HeapTupleHeader tuple,
@@ -3289,7 +3292,8 @@ ExecEvalRow(RowExprState *rstate,
 		i++;
 	}
 
-	tuple = heap_form_tuple(rstate->tupdesc, values, isnull);
+	tuple = heap_form_tuple_extended(rstate->tupdesc, values, isnull,
+									 HTOPT_LOGICAL_ORDER);
 
 	pfree(values);
 	pfree(isnull);
@@ -4035,6 +4039,7 @@ ExecEvalFieldSelect(FieldSelectState *fstate,
 	TupleDesc	tupDesc;
 	Form_pg_attribute attr;
 	HeapTupleData tmptup;
+	Form_pg_attribute *attrs;
 
 	tupDatum = ExecEvalExpr(fstate->arg, econtext, isNull, isDone);
 
@@ -4062,7 +4067,8 @@ ExecEvalFieldSelect(FieldSelectState *fstate,
 	if (fieldnum > tupDesc->natts)		/* should never happen */
 		elog(ERROR, "attribute number %d exceeds number of columns %d",
 			 fieldnum, tupDesc->natts);
-	attr = tupDesc->attrs[fieldnum - 1];
+	attrs = TupleDescGetSortedAttrs(tupDesc);
+	attr = attrs[fieldnum - 1];
 
 	/* Check for dropped column, and force a NULL result if so */
 	if (attr->attisdropped)
@@ -4085,7 +4091,7 @@ ExecEvalFieldSelect(FieldSelectState *fstate,
 	tmptup.t_data = tuple;
 
 	result = heap_getattr(&tmptup,
-						  fieldnum,
+						  attr->attnum,
 						  tupDesc,
 						  isNull);
 	return result;
@@ -4111,6 +4117,7 @@ ExecEvalFieldStore(FieldStoreState *fstate,
 	bool	   *isnull;
 	Datum		save_datum;
 	bool		save_isNull;
+	Form_pg_attribute *attrs;
 	ListCell   *l1,
 			   *l2;
 
@@ -4122,6 +4129,7 @@ ExecEvalFieldStore(FieldStoreState *fstate,
 	/* Lookup tupdesc if first time through or after rescan */
 	tupDesc = get_cached_rowtype(fstore->resulttype, -1,
 								 &fstate->argdesc, econtext);
+	attrs = TupleDescGetSortedAttrs(tupDesc);
 
 	/* Allocate workspace */
 	values = (Datum *) palloc(tupDesc->natts * sizeof(Datum));
@@ -4142,7 +4150,8 @@ ExecEvalFieldStore(FieldStoreState *fstate,
 		tmptup.t_tableOid = InvalidOid;
 		tmptup.t_data = tuphdr;
 
-		heap_deform_tuple(&tmptup, tupDesc, values, isnull);
+		heap_deform_tuple_extended(&tmptup, tupDesc, values, isnull,
+								   HTOPT_LOGICAL_ORDER);
 	}
 	else
 	{
@@ -4160,8 +4169,10 @@ ExecEvalFieldStore(FieldStoreState *fstate,
 	{
 		ExprState  *newval = (ExprState *) lfirst(l1);
 		AttrNumber	fieldnum = lfirst_int(l2);
+		AttrNumber	attnum = attrs[fieldnum - 1]->attnum;
+
 
-		Assert(fieldnum > 0 && fieldnum <= tupDesc->natts);
+		Assert(attnum > 0 && attnum <= tupDesc->natts);
 
 		/*
 		 * Use the CaseTestExpr mechanism to pass down the old value of the
@@ -4172,19 +4183,20 @@ ExecEvalFieldStore(FieldStoreState *fstate,
 		 * assignment can't be within a CASE either.  (So saving and restoring
 		 * the caseValue is just paranoia, but let's do it anyway.)
 		 */
-		econtext->caseValue_datum = values[fieldnum - 1];
-		econtext->caseValue_isNull = isnull[fieldnum - 1];
+		econtext->caseValue_datum = values[attnum - 1];
+		econtext->caseValue_isNull = isnull[attnum - 1];
 
-		values[fieldnum - 1] = ExecEvalExpr(newval,
-											econtext,
-											&isnull[fieldnum - 1],
-											NULL);
+		values[attnum - 1] = ExecEvalExpr(newval,
+										  econtext,
+										  &isnull[attnum - 1],
+										  NULL);
 	}
 
 	econtext->caseValue_datum = save_datum;
 	econtext->caseValue_isNull = save_isNull;
 
-	tuple = heap_form_tuple(tupDesc, values, isnull);
+	tuple = heap_form_tuple_extended(tupDesc, values, isnull,
+									 HTOPT_LOGICAL_ORDER);
 
 	pfree(values);
 	pfree(isnull);
@@ -4830,7 +4842,7 @@ ExecInitExpr(Expr *node, PlanState *parent)
 				BlessTupleDesc(rstate->tupdesc);
 				/* Set up evaluation, skipping any deleted columns */
 				Assert(list_length(rowexpr->args) <= rstate->tupdesc->natts);
-				attrs = rstate->tupdesc->attrs;
+				attrs = TupleDescGetSortedAttrs(rstate->tupdesc);
 				i = 0;
 				foreach(l, rowexpr->args)
 				{
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 1319519..476fa18 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -271,11 +271,12 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc
 	int			attrno;
 	bool		hasoid;
 	ListCell   *tlist_item = list_head(tlist);
+	Form_pg_attribute *attrs = TupleDescGetSortedAttrs(tupdesc);
 
 	/* Check the tlist attributes */
 	for (attrno = 1; attrno <= numattrs; attrno++)
 	{
-		Form_pg_attribute att_tup = tupdesc->attrs[attrno - 1];
+		Form_pg_attribute att_tup = attrs[attrno - 1];
 		Var		   *var;
 
 		if (tlist_item == NULL)
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 0811941..7e6c9d5 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -599,7 +599,8 @@ ExecCopySlotMinimalTuple(TupleTableSlot *slot)
 	 */
 	return heap_form_minimal_tuple(slot->tts_tupleDescriptor,
 								   slot->tts_values,
-								   slot->tts_isnull);
+								   slot->tts_isnull,
+								   HTOPT_LOGICAL_ORDER);
 }
 
 /* --------------------------------
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 4d11260..69a8ab6 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1657,6 +1657,7 @@ check_sql_fn_retval(Oid func_id, Oid rettype, List *queryTreeList,
 	{
 		/* Returns a rowtype */
 		TupleDesc	tupdesc;
+		Form_pg_attribute *attrs;
 		int			tupnatts;	/* physical number of columns in tuple */
 		int			tuplogcols; /* # of nondeleted columns in tuple */
 		int			colindex;	/* physical column index */
@@ -1721,6 +1722,7 @@ check_sql_fn_retval(Oid func_id, Oid rettype, List *queryTreeList,
 		 * result columns if the caller asked for that.
 		 */
 		tupnatts = tupdesc->natts;
+		attrs = TupleDescGetSortedAttrs(tupdesc);
 		tuplogcols = 0;			/* we'll count nondeleted cols as we go */
 		colindex = 0;
 		newtlist = NIL;			/* these are only used if modifyTargetList */
@@ -1749,7 +1751,7 @@ check_sql_fn_retval(Oid func_id, Oid rettype, List *queryTreeList,
 							 errmsg("return type mismatch in function declared to return %s",
 									format_type_be(rettype)),
 					errdetail("Final statement returns too many columns.")));
-				attr = tupdesc->attrs[colindex - 1];
+				attr = attrs[colindex - 1];
 				if (attr->attisdropped && modifyTargetList)
 				{
 					Expr	   *null_expr;
@@ -1806,7 +1808,7 @@ check_sql_fn_retval(Oid func_id, Oid rettype, List *queryTreeList,
 		/* remaining columns in tupdesc had better all be dropped */
 		for (colindex++; colindex <= tupnatts; colindex++)
 		{
-			if (!tupdesc->attrs[colindex - 1]->attisdropped)
+			if (!attrs[colindex - 1]->attisdropped)
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
 						 errmsg("return type mismatch in function declared to return %s",
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 6b1bf7b..2081e54 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2006,6 +2006,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
 	COPY_SCALAR_FIELD(rtekind);
 	COPY_SCALAR_FIELD(relid);
 	COPY_SCALAR_FIELD(relkind);
+	COPY_NODE_FIELD(lognums);
 	COPY_NODE_FIELD(subquery);
 	COPY_SCALAR_FIELD(security_barrier);
 	COPY_SCALAR_FIELD(jointype);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index edbd09f..5631dc0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -24,6 +24,7 @@
 #include <ctype.h>
 
 #include "lib/stringinfo.h"
+#include "nodes/execnodes.h"
 #include "nodes/plannodes.h"
 #include "nodes/relation.h"
 #include "utils/datum.h"
@@ -1440,6 +1441,22 @@ _outTargetEntry(StringInfo str, const TargetEntry *node)
 }
 
 static void
+_outGenericExprState(StringInfo str, const GenericExprState *node)
+{
+	WRITE_NODE_TYPE("GENERICEXPRSTATE");
+
+	WRITE_NODE_FIELD(arg);
+}
+
+static void
+_outExprState(StringInfo str, const ExprState *node)
+{
+	WRITE_NODE_TYPE("EXPRSTATE");
+
+	WRITE_NODE_FIELD(expr);
+}
+
+static void
 _outRangeTblRef(StringInfo str, const RangeTblRef *node)
 {
 	WRITE_NODE_TYPE("RANGETBLREF");
@@ -2420,6 +2437,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
 		case RTE_RELATION:
 			WRITE_OID_FIELD(relid);
 			WRITE_CHAR_FIELD(relkind);
+			WRITE_NODE_FIELD(lognums);
 			break;
 		case RTE_SUBQUERY:
 			WRITE_NODE_FIELD(subquery);
@@ -3073,6 +3091,12 @@ _outNode(StringInfo str, const void *obj)
 			case T_FromExpr:
 				_outFromExpr(str, obj);
 				break;
+			case T_GenericExprState:
+				_outGenericExprState(str, obj);
+				break;
+			case T_ExprState:
+				_outExprState(str, obj);
+				break;
 
 			case T_Path:
 				_outPath(str, obj);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index a3efdd4..38dc982 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1216,6 +1216,7 @@ _readRangeTblEntry(void)
 		case RTE_RELATION:
 			READ_OID_FIELD(relid);
 			READ_CHAR_FIELD(relkind);
+			READ_NODE_FIELD(lognums);
 			break;
 		case RTE_SUBQUERY:
 			READ_NODE_FIELD(subquery);
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 9cb1378..a3d66b2 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -1749,7 +1749,7 @@ pullup_replace_vars_callback(Var *var,
 		 * expansion with varlevelsup = 0, and then adjust if needed.
 		 */
 		expandRTE(rcon->target_rte,
-				  var->varno, 0 /* not varlevelsup */ , var->location,
+				  var->varno, 0 /* not varlevelsup */ , var->location, false,
 				  (var->vartype != RECORDOID),
 				  &colnames, &fields);
 		/* Adjust the generated per-field Vars, but don't insert PHVs */
diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index 4ab12e5..c226079 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -216,6 +216,7 @@ expand_targetlist(List *tlist, int command_type,
 	 */
 	rel = heap_open(getrelid(result_relation, range_table), NoLock);
 
+	/* FIXME --- do we need a different order of attributes here? */
 	numattrs = RelationGetNumberOfAttributes(rel);
 
 	for (attrno = 1; attrno <= numattrs; attrno++)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b2becfa..f63f201 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -857,17 +857,19 @@ build_physical_tlist(PlannerInfo *root, RelOptInfo *rel)
 	int			attrno,
 				numattrs;
 	List	   *colvars;
+	Form_pg_attribute *attrs;
 
 	switch (rte->rtekind)
 	{
 		case RTE_RELATION:
 			/* Assume we already have adequate lock */
 			relation = heap_open(rte->relid, NoLock);
+			attrs = TupleDescGetSortedAttrs(RelationGetDescr(relation));
 
 			numattrs = RelationGetNumberOfAttributes(relation);
 			for (attrno = 1; attrno <= numattrs; attrno++)
 			{
-				Form_pg_attribute att_tup = relation->rd_att->attrs[attrno - 1];
+				Form_pg_attribute att_tup = attrs[attrno - 1];
 
 				if (att_tup->attisdropped)
 				{
@@ -917,7 +919,7 @@ build_physical_tlist(PlannerInfo *root, RelOptInfo *rel)
 		case RTE_VALUES:
 		case RTE_CTE:
 			/* Not all of these can have dropped cols, but share code anyway */
-			expandRTE(rte, varno, 0, -1, true /* include dropped */ ,
+			expandRTE(rte, varno, 0, -1, true /* include dropped */ , false,
 					  NULL, &colvars);
 			foreach(l, colvars)
 			{
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index cc569ed..6c79bd3 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -682,7 +682,7 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt)
 		/*
 		 * Generate list of Vars referencing the RTE
 		 */
-		expandRTE(rte, rtr->rtindex, 0, -1, false, NULL, &exprList);
+		expandRTE(rte, rtr->rtindex, 0, -1, false, false, NULL, &exprList);
 	}
 	else
 	{
@@ -1209,7 +1209,7 @@ transformValuesClause(ParseState *pstate, SelectStmt *stmt)
 	 * Generate a targetlist as though expanding "*"
 	 */
 	Assert(pstate->p_next_resno == 1);
-	qry->targetList = expandRelAttrs(pstate, rte, rtindex, 0, -1);
+	qry->targetList = expandRelAttrs(pstate, rte, rtindex, 0, false, -1);
 
 	/*
 	 * The grammar allows attaching ORDER BY, LIMIT, and FOR UPDATE to a
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 4931dca..3b33f82 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -879,9 +879,9 @@ transformFromClauseItem(ParseState *pstate, Node *n,
 		 *
 		 * Note: expandRTE returns new lists, safe for me to modify
 		 */
-		expandRTE(l_rte, l_rtindex, 0, -1, false,
+		expandRTE(l_rte, l_rtindex, 0, -1, false, true,
 				  &l_colnames, &l_colvars);
-		expandRTE(r_rte, r_rtindex, 0, -1, false,
+		expandRTE(r_rte, r_rtindex, 0, -1, false, true,
 				  &r_colnames, &r_colvars);
 
 		/*
diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c
index 8416d36..dc8f2e1 100644
--- a/src/backend/parser/parse_coerce.c
+++ b/src/backend/parser/parse_coerce.c
@@ -906,6 +906,7 @@ coerce_record_to_complex(ParseState *pstate, Node *node,
 	int			i;
 	int			ucolno;
 	ListCell   *arg;
+	Form_pg_attribute	*attrs;
 
 	if (node && IsA(node, RowExpr))
 	{
@@ -924,7 +925,7 @@ coerce_record_to_complex(ParseState *pstate, Node *node,
 		RangeTblEntry *rte;
 
 		rte = GetRTEByRangeTablePosn(pstate, rtindex, sublevels_up);
-		expandRTE(rte, rtindex, sublevels_up, vlocation, false,
+		expandRTE(rte, rtindex, sublevels_up, vlocation, false, false,
 				  NULL, &args);
 	}
 	else
@@ -939,6 +940,7 @@ coerce_record_to_complex(ParseState *pstate, Node *node,
 	newargs = NIL;
 	ucolno = 1;
 	arg = list_head(args);
+	attrs = TupleDescGetSortedAttrs(tupdesc);
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		Node	   *expr;
@@ -946,7 +948,7 @@ coerce_record_to_complex(ParseState *pstate, Node *node,
 		Oid			exprtype;
 
 		/* Fill in NULLs for dropped columns in rowtype */
-		if (tupdesc->attrs[i]->attisdropped)
+		if (attrs[i]->attisdropped)
 		{
 			/*
 			 * can't use atttypid here, but it doesn't really matter what type
@@ -970,8 +972,8 @@ coerce_record_to_complex(ParseState *pstate, Node *node,
 
 		cexpr = coerce_to_target_type(pstate,
 									  expr, exprtype,
-									  tupdesc->attrs[i]->atttypid,
-									  tupdesc->attrs[i]->atttypmod,
+									  attrs[i]->atttypid,
+									  attrs[i]->atttypmod,
 									  ccontext,
 									  COERCE_IMPLICIT_CAST,
 									  -1);
@@ -983,7 +985,7 @@ coerce_record_to_complex(ParseState *pstate, Node *node,
 							format_type_be(targetTypeId)),
 					 errdetail("Cannot cast type %s to %s in column %d.",
 							   format_type_be(exprtype),
-							   format_type_be(tupdesc->attrs[i]->atttypid),
+							   format_type_be(attrs[i]->atttypid),
 							   ucolno),
 					 parser_coercion_errposition(pstate, location, expr)));
 		newargs = lappend(newargs, cexpr);
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 9ebd3fd..909f397 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -1759,6 +1759,7 @@ ParseComplexProjection(ParseState *pstate, char *funcname, Node *first_arg,
 {
 	TupleDesc	tupdesc;
 	int			i;
+	Form_pg_attribute *attrs;
 
 	/*
 	 * Special case for whole-row Vars so that we can resolve (foo.*).bar even
@@ -1796,9 +1797,10 @@ ParseComplexProjection(ParseState *pstate, char *funcname, Node *first_arg,
 		return NULL;			/* unresolvable RECORD type */
 	Assert(tupdesc);
 
+	attrs = TupleDescGetSortedAttrs(tupdesc);
 	for (i = 0; i < tupdesc->natts; i++)
 	{
-		Form_pg_attribute att = tupdesc->attrs[i];
+		Form_pg_attribute att = attrs[i];
 
 		if (strcmp(funcname, NameStr(att->attname)) == 0 &&
 			!att->attisdropped)
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index 478584d..41a1464 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -43,12 +43,12 @@ static void markRTEForSelectPriv(ParseState *pstate, RangeTblEntry *rte,
 					 int rtindex, AttrNumber col);
 static void expandRelation(Oid relid, Alias *eref,
 			   int rtindex, int sublevels_up,
-			   int location, bool include_dropped,
+			   int location, bool include_dropped, bool logical_sort,
 			   List **colnames, List **colvars);
 static void expandTupleDesc(TupleDesc tupdesc, Alias *eref,
 				int count, int offset,
 				int rtindex, int sublevels_up,
-				int location, bool include_dropped,
+				int location, bool include_dropped, bool logical_sort,
 				List **colnames, List **colvars);
 static int	specialAttNum(const char *attname);
 static bool isQueryUsingTempRelation_walker(Node *node, void *context);
@@ -519,6 +519,12 @@ GetCTEForRTE(ParseState *pstate, RangeTblEntry *rte, int rtelevelsup)
 	return NULL;				/* keep compiler quiet */
 }
 
+static int16
+get_attnum_by_lognum(RangeTblEntry *rte, int16 attlognum)
+{
+	return list_nth_int(rte->lognums, attlognum - 1);
+}
+
 /*
  * scanRTEForColumn
  *	  Search the column names of a single RTE for the given name.
@@ -561,6 +567,8 @@ scanRTEForColumn(ParseState *pstate, RangeTblEntry *rte, char *colname,
 						 errmsg("column reference \"%s\" is ambiguous",
 								colname),
 						 parser_errposition(pstate, location)));
+			if (rte->lognums)
+				attnum = get_attnum_by_lognum(rte, attnum);
 			var = make_var(pstate, rte, attnum, location);
 			/* Require read access to the column */
 			markVarForSelectPriv(pstate, var, rte);
@@ -830,14 +838,19 @@ markVarForSelectPriv(ParseState *pstate, Var *var, RangeTblEntry *rte)
  * empty strings for any dropped columns, so that it will be one-to-one with
  * physical column numbers.
  *
+ * If lognums is not NULL, it will be filled with a map from logical column
+ * numbers to attnum; that way, the nth element of eref->colnames corresponds
+ * to the attnum found in the nth element of lognums.
+ *
  * It is an error for there to be more aliases present than required.
  */
 static void
-buildRelationAliases(TupleDesc tupdesc, Alias *alias, Alias *eref)
+buildRelationAliases(TupleDesc tupdesc, Alias *alias, Alias *eref, List **lognums)
 {
 	int			maxattrs = tupdesc->natts;
 	ListCell   *aliaslc;
 	int			numaliases;
+	Form_pg_attribute *attrs;
 	int			varattno;
 	int			numdropped = 0;
 
@@ -856,9 +869,11 @@ buildRelationAliases(TupleDesc tupdesc, Alias *alias, Alias *eref)
 		numaliases = 0;
 	}
 
+	attrs = TupleDescGetSortedAttrs(tupdesc);
+
 	for (varattno = 0; varattno < maxattrs; varattno++)
 	{
-		Form_pg_attribute attr = tupdesc->attrs[varattno];
+		Form_pg_attribute attr = attrs[varattno];
 		Value	   *attrname;
 
 		if (attr->attisdropped)
@@ -883,6 +898,9 @@ buildRelationAliases(TupleDesc tupdesc, Alias *alias, Alias *eref)
 		}
 
 		eref->colnames = lappend(eref->colnames, attrname);
+
+		if (lognums)
+			*lognums = lappend_int(*lognums, attr->attnum);
 	}
 
 	/* Too many user-supplied aliases? */
@@ -1030,7 +1048,7 @@ addRangeTableEntry(ParseState *pstate,
 	 * and/or actual column names.
 	 */
 	rte->eref = makeAlias(refname, NIL);
-	buildRelationAliases(rel->rd_att, alias, rte->eref);
+	buildRelationAliases(rel->rd_att, alias, rte->eref, &rte->lognums);
 
 	/*
 	 * Drop the rel refcount, but keep the access lock till end of transaction
@@ -1090,7 +1108,7 @@ addRangeTableEntryForRelation(ParseState *pstate,
 	 * and/or actual column names.
 	 */
 	rte->eref = makeAlias(refname, NIL);
-	buildRelationAliases(rel->rd_att, alias, rte->eref);
+	buildRelationAliases(rel->rd_att, alias, rte->eref, &rte->lognums);
 
 	/*
 	 * Set flags and access permissions.
@@ -1422,7 +1440,7 @@ addRangeTableEntryForFunction(ParseState *pstate,
 	}
 
 	/* Use the tupdesc while assigning column aliases for the RTE */
-	buildRelationAliases(tupdesc, alias, eref);
+	buildRelationAliases(tupdesc, alias, eref, NULL);
 
 	/*
 	 * Set flags and access permissions.
@@ -1787,13 +1805,16 @@ addRTEtoQuery(ParseState *pstate, RangeTblEntry *rte,
  * values to use in the created Vars.  Ordinarily rtindex should match the
  * actual position of the RTE in its rangetable.
  *
+ * If logical_sort is true, then the resulting lists are sorted by logical
+ * column number (attlognum); otherwise use regular attnum.
+ *
  * The output lists go into *colnames and *colvars.
  * If only one of the two kinds of output list is needed, pass NULL for the
  * output pointer for the unwanted one.
  */
 void
 expandRTE(RangeTblEntry *rte, int rtindex, int sublevels_up,
-		  int location, bool include_dropped,
+		  int location, bool include_dropped, bool logical_sort,
 		  List **colnames, List **colvars)
 {
 	int			varattno;
@@ -1808,8 +1829,8 @@ expandRTE(RangeTblEntry *rte, int rtindex, int sublevels_up,
 		case RTE_RELATION:
 			/* Ordinary relation RTE */
 			expandRelation(rte->relid, rte->eref,
-						   rtindex, sublevels_up, location,
-						   include_dropped, colnames, colvars);
+						   rtindex, sublevels_up, location, include_dropped,
+						   logical_sort, colnames, colvars);
 			break;
 		case RTE_SUBQUERY:
 			{
@@ -1875,7 +1896,8 @@ expandRTE(RangeTblEntry *rte, int rtindex, int sublevels_up,
 						expandTupleDesc(tupdesc, rte->eref,
 										rtfunc->funccolcount, atts_done,
 										rtindex, sublevels_up, location,
-										include_dropped, colnames, colvars);
+										include_dropped, logical_sort,
+										colnames, colvars);
 					}
 					else if (functypclass == TYPEFUNC_SCALAR)
 					{
@@ -2127,7 +2149,7 @@ expandRTE(RangeTblEntry *rte, int rtindex, int sublevels_up,
  */
 static void
 expandRelation(Oid relid, Alias *eref, int rtindex, int sublevels_up,
-			   int location, bool include_dropped,
+			   int location, bool include_dropped, bool logical_sort,
 			   List **colnames, List **colvars)
 {
 	Relation	rel;
@@ -2136,7 +2158,7 @@ expandRelation(Oid relid, Alias *eref, int rtindex, int sublevels_up,
 	rel = relation_open(relid, AccessShareLock);
 	expandTupleDesc(rel->rd_att, eref, rel->rd_att->natts, 0,
 					rtindex, sublevels_up,
-					location, include_dropped,
+					location, include_dropped, logical_sort,
 					colnames, colvars);
 	relation_close(rel, AccessShareLock);
 }
@@ -2153,11 +2175,17 @@ expandRelation(Oid relid, Alias *eref, int rtindex, int sublevels_up,
 static void
 expandTupleDesc(TupleDesc tupdesc, Alias *eref, int count, int offset,
 				int rtindex, int sublevels_up,
-				int location, bool include_dropped,
+				int location, bool include_dropped, bool logical_sort,
 				List **colnames, List **colvars)
 {
 	ListCell   *aliascell = list_head(eref->colnames);
-	int			varattno;
+	int			attnum;
+	Form_pg_attribute *attrs;
+
+	if (logical_sort)
+		attrs = TupleDescGetSortedAttrs(tupdesc);
+	else
+		attrs = tupdesc->attrs;
 
 	if (colnames)
 	{
@@ -2171,9 +2199,10 @@ expandTupleDesc(TupleDesc tupdesc, Alias *eref, int count, int offset,
 	}
 
 	Assert(count <= tupdesc->natts);
-	for (varattno = 0; varattno < count; varattno++)
+	for (attnum = 0; attnum < count; attnum++)
 	{
-		Form_pg_attribute attr = tupdesc->attrs[varattno];
+		Form_pg_attribute attr = attrs[attnum];
+		int		varattno = attr->attnum - 1;
 
 		if (attr->attisdropped)
 		{
@@ -2240,7 +2269,7 @@ expandTupleDesc(TupleDesc tupdesc, Alias *eref, int count, int offset,
  */
 List *
 expandRelAttrs(ParseState *pstate, RangeTblEntry *rte,
-			   int rtindex, int sublevels_up, int location)
+			   int rtindex, int sublevels_up, bool logical_sort, int location)
 {
 	List	   *names,
 			   *vars;
@@ -2248,7 +2277,7 @@ expandRelAttrs(ParseState *pstate, RangeTblEntry *rte,
 			   *var;
 	List	   *te_list = NIL;
 
-	expandRTE(rte, rtindex, sublevels_up, location, false,
+	expandRTE(rte, rtindex, sublevels_up, location, false, logical_sort,
 			  &names, &vars);
 
 	/*
diff --git a/src/backend/parser/parse_target.c b/src/backend/parser/parse_target.c
index 328e0c6..5227c73 100644
--- a/src/backend/parser/parse_target.c
+++ b/src/backend/parser/parse_target.c
@@ -896,7 +896,7 @@ checkInsertTargets(ParseState *pstate, List *cols, List **attrnos)
 		/*
 		 * Generate default column list for INSERT.
 		 */
-		Form_pg_attribute *attr = pstate->p_target_relation->rd_att->attrs;
+		Form_pg_attribute *attr = TupleDescGetSortedAttrs(pstate->p_target_relation->rd_att);
 		int			numcol = pstate->p_target_relation->rd_rel->relnatts;
 		int			i;
 
@@ -913,7 +913,7 @@ checkInsertTargets(ParseState *pstate, List *cols, List **attrnos)
 			col->val = NULL;
 			col->location = -1;
 			cols = lappend(cols, col);
-			*attrnos = lappend_int(*attrnos, i + 1);
+			*attrnos = lappend_int(*attrnos, attr[i]->attnum);
 		}
 	}
 	else
@@ -931,7 +931,7 @@ checkInsertTargets(ParseState *pstate, List *cols, List **attrnos)
 			char	   *name = col->name;
 			int			attrno;
 
-			/* Lookup column name, ereport on failure */
+			/* Lookup column number, ereport on failure */
 			attrno = attnameAttNum(pstate->p_target_relation, name, false);
 			if (attrno == InvalidAttrNumber)
 				ereport(ERROR,
@@ -1184,6 +1184,7 @@ ExpandAllTables(ParseState *pstate, int location)
 											RTERangeTablePosn(pstate, rte,
 															  NULL),
 											0,
+											true,
 											location));
 	}
 
@@ -1252,14 +1253,14 @@ ExpandSingleTable(ParseState *pstate, RangeTblEntry *rte,
 	{
 		/* expandRelAttrs handles permissions marking */
 		return expandRelAttrs(pstate, rte, rtindex, sublevels_up,
-							  location);
+							  true, location);
 	}
 	else
 	{
 		List	   *vars;
 		ListCell   *l;
 
-		expandRTE(rte, rtindex, sublevels_up, location, false,
+		expandRTE(rte, rtindex, sublevels_up, location, false, true,
 				  NULL, &vars);
 
 		/*
@@ -1296,6 +1297,7 @@ ExpandRowReference(ParseState *pstate, Node *expr,
 	TupleDesc	tupleDesc;
 	int			numAttrs;
 	int			i;
+	Form_pg_attribute *attr;
 
 	/*
 	 * If the rowtype expression is a whole-row Var, we can expand the fields
@@ -1342,9 +1344,10 @@ ExpandRowReference(ParseState *pstate, Node *expr,
 
 	/* Generate a list of references to the individual fields */
 	numAttrs = tupleDesc->natts;
+	attr = TupleDescGetSortedAttrs(tupleDesc);
 	for (i = 0; i < numAttrs; i++)
 	{
-		Form_pg_attribute att = tupleDesc->attrs[i];
+		Form_pg_attribute att = attr[i];
 		FieldSelect *fselect;
 
 		if (att->attisdropped)
@@ -1413,7 +1416,7 @@ expandRecordVariable(ParseState *pstate, Var *var, int levelsup)
 				   *lvar;
 		int			i;
 
-		expandRTE(rte, var->varno, 0, var->location, false,
+		expandRTE(rte, var->varno, 0, var->location, false, false,
 				  &names, &vars);
 
 		tupleDesc = CreateTemplateTupleDesc(list_length(vars), false);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index c9e4b68..ddee31d 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -1326,7 +1326,7 @@ ReplaceVarsFromTargetList_callback(Var *var,
 		 */
 		expandRTE(rcon->target_rte,
 				  var->varno, var->varlevelsup, var->location,
-				  (var->vartype != RECORDOID),
+				  (var->vartype != RECORDOID), false,
 				  &colnames, &fields);
 		/* Adjust the generated per-field Vars... */
 		fields = (List *) replace_rte_variables_mutator((Node *) fields,
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index 9543d01..04cdf11 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -89,6 +89,7 @@ record_in(PG_FUNCTION_ARGS)
 	Datum	   *values;
 	bool	   *nulls;
 	StringInfoData buf;
+	Form_pg_attribute *attrs;
 
 	/*
 	 * Use the passed type unless it's RECORD; we can't support input of
@@ -138,6 +139,8 @@ record_in(PG_FUNCTION_ARGS)
 		my_extra->ncolumns = ncolumns;
 	}
 
+	attrs = TupleDescGetSortedAttrs(tupdesc);
+
 	values = (Datum *) palloc(ncolumns * sizeof(Datum));
 	nulls = (bool *) palloc(ncolumns * sizeof(bool));
 
@@ -159,15 +162,17 @@ record_in(PG_FUNCTION_ARGS)
 
 	for (i = 0; i < ncolumns; i++)
 	{
-		ColumnIOData *column_info = &my_extra->columns[i];
-		Oid			column_type = tupdesc->attrs[i]->atttypid;
+		Form_pg_attribute	attr = attrs[i];
+		int16		attnum = attr->attnum - 1;
+		ColumnIOData *column_info = &my_extra->columns[attnum];
+		Oid			column_type = attr->atttypid;
 		char	   *column_data;
 
 		/* Ignore dropped columns in datatype, but fill with nulls */
-		if (tupdesc->attrs[i]->attisdropped)
+		if (attr->attisdropped)
 		{
-			values[i] = (Datum) 0;
-			nulls[i] = true;
+			values[attnum] = (Datum) 0;
+			nulls[attnum] = true;
 			continue;
 		}
 
@@ -188,7 +193,7 @@ record_in(PG_FUNCTION_ARGS)
 		if (*ptr == ',' || *ptr == ')')
 		{
 			column_data = NULL;
-			nulls[i] = true;
+			nulls[attnum] = true;
 		}
 		else
 		{
@@ -233,7 +238,7 @@ record_in(PG_FUNCTION_ARGS)
 			}
 
 			column_data = buf.data;
-			nulls[i] = false;
+			nulls[attnum] = false;
 		}
 
 		/*
@@ -249,10 +254,10 @@ record_in(PG_FUNCTION_ARGS)
 			column_info->column_type = column_type;
 		}
 
-		values[i] = InputFunctionCall(&column_info->proc,
-									  column_data,
-									  column_info->typioparam,
-									  tupdesc->attrs[i]->atttypmod);
+		values[attnum] = InputFunctionCall(&column_info->proc,
+										   column_data,
+										   column_info->typioparam,
+										   attr->atttypmod);
 
 		/*
 		 * Prep for next column
@@ -311,6 +316,7 @@ record_out(PG_FUNCTION_ARGS)
 	Datum	   *values;
 	bool	   *nulls;
 	StringInfoData buf;
+	Form_pg_attribute	*attrs;
 
 	/* Extract type info from the tuple itself */
 	tupType = HeapTupleHeaderGetTypeId(rec);
@@ -352,6 +358,8 @@ record_out(PG_FUNCTION_ARGS)
 		my_extra->ncolumns = ncolumns;
 	}
 
+	attrs = TupleDescGetSortedAttrs(tupdesc);
+
 	values = (Datum *) palloc(ncolumns * sizeof(Datum));
 	nulls = (bool *) palloc(ncolumns * sizeof(bool));
 
@@ -365,22 +373,24 @@ record_out(PG_FUNCTION_ARGS)
 
 	for (i = 0; i < ncolumns; i++)
 	{
-		ColumnIOData *column_info = &my_extra->columns[i];
-		Oid			column_type = tupdesc->attrs[i]->atttypid;
+		Form_pg_attribute attrib = attrs[i];
+		int16		attnum = attrib->attnum - 1;
+		ColumnIOData *column_info = &my_extra->columns[attnum];
+		Oid			column_type = attrib->atttypid;
 		Datum		attr;
 		char	   *value;
 		char	   *tmp;
 		bool		nq;
 
 		/* Ignore dropped columns in datatype */
-		if (tupdesc->attrs[i]->attisdropped)
+		if (attrib->attisdropped)
 			continue;
 
 		if (needComma)
 			appendStringInfoChar(&buf, ',');
 		needComma = true;
 
-		if (nulls[i])
+		if (nulls[attnum])
 		{
 			/* emit nothing... */
 			continue;
@@ -399,7 +409,7 @@ record_out(PG_FUNCTION_ARGS)
 			column_info->column_type = column_type;
 		}
 
-		attr = values[i];
+		attr = values[attnum];
 		value = OutputFunctionCall(&column_info->proc, attr);
 
 		/* Detect whether we need double quotes for this value */
@@ -464,6 +474,7 @@ record_recv(PG_FUNCTION_ARGS)
 	int			i;
 	Datum	   *values;
 	bool	   *nulls;
+	Form_pg_attribute *attrs;
 
 	/*
 	 * Use the passed type unless it's RECORD; we can't support input of
@@ -507,6 +518,7 @@ record_recv(PG_FUNCTION_ARGS)
 		my_extra->ncolumns = ncolumns;
 	}
 
+	attrs = TupleDescGetSortedAttrs(tupdesc);
 	values = (Datum *) palloc(ncolumns * sizeof(Datum));
 	nulls = (bool *) palloc(ncolumns * sizeof(bool));
 
@@ -529,8 +541,10 @@ record_recv(PG_FUNCTION_ARGS)
 	/* Process each column */
 	for (i = 0; i < ncolumns; i++)
 	{
-		ColumnIOData *column_info = &my_extra->columns[i];
-		Oid			column_type = tupdesc->attrs[i]->atttypid;
+		Form_pg_attribute   attr = attrs[i];
+		int16       attnum = attr->attnum - 1;
+		ColumnIOData *column_info = &my_extra->columns[attnum];
+		Oid			column_type = attr->atttypid;
 		Oid			coltypoid;
 		int			itemlen;
 		StringInfoData item_buf;
@@ -538,10 +552,10 @@ record_recv(PG_FUNCTION_ARGS)
 		char		csave;
 
 		/* Ignore dropped columns in datatype, but fill with nulls */
-		if (tupdesc->attrs[i]->attisdropped)
+		if (attr->attisdropped)
 		{
-			values[i] = (Datum) 0;
-			nulls[i] = true;
+			values[attnum] = (Datum) 0;
+			nulls[attnum] = true;
 			continue;
 		}
 
@@ -564,7 +578,7 @@ record_recv(PG_FUNCTION_ARGS)
 		{
 			/* -1 length means NULL */
 			bufptr = NULL;
-			nulls[i] = true;
+			nulls[attnum] = true;
 			csave = 0;			/* keep compiler quiet */
 		}
 		else
@@ -586,7 +600,7 @@ record_recv(PG_FUNCTION_ARGS)
 			buf->data[buf->cursor] = '\0';
 
 			bufptr = &item_buf;
-			nulls[i] = false;
+			nulls[attnum] = false;
 		}
 
 		/* Now call the column's receiveproc */
@@ -600,10 +614,10 @@ record_recv(PG_FUNCTION_ARGS)
 			column_info->column_type = column_type;
 		}
 
-		values[i] = ReceiveFunctionCall(&column_info->proc,
-										bufptr,
-										column_info->typioparam,
-										tupdesc->attrs[i]->atttypmod);
+		values[attnum] = ReceiveFunctionCall(&column_info->proc,
+											 bufptr,
+											 column_info->typioparam,
+											 attr->atttypmod);
 
 		if (bufptr)
 		{
@@ -654,6 +668,7 @@ record_send(PG_FUNCTION_ARGS)
 	Datum	   *values;
 	bool	   *nulls;
 	StringInfoData buf;
+	Form_pg_attribute	*attrs;
 
 	/* Extract type info from the tuple itself */
 	tupType = HeapTupleHeaderGetTypeId(rec);
@@ -695,6 +710,8 @@ record_send(PG_FUNCTION_ARGS)
 		my_extra->ncolumns = ncolumns;
 	}
 
+	attrs = TupleDescGetSortedAttrs(tupdesc);
+
 	values = (Datum *) palloc(ncolumns * sizeof(Datum));
 	nulls = (bool *) palloc(ncolumns * sizeof(bool));
 
@@ -715,13 +732,15 @@ record_send(PG_FUNCTION_ARGS)
 
 	for (i = 0; i < ncolumns; i++)
 	{
-		ColumnIOData *column_info = &my_extra->columns[i];
-		Oid			column_type = tupdesc->attrs[i]->atttypid;
+		Form_pg_attribute attrib = attrs[i];
+		int16		attnum = attrib->attnum - 1;
+		ColumnIOData *column_info = &my_extra->columns[attnum];
+		Oid			column_type = tupdesc->attrs[attnum]->atttypid;
 		Datum		attr;
 		bytea	   *outputbytes;
 
 		/* Ignore dropped columns in datatype */
-		if (tupdesc->attrs[i]->attisdropped)
+		if (attrib->attisdropped)
 			continue;
 
 		pq_sendint(&buf, column_type, sizeof(Oid));
@@ -746,7 +765,7 @@ record_send(PG_FUNCTION_ARGS)
 			column_info->column_type = column_type;
 		}
 
-		attr = values[i];
+		attr = values[attnum];
 		outputbytes = SendFunctionCall(&column_info->proc, attr);
 		pq_sendint(&buf, VARSIZE(outputbytes) - VARHDRSZ, 4);
 		pq_sendbytes(&buf, VARDATA(outputbytes),
diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
index a69aae3..d05da9a 100644
--- a/src/backend/utils/sort/tuplestore.c
+++ b/src/backend/utils/sort/tuplestore.c
@@ -731,7 +731,7 @@ tuplestore_putvalues(Tuplestorestate *state, TupleDesc tdesc,
 	MinimalTuple tuple;
 	MemoryContext oldcxt = MemoryContextSwitchTo(state->context);
 
-	tuple = heap_form_minimal_tuple(tdesc, values, isnull);
+	tuple = heap_form_minimal_tuple(tdesc, values, isnull, 0);
 	USEMEM(state, GetMemoryChunkSpace(tuple));
 
 	tuplestore_puttuple_common(state, (void *) tuple);
diff --git a/src/include/access/htup_details.h b/src/include/access/htup_details.h
index 300c2a5..8a8f243 100644
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -720,11 +720,24 @@ extern Datum fastgetattr(HeapTuple tup, int attnum, TupleDesc tupleDesc,
 	)
 
 
+/*
+ * Option flags for several of the functions below.
+ */
+/* indicates that the various values arrays are in logical column order */
+#define HTOPT_LOGICAL_ORDER            (1 << 0)
+
+/* backwards-compatibility macros */
+#define heap_form_tuple(tupdesc, values, isnull) \
+		heap_form_tuple_extended((tupdesc), (values), (isnull), 0)
+#define heap_deform_tuple(tuple, tupdesc, values, isnull) \
+		heap_deform_tuple_extended((tuple), (tupdesc), (values), (isnull), 0)
+
 /* prototypes for functions in common/heaptuple.c */
 extern Size heap_compute_data_size(TupleDesc tupleDesc,
-					   Datum *values, bool *isnull);
+					   Datum *values, bool *isnull, bool logical_order);
 extern void heap_fill_tuple(TupleDesc tupleDesc,
 				Datum *values, bool *isnull,
+				bool logical_order,
 				char *data, Size data_size,
 				uint16 *infomask, bits8 *bit);
 extern bool heap_attisnull(HeapTuple tup, int attnum);
@@ -735,15 +748,16 @@ extern Datum heap_getsysattr(HeapTuple tup, int attnum, TupleDesc tupleDesc,
 extern HeapTuple heap_copytuple(HeapTuple tuple);
 extern void heap_copytuple_with_tuple(HeapTuple src, HeapTuple dest);
 extern Datum heap_copy_tuple_as_datum(HeapTuple tuple, TupleDesc tupleDesc);
-extern HeapTuple heap_form_tuple(TupleDesc tupleDescriptor,
-				Datum *values, bool *isnull);
+extern HeapTuple heap_form_tuple_extended(TupleDesc tupleDescriptor,
+						 Datum *values, bool *isnull, int flags);
 extern HeapTuple heap_modify_tuple(HeapTuple tuple,
 				  TupleDesc tupleDesc,
 				  Datum *replValues,
 				  bool *replIsnull,
 				  bool *doReplace);
-extern void heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc,
-				  Datum *values, bool *isnull);
+extern void heap_deform_tuple_extended(HeapTuple tuple, TupleDesc tupleDesc,
+						   Datum *values, bool *isnull, int flags);
+
 
 /* these three are deprecated versions of the three above: */
 extern HeapTuple heap_formtuple(TupleDesc tupleDescriptor,
@@ -757,7 +771,7 @@ extern void heap_deformtuple(HeapTuple tuple, TupleDesc tupleDesc,
 				 Datum *values, char *nulls);
 extern void heap_freetuple(HeapTuple htup);
 extern MinimalTuple heap_form_minimal_tuple(TupleDesc tupleDescriptor,
-						Datum *values, bool *isnull);
+						Datum *values, bool *isnull, int flags);
 extern void heap_free_minimal_tuple(MinimalTuple mtup);
 extern MinimalTuple heap_copy_minimal_tuple(MinimalTuple mtup);
 extern HeapTuple heap_tuple_from_minimal_tuple(MinimalTuple mtup);
diff --git a/src/include/access/tupdesc.h b/src/include/access/tupdesc.h
index 083f4bd..f02cad8 100644
--- a/src/include/access/tupdesc.h
+++ b/src/include/access/tupdesc.h
@@ -60,6 +60,11 @@ typedef struct tupleConstr
  * row type, or a value >= 0 to allow the rowtype to be looked up in the
  * typcache.c type cache.
  *
+ * We keep an array of attribute sorted by attlognum.  This helps *-expansion.
+ * The array is initially set to NULL, and is only populated on first access;
+ * those wanting to access it should always do it through
+ * TupleDescGetSortedAttrs.
+ *
  * Tuple descriptors that live in caches (relcache or typcache, at present)
  * are reference-counted: they can be deleted when their reference count goes
  * to zero.  Tuple descriptors created by the executor need no reference
@@ -73,6 +78,7 @@ typedef struct tupleDesc
 	int			natts;			/* number of attributes in the tuple */
 	Form_pg_attribute *attrs;
 	/* attrs[N] is a pointer to the description of Attribute Number N+1 */
+	Form_pg_attribute *logattrs;	/* array of attributes sorted by attlognum */
 	TupleConstr *constr;		/* constraints, or NULL if none */
 	Oid			tdtypeid;		/* composite type ID for tuple type */
 	int32		tdtypmod;		/* typmod for tuple type */
@@ -123,8 +129,14 @@ extern void TupleDescInitEntryCollation(TupleDesc desc,
 							AttrNumber attributeNumber,
 							Oid collationid);
 
+extern void TupleDescInitEntryLognum(TupleDesc desc,
+						 AttrNumber attributeNumber,
+						 int attlognum);
+
 extern TupleDesc BuildDescForRelation(List *schema);
 
 extern TupleDesc BuildDescFromLists(List *names, List *types, List *typmods, List *collations);
 
+extern Form_pg_attribute *TupleDescGetSortedAttrs(TupleDesc desc);
+
 #endif   /* TUPDESC_H */
diff --git a/src/include/catalog/pg_attribute.h b/src/include/catalog/pg_attribute.h
index 391d568..cd671a4 100644
--- a/src/include/catalog/pg_attribute.h
+++ b/src/include/catalog/pg_attribute.h
@@ -63,19 +63,26 @@ CATALOG(pg_attribute,1249) BKI_BOOTSTRAP BKI_WITHOUT_OIDS BKI_ROWTYPE_OID(75) BK
 	int16		attlen;
 
 	/*
-	 * attnum is the "attribute number" for the attribute:	A value that
-	 * uniquely identifies this attribute within its class. For user
-	 * attributes, Attribute numbers are greater than 0 and not greater than
-	 * the number of attributes in the class. I.e. if the Class pg_class says
-	 * that Class XYZ has 10 attributes, then the user attribute numbers in
-	 * Class pg_attribute must be 1-10.
-	 *
+	 * attnum uniquely identifies the column within its class, throughout its
+	 * lifetime.  For user attributes, Attribute numbers are greater than 0 and
+	 * less than or equal to the number of attributes in the class. For
+	 * instance, if the Class pg_class says that Class XYZ has 10 attributes,
+	 * then the user attribute numbers in Class pg_attribute must be 1-10.
 	 * System attributes have attribute numbers less than 0 that are unique
 	 * within the class, but not constrained to any particular range.
 	 *
-	 * Note that (attnum - 1) is often used as the index to an array.
+	 * attphysnum (physical position) specifies the position in which the
+	 * column is stored in physical tuples.  This might differ from attnum if
+	 * there are useful optimizations in storage space, for example alignment
+	 * considerations.
+	 *
+	 * attlognum (logical position) specifies the position in which the column
+	 * is expanded in "SELECT * FROM rel", INSERT queries that don't specify a
+	 * explicite column list, and the like.
 	 */
 	int16		attnum;
+	int16		attphysnum;
+	int16		attlognum;
 
 	/*
 	 * attndims is the declared number of dimensions, if an array type,
@@ -188,28 +195,31 @@ typedef FormData_pg_attribute *Form_pg_attribute;
  * ----------------
  */
 
-#define Natts_pg_attribute				21
+#define Natts_pg_attribute				23
 #define Anum_pg_attribute_attrelid		1
 #define Anum_pg_attribute_attname		2
 #define Anum_pg_attribute_atttypid		3
 #define Anum_pg_attribute_attstattarget 4
 #define Anum_pg_attribute_attlen		5
 #define Anum_pg_attribute_attnum		6
-#define Anum_pg_attribute_attndims		7
-#define Anum_pg_attribute_attcacheoff	8
-#define Anum_pg_attribute_atttypmod		9
-#define Anum_pg_attribute_attbyval		10
-#define Anum_pg_attribute_attstorage	11
-#define Anum_pg_attribute_attalign		12
-#define Anum_pg_attribute_attnotnull	13
-#define Anum_pg_attribute_atthasdef		14
-#define Anum_pg_attribute_attisdropped	15
-#define Anum_pg_attribute_attislocal	16
-#define Anum_pg_attribute_attinhcount	17
-#define Anum_pg_attribute_attcollation	18
-#define Anum_pg_attribute_attacl		19
-#define Anum_pg_attribute_attoptions	20
-#define Anum_pg_attribute_attfdwoptions 21
+#define Anum_pg_attribute_attphysnum	7
+#define Anum_pg_attribute_attlognum		8
+#define Anum_pg_attribute_attndims		9
+#define Anum_pg_attribute_attcacheoff	10
+#define Anum_pg_attribute_atttypmod		11
+#define Anum_pg_attribute_attbyval		12
+#define Anum_pg_attribute_attstorage	13
+#define Anum_pg_attribute_attalign		14
+#define Anum_pg_attribute_attnotnull	15
+#define Anum_pg_attribute_atthasdef		16
+#define Anum_pg_attribute_attisdropped	17
+#define Anum_pg_attribute_attislocal	18
+#define Anum_pg_attribute_attinhcount	19
+#define Anum_pg_attribute_attcollation	20
+#define Anum_pg_attribute_attacl		21
+#define Anum_pg_attribute_attoptions	22
+#define Anum_pg_attribute_attfdwoptions	23
+
 
 
 /* ----------------
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 1054cd0..6eff578 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -142,7 +142,7 @@ typedef FormData_pg_class *Form_pg_class;
  */
 DATA(insert OID = 1247 (  pg_type		PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f f t n 3 1 _null_ _null_ ));
 DESCR("");
-DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f f t n 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 23 0 f f f f f f t n 3 1 _null_ _null_ ));
 DESCR("");
 DATA(insert OID = 1255 (  pg_proc		PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f f t n 3 1 _null_ _null_ ));
 DESCR("");
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 5eaa435..8319d45 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -747,10 +747,11 @@ typedef struct RangeTblEntry
 	 */
 
 	/*
-	 * Fields valid for a plain relation RTE (else zero):
+	 * Fields valid for a plain relation RTE (else zero/NIL):
 	 */
 	Oid			relid;			/* OID of the relation */
 	char		relkind;		/* relation kind (see pg_class.relkind) */
+	List	   *lognums;		/* int list of logical column numbers */
 
 	/*
 	 * Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 6d9f3d9..7227df5 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -146,8 +146,8 @@ typedef struct Var
 	Expr		xpr;
 	Index		varno;			/* index of this var's relation in the range
 								 * table, or INNER_VAR/OUTER_VAR/INDEX_VAR */
-	AttrNumber	varattno;		/* attribute number of this var, or zero for
-								 * all */
+	AttrNumber	varattno;		/* identity attribute number (attnum) of this
+								 * var, or zero for all */
 	Oid			vartype;		/* pg_type OID for the type of this var */
 	int32		vartypmod;		/* pg_attribute typmod value */
 	Oid			varcollid;		/* OID of collation, or InvalidOid if none */
diff --git a/src/include/parser/parse_relation.h b/src/include/parser/parse_relation.h
index d8b9493..b30d779 100644
--- a/src/include/parser/parse_relation.h
+++ b/src/include/parser/parse_relation.h
@@ -89,10 +89,10 @@ extern void errorMissingRTE(ParseState *pstate, RangeVar *relation) __attribute_
 extern void errorMissingColumn(ParseState *pstate,
 	   char *relname, char *colname, int location) __attribute__((noreturn));
 extern void expandRTE(RangeTblEntry *rte, int rtindex, int sublevels_up,
-		  int location, bool include_dropped,
+		  int location, bool include_dropped, bool logical_sort,
 		  List **colnames, List **colvars);
 extern List *expandRelAttrs(ParseState *pstate, RangeTblEntry *rte,
-			   int rtindex, int sublevels_up, int location);
+			   int rtindex, int sublevels_up, bool logical_sort, int location);
 extern int	attnameAttNum(Relation rd, const char *attname, bool sysColOK);
 extern Name attnumAttName(Relation rd, int attid);
 extern Oid	attnumTypeId(Relation rd, int attid);
diff --git a/src/test/regress/expected/col_order.out b/src/test/regress/expected/col_order.out
new file mode 100644
index 0000000..45d6918
--- /dev/null
+++ b/src/test/regress/expected/col_order.out
@@ -0,0 +1,286 @@
+drop table if exists foo, bar, baz cascade;
+NOTICE:  table "foo" does not exist, skipping
+NOTICE:  table "bar" does not exist, skipping
+NOTICE:  table "baz" does not exist, skipping
+create table foo (
+	a int default 42,
+	b timestamp default '1975-02-15 12:00',
+	c text);
+insert into foo values (142857, '1888-04-29', 'hello world');
+begin;
+update pg_attribute set attlognum = 1 where attname = 'c' and attrelid = 'foo'::regclass;
+update pg_attribute set attlognum = 2 where attname = 'a' and attrelid = 'foo'::regclass;
+update pg_attribute set attlognum = 3 where attname = 'b' and attrelid = 'foo'::regclass;
+commit;
+insert into foo values ('column c', 123, '2010-03-03 10:10:10');
+insert into foo (c, a, b) values ('c again', 456, '2010-03-03 11:12:13');
+insert into foo values ('and c', 789);	-- defaults column b
+insert into foo (c, b) values ('the c', '1975-01-10 08:00');	-- defaults column a
+select * from foo;
+      c      |   a    |            b             
+-------------+--------+--------------------------
+ hello world | 142857 | Sun Apr 29 00:00:00 1888
+ column c    |    123 | Wed Mar 03 10:10:10 2010
+ c again     |    456 | Wed Mar 03 11:12:13 2010
+ and c       |    789 | Sat Feb 15 12:00:00 1975
+ the c       |     42 | Fri Jan 10 08:00:00 1975
+(5 rows)
+
+select foo from foo;
+                        foo                        
+---------------------------------------------------
+ ("hello world",142857,"Sun Apr 29 00:00:00 1888")
+ ("column c",123,"Wed Mar 03 10:10:10 2010")
+ ("c again",456,"Wed Mar 03 11:12:13 2010")
+ ("and c",789,"Sat Feb 15 12:00:00 1975")
+ ("the c",42,"Fri Jan 10 08:00:00 1975")
+(5 rows)
+
+select foo.* from foo;
+      c      |   a    |            b             
+-------------+--------+--------------------------
+ hello world | 142857 | Sun Apr 29 00:00:00 1888
+ column c    |    123 | Wed Mar 03 10:10:10 2010
+ c again     |    456 | Wed Mar 03 11:12:13 2010
+ and c       |    789 | Sat Feb 15 12:00:00 1975
+ the c       |     42 | Fri Jan 10 08:00:00 1975
+(5 rows)
+
+select a,c,b from foo;
+   a    |      c      |            b             
+--------+-------------+--------------------------
+ 142857 | hello world | Sun Apr 29 00:00:00 1888
+    123 | column c    | Wed Mar 03 10:10:10 2010
+    456 | c again     | Wed Mar 03 11:12:13 2010
+    789 | and c       | Sat Feb 15 12:00:00 1975
+     42 | the c       | Fri Jan 10 08:00:00 1975
+(5 rows)
+
+select c,b,a from foo;
+      c      |            b             |   a    
+-------------+--------------------------+--------
+ hello world | Sun Apr 29 00:00:00 1888 | 142857
+ column c    | Wed Mar 03 10:10:10 2010 |    123
+ c again     | Wed Mar 03 11:12:13 2010 |    456
+ and c       | Sat Feb 15 12:00:00 1975 |    789
+ the c       | Fri Jan 10 08:00:00 1975 |     42
+(5 rows)
+
+select a from foo;
+   a    
+--------
+ 142857
+    123
+    456
+    789
+     42
+(5 rows)
+
+select b from foo;
+            b             
+--------------------------
+ Sun Apr 29 00:00:00 1888
+ Wed Mar 03 10:10:10 2010
+ Wed Mar 03 11:12:13 2010
+ Sat Feb 15 12:00:00 1975
+ Fri Jan 10 08:00:00 1975
+(5 rows)
+
+select c from foo;
+      c      
+-------------
+ hello world
+ column c
+ c again
+ and c
+ the c
+(5 rows)
+
+select (foo).* from foo;
+      c      |   a    |            b             
+-------------+--------+--------------------------
+ hello world | 142857 | Sun Apr 29 00:00:00 1888
+ column c    |    123 | Wed Mar 03 10:10:10 2010
+ c again     |    456 | Wed Mar 03 11:12:13 2010
+ and c       |    789 | Sat Feb 15 12:00:00 1975
+ the c       |     42 | Fri Jan 10 08:00:00 1975
+(5 rows)
+
+select ROW((foo).*) from foo;
+                        row                        
+---------------------------------------------------
+ ("hello world",142857,"Sun Apr 29 00:00:00 1888")
+ ("column c",123,"Wed Mar 03 10:10:10 2010")
+ ("c again",456,"Wed Mar 03 11:12:13 2010")
+ ("and c",789,"Sat Feb 15 12:00:00 1975")
+ ("the c",42,"Fri Jan 10 08:00:00 1975")
+(5 rows)
+
+select ROW((foo).*)::foo from foo;
+                        row                        
+---------------------------------------------------
+ ("hello world",142857,"Sun Apr 29 00:00:00 1888")
+ ("column c",123,"Wed Mar 03 10:10:10 2010")
+ ("c again",456,"Wed Mar 03 11:12:13 2010")
+ ("and c",789,"Sat Feb 15 12:00:00 1975")
+ ("the c",42,"Fri Jan 10 08:00:00 1975")
+(5 rows)
+
+select (ROW((foo).*)::foo).* from foo;
+      c      |   a    |            b             
+-------------+--------+--------------------------
+ hello world | 142857 | Sun Apr 29 00:00:00 1888
+ column c    |    123 | Wed Mar 03 10:10:10 2010
+ c again     |    456 | Wed Mar 03 11:12:13 2010
+ and c       |    789 | Sat Feb 15 12:00:00 1975
+ the c       |     42 | Fri Jan 10 08:00:00 1975
+(5 rows)
+
+create function f() returns setof foo language sql as $$
+select * from foo;
+$$;
+select * from f();
+      c      |   a    |            b             
+-------------+--------+--------------------------
+ hello world | 142857 | Sun Apr 29 00:00:00 1888
+ column c    |    123 | Wed Mar 03 10:10:10 2010
+ c again     |    456 | Wed Mar 03 11:12:13 2010
+ and c       |    789 | Sat Feb 15 12:00:00 1975
+ the c       |     42 | Fri Jan 10 08:00:00 1975
+(5 rows)
+
+insert into foo
+	select (row('ah', 1126, '2012-10-15')::foo).*
+	returning *;
+ c  |  a   |            b             
+----+------+--------------------------
+ ah | 1126 | Mon Oct 15 00:00:00 2012
+(1 row)
+
+insert into foo
+	select (row('eh', 1125, '2012-10-16')::foo).*
+	returning foo.*;
+ c  |  a   |            b             
+----+------+--------------------------
+ eh | 1125 | Tue Oct 16 00:00:00 2012
+(1 row)
+
+insert into foo values
+	('values one', 1, '2008-10-20'),
+	('values two', 2, '2004-08-15');
+copy foo from stdin;
+select * from foo order by 2;
+      c      |   a    |            b             
+-------------+--------+--------------------------
+ values one  |      1 | Mon Oct 20 00:00:00 2008
+ values two  |      2 | Sun Aug 15 00:00:00 2004
+ the c       |     42 | Fri Jan 10 08:00:00 1975
+ column c    |    123 | Wed Mar 03 10:10:10 2010
+ c again     |    456 | Wed Mar 03 11:12:13 2010
+ and c       |    789 | Sat Feb 15 12:00:00 1975
+ copy one    |   1001 | Thu Dec 10 23:54:00 1998
+ copy two    |   1002 | Thu Aug 01 09:22:00 1996
+ eh          |   1125 | Tue Oct 16 00:00:00 2012
+ ah          |   1126 | Mon Oct 15 00:00:00 2012
+ hello world | 142857 | Sun Apr 29 00:00:00 1888
+(11 rows)
+
+-- Test some joins
+create table bar (x text, y int default 142857, z timestamp );
+insert into bar values ('oh no', default, '1937-04-28');
+insert into bar values ('oh yes', 42, '1492-12-31');
+begin;
+update pg_attribute set attlognum = 3 where attname = 'x' and attrelid = 'bar'::regclass;
+update pg_attribute set attlognum = 1 where attname = 'z' and attrelid = 'bar'::regclass;
+commit;
+select foo.* from bar, foo where bar.y = foo.a;
+      c      |   a    |            b             
+-------------+--------+--------------------------
+ the c       |     42 | Fri Jan 10 08:00:00 1975
+ hello world | 142857 | Sun Apr 29 00:00:00 1888
+(2 rows)
+
+select bar.* from bar, foo where bar.y = foo.a;
+            z             |   y    |   x    
+--------------------------+--------+--------
+ Sat Dec 31 00:00:00 1492 |     42 | oh yes
+ Wed Apr 28 00:00:00 1937 | 142857 | oh no
+(2 rows)
+
+select * from bar, foo where bar.y = foo.a;
+            z             |   y    |   x    |      c      |   a    |            b             
+--------------------------+--------+--------+-------------+--------+--------------------------
+ Sat Dec 31 00:00:00 1492 |     42 | oh yes | the c       |     42 | Fri Jan 10 08:00:00 1975
+ Wed Apr 28 00:00:00 1937 | 142857 | oh no  | hello world | 142857 | Sun Apr 29 00:00:00 1888
+(2 rows)
+
+select * from foo join bar on (foo.a = bar.y);
+      c      |   a    |            b             |            z             |   y    |   x    
+-------------+--------+--------------------------+--------------------------+--------+--------
+ the c       |     42 | Fri Jan 10 08:00:00 1975 | Sat Dec 31 00:00:00 1492 |     42 | oh yes
+ hello world | 142857 | Sun Apr 29 00:00:00 1888 | Wed Apr 28 00:00:00 1937 | 142857 | oh no
+(2 rows)
+
+alter table bar rename y to a;
+select * from foo natural join bar;
+   a    |      c      |            b             |            z             |   x    
+--------+-------------+--------------------------+--------------------------+--------
+     42 | the c       | Fri Jan 10 08:00:00 1975 | Sat Dec 31 00:00:00 1492 | oh yes
+ 142857 | hello world | Sun Apr 29 00:00:00 1888 | Wed Apr 28 00:00:00 1937 | oh no
+(2 rows)
+
+select * from foo join bar using (a);
+   a    |      c      |            b             |            z             |   x    
+--------+-------------+--------------------------+--------------------------+--------
+     42 | the c       | Fri Jan 10 08:00:00 1975 | Sat Dec 31 00:00:00 1492 | oh yes
+ 142857 | hello world | Sun Apr 29 00:00:00 1888 | Wed Apr 28 00:00:00 1937 | oh no
+(2 rows)
+
+create table baz (e point) inherits (foo, bar); -- fail to merge defaults
+NOTICE:  merging multiple inherited definitions of column "a"
+ERROR:  column "a" inherits conflicting default values
+HINT:  To resolve the conflict, specify a default explicitly.
+create table baz (e point, a int default 23) inherits (foo, bar);
+NOTICE:  merging multiple inherited definitions of column "a"
+NOTICE:  merging column "a" with inherited definition
+insert into baz (e) values ('(1,1)');
+select * from foo;
+      c      |   a    |            b             
+-------------+--------+--------------------------
+ hello world | 142857 | Sun Apr 29 00:00:00 1888
+ column c    |    123 | Wed Mar 03 10:10:10 2010
+ c again     |    456 | Wed Mar 03 11:12:13 2010
+ and c       |    789 | Sat Feb 15 12:00:00 1975
+ the c       |     42 | Fri Jan 10 08:00:00 1975
+ ah          |   1126 | Mon Oct 15 00:00:00 2012
+ eh          |   1125 | Tue Oct 16 00:00:00 2012
+ values one  |      1 | Mon Oct 20 00:00:00 2008
+ values two  |      2 | Sun Aug 15 00:00:00 2004
+ copy one    |   1001 | Thu Dec 10 23:54:00 1998
+ copy two    |   1002 | Thu Aug 01 09:22:00 1996
+             |     23 | Sat Feb 15 12:00:00 1975
+(12 rows)
+
+select * from bar;
+            z             |   a    |   x    
+--------------------------+--------+--------
+ Wed Apr 28 00:00:00 1937 | 142857 | oh no
+ Sat Dec 31 00:00:00 1492 |     42 | oh yes
+                          |     23 | 
+(3 rows)
+
+select * from baz;
+ c | a  |            b             | z | x |   e   
+---+----+--------------------------+---+---+-------
+   | 23 | Sat Feb 15 12:00:00 1975 |   |   | (1,1)
+(1 row)
+
+create table quux (a int, b int[], c int);
+begin;
+update pg_attribute set attlognum = 1 where attnum = 2 and attrelid = 'quux'::regclass;
+update pg_attribute set attlognum = 2 where attnum = 1 and attrelid = 'quux'::regclass;
+commit;
+select * from quux where (a,c) in ( select a,c from quux );
+ERROR:  failed to find unique expression in subplan tlist
+drop table foo, bar, baz, quux cascade;
+NOTICE:  drop cascades to function f()
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 62cc198..d2c7e52 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -30,7 +30,7 @@ test: point lseg line box path polygon circle date time timetz timestamp timesta
 # geometry depends on point, lseg, box, path, polygon and circle
 # horology depends on interval, timetz, timestamp, timestamptz, reltime and abstime
 # ----------
-test: geometry horology regex oidjoins type_sanity opr_sanity
+test: geometry horology regex oidjoins type_sanity opr_sanity col_order
 
 # ----------
 # These four each depend on the previous one
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 07fc827..d8a045a 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -49,6 +49,7 @@ test: regex
 test: oidjoins
 test: type_sanity
 test: opr_sanity
+test: col_order
 test: insert
 test: create_function_1
 test: create_type
diff --git a/src/test/regress/sql/col_order.sql b/src/test/regress/sql/col_order.sql
new file mode 100644
index 0000000..556c30e
--- /dev/null
+++ b/src/test/regress/sql/col_order.sql
@@ -0,0 +1,85 @@
+drop table if exists foo, bar, baz cascade;
+create table foo (
+	a int default 42,
+	b timestamp default '1975-02-15 12:00',
+	c text);
+insert into foo values (142857, '1888-04-29', 'hello world');
+
+begin;
+update pg_attribute set attlognum = 1 where attname = 'c' and attrelid = 'foo'::regclass;
+update pg_attribute set attlognum = 2 where attname = 'a' and attrelid = 'foo'::regclass;
+update pg_attribute set attlognum = 3 where attname = 'b' and attrelid = 'foo'::regclass;
+commit;
+
+insert into foo values ('column c', 123, '2010-03-03 10:10:10');
+insert into foo (c, a, b) values ('c again', 456, '2010-03-03 11:12:13');
+insert into foo values ('and c', 789);	-- defaults column b
+insert into foo (c, b) values ('the c', '1975-01-10 08:00');	-- defaults column a
+
+select * from foo;
+select foo from foo;
+select foo.* from foo;
+select a,c,b from foo;
+select c,b,a from foo;
+select a from foo;
+select b from foo;
+select c from foo;
+select (foo).* from foo;
+select ROW((foo).*) from foo;
+select ROW((foo).*)::foo from foo;
+select (ROW((foo).*)::foo).* from foo;
+
+create function f() returns setof foo language sql as $$
+select * from foo;
+$$;
+select * from f();
+
+insert into foo
+	select (row('ah', 1126, '2012-10-15')::foo).*
+	returning *;
+insert into foo
+	select (row('eh', 1125, '2012-10-16')::foo).*
+	returning foo.*;
+
+insert into foo values
+	('values one', 1, '2008-10-20'),
+	('values two', 2, '2004-08-15');
+
+copy foo from stdin;
+copy one	1001	1998-12-10 23:54
+copy two	1002	1996-08-01 09:22
+\.
+select * from foo order by 2;
+
+-- Test some joins
+create table bar (x text, y int default 142857, z timestamp );
+insert into bar values ('oh no', default, '1937-04-28');
+insert into bar values ('oh yes', 42, '1492-12-31');
+begin;
+update pg_attribute set attlognum = 3 where attname = 'x' and attrelid = 'bar'::regclass;
+update pg_attribute set attlognum = 1 where attname = 'z' and attrelid = 'bar'::regclass;
+commit;
+select foo.* from bar, foo where bar.y = foo.a;
+select bar.* from bar, foo where bar.y = foo.a;
+select * from bar, foo where bar.y = foo.a;
+select * from foo join bar on (foo.a = bar.y);
+alter table bar rename y to a;
+select * from foo natural join bar;
+select * from foo join bar using (a);
+
+create table baz (e point) inherits (foo, bar); -- fail to merge defaults
+create table baz (e point, a int default 23) inherits (foo, bar);
+insert into baz (e) values ('(1,1)');
+select * from foo;
+select * from bar;
+select * from baz;
+
+create table quux (a int, b int[], c int);
+begin;
+update pg_attribute set attlognum = 1 where attnum = 2 and attrelid = 'quux'::regclass;
+update pg_attribute set attlognum = 2 where attnum = 1 and attrelid = 'quux'::regclass;
+commit;
+select * from quux where (a,c) in ( select a,c from quux );
+
+
+drop table foo, bar, baz, quux cascade;

Jim Nasby

Jim.Nasby@BlueTreble.com

about 11 years ago

In reply to: Alvaro Herrera (#1)

Re: logical column ordering

On 12/9/14, 11:41 AM, Alvaro Herrera wrote:

I'm going to see about it while I get feedback on the rest of this patch; in
particular, extra test cases that fail to work when columns have been
moved around are welcome, so that I can add them to the regress test.
What I have now is the basics I'm building as I go along. The
regression tests show examples of some logical column renumbering (which
can be done after the table already contains some data) but none of
physical column renumbering (which can only be done when the table is
completely empty.) My hunch is that the sample foo, bar, baz, quux
tables should present plenty of opportunities to display brokenness in
the planner and executor.

The ideal case would be to do something like randomizing logical and physical ordering as tables are created throughout the entire test suite (presumably as an option). That should work for physical ordering, but presumably it would pointlessly blow up on logical ordering because the expected output is hard-coded.

Perhaps instead of randomizing logical ordering we could force that to be the same ordering in which fields were supplied and actually randomize attnum?

In particular, I'm thinking that in DefineRelation we can randomize stmt->tableElts before merging in inheritance attributes.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Josh Berkus

josh@agliodbs.com

about 11 years ago

In reply to: Alvaro Herrera (#1)

Re: logical column ordering

On 12/09/2014 09:41 AM, Alvaro Herrera wrote:

The first thing where this matters is tuple descriptor expansion in
parse analysis; at this stage, things such as "*" (in "select *") are
turned into a target list, which must be sorted according to attlognum.
To achieve this I added a new routine to tupledescs,

The two other major cases are:

INSERT INTO table SELECT|VALUES ...

COPY table FROM|TO ...

... although copy should just be a subclass of SELECT.

Question on COPY, though: there's reasons why people would want COPY to
dump in either physical or logical order. If you're doing COPY to
create CSV files for output, then you want the columns in logical order.
If you're doing COPY for pg_dump, then you want them in physical order
for faster dump/reload. So we're almost certainly going to need to have
an option for COPY.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WMbabec396840f3be2f40e87e9027a4e1754b12e210fc191725f6e523c73786e77a40d4dafbdb2c095d7d0b52675464e8f@asav-1.01.com

Andrew Dunstan

andrew@dunslane.net

about 11 years ago

In reply to: Josh Berkus (#3)

Re: logical column ordering

On 12/09/2014 06:19 PM, Josh Berkus wrote:

On 12/09/2014 09:41 AM, Alvaro Herrera wrote:

The first thing where this matters is tuple descriptor expansion in
parse analysis; at this stage, things such as "*" (in "select *") are
turned into a target list, which must be sorted according to attlognum.
To achieve this I added a new routine to tupledescs,

The two other major cases are:

INSERT INTO table SELECT|VALUES ...

COPY table FROM|TO ...

... although copy should just be a subclass of SELECT.

Question on COPY, though: there's reasons why people would want COPY to
dump in either physical or logical order. If you're doing COPY to
create CSV files for output, then you want the columns in logical order.
If you're doing COPY for pg_dump, then you want them in physical order
for faster dump/reload. So we're almost certainly going to need to have
an option for COPY.

I seriously doubt it, although I could be wrong. Unless someone can show
a significant performance gain from using physical order, which would be
a bit of a surprise to me, I would just stick with logical ordering as
the default.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alvaro Herrera

alvherre@2ndquadrant.com

about 11 years ago

In reply to: Andrew Dunstan (#4)

Re: logical column ordering

Andrew Dunstan wrote:

On 12/09/2014 06:19 PM, Josh Berkus wrote:

On 12/09/2014 09:41 AM, Alvaro Herrera wrote:

The first thing where this matters is tuple descriptor expansion in
parse analysis; at this stage, things such as "*" (in "select *") are
turned into a target list, which must be sorted according to attlognum.
To achieve this I added a new routine to tupledescs,

The two other major cases are:

INSERT INTO table SELECT|VALUES ...

COPY table FROM|TO ...

Yes, both are covered.

... although copy should just be a subclass of SELECT.

It is not. There's one part of COPY that goes through SELECT
processing, but only when the "table" being copied is a subselect.
Normal COPY does not use the same code path.

Question on COPY, though: there's reasons why people would want COPY to
dump in either physical or logical order. If you're doing COPY to
create CSV files for output, then you want the columns in logical order.
If you're doing COPY for pg_dump, then you want them in physical order
for faster dump/reload. So we're almost certainly going to need to have
an option for COPY.

I seriously doubt it, although I could be wrong. Unless someone can show a
significant performance gain from using physical order, which would be a bit
of a surprise to me, I would just stick with logical ordering as the
default.

Well, we have an optimization that avoids a projection step IIRC by
using the "physical tlist" instead of having to build a tailored one. I
guess the reason that's there is because somebody did measure an
improvement. Maybe it *is* worth having as an option for pg_dump ...

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Josh Berkus (#3)

Re: logical column ordering

Josh Berkus <josh@agliodbs.com> writes:

Question on COPY, though: there's reasons why people would want COPY to
dump in either physical or logical order. If you're doing COPY to
create CSV files for output, then you want the columns in logical order.
If you're doing COPY for pg_dump, then you want them in physical order
for faster dump/reload. So we're almost certainly going to need to have
an option for COPY.

This is complete nonsense, Josh, or at least it is until you can provide
some solid evidence to believe that column ordering would make any
noticeable performance difference in COPY. I know of no reason to believe
that the existing user-defined-column-ordering option makes any difference.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Alvaro Herrera (#5)

Re: logical column ordering

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Andrew Dunstan wrote:

I seriously doubt it, although I could be wrong. Unless someone can show a
significant performance gain from using physical order, which would be a bit
of a surprise to me, I would just stick with logical ordering as the
default.

Well, we have an optimization that avoids a projection step IIRC by
using the "physical tlist" instead of having to build a tailored one. I
guess the reason that's there is because somebody did measure an
improvement. Maybe it *is* worth having as an option for pg_dump ...

The physical tlist thing is there because it's demonstrable that
ExecProject() takes nonzero time. COPY does not go through ExecProject
though. What's more, it already has code to deal with a user-specified
column order, and nobody's ever claimed that that code imposes a
measurable performance overhead.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

about 11 years ago

In reply to: Tom Lane (#7)

Re: logical column ordering

On Wed, Dec 10, 2014 at 12:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Andrew Dunstan wrote:

I seriously doubt it, although I could be wrong. Unless someone can show a
significant performance gain from using physical order, which would be a bit
of a surprise to me, I would just stick with logical ordering as the
default.

Well, we have an optimization that avoids a projection step IIRC by
using the "physical tlist" instead of having to build a tailored one. I
guess the reason that's there is because somebody did measure an
improvement. Maybe it *is* worth having as an option for pg_dump ...

The physical tlist thing is there because it's demonstrable that
ExecProject() takes nonzero time. COPY does not go through ExecProject
though. What's more, it already has code to deal with a user-specified
column order, and nobody's ever claimed that that code imposes a
measurable performance overhead.

Also, if we're adding options to use the physical rather than the
logical column ordering in too many places, that's probably a sign
that we need to rethink this whole concept. The concept of a logical
column ordering doesn't have much meaning if you're constantly forced
to fall back to some other column ordering whenever you want good
performance.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alvaro Herrera

alvherre@2ndquadrant.com

about 11 years ago

In reply to: Robert Haas (#8)

Re: logical column ordering

Robert Haas wrote:

On Wed, Dec 10, 2014 at 12:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Andrew Dunstan wrote:

I seriously doubt it, although I could be wrong. Unless someone can show a
significant performance gain from using physical order, which would be a bit
of a surprise to me, I would just stick with logical ordering as the
default.

Well, we have an optimization that avoids a projection step IIRC by
using the "physical tlist" instead of having to build a tailored one. I
guess the reason that's there is because somebody did measure an
improvement. Maybe it *is* worth having as an option for pg_dump ...

The physical tlist thing is there because it's demonstrable that
ExecProject() takes nonzero time. COPY does not go through ExecProject
though. What's more, it already has code to deal with a user-specified
column order, and nobody's ever claimed that that code imposes a
measurable performance overhead.

Also, if we're adding options to use the physical rather than the
logical column ordering in too many places, that's probably a sign
that we need to rethink this whole concept. The concept of a logical
column ordering doesn't have much meaning if you're constantly forced
to fall back to some other column ordering whenever you want good
performance.

FWIW I have no intention to add options for physical/logical ordering
anywhere. All users will see is that tables will follow the same
(logical) order everywhere.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Robert Haas

robertmhaas@gmail.com

about 11 years ago

In reply to: Alvaro Herrera (#9)

Re: logical column ordering

On Wed, Dec 10, 2014 at 9:25 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

FWIW I have no intention to add options for physical/logical ordering
anywhere. All users will see is that tables will follow the same
(logical) order everywhere.

Just to be clear, I wasn't in any way attending to say that the patch
had a problem in this area. I was just expressing concern about the
apparent rush to judgement on whether converting between physical and
logical column ordering would be expensive. I certainly think that's
something that we should test - for example, we might want to consider
whether there are cases where you could maybe convince the executor to
spend a lot of time pointlessly reorganizing tuples in ways that
wouldn't happen today. But I have no particular reason to think that
any issues we hit there issues won't be solvable.

To the extent that I have any concern about the patch at this point,
it's around stability. I would awfully rather see something like this
get committed at the beginning of a development cycle than the end.
It's quite possible that I'm being more nervous than is justified, but
given that we're *still* fixing bugs related to dropped-column
handling (cf. 9b35ddce93a2ef336498baa15581b9d10f01db9c from July of
this year) which was added in July 2002, maybe not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Stephen Frost

sfrost@snowman.net

about 11 years ago

In reply to: Robert Haas (#10)

Re: logical column ordering

Robert, all,

* Robert Haas (robertmhaas@gmail.com) wrote:

To the extent that I have any concern about the patch at this point,
it's around stability. I would awfully rather see something like this
get committed at the beginning of a development cycle than the end.

I tend to agree with this; we have a pretty bad habit of bouncing
around big patches until the end and then committing them. That's not
as bad when the patch has been getting reviews and feedback over a few
months (or years) but this particular patch isn't even code-complete at
this point, aiui.

It's quite possible that I'm being more nervous than is justified, but
given that we're *still* fixing bugs related to dropped-column
handling (cf. 9b35ddce93a2ef336498baa15581b9d10f01db9c from July of
this year) which was added in July 2002, maybe not.

I'm not quite sure that I see how that's relevant. Bugs will happen,
unfortunately, no matter how much review is done of a given patch. That
isn't to say that we shouldn't do any review, but it's a trade-off.
This change, at least, strikes me as less likely to have subtle bugs
in it as compared to the dropped column case.

Thanks,

Stephen

#12

Robert Haas

robertmhaas@gmail.com

about 11 years ago

In reply to: Stephen Frost (#11)

Re: logical column ordering

On Wed, Dec 10, 2014 at 11:22 AM, Stephen Frost <sfrost@snowman.net> wrote:

I'm not quite sure that I see how that's relevant. Bugs will happen,
unfortunately, no matter how much review is done of a given patch. That
isn't to say that we shouldn't do any review, but it's a trade-off.
This change, at least, strikes me as less likely to have subtle bugs
in it as compared to the dropped column case.

Yeah, that's possible. They seem similar to me because they both
introduce new ways for the tuple as it is stored on disk to be
different from what must be shown to the user. But I don't really
know how well that translates to what needs to be changed on a code
level. If we're basically going back over all the same places that
needed special handling for attisdropped, then the likelihood of bugs
may be quite a bit lower than it was for that patch, because now we
know (mostly!) which places require attisdropped handling and we can
audit them all to make sure they handle this, too. But if it's a
completely different set of places that need to be touched, then I
think there's a lively possibility for bugs of omission.

Ultimately, I think this is mostly going to be a question of what
Alvaro feels comfortable with; he's presumably going to have a better
sense of where and to what extent there might be bugs lurking than any
of the rest of us. But the point is worth considering, because I
think we would probably all agree that having a release that is stable
and usable right out of the gate is more important than having any
single feature get into that release.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Andres Freund

andres@2ndquadrant.com

about 11 years ago

In reply to: Robert Haas (#12)

Re: logical column ordering

On 2014-12-10 12:06:11 -0500, Robert Haas wrote:

Ultimately, I think this is mostly going to be a question of what
Alvaro feels comfortable with; he's presumably going to have a better
sense of where and to what extent there might be bugs lurking than any
of the rest of us. But the point is worth considering, because I
think we would probably all agree that having a release that is stable
and usable right out of the gate is more important than having any
single feature get into that release.

Sure, 9.4 needs to be out of the gate. I don't think anybody doubts
that.

But the scheduling of commits with regard to the 9.5 schedule actually
opens a relevant question: When are we planning to release 9.5? Because
If we try ~ one year from now it's a whole different ballgame than if we
try to go back to september. And I think there's pretty good arguments
for both.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Stephen Frost

sfrost@snowman.net

about 11 years ago

In reply to: Andres Freund (#13)

Re: logical column ordering

* Andres Freund (andres@2ndquadrant.com) wrote:

But the scheduling of commits with regard to the 9.5 schedule actually
opens a relevant question: When are we planning to release 9.5? Because
If we try ~ one year from now it's a whole different ballgame than if we
try to go back to september. And I think there's pretty good arguments
for both.

This should really be on its own thread for discussion... I'm leaning,
at the moment at least, towards the September release schedule. I agree
that having a later release would allow us to get more into it, but
there's a lot to be said for the consistency we've kept up over the past
few years with a September (our last non-September release was 8.4).

I'd certainly vote against planning for a mid-December release as, at
least in my experience, it's a bad idea to try and do December (or
January 1..) major releases. October might work, but that's not much of
a change from September. Late January or February would probably work
but that's quite a shift from September and don't think it'd be
particularly better. Worse, I'm nervous we might get into a habit of
longer and longer releases. Having yearly releases, imv at least, is
really good for the project and those who depend on it. New features
are available pretty quickly to end-users and people can plan around our
schedule pretty easily (eg- plan to do DB upgrades in January/February).

Thanks,

Stephen

#15

Josh Berkus

josh@agliodbs.com

about 11 years ago

In reply to: Josh Berkus (#3)

Re: logical column ordering

On 12/10/2014 05:14 PM, Stephen Frost wrote:

* Andres Freund (andres@2ndquadrant.com) wrote:

But the scheduling of commits with regard to the 9.5 schedule actually
opens a relevant question: When are we planning to release 9.5? Because
If we try ~ one year from now it's a whole different ballgame than if we
try to go back to september. And I think there's pretty good arguments
for both.

This should really be on its own thread for discussion... I'm leaning,
at the moment at least, towards the September release schedule. I agree
that having a later release would allow us to get more into it, but
there's a lot to be said for the consistency we've kept up over the past
few years with a September (our last non-September release was 8.4).

Can we please NOT discuss this in the thread about someone's patch? Thanks.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WM8730e287691fedfb173de4b16e300acb1167d14622354807cc43048b59e3366cc246602f28fd12b980b07f7a09bb154c@asav-2.01.com

#16

Josh Berkus

josh@agliodbs.com

about 11 years ago

In reply to: Alvaro Herrera (#1)

Re: logical column ordering

On 12/09/2014 09:11 PM, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

Question on COPY, though: there's reasons why people would want COPY to
dump in either physical or logical order. If you're doing COPY to
create CSV files for output, then you want the columns in logical order.
If you're doing COPY for pg_dump, then you want them in physical order
for faster dump/reload. So we're almost certainly going to need to have
an option for COPY.

This is complete nonsense, Josh, or at least it is until you can provide
some solid evidence to believe that column ordering would make any
noticeable performance difference in COPY. I know of no reason to believe
that the existing user-defined-column-ordering option makes any difference.

Chill, dude, chill. When we have a patch, I'll do some performance
testing, and we'll see if it's something we care about or not. There's
no reason to be belligerent about it.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WMb2894de8245392ed88002ec749c2f24bd2037f66943d60b40bb06df9a9533454e42e8e1838829e1a4a3214110fc1560b@asav-3.01.com

#17

Andres Freund

andres@2ndquadrant.com

about 11 years ago

In reply to: Josh Berkus (#15)

Re: logical column ordering

On 2014-12-10 19:06:28 -0800, Josh Berkus wrote:

On 12/10/2014 05:14 PM, Stephen Frost wrote:

* Andres Freund (andres@2ndquadrant.com) wrote:

But the scheduling of commits with regard to the 9.5 schedule actually
opens a relevant question: When are we planning to release 9.5? Because
If we try ~ one year from now it's a whole different ballgame than if we
try to go back to september. And I think there's pretty good arguments
for both.

This should really be on its own thread for discussion... I'm leaning,
at the moment at least, towards the September release schedule. I agree
that having a later release would allow us to get more into it, but
there's a lot to be said for the consistency we've kept up over the past
few years with a September (our last non-September release was 8.4).

Can we please NOT discuss this in the thread about someone's patch? Thanks.

Well, it's relevant for the arguments made about the patches future...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Josh Berkus (#15)

9.5 release scheduling (was Re: logical column ordering)

Josh Berkus <josh@agliodbs.com> writes:

On 12/10/2014 05:14 PM, Stephen Frost wrote:

* Andres Freund (andres@2ndquadrant.com) wrote:

But the scheduling of commits with regard to the 9.5 schedule actually
opens a relevant question: When are we planning to release 9.5? Because
If we try ~ one year from now it's a whole different ballgame than if we
try to go back to september. And I think there's pretty good arguments
for both.

This should really be on its own thread for discussion... I'm leaning,
at the moment at least, towards the September release schedule. I agree
that having a later release would allow us to get more into it, but
there's a lot to be said for the consistency we've kept up over the past
few years with a September (our last non-September release was 8.4).

Can we please NOT discuss this in the thread about someone's patch? Thanks.

Quite. So, here's a new thread.

MHO is that, although 9.4 has slipped more than any of us would like,
9.5 development launched right on time in August. So I don't see a
good reason to postpone 9.5 release just because 9.4 has slipped.
I think we should stick to the schedule agreed to in Ottawa last spring.

Comments?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Josh Berkus

josh@agliodbs.com

about 11 years ago

In reply to: Tom Lane (#18)

Re: 9.5 release scheduling (was Re: logical column ordering)

On 12/10/2014 09:35 PM, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

On 12/10/2014 05:14 PM, Stephen Frost wrote:

* Andres Freund (andres@2ndquadrant.com) wrote:

But the scheduling of commits with regard to the 9.5 schedule actually
opens a relevant question: When are we planning to release 9.5? Because
If we try ~ one year from now it's a whole different ballgame than if we
try to go back to september. And I think there's pretty good arguments
for both.

This should really be on its own thread for discussion... I'm leaning,
at the moment at least, towards the September release schedule. I agree
that having a later release would allow us to get more into it, but
there's a lot to be said for the consistency we've kept up over the past
few years with a September (our last non-September release was 8.4).

Can we please NOT discuss this in the thread about someone's patch? Thanks.

Quite. So, here's a new thread.

MHO is that, although 9.4 has slipped more than any of us would like,
9.5 development launched right on time in August. So I don't see a
good reason to postpone 9.5 release just because 9.4 has slipped.
I think we should stick to the schedule agreed to in Ottawa last spring.

Comments?

If anything, it seems like a great reason to try to get 9.5 out BEFORE
we open the tree for 9.6/10.0/Cloud++. While there were technical
issues, 9.4 dragged a considerable amount because most people were
ignoring it in favor of 9.5 development.

So far, I haven't seen any features for 9.5 which would delay a more
timely release the way we did for 9.4. Anybody know of a bombshell
someone's going to drop on us for CF5?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WM3e4863365dde52cea1482b101f2f1583d6f6842c40189dc256a29706be45d8ed0afba4d39ffc73e3627f4af92cf94004@asav-3.01.com

#20

Peter Geoghegan

pg@heroku.com

about 11 years ago

In reply to: Josh Berkus (#19)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Wed, Dec 10, 2014 at 10:18 PM, Josh Berkus <josh@agliodbs.com> wrote:

So far, I haven't seen any features for 9.5 which would delay a more
timely release the way we did for 9.4. Anybody know of a bombshell
someone's going to drop on us for CF5?

I had wondered about that myself. What about jsquery? Is that something
that is planned for submission some time in the current cycle?

FWIW, I strongly doubt that I'll find time to work on anything like that
for 9.5.

--
Peter Geoghegan

#21

Robert Haas

robertmhaas@gmail.com

about 11 years ago

In reply to: Tom Lane (#18)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Thu, Dec 11, 2014 at 12:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Quite. So, here's a new thread.

MHO is that, although 9.4 has slipped more than any of us would like,
9.5 development launched right on time in August. So I don't see a
good reason to postpone 9.5 release just because 9.4 has slipped.
I think we should stick to the schedule agreed to in Ottawa last spring.

Comments?

I'm fine with that, but in the spirit of playing the devil's advocate:

1. At the development meeting, Simon argued for the 5CF schedule for
this release, with CF5 not starting until February, as a way of making
sure that there was time after the release of 9.4 to get feedback from
actual users in time to do something about it for 9.5. If anything,
we're going to end up being significantly worse off in that regard
than we would have been, because we're releasing in late December
instead of early September; an extra month of time to get patches in
does not make up for a release that was delayed nearly three months.

2. It's not clear that we're going to have a particularly-impressive
list of major features for 9.5. So far we've got RLS and BRIN. I
expect that GROUPING SETS is far enough along that it should be
possible to get it in before development ends, and there are a few
performance patches pending (Andres's lwlock scalability patches,
Rahila's work on compressing full-page writes) that I think will
probably make the grade. But after that it seems to me that it gets
pretty thin on the ground. Are we going to bill commit timestamp
tracking - with replication node ID tracking as the real goal, despite
the name - as a major feature, or DDL deparsing if that goes in, as
major features? As useful as they may be for BDR, they don't strike
me as things we can publicize as major features independent of BDR.
And it's getting awfully late for any other major work that people are
thinking of to start showing up.

Now, against all that, if we don't get back on our usual release
schedule then (a) it will look like we're losing momentum, which I'm
actually afraid may be true rather than merely a perception, and (b)
people whose stuff did get in will have to wait longer to see it
released. So, I'm not sure waiting is any better. But there are
certainly some things not to like about where we are.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22

Robert Haas

robertmhaas@gmail.com

about 11 years ago

In reply to: Josh Berkus (#19)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Thu, Dec 11, 2014 at 1:18 AM, Josh Berkus <josh@agliodbs.com> wrote:

While there were technical
issues, 9.4 dragged a considerable amount because most people were
ignoring it in favor of 9.5 development.

I think 9.4 dragged almost entirely because of one issue: the
compressibility of JSONB. And it became pretty clear early on in that
discussion that it was not going to be resolved until Tom signed off
on some proposed fix. Which I think points to the issue Bruce
mentioned about how busy many of our key contributors are. I could
have easily spent four times as much time doing reviewing and
committing over the last few months, but I'm not willing to work an 80
or 100 hour week to make that happen.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Robert Haas (#22)

Re: 9.5 release scheduling (was Re: logical column ordering)

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Dec 11, 2014 at 1:18 AM, Josh Berkus <josh@agliodbs.com> wrote:

While there were technical
issues, 9.4 dragged a considerable amount because most people were
ignoring it in favor of 9.5 development.

I think 9.4 dragged almost entirely because of one issue: the
compressibility of JSONB.

Meh. While we certainly weren't very speedy about resolving that,
I don't think that issue deserves all or even most of the blame.
I agree with Josh: the problem really was that people were not focusing
on getting 9.4 tested and releasable. One way in which that lack of
focus manifested was not having any urgency about resolving JSONB ...
but it struck me as a systemic problem and not that specific issue's
fault.

I'd finger two underlying issues here:

1. As Bruce points out in a nearby thread, we've been in commitfest mode
more or less continuously since August. That inherently sucks away
developer mindshare from testing already-committed stuff.

2. The amount of pre-release testing we get from people outside the
hard-core development crowd seems to be continuing to decrease.
We were fortunate that somebody found the JSONB issue before it was
too late to do anything about it. Personally, I'm very worried that
there are other such bugs in 9.4. But I've given up hoping that any
more testing will happen until we put out something that calls itself
9.4.0, which is why I voted to release in the core discussion about it.

I don't know what to do about either of those things, but I predict
future release cycles will be just as bad unless we can fix them.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Robert Haas (#21)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Thu, Dec 11, 2014 at 10:37:32AM -0500, Robert Haas wrote:

2. It's not clear that we're going to have a particularly-impressive
list of major features for 9.5. So far we've got RLS and BRIN. I
expect that GROUPING SETS is far enough along that it should be
possible to get it in before development ends, and there are a few
performance patches pending (Andres's lwlock scalability patches,
Rahila's work on compressing full-page writes) that I think will
probably make the grade. But after that it seems to me that it gets
pretty thin on the ground. Are we going to bill commit timestamp
tracking - with replication node ID tracking as the real goal, despite
the name - as a major feature, or DDL deparsing if that goes in, as
major features? As useful as they may be for BDR, they don't strike
me as things we can publicize as major features independent of BDR.
And it's getting awfully late for any other major work that people are
thinking of to start showing up.

How bad is the 9.5 feature list going to be compared to the 9.4 one that
had JSONB, but also a lot of infrastructure additions.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Bruce Momjian (#24)

Re: 9.5 release scheduling (was Re: logical column ordering)

Bruce Momjian <bruce@momjian.us> writes:

On Thu, Dec 11, 2014 at 10:37:32AM -0500, Robert Haas wrote:

2. It's not clear that we're going to have a particularly-impressive
list of major features for 9.5.

How bad is the 9.5 feature list going to be compared to the 9.4 one that
had JSONB, but also a lot of infrastructure additions.

Well, whatever the list ends up being, "let's wait until we have some
more features" isn't a tenable scheduling policy. We learned years ago
the folly of delaying a release until not-quite-ready feature X was ready.
Are we going to delay 9.5 until not-even-proposed-yet features are ready?

More abstractly, there's a lot of value in having a predictable release
schedule. That's going to mean that some release cycles are thin on
user-visible features, even if just as much work went into them. It's
the nature of the game.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26

Robert Haas

robertmhaas@gmail.com

about 11 years ago

In reply to: Tom Lane (#23)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Thu, Dec 11, 2014 at 11:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think 9.4 dragged almost entirely because of one issue: the
compressibility of JSONB.

Meh. While we certainly weren't very speedy about resolving that,
I don't think that issue deserves all or even most of the blame.
I agree with Josh: the problem really was that people were not focusing
on getting 9.4 tested and releasable. One way in which that lack of
focus manifested was not having any urgency about resolving JSONB ...
but it struck me as a systemic problem and not that specific issue's
fault.

I'd finger two underlying issues here:

1. As Bruce points out in a nearby thread, we've been in commitfest mode
more or less continuously since August. That inherently sucks away
developer mindshare from testing already-committed stuff.

The problem is that, on the one hand, we have a number of serious
problems with things that got committed and turned out to have
problems - the multixact stuff, and JSONB, in particular - and on the
other hand, we are lacking in adequate committer bandwidth to properly
handle all of the new patches that come in. We can fix the first
problem by tightening up on the requirements for committing things,
but that exacerbates the second problem. Or we can fix the second
problem by loosening up on the requirements for commit, but that
exacerbates the first problem. Promoting more or fewer committers is
really the same trade-off: if you're very careful about who you
promote, you'll get better people but not as many of them, so less
will get done but with fewer mistakes; if you're more generous in
handing out commit bits, you reduce the bottleneck to stuff getting
done but, inevitably, you'll be trusting people in whom you have at
least slightly less confidence. There's an inherent tension between
quality and rate of progress that we can't get rid of, and the fact
that some of our best people are busier than ever with things other
than PostgreSQL hacking is not helping - not only because less actual
review/commit happens, but because newcomers to the community don't
have as much contact with the more senior people who could help mentor
them if they only had the time.

2. The amount of pre-release testing we get from people outside the
hard-core development crowd seems to be continuing to decrease.
We were fortunate that somebody found the JSONB issue before it was
too late to do anything about it. Personally, I'm very worried that
there are other such bugs in 9.4. But I've given up hoping that any
more testing will happen until we put out something that calls itself
9.4.0, which is why I voted to release in the core discussion about it.

I don't know what to do about either of those things, but I predict
future release cycles will be just as bad unless we can fix them.

I agree. We have had a few people, Jeff Janes perhaps foremost among
them, who have done a lot of really useful testing, but overall it
does feel pretty thin.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27

David G Johnston

david.g.johnston@gmail.com

about 11 years ago

In reply to: Tom Lane (#23)

Re: 9.5 release scheduling (was Re: logical column ordering)

Tom Lane-2 wrote

Robert Haas <

robertmhaas@

> writes:

On Thu, Dec 11, 2014 at 1:18 AM, Josh Berkus <

josh@

> wrote:

While there were technical
issues, 9.4 dragged a considerable amount because most people were
ignoring it in favor of 9.5 development.

I think 9.4 dragged almost entirely because of one issue: the
compressibility of JSONB.

2. The amount of pre-release testing we get from people outside the
hard-core development crowd seems to be continuing to decrease.
We were fortunate that somebody found the JSONB issue before it was
too late to do anything about it. Personally, I'm very worried that
there are other such bugs in 9.4. But I've given up hoping that any
more testing will happen until we put out something that calls itself
9.4.0, which is why I voted to release in the core discussion about it.

The compressibility properties of a new type seem like something that should
be mandated before it is committed - it shouldn't require good fortune that

--
View this message in context: http://postgresql.nabble.com/logical-column-ordering-tp5829775p5830115.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28

David G Johnston

david.g.johnston@gmail.com

about 11 years ago

In reply to: David G Johnston (#27)

Re: 9.5 release scheduling (was Re: logical column ordering)

David G Johnston wrote

Tom Lane-2 wrote

Robert Haas <

robertmhaas@

> writes:

On Thu, Dec 11, 2014 at 1:18 AM, Josh Berkus <

josh@

> wrote:

While there were technical
issues, 9.4 dragged a considerable amount because most people were
ignoring it in favor of 9.5 development.

I think 9.4 dragged almost entirely because of one issue: the
compressibility of JSONB.

2. The amount of pre-release testing we get from people outside the
hard-core development crowd seems to be continuing to decrease.
We were fortunate that somebody found the JSONB issue before it was
too late to do anything about it. Personally, I'm very worried that
there are other such bugs in 9.4. But I've given up hoping that any
more testing will happen until we put out something that calls itself
9.4.0, which is why I voted to release in the core discussion about it.

The compressibility properties of a new type seem like something that
should be mandated before it is committed - it shouldn't require good
fortune that

... someone thought to test it out. Is there anywhere to effectively add
that to maximize the chance someone remembers for next time?

3. Effort and brain power was also diverted to fixing 9.3 this time around.

I don't have any answers but if a release is thin on features but thick on
review and testing I wouldn't complain. That testing, especially applied to
existing releases, would have considerable benefit to those still using
older supported releases instead of only benefitting those who can upgrade
to the latest and greatest. An existing user on an older release is just,
if not more, important than the potential user who is looking for new
features before deciding to migrate or begin using PostgreSQL.

David J.

--
View this message in context: http://postgresql.nabble.com/logical-column-ordering-tp5829775p5830116.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29

Heikki Linnakangas

hlinnakangas@vmware.com

about 11 years ago

In reply to: Robert Haas (#26)

Re: 9.5 release scheduling (was Re: logical column ordering)

On 12/11/2014 06:59 PM, Robert Haas wrote:

On Thu, Dec 11, 2014 at 11:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think 9.4 dragged almost entirely because of one issue: the
compressibility of JSONB.

Meh. While we certainly weren't very speedy about resolving that,
I don't think that issue deserves all or even most of the blame.
I agree with Josh: the problem really was that people were not focusing
on getting 9.4 tested and releasable. One way in which that lack of
focus manifested was not having any urgency about resolving JSONB ...
but it struck me as a systemic problem and not that specific issue's
fault.

I'd finger two underlying issues here:

1. As Bruce points out in a nearby thread, we've been in commitfest mode
more or less continuously since August. That inherently sucks away
developer mindshare from testing already-committed stuff.

The problem is that, on the one hand, we have a number of serious
problems with things that got committed and turned out to have
problems - the multixact stuff, and JSONB, in particular - and on the
other hand, we are lacking in adequate committer bandwidth to properly
handle all of the new patches that come in. We can fix the first
problem by tightening up on the requirements for committing things,
but that exacerbates the second problem. Or we can fix the second
problem by loosening up on the requirements for commit, but that
exacerbates the first problem.

There is a third option: reject more patches, more quickly, with less
discussion. IOW, handle new patches "less properly".

The commitfest is good at making sure that no patch completely falls off
the radar. That's also a problem. Before the commitfest process, many
patches were not actively rejected, but they just fell to the side if no
committer was interested enough in it to pick it up, review, and commit.

There are a lot of patches in the October commitfest that I personally
don't care about, and if I was a dictator I would just drop as "not
worth the trouble to review". Often a patch just doesn't feel quite
right, or would require some work to clean up, but you can't immediately
point to anything outright wrong with it. It takes some effort to review
such a patch enough to give feedback on it, if you want more meaningful
feedback than "This smells bad".

I imagine that it's the same for everyone else. Many of the patches that
sit in the commitfest for weeks are patches that no-one really cares
much about. I'm not sure what to do about that. It would be harsh to
reject a patch just because no-one's gotten around to reviewing it, and
if we start doing that, it's not clear what the point of a commitfest is
any more.

Perhaps we should change the process so that it is the patch author's
responsibility to find a reviewer, and a committer, for the patch. If
you can't cajole anyone to review your patch, it's a sign that no-one
cares about it, and the patch is rejected by default. Or put a quick
+1/-1 voting option to each patch in the commitfest, to get a quick
gauge of how the committers feel about it.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30

Peter Geoghegan

pg@heroku.com

about 11 years ago

In reply to: Robert Haas (#21)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Thu, Dec 11, 2014 at 7:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:

2. It's not clear that we're going to have a particularly-impressive
list of major features for 9.5. So far we've got RLS and BRIN. I
expect that GROUPING SETS is far enough along that it should be
possible to get it in before development ends, and there are a few
performance patches pending (Andres's lwlock scalability patches,
Rahila's work on compressing full-page writes) that I think will
probably make the grade. But after that it seems to me that it gets
pretty thin on the ground.

I'm slightly surprised that you didn't mention abbreviated keys in
that list of performance features. You're reviewing that patch; how do
you feel about it now?

Are we going to bill commit timestamp
tracking - with replication node ID tracking as the real goal, despite
the name - as a major feature, or DDL deparsing if that goes in, as
major features? As useful as they may be for BDR, they don't strike
me as things we can publicize as major features independent of BDR.
And it's getting awfully late for any other major work that people are
thinking of to start showing up.

Version 1.0 of INSERT ... ON CONFLICT UPDATE was posted in August -
when development launched. It still doesn't have a reviewer, and it
isn't actually in evidence that someone else has so much as downloaded
and applied the patch (I'm sure someone has done that much, but the
fact is that all the feedback that I've received this year concerns
the semantics/syntax, which you can form an opinion on by just looking
at the extensive documentation and other supplementary material I've
written). It's consistently one of the most requested features, and
yet major aspects of the design, that permeate through every major
subsystem go unremarked on for months now. This feature is
*definitely* major feature list material, since people have been
loudly requesting it for over a decade, and yet no one mentions it in
this thread (only Bruce mentioned it in the other thread about the
effectiveness of the CF process). It's definitely in the top 2 or 3
most requested features, alongside much harder problems like parallel
query and comprehensive partitioning support.

If there is a lesson here for people that are working on major
features, or me personally, I don't know what it is -- if anyone else
knows, please tell me. I've bent over backwards to make the patch as
accessible as possible, and as easy to review as possible. I also
think its potential to destabilize the system (as major features go)
is only about average. What am I doing wrong here?

There is an enormous amount of supplementary documentation associated
with the patch:

https://wiki.postgresql.org/wiki/UPSERT
https://wiki.postgresql.org/wiki/Value_locking

Both of these pages are far larger than the Wiki page for RLS, for
example. The UPSERT wiki page is kept right up to date.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#31

Jeff Janes

jeff.janes@gmail.com

about 11 years ago

In reply to: Tom Lane (#23)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Thu, Dec 11, 2014 at 8:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

2. The amount of pre-release testing we get from people outside the
hard-core development crowd seems to be continuing to decrease.
We were fortunate that somebody found the JSONB issue before it was
too late to do anything about it. Personally, I'm very worried that
there are other such bugs in 9.4. But I've given up hoping that any
more testing will happen until we put out something that calls itself
9.4.0, which is why I voted to release in the core discussion about it.

We are not particularly inviting of feedback for whatever testing has been
done.

The definitive guide seems to be
https://wiki.postgresql.org/wiki/HowToBetaTest, and says:

You can report tests by email. You can subscribe to any PostgreSQL mailing
list from the subscription form <http://www.postgresql.org/community/lists/>
.

- pgsql-bugs: this is the preferred mailing list if you think you have
found a bug in the beta. You can also use the Bug Reporting Form
<http://www.postgresql.org/support/submitbug/>.
- pgsql-hackers: bugs, questions, and successful test reports are
welcome here if you are already subscribed to pgsql-hackers. Note that
pgsql-hackers is a high-traffic mailing list with a lot of development
discussion.

=========

So if you find a bug, you can report it on the bug reporting form (which
doesn't have a drop-down entry for 9.4RC1).

If you have positive results rather than negative ones (or even complaints
that are not actually bugs), you can subscribe to a mailing list which
generates a lot of traffic which is probably over your head and not
interesting to you.

Does the core team keep a mental list of items they want to see tested by
the public, and they will spend their own time testing those things
themselves if they don't hear back on some positive tests for them?

If we find reports of public testing that yields good results (or at least
no bugs) to be useful, we should be more clear on how to go about doing
it. But are positive reports useful? If I report a bug, I can write down
the steps to reproduce it, and then follow my own instructions to make sure
it does actually reproduce it. If I find no bugs, it is just "I did a
bunch of random stuff and nothing bad happened, that I noticed".

Chees,

Jeff

#32

Robert Haas

robertmhaas@gmail.com

about 11 years ago

In reply to: Peter Geoghegan (#30)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Thu, Dec 11, 2014 at 1:06 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Thu, Dec 11, 2014 at 7:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:

2. It's not clear that we're going to have a particularly-impressive
list of major features for 9.5. So far we've got RLS and BRIN. I
expect that GROUPING SETS is far enough along that it should be
possible to get it in before development ends, and there are a few
performance patches pending (Andres's lwlock scalability patches,
Rahila's work on compressing full-page writes) that I think will
probably make the grade. But after that it seems to me that it gets
pretty thin on the ground.

I'm slightly surprised that you didn't mention abbreviated keys in
that list of performance features. You're reviewing that patch; how do
you feel about it now?

I'm not sure it's where I think it needs to be yet, but yeah, I think
that will get in. I thought of it after I hit send.

Are we going to bill commit timestamp
tracking - with replication node ID tracking as the real goal, despite
the name - as a major feature, or DDL deparsing if that goes in, as
major features? As useful as they may be for BDR, they don't strike
me as things we can publicize as major features independent of BDR.
And it's getting awfully late for any other major work that people are
thinking of to start showing up.

Version 1.0 of INSERT ... ON CONFLICT UPDATE was posted in August -
when development launched. It still doesn't have a reviewer, and it
isn't actually in evidence that someone else has so much as downloaded
and applied the patch (I'm sure someone has done that much, but the
fact is that all the feedback that I've received this year concerns
the semantics/syntax, which you can form an opinion on by just looking
at the extensive documentation and other supplementary material I've
written). It's consistently one of the most requested features, and
yet major aspects of the design, that permeate through every major
subsystem go unremarked on for months now. This feature is
*definitely* major feature list material, since people have been
loudly requesting it for over a decade, and yet no one mentions it in
this thread (only Bruce mentioned it in the other thread about the
effectiveness of the CF process). It's definitely in the top 2 or 3
most requested features, alongside much harder problems like parallel
query and comprehensive partitioning support.

I'm not sure whether that patch is going to go anywhere. There has
been so much arguing about different aspects of the design that felt
rather unproductive that I think most of the people who would be
qualified to commit it have grown a bit gun-shy. That would be a good
problem to get fixed, but I don't have a proposal.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#33

Josh Berkus

josh@agliodbs.com

about 11 years ago

In reply to: Tom Lane (#18)

Re: 9.5 release scheduling (was Re: logical column ordering)

On 12/11/2014 09:22 AM, Heikki Linnakangas wrote:

I imagine that it's the same for everyone else. Many of the patches that
sit in the commitfest for weeks are patches that no-one really cares
much about. I'm not sure what to do about that. It would be harsh to
reject a patch just because no-one's gotten around to reviewing it, and
if we start doing that, it's not clear what the point of a commitfest is
any more.

So the "nobody cares" argument is manifestly based on false assumptions.
Are you contending that nobody cares about UPSERT or GROUPING SETS? In
my experience, the patches that sit for weeks on the CF fall into 4 groups:

1. Large complicated patches that only a handful of committers are even
capable of reviewing.

2. Obscure patches which require specialized knowledge our outside input
to review.

3. Inconsequential patches where the submitter doesn't work the social
process.

4. Patches with contentious spec or syntax.

5. Patches which everyone wants but have persistent hard-to-resolve issues.

Of these only (3) would fit into "nobody cares", and that's a pretty
small group.

There's also a chicken-and-egg problem there. Say that we started not
reviewing DW features because "nobody cares". Then the people who
contributed those features don't go on to become major contributors,
which means they won't review further DW patches. Which means that
we've just closed off an entire use-case for PostgreSQL. I'd think that
PostGIS would have taught us the value of the "nobody cares" fallacy.

Also, if we go back on the promise that "every patch gets a review",
then we're definitely headed towards no more new contributors. As Noah
said at one developer meeting (to paraphrase), one of the few things
which keeps contributors persisting through our baroque,
poorly-documented, excessively political contribution process is the
promise that they'll get a fair shake for their invested time. If we
drop that promise, we'll solve our workflow problem by cutting off the
flow of new patches entirely.

Perhaps we should change the process so that it is the patch author's
responsibility to find a reviewer, and a committer, for the patch. If
you can't cajole anyone to review your patch, it's a sign that no-one
cares about it, and the patch is rejected by default. Or put a quick
+1/-1 voting option to each patch in the commitfest, to get a quick
gauge of how the committers feel about it.

Again, that process would favor existing contributors and other folks
who know how to "work the Postgres community". It would be effectively
the same as hanging up a sign which says "no new contributors wanted".
It would also be dramatically increasing the amount of politics around
submitted patches, which take up far more time than the technical work.

Overall, we're experiencing this issue because of a few predictable reasons:

1. a gradual increase in the number of submitted patches, especially
large patches

2. Tom Lane cutting back on the number of other people's patches he
reviews and revises.

3. Other major committers getting busier with their day jobs.

4. Failure to recruit, mentor and promote new committers at a rate
proportional to the number of new contributors or the size of our community.

5. Failure to adopt or develop automated systems to remove some of the
grunt work from patch submission and review.

6. Failure to adhere to any uniform standards of patch handing for the
Commitfests.

(2) out of these is actually the biggest thing we're seeing right now, I
think. Tom was historically a one-man-patch-fixing machine, at one
time responsible for 70% of the patches committed to PostgreSQL. This
was never a good thing for the community, even if it was a good thing
for the code base, becuase it discouraged others from stepping into a
senior committer role. Now Tom has cut back, and others have to take up
the slack. In the long run this will be good for our development
community; in the short run it's kind of painful.

I will also point out that some of the existing senior committers, who
are the heaviest burdened under the existing system, have also been the
most resistant to any change in the system. You (Heikki) yourself
expressed a strong opposition to any attempt to recruit more reviewers.
So, given that you are inside the heart of the problem, do you have a
solution other than your proposal above that we simply stop accepting
new contributors? Or is that your solution? It would work, for some
definition of "work".

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WMcf951e7e6a1f5a64026f29ee2ec552367e91cb72366d7abb2609ca071314f5e205d17498bc67269e40df050e592d6fbc@asav-1.01.com

#34

Josh Berkus

josh@agliodbs.com

about 11 years ago

In reply to: Josh Berkus (#15)

Re: 9.5 release scheduling (was Re: logical column ordering)

On 12/11/2014 08:59 AM, Tom Lane wrote:

More abstractly, there's a lot of value in having a predictable release
schedule. That's going to mean that some release cycles are thin on
user-visible features, even if just as much work went into them. It's
the nature of the game.

+ 1,000,000 from me. ;-)

Frankly, BRIN, UPSERT and a couple other things are plenty for a
Postgres release. Other SQL databases would be thrilled to have that
much ... can you name 3 major advances in the last MySQL release?

And given that I've seen nothing about jquery/VODKA since pgCon, I'm
expecting them for 9.6/whatever, not 9.5. There's a whole longish
syntax discussion we haven't even started yet, let alone actual
technical review.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WM1ea8268e36539d991b49c380c36afe7b2ea30647cee9585b4fec50270469a01761bfa8924e4cfd9d2f68437b56cb1c52@asav-1.01.com

#35

Heikki Linnakangas

hlinnakangas@vmware.com

about 11 years ago

In reply to: Josh Berkus (#33)

Re: 9.5 release scheduling (was Re: logical column ordering)

On 12/11/2014 08:51 PM, Josh Berkus wrote:

On 12/11/2014 09:22 AM, Heikki Linnakangas wrote:

I imagine that it's the same for everyone else. Many of the patches that
sit in the commitfest for weeks are patches that no-one really cares
much about. I'm not sure what to do about that. It would be harsh to
reject a patch just because no-one's gotten around to reviewing it, and
if we start doing that, it's not clear what the point of a commitfest is
any more.

So the "nobody cares" argument is manifestly based on false assumptions.
Are you contending that nobody cares about UPSERT or GROUPING SETS?

No. I was thinking of patches like "Add IF NOT EXISTS to CREATE TABLE AS
and CREATE MATERIALIZED VIEW", "event triggers: more DROP info", "Point
to polygon distance operator" and "pgcrypto: support PGP signatures".
And nobody was an exaggeration; clearly *someone* cares about those
things, or they wouldn't have written a patch in the first place. But
none of the committers care enough to pick them up. Even in the case
that someone reviews them, it's often not because the reviewer is
interested in the patch, it's just to help out with the process.

(Apologies to the authors of those patches; they were just the first few
that caught my eye)

There's also a chicken-and-egg problem there. Say that we started not
reviewing DW features because "nobody cares". Then the people who
contributed those features don't go on to become major contributors,
which means they won't review further DW patches. Which means that
we've just closed off an entire use-case for PostgreSQL. I'd think that
PostGIS would have taught us the value of the "nobody cares" fallacy.

Also, if we go back on the promise that "every patch gets a review",
then we're definitely headed towards no more new contributors. As Noah
said at one developer meeting (to paraphrase), one of the few things
which keeps contributors persisting through our baroque,
poorly-documented, excessively political contribution process is the
promise that they'll get a fair shake for their invested time. If we
drop that promise, we'll solve our workflow problem by cutting off the
flow of new patches entirely.

Yeah, there is that.

Perhaps we should change the process so that it is the patch author's
responsibility to find a reviewer, and a committer, for the patch. If
you can't cajole anyone to review your patch, it's a sign that no-one
cares about it, and the patch is rejected by default. Or put a quick
+1/-1 voting option to each patch in the commitfest, to get a quick
gauge of how the committers feel about it.

Again, that process would favor existing contributors and other folks
who know how to "work the Postgres community". It would be effectively
the same as hanging up a sign which says "no new contributors wanted".
It would also be dramatically increasing the amount of politics around
submitted patches, which take up far more time than the technical work.

I was thinking that by getting candid feedback that none of the existing
contributors are interested to look at your patch, the author could
revise the patch to garner more interest, or perhaps promise to review
someone else's patch in return. Right now, the patches just linger, and
the author doesn't know why, what's going to happen or what to do about it.

I will also point out that some of the existing senior committers, who
are the heaviest burdened under the existing system, have also been the
most resistant to any change in the system. You (Heikki) yourself
expressed a strong opposition to any attempt to recruit more reviewers.

Huh, I did? Can you elaborate?

So, given that you are inside the heart of the problem, do you have a
solution other than your proposal above that we simply stop accepting
new contributors? Or is that your solution? It would work, for some
definition of "work".

I don't have any good solutions, I'm afraid. It might help to ping the
reviewers who have signed up more often, to make the reviews to happen
more quickly. I did some nudge people in the August commitfest, but I
felt that it didn't actually achieve much. The most efficient way to
move the commitfest forward was to just review more patches myself.

That's one thought. Robert said the same thing about when he was the
commitfest manager; he just reviewed most the patches himself in the
end. And you mentioned that Tom used to review 70% of all incoming
patches. How about we make that official? It's the commitfest manager's
duty to review all the patches. He can recruit others if he can, but at
the end of the day he's expected to take a look at every patch, and
commit or return with some feedback.

The problem with that is that we'll have a hard time to find volunteers
for that. But we only need to find one sucker for each commitfest. I can
volunteer to do that once a year; if the other active committers do the
same, we're covered.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#36

Alvaro Herrera

alvherre@2ndquadrant.com

about 11 years ago

In reply to: Heikki Linnakangas (#35)

Re: 9.5 release scheduling (was Re: logical column ordering)

Heikki Linnakangas wrote:

That's one thought. Robert said the same thing about when he was the
commitfest manager; he just reviewed most the patches himself in the end.
And you mentioned that Tom used to review 70% of all incoming patches. How
about we make that official? It's the commitfest manager's duty to review
all the patches. He can recruit others if he can, but at the end of the day
he's expected to take a look at every patch, and commit or return with some
feedback.

The problem with that is that we'll have a hard time to find volunteers for
that. But we only need to find one sucker for each commitfest. I can
volunteer to do that once a year; if the other active committers do the
same, we're covered.

Hm, I was commitfest manager once, too, and this exact same thing
happened. Maybe we need to make that official, and give the sucker some
perk.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#37

Peter Geoghegan

pg@heroku.com

about 11 years ago

In reply to: Alvaro Herrera (#36)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Thu, Dec 11, 2014 at 11:52 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

The problem with that is that we'll have a hard time to find volunteers for
that. But we only need to find one sucker for each commitfest. I can
volunteer to do that once a year; if the other active committers do the
same, we're covered.

Hm, I was commitfest manager once, too, and this exact same thing
happened. Maybe we need to make that official, and give the sucker some
perk.

I should do it at least once, if only because I haven't experienced it.
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#38

Peter Geoghegan

pg@heroku.com

about 11 years ago

In reply to: Robert Haas (#32)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Thu, Dec 11, 2014 at 10:43 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Version 1.0 of INSERT ... ON CONFLICT UPDATE was posted in August -
when development launched. It still doesn't have a reviewer, and it
isn't actually in evidence that someone else has so much as downloaded
and applied the patch

I'm not sure whether that patch is going to go anywhere. There has
been so much arguing about different aspects of the design that felt
rather unproductive that I think most of the people who would be
qualified to commit it have grown a bit gun-shy. That would be a good
problem to get fixed, but I don't have a proposal.

Really?

I have acted comprehensively on 100% of feedback to date on the
semantics/syntax of the ON CONFLICT UPDATE patch. The record reflects
that. I don't believe that the problem is that people are gun shy. We
haven't even had a real disagreement since last year, and last year's
discussion of value locking was genuinely very useful (I hate to say
it, but the foreign key locking patch might have considered the
possibility of "unprincipled deadlocks" more closely, as we saw
recently).

Lots of other systems have a comparable feature. Most recently, VoltDB
added a custom "UPSERT", even though they don't have half the SQL
features we do. It's a complicated feature, but it's not a
particularly complicated feature as big, impactful features go (like
LATERAL, or logical decoding, or the foreign key locking stuff). It's
entirely possible to get the feature in in the next few months, if
someone will work with me on it.

Even Heikki, who worked on this with me far more than everyone else,
found the value locking page a useful summary. He didn't even know
that there was a third design advocated by Simon and Andres until he
saw it! The discussion was difficult, but had a useful outcome,
because accounting for "unprincipled deadlocks" is a major source of
complexity. I have seen a lot of benefit from comprehensively
documenting stuff in one place (the wiki pages), but that has tapered
off.
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#39

Alvaro Herrera

alvherre@2ndquadrant.com

about 11 years ago

In reply to: Peter Geoghegan (#37)

Re: 9.5 release scheduling (was Re: logical column ordering)

Peter Geoghegan wrote:

On Thu, Dec 11, 2014 at 11:52 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

The problem with that is that we'll have a hard time to find volunteers for
that. But we only need to find one sucker for each commitfest. I can
volunteer to do that once a year; if the other active committers do the
same, we're covered.

Hm, I was commitfest manager once, too, and this exact same thing
happened. Maybe we need to make that official, and give the sucker some
perk.

I should do it at least once, if only because I haven't experienced it.

We could make an slogan out of it: you've never experienced the
PostgreSQL development process if you haven't been a CFM at least once
in your life.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#40

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Robert Haas (#26)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Thu, Dec 11, 2014 at 11:59:58AM -0500, Robert Haas wrote:

The problem is that, on the one hand, we have a number of serious
problems with things that got committed and turned out to have
problems - the multixact stuff, and JSONB, in particular - and on the
other hand, we are lacking in adequate committer bandwidth to properly
handle all of the new patches that come in. We can fix the first
problem by tightening up on the requirements for committing things,
but that exacerbates the second problem. Or we can fix the second
problem by loosening up on the requirements for commit, but that
exacerbates the first problem. Promoting more or fewer committers is
really the same trade-off: if you're very careful about who you
promote, you'll get better people but not as many of them, so less
will get done but with fewer mistakes; if you're more generous in
handing out commit bits, you reduce the bottleneck to stuff getting
done but, inevitably, you'll be trusting people in whom you have at
least slightly less confidence. There's an inherent tension between
quality and rate of progress that we can't get rid of, and the fact
that some of our best people are busier than ever with things other
than PostgreSQL hacking is not helping - not only because less actual
review/commit happens, but because newcomers to the community don't
have as much contact with the more senior people who could help mentor
them if they only had the time.

Great outline of the tradeoffs involved in being more aggressive about
committing. I do think the multixact and JSONB problems have spooked
some of us to be slower/more careful about committing.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#41

Jeff Janes

jeff.janes@gmail.com

about 11 years ago

In reply to: Heikki Linnakangas (#35)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Thu, Dec 11, 2014 at 11:40 AM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:

On 12/11/2014 08:51 PM, Josh Berkus wrote:

On 12/11/2014 09:22 AM, Heikki Linnakangas wrote:

Perhaps we should change the process so that it is the patch author's
responsibility to find a reviewer, and a committer, for the patch. If
you can't cajole anyone to review your patch, it's a sign that no-one
cares about it, and the patch is rejected by default. Or put a quick
+1/-1 voting option to each patch in the commitfest, to get a quick
gauge of how the committers feel about it.

Again, that process would favor existing contributors and other folks
who know how to "work the Postgres community". It would be effectively
the same as hanging up a sign which says "no new contributors wanted".
It would also be dramatically increasing the amount of politics around
submitted patches, which take up far more time than the technical work.

I was thinking that by getting candid feedback that none of the existing
contributors are interested to look at your patch, the author could revise
the patch to garner more interest, or perhaps promise to review someone
else's patch in return. Right now, the patches just linger, and the author
doesn't know why, what's going to happen or what to do about it.

I agree. Having your patch disappear into the void is not friendly at
all. But I don't think a commentless "-1" is the answer, either. That
might one of the few things worse than silence. Even if the comment is
just "This seems awfully complicated for a minimally useful feature" or
"This will probably have unintended side effects in the executor that I
don't have time to trace down, can someone make an argument for
correctness", or even "this format is unreadable, I won't review this until
it is fixed" then at least the author (and perhaps more important, junior
contributors who are not the author) will know either what argument to make
to lobby for it, what avenue to take when reviewing it, or whether to just
forget it and work on something more important.

Cheers,

Jeff

#42

Robert Haas

robertmhaas@gmail.com

about 11 years ago

In reply to: Jeff Janes (#41)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Thu, Dec 11, 2014 at 3:47 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

I agree. Having your patch disappear into the void is not friendly at all.
But I don't think a commentless "-1" is the answer, either. That might one
of the few things worse than silence. Even if the comment is just "This
seems awfully complicated for a minimally useful feature" or "This will
probably have unintended side effects in the executor that I don't have time
to trace down, can someone make an argument for correctness", or even "this
format is unreadable, I won't review this until it is fixed" then at least
the author (and perhaps more important, junior contributors who are not the
author) will know either what argument to make to lobby for it, what avenue
to take when reviewing it, or whether to just forget it and work on
something more important.

Agreed!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#43

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: David G Johnston (#27)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Thu, Dec 11, 2014 at 10:04:43AM -0700, David G Johnston wrote:

Tom Lane-2 wrote

Robert Haas <

robertmhaas@

> writes:

On Thu, Dec 11, 2014 at 1:18 AM, Josh Berkus <

josh@

> wrote:

While there were technical
issues, 9.4 dragged a considerable amount because most people were
ignoring it in favor of 9.5 development.

I think 9.4 dragged almost entirely because of one issue: the
compressibility of JSONB.

2. The amount of pre-release testing we get from people outside the
hard-core development crowd seems to be continuing to decrease.
We were fortunate that somebody found the JSONB issue before it was
too late to do anything about it. Personally, I'm very worried that
there are other such bugs in 9.4. But I've given up hoping that any
more testing will happen until we put out something that calls itself
9.4.0, which is why I voted to release in the core discussion about it.

The compressibility properties of a new type seem like something that should
be mandated before it is committed - it shouldn't require good fortune that

Odd are the next problem will have nothing to do with compressibility 
--- we can't assume old failure will repeat themselves.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#44

David Johnston

david.g.johnston@gmail.com

about 11 years ago

In reply to: Bruce Momjian (#43)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Thu, Dec 11, 2014 at 2:05 PM, Bruce Momjian <bruce@momjian.us> wrote:

On Thu, Dec 11, 2014 at 10:04:43AM -0700, David G Johnston wrote:

Tom Lane-2 wrote

Robert Haas <

robertmhaas@

> writes:

On Thu, Dec 11, 2014 at 1:18 AM, Josh Berkus <

josh@

> wrote:

While there were technical
issues, 9.4 dragged a considerable amount because most people were
ignoring it in favor of 9.5 development.

I think 9.4 dragged almost entirely because of one issue: the
compressibility of JSONB.

2. The amount of pre-release testing we get from people outside the
hard-core development crowd seems to be continuing to decrease.
We were fortunate that somebody found the JSONB issue before it was
too late to do anything about it. Personally, I'm very worried that
there are other such bugs in 9.4. But I've given up hoping that any
more testing will happen until we put out something that calls itself
9.4.0, which is why I voted to release in the core discussion about it.

The compressibility properties of a new type seem like something that

should

be mandated before it is committed - it shouldn't require good fortune

that
Odd are the next problem will have nothing to do with compressibility
--- we can't assume old failure will repeat themselves.

tl;dr: assign two people, a manager/curator and a lead reviewer. Give
the curator better tools and the responsibility to engage the community.
If the primary reviewer cannot review a patch in the current commitfest it
can be marked "awaiting reviewer" and moved to the next CF for evaluation
by the next pair of individuals. At minimum, though, if it does get moved
the manager AND reviewer need to comment on why it was not handled during
their CF. Also, formalize documentation targeting developers and reviewers
just like the documentation for users has been formalized and committed to
the repository. That knowledge and history is probably more important that
source code commit log and definitely should be more accessible to
newcomers.

While true a checklist of things to look for and evaluate when adding a
new type to the system still has value. How new types interact, if at all,
with TOAST seems like something that warrants explicit attention before
commit, and there are probably others (e.g., OIDs, input/output function
volatility and configuration, etc...). Maybe this exists somewhere but it
you are considering improvements to the commitfest application having an
top-of-page & always-visible checklist that can bootstrapped based upon the
patch/feature type and modified for specific nuances for the item in
question seems like it would be valuable.

If this was in place before 9.3 then the whatever category the multi-xact
patches fall into would want to have their template modified to incorporate
the various research and testing - along with links to the discussions -
the resulted from the various bug reports that were submitted. It could
even be structured to both be an interactive checklist as well as acting as
curated documentation for developers and reviewers. The wiki has some of
this but if the goal is to encourage people to learn how to contribute to
PostgreSQL it should receive a similar level of attention and quality
control that our documentation for people wanting to learn how to use
PostgreSQL receive. But that is above and beyond the simple goal of having
meaningful checklists attached to each of the major commit-fest items whose
TODO items can be commented upon and serve as a reference for how close to
commit a feature may be. Comments can be as simple as a place for people
to upload a psql script and say "this is what I did and everything seemed
to work/fail in the way I expected - on this platform".

Curation is hard though so I get why easier - just provide links to the
mailing list mainly - actions are what is currently being used. Instead of
the CF manager being a reviewer (which is a valid approach) having them be
a curator and providing better tools geared toward that role (both to do
the work and to share the results) seem like a better ideal. Having that
role a CF reviewer should maybe also be assigned. The two people -
reviewer and manager - would then be responsible for ensuring that reviews
are happening and that the communication to and recruitment from the
community is being handled - respectively.

Just some off-the-cuff thoughts...

David J.

#45

Alvaro Herrera

alvherre@2ndquadrant.com

about 11 years ago

In reply to: David Johnston (#44)

Re: 9.5 release scheduling (was Re: logical column ordering)

FWIW I don't think any amount of process would have gotten multixact to
not have the copious bugs it had. It was just too complex a patch,
doing ugly things to parts too deeply linked to the inner guts of the
server. We might have spared a few with some extra testing (such as the
idiotic wraparound bugs, and perhaps the pg_upgrade problems), but most
of the rest of it would have taken a real production load to notice --
which is exactly what happened.

(Of course, review from Tom might have unearthed many of these bugs
before they hit final release, but we're saying we don't want to be
dependent on Tom's review for every patch so that doesn't count.)

Review is good, but (as history shows) some bugs can slip through even
extensive review such as the one multixacts got from Noah and Andres.
Had anyone put some real stress on the beta, we could have noticed some
of these bugs much earlier. One of the problems we have is that our
serious users don't seem to take the beta period seriously. All our
users are too busy for that, it seems, and they expect that "somebody
else will test the point-zero release", and that by the time it hits
point-one or point-two it will be bug-free, which is quite clearly not
the case.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#46

Peter Geoghegan

pg@heroku.com

about 11 years ago

In reply to: Alvaro Herrera (#45)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Thu, Dec 11, 2014 at 1:49 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Review is good, but (as history shows) some bugs can slip through even
extensive review such as the one multixacts got from Noah and Andres.
Had anyone put some real stress on the beta, we could have noticed some
of these bugs much earlier. One of the problems we have is that our
serious users don't seem to take the beta period seriously. All our
users are too busy for that, it seems, and they expect that "somebody
else will test the point-zero release", and that by the time it hits
point-one or point-two it will be bug-free, which is quite clearly not
the case.

Good, strategic stress testing has a big role to play here IMV. I have
had some difficulty generating interest in that, but I think it's
essential.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#47

Stephen Frost

sfrost@snowman.net

about 11 years ago

In reply to: Robert Haas (#21)

Re: 9.5 release scheduling (was Re: logical column ordering)

* Robert Haas (robertmhaas@gmail.com) wrote:

2. It's not clear that we're going to have a particularly-impressive
list of major features for 9.5. So far we've got RLS and BRIN. I
expect that GROUPING SETS is far enough along that it should be
possible to get it in before development ends, and there are a few
performance patches pending (Andres's lwlock scalability patches,
Rahila's work on compressing full-page writes) that I think will
probably make the grade. But after that it seems to me that it gets
pretty thin on the ground.

When it comes to a list of major features for 9.5, it'd be pretty
terrible, imv, if we manage to screw up and not get UPSERT taken care
of (again..). BDR will be great to have too, but we lose out far more
often for lack of what those outside the community perceive as a simple
and obvious feature that nearly every other system they deal with has.

Now, against all that, if we don't get back on our usual release
schedule then (a) it will look like we're losing momentum, which I'm
actually afraid may be true rather than merely a perception, and (b)
people whose stuff did get in will have to wait longer to see it
released. So, I'm not sure waiting is any better. But there are
certainly some things not to like about where we are.

I agree with both of these. It doesn't help that we have non-committers
working on major patches for years and aren't able to see the fruits of
their labors. I'm as much at fault for that happening at times as
anyone and I don't have any silver bullets but I certainly don't like
it.

Thanks,

Stephen

#48

Andres Freund

andres@anarazel.de

about 11 years ago

In reply to: Alvaro Herrera (#1)

Re: logical column ordering

On 2014-12-09 14:41:46 -0300, Alvaro Herrera wrote:

So I've been updating my very old patch to allow logical and physical
column reordering. Here's a WIP first cut for examination.

Do you have a updated patch that has ripened further?

The first thing where this matters is tuple descriptor expansion in
parse analysis; at this stage, things such as "*" (in "select *") are
turned into a target list, which must be sorted according to attlognum.
To achieve this I added a new routine to tupledescs,
TupleDescGetSortedAttrs() which computes a new Attribute array and
caches it in the TupleDesc for later uses; this array points to the
same elements in the normal attribute list but is order by attlognum.

That sounds sane.

Another place that needs tweaking is heapam.c, which must construct a
physical tuple from Datum/nulls arrays (heap_form_tuple). In some cases
the input arrays are sorted in logical column order.

I'm not sure that changing heaptuple.c's API (you mean that, not
heapam.c, right?) is a good level to tackle this at. I think some
function to reorder values/isnull arrays into logical order and reverse
might end up being less invasive and actually faster.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#49

Alvaro Herrera

alvherre@2ndquadrant.com

about 11 years ago

In reply to: Andres Freund (#48)

Re: logical column ordering

Andres Freund wrote:

On 2014-12-09 14:41:46 -0300, Alvaro Herrera wrote:

So I've been updating my very old patch to allow logical and physical
column reordering. Here's a WIP first cut for examination.

Do you have a updated patch that has ripened further?

Not yet. Phil was kind enough to send me his old patch for study; I
am stealing a few interesting ideas from there, in particular:

Another place that needs tweaking is heapam.c, which must construct a
physical tuple from Datum/nulls arrays (heap_form_tuple). In some cases
the input arrays are sorted in logical column order.

I'm not sure that changing heaptuple.c's API (you mean that, not
heapam.c, right?) is a good level to tackle this at. I think some
function to reorder values/isnull arrays into logical order and reverse
might end up being less invasive and actually faster.

Phil took a different route here than I did, and I think his design is
better than mine. The main idea is that the Datum/nulls arrays in a
TupleTableSlot always follows physical order (he calls it "storage
order"), rather than this very strange mixture of things I did by
hacking the heaptuple.c API. So I'm reworking my patch with that in
mind.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#50

Michael Paquier

michael.paquier@gmail.com

almost 11 years ago

In reply to: Alvaro Herrera (#49)

Re: logical column ordering

On Sun, Jan 4, 2015 at 10:37 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

So I'm reworking my patch with that in mind.

Switching to returned with feedback. Alvaro, feel free to add an entry
to the next CF if you are planning to work on it again.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#51

Alvaro Herrera

alvherre@2ndquadrant.com

almost 11 years ago

In reply to: Alvaro Herrera (#49)

Re: logical column ordering

I've decided to abandon this patch. I have spent too much time looking
at it now.

If anyone is interested in trying to study, I can provide the patches I
came up with, explanations, and references to prior discussion -- feel
free to ask.

My main motivation for this work is to enable a later patch for column
stores. Right now, since columns have monotonically increasing attnums,
it's rather difficult to have columns that are stored elsewhere. My
plan for that now is much more modest, something like adding a constant
10000 to attnums and that would let us identify columns that are outside
the heap -- or something like that. I haven't fully worked it out yet.

Just a few quick notes about this patch: last thing I was doing was mess
with setrefs.c so that Var nodes carry "varphysnum" annotations, which
are set to 0 during initial planner phases, and are turned into the
correct attphysnum (the value extracted from catalogs) so that
TupleDescs constructed from targetlists by ExecTypeFromTL and friends
can have the correct attphysnum too. I think this part works correctly,
with the horrible exception that I had to do a relation_open() in
setrefs.c to get hold of the right attphysnum from a tupledesc obtained
from catalogs. That's not acceptable at all; I think the right way to
do this would be to store a list of numbers earlier (not sure when) and
store it in RelOptInfo or RangeTableEntry; that would be accesible
during setrefs.c.

The other bit I did was modify all the heaptuple.c code so that it could
deal correctly with tupledescs that have attphysnums and attlognum in an
order different from stock attnum. That took some time to get right,
but I think it's also correct now.

One issue I had was finding places that use "attnum" as an index into
the tupledesc "attrs" array. I had to examine all these places and
change them to use a "physattrs" array, which is an array that has been
sorted by physical number. I don't think all the changes are correct,
and I'm not sure that I caught them all.

Anyway it seems to me that this is "mostly there". If somebody is
interested in learning executor code, this project would be damn cool to
get done.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#52

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Alvaro Herrera (#51)

Re: logical column ordering

On 20.1.2015 22:30, Alvaro Herrera wrote:

I've decided to abandon this patch. I have spent too much time looking
at it now.

If anyone is interested in trying to study, I can provide the patches I
came up with, explanations, and references to prior discussion -- feel
free to ask.

I'll take look. Can you share the patches etc. - either here, or maybe
send it to me directly?

regards
Tomas

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#53

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Tomas Vondra (#52)

4 attachment(s)

Re: logical column ordering

Hi,

attached is the result of my first attempt to make the logical column
ordering patch work. This touches a lot of code in the executor that is
mostly new to me, so if you see something that looks like an obvious
bug, it probably is (so let me know).

improvements
------------
The main improvements of this version are that:

* initdb actually works (while before it was crashing)

* regression tests work, with two exceptions

(a) 'subselect' fails because EXPLAIN prints columns in physical order
(but we expect logical)

(b) col_order crashes works because of tuple descriptor mismatch in a
function call (this actually causes a segfault)

The main change is this patch is that tlist_matches_tupdesc() now checks
target list vs. physical attribute order, which may result in doing a
projection (in cases when that would not be done previously).

I don not claim this is the best approach - maybe it would be better to
keep the physical tuple and reorder it lazily. That's why I kept a few
pieces of code (fix_physno_mutator) and a few unused fields in Var.

Over the time I've heard various use cases for this patch, but in most
cases it was quite speculative. If you have an idea where this might be
useful, can you explain it here, or maybe point me to a place where it's
described?

There's also a few FIXMEs, mostly from Alvaro's version of the patch.
Some of them are probably obsolete, but I wasn't 100% sure by that so
I've left them in place until I understand the code sufficiently.

randomized testing
------------------
I've also attached a python script for simple randomized testing. Just
execute it like this:

$ python randomize-attlognum.py -t test_1 test_2 \
--init-script attlognum-init.sql \
--test-script attlognum-test.sql

and it will do this over and over

$ dropdb test
$ createdb test
$ run init script
$ randomly set attlognums for the tables (test_1 and test_2)
$ run test script

It does not actually check the result, but my experience is that when
there's a bug in handling the descriptor, it results in segfault pretty
fast (just put some varlena columns into the table).

plans / future
--------------
After discussing this with Alvaro, we've both agreed that this is far
too high-risk change to commit in the very last CF (even if it was in a
better shape). So while it's added to 2015-02 CF, we're aiming for 9.6
if things go well.

regards

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

logical-column-ordering.patchtext/x-diff; name=logical-column-ordering.patchDownload

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index c5892d3..7528724 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2368,6 +2368,9 @@ get_attnum_pk_pos(int *pkattnums, int pknumatts, int key)
 	return -1;
 }
 
+/*
+ * FIXME this probably needs to be tweaked.
+ */
 static HeapTuple
 get_tuple_of_interest(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals)
 {
diff --git a/contrib/spi/timetravel.c b/contrib/spi/timetravel.c
index 0699438..30e496c 100644
--- a/contrib/spi/timetravel.c
+++ b/contrib/spi/timetravel.c
@@ -314,6 +314,7 @@ timetravel(PG_FUNCTION_ARGS)
 		Oid		   *ctypes;
 		char		sql[8192];
 		char		separ = ' ';
+		Form_pg_attribute *attrs;
 
 		/* allocate ctypes for preparation */
 		ctypes = (Oid *) palloc(natts * sizeof(Oid));
@@ -322,10 +323,11 @@ timetravel(PG_FUNCTION_ARGS)
 		 * Construct query: INSERT INTO _relation_ VALUES ($1, ...)
 		 */
 		snprintf(sql, sizeof(sql), "INSERT INTO %s VALUES (", relname);
+		attrs = TupleDescGetLogSortedAttrs(tupdesc);
 		for (i = 1; i <= natts; i++)
 		{
-			ctypes[i - 1] = SPI_gettypeid(tupdesc, i);
-			if (!(tupdesc->attrs[i - 1]->attisdropped)) /* skip dropped columns */
+			ctypes[i - 1] = SPI_gettypeid(tupdesc, attrs[i - 1]->attnum);
+			if (!(attrs[i - 1]->attisdropped)) /* skip dropped columns */
 			{
 				snprintf(sql + strlen(sql), sizeof(sql) - strlen(sql), "%c$%d", separ, i);
 				separ = ',';
diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index 6cd4e8e..14787ce 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -79,6 +79,8 @@
 /*
  * heap_compute_data_size
  *		Determine size of the data area of a tuple to be constructed
+ *
+ * Note: input arrays must be in attnum order.
  */
 Size
 heap_compute_data_size(TupleDesc tupleDesc,
@@ -88,16 +90,23 @@ heap_compute_data_size(TupleDesc tupleDesc,
 	Size		data_length = 0;
 	int			i;
 	int			numberOfAttributes = tupleDesc->natts;
-	Form_pg_attribute *att = tupleDesc->attrs;
+	Form_pg_attribute *att = TupleDescGetPhysSortedAttrs(tupleDesc);
 
+	/*
+	 * We need to consider the attributes in physical order for storage, yet
+	 * our input arrays are in attnum order.  In this loop, "i" is an index
+	 * into the attphysnum-sorted attribute array, and idx is an index into the
+	 * input arrays.
+	 */
 	for (i = 0; i < numberOfAttributes; i++)
 	{
 		Datum		val;
+		int			idx = att[i]->attnum - 1;
 
-		if (isnull[i])
+		if (isnull[idx])
 			continue;
 
-		val = values[i];
+		val = values[idx];
 
 		if (ATT_IS_PACKABLE(att[i]) &&
 			VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
@@ -127,6 +136,8 @@ heap_compute_data_size(TupleDesc tupleDesc,
  * We also fill the null bitmap (if any) and set the infomask bits
  * that reflect the tuple's data contents.
  *
+ * NB - the input arrays must be in attnum order.
+ *
  * NOTE: it is now REQUIRED that the caller have pre-zeroed the data area.
  */
 void
@@ -139,7 +150,7 @@ heap_fill_tuple(TupleDesc tupleDesc,
 	int			bitmask;
 	int			i;
 	int			numberOfAttributes = tupleDesc->natts;
-	Form_pg_attribute *att = tupleDesc->attrs;
+	Form_pg_attribute *att = TupleDescGetPhysSortedAttrs(tupleDesc);
 
 #ifdef USE_ASSERT_CHECKING
 	char	   *start = data;
@@ -159,9 +170,17 @@ heap_fill_tuple(TupleDesc tupleDesc,
 
 	*infomask &= ~(HEAP_HASNULL | HEAP_HASVARWIDTH | HEAP_HASEXTERNAL);
 
+	/*
+	 * We need to consider the attributes in physical order for storage, yet
+	 * our input arrays are in attnum order.  In this loop, "i" is an index
+	 * into the attphysnum-sorted attribute array, and idx is an index into the
+	 * input arrays.
+	 */
 	for (i = 0; i < numberOfAttributes; i++)
 	{
 		Size		data_length;
+		Datum		value;
+		int			idx = att[i]->attnum - 1;
 
 		if (bit != NULL)
 		{
@@ -174,7 +193,7 @@ heap_fill_tuple(TupleDesc tupleDesc,
 				bitmask = 1;
 			}
 
-			if (isnull[i])
+			if (isnull[idx])
 			{
 				*infomask |= HEAP_HASNULL;
 				continue;
@@ -183,6 +202,8 @@ heap_fill_tuple(TupleDesc tupleDesc,
 			*bitP |= bitmask;
 		}
 
+		value = values[idx];
+
 		/*
 		 * XXX we use the att_align macros on the pointer value itself, not on
 		 * an offset.  This is a bit of a hack.
@@ -192,13 +213,13 @@ heap_fill_tuple(TupleDesc tupleDesc,
 		{
 			/* pass-by-value */
 			data = (char *) att_align_nominal(data, att[i]->attalign);
-			store_att_byval(data, values[i], att[i]->attlen);
+			store_att_byval(data, value, att[i]->attlen);
 			data_length = att[i]->attlen;
 		}
 		else if (att[i]->attlen == -1)
 		{
 			/* varlena */
-			Pointer		val = DatumGetPointer(values[i]);
+			Pointer		val = DatumGetPointer(value);
 
 			*infomask |= HEAP_HASVARWIDTH;
 			if (VARATT_IS_EXTERNAL(val))
@@ -236,8 +257,8 @@ heap_fill_tuple(TupleDesc tupleDesc,
 			/* cstring ... never needs alignment */
 			*infomask |= HEAP_HASVARWIDTH;
 			Assert(att[i]->attalign == 'c');
-			data_length = strlen(DatumGetCString(values[i])) + 1;
-			memcpy(data, DatumGetPointer(values[i]), data_length);
+			data_length = strlen(DatumGetCString(value)) + 1;
+			memcpy(data, DatumGetPointer(value), data_length);
 		}
 		else
 		{
@@ -245,7 +266,7 @@ heap_fill_tuple(TupleDesc tupleDesc,
 			data = (char *) att_align_nominal(data, att[i]->attalign);
 			Assert(att[i]->attlen > 0);
 			data_length = att[i]->attlen;
-			memcpy(data, DatumGetPointer(values[i]), data_length);
+			memcpy(data, DatumGetPointer(value), data_length);
 		}
 
 		data += data_length;
@@ -326,10 +347,12 @@ nocachegetattr(HeapTuple tuple,
 {
 	HeapTupleHeader tup = tuple->t_data;
 	Form_pg_attribute *att = tupleDesc->attrs;
+	Form_pg_attribute *physatt = TupleDescGetPhysSortedAttrs(tupleDesc);
 	char	   *tp;				/* ptr to data part of tuple */
 	bits8	   *bp = tup->t_bits;		/* ptr to null bitmap in tuple */
 	bool		slow = false;	/* do we have to walk attrs? */
 	int			off;			/* current offset within data */
+	int			attphysnum;
 
 	/* ----------------
 	 *	 Three cases:
@@ -340,7 +363,9 @@ nocachegetattr(HeapTuple tuple,
 	 * ----------------
 	 */
 
+	/* determine the indexes into the physical and regular attribute arrays */
 	attnum--;
+	attphysnum = att[attnum]->attphysnum - 1;
 
 	if (!HeapTupleNoNulls(tuple))
 	{
@@ -394,9 +419,9 @@ nocachegetattr(HeapTuple tuple,
 		{
 			int			j;
 
-			for (j = 0; j <= attnum; j++)
+			for (j = 0; j <= attphysnum; j++)
 			{
-				if (att[j]->attlen <= 0)
+				if (physatt[j]->attlen <= 1)
 				{
 					slow = true;
 					break;
@@ -419,24 +444,24 @@ nocachegetattr(HeapTuple tuple,
 		 * fixed-width columns, in hope of avoiding future visits to this
 		 * routine.
 		 */
-		att[0]->attcacheoff = 0;
+		physatt[0]->attcacheoff = 0;
 
 		/* we might have set some offsets in the slow path previously */
-		while (j < natts && att[j]->attcacheoff > 0)
+		while (j < natts && physatt[j]->attcacheoff > 0)
 			j++;
 
-		off = att[j - 1]->attcacheoff + att[j - 1]->attlen;
+		off = physatt[j - 1]->attcacheoff + physatt[j - 1]->attlen;
 
 		for (; j < natts; j++)
 		{
-			if (att[j]->attlen <= 0)
+			if (physatt[j]->attlen <= 0)
 				break;
 
-			off = att_align_nominal(off, att[j]->attalign);
+			off = att_align_nominal(off, physatt[j]->attalign);
 
-			att[j]->attcacheoff = off;
+			physatt[j]->attcacheoff = off;
 
-			off += att[j]->attlen;
+			off += physatt[j]->attlen;
 		}
 
 		Assert(j > attnum);
@@ -457,20 +482,21 @@ nocachegetattr(HeapTuple tuple,
 		 * then advance over the attr based on its length.  Nulls have no
 		 * storage and no alignment padding either.  We can use/set
 		 * attcacheoff until we reach either a null or a var-width attribute.
+		 * "i" is an index into the attphysnum-ordered array here.
 		 */
 		off = 0;
 		for (i = 0;; i++)		/* loop exit is at "break" */
 		{
-			if (HeapTupleHasNulls(tuple) && att_isnull(i, bp))
+			if (HeapTupleHasNulls(tuple) && att_isnull(physatt[i]->attnum - 1, bp))
 			{
 				usecache = false;
 				continue;		/* this cannot be the target att */
 			}
 
 			/* If we know the next offset, we can skip the rest */
-			if (usecache && att[i]->attcacheoff >= 0)
-				off = att[i]->attcacheoff;
-			else if (att[i]->attlen == -1)
+			if (usecache && physatt[i]->attcacheoff >= 0)
+				off = physatt[i]->attcacheoff;
+			else if (physatt[i]->attlen == -1)
 			{
 				/*
 				 * We can only cache the offset for a varlena attribute if the
@@ -479,11 +505,11 @@ nocachegetattr(HeapTuple tuple,
 				 * either an aligned or unaligned value.
 				 */
 				if (usecache &&
-					off == att_align_nominal(off, att[i]->attalign))
-					att[i]->attcacheoff = off;
+					off == att_align_nominal(off, physatt[i]->attalign))
+					physatt[i]->attcacheoff = off;
 				else
 				{
-					off = att_align_pointer(off, att[i]->attalign, -1,
+					off = att_align_pointer(off, physatt[i]->attalign, -1,
 											tp + off);
 					usecache = false;
 				}
@@ -491,18 +517,19 @@ nocachegetattr(HeapTuple tuple,
 			else
 			{
 				/* not varlena, so safe to use att_align_nominal */
-				off = att_align_nominal(off, att[i]->attalign);
+				off = att_align_nominal(off, physatt[i]->attalign);
 
 				if (usecache)
-					att[i]->attcacheoff = off;
+					physatt[i]->attcacheoff = off;
 			}
 
-			if (i == attnum)
+			/* if this this is our attribute, we're done */
+			if (physatt[i]->attnum - 1 == attnum)
 				break;
 
-			off = att_addlength_pointer(off, att[i]->attlen, tp + off);
+			off = att_addlength_pointer(off, physatt[i]->attlen, tp + off);
 
-			if (usecache && att[i]->attlen <= 0)
+			if (usecache && physatt[i]->attlen <= 0)
 				usecache = false;
 		}
 	}
@@ -657,6 +684,8 @@ heap_copy_tuple_as_datum(HeapTuple tuple, TupleDesc tupleDesc)
  *		construct a tuple from the given values[] and isnull[] arrays,
  *		which are of the length indicated by tupleDescriptor->natts
  *
+ *		The input arrays must always be in attnum order.
+ *
  * The result is allocated in the current memory context.
  */
 HeapTuple
@@ -889,7 +918,8 @@ heap_modifytuple(HeapTuple tuple,
 /*
  * heap_deform_tuple
  *		Given a tuple, extract data into values/isnull arrays; this is
- *		the inverse of heap_form_tuple.
+ *		the inverse of heap_form_tuple.  Like that routine, the output
+ *		arrays are sorted in attnum order.
  *
  *		Storage for the values/isnull arrays is provided by the caller;
  *		it should be sized according to tupleDesc->natts not
@@ -909,10 +939,10 @@ heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc,
 {
 	HeapTupleHeader tup = tuple->t_data;
 	bool		hasnulls = HeapTupleHasNulls(tuple);
-	Form_pg_attribute *att = tupleDesc->attrs;
+	Form_pg_attribute *att = TupleDescGetPhysSortedAttrs(tupleDesc);
 	int			tdesc_natts = tupleDesc->natts;
 	int			natts;			/* number of atts to extract */
-	int			attnum;
+	int			i;
 	char	   *tp;				/* ptr to tuple data */
 	long		off;			/* offset in tuple data */
 	bits8	   *bp = tup->t_bits;		/* ptr to null bitmap in tuple */
@@ -931,9 +961,10 @@ heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc,
 
 	off = 0;
 
-	for (attnum = 0; attnum < natts; attnum++)
+	for (i = 0; i < natts; i++)
 	{
-		Form_pg_attribute thisatt = att[attnum];
+		Form_pg_attribute thisatt = att[i];
+		int		attnum = thisatt->attnum - 1;
 
 		if (hasnulls && att_isnull(attnum, bp))
 		{
@@ -983,13 +1014,14 @@ heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc,
 	}
 
 	/*
-	 * If tuple doesn't have all the atts indicated by tupleDesc, read the
-	 * rest as null
+	 * Read remaining attributes as nulls
+	 *
+	 * XXX think this through ...
 	 */
-	for (; attnum < tdesc_natts; attnum++)
+	for (; i < tdesc_natts; i++)
 	{
-		values[attnum] = (Datum) 0;
-		isnull[attnum] = true;
+		values[i] = (Datum) 0;
+		isnull[i] = true;
 	}
 }
 
@@ -1042,10 +1074,9 @@ heap_deformtuple(HeapTuple tuple,
  *		This is essentially an incremental version of heap_deform_tuple:
  *		on each call we extract attributes up to the one needed, without
  *		re-computing information about previously extracted attributes.
- *		slot->tts_nvalid is the number of attributes already extracted.
  */
 static void
-slot_deform_tuple(TupleTableSlot *slot, int natts)
+slot_deform_tuple(TupleTableSlot *slot, AttrNumber natts)
 {
 	HeapTuple	tuple = slot->tts_tuple;
 	TupleDesc	tupleDesc = slot->tts_tupleDescriptor;
@@ -1054,6 +1085,8 @@ slot_deform_tuple(TupleTableSlot *slot, int natts)
 	HeapTupleHeader tup = tuple->t_data;
 	bool		hasnulls = HeapTupleHasNulls(tuple);
 	Form_pg_attribute *att = tupleDesc->attrs;
+	Form_pg_attribute *physatt = TupleDescGetPhysSortedAttrs(tupleDesc);
+	int			maxphysnum;
 	int			attnum;
 	char	   *tp;				/* ptr to tuple data */
 	long		off;			/* offset in tuple data */
@@ -1064,15 +1097,16 @@ slot_deform_tuple(TupleTableSlot *slot, int natts)
 	 * Check whether the first call for this tuple, and initialize or restore
 	 * loop state.
 	 */
-	attnum = slot->tts_nvalid;
-	if (attnum == 0)
+	if (slot->tts_nphysvalid == 0)
 	{
+		Assert(slot->tts_nvalid == 0);
 		/* Start from the first attribute */
 		off = 0;
 		slow = false;
 	}
 	else
 	{
+		Assert(slot->tts_nvalid != 0);
 		/* Restore state from previous execution */
 		off = slot->tts_off;
 		slow = slot->tts_slow;
@@ -1080,19 +1114,74 @@ slot_deform_tuple(TupleTableSlot *slot, int natts)
 
 	tp = (char *) tup + tup->t_hoff;
 
-	for (; attnum < natts; attnum++)
+	/*
+	 * Scan the attribute array to determine the maximum physical position that
+	 * we need to extract.  Start from the position next to the one that was
+	 * last extracted.
+	 */
+	maxphysnum = slot->tts_nphysvalid;
+	for (attnum = slot->tts_nvalid + 1; attnum <= natts; attnum++)
+	{
+		if (att[attnum - 1]->attphysnum > maxphysnum)
+			maxphysnum = tupleDesc->attrs[attnum - 1]->attphysnum;
+	}
+
+	/*
+	 * Now walk the physical-order attribute array to decode up the point so
+	 * determined, starting from the element one past the one we already
+	 * extracted.
+	 */
+	for (attnum = slot->tts_nphysvalid + 1; attnum <= maxphysnum; attnum++)
 	{
-		Form_pg_attribute thisatt = att[attnum];
+		int	i;
+		Form_pg_attribute thisatt;
+		int			thisattnum = -1;
 
-		if (hasnulls && att_isnull(attnum, bp))
+		/*
+		 * We need to find all physical attributes between the one we fetched
+		 * previously (tts_nphysvalid) and the one we need now (maxphysnum).
+		 * There might be 'holes' when requesting virtual tuples (e.g. a single
+		 * attribute with attphysnum = 100) so we'll walk through physatt to
+		 * find the proper attribute. If we don't find it we skip to the next
+		 * physattnum.
+		 *
+		 * FIXME This is a bit ugly, because TupleDescGetPhysSortedAttrs does
+		 *       not expect holes, so the whole idea of direct indexing into
+		 *       physatt is not working here. The current solution is rather
+		 *       straight-forward and probably not very efficient - simply
+		 *       skip those physnum values not present in the tuple descriptor.
+		 *
+		 *       There are a few ways to fix this: e.g. building the physatt
+		 *       in an 'expanded' form, including missing attributes, which
+		 *       would allow direct indexing (but what will happen at the
+		 *       places that don't expect this?).
+		 */
+
+		for (i = 0; i < tupleDesc->natts; i++)
 		{
-			values[attnum] = (Datum) 0;
-			isnull[attnum] = true;
+			if (physatt[i]->attphysnum == attnum)
+			{
+				thisatt = physatt[i];
+				thisattnum = thisatt->attnum - 1;
+				break;
+			}
+		}
+
+		/* skip attphysnum not present in the virtual tuple */
+		if (thisattnum == -1)
+			continue;
+
+		Assert(thisatt->attphysnum == attnum);
+
+		if (hasnulls && att_isnull(thisattnum, bp))
+		{
+			values[thisattnum] = (Datum) 0;
+			isnull[thisattnum] = true;
 			slow = true;		/* can't use attcacheoff anymore */
 			continue;
 		}
 
-		isnull[attnum] = false;
+		isnull[thisattnum] = false;
 
 		if (!slow && thisatt->attcacheoff >= 0)
 			off = thisatt->attcacheoff;
@@ -1123,7 +1212,7 @@ slot_deform_tuple(TupleTableSlot *slot, int natts)
 				thisatt->attcacheoff = off;
 		}
 
-		values[attnum] = fetchatt(thisatt, tp + off);
+		values[thisattnum] = fetchatt(thisatt, tp + off);
 
 		off = att_addlength_pointer(off, thisatt->attlen, tp + off);
 
@@ -1132,9 +1221,16 @@ slot_deform_tuple(TupleTableSlot *slot, int natts)
 	}
 
 	/*
+	 * XXX could we scan further and move tts_nvalid a bit higher, without
+	 * decoding further?  Scanning in physical order might have extracted more
+	 * attributes than what was requested.
+	 */
+
+	/*
 	 * Save state for next execution
 	 */
-	slot->tts_nvalid = attnum;
+	slot->tts_nphysvalid = maxphysnum;
+	slot->tts_nvalid = natts;
 	slot->tts_off = off;
 	slot->tts_slow = slow;
 }
@@ -1152,7 +1248,7 @@ slot_deform_tuple(TupleTableSlot *slot, int natts)
  *		when the physical tuple is longer than the tupdesc.
  */
 Datum
-slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
+slot_getattr(TupleTableSlot *slot, AttrNumber attnum, bool *isnull)
 {
 	HeapTuple	tuple = slot->tts_tuple;
 	TupleDesc	tupleDesc = slot->tts_tupleDescriptor;
@@ -1170,6 +1266,9 @@ slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
 		return heap_getsysattr(tuple, attnum, tupleDesc, isnull);
 	}
 
+	/* This only gets attributes by attnum, never by physnum or lognum. */
+	Assert(tupleDesc->attrs[attnum - 1]->attnum == attnum);
+
 	/*
 	 * fast path if desired attribute already cached
 	 */
@@ -1250,7 +1349,8 @@ slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
 void
 slot_getallattrs(TupleTableSlot *slot)
 {
-	int			tdesc_natts = slot->tts_tupleDescriptor->natts;
+	TupleDesc	tupdesc = slot->tts_tupleDescriptor;
+	int			tdesc_natts = tupdesc->natts;
 	int			attnum;
 	HeapTuple	tuple;
 
@@ -1277,6 +1377,8 @@ slot_getallattrs(TupleTableSlot *slot)
 	/*
 	 * If tuple doesn't have all the atts indicated by tupleDesc, read the
 	 * rest as null
+	 *
+	 * FIXME -- needs actual thought
 	 */
 	for (; attnum < tdesc_natts; attnum++)
 	{
@@ -1314,16 +1416,21 @@ slot_getsomeattrs(TupleTableSlot *slot, int attnum)
 		elog(ERROR, "cannot extract attribute from empty tuple slot");
 
 	/*
-	 * load up any slots available from physical tuple
+	 * make sure we don't try to fetch anything past the end of the physical tuple
 	 */
 	attno = HeapTupleHeaderGetNatts(tuple->t_data);
 	attno = Min(attno, attnum);
 
+	/*
+	 * load up any slots available from physical tuple
+	 */
 	slot_deform_tuple(slot, attno);
 
 	/*
 	 * If tuple doesn't have all the atts indicated by tupleDesc, read the
 	 * rest as null
+	 *
+	 * FIXME -- this needs some actual thought
 	 */
 	for (; attno < attnum; attno++)
 	{
@@ -1392,7 +1499,8 @@ heap_freetuple(HeapTuple htup)
 /*
  * heap_form_minimal_tuple
  *		construct a MinimalTuple from the given values[] and isnull[] arrays,
- *		which are of the length indicated by tupleDescriptor->natts
+ *		which are of the length indicated by tupleDescriptor->natts.  The input
+ *		arrays must always be sorted in attnum order.
  *
  * This is exactly like heap_form_tuple() except that the result is a
  * "minimal" tuple lacking a HeapTupleData header as well as room for system
diff --git a/src/backend/access/common/printtup.c b/src/backend/access/common/printtup.c
index baed981..7482812 100644
--- a/src/backend/access/common/printtup.c
+++ b/src/backend/access/common/printtup.c
@@ -199,6 +199,10 @@ SendRowDescriptionMessage(TupleDesc typeinfo, List *targetlist, int16 *formats)
 	pq_beginmessage(&buf, 'T'); /* tuple descriptor message type */
 	pq_sendint(&buf, natts, 2); /* # of attrs in tuples */
 
+	/*
+	 * The attributes in the slot's descriptor are already in logical order;
+	 * we don't editorialize on the ordering here.
+	 */
 	for (i = 0; i < natts; ++i)
 	{
 		Oid			atttypid = attrs[i]->atttypid;
@@ -248,6 +252,8 @@ SendRowDescriptionMessage(TupleDesc typeinfo, List *targetlist, int16 *formats)
 
 /*
  * Get the lookup info that printtup() needs
+ *
+ * The resulting array is indexed by attnum.
  */
 static void
 printtup_prepare_info(DR_printtup *myState, TupleDesc typeinfo, int numAttrs)
@@ -327,7 +333,8 @@ printtup(TupleTableSlot *slot, DestReceiver *self)
 	pq_sendint(&buf, natts, 2);
 
 	/*
-	 * send the attributes of this tuple
+	 * Send the attributes of this tuple.  Note the attributes of the slot's
+	 * descriptor are already in the correct output order. (XXX which is ...)
 	 */
 	for (i = 0; i < natts; ++i)
 	{
@@ -430,7 +437,8 @@ printtup_20(TupleTableSlot *slot, DestReceiver *self)
 		pq_sendint(&buf, j, 1);
 
 	/*
-	 * send the attributes of this tuple
+	 * Send the attributes of this tuple.  Note the attributes of the slot's
+	 * descriptor are already in logical order.
 	 */
 	for (i = 0; i < natts; ++i)
 	{
@@ -517,7 +525,8 @@ debugStartup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	int			i;
 
 	/*
-	 * show the return type of the tuples
+	 * Show the return type of the tuples.  Note the attributes of the slot's
+	 * descriptor are already in logical order.
 	 */
 	for (i = 0; i < natts; ++i)
 		printatt((unsigned) i + 1, attinfo[i], NULL);
@@ -533,6 +542,7 @@ debugtup(TupleTableSlot *slot, DestReceiver *self)
 {
 	TupleDesc	typeinfo = slot->tts_tupleDescriptor;
 	int			natts = typeinfo->natts;
+	Form_pg_attribute attrib;
 	int			i;
 	Datum		attr;
 	char	   *value;
@@ -540,17 +550,21 @@ debugtup(TupleTableSlot *slot, DestReceiver *self)
 	Oid			typoutput;
 	bool		typisvarlena;
 
+	/*
+	 * Send the attributes of this tuple.  Note the attributes of the slot's
+	 * descriptor are already in logical order.
+	 */
 	for (i = 0; i < natts; ++i)
 	{
+		attrib = typeinfo->attrs[i];
 		attr = slot_getattr(slot, i + 1, &isnull);
 		if (isnull)
 			continue;
-		getTypeOutputInfo(typeinfo->attrs[i]->atttypid,
-						  &typoutput, &typisvarlena);
+		getTypeOutputInfo(attrib->atttypid, &typoutput, &typisvarlena);
 
 		value = OidOutputFunctionCall(typoutput, attr);
 
-		printatt((unsigned) i + 1, typeinfo->attrs[i], value);
+		printatt((unsigned) i + 1, attrib, value);
 	}
 	printf("\t----\n");
 }
@@ -612,7 +626,8 @@ printtup_internal_20(TupleTableSlot *slot, DestReceiver *self)
 		pq_sendint(&buf, j, 1);
 
 	/*
-	 * send the attributes of this tuple
+	 * Send the attributes of this tuple.  Note the attributes of the slot's
+	 * descriptor are already in logical order.
 	 */
 	for (i = 0; i < natts; ++i)
 	{
diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index 41d71c8..924f4af 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -25,6 +25,7 @@
 #include "parser/parse_type.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/memutils.h"
 #include "utils/resowner_private.h"
 #include "utils/syscache.h"
 
@@ -87,6 +88,8 @@ CreateTemplateTupleDesc(int natts, bool hasoid)
 	 * Initialize other fields of the tupdesc.
 	 */
 	desc->natts = natts;
+	desc->logattrs = NULL;
+	desc->physattrs = NULL;
 	desc->constr = NULL;
 	desc->tdtypeid = RECORDOID;
 	desc->tdtypmod = -1;
@@ -120,6 +123,8 @@ CreateTupleDesc(int natts, bool hasoid, Form_pg_attribute *attrs)
 	desc = (TupleDesc) palloc(sizeof(struct tupleDesc));
 	desc->attrs = attrs;
 	desc->natts = natts;
+	desc->logattrs = NULL;
+	desc->physattrs = NULL;
 	desc->constr = NULL;
 	desc->tdtypeid = RECORDOID;
 	desc->tdtypmod = -1;
@@ -154,6 +159,9 @@ CreateTupleDescCopy(TupleDesc tupdesc)
 	desc->tdtypeid = tupdesc->tdtypeid;
 	desc->tdtypmod = tupdesc->tdtypmod;
 
+	Assert(desc->logattrs == NULL);
+	Assert(desc->physattrs == NULL);
+
 	return desc;
 }
 
@@ -251,11 +259,16 @@ TupleDescCopyEntry(TupleDesc dst, AttrNumber dstAttno,
 	 * bit to avoid a useless O(N^2) penalty.
 	 */
 	dst->attrs[dstAttno - 1]->attnum = dstAttno;
+	dst->attrs[dstAttno - 1]->attlognum = dstAttno;
 	dst->attrs[dstAttno - 1]->attcacheoff = -1;
 
 	/* since we're not copying constraints or defaults, clear these */
 	dst->attrs[dstAttno - 1]->attnotnull = false;
 	dst->attrs[dstAttno - 1]->atthasdef = false;
+
+	/* Reset the new entry physical and logical position too */
+	dst->attrs[dstAttno - 1]->attphysnum = dstAttno;
+	dst->attrs[dstAttno - 1]->attlognum = dstAttno;
 }
 
 /*
@@ -301,6 +314,11 @@ FreeTupleDesc(TupleDesc tupdesc)
 		pfree(tupdesc->constr);
 	}
 
+	if (tupdesc->logattrs)
+		pfree(tupdesc->logattrs);
+	if (tupdesc->physattrs)
+		pfree(tupdesc->physattrs);
+
 	pfree(tupdesc);
 }
 
@@ -345,7 +363,7 @@ DecrTupleDescRefCount(TupleDesc tupdesc)
  * Note: we deliberately do not check the attrelid and tdtypmod fields.
  * This allows typcache.c to use this routine to see if a cached record type
  * matches a requested type, and is harmless for relcache.c's uses.
- * We don't compare tdrefcount, either.
+ * We don't compare tdrefcount nor logattrs, either.
  */
 bool
 equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
@@ -386,6 +404,12 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
 			return false;
 		if (attr1->attlen != attr2->attlen)
 			return false;
+#if 0
+		if (attr1->attphysnum != attr2->attphysnum)
+			return false;
+		if (attr1->attlognum != attr2->attlognum)
+			return false;
+#endif
 		if (attr1->attndims != attr2->attndims)
 			return false;
 		if (attr1->atttypmod != attr2->atttypmod)
@@ -529,6 +553,8 @@ TupleDescInitEntry(TupleDesc desc,
 	att->atttypmod = typmod;
 
 	att->attnum = attributeNumber;
+	att->attphysnum = attributeNumber;
+	att->attlognum = attributeNumber;
 	att->attndims = attdim;
 
 	att->attnotnull = false;
@@ -574,6 +600,24 @@ TupleDescInitEntryCollation(TupleDesc desc,
 	desc->attrs[attributeNumber - 1]->attcollation = collationid;
 }
 
+/*
+ * Assign a nondefault attphysnum to a previously initialized tuple descriptor
+ * entry.
+ */
+void
+TupleDescInitEntryPhysicalPosition(TupleDesc desc,
+								   AttrNumber attributeNumber,
+								   AttrNumber attphysnum)
+{
+	/*
+	 * sanity checks
+	 */
+	AssertArg(PointerIsValid(desc));
+	AssertArg(attributeNumber >= 1);
+	AssertArg(attributeNumber <= desc->natts);
+
+	desc->attrs[attributeNumber - 1]->attphysnum = attphysnum;
+}
 
 /*
  * BuildDescForRelation
@@ -666,6 +710,9 @@ BuildDescForRelation(List *schema)
 		desc->constr = NULL;
 	}
 
+	Assert(desc->logattrs == NULL);
+	Assert(desc->physattrs == NULL);
+
 	return desc;
 }
 
@@ -726,5 +773,78 @@ BuildDescFromLists(List *names, List *types, List *typmods, List *collations)
 		TupleDescInitEntryCollation(desc, attnum, attcollation);
 	}
 
+	Assert(desc->logattrs == NULL);
+	Assert(desc->physattrs == NULL);
 	return desc;
 }
+
+/*
+ * qsort callback for TupleDescGetSortedAttrs
+ */
+static int
+cmplognum(const void *attr1, const void *attr2)
+{
+	Form_pg_attribute	att1 = *(Form_pg_attribute *) attr1;
+	Form_pg_attribute	att2 = *(Form_pg_attribute *) attr2;
+
+	if (att1->attlognum < att2->attlognum)
+		return -1;
+	if (att1->attlognum > att2->attlognum)
+		return 1;
+	return 0;
+}
+
+static int
+cmpphysnum(const void *attr1, const void *attr2)
+{
+	Form_pg_attribute	att1 = *(Form_pg_attribute *) attr1;
+	Form_pg_attribute	att2 = *(Form_pg_attribute *) attr2;
+
+	if (att1->attphysnum < att2->attphysnum)
+		return -1;
+	if (att1->attphysnum > att2->attphysnum)
+		return 1;
+	return 0;
+}
+
+static inline Form_pg_attribute *
+tupdescSortAttrs(TupleDesc desc,
+				 int (*cmpfn)(const void *, const void *))
+{
+	Form_pg_attribute *attrs;
+	Size	size = sizeof(Form_pg_attribute) * desc->natts;
+
+	/*
+	 * The attribute arrays must be allocated in the same memcxt as the tupdesc
+	 * they belong to, so that they aren't reset ahead of time.
+	 */
+	attrs = MemoryContextAlloc(GetMemoryChunkContext(desc), size);
+	memcpy(attrs, desc->attrs, size);
+	qsort(attrs, desc->natts, sizeof(Form_pg_attribute), cmpfn);
+
+	return attrs;
+}
+
+/*
+ * Return the array of attrs sorted by logical position
+ */
+Form_pg_attribute *
+TupleDescGetLogSortedAttrs(TupleDesc desc)
+{
+	if (desc->logattrs == NULL)
+		desc->logattrs = tupdescSortAttrs(desc, cmplognum);
+
+	return desc->logattrs;
+}
+
+/*
+ * Return the array of attrs sorted by physical position
+ */
+Form_pg_attribute *
+TupleDescGetPhysSortedAttrs(TupleDesc desc)
+{
+	if (desc->physattrs == NULL)
+		desc->physattrs = tupdescSortAttrs(desc, cmpphysnum);
+
+	return desc->physattrs;
+}
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 8464e87..d2d547b 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -690,8 +690,8 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 	 * Look for attributes with attstorage 'x' to compress.  Also find large
 	 * attributes with attstorage 'x' or 'e', and store them external.
 	 */
-	while (heap_compute_data_size(tupleDesc,
-								  toast_values, toast_isnull) > maxDataLen)
+	while (heap_compute_data_size(tupleDesc, toast_values,
+								  toast_isnull) > maxDataLen)
 	{
 		int			biggest_attno = -1;
 		int32		biggest_size = MAXALIGN(TOAST_POINTER_SIZE);
@@ -780,8 +780,8 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 	 * Second we look for attributes of attstorage 'x' or 'e' that are still
 	 * inline.  But skip this if there's no toast table to push them to.
 	 */
-	while (heap_compute_data_size(tupleDesc,
-								  toast_values, toast_isnull) > maxDataLen &&
+	while (heap_compute_data_size(tupleDesc, toast_values,
+								  toast_isnull) > maxDataLen &&
 		   rel->rd_rel->reltoastrelid != InvalidOid)
 	{
 		int			biggest_attno = -1;
@@ -831,8 +831,8 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 	 * Round 3 - this time we take attributes with storage 'm' into
 	 * compression
 	 */
-	while (heap_compute_data_size(tupleDesc,
-								  toast_values, toast_isnull) > maxDataLen)
+	while (heap_compute_data_size(tupleDesc, toast_values,
+								  toast_isnull) > maxDataLen)
 	{
 		int			biggest_attno = -1;
 		int32		biggest_size = MAXALIGN(TOAST_POINTER_SIZE);
@@ -894,8 +894,8 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 	 */
 	maxDataLen = TOAST_TUPLE_TARGET_MAIN - hoff;
 
-	while (heap_compute_data_size(tupleDesc,
-								  toast_values, toast_isnull) > maxDataLen &&
+	while (heap_compute_data_size(tupleDesc, toast_values,
+								  toast_isnull) > maxDataLen &&
 		   rel->rd_rel->reltoastrelid != InvalidOid)
 	{
 		int			biggest_attno = -1;
@@ -969,8 +969,8 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 		if (olddata->t_infomask & HEAP_HASOID)
 			new_header_len += sizeof(Oid);
 		new_header_len = MAXALIGN(new_header_len);
-		new_data_len = heap_compute_data_size(tupleDesc,
-											  toast_values, toast_isnull);
+		new_data_len = heap_compute_data_size(tupleDesc, toast_values,
+											  toast_isnull);
 		new_tuple_len = new_header_len + new_data_len;
 
 		/*
@@ -1202,8 +1202,8 @@ toast_flatten_tuple_to_datum(HeapTupleHeader tup,
 	if (tup->t_infomask & HEAP_HASOID)
 		new_header_len += sizeof(Oid);
 	new_header_len = MAXALIGN(new_header_len);
-	new_data_len = heap_compute_data_size(tupleDesc,
-										  toast_values, toast_isnull);
+	new_data_len = heap_compute_data_size(tupleDesc, toast_values,
+										  toast_isnull);
 	new_tuple_len = new_header_len + new_data_len;
 
 	new_data = (HeapTupleHeader) palloc0(new_tuple_len);
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index ad49964..5f1f288 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -658,7 +658,9 @@ DefineAttr(char *name, char *type, int attnum, int nullness)
 
 	namestrcpy(&attrtypes[attnum]->attname, name);
 	elog(DEBUG4, "column %s %s", NameStr(attrtypes[attnum]->attname), type);
-	attrtypes[attnum]->attnum = attnum + 1;		/* fillatt */
+	attrtypes[attnum]->attnum = attnum + 1;
+	attrtypes[attnum]->attphysnum = attnum + 1;
+	attrtypes[attnum]->attlognum = attnum + 1;
 
 	typeoid = gettype(type);
 
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index a5c78ee..fcc12ab 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -217,6 +217,8 @@ foreach my $catname (@{ $catalogs->{names} })
 				$attnum++;
 				my $row = emit_pgattr_row($table_name, $attr, $priornotnull);
 				$row->{attnum}        = $attnum;
+				$row->{attphysnum}    = $attnum;
+				$row->{attlognum}     = $attnum;
 				$row->{attstattarget} = '-1';
 				$priornotnull &= ($row->{attnotnull} eq 't');
 
@@ -254,6 +256,8 @@ foreach my $catname (@{ $catalogs->{names} })
 					$attnum--;
 					my $row = emit_pgattr_row($table_name, $attr, 1);
 					$row->{attnum}        = $attnum;
+					$row->{attphysnum}    = $attnum;
+					$row->{attlognum}     = $attnum;
 					$row->{attstattarget} = '0';
 
 					# some catalogs don't have oids
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 17f7266..43466c4 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -136,37 +136,49 @@ static List *insert_ordered_unique_oid(List *list, Oid datum);
 
 static FormData_pg_attribute a1 = {
 	0, {"ctid"}, TIDOID, 0, sizeof(ItemPointerData),
-	SelfItemPointerAttributeNumber, 0, -1, -1,
+	SelfItemPointerAttributeNumber, SelfItemPointerAttributeNumber,
+	SelfItemPointerAttributeNumber,
+	0, -1, -1,
 	false, 'p', 's', true, false, false, true, 0
 };
 
 static FormData_pg_attribute a2 = {
 	0, {"oid"}, OIDOID, 0, sizeof(Oid),
-	ObjectIdAttributeNumber, 0, -1, -1,
+	ObjectIdAttributeNumber, ObjectIdAttributeNumber,
+	ObjectIdAttributeNumber,
+	0, -1, -1,
 	true, 'p', 'i', true, false, false, true, 0
 };
 
 static FormData_pg_attribute a3 = {
 	0, {"xmin"}, XIDOID, 0, sizeof(TransactionId),
-	MinTransactionIdAttributeNumber, 0, -1, -1,
+	MinTransactionIdAttributeNumber, MinTransactionIdAttributeNumber,
+	MinTransactionIdAttributeNumber,
+	0, -1, -1,
 	true, 'p', 'i', true, false, false, true, 0
 };
 
 static FormData_pg_attribute a4 = {
 	0, {"cmin"}, CIDOID, 0, sizeof(CommandId),
-	MinCommandIdAttributeNumber, 0, -1, -1,
+	MinCommandIdAttributeNumber, MinCommandIdAttributeNumber,
+	MinCommandIdAttributeNumber,
+	0, -1, -1,
 	true, 'p', 'i', true, false, false, true, 0
 };
 
 static FormData_pg_attribute a5 = {
 	0, {"xmax"}, XIDOID, 0, sizeof(TransactionId),
-	MaxTransactionIdAttributeNumber, 0, -1, -1,
+	MaxTransactionIdAttributeNumber, MaxTransactionIdAttributeNumber,
+	MaxTransactionIdAttributeNumber,
+	0, -1, -1,
 	true, 'p', 'i', true, false, false, true, 0
 };
 
 static FormData_pg_attribute a6 = {
 	0, {"cmax"}, CIDOID, 0, sizeof(CommandId),
-	MaxCommandIdAttributeNumber, 0, -1, -1,
+	MaxCommandIdAttributeNumber, MaxCommandIdAttributeNumber,
+	MaxCommandIdAttributeNumber,
+	0, -1, -1,
 	true, 'p', 'i', true, false, false, true, 0
 };
 
@@ -178,7 +190,9 @@ static FormData_pg_attribute a6 = {
  */
 static FormData_pg_attribute a7 = {
 	0, {"tableoid"}, OIDOID, 0, sizeof(Oid),
-	TableOidAttributeNumber, 0, -1, -1,
+	TableOidAttributeNumber, TableOidAttributeNumber,
+	TableOidAttributeNumber,
+	0, -1, -1,
 	true, 'p', 'i', true, false, false, true, 0
 };
 
@@ -615,6 +629,8 @@ InsertPgAttributeTuple(Relation pg_attribute_rel,
 	values[Anum_pg_attribute_attstattarget - 1] = Int32GetDatum(new_attribute->attstattarget);
 	values[Anum_pg_attribute_attlen - 1] = Int16GetDatum(new_attribute->attlen);
 	values[Anum_pg_attribute_attnum - 1] = Int16GetDatum(new_attribute->attnum);
+	values[Anum_pg_attribute_attphysnum - 1] = Int16GetDatum(new_attribute->attphysnum);
+	values[Anum_pg_attribute_attlognum - 1] = Int16GetDatum(new_attribute->attlognum);
 	values[Anum_pg_attribute_attndims - 1] = Int32GetDatum(new_attribute->attndims);
 	values[Anum_pg_attribute_attcacheoff - 1] = Int32GetDatum(new_attribute->attcacheoff);
 	values[Anum_pg_attribute_atttypmod - 1] = Int32GetDatum(new_attribute->atttypmod);
@@ -2174,6 +2190,7 @@ AddRelationNewConstraints(Relation rel,
 	foreach(cell, newColDefaults)
 	{
 		RawColumnDefault *colDef = (RawColumnDefault *) lfirst(cell);
+		/* FIXME -- does this need to change? apparently not, but it's suspicious */
 		Form_pg_attribute atp = rel->rd_att->attrs[colDef->attnum - 1];
 
 		expr = cookDefault(pstate, colDef->raw_default,
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f85ed93..5df7302 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -348,6 +348,8 @@ ConstructTupleDescriptor(Relation heapRelation,
 			 * attr
 			 */
 			to->attnum = i + 1;
+			to->attlognum = i + 1;
+			to->attphysnum = i + 1;
 
 			to->attstattarget = -1;
 			to->attcacheoff = -1;
@@ -382,6 +384,8 @@ ConstructTupleDescriptor(Relation heapRelation,
 			 * Assign some of the attributes values. Leave the rest as 0.
 			 */
 			to->attnum = i + 1;
+			to->attlognum = i + 1;
+			to->attphysnum = i + 1;
 			to->atttypid = keyType;
 			to->attlen = typeTup->typlen;
 			to->attbyval = typeTup->typbyval;
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 92ff632..fc60a6a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -158,7 +158,7 @@ typedef struct CopyStateData
 	bool		file_has_oids;
 	FmgrInfo	oid_in_function;
 	Oid			oid_typioparam;
-	FmgrInfo   *in_functions;	/* array of input functions for each attrs */
+	FmgrInfo   *in_functions;	/* array of input functions for each attr */
 	Oid		   *typioparams;	/* array of element types for in_functions */
 	int		   *defmap;			/* array of default att numbers */
 	ExprState **defexprs;		/* array of default att expressions */
@@ -4319,7 +4319,7 @@ CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist)
 	if (attnamelist == NIL)
 	{
 		/* Generate default column list */
-		Form_pg_attribute *attr = tupDesc->attrs;
+		Form_pg_attribute *attr = TupleDescGetLogSortedAttrs(tupDesc);
 		int			attr_count = tupDesc->natts;
 		int			i;
 
@@ -4327,7 +4327,7 @@ CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist)
 		{
 			if (attr[i]->attisdropped)
 				continue;
-			attnums = lappend_int(attnums, i + 1);
+			attnums = lappend_int(attnums, attr[i]->attnum);
 		}
 	}
 	else
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f5d5b63..a646b8c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1484,7 +1484,8 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 		TupleDesc	tupleDesc;
 		TupleConstr *constr;
 		AttrNumber *newattno;
-		AttrNumber	parent_attno;
+		AttrNumber	parent_colctr;
+		Form_pg_attribute *parent_attrs;
 
 		/*
 		 * A self-exclusive lock is needed here.  If two backends attempt to
@@ -1541,6 +1542,7 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 			parentsWithOids++;
 
 		tupleDesc = RelationGetDescr(relation);
+		parent_attrs = TupleDescGetLogSortedAttrs(tupleDesc);
 		constr = tupleDesc->constr;
 
 		/*
@@ -1551,10 +1553,17 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 		newattno = (AttrNumber *)
 			palloc0(tupleDesc->natts * sizeof(AttrNumber));
 
-		for (parent_attno = 1; parent_attno <= tupleDesc->natts;
-			 parent_attno++)
+		/*
+		 * parent_colctr is the index into the logical-ordered array of parent
+		 * columns; parent_attno is the attnum of each column.  The newattno
+		 * map entries must use the latter for numbering; the former is a loop
+		 * counter only.
+		 */
+		for (parent_colctr = 1; parent_colctr <= tupleDesc->natts;
+			 parent_colctr++)
 		{
-			Form_pg_attribute attribute = tupleDesc->attrs[parent_attno - 1];
+			Form_pg_attribute attribute = parent_attrs[parent_colctr - 1];
+			AttrNumber	parent_attno = attribute->attnum;
 			char	   *attributeName = NameStr(attribute->attname);
 			int			exist_attno;
 			ColumnDef  *def;
@@ -4726,6 +4735,8 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	attribute.attcacheoff = -1;
 	attribute.atttypmod = typmod;
 	attribute.attnum = newattnum;
+	attribute.attlognum = newattnum;
+	attribute.attphysnum = newattnum;
 	attribute.attbyval = tform->typbyval;
 	attribute.attndims = list_length(colDef->typeName->arrayBounds);
 	attribute.attstorage = tform->typstorage;
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index fec76d4..89b850d 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -624,6 +624,9 @@ ExecEvalScalarVar(ExprState *exprstate, ExprContext *econtext,
 
 		attr = slot_tupdesc->attrs[attnum - 1];
 
+		/* The attnums must match (Var vs. the attribute). */
+		Assert(attr->attnum == attnum);
+
 		/* can't check type if dropped, since atttypid is probably 0 */
 		if (!attr->attisdropped)
 		{
@@ -830,6 +833,7 @@ ExecEvalWholeRowVar(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 									  slot_tupdesc->natts,
 									  var_tupdesc->natts)));
 
+		/* FIXME -- this should probably consider attributes in logical order */
 		for (i = 0; i < var_tupdesc->natts; i++)
 		{
 			Form_pg_attribute vattr = var_tupdesc->attrs[i];
@@ -1029,6 +1033,7 @@ ExecEvalWholeRowSlow(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 	/* Check to see if any dropped attributes are non-null */
 	for (i = 0; i < var_tupdesc->natts; i++)
 	{
+		/* XXX should probably consider attributes in logical order */
 		Form_pg_attribute vattr = var_tupdesc->attrs[i];
 		Form_pg_attribute sattr = tupleDesc->attrs[i];
 
@@ -1182,6 +1187,9 @@ ExecEvalParamExtern(ExprState *exprstate, ExprContext *econtext,
  *		to use these.  Ex: overpaid(EMP) might call GetAttributeByNum().
  *		Note: these are actually rather slow because they do a typcache
  *		lookup on each call.
+ *
+ *	FIXME -- probably these functions should consider attrno a logical column
+ *	number
  */
 Datum
 GetAttributeByNum(HeapTupleHeader tuple,
@@ -1618,6 +1626,7 @@ tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc)
 
 	for (i = 0; i < dst_tupdesc->natts; i++)
 	{
+		/* XXX should consider attributes in logical order? */
 		Form_pg_attribute dattr = dst_tupdesc->attrs[i];
 		Form_pg_attribute sattr = src_tupdesc->attrs[i];
 
@@ -3258,6 +3267,7 @@ ExecEvalRow(RowExprState *rstate,
 	int			natts;
 	ListCell   *arg;
 	int			i;
+	Form_pg_attribute *attrs;
 
 	/* Set default values for result flags: non-null, not a set result */
 	*isNull = false;
@@ -3272,13 +3282,18 @@ ExecEvalRow(RowExprState *rstate,
 	/* preset to nulls in case rowtype has some later-added columns */
 	memset(isnull, true, natts * sizeof(bool));
 
-	/* Evaluate field values */
+	/*
+	 * Evaluate field values.  Note the incoming expr array is sorted in
+	 * logical order.
+	 */
+	attrs = TupleDescGetLogSortedAttrs(rstate->tupdesc);
 	i = 0;
 	foreach(arg, rstate->args)
 	{
 		ExprState  *e = (ExprState *) lfirst(arg);
+		int			attnum = attrs[i]->attnum - 1;
 
-		values[i] = ExecEvalExpr(e, econtext, &isnull[i], NULL);
+		values[attnum] = ExecEvalExpr(e, econtext, &isnull[attnum], NULL);
 		i++;
 	}
 
@@ -4028,6 +4043,7 @@ ExecEvalFieldSelect(FieldSelectState *fstate,
 	TupleDesc	tupDesc;
 	Form_pg_attribute attr;
 	HeapTupleData tmptup;
+	Form_pg_attribute *attrs;
 
 	tupDatum = ExecEvalExpr(fstate->arg, econtext, isNull, isDone);
 
@@ -4055,7 +4071,9 @@ ExecEvalFieldSelect(FieldSelectState *fstate,
 	if (fieldnum > tupDesc->natts)		/* should never happen */
 		elog(ERROR, "attribute number %d exceeds number of columns %d",
 			 fieldnum, tupDesc->natts);
-	attr = tupDesc->attrs[fieldnum - 1];
+//	attrs = TupleDescGetLogSortedAttrs(tupDesc);
+	attrs = tupDesc->attrs;
+	attr = attrs[fieldnum - 1];
 
 	/* Check for dropped column, and force a NULL result if so */
 	if (attr->attisdropped)
@@ -4078,7 +4096,7 @@ ExecEvalFieldSelect(FieldSelectState *fstate,
 	tmptup.t_data = tuple;
 
 	result = heap_getattr(&tmptup,
-						  fieldnum,
+						  attr->attnum,
 						  tupDesc,
 						  isNull);
 	return result;
@@ -4104,6 +4122,7 @@ ExecEvalFieldStore(FieldStoreState *fstate,
 	bool	   *isnull;
 	Datum		save_datum;
 	bool		save_isNull;
+	Form_pg_attribute *attrs;
 	ListCell   *l1,
 			   *l2;
 
@@ -4115,6 +4134,8 @@ ExecEvalFieldStore(FieldStoreState *fstate,
 	/* Lookup tupdesc if first time through or after rescan */
 	tupDesc = get_cached_rowtype(fstore->resulttype, -1,
 								 &fstate->argdesc, econtext);
+//	attrs = TupleDescGetLogSortedAttrs(tupDesc);
+	attrs = tupDesc->attrs;
 
 	/* Allocate workspace */
 	values = (Datum *) palloc(tupDesc->natts * sizeof(Datum));
@@ -4153,8 +4174,10 @@ ExecEvalFieldStore(FieldStoreState *fstate,
 	{
 		ExprState  *newval = (ExprState *) lfirst(l1);
 		AttrNumber	fieldnum = lfirst_int(l2);
+		AttrNumber	attnum = attrs[fieldnum - 1]->attnum;
+
 
-		Assert(fieldnum > 0 && fieldnum <= tupDesc->natts);
+		Assert(attnum > 0 && attnum <= tupDesc->natts);
 
 		/*
 		 * Use the CaseTestExpr mechanism to pass down the old value of the
@@ -4165,13 +4188,13 @@ ExecEvalFieldStore(FieldStoreState *fstate,
 		 * assignment can't be within a CASE either.  (So saving and restoring
 		 * the caseValue is just paranoia, but let's do it anyway.)
 		 */
-		econtext->caseValue_datum = values[fieldnum - 1];
-		econtext->caseValue_isNull = isnull[fieldnum - 1];
+		econtext->caseValue_datum = values[attnum - 1];
+		econtext->caseValue_isNull = isnull[attnum - 1];
 
-		values[fieldnum - 1] = ExecEvalExpr(newval,
-											econtext,
-											&isnull[fieldnum - 1],
-											NULL);
+		values[attnum - 1] = ExecEvalExpr(newval,
+										  econtext,
+										  &isnull[attnum - 1],
+										  NULL);
 	}
 
 	econtext->caseValue_datum = save_datum;
@@ -4818,12 +4841,13 @@ ExecInitExpr(Expr *node, PlanState *parent)
 					rstate->tupdesc = lookup_rowtype_tupdesc_copy(rowexpr->row_typeid, -1);
 				}
 				/* In either case, adopt RowExpr's column aliases */
+				/* XXX this is problematic ... names should be assigned in logical order */
 				ExecTypeSetColNames(rstate->tupdesc, rowexpr->colnames);
 				/* Bless the tupdesc in case it's now of type RECORD */
 				BlessTupleDesc(rstate->tupdesc);
 				/* Set up evaluation, skipping any deleted columns */
 				Assert(list_length(rowexpr->args) <= rstate->tupdesc->natts);
-				attrs = rstate->tupdesc->attrs;
+				attrs = TupleDescGetLogSortedAttrs(rstate->tupdesc);
 				i = 0;
 				foreach(l, rowexpr->args)
 				{
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 3f0d809..d2721fe 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -271,11 +271,12 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc
 	int			attrno;
 	bool		hasoid;
 	ListCell   *tlist_item = list_head(tlist);
+	Form_pg_attribute *attrs = TupleDescGetPhysSortedAttrs(tupdesc);
 
 	/* Check the tlist attributes */
 	for (attrno = 1; attrno <= numattrs; attrno++)
 	{
-		Form_pg_attribute att_tup = tupdesc->attrs[attrno - 1];
+		Form_pg_attribute att_tup = attrs[attrno - 1];
 		Var		   *var;
 
 		if (tlist_item == NULL)
@@ -286,7 +287,7 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc
 		/* if these Asserts fail, planner messed up */
 		Assert(var->varno == varno);
 		Assert(var->varlevelsup == 0);
-		if (var->varattno != attrno)
+		if (var->varattno != att_tup->attnum)
 			return false;		/* out of order */
 		if (att_tup->attisdropped)
 			return false;		/* table contains dropped columns */
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 753754d..7092f2b 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -120,6 +120,7 @@ MakeTupleTableSlot(void)
 	slot->tts_mcxt = CurrentMemoryContext;
 	slot->tts_buffer = InvalidBuffer;
 	slot->tts_nvalid = 0;
+	slot->tts_nphysvalid = 0;
 	slot->tts_values = NULL;
 	slot->tts_isnull = NULL;
 	slot->tts_mintuple = NULL;
@@ -353,6 +354,7 @@ ExecStoreTuple(HeapTuple tuple,
 
 	/* Mark extracted state invalid */
 	slot->tts_nvalid = 0;
+	slot->tts_nphysvalid = 0;
 
 	/*
 	 * If tuple is on a disk page, keep the page pinned as long as we hold a
@@ -426,6 +428,7 @@ ExecStoreMinimalTuple(MinimalTuple mtup,
 
 	/* Mark extracted state invalid */
 	slot->tts_nvalid = 0;
+	slot->tts_nphysvalid = 0;
 
 	return slot;
 }
@@ -472,6 +475,7 @@ ExecClearTuple(TupleTableSlot *slot)	/* slot in which to store tuple */
 	 */
 	slot->tts_isempty = true;
 	slot->tts_nvalid = 0;
+	slot->tts_nphysvalid = 0;
 
 	return slot;
 }
@@ -499,6 +503,7 @@ ExecStoreVirtualTuple(TupleTableSlot *slot)
 
 	slot->tts_isempty = false;
 	slot->tts_nvalid = slot->tts_tupleDescriptor->natts;
+	slot->tts_nphysvalid = slot->tts_tupleDescriptor->natts;
 
 	return slot;
 }
@@ -595,11 +600,12 @@ ExecCopySlotMinimalTuple(TupleTableSlot *slot)
 		return minimal_tuple_from_heap_tuple(slot->tts_tuple);
 
 	/*
-	 * Otherwise we need to build a tuple from the Datum array.
+	 * Otherwise we need to build a tuple from the Datum array.  The
+	 * arrays in the slot are in physical order, so we need to re-sort
+	 * them in attnum order to pass them to heap_form_minimal_tuple.
 	 */
 	return heap_form_minimal_tuple(slot->tts_tupleDescriptor,
-								   slot->tts_values,
-								   slot->tts_isnull);
+								   slot->tts_values, slot->tts_isnull);
 }
 
 /* --------------------------------
@@ -771,6 +777,7 @@ ExecMaterializeSlot(TupleTableSlot *slot)
 	 * that we have not pfree'd tts_mintuple, if there is one.)
 	 */
 	slot->tts_nvalid = 0;
+	slot->tts_nphysvalid = 0;
 
 	/*
 	 * On the same principle of not depending on previous remote storage,
@@ -925,6 +932,7 @@ ExecTypeFromTLInternal(List *targetList, bool hasoid, bool skipjunk)
 
 		if (skipjunk && tle->resjunk)
 			continue;
+
 		TupleDescInitEntry(typeInfo,
 						   cur_resno,
 						   tle->resname,
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 6c3eff7..952fb4f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1657,6 +1657,7 @@ check_sql_fn_retval(Oid func_id, Oid rettype, List *queryTreeList,
 	{
 		/* Returns a rowtype */
 		TupleDesc	tupdesc;
+		Form_pg_attribute *attrs;
 		int			tupnatts;	/* physical number of columns in tuple */
 		int			tuplogcols; /* # of nondeleted columns in tuple */
 		int			colindex;	/* physical column index */
@@ -1716,11 +1717,13 @@ check_sql_fn_retval(Oid func_id, Oid rettype, List *queryTreeList,
 
 		/*
 		 * Verify that the targetlist matches the return tuple type. We scan
-		 * the non-deleted attributes to ensure that they match the datatypes
+		 * the non-deleted attributes, in logical ordering, to ensure that
+		 * they match the datatypes
 		 * of the non-resjunk columns.  For deleted attributes, insert NULL
 		 * result columns if the caller asked for that.
 		 */
 		tupnatts = tupdesc->natts;
+		attrs = TupleDescGetLogSortedAttrs(tupdesc);
 		tuplogcols = 0;			/* we'll count nondeleted cols as we go */
 		colindex = 0;
 		newtlist = NIL;			/* these are only used if modifyTargetList */
@@ -1749,7 +1752,7 @@ check_sql_fn_retval(Oid func_id, Oid rettype, List *queryTreeList,
 							 errmsg("return type mismatch in function declared to return %s",
 									format_type_be(rettype)),
 					errdetail("Final statement returns too many columns.")));
-				attr = tupdesc->attrs[colindex - 1];
+				attr = attrs[colindex - 1];
 				if (attr->attisdropped && modifyTargetList)
 				{
 					Expr	   *null_expr;
@@ -1806,7 +1809,7 @@ check_sql_fn_retval(Oid func_id, Oid rettype, List *queryTreeList,
 		/* remaining columns in tupdesc had better all be dropped */
 		for (colindex++; colindex <= tupnatts; colindex++)
 		{
-			if (!tupdesc->attrs[colindex - 1]->attisdropped)
+			if (!attrs[colindex - 1]->attisdropped)
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
 						 errmsg("return type mismatch in function declared to return %s",
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 83d562e..73df8af 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -146,6 +146,7 @@ ExecGroup(GroupState *node)
 			 * Compare with first tuple and see if this tuple is of the same
 			 * group.  If so, ignore it and keep scanning.
 			 */
+			/* FIXME -- here, the grpColIdx seems to cause trouble */
 			if (!execTuplesMatch(firsttupleslot, outerslot,
 								 numCols, grpColIdx,
 								 node->eqfunctions,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 5282a4f..3570020 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1084,6 +1084,7 @@ _copyVar(const Var *from)
 
 	COPY_SCALAR_FIELD(varno);
 	COPY_SCALAR_FIELD(varattno);
+	COPY_SCALAR_FIELD(varphysno);
 	COPY_SCALAR_FIELD(vartype);
 	COPY_SCALAR_FIELD(vartypmod);
 	COPY_SCALAR_FIELD(varcollid);
@@ -1792,6 +1793,7 @@ _copyTargetEntry(const TargetEntry *from)
 	COPY_SCALAR_FIELD(ressortgroupref);
 	COPY_SCALAR_FIELD(resorigtbl);
 	COPY_SCALAR_FIELD(resorigcol);
+	COPY_SCALAR_FIELD(resorigphyscol);
 	COPY_SCALAR_FIELD(resjunk);
 
 	return newnode;
@@ -2009,6 +2011,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
 	COPY_SCALAR_FIELD(rtekind);
 	COPY_SCALAR_FIELD(relid);
 	COPY_SCALAR_FIELD(relkind);
+	COPY_NODE_FIELD(lognums);
 	COPY_NODE_FIELD(subquery);
 	COPY_SCALAR_FIELD(security_barrier);
 	COPY_SCALAR_FIELD(jointype);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index fe509b0..f3b47e5 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -141,6 +141,7 @@ _equalVar(const Var *a, const Var *b)
 {
 	COMPARE_SCALAR_FIELD(varno);
 	COMPARE_SCALAR_FIELD(varattno);
+	COMPARE_SCALAR_FIELD(varphysno);
 	COMPARE_SCALAR_FIELD(vartype);
 	COMPARE_SCALAR_FIELD(vartypmod);
 	COMPARE_SCALAR_FIELD(varcollid);
@@ -691,6 +692,7 @@ _equalTargetEntry(const TargetEntry *a, const TargetEntry *b)
 	COMPARE_SCALAR_FIELD(ressortgroupref);
 	COMPARE_SCALAR_FIELD(resorigtbl);
 	COMPARE_SCALAR_FIELD(resorigcol);
+	COMPARE_SCALAR_FIELD(resorigphyscol);
 	COMPARE_SCALAR_FIELD(resjunk);
 
 	return true;
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 6fdf44d..0c54920 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -90,6 +90,8 @@ makeVar(Index varno,
 
 	/* Likewise, we just set location to "unknown" here */
 	var->location = -1;
+	/* Likewise, we just set physical position to invalid */
+	var->varphysno = InvalidAttrNumber;
 
 	return var;
 }
@@ -250,6 +252,7 @@ makeTargetEntry(Expr *expr,
 	tle->ressortgroupref = 0;
 	tle->resorigtbl = InvalidOid;
 	tle->resorigcol = 0;
+	tle->resorigphyscol = 0;
 
 	tle->resjunk = resjunk;
 
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 2f417fe..90e44cb 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -24,6 +24,7 @@
 #include <ctype.h>
 
 #include "lib/stringinfo.h"
+#include "nodes/execnodes.h"
 #include "nodes/plannodes.h"
 #include "nodes/relation.h"
 #include "utils/datum.h"
@@ -917,6 +918,7 @@ _outVar(StringInfo str, const Var *node)
 
 	WRITE_UINT_FIELD(varno);
 	WRITE_INT_FIELD(varattno);
+	WRITE_INT_FIELD(varphysno);
 	WRITE_OID_FIELD(vartype);
 	WRITE_INT_FIELD(vartypmod);
 	WRITE_OID_FIELD(varcollid);
@@ -1439,10 +1441,27 @@ _outTargetEntry(StringInfo str, const TargetEntry *node)
 	WRITE_UINT_FIELD(ressortgroupref);
 	WRITE_OID_FIELD(resorigtbl);
 	WRITE_INT_FIELD(resorigcol);
+	WRITE_INT_FIELD(resorigphyscol);
 	WRITE_BOOL_FIELD(resjunk);
 }
 
 static void
+_outGenericExprState(StringInfo str, const GenericExprState *node)
+{
+	WRITE_NODE_TYPE("GENERICEXPRSTATE");
+
+	WRITE_NODE_FIELD(arg);
+}
+
+static void
+_outExprState(StringInfo str, const ExprState *node)
+{
+	WRITE_NODE_TYPE("EXPRSTATE");
+
+	WRITE_NODE_FIELD(expr);
+}
+
+static void
 _outRangeTblRef(StringInfo str, const RangeTblRef *node)
 {
 	WRITE_NODE_TYPE("RANGETBLREF");
@@ -2425,6 +2444,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
 		case RTE_RELATION:
 			WRITE_OID_FIELD(relid);
 			WRITE_CHAR_FIELD(relkind);
+			WRITE_NODE_FIELD(lognums);
 			break;
 		case RTE_SUBQUERY:
 			WRITE_NODE_FIELD(subquery);
@@ -3106,6 +3126,12 @@ _outNode(StringInfo str, const void *obj)
 			case T_FromExpr:
 				_outFromExpr(str, obj);
 				break;
+			case T_GenericExprState:
+				_outGenericExprState(str, obj);
+				break;
+			case T_ExprState:
+				_outExprState(str, obj);
+				break;
 
 			case T_Path:
 				_outPath(str, obj);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 563209c..d82e533 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -429,6 +429,7 @@ _readVar(void)
 
 	READ_UINT_FIELD(varno);
 	READ_INT_FIELD(varattno);
+	READ_INT_FIELD(varphysno);
 	READ_OID_FIELD(vartype);
 	READ_INT_FIELD(vartypmod);
 	READ_OID_FIELD(varcollid);
@@ -1143,6 +1144,7 @@ _readTargetEntry(void)
 	READ_UINT_FIELD(ressortgroupref);
 	READ_OID_FIELD(resorigtbl);
 	READ_INT_FIELD(resorigcol);
+	READ_INT_FIELD(resorigphyscol);
 	READ_BOOL_FIELD(resjunk);
 
 	READ_DONE();
@@ -1218,6 +1220,7 @@ _readRangeTblEntry(void)
 		case RTE_RELATION:
 			READ_OID_FIELD(relid);
 			READ_CHAR_FIELD(relkind);
+			READ_NODE_FIELD(lognums);
 			break;
 		case RTE_SUBQUERY:
 			READ_NODE_FIELD(subquery);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ec828cd..d392ade 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -78,8 +78,9 @@ typedef struct
 	(((con)->consttype == REGCLASSOID || (con)->consttype == OIDOID) && \
 	 !(con)->constisnull)
 
-#define fix_scan_list(root, lst, rtoffset) \
+/* #define fix_scan_list(root, lst, rtoffset) \
 	((List *) fix_scan_expr(root, (Node *) (lst), rtoffset))
+	*/
 
 static void add_rtes_to_flat_rtable(PlannerInfo *root, bool recursing);
 static void flatten_unplanned_rtes(PlannerGlobal *glob, RangeTblEntry *rte);
@@ -156,13 +157,16 @@ static bool extract_query_dependencies_walker(Node *node,
  * 3. We adjust Vars in upper plan nodes to refer to the outputs of their
  * subplans.
  *
- * 4. PARAM_MULTIEXPR Params are replaced by regular PARAM_EXEC Params,
+ * 4. We correct attphysnum annotations in scan tuple descriptors to match
+ * the real values.
+ *
+ * 5. PARAM_MULTIEXPR Params are replaced by regular PARAM_EXEC Params,
  * now that we have finished planning all MULTIEXPR subplans.
  *
- * 5. We compute regproc OIDs for operators (ie, we look up the function
+ * 6. We compute regproc OIDs for operators (ie, we look up the function
  * that implements each op).
  *
- * 6. We create lists of specific objects that the plan depends on.
+ * 7. We create lists of specific objects that the plan depends on.
  * This will be used by plancache.c to drive invalidation of cached plans.
  * Relation dependencies are represented by OIDs, and everything else by
  * PlanInvalItems (this distinction is motivated by the shared-inval APIs).
@@ -418,6 +422,89 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
 		glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
 }
 
+
+static Node *fix_physno_mutator(Node *node, void *context);
+
+static List *
+fix_scan_list(PlannerInfo *root, List *targetlist, int rtoffset)
+{
+	Node  *node;
+
+	node = fix_physno_mutator((Node *) targetlist, root);
+	return (List *) fix_scan_expr(root, node, rtoffset);
+}
+
+#include "parser/parsetree.h"
+#include "utils/rel.h"
+#include "access/heapam.h"
+static Node *
+fix_physno_mutator(Node *node, void *context)
+{
+	PlannerInfo *root = (PlannerInfo *) context;
+
+	if (node == NULL)
+		return NULL;
+
+	if (IsA(node, Var))
+	{
+		/* do the transformation here */
+		/*
+		 * FIXME --- there must be a way to do this properly .. perhaps save the
+		 * attphysnum array in the RTE struct?
+		 */
+		Var		*var = (Var *) node;
+		RangeTblEntry *rte;
+		Relation	rel;
+
+		/* varphysno is equal to varattno by default */
+		var->varphysno = var->varattno;
+
+		if (var->varattno > 0 && !IS_SPECIAL_VARNO(var->varno))
+		{
+			rte = rt_fetch(var->varno, root->parse->rtable);
+
+			/* if it's an actual relation, use find the proper attphysnum */
+			if (rte->rtekind == RTE_RELATION)
+			{
+				rel = relation_open(rte->relid, NoLock);
+
+				/*
+				 * First Var in this relation, cache the attphysnums.
+				 *
+				 * FIXME This caches all the physnums at once, as it's simpler
+				 */
+				if (rte->physnums == NULL)
+				{
+					AttrNumber attnum;
+
+					/* will be initialized to InvalidAttrNumber (0), which is OK */
+					rte->physnums = (AttrNumber*)palloc0(rel->rd_att->natts * sizeof(AttrNumber));
+
+					for (attnum = 1; attnum <= rel->rd_att->natts; attnum++)
+					{
+						/* must not be already set (duplicate attphysnum) */
+						Assert(rte->physnums[rel->rd_att->attrs[attnum-1]->attnum-1] == 0);
+						rte->physnums[rel->rd_att->attrs[attnum-1]->attnum-1]
+										= rel->rd_att->attrs[attnum-1]->attphysnum;
+					}
+				}
+
+				/* lookup the varphysno in the cache */
+				var->varphysno = rte->physnums[var->varattno-1];
+
+				/* make sure we actually found it */
+				Assert(var->varphysno != InvalidAttrNumber);
+
+				relation_close(rel, NoLock);
+			}
+		}
+	}
+
+	return expression_tree_mutator(node, fix_physno_mutator, (void *) context);
+}
+
+
+
 /*
  * set_plan_refs: recurse through the Plan nodes of a single subquery level
  */
@@ -613,6 +700,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			 */
 			set_dummy_tlist_references(plan, rtoffset);
 
+
 			/*
 			 * Since these plan types don't check quals either, we should not
 			 * find any qual expression attached to them.
@@ -1066,6 +1154,7 @@ copyVar(Var *var)
 	return newvar;
 }
 
+
 /*
  * fix_expr_common
  *		Do generic set_plan_references processing on an expression node
@@ -1172,6 +1261,7 @@ fix_param_node(PlannerInfo *root, Param *p)
  *		Do set_plan_references processing on a scan-level expression
  *
  * This consists of incrementing all Vars' varnos by rtoffset,
+ * setting all Vars' varphysnum to their correct values,
  * replacing PARAM_MULTIEXPR Params, expanding PlaceHolderVars,
  * looking up operator opcode info for OpExpr and related nodes,
  * and adding OIDs from regclass Const nodes into root->glob->relationOids.
@@ -1206,6 +1296,7 @@ fix_scan_expr(PlannerInfo *root, Node *node, int rtoffset)
 	}
 }
 
+
 static Node *
 fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context)
 {
@@ -1214,7 +1305,7 @@ fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context)
 	if (IsA(node, Var))
 	{
 		Var		   *var = copyVar((Var *) node);
-
+	
 		Assert(var->varlevelsup == 0);
 
 		/*
@@ -1227,6 +1318,7 @@ fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context)
 			var->varno += context->rtoffset;
 		if (var->varnoold > 0)
 			var->varnoold += context->rtoffset;
+
 		return (Node *) var;
 	}
 	if (IsA(node, Param))
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 8a0199b..7ccfb2a 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -1749,7 +1749,7 @@ pullup_replace_vars_callback(Var *var,
 		 * expansion with varlevelsup = 0, and then adjust if needed.
 		 */
 		expandRTE(rcon->target_rte,
-				  var->varno, 0 /* not varlevelsup */ , var->location,
+				  var->varno, 0 /* not varlevelsup */ , var->location, false,
 				  (var->vartype != RECORDOID),
 				  &colnames, &fields);
 		/* Adjust the generated per-field Vars, but don't insert PHVs */
diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index 8a6c3cc..384c599 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -216,6 +216,7 @@ expand_targetlist(List *tlist, int command_type,
 	 */
 	rel = heap_open(getrelid(result_relation, range_table), NoLock);
 
+	/* FIXME --- do we need a different order of attributes here? */
 	numattrs = RelationGetNumberOfAttributes(rel);
 
 	for (attrno = 1; attrno <= numattrs; attrno++)
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 84d58ae..af10116 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2216,6 +2216,8 @@ CommuteRowCompareExpr(RowCompareExpr *clause)
  * is still what it was when the expression was parsed.  This is needed to
  * guard against improper simplification after ALTER COLUMN TYPE.  (XXX we
  * may well need to make similar checks elsewhere?)
+ *
+ * FIXME do we need to do something about the fieldnum here?
  */
 static bool
 rowtype_field_matches(Oid rowtypeid, int fieldnum,
@@ -3253,6 +3255,7 @@ eval_const_expressions_mutator(Node *node,
 							return fld;
 					}
 				}
+				/* FIXME  does this need change? */
 				newfselect = makeNode(FieldSelect);
 				newfselect->arg = (Expr *) arg;
 				newfselect->fieldnum = fselect->fieldnum;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 313a5c1..de5ce38 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -858,17 +858,19 @@ build_physical_tlist(PlannerInfo *root, RelOptInfo *rel)
 	int			attrno,
 				numattrs;
 	List	   *colvars;
+	Form_pg_attribute *attrs;
 
 	switch (rte->rtekind)
 	{
 		case RTE_RELATION:
 			/* Assume we already have adequate lock */
 			relation = heap_open(rte->relid, NoLock);
+			attrs = TupleDescGetLogSortedAttrs(RelationGetDescr(relation));
 
 			numattrs = RelationGetNumberOfAttributes(relation);
 			for (attrno = 1; attrno <= numattrs; attrno++)
 			{
-				Form_pg_attribute att_tup = relation->rd_att->attrs[attrno - 1];
+				Form_pg_attribute att_tup = attrs[attrno - 1];
 
 				if (att_tup->attisdropped)
 				{
@@ -918,7 +920,7 @@ build_physical_tlist(PlannerInfo *root, RelOptInfo *rel)
 		case RTE_VALUES:
 		case RTE_CTE:
 			/* Not all of these can have dropped cols, but share code anyway */
-			expandRTE(rte, varno, 0, -1, true /* include dropped */ ,
+			expandRTE(rte, varno, 0, -1, true /* include dropped */ , false,
 					  NULL, &colvars);
 			foreach(l, colvars)
 			{
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index a68f2e8..7049a33 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -682,7 +682,7 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt)
 		/*
 		 * Generate list of Vars referencing the RTE
 		 */
-		expandRTE(rte, rtr->rtindex, 0, -1, false, NULL, &exprList);
+		expandRTE(rte, rtr->rtindex, 0, -1, false, false, NULL, &exprList);
 	}
 	else
 	{
@@ -1209,7 +1209,7 @@ transformValuesClause(ParseState *pstate, SelectStmt *stmt)
 	 * Generate a targetlist as though expanding "*"
 	 */
 	Assert(pstate->p_next_resno == 1);
-	qry->targetList = expandRelAttrs(pstate, rte, rtindex, 0, -1);
+	qry->targetList = expandRelAttrs(pstate, rte, rtindex, 0, false, -1);
 
 	/*
 	 * The grammar allows attaching ORDER BY, LIMIT, and FOR UPDATE to a
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 8d90b50..1766afa 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -723,7 +723,7 @@ transformRangeFunction(ParseState *pstate, RangeFunction *r)
  *
  * *top_rti: receives the rangetable index of top_rte.  (Ditto.)
  *
- * *namespace: receives a List of ParseNamespaceItems for the RTEs exposed
+ * *namespace: receives a List of ParseNamespaceItem for the RTEs exposed
  * as table/column names by this item.  (The lateral_only flags in these items
  * are indeterminate and should be explicitly set by the caller before use.)
  */
@@ -880,9 +880,9 @@ transformFromClauseItem(ParseState *pstate, Node *n,
 		 *
 		 * Note: expandRTE returns new lists, safe for me to modify
 		 */
-		expandRTE(l_rte, l_rtindex, 0, -1, false,
+		expandRTE(l_rte, l_rtindex, 0, -1, false, true,
 				  &l_colnames, &l_colvars);
-		expandRTE(r_rte, r_rtindex, 0, -1, false,
+		expandRTE(r_rte, r_rtindex, 0, -1, false, true,
 				  &r_colnames, &r_colvars);
 
 		/*
diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c
index a4e494b..2bfcb24 100644
--- a/src/backend/parser/parse_coerce.c
+++ b/src/backend/parser/parse_coerce.c
@@ -906,6 +906,7 @@ coerce_record_to_complex(ParseState *pstate, Node *node,
 	int			i;
 	int			ucolno;
 	ListCell   *arg;
+	Form_pg_attribute	*attrs;
 
 	if (node && IsA(node, RowExpr))
 	{
@@ -924,7 +925,7 @@ coerce_record_to_complex(ParseState *pstate, Node *node,
 		RangeTblEntry *rte;
 
 		rte = GetRTEByRangeTablePosn(pstate, rtindex, sublevels_up);
-		expandRTE(rte, rtindex, sublevels_up, vlocation, false,
+		expandRTE(rte, rtindex, sublevels_up, vlocation, false, false,
 				  NULL, &args);
 	}
 	else
@@ -939,6 +940,7 @@ coerce_record_to_complex(ParseState *pstate, Node *node,
 	newargs = NIL;
 	ucolno = 1;
 	arg = list_head(args);
+	attrs = TupleDescGetLogSortedAttrs(tupdesc);
 	for (i = 0; i < tupdesc->natts; i++)
 	{
 		Node	   *expr;
@@ -946,7 +948,7 @@ coerce_record_to_complex(ParseState *pstate, Node *node,
 		Oid			exprtype;
 
 		/* Fill in NULLs for dropped columns in rowtype */
-		if (tupdesc->attrs[i]->attisdropped)
+		if (attrs[i]->attisdropped)
 		{
 			/*
 			 * can't use atttypid here, but it doesn't really matter what type
@@ -970,8 +972,8 @@ coerce_record_to_complex(ParseState *pstate, Node *node,
 
 		cexpr = coerce_to_target_type(pstate,
 									  expr, exprtype,
-									  tupdesc->attrs[i]->atttypid,
-									  tupdesc->attrs[i]->atttypmod,
+									  attrs[i]->atttypid,
+									  attrs[i]->atttypmod,
 									  ccontext,
 									  COERCE_IMPLICIT_CAST,
 									  -1);
@@ -983,7 +985,7 @@ coerce_record_to_complex(ParseState *pstate, Node *node,
 							format_type_be(targetTypeId)),
 					 errdetail("Cannot cast type %s to %s in column %d.",
 							   format_type_be(exprtype),
-							   format_type_be(tupdesc->attrs[i]->atttypid),
+							   format_type_be(attrs[i]->atttypid),
 							   ucolno),
 					 parser_coercion_errposition(pstate, location, expr)));
 		newargs = lappend(newargs, cexpr);
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index a200804..9acf5b7 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -1759,6 +1759,7 @@ ParseComplexProjection(ParseState *pstate, char *funcname, Node *first_arg,
 {
 	TupleDesc	tupdesc;
 	int			i;
+	Form_pg_attribute *attrs;
 
 	/*
 	 * Special case for whole-row Vars so that we can resolve (foo.*).bar even
@@ -1796,9 +1797,10 @@ ParseComplexProjection(ParseState *pstate, char *funcname, Node *first_arg,
 		return NULL;			/* unresolvable RECORD type */
 	Assert(tupdesc);
 
+	attrs = TupleDescGetLogSortedAttrs(tupdesc);
 	for (i = 0; i < tupdesc->natts; i++)
 	{
-		Form_pg_attribute att = tupdesc->attrs[i];
+		Form_pg_attribute att = attrs[i];
 
 		if (strcmp(funcname, NameStr(att->attname)) == 0 &&
 			!att->attisdropped)
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index 8d4f79f..1312a37 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -43,12 +43,12 @@ static void markRTEForSelectPriv(ParseState *pstate, RangeTblEntry *rte,
 					 int rtindex, AttrNumber col);
 static void expandRelation(Oid relid, Alias *eref,
 			   int rtindex, int sublevels_up,
-			   int location, bool include_dropped,
+			   int location, bool include_dropped, bool logical_sort,
 			   List **colnames, List **colvars);
 static void expandTupleDesc(TupleDesc tupdesc, Alias *eref,
 				int count, int offset,
 				int rtindex, int sublevels_up,
-				int location, bool include_dropped,
+				int location, bool include_dropped, bool logical_sort,
 				List **colnames, List **colvars);
 static int	specialAttNum(const char *attname);
 static bool isQueryUsingTempRelation_walker(Node *node, void *context);
@@ -519,6 +519,12 @@ GetCTEForRTE(ParseState *pstate, RangeTblEntry *rte, int rtelevelsup)
 	return NULL;				/* keep compiler quiet */
 }
 
+static int16
+get_attnum_by_lognum(RangeTblEntry *rte, int16 attlognum)
+{
+	return list_nth_int(rte->lognums, attlognum - 1);
+}
+
 /*
  * scanRTEForColumn
  *	  Search the column names of a single RTE for the given name.
@@ -527,6 +533,11 @@ GetCTEForRTE(ParseState *pstate, RangeTblEntry *rte, int rtelevelsup)
  *
  * Side effect: if we find a match, mark the RTE as requiring read access
  * for the column.
+ *
+ * XXX the coding of this routine seems to make it impossible to have
+ * attlognums using a different numbering scheme from attnums, which would be a
+ * handy tool to detect incorrect usage of either array.  Can we fix that,
+ * and are there other places that suffer from the same problem?
  */
 Node *
 scanRTEForColumn(ParseState *pstate, RangeTblEntry *rte, char *colname,
@@ -561,6 +572,13 @@ scanRTEForColumn(ParseState *pstate, RangeTblEntry *rte, char *colname,
 						 errmsg("column reference \"%s\" is ambiguous",
 								colname),
 						 parser_errposition(pstate, location)));
+			/*
+			 * If the RTE has lognums, the eref->colnames array is sorted in
+			 * logical order; in that case we need to map the attnum we have
+			 * (which is a logical attnum) to the identity one.
+			 */
+			if (rte->lognums)
+				attnum = get_attnum_by_lognum(rte, attnum);
 			var = make_var(pstate, rte, attnum, location);
 			/* Require read access to the column */
 			markVarForSelectPriv(pstate, var, rte);
@@ -830,14 +848,19 @@ markVarForSelectPriv(ParseState *pstate, Var *var, RangeTblEntry *rte)
  * empty strings for any dropped columns, so that it will be one-to-one with
  * physical column numbers.
  *
+ * If lognums is not NULL, it will be filled with a map from logical column
+ * numbers to attnum; that way, the nth element of eref->colnames corresponds
+ * to the attnum found in the nth element of lognums.
+ *
  * It is an error for there to be more aliases present than required.
  */
 static void
-buildRelationAliases(TupleDesc tupdesc, Alias *alias, Alias *eref)
+buildRelationAliases(TupleDesc tupdesc, Alias *alias, Alias *eref, List **lognums)
 {
 	int			maxattrs = tupdesc->natts;
 	ListCell   *aliaslc;
 	int			numaliases;
+	Form_pg_attribute *attrs;
 	int			varattno;
 	int			numdropped = 0;
 
@@ -856,9 +879,11 @@ buildRelationAliases(TupleDesc tupdesc, Alias *alias, Alias *eref)
 		numaliases = 0;
 	}
 
+	attrs = TupleDescGetLogSortedAttrs(tupdesc);
+
 	for (varattno = 0; varattno < maxattrs; varattno++)
 	{
-		Form_pg_attribute attr = tupdesc->attrs[varattno];
+		Form_pg_attribute attr = attrs[varattno];
 		Value	   *attrname;
 
 		if (attr->attisdropped)
@@ -883,6 +908,9 @@ buildRelationAliases(TupleDesc tupdesc, Alias *alias, Alias *eref)
 		}
 
 		eref->colnames = lappend(eref->colnames, attrname);
+
+		if (lognums)
+			*lognums = lappend_int(*lognums, attr->attnum);
 	}
 
 	/* Too many user-supplied aliases? */
@@ -1030,7 +1058,7 @@ addRangeTableEntry(ParseState *pstate,
 	 * and/or actual column names.
 	 */
 	rte->eref = makeAlias(refname, NIL);
-	buildRelationAliases(rel->rd_att, alias, rte->eref);
+	buildRelationAliases(rel->rd_att, alias, rte->eref, &rte->lognums);
 
 	/*
 	 * Drop the rel refcount, but keep the access lock till end of transaction
@@ -1090,7 +1118,7 @@ addRangeTableEntryForRelation(ParseState *pstate,
 	 * and/or actual column names.
 	 */
 	rte->eref = makeAlias(refname, NIL);
-	buildRelationAliases(rel->rd_att, alias, rte->eref);
+	buildRelationAliases(rel->rd_att, alias, rte->eref, &rte->lognums);
 
 	/*
 	 * Set flags and access permissions.
@@ -1422,7 +1450,7 @@ addRangeTableEntryForFunction(ParseState *pstate,
 	}
 
 	/* Use the tupdesc while assigning column aliases for the RTE */
-	buildRelationAliases(tupdesc, alias, eref);
+	buildRelationAliases(tupdesc, alias, eref, NULL);
 
 	/*
 	 * Set flags and access permissions.
@@ -1787,13 +1815,16 @@ addRTEtoQuery(ParseState *pstate, RangeTblEntry *rte,
  * values to use in the created Vars.  Ordinarily rtindex should match the
  * actual position of the RTE in its rangetable.
  *
+ * If logical_sort is true, then the resulting lists are sorted by logical
+ * column number (attlognum); otherwise use regular attnum.
+ *
  * The output lists go into *colnames and *colvars.
  * If only one of the two kinds of output list is needed, pass NULL for the
  * output pointer for the unwanted one.
  */
 void
 expandRTE(RangeTblEntry *rte, int rtindex, int sublevels_up,
-		  int location, bool include_dropped,
+		  int location, bool include_dropped, bool logical_sort,
 		  List **colnames, List **colvars)
 {
 	int			varattno;
@@ -1808,8 +1839,8 @@ expandRTE(RangeTblEntry *rte, int rtindex, int sublevels_up,
 		case RTE_RELATION:
 			/* Ordinary relation RTE */
 			expandRelation(rte->relid, rte->eref,
-						   rtindex, sublevels_up, location,
-						   include_dropped, colnames, colvars);
+						   rtindex, sublevels_up, location, include_dropped,
+						   logical_sort, colnames, colvars);
 			break;
 		case RTE_SUBQUERY:
 			{
@@ -1875,7 +1906,8 @@ expandRTE(RangeTblEntry *rte, int rtindex, int sublevels_up,
 						expandTupleDesc(tupdesc, rte->eref,
 										rtfunc->funccolcount, atts_done,
 										rtindex, sublevels_up, location,
-										include_dropped, colnames, colvars);
+										include_dropped, logical_sort,
+										colnames, colvars);
 					}
 					else if (functypclass == TYPEFUNC_SCALAR)
 					{
@@ -2127,7 +2159,7 @@ expandRTE(RangeTblEntry *rte, int rtindex, int sublevels_up,
  */
 static void
 expandRelation(Oid relid, Alias *eref, int rtindex, int sublevels_up,
-			   int location, bool include_dropped,
+			   int location, bool include_dropped, bool logical_sort,
 			   List **colnames, List **colvars)
 {
 	Relation	rel;
@@ -2136,7 +2168,7 @@ expandRelation(Oid relid, Alias *eref, int rtindex, int sublevels_up,
 	rel = relation_open(relid, AccessShareLock);
 	expandTupleDesc(rel->rd_att, eref, rel->rd_att->natts, 0,
 					rtindex, sublevels_up,
-					location, include_dropped,
+					location, include_dropped, logical_sort,
 					colnames, colvars);
 	relation_close(rel, AccessShareLock);
 }
@@ -2153,11 +2185,15 @@ expandRelation(Oid relid, Alias *eref, int rtindex, int sublevels_up,
 static void
 expandTupleDesc(TupleDesc tupdesc, Alias *eref, int count, int offset,
 				int rtindex, int sublevels_up,
-				int location, bool include_dropped,
+				int location, bool include_dropped, bool logical_sort,
 				List **colnames, List **colvars)
 {
 	ListCell   *aliascell = list_head(eref->colnames);
-	int			varattno;
+	int			attnum;
+	Form_pg_attribute *attrs;
+
+	attrs = (logical_sort ? TupleDescGetLogSortedAttrs(tupdesc) :
+			 tupdesc->attrs);
 
 	if (colnames)
 	{
@@ -2171,9 +2207,10 @@ expandTupleDesc(TupleDesc tupdesc, Alias *eref, int count, int offset,
 	}
 
 	Assert(count <= tupdesc->natts);
-	for (varattno = 0; varattno < count; varattno++)
+	for (attnum = 0; attnum < count; attnum++)
 	{
-		Form_pg_attribute attr = tupdesc->attrs[varattno];
+		Form_pg_attribute attr = attrs[attnum];
+		int		varattno = attr->attnum - 1;
 
 		if (attr->attisdropped)
 		{
@@ -2221,6 +2258,8 @@ expandTupleDesc(TupleDesc tupdesc, Alias *eref, int count, int offset,
 							  attr->atttypid, attr->atttypmod,
 							  attr->attcollation,
 							  sublevels_up);
+
+			varnode->varphysno = InvalidAttrNumber;
 			varnode->location = location;
 
 			*colvars = lappend(*colvars, varnode);
@@ -2240,7 +2279,7 @@ expandTupleDesc(TupleDesc tupdesc, Alias *eref, int count, int offset,
  */
 List *
 expandRelAttrs(ParseState *pstate, RangeTblEntry *rte,
-			   int rtindex, int sublevels_up, int location)
+			   int rtindex, int sublevels_up, bool logical_sort, int location)
 {
 	List	   *names,
 			   *vars;
@@ -2248,7 +2287,7 @@ expandRelAttrs(ParseState *pstate, RangeTblEntry *rte,
 			   *var;
 	List	   *te_list = NIL;
 
-	expandRTE(rte, rtindex, sublevels_up, location, false,
+	expandRTE(rte, rtindex, sublevels_up, location, false, logical_sort,
 			  &names, &vars);
 
 	/*
@@ -2301,7 +2340,10 @@ get_rte_attribute_name(RangeTblEntry *rte, AttrNumber attnum)
 	 */
 	if (rte->alias &&
 		attnum > 0 && attnum <= list_length(rte->alias->colnames))
+	{
+		/* FIXME change attnum to lognum! */
 		return strVal(list_nth(rte->alias->colnames, attnum - 1));
+	}
 
 	/*
 	 * If the RTE is a relation, go to the system catalogs not the
@@ -2408,6 +2450,7 @@ get_rte_attribute_type(RangeTblEntry *rte, AttrNumber attnum,
 
 							Assert(tupdesc);
 							Assert(attnum <= tupdesc->natts);
+							/* FIXME map using lognums?? */
 							att_tup = tupdesc->attrs[attnum - 1];
 
 							/*
@@ -2604,6 +2647,7 @@ get_rte_attribute_is_dropped(RangeTblEntry *rte, AttrNumber attnum)
 
 							Assert(tupdesc);
 							Assert(attnum - atts_done <= tupdesc->natts);
+							/* FIXME -- map using lognums? */
 							att_tup = tupdesc->attrs[attnum - atts_done - 1];
 							return att_tup->attisdropped;
 						}
@@ -2696,7 +2740,7 @@ attnameAttNum(Relation rd, const char *attname, bool sysColOK)
 		Form_pg_attribute att = rd->rd_att->attrs[i];
 
 		if (namestrcmp(&(att->attname), attname) == 0 && !att->attisdropped)
-			return i + 1;
+			return att->attnum;
 	}
 
 	if (sysColOK)
diff --git a/src/backend/parser/parse_target.c b/src/backend/parser/parse_target.c
index 3724330..9edc822 100644
--- a/src/backend/parser/parse_target.c
+++ b/src/backend/parser/parse_target.c
@@ -896,7 +896,7 @@ checkInsertTargets(ParseState *pstate, List *cols, List **attrnos)
 		/*
 		 * Generate default column list for INSERT.
 		 */
-		Form_pg_attribute *attr = pstate->p_target_relation->rd_att->attrs;
+		Form_pg_attribute *attr = TupleDescGetLogSortedAttrs(pstate->p_target_relation->rd_att);
 		int			numcol = pstate->p_target_relation->rd_rel->relnatts;
 		int			i;
 
@@ -913,7 +913,7 @@ checkInsertTargets(ParseState *pstate, List *cols, List **attrnos)
 			col->val = NULL;
 			col->location = -1;
 			cols = lappend(cols, col);
-			*attrnos = lappend_int(*attrnos, i + 1);
+			*attrnos = lappend_int(*attrnos, attr[i]->attnum);
 		}
 	}
 	else
@@ -931,7 +931,7 @@ checkInsertTargets(ParseState *pstate, List *cols, List **attrnos)
 			char	   *name = col->name;
 			int			attrno;
 
-			/* Lookup column name, ereport on failure */
+			/* Lookup column number, ereport on failure */
 			attrno = attnameAttNum(pstate->p_target_relation, name, false);
 			if (attrno == InvalidAttrNumber)
 				ereport(ERROR,
@@ -1184,6 +1184,7 @@ ExpandAllTables(ParseState *pstate, int location)
 											RTERangeTablePosn(pstate, rte,
 															  NULL),
 											0,
+											true,
 											location));
 	}
 
@@ -1252,14 +1253,14 @@ ExpandSingleTable(ParseState *pstate, RangeTblEntry *rte,
 	{
 		/* expandRelAttrs handles permissions marking */
 		return expandRelAttrs(pstate, rte, rtindex, sublevels_up,
-							  location);
+							  true, location);
 	}
 	else
 	{
 		List	   *vars;
 		ListCell   *l;
 
-		expandRTE(rte, rtindex, sublevels_up, location, false,
+		expandRTE(rte, rtindex, sublevels_up, location, false, true,
 				  NULL, &vars);
 
 		/*
@@ -1296,6 +1297,7 @@ ExpandRowReference(ParseState *pstate, Node *expr,
 	TupleDesc	tupleDesc;
 	int			numAttrs;
 	int			i;
+	Form_pg_attribute *attr;
 
 	/*
 	 * If the rowtype expression is a whole-row Var, we can expand the fields
@@ -1342,9 +1344,10 @@ ExpandRowReference(ParseState *pstate, Node *expr,
 
 	/* Generate a list of references to the individual fields */
 	numAttrs = tupleDesc->natts;
+	attr = TupleDescGetLogSortedAttrs(tupleDesc);
 	for (i = 0; i < numAttrs; i++)
 	{
-		Form_pg_attribute att = tupleDesc->attrs[i];
+		Form_pg_attribute att = attr[i];
 		FieldSelect *fselect;
 
 		if (att->attisdropped)
@@ -1352,7 +1355,7 @@ ExpandRowReference(ParseState *pstate, Node *expr,
 
 		fselect = makeNode(FieldSelect);
 		fselect->arg = (Expr *) copyObject(expr);
-		fselect->fieldnum = i + 1;
+		fselect->fieldnum = att->attnum;
 		fselect->resulttype = att->atttypid;
 		fselect->resulttypmod = att->atttypmod;
 		/* save attribute's collation for parse_collate.c */
@@ -1413,7 +1416,7 @@ expandRecordVariable(ParseState *pstate, Var *var, int levelsup)
 				   *lvar;
 		int			i;
 
-		expandRTE(rte, var->varno, 0, var->location, false,
+		expandRTE(rte, var->varno, 0, var->location, false, false,
 				  &names, &vars);
 
 		tupleDesc = CreateTemplateTupleDesc(list_length(vars), false);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index df45708..df659ad 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -1327,7 +1327,7 @@ ReplaceVarsFromTargetList_callback(Var *var,
 		 */
 		expandRTE(rcon->target_rte,
 				  var->varno, var->varlevelsup, var->location,
-				  (var->vartype != RECORDOID),
+				  (var->vartype != RECORDOID), false,
 				  &colnames, &fields);
 		/* Adjust the generated per-field Vars... */
 		fields = (List *) replace_rte_variables_mutator((Node *) fields,
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index a65e18d..4df1dd9 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -89,6 +89,7 @@ record_in(PG_FUNCTION_ARGS)
 	Datum	   *values;
 	bool	   *nulls;
 	StringInfoData buf;
+	Form_pg_attribute *attrs;
 
 	/*
 	 * Use the passed type unless it's RECORD; we can't support input of
@@ -138,6 +139,8 @@ record_in(PG_FUNCTION_ARGS)
 		my_extra->ncolumns = ncolumns;
 	}
 
+	attrs = TupleDescGetLogSortedAttrs(tupdesc);
+
 	values = (Datum *) palloc(ncolumns * sizeof(Datum));
 	nulls = (bool *) palloc(ncolumns * sizeof(bool));
 
@@ -159,15 +162,17 @@ record_in(PG_FUNCTION_ARGS)
 
 	for (i = 0; i < ncolumns; i++)
 	{
-		ColumnIOData *column_info = &my_extra->columns[i];
-		Oid			column_type = tupdesc->attrs[i]->atttypid;
+		Form_pg_attribute	attr = attrs[i];
+		int16		attnum = attr->attnum - 1;
+		ColumnIOData *column_info = &my_extra->columns[attnum];
+		Oid			column_type = attr->atttypid;
 		char	   *column_data;
 
 		/* Ignore dropped columns in datatype, but fill with nulls */
-		if (tupdesc->attrs[i]->attisdropped)
+		if (attr->attisdropped)
 		{
-			values[i] = (Datum) 0;
-			nulls[i] = true;
+			values[attnum] = (Datum) 0;
+			nulls[attnum] = true;
 			continue;
 		}
 
@@ -188,7 +193,7 @@ record_in(PG_FUNCTION_ARGS)
 		if (*ptr == ',' || *ptr == ')')
 		{
 			column_data = NULL;
-			nulls[i] = true;
+			nulls[attnum] = true;
 		}
 		else
 		{
@@ -233,7 +238,7 @@ record_in(PG_FUNCTION_ARGS)
 			}
 
 			column_data = buf.data;
-			nulls[i] = false;
+			nulls[attnum] = false;
 		}
 
 		/*
@@ -249,10 +254,10 @@ record_in(PG_FUNCTION_ARGS)
 			column_info->column_type = column_type;
 		}
 
-		values[i] = InputFunctionCall(&column_info->proc,
-									  column_data,
-									  column_info->typioparam,
-									  tupdesc->attrs[i]->atttypmod);
+		values[attnum] = InputFunctionCall(&column_info->proc,
+										   column_data,
+										   column_info->typioparam,
+										   attr->atttypmod);
 
 		/*
 		 * Prep for next column
@@ -311,6 +316,7 @@ record_out(PG_FUNCTION_ARGS)
 	Datum	   *values;
 	bool	   *nulls;
 	StringInfoData buf;
+	Form_pg_attribute	*attrs;
 
 	/* Extract type info from the tuple itself */
 	tupType = HeapTupleHeaderGetTypeId(rec);
@@ -352,6 +358,8 @@ record_out(PG_FUNCTION_ARGS)
 		my_extra->ncolumns = ncolumns;
 	}
 
+	attrs = TupleDescGetLogSortedAttrs(tupdesc);
+
 	values = (Datum *) palloc(ncolumns * sizeof(Datum));
 	nulls = (bool *) palloc(ncolumns * sizeof(bool));
 
@@ -365,22 +373,24 @@ record_out(PG_FUNCTION_ARGS)
 
 	for (i = 0; i < ncolumns; i++)
 	{
-		ColumnIOData *column_info = &my_extra->columns[i];
-		Oid			column_type = tupdesc->attrs[i]->atttypid;
+		Form_pg_attribute attrib = attrs[i];
+		int16		attnum = attrib->attnum - 1;
+		ColumnIOData *column_info = &my_extra->columns[attnum];
+		Oid			column_type = attrib->atttypid;
 		Datum		attr;
 		char	   *value;
 		char	   *tmp;
 		bool		nq;
 
 		/* Ignore dropped columns in datatype */
-		if (tupdesc->attrs[i]->attisdropped)
+		if (attrib->attisdropped)
 			continue;
 
 		if (needComma)
 			appendStringInfoChar(&buf, ',');
 		needComma = true;
 
-		if (nulls[i])
+		if (nulls[attnum])
 		{
 			/* emit nothing... */
 			continue;
@@ -399,7 +409,7 @@ record_out(PG_FUNCTION_ARGS)
 			column_info->column_type = column_type;
 		}
 
-		attr = values[i];
+		attr = values[attnum];
 		value = OutputFunctionCall(&column_info->proc, attr);
 
 		/* Detect whether we need double quotes for this value */
@@ -464,6 +474,7 @@ record_recv(PG_FUNCTION_ARGS)
 	int			i;
 	Datum	   *values;
 	bool	   *nulls;
+	Form_pg_attribute *attrs;
 
 	/*
 	 * Use the passed type unless it's RECORD; we can't support input of
@@ -507,6 +518,7 @@ record_recv(PG_FUNCTION_ARGS)
 		my_extra->ncolumns = ncolumns;
 	}
 
+	attrs = TupleDescGetLogSortedAttrs(tupdesc);
 	values = (Datum *) palloc(ncolumns * sizeof(Datum));
 	nulls = (bool *) palloc(ncolumns * sizeof(bool));
 
@@ -529,8 +541,10 @@ record_recv(PG_FUNCTION_ARGS)
 	/* Process each column */
 	for (i = 0; i < ncolumns; i++)
 	{
-		ColumnIOData *column_info = &my_extra->columns[i];
-		Oid			column_type = tupdesc->attrs[i]->atttypid;
+		Form_pg_attribute   attr = attrs[i];
+		int16       attnum = attr->attnum - 1;
+		ColumnIOData *column_info = &my_extra->columns[attnum];
+		Oid			column_type = attr->atttypid;
 		Oid			coltypoid;
 		int			itemlen;
 		StringInfoData item_buf;
@@ -538,10 +552,10 @@ record_recv(PG_FUNCTION_ARGS)
 		char		csave;
 
 		/* Ignore dropped columns in datatype, but fill with nulls */
-		if (tupdesc->attrs[i]->attisdropped)
+		if (attr->attisdropped)
 		{
-			values[i] = (Datum) 0;
-			nulls[i] = true;
+			values[attnum] = (Datum) 0;
+			nulls[attnum] = true;
 			continue;
 		}
 
@@ -564,7 +578,7 @@ record_recv(PG_FUNCTION_ARGS)
 		{
 			/* -1 length means NULL */
 			bufptr = NULL;
-			nulls[i] = true;
+			nulls[attnum] = true;
 			csave = 0;			/* keep compiler quiet */
 		}
 		else
@@ -586,7 +600,7 @@ record_recv(PG_FUNCTION_ARGS)
 			buf->data[buf->cursor] = '\0';
 
 			bufptr = &item_buf;
-			nulls[i] = false;
+			nulls[attnum] = false;
 		}
 
 		/* Now call the column's receiveproc */
@@ -600,10 +614,10 @@ record_recv(PG_FUNCTION_ARGS)
 			column_info->column_type = column_type;
 		}
 
-		values[i] = ReceiveFunctionCall(&column_info->proc,
-										bufptr,
-										column_info->typioparam,
-										tupdesc->attrs[i]->atttypmod);
+		values[attnum] = ReceiveFunctionCall(&column_info->proc,
+											 bufptr,
+											 column_info->typioparam,
+											 attr->atttypmod);
 
 		if (bufptr)
 		{
@@ -654,6 +668,7 @@ record_send(PG_FUNCTION_ARGS)
 	Datum	   *values;
 	bool	   *nulls;
 	StringInfoData buf;
+	Form_pg_attribute	*attrs;
 
 	/* Extract type info from the tuple itself */
 	tupType = HeapTupleHeaderGetTypeId(rec);
@@ -695,6 +710,8 @@ record_send(PG_FUNCTION_ARGS)
 		my_extra->ncolumns = ncolumns;
 	}
 
+	attrs = TupleDescGetLogSortedAttrs(tupdesc);
+
 	values = (Datum *) palloc(ncolumns * sizeof(Datum));
 	nulls = (bool *) palloc(ncolumns * sizeof(bool));
 
@@ -715,13 +732,15 @@ record_send(PG_FUNCTION_ARGS)
 
 	for (i = 0; i < ncolumns; i++)
 	{
-		ColumnIOData *column_info = &my_extra->columns[i];
-		Oid			column_type = tupdesc->attrs[i]->atttypid;
+		Form_pg_attribute attrib = attrs[i];
+		int16		attnum = attrib->attnum - 1;
+		ColumnIOData *column_info = &my_extra->columns[attnum];
+		Oid			column_type = tupdesc->attrs[attnum]->atttypid;
 		Datum		attr;
 		bytea	   *outputbytes;
 
 		/* Ignore dropped columns in datatype */
-		if (tupdesc->attrs[i]->attisdropped)
+		if (attrib->attisdropped)
 			continue;
 
 		pq_sendint(&buf, column_type, sizeof(Oid));
@@ -746,7 +765,7 @@ record_send(PG_FUNCTION_ARGS)
 			column_info->column_type = column_type;
 		}
 
-		attr = values[i];
+		attr = values[attnum];
 		outputbytes = SendFunctionCall(&column_info->proc, attr);
 		pq_sendint(&buf, VARSIZE(outputbytes) - VARHDRSZ, 4);
 		pq_sendbytes(&buf, VARDATA(outputbytes),
diff --git a/src/include/access/htup_details.h b/src/include/access/htup_details.h
index 0a673cd..c431925 100644
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -740,14 +740,15 @@ extern HeapTuple heap_copytuple(HeapTuple tuple);
 extern void heap_copytuple_with_tuple(HeapTuple src, HeapTuple dest);
 extern Datum heap_copy_tuple_as_datum(HeapTuple tuple, TupleDesc tupleDesc);
 extern HeapTuple heap_form_tuple(TupleDesc tupleDescriptor,
-				Datum *values, bool *isnull);
+						 Datum *values, bool *isnull);
 extern HeapTuple heap_modify_tuple(HeapTuple tuple,
 				  TupleDesc tupleDesc,
 				  Datum *replValues,
 				  bool *replIsnull,
 				  bool *doReplace);
 extern void heap_deform_tuple(HeapTuple tuple, TupleDesc tupleDesc,
-				  Datum *values, bool *isnull);
+						   Datum *values, bool *isnull);
+
 
 /* these three are deprecated versions of the three above: */
 extern HeapTuple heap_formtuple(TupleDesc tupleDescriptor,
diff --git a/src/include/access/tupdesc.h b/src/include/access/tupdesc.h
index 91b0034..bb88ce4 100644
--- a/src/include/access/tupdesc.h
+++ b/src/include/access/tupdesc.h
@@ -60,6 +60,14 @@ typedef struct tupleConstr
  * row type, or a value >= 0 to allow the rowtype to be looked up in the
  * typcache.c type cache.
  *
+ * For descriptors coming out of catalogued relations, it is possible to obtain
+ * an array of attributes sorted by attlognum and attphysnum.  The attlognum
+ * one helps *-expansion, among other things; the attphysnum one is useful for
+ * encoding and decoding tuples to and from the on-disk representation.  The
+ * arrays are initially set to NULL, and are only populated on first access;
+ * those wanting to access it should always do it through
+ * TupleDescGetLogSortedAttrs / TupleDescGetPhysSortedAttrs.
+ *
  * Tuple descriptors that live in caches (relcache or typcache, at present)
  * are reference-counted: they can be deleted when their reference count goes
  * to zero.  Tuple descriptors created by the executor need no reference
@@ -73,6 +81,8 @@ typedef struct tupleDesc
 	int			natts;			/* number of attributes in the tuple */
 	Form_pg_attribute *attrs;
 	/* attrs[N] is a pointer to the description of Attribute Number N+1 */
+	Form_pg_attribute *logattrs;	/* array of attributes sorted by attlognum */
+	Form_pg_attribute *physattrs;	/* array of attributes sorted by attphysnum */
 	TupleConstr *constr;		/* constraints, or NULL if none */
 	Oid			tdtypeid;		/* composite type ID for tuple type */
 	int32		tdtypmod;		/* typmod for tuple type */
@@ -123,8 +133,16 @@ extern void TupleDescInitEntryCollation(TupleDesc desc,
 							AttrNumber attributeNumber,
 							Oid collationid);
 
+extern void TupleDescInitEntryPhysicalPosition(TupleDesc desc,
+								   AttrNumber attributeNumber,
+								   AttrNumber attphysnum);
+
 extern TupleDesc BuildDescForRelation(List *schema);
 
 extern TupleDesc BuildDescFromLists(List *names, List *types, List *typmods, List *collations);
 
+extern Form_pg_attribute *TupleDescGetLogSortedAttrs(TupleDesc desc);
+
+extern Form_pg_attribute *TupleDescGetPhysSortedAttrs(TupleDesc desc);
+
 #endif   /* TUPDESC_H */
diff --git a/src/include/access/tupmacs.h b/src/include/access/tupmacs.h
index 2f84fee..2dfe913 100644
--- a/src/include/access/tupmacs.h
+++ b/src/include/access/tupmacs.h
@@ -17,6 +17,8 @@
 
 /*
  * check to see if the ATT'th bit of an array of 8-bit bytes is set.
+ *
+ * Note that the index into the nulls array is attnum-1.
  */
 #define att_isnull(ATT, BITS) (!((BITS)[(ATT) >> 3] & (1 << ((ATT) & 0x07))))
 
diff --git a/src/include/catalog/pg_attribute.h b/src/include/catalog/pg_attribute.h
index 87a3462..ba10a01 100644
--- a/src/include/catalog/pg_attribute.h
+++ b/src/include/catalog/pg_attribute.h
@@ -63,19 +63,26 @@ CATALOG(pg_attribute,1249) BKI_BOOTSTRAP BKI_WITHOUT_OIDS BKI_ROWTYPE_OID(75) BK
 	int16		attlen;
 
 	/*
-	 * attnum is the "attribute number" for the attribute:	A value that
-	 * uniquely identifies this attribute within its class. For user
-	 * attributes, Attribute numbers are greater than 0 and not greater than
-	 * the number of attributes in the class. I.e. if the Class pg_class says
-	 * that Class XYZ has 10 attributes, then the user attribute numbers in
-	 * Class pg_attribute must be 1-10.
-	 *
+	 * attnum uniquely identifies the column within its class, throughout its
+	 * lifetime.  For user attributes, Attribute numbers are greater than 0 and
+	 * less than or equal to the number of attributes in the class. For
+	 * instance, if the Class pg_class says that Class XYZ has 10 attributes,
+	 * then the user attribute numbers in Class pg_attribute must be 1-10.
 	 * System attributes have attribute numbers less than 0 that are unique
 	 * within the class, but not constrained to any particular range.
 	 *
-	 * Note that (attnum - 1) is often used as the index to an array.
+	 * attphysnum (physical position) specifies the position in which the
+	 * column is stored in physical tuples.  This might differ from attnum if
+	 * there are useful optimizations in storage space, for example alignment
+	 * considerations.
+	 *
+	 * attlognum (logical position) specifies the position in which the column
+	 * is expanded in "SELECT * FROM rel" and any other query where the column
+	 * ordering is user-visible.
 	 */
 	int16		attnum;
+	int16		attphysnum;
+	int16		attlognum;
 
 	/*
 	 * attndims is the declared number of dimensions, if an array type,
@@ -188,28 +195,31 @@ typedef FormData_pg_attribute *Form_pg_attribute;
  * ----------------
  */
 
-#define Natts_pg_attribute				21
+#define Natts_pg_attribute				23
 #define Anum_pg_attribute_attrelid		1
 #define Anum_pg_attribute_attname		2
 #define Anum_pg_attribute_atttypid		3
 #define Anum_pg_attribute_attstattarget 4
 #define Anum_pg_attribute_attlen		5
 #define Anum_pg_attribute_attnum		6
-#define Anum_pg_attribute_attndims		7
-#define Anum_pg_attribute_attcacheoff	8
-#define Anum_pg_attribute_atttypmod		9
-#define Anum_pg_attribute_attbyval		10
-#define Anum_pg_attribute_attstorage	11
-#define Anum_pg_attribute_attalign		12
-#define Anum_pg_attribute_attnotnull	13
-#define Anum_pg_attribute_atthasdef		14
-#define Anum_pg_attribute_attisdropped	15
-#define Anum_pg_attribute_attislocal	16
-#define Anum_pg_attribute_attinhcount	17
-#define Anum_pg_attribute_attcollation	18
-#define Anum_pg_attribute_attacl		19
-#define Anum_pg_attribute_attoptions	20
-#define Anum_pg_attribute_attfdwoptions 21
+#define Anum_pg_attribute_attphysnum	7
+#define Anum_pg_attribute_attlognum		8
+#define Anum_pg_attribute_attndims		9
+#define Anum_pg_attribute_attcacheoff	10
+#define Anum_pg_attribute_atttypmod		11
+#define Anum_pg_attribute_attbyval		12
+#define Anum_pg_attribute_attstorage	13
+#define Anum_pg_attribute_attalign		14
+#define Anum_pg_attribute_attnotnull	15
+#define Anum_pg_attribute_atthasdef		16
+#define Anum_pg_attribute_attisdropped	17
+#define Anum_pg_attribute_attislocal	18
+#define Anum_pg_attribute_attinhcount	19
+#define Anum_pg_attribute_attcollation	20
+#define Anum_pg_attribute_attacl		21
+#define Anum_pg_attribute_attoptions	22
+#define Anum_pg_attribute_attfdwoptions	23
+
 
 
 /* ----------------
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 8b4c35c..c5f0411 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -142,7 +142,7 @@ typedef FormData_pg_class *Form_pg_class;
  */
 DATA(insert OID = 1247 (  pg_type		PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f f t n 3 1 _null_ _null_ ));
 DESCR("");
-DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f f t n 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 23 0 f f f f f f t n 3 1 _null_ _null_ ));
 DESCR("");
 DATA(insert OID = 1255 (  pg_proc		PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f f t n 3 1 _null_ _null_ ));
 DESCR("");
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index 48f84bf..9abc3f4 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -89,10 +89,21 @@
  * buffer page.)
  *
  * tts_nvalid indicates the number of valid columns in the tts_values/isnull
+ * arrays.  When the slot is holding a "virtual" tuple this must be equal to the
+ * descriptor's natts.  When the slot is holding a physical tuple this is equal
+ * to the latest column to which we have fully extracted, that is, there are no
+ * "holes" at the left of this column.  Since the disposition of attributes in
+ * the physical storage might not match the ordering of the attributes in the
+ * tuple descriptor, we keep a separate tts_nphysvalid counter which determines
+ * the point up to which we have physically extracted the values.
+ *
+ * tts_nvalid indicates the number of valid columns in the tts_values/isnull
  * arrays.  When the slot is holding a "virtual" tuple this must be equal
  * to the descriptor's natts.  When the slot is holding a physical tuple
  * this is equal to the number of columns we have extracted (we always
- * extract columns from left to right, so there are no holes).
+ * extract columns from left to right, so there are no holes).  Note that since
+ * the tts_values/tts_isnull arrays follow physical ordering, tts_nvalid
+ * is an attphysnum.
  *
  * tts_values/tts_isnull are allocated when a descriptor is assigned to the
  * slot; they are of length equal to the descriptor's natts.
@@ -122,6 +133,7 @@ typedef struct TupleTableSlot
 	MemoryContext tts_mcxt;		/* slot itself is in this context */
 	Buffer		tts_buffer;		/* tuple's buffer, or InvalidBuffer */
 	int			tts_nvalid;		/* # of valid values in tts_values */
+	int			tts_nphysvalid;	/* # of values actually decoded */
 	Datum	   *tts_values;		/* current per-attribute values */
 	bool	   *tts_isnull;		/* current per-attribute isnull flags */
 	MinimalTuple tts_mintuple;	/* minimal tuple, or NULL if none */
@@ -165,9 +177,11 @@ extern TupleTableSlot *ExecCopySlot(TupleTableSlot *dstslot,
 			 TupleTableSlot *srcslot);
 
 /* in access/common/heaptuple.c */
-extern Datum slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull);
+extern Datum slot_getattr(TupleTableSlot *slot, AttrNumber attnum, bool *isnull);
 extern void slot_getallattrs(TupleTableSlot *slot);
 extern void slot_getsomeattrs(TupleTableSlot *slot, int attnum);
 extern bool slot_attisnull(TupleTableSlot *slot, int attnum);
+extern void slot_sort_datumarrays(TupleTableSlot *slot, Datum **values,
+					  bool **isnull);
 
 #endif   /* TUPTABLE_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index ac13302..1f9a453 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -754,10 +754,12 @@ typedef struct RangeTblEntry
 	 */
 
 	/*
-	 * Fields valid for a plain relation RTE (else zero):
+	 * Fields valid for a plain relation RTE (else zero/NIL):
 	 */
 	Oid			relid;			/* OID of the relation */
 	char		relkind;		/* relation kind (see pg_class.relkind) */
+	List	   *lognums;		/* int list of logical column numbers */
+	AttrNumber *physnums;		/* mapping attnum => attphysnum (NIL) */
 
 	/*
 	 * Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index dbc5a35..910141e 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -146,8 +146,9 @@ typedef struct Var
 	Expr		xpr;
 	Index		varno;			/* index of this var's relation in the range
 								 * table, or INNER_VAR/OUTER_VAR/INDEX_VAR */
-	AttrNumber	varattno;		/* attribute number of this var, or zero for
-								 * all */
+	AttrNumber	varattno;		/* identity attribute number (attnum) of this
+								 * var, or zero for all */
+	AttrNumber	varphysno;		/* physical position of column in table */
 	Oid			vartype;		/* pg_type OID for the type of this var */
 	int32		vartypmod;		/* pg_attribute typmod value */
 	Oid			varcollid;		/* OID of collation, or InvalidOid if none */
@@ -1212,6 +1213,7 @@ typedef struct TargetEntry
 								 * clause */
 	Oid			resorigtbl;		/* OID of column's source table */
 	AttrNumber	resorigcol;		/* column's number in source table */
+	AttrNumber	resorigphyscol;	/* column's physical position in source table */
 	bool		resjunk;		/* set to true to eliminate the attribute from
 								 * final target list */
 } TargetEntry;
diff --git a/src/include/parser/parse_relation.h b/src/include/parser/parse_relation.h
index c886335..9c85509 100644
--- a/src/include/parser/parse_relation.h
+++ b/src/include/parser/parse_relation.h
@@ -89,10 +89,10 @@ extern void errorMissingRTE(ParseState *pstate, RangeVar *relation) __attribute_
 extern void errorMissingColumn(ParseState *pstate,
 	   char *relname, char *colname, int location) __attribute__((noreturn));
 extern void expandRTE(RangeTblEntry *rte, int rtindex, int sublevels_up,
-		  int location, bool include_dropped,
+		  int location, bool include_dropped, bool logical_sort,
 		  List **colnames, List **colvars);
 extern List *expandRelAttrs(ParseState *pstate, RangeTblEntry *rte,
-			   int rtindex, int sublevels_up, int location);
+			   int rtindex, int sublevels_up, bool logical_sort, int location);
 extern int	attnameAttNum(Relation rd, const char *attname, bool sysColOK);
 extern Name attnumAttName(Relation rd, int attid);
 extern Oid	attnumTypeId(Relation rd, int attid);
diff --git a/src/test/regress/expected/col_order.out b/src/test/regress/expected/col_order.out
new file mode 100644
index 0000000..45d6918
--- /dev/null
+++ b/src/test/regress/expected/col_order.out
@@ -0,0 +1,286 @@
+drop table if exists foo, bar, baz cascade;
+NOTICE:  table "foo" does not exist, skipping
+NOTICE:  table "bar" does not exist, skipping
+NOTICE:  table "baz" does not exist, skipping
+create table foo (
+	a int default 42,
+	b timestamp default '1975-02-15 12:00',
+	c text);
+insert into foo values (142857, '1888-04-29', 'hello world');
+begin;
+update pg_attribute set attlognum = 1 where attname = 'c' and attrelid = 'foo'::regclass;
+update pg_attribute set attlognum = 2 where attname = 'a' and attrelid = 'foo'::regclass;
+update pg_attribute set attlognum = 3 where attname = 'b' and attrelid = 'foo'::regclass;
+commit;
+insert into foo values ('column c', 123, '2010-03-03 10:10:10');
+insert into foo (c, a, b) values ('c again', 456, '2010-03-03 11:12:13');
+insert into foo values ('and c', 789);	-- defaults column b
+insert into foo (c, b) values ('the c', '1975-01-10 08:00');	-- defaults column a
+select * from foo;
+      c      |   a    |            b             
+-------------+--------+--------------------------
+ hello world | 142857 | Sun Apr 29 00:00:00 1888
+ column c    |    123 | Wed Mar 03 10:10:10 2010
+ c again     |    456 | Wed Mar 03 11:12:13 2010
+ and c       |    789 | Sat Feb 15 12:00:00 1975
+ the c       |     42 | Fri Jan 10 08:00:00 1975
+(5 rows)
+
+select foo from foo;
+                        foo                        
+---------------------------------------------------
+ ("hello world",142857,"Sun Apr 29 00:00:00 1888")
+ ("column c",123,"Wed Mar 03 10:10:10 2010")
+ ("c again",456,"Wed Mar 03 11:12:13 2010")
+ ("and c",789,"Sat Feb 15 12:00:00 1975")
+ ("the c",42,"Fri Jan 10 08:00:00 1975")
+(5 rows)
+
+select foo.* from foo;
+      c      |   a    |            b             
+-------------+--------+--------------------------
+ hello world | 142857 | Sun Apr 29 00:00:00 1888
+ column c    |    123 | Wed Mar 03 10:10:10 2010
+ c again     |    456 | Wed Mar 03 11:12:13 2010
+ and c       |    789 | Sat Feb 15 12:00:00 1975
+ the c       |     42 | Fri Jan 10 08:00:00 1975
+(5 rows)
+
+select a,c,b from foo;
+   a    |      c      |            b             
+--------+-------------+--------------------------
+ 142857 | hello world | Sun Apr 29 00:00:00 1888
+    123 | column c    | Wed Mar 03 10:10:10 2010
+    456 | c again     | Wed Mar 03 11:12:13 2010
+    789 | and c       | Sat Feb 15 12:00:00 1975
+     42 | the c       | Fri Jan 10 08:00:00 1975
+(5 rows)
+
+select c,b,a from foo;
+      c      |            b             |   a    
+-------------+--------------------------+--------
+ hello world | Sun Apr 29 00:00:00 1888 | 142857
+ column c    | Wed Mar 03 10:10:10 2010 |    123
+ c again     | Wed Mar 03 11:12:13 2010 |    456
+ and c       | Sat Feb 15 12:00:00 1975 |    789
+ the c       | Fri Jan 10 08:00:00 1975 |     42
+(5 rows)
+
+select a from foo;
+   a    
+--------
+ 142857
+    123
+    456
+    789
+     42
+(5 rows)
+
+select b from foo;
+            b             
+--------------------------
+ Sun Apr 29 00:00:00 1888
+ Wed Mar 03 10:10:10 2010
+ Wed Mar 03 11:12:13 2010
+ Sat Feb 15 12:00:00 1975
+ Fri Jan 10 08:00:00 1975
+(5 rows)
+
+select c from foo;
+      c      
+-------------
+ hello world
+ column c
+ c again
+ and c
+ the c
+(5 rows)
+
+select (foo).* from foo;
+      c      |   a    |            b             
+-------------+--------+--------------------------
+ hello world | 142857 | Sun Apr 29 00:00:00 1888
+ column c    |    123 | Wed Mar 03 10:10:10 2010
+ c again     |    456 | Wed Mar 03 11:12:13 2010
+ and c       |    789 | Sat Feb 15 12:00:00 1975
+ the c       |     42 | Fri Jan 10 08:00:00 1975
+(5 rows)
+
+select ROW((foo).*) from foo;
+                        row                        
+---------------------------------------------------
+ ("hello world",142857,"Sun Apr 29 00:00:00 1888")
+ ("column c",123,"Wed Mar 03 10:10:10 2010")
+ ("c again",456,"Wed Mar 03 11:12:13 2010")
+ ("and c",789,"Sat Feb 15 12:00:00 1975")
+ ("the c",42,"Fri Jan 10 08:00:00 1975")
+(5 rows)
+
+select ROW((foo).*)::foo from foo;
+                        row                        
+---------------------------------------------------
+ ("hello world",142857,"Sun Apr 29 00:00:00 1888")
+ ("column c",123,"Wed Mar 03 10:10:10 2010")
+ ("c again",456,"Wed Mar 03 11:12:13 2010")
+ ("and c",789,"Sat Feb 15 12:00:00 1975")
+ ("the c",42,"Fri Jan 10 08:00:00 1975")
+(5 rows)
+
+select (ROW((foo).*)::foo).* from foo;
+      c      |   a    |            b             
+-------------+--------+--------------------------
+ hello world | 142857 | Sun Apr 29 00:00:00 1888
+ column c    |    123 | Wed Mar 03 10:10:10 2010
+ c again     |    456 | Wed Mar 03 11:12:13 2010
+ and c       |    789 | Sat Feb 15 12:00:00 1975
+ the c       |     42 | Fri Jan 10 08:00:00 1975
+(5 rows)
+
+create function f() returns setof foo language sql as $$
+select * from foo;
+$$;
+select * from f();
+      c      |   a    |            b             
+-------------+--------+--------------------------
+ hello world | 142857 | Sun Apr 29 00:00:00 1888
+ column c    |    123 | Wed Mar 03 10:10:10 2010
+ c again     |    456 | Wed Mar 03 11:12:13 2010
+ and c       |    789 | Sat Feb 15 12:00:00 1975
+ the c       |     42 | Fri Jan 10 08:00:00 1975
+(5 rows)
+
+insert into foo
+	select (row('ah', 1126, '2012-10-15')::foo).*
+	returning *;
+ c  |  a   |            b             
+----+------+--------------------------
+ ah | 1126 | Mon Oct 15 00:00:00 2012
+(1 row)
+
+insert into foo
+	select (row('eh', 1125, '2012-10-16')::foo).*
+	returning foo.*;
+ c  |  a   |            b             
+----+------+--------------------------
+ eh | 1125 | Tue Oct 16 00:00:00 2012
+(1 row)
+
+insert into foo values
+	('values one', 1, '2008-10-20'),
+	('values two', 2, '2004-08-15');
+copy foo from stdin;
+select * from foo order by 2;
+      c      |   a    |            b             
+-------------+--------+--------------------------
+ values one  |      1 | Mon Oct 20 00:00:00 2008
+ values two  |      2 | Sun Aug 15 00:00:00 2004
+ the c       |     42 | Fri Jan 10 08:00:00 1975
+ column c    |    123 | Wed Mar 03 10:10:10 2010
+ c again     |    456 | Wed Mar 03 11:12:13 2010
+ and c       |    789 | Sat Feb 15 12:00:00 1975
+ copy one    |   1001 | Thu Dec 10 23:54:00 1998
+ copy two    |   1002 | Thu Aug 01 09:22:00 1996
+ eh          |   1125 | Tue Oct 16 00:00:00 2012
+ ah          |   1126 | Mon Oct 15 00:00:00 2012
+ hello world | 142857 | Sun Apr 29 00:00:00 1888
+(11 rows)
+
+-- Test some joins
+create table bar (x text, y int default 142857, z timestamp );
+insert into bar values ('oh no', default, '1937-04-28');
+insert into bar values ('oh yes', 42, '1492-12-31');
+begin;
+update pg_attribute set attlognum = 3 where attname = 'x' and attrelid = 'bar'::regclass;
+update pg_attribute set attlognum = 1 where attname = 'z' and attrelid = 'bar'::regclass;
+commit;
+select foo.* from bar, foo where bar.y = foo.a;
+      c      |   a    |            b             
+-------------+--------+--------------------------
+ the c       |     42 | Fri Jan 10 08:00:00 1975
+ hello world | 142857 | Sun Apr 29 00:00:00 1888
+(2 rows)
+
+select bar.* from bar, foo where bar.y = foo.a;
+            z             |   y    |   x    
+--------------------------+--------+--------
+ Sat Dec 31 00:00:00 1492 |     42 | oh yes
+ Wed Apr 28 00:00:00 1937 | 142857 | oh no
+(2 rows)
+
+select * from bar, foo where bar.y = foo.a;
+            z             |   y    |   x    |      c      |   a    |            b             
+--------------------------+--------+--------+-------------+--------+--------------------------
+ Sat Dec 31 00:00:00 1492 |     42 | oh yes | the c       |     42 | Fri Jan 10 08:00:00 1975
+ Wed Apr 28 00:00:00 1937 | 142857 | oh no  | hello world | 142857 | Sun Apr 29 00:00:00 1888
+(2 rows)
+
+select * from foo join bar on (foo.a = bar.y);
+      c      |   a    |            b             |            z             |   y    |   x    
+-------------+--------+--------------------------+--------------------------+--------+--------
+ the c       |     42 | Fri Jan 10 08:00:00 1975 | Sat Dec 31 00:00:00 1492 |     42 | oh yes
+ hello world | 142857 | Sun Apr 29 00:00:00 1888 | Wed Apr 28 00:00:00 1937 | 142857 | oh no
+(2 rows)
+
+alter table bar rename y to a;
+select * from foo natural join bar;
+   a    |      c      |            b             |            z             |   x    
+--------+-------------+--------------------------+--------------------------+--------
+     42 | the c       | Fri Jan 10 08:00:00 1975 | Sat Dec 31 00:00:00 1492 | oh yes
+ 142857 | hello world | Sun Apr 29 00:00:00 1888 | Wed Apr 28 00:00:00 1937 | oh no
+(2 rows)
+
+select * from foo join bar using (a);
+   a    |      c      |            b             |            z             |   x    
+--------+-------------+--------------------------+--------------------------+--------
+     42 | the c       | Fri Jan 10 08:00:00 1975 | Sat Dec 31 00:00:00 1492 | oh yes
+ 142857 | hello world | Sun Apr 29 00:00:00 1888 | Wed Apr 28 00:00:00 1937 | oh no
+(2 rows)
+
+create table baz (e point) inherits (foo, bar); -- fail to merge defaults
+NOTICE:  merging multiple inherited definitions of column "a"
+ERROR:  column "a" inherits conflicting default values
+HINT:  To resolve the conflict, specify a default explicitly.
+create table baz (e point, a int default 23) inherits (foo, bar);
+NOTICE:  merging multiple inherited definitions of column "a"
+NOTICE:  merging column "a" with inherited definition
+insert into baz (e) values ('(1,1)');
+select * from foo;
+      c      |   a    |            b             
+-------------+--------+--------------------------
+ hello world | 142857 | Sun Apr 29 00:00:00 1888
+ column c    |    123 | Wed Mar 03 10:10:10 2010
+ c again     |    456 | Wed Mar 03 11:12:13 2010
+ and c       |    789 | Sat Feb 15 12:00:00 1975
+ the c       |     42 | Fri Jan 10 08:00:00 1975
+ ah          |   1126 | Mon Oct 15 00:00:00 2012
+ eh          |   1125 | Tue Oct 16 00:00:00 2012
+ values one  |      1 | Mon Oct 20 00:00:00 2008
+ values two  |      2 | Sun Aug 15 00:00:00 2004
+ copy one    |   1001 | Thu Dec 10 23:54:00 1998
+ copy two    |   1002 | Thu Aug 01 09:22:00 1996
+             |     23 | Sat Feb 15 12:00:00 1975
+(12 rows)
+
+select * from bar;
+            z             |   a    |   x    
+--------------------------+--------+--------
+ Wed Apr 28 00:00:00 1937 | 142857 | oh no
+ Sat Dec 31 00:00:00 1492 |     42 | oh yes
+                          |     23 | 
+(3 rows)
+
+select * from baz;
+ c | a  |            b             | z | x |   e   
+---+----+--------------------------+---+---+-------
+   | 23 | Sat Feb 15 12:00:00 1975 |   |   | (1,1)
+(1 row)
+
+create table quux (a int, b int[], c int);
+begin;
+update pg_attribute set attlognum = 1 where attnum = 2 and attrelid = 'quux'::regclass;
+update pg_attribute set attlognum = 2 where attnum = 1 and attrelid = 'quux'::regclass;
+commit;
+select * from quux where (a,c) in ( select a,c from quux );
+ERROR:  failed to find unique expression in subplan tlist
+drop table foo, bar, baz, quux cascade;
+NOTICE:  drop cascades to function f()
diff --git a/src/test/regress/expected/create_table.out b/src/test/regress/expected/create_table.out
index 35451d5..3815510 100644
--- a/src/test/regress/expected/create_table.out
+++ b/src/test/regress/expected/create_table.out
@@ -31,9 +31,9 @@ CREATE TABLE onek (
 	string4		name
 );
 CREATE TABLE tenk1 (
-	unique1		int4,
-	unique2		int4,
 	two			int4,
+	unique2		int4,
+	unique1		int4,
 	four		int4,
 	ten			int4,
 	twenty		int4,
@@ -48,6 +48,8 @@ CREATE TABLE tenk1 (
 	stringu2	name,
 	string4		name
 ) WITH OIDS;
+UPDATE pg_attribute SET attlognum = 1 where attrelid = 'tenk1'::regclass and attname = 'unique1';
+UPDATE pg_attribute SET attlognum = 3 where attrelid = 'tenk1'::regclass and attname = 'two';
 CREATE TABLE tenk2 (
 	unique1 	int4,
 	unique2 	int4,
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index e0ae2f2..973d3be 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -30,7 +30,7 @@ test: point lseg line box path polygon circle date time timetz timestamp timesta
 # geometry depends on point, lseg, box, path, polygon and circle
 # horology depends on interval, timetz, timestamp, timestamptz, reltime and abstime
 # ----------
-test: geometry horology regex oidjoins type_sanity opr_sanity
+test: geometry horology regex oidjoins type_sanity opr_sanity col_order
 
 # ----------
 # These four each depend on the previous one
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 7f762bd..711b767 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -49,6 +49,7 @@ test: regex
 test: oidjoins
 test: type_sanity
 test: opr_sanity
+test: col_order
 test: insert
 test: create_function_1
 test: create_type
diff --git a/src/test/regress/sql/col_order.sql b/src/test/regress/sql/col_order.sql
new file mode 100644
index 0000000..db808b4
--- /dev/null
+++ b/src/test/regress/sql/col_order.sql
@@ -0,0 +1,99 @@
+drop table if exists foo, bar, baz cascade;
+
+create table foo (
+	a int,
+	b int,
+	c int,
+	d int,
+	e int);
+update pg_attribute set attphysnum = 4 where attname = 'b' and attrelid = 'foo'::regclass;
+update pg_attribute set attphysnum = 2 where attname = 'd' and attrelid = 'foo'::regclass;
+insert into foo values (1, 2, 3, 4, 5);
+select * from foo;
+-- \quit
+drop table foo;
+
+create table foo (
+	a int default 42,
+	b timestamp default '1975-02-15 12:00',
+	c text);
+insert into foo values (142857, '1888-04-29', 'hello world');
+
+begin;
+update pg_attribute set attlognum = 1 where attname = 'c' and attrelid = 'foo'::regclass;
+update pg_attribute set attlognum = 2 where attname = 'a' and attrelid = 'foo'::regclass;
+update pg_attribute set attlognum = 3 where attname = 'b' and attrelid = 'foo'::regclass;
+commit;
+
+insert into foo values ('column c', 123, '2010-03-03 10:10:10');
+insert into foo (c, a, b) values ('c again', 456, '2010-03-03 11:12:13');
+insert into foo values ('and c', 789);	-- defaults column b
+insert into foo (c, b) values ('the c', '1975-01-10 08:00');	-- defaults column a
+
+select * from foo;
+select foo from foo;
+select foo.* from foo;
+select a,c,b from foo;
+select c,b,a from foo;
+select a from foo;
+select b from foo;
+select c from foo;
+select (foo).* from foo;
+select ROW((foo).*) from foo;
+select ROW((foo).*)::foo from foo;
+select (ROW((foo).*)::foo).* from foo;
+
+create function f() returns setof foo language sql as $$
+select * from foo;
+$$;
+select * from f();
+
+insert into foo
+	select (row('ah', 1126, '2012-10-15')::foo).*
+	returning *;
+insert into foo
+	select (row('eh', 1125, '2012-10-16')::foo).*
+	returning foo.*;
+
+insert into foo values
+	('values one', 1, '2008-10-20'),
+	('values two', 2, '2004-08-15');
+
+copy foo from stdin;
+copy one	1001	1998-12-10 23:54
+copy two	1002	1996-08-01 09:22
+\.
+select * from foo order by 2;
+
+-- Test some joins
+create table bar (x text, y int default 142857, z timestamp );
+insert into bar values ('oh no', default, '1937-04-28');
+insert into bar values ('oh yes', 42, '1492-12-31');
+begin;
+update pg_attribute set attlognum = 3 where attname = 'x' and attrelid = 'bar'::regclass;
+update pg_attribute set attlognum = 1 where attname = 'z' and attrelid = 'bar'::regclass;
+commit;
+select foo.* from bar, foo where bar.y = foo.a;
+select bar.* from bar, foo where bar.y = foo.a;
+select * from bar, foo where bar.y = foo.a;
+select * from foo join bar on (foo.a = bar.y);
+alter table bar rename y to a;
+select * from foo natural join bar;
+select * from foo join bar using (a);
+
+create table baz (e point) inherits (foo, bar); -- fail to merge defaults
+create table baz (e point, a int default 23) inherits (foo, bar);
+insert into baz (e) values ('(1,1)');
+select * from foo;
+select * from bar;
+select * from baz;
+
+create table quux (a int, b int[], c int);
+begin;
+update pg_attribute set attlognum = 1 where attnum = 2 and attrelid = 'quux'::regclass;
+update pg_attribute set attlognum = 2 where attnum = 1 and attrelid = 'quux'::regclass;
+commit;
+select * from quux where (a,c) in ( select a,c from quux );
+
+
+drop table foo, bar, baz, quux cascade;
diff --git a/src/test/regress/sql/create_table.sql b/src/test/regress/sql/create_table.sql
index 08029a9..a6feccd 100644
--- a/src/test/regress/sql/create_table.sql
+++ b/src/test/regress/sql/create_table.sql
@@ -35,9 +35,9 @@ CREATE TABLE onek (
 );
 
 CREATE TABLE tenk1 (
-	unique1		int4,
-	unique2		int4,
 	two			int4,
+	unique2		int4,
+	unique1		int4,
 	four		int4,
 	ten			int4,
 	twenty		int4,
@@ -53,6 +53,9 @@ CREATE TABLE tenk1 (
 	string4		name
 ) WITH OIDS;
 
+UPDATE pg_attribute SET attlognum = 1 where attrelid = 'tenk1'::regclass and attname = 'unique1';
+UPDATE pg_attribute SET attlognum = 3 where attrelid = 'tenk1'::regclass and attname = 'two';
+
 CREATE TABLE tenk2 (
 	unique1 	int4,
 	unique2 	int4,

randomize-attlognum.pytext/x-python; name=randomize-attlognum.pyDownload

attlognum-init.sqlapplication/sql; name=attlognum-init.sqlDownload

attlognum-test.sqlapplication/sql; name=attlognum-test.sqlDownload

#54

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 11 years ago

In reply to: Tomas Vondra (#53)

Re: logical column ordering

On 2/23/15 5:09 PM, Tomas Vondra wrote:

Over the time I've heard various use cases for this patch, but in most
cases it was quite speculative. If you have an idea where this might be
useful, can you explain it here, or maybe point me to a place where it's
described?

For better or worse, table structure is a form of documentation for a
system. As such, it's very valuable to group related fields in a table
together. When creating a table, that's easy, but as soon as you need to
alter your careful ordering can easily end up out the window.

Perhaps to some that just sounds like pointless window dressing, but my
experience is that on a complex system the less organized things are the
more bugs you get due to overlooking something.

The other reason for this patch (which it maybe doesn't support
anymore?) is to allow Postgres to use an optimal physical ordering of
fields on a page to reduce space wasted on alignment, as well as taking
nullability and varlena into account.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#55

Josh Berkus

josh@agliodbs.com

almost 11 years ago

In reply to: Alvaro Herrera (#1)

Re: logical column ordering

On 02/26/2015 10:49 AM, Jim Nasby wrote:

On 2/23/15 5:09 PM, Tomas Vondra wrote:

Over the time I've heard various use cases for this patch, but in most
cases it was quite speculative. If you have an idea where this might be
useful, can you explain it here, or maybe point me to a place where it's
described?

For better or worse, table structure is a form of documentation for a
system. As such, it's very valuable to group related fields in a table
together. When creating a table, that's easy, but as soon as you need to
alter your careful ordering can easily end up out the window.

Perhaps to some that just sounds like pointless window dressing, but my
experience is that on a complex system the less organized things are the
more bugs you get due to overlooking something.

Yes. Consider a BI table which has 110 columns. Having these columns
in a sensible order,even though some were added after table creation,
would be a significant usability benefit for DBAs.

A second usability benefit is making it easy to keep table columns for
import tables sync'd with COPY.

Neither of these is sufficient to overshadow performance penalties, but
they are both common activities/annoyances, and not speculative in the
least. And I haven't heard that there are any performance issues
associated with this patch. Are there?

The other reason for this patch (which it maybe doesn't support
anymore?) is to allow Postgres to use an optimal physical ordering of
fields on a page to reduce space wasted on alignment, as well as taking
nullability and varlena into account.

... and that's the bigger reason. I was under the impression that we'd
get LCO in first, and then add the column arrangement optimization in
the next version.

In fact, I would argue that LCO *needs* to be a feature at least one
version before we try to add column ordering optimization (COO). The
reason being that with LCO, users can implement COO as a client tool or
extension, maybe even an addition to pg_repack. This will allow users
to experiment with, and prove, algorithms for COO, allowing us to put in
a tested algorithm when we're ready to add it to core.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WM2a9faa772fa16e744458f3dd8da9e736d19ec6097f45755be464f1b4b44e62ca784e90838475289ffb212f0a5c380b29@asav-3.01.com

#56

Alvaro Herrera

alvherre@2ndquadrant.com

almost 11 years ago

In reply to: Josh Berkus (#55)

Re: logical column ordering

Josh Berkus wrote:

On 02/26/2015 10:49 AM, Jim Nasby wrote:

The other reason for this patch (which it maybe doesn't support
anymore?) is to allow Postgres to use an optimal physical ordering of
fields on a page to reduce space wasted on alignment, as well as taking
nullability and varlena into account.

... and that's the bigger reason. I was under the impression that we'd
get LCO in first, and then add the column arrangement optimization in
the next version.

The changes in this patch aren't really optimizations -- they are a
complete rework on the design of storage of columns. In the current
system, columns exist physically on disk in the same order as they are
created, and in the same order as they appear logically (i.e. SELECT *,
psql's \d, etc). This patch decouples these three things so that they
can changed freely -- but provides no user interface to do so. I think
that trying to only decouple the thing we currently have in two pieces,
and then have a subsequent patch to decouple again, is additional
conceptual complexity for no gain.

A future patch can implement physical ordering optimization (group
columns physically to avoid alignment padding, for instance, and also
put not-nullable fixed-length column at the start of the physical tuple,
so that the "attcacheoffset" thing can be applied in more cases), but I
think it's folly to try to attempt that in the current patch, which is
already hugely complicated.

In fact, I would argue that LCO *needs* to be a feature at least one
version before we try to add column ordering optimization (COO). The
reason being that with LCO, users can implement COO as a client tool or
extension, maybe even an addition to pg_repack. This will allow users
to experiment with, and prove, algorithms for COO, allowing us to put in
a tested algorithm when we're ready to add it to core.

The current patch will clear the road for such experimentation.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#57

Josh Berkus

josh@agliodbs.com

almost 11 years ago

In reply to: Alvaro Herrera (#1)

Re: logical column ordering

On 02/26/2015 01:54 PM, Alvaro Herrera wrote:

This patch decouples these three things so that they
can changed freely -- but provides no user interface to do so. I think
that trying to only decouple the thing we currently have in two pieces,
and then have a subsequent patch to decouple again, is additional
conceptual complexity for no gain.

Oh, I didn't realize there weren't commands to change the LCO. Without
at least SQL syntax for LCO, I don't see why we'd take it; this sounds
more like a WIP patch.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WM16325ef241108fb49a6e4116322d49edd0a09488e7a1868337e40e491a89695318c28d8e980c4da99dd1c36228e061a3@asav-1.01.com

#58

Alvaro Herrera

alvherre@2ndquadrant.com

almost 11 years ago

In reply to: Josh Berkus (#57)

Re: logical column ordering

Josh Berkus wrote:

On 02/26/2015 01:54 PM, Alvaro Herrera wrote:

This patch decouples these three things so that they
can changed freely -- but provides no user interface to do so. I think
that trying to only decouple the thing we currently have in two pieces,
and then have a subsequent patch to decouple again, is additional
conceptual complexity for no gain.

Oh, I didn't realize there weren't commands to change the LCO. Without
at least SQL syntax for LCO, I don't see why we'd take it; this sounds
more like a WIP patch.

The reason for doing it this way is that changing the underlying
architecture is really hard, without having to bear an endless hackers
bike shed discussion about the best userland syntax to use. It seems a
much better approach to do the actually difficult part first, then let
the rest to be argued to death by others and let those others do the
easy part (and take all the credit along with that); that way, that
discussion does not kill other possible uses that the new architecture
allows.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#59

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 11 years ago

In reply to: Alvaro Herrera (#58)

Re: logical column ordering

On 2/26/15 4:01 PM, Alvaro Herrera wrote:

Josh Berkus wrote:

On 02/26/2015 01:54 PM, Alvaro Herrera wrote:

This patch decouples these three things so that they
can changed freely -- but provides no user interface to do so. I think
that trying to only decouple the thing we currently have in two pieces,
and then have a subsequent patch to decouple again, is additional
conceptual complexity for no gain.

Oh, I didn't realize there weren't commands to change the LCO. Without
at least SQL syntax for LCO, I don't see why we'd take it; this sounds
more like a WIP patch.

The reason for doing it this way is that changing the underlying
architecture is really hard, without having to bear an endless hackers
bike shed discussion about the best userland syntax to use. It seems a
much better approach to do the actually difficult part first, then let
the rest to be argued to death by others and let those others do the
easy part (and take all the credit along with that); that way, that
discussion does not kill other possible uses that the new architecture
allows.

+1. This patch is already several years old; lets not delay it further.

Plus, I suspect that you could hack the catalog directly if you really
wanted to change LCO. Worst case you could create a C function to do it.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#60

Andres Freund

andres@2ndquadrant.com

almost 11 years ago

In reply to: Jim Nasby (#59)

Re: logical column ordering

On 2015-02-26 16:16:54 -0600, Jim Nasby wrote:

On 2/26/15 4:01 PM, Alvaro Herrera wrote:

The reason for doing it this way is that changing the underlying
architecture is really hard, without having to bear an endless hackers
bike shed discussion about the best userland syntax to use. It seems a
much better approach to do the actually difficult part first, then let
the rest to be argued to death by others and let those others do the
easy part (and take all the credit along with that); that way, that
discussion does not kill other possible uses that the new architecture
allows.

I agree that it's a sane order of developing things. But: I don't think
it makes sense to commit it without the capability to change the
order. Not so much because it'll not serve a purpose at that point, but
rather because it'll essentially untestable. And this is a patch that
needs a fair amount of changes, both automated, and manual.

At least during development I'd even add a function that randomizes the
physical order of columns, and keeps the logical one. Running the
regression tests that way seems likely to catch a fair number of bugs.

+1. This patch is already several years old; lets not delay it further.

Plus, I suspect that you could hack the catalog directly if you really
wanted to change LCO. Worst case you could create a C function to do it.

I don't think that's sufficient for testing purposes. Not only for the
patch itself, but also for getting it right in new code.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#61

Tom Lane

tgl@sss.pgh.pa.us

almost 11 years ago

In reply to: Jim Nasby (#59)

Re: logical column ordering

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:

On 2/26/15 4:01 PM, Alvaro Herrera wrote:

Josh Berkus wrote:

Oh, I didn't realize there weren't commands to change the LCO. Without
at least SQL syntax for LCO, I don't see why we'd take it; this sounds
more like a WIP patch.

The reason for doing it this way is that changing the underlying
architecture is really hard, without having to bear an endless hackers
bike shed discussion about the best userland syntax to use. It seems a
much better approach to do the actually difficult part first, then let
the rest to be argued to death by others and let those others do the
easy part (and take all the credit along with that); that way, that
discussion does not kill other possible uses that the new architecture
allows.

+1. This patch is already several years old; lets not delay it further.

I tend to agree with that, but how are we going to test things if there's
no mechanism to create a table in which the orderings are different?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#62

Kevin Grittner

kgrittn@ymail.com

almost 11 years ago

In reply to: Tomas Vondra (#53)

Re: logical column ordering

Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Over the time I've heard various use cases for this patch, but in most
cases it was quite speculative. If you have an idea where this might be
useful, can you explain it here, or maybe point me to a place where it's
described?

One use case is to be able to suppress default display of columns that are
used for internal purposes. For example, incremental maintenance of
materialized views will require storing a "count(t)" column, and sometimes
state information for aggregate columns, in addition to what the users
explicitly request. At the developers' meeting there was discussion of
whether and how to avoid displaying these by default, and it was felt that
when we have this logical column ordering it would be good to have a way to
suppress default display. Perhaps this could be as simple as a special
value for logical position.
--

Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#63

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 11 years ago

In reply to: Andres Freund (#60)

Re: logical column ordering

On 2/26/15 4:34 PM, Andres Freund wrote:

On 2015-02-26 16:16:54 -0600, Jim Nasby wrote:

On 2/26/15 4:01 PM, Alvaro Herrera wrote:

The reason for doing it this way is that changing the underlying
architecture is really hard, without having to bear an endless hackers
bike shed discussion about the best userland syntax to use. It seems a
much better approach to do the actually difficult part first, then let
the rest to be argued to death by others and let those others do the
easy part (and take all the credit along with that); that way, that
discussion does not kill other possible uses that the new architecture
allows.

I agree that it's a sane order of developing things. But: I don't think
it makes sense to commit it without the capability to change the
order. Not so much because it'll not serve a purpose at that point, but
rather because it'll essentially untestable. And this is a patch that
needs a fair amount of changes, both automated, and manual.

It's targeted for 9.6 anyway, so we have a while to decide on what's
committed when. It's possible that there's no huge debate on the syntax.

At least during development I'd even add a function that randomizes the
physical order of columns, and keeps the logical one. Running the
regression tests that way seems likely to catch a fair number of bugs.

Yeah, I've thought that would be a necessary part of testing. Not really
sure how we'd go about it with existing framework though...

+1. This patch is already several years old; lets not delay it further.

Plus, I suspect that you could hack the catalog directly if you really
wanted to change LCO. Worst case you could create a C function to do it.

I don't think that's sufficient for testing purposes. Not only for the
patch itself, but also for getting it right in new code.

We'll want to do testing that it doesn't make sense for the API to
support. For example, randomizing the storage ordering; that's not a
sane use case. Ideally we wouldn't even expose physical ordering because
the code would be smart enough to figure it out.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#64

David Steele

david@pgmasters.net

almost 11 years ago

In reply to: Jim Nasby (#54)

1 attachment(s)

Re: logical column ordering

On 2/26/15 1:49 PM, Jim Nasby wrote:

On 2/23/15 5:09 PM, Tomas Vondra wrote:

Over the time I've heard various use cases for this patch, but in most
cases it was quite speculative. If you have an idea where this might be
useful, can you explain it here, or maybe point me to a place where it's
described?

For better or worse, table structure is a form of documentation for a
system. As such, it's very valuable to group related fields in a table
together. When creating a table, that's easy, but as soon as you need to
alter your careful ordering can easily end up out the window.

Perhaps to some that just sounds like pointless window dressing, but my
experience is that on a complex system the less organized things are the
more bugs you get due to overlooking something.

I agree with Jim's comments. I've generally followed column ordering
that goes something like:

1) primary key
2) foreign keys
3) flags
4) other programmatic data fields (type, order, etc.)
5) non-programmatic data fields (name, description, etc.)

The immediate practical benefit of this is that users are more likely to
see fields that they need without scrolling right. Documentation is
also clearer because fields tend to go from most to least important
(left to right, top to bottom). Also, if you are consistent enough then
users just *know* where to look.

I wrote a function a while back that reorders columns in tables (it not
only deals with reordering, but triggers, constraints, indexes, etc.,
though there are some caveats). It's painful and not very efficient,
but easy to use.

Most dimension tables that I've worked with are in the millions of rows
so reordering is not problem. With fact tables, I assess on a
case-by-case basis. It would be nice to not have to do that triage.

The function is attached if anyone is interested.

--
- David Steele
david@pgmasters.net

Attachments:

column_move.sqltext/plain; charset=UTF-8; name=column_move.sqlDownload

/******************************************************************************** CATALOG_TABLE_COLUMN_MOVE Function
********************************************************************************create or replace function _utility.catalog_table_column_move
(
    strSchemaName text, 
    strTableName text, 
    strColumnName text, 
    strColumnNameBefore text
)
    returns void as $$
declare
    rIndex record;
    rConstraint record;
    rColumn record;
    strSchemaTable text = strSchemaName || '.' || strTableName;
    strDdl text;
    strClusterIndex text;
begin
    -- Raise notice that a reorder is in progress
    raise notice 'Reorder columns in table %.% (% before %)', strSchemaName, strTableName, strColumnName, strColumnNameBefore;

    -- Get the cluster index
    select pg_index.relname
      into strClusterIndex
      from pg_namespace
           inner join pg_class
                on pg_class.relnamespace = pg_namespace.oid
               and pg_class.relname = strTableName
           inner join pg_index pg_index_map
                on pg_index_map.indrelid = pg_class.oid
               and pg_index_map.indisclustered = true
           inner join pg_class pg_index
                on pg_index.oid = pg_index_map.indexrelid
     where pg_namespace.nspname = strSchemaName;

    if strClusterIndex is null then
        raise exception 'Table %.% must have a cluster index before reordering', strSchemaName, strTableName;
    end if;

    -- Disable all user triggers
    strDdl = 'alter table ' || strSchemaTable || ' disable trigger user';
    raise notice '        Disable triggers [%]', strDdl;
    execute strDdl;

    -- Create temp table to hold ddl
    create temp table temp_catalogtablecolumnreorder
    (
        type text not null,
        name text not null,
        ddl text not null
    );

    -- Save index ddl in a temp table
    raise notice '    Save indexes';

    for rIndex in
        with index as
        (
            select _utility.catalog_index_list_get(strSchemaName, strTableName) as name
        ),
        index_ddl as
        (
            select index.name,
                   _utility.catalog_index_create_get(_utility.catalog_index_get(strSchemaName, index.name)) as ddl
              from index
        )
        select index.name,
               index_ddl.ddl
          from index
               left outer join index_ddl
                    on index_ddl.name = index.name
                   and index_ddl.ddl not like '%[function]%'
    loop
        raise notice '        Save %', rIndex.name;
        insert into temp_catalogtablecolumnreorder values ('index', rIndex.name, rIndex.ddl);
    end loop;

    -- Save constraint ddl in a temp table
    raise notice '    Save constraints';

    for rConstraint in
        with constraint_list as
        (
            select _utility.catalog_constraint_list_get(strSchemaName, strTableName, '{p,u,f,c}') as name
        ),
        constraint_ddl as
        (
            select constraint_list.name,
                   _utility.catalog_constraint_create_get(_utility.catalog_constraint_get(strSchemaName, strTableName, 
                                                                                          constraint_list.name)) as ddl
              from constraint_list
        )
        select constraint_list.name,
               constraint_ddl.ddl
          from constraint_list
               left outer join constraint_ddl
                    on constraint_ddl.name = constraint_list.name
    loop
        raise notice '        Save %', rConstraint.name;
        insert into temp_catalogtablecolumnreorder values ('constraint', rConstraint.name, rConstraint.ddl);
    end loop;

    -- Move column
    for rColumn in
        with table_column as
        (
            select pg_attribute.attname as name,
                   rank() over (order by pg_attribute.attnum) as rank,
                   pg_type.typname as type,
                   case when pg_attribute.atttypmod = -1 then null else ((atttypmod - 4) >> 16) & 65535 end as precision,
                   case when pg_attribute.atttypmod = -1 then null else (atttypmod - 4) & 65535 end as scale,
                   not pg_attribute.attnotnull as nullable,
                   pg_attrdef.adsrc as default,
                   pg_attribute.*
              from pg_namespace
                   inner join pg_class
                        on pg_class.relnamespace = pg_namespace.oid
                       and pg_class.relname = strTableName
                   inner join pg_attribute
                        on pg_attribute.attrelid = pg_class.oid
                       and pg_attribute.attnum >= 1
                       and pg_attribute.attisdropped = false
                   inner join pg_type
                        on pg_type.oid = pg_attribute.atttypid
                   left outer join pg_attrdef
                        on pg_attrdef.adrelid = pg_class.oid
                       and pg_attrdef.adnum = pg_attribute.attnum
             where pg_namespace.nspname = strSchemaName
             order by pg_attribute.attnum
        )
        select table_column.*
          from table_column table_column_before
               inner join table_column
                    on table_column.rank >= table_column_before.rank
                   and table_column.name <> strColumnName
         where table_column_before.name = strColumnNameBefore
    loop
        raise notice '    Move column %', rColumn.name;

        strDdl = 'alter table ' || strSchemaTable || ' rename column "' || rColumn.name || '" to "@' || rColumn.name || '@"';
        raise notice '        Rename [%]', strDdl;
        execute strDdl;
        
        strDdl = 'alter table ' || strSchemaTable || ' add "' || rColumn.name || '" ' || rColumn.type ||
                 case when rColumn.precision is not null then '(' || rColumn.precision || ', ' || rColumn.scale || ')' else '' end;
        raise notice '        Create [%]', strDdl;
        execute strDdl;
        
        strDdl = 'update ' || strSchemaTable || ' set "' || rColumn.name || '" = "@' || rColumn.name || '@"';
        raise notice '        Copy [%]', strDdl;
        execute strDdl;

        strDdl = 'alter table ' || strSchemaTable || ' drop column "@' || rColumn.name || '@"';
        raise notice '        Drop [%]', strDdl;
        execute strDdl;

        if rColumn."default" is not null then
            strDdl = 'alter table ' || strSchemaTable || ' alter column "' || rColumn.name || '" set default ' || rColumn.default;
            raise notice '        Default [%]', strDdl;
            execute strDdl;
        end if;

        if rColumn.nullable = false then
            strDdl = 'alter table ' || strSchemaTable || ' alter column "' || rColumn.name || '" set not null';
            raise notice '        Not Null [%]', strDdl;
            execute strDdl;
        end if;
    end loop;

    -- Rebuild indexes
    raise notice '    Rebuild indexes';

    for rIndex in
        select name,
               ddl
          from temp_catalogtablecolumnreorder
         where type = 'index'
    loop
        begin
            execute rIndex.ddl;
            raise notice '        Rebuild % [%]', rIndex.name, rIndex.ddl;
        exception
            when duplicate_table then
                raise notice '        Skip % [%]', rIndex.name, rIndex.ddl;
        end;
    end loop;

    -- Rebuild constraints
    raise notice '    Rebuild constraints';
    
    for rConstraint in
        select name,
               ddl
          from temp_catalogtablecolumnreorder
         where type = 'constraint'
    loop
        begin
            execute rConstraint.ddl;
            raise notice '        Rebuild % [%]', rConstraint.name, rConstraint.ddl;
        exception
            when duplicate_object or duplicate_table or invalid_table_definition then
                raise notice '        Skip % [%]', rConstraint.name, rConstraint.ddl;
        end;
    end loop;

    -- Recluster table
    strDdl = 'cluster ' || strSchemaTable || ' using ' || strClusterIndex;
    raise notice '    Recluster [%]', strDdl;
    execute strDdl;

    -- Enable all user triggers
    strDdl = 'alter table ' || strSchemaTable || ' enable trigger user';
    raise notice '    Enable triggers [%]', strDdl;
    execute strDdl;

    -- Drop temp tables
    drop table temp_catalogtablecolumnreorder;
end
$$ language plpgsql security invoker;

comment on function _utility.catalog_table_column_move(text, text, text, text) is
'Moves a column before another column in a table.  For example:

{{perform _utility.catalog_table_column_move(''attribute'', ''attribute'', ''target'', ''active'');}}

will position the "target" column right before the "active" column.  It''s not currently possible to directly move a column to the
right but this can be achieved by multiple moves of columns to the left.

There are a few caveats:
* The table must have a cluster index.  Moving columns is messy on the storage and the table needs to be re-clustered afterwards.
* Column referencing triggers will not automatically be dropped or rebuilt.
* Column specific permissions are not restored after the move.
* A column cannot be moved before the primary key if there are foreign key references from other tables.';

#65

Gavin Flower

GavinFlower@archidevsys.co.nz

almost 11 years ago

In reply to: David Steele (#64)

Re: logical column ordering

On 27/02/15 14:08, David Steele wrote:
[...]

I agree with Jim's comments. I've generally followed column ordering
that goes something like:

1) primary key
2) foreign keys
3) flags
4) other programmatic data fields (type, order, etc.)
5) non-programmatic data fields (name, description, etc.)

The immediate practical benefit of this is that users are more likely to
see fields that they need without scrolling right. Documentation is
also clearer because fields tend to go from most to least important
(left to right, top to bottom). Also, if you are consistent enough then
users just *know* where to look.

I wrote a function a while back that reorders columns in tables (it not
only deals with reordering, but triggers, constraints, indexes, etc.,
though there are some caveats). It's painful and not very efficient,
but easy to use.

Most dimension tables that I've worked with are in the millions of rows
so reordering is not problem. With fact tables, I assess on a
case-by-case basis. It would be nice to not have to do that triage.

The function is attached if anyone is interested.

I've never formally written down the order of how I define
fields^H^H^H^H^H^H columns in a table, but David's list is the same
order I use.

The only additional ordering I do: is to put fields that are likely to
be long text fields, at the end of the record.

So I would certainly appreciate my logical ordering to be the natural
order they appear.

Cheers,
Gavin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#66

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Tom Lane (#61)

Re: logical column ordering

On 26.2.2015 23:36, Tom Lane wrote:

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:

On 2/26/15 4:01 PM, Alvaro Herrera wrote:

Josh Berkus wrote:

Oh, I didn't realize there weren't commands to change the LCO. Without
at least SQL syntax for LCO, I don't see why we'd take it; this sounds
more like a WIP patch.

The reason for doing it this way is that changing the underlying
architecture is really hard, without having to bear an endless hackers
bike shed discussion about the best userland syntax to use. It seems a
much better approach to do the actually difficult part first, then let
the rest to be argued to death by others and let those others do the
easy part (and take all the credit along with that); that way, that
discussion does not kill other possible uses that the new architecture
allows.

+1. This patch is already several years old; lets not delay it further.

I tend to agree with that, but how are we going to test things if there's
no mechanism to create a table in which the orderings are different?

Physical or logical orderings?

Physical ordering is still determined by the CREATE TABLE command, so
you can do either

CREATE TABLE order_1 (
a INT,
b INT
);

or (to get the reverse order)

CREATE TABLE order_2 (
b INT,
a INT
);

Logical ordering may be updated directly in pg_attribute catalog, by
tweaking the attlognum column

UPDATE pg_attribute SET attlognum = 10
WHERE attrelid = 'order_1'::regclass AND attname = 'a';

Of course, this does not handle duplicities, ranges and such, so that
needs to be done manually. But I think inventing something like

ALTER TABLE order_1 ALTER COLUMN a SET lognum = 11;

should be straightforward. But IMHO getting the internals working
properly first is more important.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#67

Alvaro Herrera

alvherre@2ndquadrant.com

almost 11 years ago

In reply to: Tomas Vondra (#66)

Re: logical column ordering

Tomas Vondra wrote:

Physical ordering is still determined by the CREATE TABLE command,

In theory, you should be able to UPDATE attphysnum in pg_attribute too
when the table is empty, and new tuples inserted would follow the
specified ordering.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#68

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Alvaro Herrera (#67)

Re: logical column ordering

On 27.2.2015 19:23, Alvaro Herrera wrote:

Tomas Vondra wrote:

Physical ordering is still determined by the CREATE TABLE command,

In theory, you should be able to UPDATE attphysnum in pg_attribute
too when the table is empty, and new tuples inserted would follow
the specified ordering.

Good idea - that'd give you descriptors with

(attnum != attphysnum)

which might trigger some bugs.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#69

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Kevin Grittner (#62)

Re: logical column ordering

On 26.2.2015 23:42, Kevin Grittner wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Over the time I've heard various use cases for this patch, but in
most cases it was quite speculative. If you have an idea where
this might be useful, can you explain it here, or maybe point me to
a place where it's described?

One use case is to be able to suppress default display of columns
that are used for internal purposes. For example, incremental
maintenance of materialized views will require storing a "count(t)"
column, and sometimes state information for aggregate columns, in
addition to what the users explicitly request. At the developers'
meeting there was discussion of whether and how to avoid displaying
these by default, and it was felt that when we have this logical
column ordering it would be good to have a way tosuppress default
display. Perhaps this could be as simple as a special value for
logical position.

I don't see how hiding columns is related to this patch at all. That's
completely unrelated thing, and it certainly is not part of this patch.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#70

Alvaro Herrera

alvherre@2ndquadrant.com

almost 11 years ago

In reply to: Tomas Vondra (#69)

Re: logical column ordering

Tomas Vondra wrote:

On 26.2.2015 23:42, Kevin Grittner wrote:

One use case is to be able to suppress default display of columns
that are used for internal purposes. For example, incremental
maintenance of materialized views will require storing a "count(t)"
column, and sometimes state information for aggregate columns, in
addition to what the users explicitly request. At the developers'
meeting there was discussion of whether and how to avoid displaying
these by default, and it was felt that when we have this logical
column ordering it would be good to have a way tosuppress default
display. Perhaps this could be as simple as a special value for
logical position.

I don't see how hiding columns is related to this patch at all. That's
completely unrelated thing, and it certainly is not part of this patch.

It's not directly related to the patch, but I think the intent is that
once we have this patch it will be possible to apply other
transformations, such as having columns that are effectively hidden --
consider for example the idea that attlognum be set to a negative
number. (For instance, consider the idea that system columns may all
have -1 as attlognum, which would just be an indicator that they are
never present in logical column expansion. That makes sense to me; what
reason do we have to keep them using the current attnums they have?)

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#71

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Alvaro Herrera (#70)

Re: logical column ordering

On 27.2.2015 19:50, Alvaro Herrera wrote:

Tomas Vondra wrote:

On 26.2.2015 23:42, Kevin Grittner wrote:

One use case is to be able to suppress default display of columns
that are used for internal purposes. For example, incremental
maintenance of materialized views will require storing a "count(t)"
column, and sometimes state information for aggregate columns, in
addition to what the users explicitly request. At the developers'
meeting there was discussion of whether and how to avoid displaying
these by default, and it was felt that when we have this logical
column ordering it would be good to have a way tosuppress default
display. Perhaps this could be as simple as a special value for
logical position.

I don't see how hiding columns is related to this patch at all. That's
completely unrelated thing, and it certainly is not part of this patch.

It's not directly related to the patch, but I think the intent is that
once we have this patch it will be possible to apply other
transformations, such as having columns that are effectively hidden --
consider for example the idea that attlognum be set to a negative
number. (For instance, consider the idea that system columns may all
have -1 as attlognum, which would just be an indicator that they are
never present in logical column expansion. That makes sense to me; what
reason do we have to keep them using the current attnums they have?)

My vote is against reusing attlognums in this way - I see that as a
misuse, making the code needlessly complex. A better way to achieve this
is to introduce a simple 'is hidden' flag into pg_attribute, and
something as simple as

ALTER TABLE t ALTER COLUMN a SHOW;
ALTER TABLE t ALTER COLUMN a HIDE;

or something along the lines. Not only that's cleaner, but it's also a
better interface for the users than 'you have to set attlognum to a
negative value.'

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#72

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Andres Freund (#60)

1 attachment(s)

Re: logical column ordering

On 26.2.2015 23:34, Andres Freund wrote:

On 2015-02-26 16:16:54 -0600, Jim Nasby wrote:

On 2/26/15 4:01 PM, Alvaro Herrera wrote:

The reason for doing it this way is that changing the underlying
architecture is really hard, without having to bear an endless hackers
bike shed discussion about the best userland syntax to use. It seems a
much better approach to do the actually difficult part first, then let
the rest to be argued to death by others and let those others do the
easy part (and take all the credit along with that); that way, that
discussion does not kill other possible uses that the new architecture
allows.

I agree that it's a sane order of developing things. But: I don't
think it makes sense to commit it without the capability to change
the order. Not so much because it'll not serve a purpose at that
point, but rather because it'll essentially untestable. And this is a
patch that needs a fair amount of changes, both automated, and
manual.

I agree that committing something without a code that actually uses it
in practice is not the best approach. That's one of the reasons why I
was asking for the use cases we expect from this.

At least during development I'd even add a function that randomizes
the physical order of columns, and keeps the logical one. Running
the regression tests that way seems likely to catch a fair number of
bugs.

That's not all that difficult, and can be done in 10 lines of PL/pgSQL -
see the attached file. Should be sufficient for development testing (and
in production there's not much use for that anyway).

+1. This patch is already several years old; lets not delay it further.

Plus, I suspect that you could hack the catalog directly if you
really wanted to change LCO. Worst case you could create a C
function to do it.

I don't think that's sufficient for testing purposes. Not only for
the patch itself, but also for getting it right in new code.

I think we could calls to the randomization functions into some of the
regression tests (say 'create_tables.sql'), but that makes regression
tests ... well, random, and I'm not convinced that's a good thing.

Also, this makes regression tests harder to think, because "SELECT *"
does different things depending on the attlognum order.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#73

Alvaro Herrera

alvherre@2ndquadrant.com

almost 11 years ago

In reply to: Tomas Vondra (#72)

Re: logical column ordering

Tomas Vondra wrote:

I think we could calls to the randomization functions into some of the
regression tests (say 'create_tables.sql'), but that makes regression
tests ... well, random, and I'm not convinced that's a good thing.

Also, this makes regression tests harder to think, because "SELECT *"
does different things depending on the attlognum order.

No, that approach doesn't seem very useful. Rather, randomize the
columns in the CREATE TABLE statement, and then fix up the attlognums so
that the SELECT * expansion is the same as it would be with the
not-randomized CREATE TABLE.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#74

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Alvaro Herrera (#73)

Re: logical column ordering

On 27.2.2015 20:34, Alvaro Herrera wrote:

Tomas Vondra wrote:

I think we could calls to the randomization functions into some of the
regression tests (say 'create_tables.sql'), but that makes regression
tests ... well, random, and I'm not convinced that's a good thing.

Also, this makes regression tests harder to think, because "SELECT *"
does different things depending on the attlognum order.

No, that approach doesn't seem very useful. Rather, randomize the
columns in the CREATE TABLE statement, and then fix up the attlognums so
that the SELECT * expansion is the same as it would be with the
not-randomized CREATE TABLE.

Yes, that's a possible approach too - possibly a better one for
regression tests as it fixes the 'SELECT *' but it effectively uses
fixed 'attlognum' and 'attnum' values (it's difficult to randomize
those, as they may be referenced in other catalogs).

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#75

Alvaro Herrera

alvherre@2ndquadrant.com

almost 11 years ago

In reply to: Tomas Vondra (#74)

Re: logical column ordering

Tomas Vondra wrote:

On 27.2.2015 20:34, Alvaro Herrera wrote:

Tomas Vondra wrote:

I think we could calls to the randomization functions into some of the
regression tests (say 'create_tables.sql'), but that makes regression
tests ... well, random, and I'm not convinced that's a good thing.

Also, this makes regression tests harder to think, because "SELECT *"
does different things depending on the attlognum order.

No, that approach doesn't seem very useful. Rather, randomize the
columns in the CREATE TABLE statement, and then fix up the attlognums so
that the SELECT * expansion is the same as it would be with the
not-randomized CREATE TABLE.

Yes, that's a possible approach too - possibly a better one for
regression tests as it fixes the 'SELECT *' but it effectively uses
fixed 'attlognum' and 'attnum' values (it's difficult to randomize
those, as they may be referenced in other catalogs).

Why would you care what values are used as attnum? If anything
misbehaves, surely that would be a bug in the patch. (Of course, you
can't just change the numbers too much later after the fact, because the
attnum values could have propagated into other tables via foreign keys
and such; it needs to be done during executing CREATE TABLE or
immediately thereafter.)

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#76

Arthur Silva

arthurprs@gmail.com

almost 11 years ago

In reply to: Tomas Vondra (#74)

Re: logical column ordering

On Fri, Feb 27, 2015 at 4:44 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

On 27.2.2015 20:34, Alvaro Herrera wrote:

Tomas Vondra wrote:

I think we could calls to the randomization functions into some of the
regression tests (say 'create_tables.sql'), but that makes regression
tests ... well, random, and I'm not convinced that's a good thing.

Also, this makes regression tests harder to think, because "SELECT *"
does different things depending on the attlognum order.

No, that approach doesn't seem very useful. Rather, randomize the
columns in the CREATE TABLE statement, and then fix up the attlognums so
that the SELECT * expansion is the same as it would be with the
not-randomized CREATE TABLE.

Yes, that's a possible approach too - possibly a better one for
regression tests as it fixes the 'SELECT *' but it effectively uses
fixed 'attlognum' and 'attnum' values (it's difficult to randomize
those, as they may be referenced in other catalogs).

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Sorry to intrude, I've been following this post and I was wondering if it
would allow (in the currently planed form or in the future) a wider set of
non-rewriting DDLs to Postgres. For example, drop a column without
rewriting the table.

#77

Stephen Frost

sfrost@snowman.net

almost 11 years ago

In reply to: Arthur Silva (#76)

Re: logical column ordering

* Arthur Silva (arthurprs@gmail.com) wrote:

Sorry to intrude, I've been following this post and I was wondering if it
would allow (in the currently planed form or in the future) a wider set of
non-rewriting DDLs to Postgres. For example, drop a column without
rewriting the table.

Uh, we already support that. Ditto for add column (but you have to add
it with all NULL values; you can't add it with a default value).

Thanks,

Stephen

#78

Andres Freund

andres@2ndquadrant.com

almost 11 years ago

In reply to: Arthur Silva (#76)

Re: logical column ordering

On 2015-02-27 16:51:30 -0300, Arthur Silva wrote:

Sorry to intrude, I've been following this post and I was wondering if it
would allow (in the currently planed form or in the future) a wider set of
non-rewriting DDLs to Postgres.

I don't think it makes a big difference.

For example, drop a column without rewriting the table.

That's already possible.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#79

Alvaro Herrera

alvherre@2ndquadrant.com

almost 11 years ago

In reply to: Arthur Silva (#76)

Re: logical column ordering

Arthur Silva wrote:

Sorry to intrude, I've been following this post and I was wondering if it
would allow (in the currently planed form or in the future) a wider set of
non-rewriting DDLs to Postgres. For example, drop a column without
rewriting the table.

Perhaps. But dropping a column already does not rewrite the table, only
marks the column as dropped in system catalogs, so do you have a better
example.

One obvious example is that you have

CREATE TABLE t (
t1 int,
t3 int
);

and later want to add t2 in the middle, the only way currently is to
drop the table and start again (re-creating all dependant views, FKs,
etc). With the patch you will be able to add the column at the right
place. If no default value is supplied for the new column, no table
rewrite is necessary at all.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#80

Josh Berkus

josh@agliodbs.com

almost 11 years ago

In reply to: Jim Nasby (#54)

Re: logical column ordering

Tomas,

So for an API, 100% of the use cases I have for this feature would be
satisfied by:

ALTER TABLE ______ ALTER COLUMN _____ SET ORDER #

and:

ALTER TABLE _____ ADD COLUMN colname coltype ORDER #

If that's infeasible, a function would be less optimal, but would work:

SELECT pg_column_order(schemaname, tablename, colname, attnum)

If you set the order # to one where a column already exists, other
column attnums would get "bumped down", closing up any gaps in the
process. Obviously, this would require some kind of exclusive lock, but
then ALTER TABLE usually does, doesn't it?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WMed0b7490bbfe863c4f6432d5362066807633e185ea019296cb944a3de0e469af0f2d932aac2d80378f61eab9830cacf0@asav-1.01.com

#81

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Alvaro Herrera (#75)

Re: logical column ordering

On 27.2.2015 20:49, Alvaro Herrera wrote:

Tomas Vondra wrote:

On 27.2.2015 20:34, Alvaro Herrera wrote:

Tomas Vondra wrote:

I think we could calls to the randomization functions into some of the
regression tests (say 'create_tables.sql'), but that makes regression
tests ... well, random, and I'm not convinced that's a good thing.

Also, this makes regression tests harder to think, because "SELECT *"
does different things depending on the attlognum order.

No, that approach doesn't seem very useful. Rather, randomize the
columns in the CREATE TABLE statement, and then fix up the attlognums so
that the SELECT * expansion is the same as it would be with the
not-randomized CREATE TABLE.

Yes, that's a possible approach too - possibly a better one for
regression tests as it fixes the 'SELECT *' but it effectively uses
fixed 'attlognum' and 'attnum' values (it's difficult to randomize
those, as they may be referenced in other catalogs).

Why would you care what values are used as attnum? If anything
misbehaves, surely that would be a bug in the patch. (Of course, you
can't just change the numbers too much later after the fact, because the
attnum values could have propagated into other tables via foreign keys
and such; it needs to be done during executing CREATE TABLE or
immediately thereafter.)

Because attnums are referenced in other catalogs? For example when you
define PRIMARY KEY or UNIQUE constraint in the table, an index is
created, which gets a row in pg_index catalog, and that references the
attnum values in indkey column.

If you just randomize the attnums in pg_attribute (withouth fixing all
the attnum references), it's going to go BOOOOM.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#82

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Josh Berkus (#80)

Re: logical column ordering

On 27.2.2015 21:09, Josh Berkus wrote:

Tomas,

So for an API, 100% of the use cases I have for this feature would be
satisfied by:

ALTER TABLE ______ ALTER COLUMN _____ SET ORDER #

and:

ALTER TABLE _____ ADD COLUMN colname coltype ORDER #

Yes, I imagined an interface like that. Just to be clear, you're talking
about logical order (and not a physical one), right?

Do we need an API to modify physical column order? (I don't think so.)

If that's infeasible, a function would be less optimal, but would work:

SELECT pg_column_order(schemaname, tablename, colname, attnum)

If we need a user interface, let's have a proper one (ALTER TABLE).

If you set the order # to one where a column already exists, other
column attnums would get "bumped down", closing up any gaps in the
process. Obviously, this would require some kind of exclusive lock,
but then ALTER TABLE usually does, doesn't it?

If we ignore the system columns, the current implementation assumes that
the values in each of the three columns (attnum, attlognum and
attphysnum) are distinct and within 1..natts. So there are no gaps and
you'll always set the value to an existing one (so yes, shuffling is
necessary).

And yes, that certainly requires an exclusive lock on the pg_attribute
(I don't think we need a lock on the table itself).

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#83

Gavin Flower

GavinFlower@archidevsys.co.nz

almost 11 years ago

In reply to: Josh Berkus (#80)

Re: logical column ordering

On 28/02/15 09:09, Josh Berkus wrote:

Tomas,

So for an API, 100% of the use cases I have for this feature would be
satisfied by:

ALTER TABLE ______ ALTER COLUMN _____ SET ORDER #

and:

ALTER TABLE _____ ADD COLUMN colname coltype ORDER #

If that's infeasible, a function would be less optimal, but would work:

SELECT pg_column_order(schemaname, tablename, colname, attnum)

If you set the order # to one where a column already exists, other
column attnums would get "bumped down", closing up any gaps in the
process. Obviously, this would require some kind of exclusive lock, but
then ALTER TABLE usually does, doesn't it?

Might be an idea to allow lists of columns and their starting position.

ALTER TABLE customer ALTER COLUMN (job_id, type_id, account_num) SET
ORDER 3;

So in a table with fields:

1. id
2. *account_num*
3. dob
4. start_date
5. *type_id*
6. preferred_status
7. */job_id/*
8. comment

would end up like:

1. id
2. dob
3. *job_id*
4. *type_id*
5. *account_num*
6. start_date
7. preferred_status
8. comment

Am assuming positions are numbered from 1 onwards.

Cheers,
Gavin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#84

Josh Berkus

josh@agliodbs.com

almost 11 years ago

In reply to: Jim Nasby (#54)

Re: logical column ordering

On 02/27/2015 12:25 PM, Tomas Vondra wrote:

On 27.2.2015 21:09, Josh Berkus wrote:

Tomas,

So for an API, 100% of the use cases I have for this feature would be
satisfied by:

ALTER TABLE ______ ALTER COLUMN _____ SET ORDER #

and:

ALTER TABLE _____ ADD COLUMN colname coltype ORDER #

Yes, I imagined an interface like that. Just to be clear, you're talking
about logical order (and not a physical one), right?

Correct. The only reason to rearrange the physical columns is in order
to optimize, which presumably would be carried out by a utility command,
e.g. VACUUM FULL OPTIMIZE.

If we ignore the system columns, the current implementation assumes that
the values in each of the three columns (attnum, attlognum and
attphysnum) are distinct and within 1..natts. So there are no gaps and
you'll always set the value to an existing one (so yes, shuffling is
necessary).

And yes, that certainly requires an exclusive lock on the pg_attribute
(I don't think we need a lock on the table itself).

If MVCC on pg_attribute is good enough to not lock against concurrent
"SELECT *", then that would be lovely.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WMc512f680f05856c82edb868301d08487878a893c19d0cf25a36340567ef029979a77c50fc3c6711617dc889cb6be12d6@asav-2.01.com

#85

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Josh Berkus (#55)

Re: logical column ordering

OK, so let me summarize here (the other posts in this subthread are
discussing the user interface, not the use cases, so I'll respond here).

There are two main use cases:

1) change the order of columns in "SELECT *"

- display columns so that related ones grouped together
(irrespectedly whether they were added later, etc.)

- keep columns synced with COPY

- requires user interface (ALTER TABLE _ ALTER COLUMN _ SET ORDER _)

2) optimization of physical order (efficient storage / tuple deforming)

- more efficient order for storage (deforming)

- may be done manually by reordering columns in CREATE TABLE

- should be done automatically (no user interface required)

- seems useful both for on-disk physical tuples, and virtual tuples

Anything else?

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#86

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Josh Berkus (#84)

Re: logical column ordering

On 27.2.2015 21:42, Josh Berkus wrote:

On 02/27/2015 12:25 PM, Tomas Vondra wrote:

On 27.2.2015 21:09, Josh Berkus wrote:

Tomas,

So for an API, 100% of the use cases I have for this feature would be
satisfied by:

ALTER TABLE ______ ALTER COLUMN _____ SET ORDER #

and:

ALTER TABLE _____ ADD COLUMN colname coltype ORDER #

Yes, I imagined an interface like that. Just to be clear, you're
talking about logical order (and not a physical one), right?

Correct. The only reason to rearrange the physical columns is in
order to optimize, which presumably would be carried out by a utility
command, e.g. VACUUM FULL OPTIMIZE.

I was thinking more about CREATE TABLE at this moment, but yeah, VACUUM
FULL OPTIMIZE might do the same thing.

If we ignore the system columns, the current implementation assumes
that the values in each of the three columns (attnum, attlognum
and attphysnum) are distinct and within 1..natts. So there are no
gaps and you'll always set the value to an existing one (so yes,
shuffling is necessary).

And yes, that certainly requires an exclusive lock on the
pg_attribute (I don't think we need a lock on the table itself).

If MVCC on pg_attribute is good enough to not lock against concurrent
"SELECT *", then that would be lovely.

Yeah, I think this will need a bit more thought. We certainly don't want
blocking queries on the table, but we need a consistent list of column
definitions from pg_attribute.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#87

Alvaro Herrera

alvherre@2ndquadrant.com

almost 11 years ago

In reply to: Tomas Vondra (#85)

Re: logical column ordering

Tomas Vondra wrote:

1) change the order of columns in "SELECT *"

- display columns so that related ones grouped together
(irrespectedly whether they were added later, etc.)

- keep columns synced with COPY

- requires user interface (ALTER TABLE _ ALTER COLUMN _ SET ORDER _)

Not sure about the "ORDER #" syntax. In ALTER ENUM we have "AFTER
<value>" and such .. I'd consider that instead.

2) optimization of physical order (efficient storage / tuple deforming)

- more efficient order for storage (deforming)

- may be done manually by reordering columns in CREATE TABLE

- should be done automatically (no user interface required)

A large part of it can be done automatically: for instance, not-nullable
fixed length types ought to come first, because that enables the
attcacheoff optimizations in heaptuple.c to fire for more columns. But
what column comes next? The offset of the column immediately after them
can also be cached, and so it would be faster to obtain than other
attributes. Which one to choose here is going to be a coin toss in most
cases, but I suppose there are cases out there which can benefit from
having a particular column there.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#88

Gavin Flower

GavinFlower@archidevsys.co.nz

almost 11 years ago

In reply to: Alvaro Herrera (#87)

Re: logical column ordering

On 28/02/15 09:49, Alvaro Herrera wrote:

Tomas Vondra wrote:

1) change the order of columns in "SELECT *"

- display columns so that related ones grouped together
(irrespectedly whether they were added later, etc.)

- keep columns synced with COPY

- requires user interface (ALTER TABLE _ ALTER COLUMN _ SET ORDER _)

Not sure about the "ORDER #" syntax. In ALTER ENUM we have "AFTER
<value>" and such .. I'd consider that instead.

2) optimization of physical order (efficient storage / tuple deforming)

- more efficient order for storage (deforming)

- may be done manually by reordering columns in CREATE TABLE

- should be done automatically (no user interface required)

A large part of it can be done automatically: for instance, not-nullable
fixed length types ought to come first, because that enables the
attcacheoff optimizations in heaptuple.c to fire for more columns. But
what column comes next? The offset of the column immediately after them
can also be cached, and so it would be faster to obtain than other
attributes. Which one to choose here is going to be a coin toss in most
cases, but I suppose there are cases out there which can benefit from
having a particular column there.

Possible, if there is no obvious (to the system) way of deciding the
order of 2 columns, then the logical order should be used?

As either the order does not really matter, or an expert DBA might know
which is more efficient.

Cheers,
Gavin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#89

Arthur Silva

arthurprs@gmail.com

almost 11 years ago

In reply to: Alvaro Herrera (#79)

Re: logical column ordering

On Feb 27, 2015 5:02 PM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:

Arthur Silva wrote:

Sorry to intrude, I've been following this post and I was wondering if

would allow (in the currently planed form or in the future) a wider set

non-rewriting DDLs to Postgres. For example, drop a column without
rewriting the table.

Perhaps. But dropping a column already does not rewrite the table, only
marks the column as dropped in system catalogs, so do you have a better
example.

One obvious example is that you have

CREATE TABLE t (
t1 int,
t3 int
);

and later want to add t2 in the middle, the only way currently is to
drop the table and start again (re-creating all dependant views, FKs,
etc). With the patch you will be able to add the column at the right
place. If no default value is supplied for the new column, no table
rewrite is necessary at all.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Cool! I didn't know I could drop stuff without rewriting.

Ya, that's another example, people do these from GUI tools. That's a nice
side effect. Cool (again)!

#90

Josh Berkus

josh@agliodbs.com

almost 11 years ago

In reply to: Jim Nasby (#54)

Re: logical column ordering

On 02/27/2015 12:48 PM, Tomas Vondra wrote:

On 27.2.2015 21:42, Josh Berkus wrote:

On 02/27/2015 12:25 PM, Tomas Vondra wrote:

On 27.2.2015 21:09, Josh Berkus wrote:

Tomas,

So for an API, 100% of the use cases I have for this feature would be
satisfied by:

ALTER TABLE ______ ALTER COLUMN _____ SET ORDER #

and:

ALTER TABLE _____ ADD COLUMN colname coltype ORDER #

Yes, I imagined an interface like that. Just to be clear, you're
talking about logical order (and not a physical one), right?

Correct. The only reason to rearrange the physical columns is in
order to optimize, which presumably would be carried out by a utility
command, e.g. VACUUM FULL OPTIMIZE.

I was thinking more about CREATE TABLE at this moment, but yeah, VACUUM
FULL OPTIMIZE might do the same thing.

Actually, I'm going to go back on what I said.

We need an API for physical column reordering, even if it's just pg_
functions. The reason is that we want to enable people writing their
own physical column re-ordering tools, so that our users can figure out
for us what the best reordering algorithm is.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WM905d7edde1c7e31059c10db685b9385e9cb387f829df1e5b3e955090735b6b3410fafb1a9cfc8a19e1b3811b9a4916ec@asav-3.01.com

#91

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Josh Berkus (#90)

Re: logical column ordering

On 27.2.2015 23:48, Josh Berkus wrote:

On 02/27/2015 12:48 PM, Tomas Vondra wrote:

On 27.2.2015 21:42, Josh Berkus wrote:

On 02/27/2015 12:25 PM, Tomas Vondra wrote:

On 27.2.2015 21:09, Josh Berkus wrote:

Tomas,

So for an API, 100% of the use cases I have for this feature would be
satisfied by:

ALTER TABLE ______ ALTER COLUMN _____ SET ORDER #

and:

ALTER TABLE _____ ADD COLUMN colname coltype ORDER #

Yes, I imagined an interface like that. Just to be clear, you're
talking about logical order (and not a physical one), right?

Correct. The only reason to rearrange the physical columns is in
order to optimize, which presumably would be carried out by a utility
command, e.g. VACUUM FULL OPTIMIZE.

I was thinking more about CREATE TABLE at this moment, but yeah, VACUUM
FULL OPTIMIZE might do the same thing.

Actually, I'm going to go back on what I said.

We need an API for physical column reordering, even if it's just pg_
functions. The reason is that we want to enable people writing their
own physical column re-ordering tools, so that our users can figure out
for us what the best reordering algorithm is.

I doubt that. For example, do you realize you can only do that while the
table is completely empty, and in that case you can just do a CREATE
TABLE with the proper order?

I also doubt the users will be able to optimize the order better than
users, who usually have on idea of how this actually works internally.

But if we want to allow users to define this, I'd say let's make that
part of CREATE TABLE, i.e. the order of columns defines logical order,
and you use something like 'AFTER' to specify physical order.

CREATE TABLE test (
a INT AFTER b, -- attlognum = 1, attphysnum = 2
b INT -- attlognum = 2, attphysnum = 1
);

It might get tricky because of cycles, though.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#92

David G Johnston

david.g.johnston@gmail.com

almost 11 years ago

In reply to: Tomas Vondra (#91)

Re: logical column ordering

Tomas Vondra-4 wrote

But if we want to allow users to define this, I'd say let's make that
part of CREATE TABLE, i.e. the order of columns defines logical order,
and you use something like 'AFTER' to specify physical order.

CREATE TABLE test (
a INT AFTER b, -- attlognum = 1, attphysnum = 2
b INT -- attlognum = 2, attphysnum = 1
);

Why not memorialize this as a storage parameter?

CREATE TABLE test (
a INT, b INT
)
WITH (layout = 'b, a')
;

A canonical form should be defined and then ALTER TABLE can either directly
update the current value or require the user to specify a new layout before
it will perform the alteration.

David J.

--
View this message in context: http://postgresql.nabble.com/logical-column-ordering-tp5829775p5839825.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#93

Josh Berkus

josh@agliodbs.com

almost 11 years ago

In reply to: Jim Nasby (#54)

Re: logical column ordering

On 02/27/2015 03:06 PM, Tomas Vondra wrote:

On 27.2.2015 23:48, Josh Berkus wrote:

Actually, I'm going to go back on what I said.

We need an API for physical column reordering, even if it's just pg_
functions. The reason is that we want to enable people writing their
own physical column re-ordering tools, so that our users can figure out
for us what the best reordering algorithm is.

I doubt that. For example, do you realize you can only do that while the
table is completely empty, and in that case you can just do a CREATE
TABLE with the proper order?

Well, you could recreate the table as the de-facto API, although as you
point out below that still requires new syntax.

But I was thinking of something which would re-write the table, just
like ADD COLUMN x DEFAULT '' does now.

I also doubt the users will be able to optimize the order better than
users, who usually have on idea of how this actually works internally.

We have a lot of power users, including a lot of the people on this
mailing list.

Among the things we don't know about ordering optimization:

* How important is it for index performance to keep key columns adjacent?

* How important is it to pack values < 4 bytes, as opposed to packing
values which are non-nullable?

* How important is it to pack values of the same size, as opposed to
packing values which are non-nullable?

But if we want to allow users to define this, I'd say let's make that
part of CREATE TABLE, i.e. the order of columns defines logical order,
and you use something like 'AFTER' to specify physical order.

CREATE TABLE test (
a INT AFTER b, -- attlognum = 1, attphysnum = 2
b INT -- attlognum = 2, attphysnum = 1
);

It might get tricky because of cycles, though.

It would be a lot easier to allow the user to specific a scalar.

CREATE TABLE test (
a INT NOT NULL WITH ( lognum 1, physnum 2 )
b INT WITH ( lognum 2, physnum 1 )

... and just throw an error if the user creates duplicates or gaps.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WMa29114dc20d5fcd9a6f6958ad565502a9043d19ac7f0f71072050f470bf5b7709021de9e6f4fe0d4f6c6ce7a3ac822c2@asav-1.01.com

#94

David G Johnston

david.g.johnston@gmail.com

almost 11 years ago

In reply to: David G Johnston (#92)

Re: logical column ordering

David G Johnston wrote

Tomas Vondra-4 wrote

But if we want to allow users to define this, I'd say let's make that
part of CREATE TABLE, i.e. the order of columns defines logical order,
and you use something like 'AFTER' to specify physical order.

CREATE TABLE test (
a INT AFTER b, -- attlognum = 1, attphysnum = 2
b INT -- attlognum = 2, attphysnum = 1
);

Why not memorialize this as a storage parameter?

CREATE TABLE test (
a INT, b INT
)
WITH (layout = 'b, a')
;

A canonical form should be defined and then ALTER TABLE can either
directly update the current value or require the user to specify a new
layout before it will perform the alteration.

David J.

Extending the idea a bit further why not have "CREATE TABLE" be the API; or
at least something similar to it?

CREATE OR REARRANGE TABLE test (
[...]
)
WITH ();

The "[...]" part would be logical and the WITH() would define the physical.
The "PK" would simply be whatever is required to make the command work.

David J.

--
View this message in context: http://postgresql.nabble.com/logical-column-ordering-tp5829775p5839828.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#95

Gavin Flower

GavinFlower@archidevsys.co.nz

almost 11 years ago

In reply to: Josh Berkus (#93)

Re: logical column ordering

On 28/02/15 12:21, Josh Berkus wrote:

On 02/27/2015 03:06 PM, Tomas Vondra wrote:

On 27.2.2015 23:48, Josh Berkus wrote:

Actually, I'm going to go back on what I said.

We need an API for physical column reordering, even if it's just pg_
functions. The reason is that we want to enable people writing their
own physical column re-ordering tools, so that our users can figure out
for us what the best reordering algorithm is.

I doubt that. For example, do you realize you can only do that while the
table is completely empty, and in that case you can just do a CREATE
TABLE with the proper order?

Well, you could recreate the table as the de-facto API, although as you
point out below that still requires new syntax.

But I was thinking of something which would re-write the table, just
like ADD COLUMN x DEFAULT '' does now.

I also doubt the users will be able to optimize the order better than
users, who usually have on idea of how this actually works internally.

We have a lot of power users, including a lot of the people on this
mailing list.

Among the things we don't know about ordering optimization:

* How important is it for index performance to keep key columns adjacent?

* How important is it to pack values < 4 bytes, as opposed to packing
values which are non-nullable?

* How important is it to pack values of the same size, as opposed to
packing values which are non-nullable?

But if we want to allow users to define this, I'd say let's make that
part of CREATE TABLE, i.e. the order of columns defines logical order,
and you use something like 'AFTER' to specify physical order.

CREATE TABLE test (
a INT AFTER b, -- attlognum = 1, attphysnum = 2
b INT -- attlognum = 2, attphysnum = 1
);

It might get tricky because of cycles, though.

It would be a lot easier to allow the user to specific a scalar.

CREATE TABLE test (
a INT NOT NULL WITH ( lognum 1, physnum 2 )
b INT WITH ( lognum 2, physnum 1 )

... and just throw an error if the user creates duplicates or gaps.

I thinks gaps should be okay.

Remember BASIC? Lines numbers tended to be in 10's so you could easily
slot new lines in between the existing ones - essential when using the
Teletype input/output device.

Though the difference here is that you would NOT want to remember the
original order numbers (at least I don't think that would be worth the
coding effort & space). However, the key is the actual order, not the
numbering. However, that might require a WARNING message to say that
the columns would be essentially renumbered from 1 onwards when the
table was actually created - if gaps had been left.

Cheers,
Gavin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#96

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 11 years ago

In reply to: Gavin Flower (#83)

Re: logical column ordering

On 2/27/15 2:37 PM, Gavin Flower wrote:

Might be an idea to allow lists of columns and their starting position.

ALTER TABLE customer ALTER COLUMN (job_id, type_id, account_num) SET
ORDER 3;

I would certainly want something along those lines because it would be
*way* less verbose (and presumably a lot faster) than a slew of ALTER
TABLE statements.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#97

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 11 years ago

In reply to: Alvaro Herrera (#87)

Re: logical column ordering

On 2/27/15 2:49 PM, Alvaro Herrera wrote:

Tomas Vondra wrote:

1) change the order of columns in "SELECT *"

- display columns so that related ones grouped together
(irrespectedly whether they were added later, etc.)

FWIW, I find the ordering more important for things like \d than SELECT *.

Hey, after we get this done the next step is allowing different logical
ordering for different uses! ;P

- keep columns synced with COPY

- requires user interface (ALTER TABLE _ ALTER COLUMN _ SET ORDER _)

Not sure about the "ORDER #" syntax. In ALTER ENUM we have "AFTER
<value>" and such .. I'd consider that instead.

+1. See also Gavin's suggestion of ALTER COLUMN (a, b, c) SET ORDER ...

2) optimization of physical order (efficient storage / tuple deforming)

- more efficient order for storage (deforming)

- may be done manually by reordering columns in CREATE TABLE

- should be done automatically (no user interface required)

A large part of it can be done automatically: for instance, not-nullable
fixed length types ought to come first, because that enables the

I would think that eliminating wasted space due to alignment would be
more important than optimizing attcacheoff, at least for a database that
doesn't fit in memory. Even if it does fit in memory I suspect memory
bandwidth is more important than clock cycles.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#98

Matt Kelly

mkellycs@gmail.com

almost 11 years ago

In reply to: Jim Nasby (#97)

Re: logical column ordering

Even if it does fit in memory I suspect memory bandwidth is more important
than clock cycles.

http://people.freebsd.org/~lstewart/articles/cpumemory.pdf

This paper is old but the ratios should still be pretty accurate. Main
memory is 240 clock cycles away and L1d is only 3. If the experiments in
this paper still hold true loading the 8K block into L1d is far more
expensive than the CPU processing done once the block is in cache.

When one adds in NUMA to the contention on this shared resource, its not
that hard for a 40 core machine to starve for memory bandwidth, and for
cores to sit idle waiting for main memory. Eliminating wasted space seems
far more important even when everything could fit in memory already.

#99

Gavin Flower

GavinFlower@archidevsys.co.nz

almost 11 years ago

In reply to: Jim Nasby (#97)

Re: logical column ordering

On 28/02/15 18:34, Jim Nasby wrote:

On 2/27/15 2:49 PM, Alvaro Herrera wrote:

Tomas Vondra wrote:

1) change the order of columns in "SELECT *"

- display columns so that related ones grouped together
(irrespectedly whether they were added later, etc.)

FWIW, I find the ordering more important for things like \d than
SELECT *.

Hey, after we get this done the next step is allowing different
logical ordering for different uses! ;P

[...]

How about CREATE COLUMN SELECTION my_col_sel (a, g, b, e) FROM TABLE
my_table;

Notes:

1. The column names must match those of the table

2. The COLUMN SELECTION is associated with the specified table

3. If a column gets renamed, then the COLUMN SELECTION effectively gets
updated to use the new column name
(This can probably be done automatically, by virtue of storing
references to the appropriate column definitions)

4. Allow fewer columns in the COLUMN SELECTION than the original table

5. Allow the the same column to be duplicated
(trivial, simply don't raise an error for duplicates)

6. Allow the COLUMN SELECTION name to appear instead of the list of
columns after the SELECT key word
(SELECT COLUMN SELECTION my_col_sel FROM my_table WHERE ... - must
match table in FROM clause)

If several tables are defined in the FROM clause, and 2 different tables
have COLUMN SELECTION with identical names, then the COLUMN SELECTION
names in the SELECT must be prefixed either the table name or its alias.

Cheers,
Gavin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#100

Alvaro Herrera

alvherre@2ndquadrant.com

almost 11 years ago

In reply to: Tomas Vondra (#91)

Re: logical column ordering

Tomas Vondra wrote:

We need an API for physical column reordering, even if it's just pg_
functions. The reason is that we want to enable people writing their
own physical column re-ordering tools, so that our users can figure out
for us what the best reordering algorithm is.

I doubt that. For example, do you realize you can only do that while the
table is completely empty, and in that case you can just do a CREATE
TABLE with the proper order?

Not if you have views or constraints depending on the table definition
-- it's not trivial to drop/recreate the table in that case, but you can
of course think about truncating it, then reorder columns, then
repopulate.

Even better you can cause a full table rewrite if needed.

But if we want to allow users to define this, I'd say let's make that
part of CREATE TABLE, i.e. the order of columns defines logical order,
and you use something like 'AFTER' to specify physical order.

CREATE TABLE test (
a INT AFTER b, -- attlognum = 1, attphysnum = 2
b INT -- attlognum = 2, attphysnum = 1
);

Surely you want an ALTER command as a minimum; perhaps that is enough
and there is no need for options in CREATE.

It might get tricky because of cycles, though.

If there's a cycle, just raise an error.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#101

Alvaro Herrera

alvherre@2ndquadrant.com

almost 11 years ago

In reply to: Jim Nasby (#96)

Re: logical column ordering

Jim Nasby wrote:

On 2/27/15 2:37 PM, Gavin Flower wrote:

Might be an idea to allow lists of columns and their starting position.

ALTER TABLE customer ALTER COLUMN (job_id, type_id, account_num) SET
ORDER 3;

I would certainly want something along those lines because it would be *way*
less verbose (and presumably a lot faster) than a slew of ALTER TABLE
statements.

You know you can issue multiple subcommands in one ALTER TABLE
statement, right? There's no reason to do more than one table rewrite.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#102

Alvaro Herrera

alvherre@2ndquadrant.com

almost 11 years ago

In reply to: Jim Nasby (#97)

Re: logical column ordering

Jim Nasby wrote:

On 2/27/15 2:49 PM, Alvaro Herrera wrote:

Tomas Vondra wrote:

1) change the order of columns in "SELECT *"

- display columns so that related ones grouped together
(irrespectedly whether they were added later, etc.)

FWIW, I find the ordering more important for things like \d than SELECT *.

Logical column ordering is (or should be) used in all places where
column lists are expanded user-visibly. \d would be one of those.
Think column order in the output of a JOIN also, for instance -- they
must follow logical column order.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#103

Bruce Momjian

bruce@momjian.us

almost 11 years ago

In reply to: Josh Berkus (#57)

Re: logical column ordering

On Thu, Feb 26, 2015 at 01:55:44PM -0800, Josh Berkus wrote:

On 02/26/2015 01:54 PM, Alvaro Herrera wrote:

This patch decouples these three things so that they
can changed freely -- but provides no user interface to do so. I think
that trying to only decouple the thing we currently have in two pieces,
and then have a subsequent patch to decouple again, is additional
conceptual complexity for no gain.

Oh, I didn't realize there weren't commands to change the LCO. Without
at least SQL syntax for LCO, I don't see why we'd take it; this sounds
more like a WIP patch.

FYI, pg_upgrade is going to need pg_dump --binary-upgrade to output the
columns in physical order with some logical ordering information, i.e.
pg_upgrade cannot be passed only logical ordering from pg_dump.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#104

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 11 years ago

In reply to: Bruce Momjian (#103)

Re: logical column ordering

On 3/3/15 11:23 AM, Bruce Momjian wrote:

On Thu, Feb 26, 2015 at 01:55:44PM -0800, Josh Berkus wrote:

On 02/26/2015 01:54 PM, Alvaro Herrera wrote:

This patch decouples these three things so that they
can changed freely -- but provides no user interface to do so. I think
that trying to only decouple the thing we currently have in two pieces,
and then have a subsequent patch to decouple again, is additional
conceptual complexity for no gain.

Oh, I didn't realize there weren't commands to change the LCO. Without
at least SQL syntax for LCO, I don't see why we'd take it; this sounds
more like a WIP patch.

FYI, pg_upgrade is going to need pg_dump --binary-upgrade to output the
columns in physical order with some logical ordering information, i.e.
pg_upgrade cannot be passed only logical ordering from pg_dump.

Wouldn't it need attno info too, so all 3 orderings?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#105

Bruce Momjian

bruce@momjian.us

almost 11 years ago

In reply to: Jim Nasby (#104)

Re: logical column ordering

On Tue, Mar 3, 2015 at 11:24:38AM -0600, Jim Nasby wrote:

On 3/3/15 11:23 AM, Bruce Momjian wrote:

On Thu, Feb 26, 2015 at 01:55:44PM -0800, Josh Berkus wrote:

On 02/26/2015 01:54 PM, Alvaro Herrera wrote:

This patch decouples these three things so that they
can changed freely -- but provides no user interface to do so. I think
that trying to only decouple the thing we currently have in two pieces,
and then have a subsequent patch to decouple again, is additional
conceptual complexity for no gain.

Oh, I didn't realize there weren't commands to change the LCO. Without
at least SQL syntax for LCO, I don't see why we'd take it; this sounds
more like a WIP patch.

FYI, pg_upgrade is going to need pg_dump --binary-upgrade to output the
columns in physical order with some logical ordering information, i.e.
pg_upgrade cannot be passed only logical ordering from pg_dump.

Wouldn't it need attno info too, so all 3 orderings?

Uh, what is the third ordering? Physical, logical, and ? It already
gets information about dropped columns, if that is the third one.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#106

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 11 years ago

In reply to: Bruce Momjian (#105)

Re: logical column ordering

On 3/3/15 11:26 AM, Bruce Momjian wrote:

On Tue, Mar 3, 2015 at 11:24:38AM -0600, Jim Nasby wrote:

On 3/3/15 11:23 AM, Bruce Momjian wrote:

On Thu, Feb 26, 2015 at 01:55:44PM -0800, Josh Berkus wrote:

On 02/26/2015 01:54 PM, Alvaro Herrera wrote:

This patch decouples these three things so that they
can changed freely -- but provides no user interface to do so. I think
that trying to only decouple the thing we currently have in two pieces,
and then have a subsequent patch to decouple again, is additional
conceptual complexity for no gain.

Oh, I didn't realize there weren't commands to change the LCO. Without
at least SQL syntax for LCO, I don't see why we'd take it; this sounds
more like a WIP patch.

FYI, pg_upgrade is going to need pg_dump --binary-upgrade to output the
columns in physical order with some logical ordering information, i.e.
pg_upgrade cannot be passed only logical ordering from pg_dump.

Wouldn't it need attno info too, so all 3 orderings?

Uh, what is the third ordering? Physical, logical, and ? It already
gets information about dropped columns, if that is the third one.

attnum; used in other catalogs to reference columns.

If you're shuffling everything though pg_dump anyway then maybe it's not
needed...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#107

Bruce Momjian

bruce@momjian.us

almost 11 years ago

In reply to: Jim Nasby (#106)

Re: logical column ordering

On Tue, Mar 3, 2015 at 11:41:20AM -0600, Jim Nasby wrote:

Wouldn't it need attno info too, so all 3 orderings?

Uh, what is the third ordering? Physical, logical, and ? It already
gets information about dropped columns, if that is the third one.

attnum; used in other catalogs to reference columns.

If you're shuffling everything though pg_dump anyway then maybe it's
not needed...

Yes, all those attno system table links are done with pg_dump SQL
commands.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#108

Peter Eisentraut

peter_e@gmx.net

almost 11 years ago

In reply to: Alvaro Herrera (#1)

Re: logical column ordering

On 12/9/14 12:41 PM, Alvaro Herrera wrote:

To recap, this is based on the idea of having three numbers for each
attribute rather than a single attnum; the first of these is attnum (a
number that uniquely identifies an attribute since its inception and may
or may not have any relationship to its storage position and the place
it expands to through user interaction). The second is attphysnum,
which indicates where it is stored in the physical structure. The third
is attlognum, which indicates where it expands in "*", where must its
values be placed in COPY or VALUES lists, etc --- the logical position
as the user sees it.

Side idea: Let attnum be the logical number, introduce attphysnum as
the storage position, and add an oid to pg_attribute as the eternal
identifier.

That way you avoid breaking pretty much all user code that looks at
pg_attribute, which will probably do something like ORDER BY attnum.

Also, one could get rid of all sorts of ugly code that works around the
lack of an oid in pg_attribute, such as in the dependency tracking.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#109

Tom Lane

tgl@sss.pgh.pa.us

almost 11 years ago

In reply to: Peter Eisentraut (#108)

Re: logical column ordering

Peter Eisentraut <peter_e@gmx.net> writes:

Side idea: Let attnum be the logical number, introduce attphysnum as
the storage position, and add an oid to pg_attribute as the eternal
identifier.

That way you avoid breaking pretty much all user code that looks at
pg_attribute, which will probably do something like ORDER BY attnum.

Also, one could get rid of all sorts of ugly code that works around the
lack of an oid in pg_attribute, such as in the dependency tracking.

I think using an OID would break more stuff than it fixes in dependency
tracking; in particular you would now need an explicit dependency link
from each column to its table, because the "sub-object" knowledge would
no longer work. In any case this patch is going to be plenty big enough
already without saddling it with a major rewrite of the dependency system.

I agree though that it's worth considering defining pg_attribute.attnum as
the logical column position so as to minimize the effects on client-side
code. I doubt there is much stuff client-side that cares about column
creation order, but there is plenty that cares about logical column order.
OTOH this would introduce confusion into the backend code, since Alvaro's
definition of attnum is what most of the backend should care about.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#110

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Tom Lane (#109)

Re: logical column ordering

On 12.3.2015 03:16, Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

Side idea: Let attnum be the logical number, introduce attphysnum
as the storage position, and add an oid to pg_attribute as the
eternal identifier.

That way you avoid breaking pretty much all user code that looks at
pg_attribute, which will probably do something like ORDER BY
attnum.

Also, one could get rid of all sorts of ugly code that works around
the lack of an oid in pg_attribute, such as in the dependency
tracking.

I think using an OID would break more stuff than it fixes in
dependency tracking; in particular you would now need an explicit
dependency link from each column to its table, because the
"sub-object" knowledge would no longer work. In any case this patch
is going to be plenty big enough already without saddling it with a
major rewrite of the dependency system.

Exactly. I believe Alvaro considered that option in the past.

I agree though that it's worth considering defining
pg_attribute.attnum as the logical column position so as to minimize
the effects on client-side code. I doubt there is much stuff
client-side that cares about column creation order, but there is
plenty that cares about logical column order. OTOH this would
introduce confusion into the backend code, since Alvaro's definition
of attnum is what most of the backend should care about.

IMHO reusing attnum for logical column order would actually make it more
complex, especially if we allow users to modify the logical order using
ALTER TABLE. Because if you change it, you have to walk through all the
places where it might be referenced and update those too (say, columns
referenced in indexes and such). Keeping attnum immutable makes this
much easier and simpler.

regards

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#111

Alvaro Herrera

alvherre@2ndquadrant.com

almost 11 years ago

In reply to: Tomas Vondra (#110)

Re: logical column ordering

Tomas Vondra wrote:

On 12.3.2015 03:16, Tom Lane wrote:

I agree though that it's worth considering defining
pg_attribute.attnum as the logical column position so as to minimize
the effects on client-side code. I doubt there is much stuff
client-side that cares about column creation order, but there is
plenty that cares about logical column order. OTOH this would
introduce confusion into the backend code, since Alvaro's definition
of attnum is what most of the backend should care about.

IMHO reusing attnum for logical column order would actually make it more
complex, especially if we allow users to modify the logical order using
ALTER TABLE. Because if you change it, you have to walk through all the
places where it might be referenced and update those too (say, columns
referenced in indexes and such). Keeping attnum immutable makes this
much easier and simpler.

I think you're misunderstanding. The suggestion, as I understand it, is
to rename the attnum column to something else (maybe, say, attidnum),
and rename attlognum to attnum. That preserves the existing property
that "ORDER BY attnum" gives you the correct view of the table from the
point of view of the user. That's very useful because it means clients
looking at pg_attribute need less changes, or maybe none at all.

I think this wouldn't be too difficult to implement, because there
aren't that many places that refer to the column-identity attribute by
name; most of them just grab the TupleDesc->attrs array in whatever
order is appropriate and scan that in a loop. Only a few of these use
att->attnum inside the loop --- that's what would need to be changed,
and it should be pretty mechanical.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#112

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Alvaro Herrera (#111)

Re: logical column ordering

On 12.3.2015 14:17, Alvaro Herrera wrote:

Tomas Vondra wrote:

On 12.3.2015 03:16, Tom Lane wrote:

I agree though that it's worth considering defining
pg_attribute.attnum as the logical column position so as to minimize
the effects on client-side code. I doubt there is much stuff
client-side that cares about column creation order, but there is
plenty that cares about logical column order. OTOH this would
introduce confusion into the backend code, since Alvaro's definition
of attnum is what most of the backend should care about.

IMHO reusing attnum for logical column order would actually make it more
complex, especially if we allow users to modify the logical order using
ALTER TABLE. Because if you change it, you have to walk through all the
places where it might be referenced and update those too (say, columns
referenced in indexes and such). Keeping attnum immutable makes this
much easier and simpler.

I think you're misunderstanding. The suggestion, as I understand it,
is to rename the attnum column to something else (maybe, say,
attidnum), and rename attlognum to attnum. That preserves the
existing property that "ORDER BY attnum" gives you the correct view
of the table from the point of view of the user. That's very useful
because it means clients looking at pg_attribute need less changes,
or maybe none at all.

Hmm ... I understood it as a suggestion to drop attlognum and just
define (attnum, attphysnum).

I think this wouldn't be too difficult to implement, because there
aren't that many places that refer to the column-identity attribute
by name; most of them just grab the TupleDesc->attrs array in
whatever order is appropriate and scan that in a loop. Only a few of
these use att->attnum inside the loop --- that's what would need to
be changed, and it should be pretty mechanical.

I think it's way more complicated. We may fix all the pieces of the
code, but that's not all - attnum is referenced in various system views,
catalogs and such. For example pg_stats view does this:

FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
WHERE NOT attisdropped
AND has_column_privilege(c.oid, a.attnum, 'select');

information_schema also uses attnum on many places too.

I see the catalogs as a kind of public API, and redefining the meaning
of an existing column this way seems tricky, especially when we
reference it from other catalogs - I'm pretty sure there's plenty of SQL
queries in various tools that rely on this. Just google for "pg_indexes
indkeys unnest" and you'll find posts like this one from Craig:

http://stackoverflow.com/questions/18121103/how-to-get-the-index-column-orderasc-desc-nulls-first-from-postgresql

specifically tell people to do this:

SELECT
...
FROM (
SELECT
pg_class.relname,
...
unnest(pg_index.indkey) AS k
FROM pg_index
INNER JOIN pg_class ON pg_index.indexrelid = pg_class.oid
) i
...
INNER JOIN pg_attribute ON (pg_attribute.attrelid = i.indrelid
AND pg_attribute.attnum = k);

which specifically tells people to match attnum vs. indkeys. If we
redefine the meaning of attnum, and instead match indkeys against a
different column (say, attidnum), all those queries will be broken.

Which actually breaks the catalog definition as specified here:

http://www.postgresql.org/docs/devel/static/catalog-pg-index.html

which explicitly says that indkey references pg_attribute.attnum.

But maybe we don't really care about breaking this API and it is a good
approach - I need to think about it and try it.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#113

Alvaro Herrera

alvherre@2ndquadrant.com

almost 11 years ago

In reply to: Tomas Vondra (#112)

Re: logical column ordering

Tomas Vondra wrote:

On 12.3.2015 14:17, Alvaro Herrera wrote:

Tomas Vondra wrote:

On 12.3.2015 03:16, Tom Lane wrote:

I agree though that it's worth considering defining
pg_attribute.attnum as the logical column position so as to minimize
the effects on client-side code. I doubt there is much stuff
client-side that cares about column creation order, but there is
plenty that cares about logical column order. OTOH this would
introduce confusion into the backend code, since Alvaro's definition
of attnum is what most of the backend should care about.

IMHO reusing attnum for logical column order would actually make it more
complex, especially if we allow users to modify the logical order using
ALTER TABLE. Because if you change it, you have to walk through all the
places where it might be referenced and update those too (say, columns
referenced in indexes and such). Keeping attnum immutable makes this
much easier and simpler.

I think you're misunderstanding. The suggestion, as I understand it,
is to rename the attnum column to something else (maybe, say,
attidnum), and rename attlognum to attnum. That preserves the
existing property that "ORDER BY attnum" gives you the correct view
of the table from the point of view of the user. That's very useful
because it means clients looking at pg_attribute need less changes,
or maybe none at all.

Hmm ... I understood it as a suggestion to drop attlognum and just
define (attnum, attphysnum).

Pretty sure it wasn't that.

I think this wouldn't be too difficult to implement, because there
aren't that many places that refer to the column-identity attribute
by name; most of them just grab the TupleDesc->attrs array in
whatever order is appropriate and scan that in a loop. Only a few of
these use att->attnum inside the loop --- that's what would need to
be changed, and it should be pretty mechanical.

I think it's way more complicated. We may fix all the pieces of the
code, but that's not all - attnum is referenced in various system views,
catalogs and such. For example pg_stats view does this:

FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
WHERE NOT attisdropped
AND has_column_privilege(c.oid, a.attnum, 'select');

information_schema also uses attnum on many places too.

Those can be fixed with relative ease to refer to attidnum instead.

I see the catalogs as a kind of public API, and redefining the meaning
of an existing column this way seems tricky, especially when we
reference it from other catalogs - I'm pretty sure there's plenty of SQL
queries in various tools that rely on this.

That's true, but then we've never promised that system catalogs remain
unchanged forever. That would essentially stop development.

However, there's a difference between making a query silently given
different results, and breaking it completely forcing the user to
re-study how to write it. I think the latter is better. In that light
we should just drop attnum as a column name, and use something else:
maybe (attidnum, attlognum, attphysnum). So all queries in the wild
would be forced to be updated, but we would not silently change
semantics instead.

Which actually breaks the catalog definition as specified here:

http://www.postgresql.org/docs/devel/static/catalog-pg-index.html

which explicitly says that indkey references pg_attribute.attnum.

That's a simple doc fix.

But maybe we don't really care about breaking this API and it is a good
approach - I need to think about it and try it.

Yeah, thanks.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#114

Andres Freund

andres@2ndquadrant.com

almost 11 years ago

In reply to: Tom Lane (#109)

Re: logical column ordering

Hi,

On 2015-03-11 22:16:52 -0400, Tom Lane wrote:

I agree though that it's worth considering defining pg_attribute.attnum as
the logical column position so as to minimize the effects on client-side
code.

I actually wonder if it'd not make more sense to define it as the
physical column number. That'd reduce the invasiveness and risk of the
patch considerably. It means that most existing code doesn't have to be
changed and can just continue to refer to attnum like today. There's
much less risk of it being wrongly used to refer to the physical offset
instead of creation order. Queries against attnum would still give a
somewhat sane response.

It would make some ALTER TABLE commands a bit more complex if we want to
allow reordering the physical order. But that seems like a much more
localized complexity than previous patches in this thread (although I've
not looked at the last version).

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#115

Tom Lane

tgl@sss.pgh.pa.us

almost 11 years ago

In reply to: Alvaro Herrera (#113)

Re: logical column ordering

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

However, there's a difference between making a query silently given
different results, and breaking it completely forcing the user to
re-study how to write it. I think the latter is better. In that light
we should just drop attnum as a column name, and use something else:
maybe (attidnum, attlognum, attphysnum). So all queries in the wild
would be forced to be updated, but we would not silently change
semantics instead.

Hm. I'm on board with renaming like that inside the backend, but
I'm very much less excited about forcibly breaking client queries.
I think there is little if any client-side code that will care about
either attidnum or attphysnum, so forcing people to think about it
will just create make-work.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#116

Peter Eisentraut

peter_e@gmx.net

almost 11 years ago

In reply to: Andres Freund (#114)

Re: logical column ordering

On 3/12/15 10:07 AM, Andres Freund wrote:

I actually wonder if it'd not make more sense to define it as the
physical column number. That'd reduce the invasiveness and risk of the
patch considerably.

Clearly, the number of places where attnum has to be changed to
something else is not zero, and so it doesn't matter if a lot or a few
have to be changed. They all have to be looked at and considered.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#117

Peter Eisentraut

peter_e@gmx.net

almost 11 years ago

In reply to: Tom Lane (#109)

Re: logical column ordering

On 3/11/15 10:16 PM, Tom Lane wrote:

I think using an OID would break more stuff than it fixes in dependency
tracking; in particular you would now need an explicit dependency link
from each column to its table, because the "sub-object" knowledge would
no longer work.

That might not be a bad thing, but ...

In any case this patch is going to be plenty big enough
already without saddling it with a major rewrite of the dependency system.

... is true.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#118

Jeff Janes

jeff.janes@gmail.com

almost 11 years ago

In reply to: Tomas Vondra (#53)

Re: logical column ordering

On Mon, Feb 23, 2015 at 3:09 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

Hi,

attached is the result of my first attempt to make the logical column
ordering patch work. This touches a lot of code in the executor that is
mostly new to me, so if you see something that looks like an obvious
bug, it probably is (so let me know).

There is an apply conflict in src/backend/parser/parse_relation.c in the
comments for scanRTEForColumn.

It seems like it should be easy to ignore, but when I ignore it I get make
check failing all over the place.

(The first patch posted in this thread seems to work for me, I did some
testing on it before I realized it was obsolete.)

Cheers,

Jeff

#119

Robert Haas

robertmhaas@gmail.com

almost 11 years ago

In reply to: Alvaro Herrera (#113)

Re: logical column ordering

On Thu, Mar 12, 2015 at 9:57 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

However, there's a difference between making a query silently given
different results, and breaking it completely forcing the user to
re-study how to write it. I think the latter is better. In that light
we should just drop attnum as a column name, and use something else:
maybe (attidnum, attlognum, attphysnum). So all queries in the wild
would be forced to be updated, but we would not silently change
semantics instead.

+1 for that approach. Much better to break all of the third-party
code out there definitively than to bet on which attribute people are
going to want to use most commonly.

I'm a little confused as to the status of this patch. It's marked as
Waiting on Author in the CommitFest application, and the last patch
version was posted in December. The fact that the new CommitFest
application encourages people to blindly move things to the next CF
instead of forcing patch authors to reopen the record when they update
the patch is, IMHO, not good. It's just going to lead to the CF
application filling up with things that the authors aren't really
working on. We've got enough work to do with the patches that are
actually under active development.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#120

Peter Geoghegan

pg@heroku.com

almost 11 years ago

In reply to: Robert Haas (#119)

Re: logical column ordering

On Mon, Mar 23, 2015 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I'm a little confused as to the status of this patch. It's marked as
Waiting on Author in the CommitFest application, and the last patch
version was posted in December. The fact that the new CommitFest
application encourages people to blindly move things to the next CF
instead of forcing patch authors to reopen the record when they update
the patch is, IMHO, not good. It's just going to lead to the CF
application filling up with things that the authors aren't really
working on. We've got enough work to do with the patches that are
actually under active development.

Maybe there should be a "stalled" patch status summary, that
highlights patches that have not had their status change in (say) 2
weeks. Although it wouldn't really be a status summary, since that
they're mutually exclusive with each other in the CF app (e.g. a patch
cannot be both "Waiting on Author" and "Ready for Committer").

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#121

Jeff Janes

jeff.janes@gmail.com

almost 11 years ago

In reply to: Robert Haas (#119)

Re: logical column ordering

On Mon, Mar 23, 2015 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 12, 2015 at 9:57 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

However, there's a difference between making a query silently given
different results, and breaking it completely forcing the user to
re-study how to write it. I think the latter is better. In that light
we should just drop attnum as a column name, and use something else:
maybe (attidnum, attlognum, attphysnum). So all queries in the wild
would be forced to be updated, but we would not silently change
semantics instead.

+1 for that approach. Much better to break all of the third-party
code out there definitively than to bet on which attribute people are
going to want to use most commonly.

I'm a little confused as to the status of this patch. It's marked as
Waiting on Author in the CommitFest application, and the last patch
version was posted in December.

There was a patch here, which in the commit fest is "hidden" behind other
non-attachments in the same email:

Attachment (randomize.sql
</messages/by-id/attachment/37076/randomize.sql>)
at 2015-02-27
19:10:21
</messages/by-id/54F0C11D.7000906@2ndquadrant.com/> from
Tomas Vondra <tomas.vondra at 2ndquadrant.com>

But that patch failed the majority of "make check" checks in my hands. So
I also don't know what the status is.

Cheers,

Jeff

#122

Andres Freund

andres@2ndquadrant.com

almost 11 years ago

In reply to: Robert Haas (#119)

Re: logical column ordering

On 2015-03-23 13:01:48 -0400, Robert Haas wrote:

I'm a little confused as to the status of this patch. It's marked as
Waiting on Author in the CommitFest application, and the last patch
version was posted in December.

I think it fairly can be marked as "returned with feedback" for now?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#123

Alvaro Herrera

alvherre@2ndquadrant.com

almost 11 years ago

In reply to: Andres Freund (#122)

Re: logical column ordering

Andres Freund wrote:

On 2015-03-23 13:01:48 -0400, Robert Haas wrote:

I'm a little confused as to the status of this patch. It's marked as
Waiting on Author in the CommitFest application, and the last patch
version was posted in December.

I think it fairly can be marked as "returned with feedback" for now?

... which means that no useful feedback was received at all in this
round for this patch. (There was lots of feedback, mind you, but as far
as I can see it was all on the subject of how the patch is going to be
summarily rejected unless user-visible controls are offered -- and you
already know my opinion on that matter.)

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#124

Andres Freund

andres@2ndquadrant.com

almost 11 years ago

In reply to: Alvaro Herrera (#123)

Re: logical column ordering

On 2015-03-23 14:19:50 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

On 2015-03-23 13:01:48 -0400, Robert Haas wrote:

I'm a little confused as to the status of this patch. It's marked as
Waiting on Author in the CommitFest application, and the last patch
version was posted in December.

I think it fairly can be marked as "returned with feedback" for now?

... which means that no useful feedback was received at all in this
round for this patch. (There was lots of feedback, mind you, but as far
as I can see it was all on the subject of how the patch is going to be
summarily rejected unless user-visible controls are offered -- and you
already know my opinion on that matter.)

To me the actual blocker seems to be the implementation. Which doesn't
look like it's going to be ready for 9.5; there seems to be loads of
work left to do. It's hard to provide non flame-bait feedback if the
patch isn't ready. I'm not sure what review you'd like to see at this
stage?

I think your approach of concentrating on the technical parts is sane,
and I'd continue going that way.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#125

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Robert Haas (#119)

Re: logical column ordering

Hi,

On 23.3.2015 18:01, Robert Haas wrote:

On Thu, Mar 12, 2015 at 9:57 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

However, there's a difference between making a query silently given
different results, and breaking it completely forcing the user to
re-study how to write it. I think the latter is better. In that light
we should just drop attnum as a column name, and use something else:
maybe (attidnum, attlognum, attphysnum). So all queries in the wild
would be forced to be updated, but we would not silently change
semantics instead.

+1 for that approach. Much better to break all of the third-party
code out there definitively than to bet on which attribute people are
going to want to use most commonly.

I'm a little confused as to the status of this patch. It's marked as
Waiting on Author in the CommitFest application, and the last patch
version was posted in December. The fact that the new CommitFest
application encourages people to blindly move things to the next CF
instead of forcing patch authors to reopen the record when they update
the patch is, IMHO, not good. It's just going to lead to the CF
application filling up with things that the authors aren't really
working on. We've got enough work to do with the patches that are
actually under active development.

The last version of the patch was submitted on 24/2 by me. Not sure why
it's not listed in the CF app, but it's here:

/messages/by-id/54EBB312.7090000@2ndquadrant.com

I'm working on a new version of the patch, based on the ideas that were
mentioned in this thread. I plan to post a new version within a few
days, hopefully.

Anyway, it's obvious this patch won't make it into 9.5 - it's a lot of
subtle changes on many places, so it's not suitable for the last
commitfest. But the feedback is welcome, of course.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#126

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Peter Geoghegan (#120)

Re: logical column ordering

Hi,

On 23.3.2015 18:07, Peter Geoghegan wrote:

On Mon, Mar 23, 2015 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I'm a little confused as to the status of this patch. It's marked as
Waiting on Author in the CommitFest application, and the last patch
version was posted in December. The fact that the new CommitFest
application encourages people to blindly move things to the next CF
instead of forcing patch authors to reopen the record when they update
the patch is, IMHO, not good. It's just going to lead to the CF
application filling up with things that the authors aren't really
working on. We've got enough work to do with the patches that are
actually under active development.

Maybe there should be a "stalled" patch status summary, that
highlights patches that have not had their status change in (say) 2
weeks. Although it wouldn't really be a status summary, since that
they're mutually exclusive with each other in the CF app (e.g. a patch
cannot be both "Waiting on Author" and "Ready for Committer").

Not sure how that's supposed to improve the situation? Also, when you
change the status to 'stalled', it only makes it more difficult to
identify why it was stalled (was it waiting for author or a review?).

What might be done is tracking "time since last patch/review", but I
really don't know how we're going to identify that considering the
problems with identifying which messages are patches.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#127

Peter Geoghegan

pg@heroku.com

almost 11 years ago

In reply to: Tomas Vondra (#126)

Re: logical column ordering

On Mon, Mar 23, 2015 at 11:50 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Not sure how that's supposed to improve the situation? Also, when you
change the status to 'stalled', it only makes it more difficult to
identify why it was stalled (was it waiting for author or a review?).

What might be done is tracking "time since last patch/review", but I
really don't know how we're going to identify that considering the
problems with identifying which messages are patches.

Perhaps I explained myself poorly. I am proposing having a totally
automated/mechanical way of highlighting no actual change in status in
the CF app. So I think we are in agreement here, or close enough. I
was just talking about a somewhat arbitrary point at which patches are
considered to have stalled within the CF app.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#128

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Jeff Janes (#121)

Re: logical column ordering

On 23.3.2015 18:08, Jeff Janes wrote:

On Mon, Mar 23, 2015 at 10:01 AM, Robert Haas <robertmhaas@gmail.com
<mailto:robertmhaas@gmail.com>> wrote:

There was a patch here, which in the commit fest is "hidden" behind
other non-attachments in the same email:

Attachment (randomize.sql
</messages/by-id/attachment/37076/randomize.sql>)
at 2015-02-27 19:10:21
</messages/by-id/54F0C11D.7000906@2ndquadrant.com/> from
Tomas Vondra <tomas.vondra at 2ndquadrant.com <http://2ndquadrant.com>>

But that patch failed the majority of "make check" checks in my hands.
So I also don't know what the status is.

Ummm, that's not a patch but a testing script ...

There was a patch submitted on 23/2, and I believe that passes most make
check tests, except for two IIRC. But it's not perfect - it was the
first version that mostly worked, and was somehow suitable for getting
feedback.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#129

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Peter Geoghegan (#127)

Re: logical column ordering

On 23.3.2015 19:52, Peter Geoghegan wrote:

On Mon, Mar 23, 2015 at 11:50 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Not sure how that's supposed to improve the situation? Also, when you
change the status to 'stalled', it only makes it more difficult to
identify why it was stalled (was it waiting for author or a review?).

What might be done is tracking "time since last patch/review", but
I really don't know how we're going to identify that considering
the problems with identifying which messages are patches.

Perhaps I explained myself poorly. I am proposing having a totally
automated/mechanical way of highlighting no actual change in status
in the CF app. So I think we are in agreement here, or close enough.
I was just talking about a somewhat arbitrary point at which patches
are considered to have stalled within the CF app.

Oh, right. Yes, tracking time since the last status change like this
might be useful, although my experience is that many patches are stuck
at some status yet there was a long discussion on the list ... Not sure
if that counts as 'stalled'.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#130

Josh Berkus

josh@agliodbs.com

almost 11 years ago

In reply to: Alvaro Herrera (#1)

Re: logical column ordering

On 03/23/2015 02:32 PM, Tomas Vondra wrote:

Oh, right. Yes, tracking time since the last status change like this
might be useful, although my experience is that many patches are stuck
at some status yet there was a long discussion on the list ... Not sure
if that counts as 'stalled'.

"Time since last email" maybe.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WMd318d79017111bfa19d39a48bcc2b3985935a58891cbcd6672d018bfb9b535f20d66af2e920d8d79a34dc2f033683d0d@asav-2.01.com

#131

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Andres Freund (#124)

Re: logical column ordering

On 23.3.2015 18:30, Andres Freund wrote:

I think it fairly can be marked as "returned with feedback" for
now?

That will eventually be the end result, yes. If it's time to do that
now, or leave the patch in the CF and only bounce it at the end, I don't
know.

... which means that no useful feedback was received at all in
this round for this patch. (There was lots of feedback, mind you,
but as far as I can see it was all on the subject of how the patch
is going to be summarily rejected unless user-visible controls are
offered -- and you already know my opinion on that matter.)

To me the actual blocker seems to be the implementation. Which
doesn't look like it's going to be ready for 9.5; there seems to be
loads of work left to do. It's hard to provide non flame-bait
feedback if the patch isn't ready. I'm not sure what review you'd
like to see at this stage?

The version I posted at the end of February is certainly incomplete (and
some of the regression tests fail), but it seemed reasonably complete to
get some feedback. That is not to say parts of the patch are probably
wrong / need rework.

I think your approach of concentrating on the technical parts is
sane, and I'd continue going that way.

I do work in that direction. OTOH I think it's useful to provide some
sort of "minimum usable API" so that people can actually use it without
messing with catalogs directly. It certainly won't have all the bells
and whistles, though.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#132

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 11 years ago

In reply to: Alvaro Herrera (#1)

Re: logical column ordering

On 23.3.2015 22:53, Jeff Janes wrote:

On Mon, Mar 23, 2015 at 11:52 AM, Tomas Vondra

Sorry, the 23/2 one is the one I meant. I got confused over which of
the emails listed as having an attachment but no patch was the one that
actually had a patch. (If the commitfest app can't correctly deal with
more than one attachment, it needs to at least give an indication that
this condition may exist).

But I am still getting a lot of errors during make check.

60 of 153 tests failed

Some of them look like maybe a change in the expected output file didn't
get included in the patch, but at least one was a coredump.

Yes, there were two coredumps (as noted in the message with the patch).

Not sure of the other errors - it certainly is possible I forgot to
include something in the patch. Thanks for noticing this, will look into
that.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: CAMkU1zC09KdzD_gMGQu5LaygwQkbvRhvwZ+wMxtTajNh_CQpQ@mail.gmail.com

#133

Kevin Grittner

kgrittn@ymail.com

almost 11 years ago

In reply to: Robert Haas (#119)

Re: logical column ordering

Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 12, 2015 at 9:57 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

However, there's a difference between making a query silently given
different results, and breaking it completely forcing the user to
re-study how to write it. I think the latter is better. In that light
we should just drop attnum as a column name, and use something else:
maybe (attidnum, attlognum, attphysnum). So all queries in the wild
would be forced to be updated, but we would not silently change
semantics instead.

+1 for that approach. Much better to break all of the third-party
code out there definitively than to bet on which attribute people are
going to want to use most commonly.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#134

Alvaro Herrera

alvherre@2ndquadrant.com

over 10 years ago

In reply to: Andres Freund (#124)

Re: logical column ordering

I've been looking at this again. It has become apparent to me that what
we're doing in parse analysis is wrong, and the underlying node
representation is wrong too. Here's a different approach, which I hope
will give better fruits. I'm still working on implementing the ideas
here (and figuring out what the fallout is).

Currently, the patch stores RangeTblEntry->eref->colnames in logical
order; and it also adds a "map" from logical colnums to "attnum" (called
"lognums"). Now this is problematic for two reasons:

1. the lognums map becomes part of the stored representation of a rule;
any time you modified the logical ordering of a table underlying some
view, the view's _RETURN rule would have to be modified as well. Not
good.

2. RTE->eref->colnames is in attlognum order and thus can only be sanely
interpreted if RTE->lognums is available, so not only lognums would have
to be modified, but colnames as well.

I think the solution to both these issues is to store colnames in attnum
ordering not logical, and *not* output RTE->lognums as part of
_outRangeTblEntry. This means that every time we read the RTE for the
table we need to obtain lognums from its tupledesc. RTE->eref->colnames
can then be sorted appropriately at plan time.

At RTE creation time (addRangeTableEntry and siblings) we can obtain
lognums and physnums. Both these arrays are available for later
application in setrefs.c, avoiding the need of the obviously misplaced
relation_open() call we currently have there.

There is one gotcha, which is that expandTupleDesc (and, really,
everything from expandRTE downwards) will need to be split in two
somehow: one part needs to fill in the colnames array in attnum order,
and the other part needs to expand the attribute array into Var nodes in
logical order.

(If you recall, we need attphysnums at setrefs.c time so that we can
fix-up any TupleDesc created from a targetlist so that it contains the
proper attphysnum values. The attphysnum values for each attribute do
not propagate properly there, and I believe this is the mechanism to do
so.)

As I said, I'm still writing the first pieces of this so I'm not sure
what other ramifications it will have. If there are any thoughts, I
would appreciate them. (Particularly useful input on whether it is
acceptable to omit lognums/physnums from _outRangeTblEntry.)

An alternative idea would be to add lognums and physnums to RelOptInfo
instead of RangeTblEntry (we would do so during get_relation_info). I'm
not sure how this works for setrefs.c though, if at all; the advantage
is that RelOptInfo is not part of stored rules so we don't have to worry
about not saving them there.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#135

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Alvaro Herrera (#134)

Re: logical column ordering

On Tue, Apr 14, 2015 at 2:38 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

As I said, I'm still writing the first pieces of this so I'm not sure
what other ramifications it will have. If there are any thoughts, I
would appreciate them. (Particularly useful input on whether it is
acceptable to omit lognums/physnums from _outRangeTblEntry.)

I think the general rule is that an RTE should not contain any
structure information about the underlying relation that can
potentially change: the OID is OK because it's immutable for any given
relation. The relkind is not quite immutable because you can create a
_SELECT rule on a table and turn it into a view; I'm not sure how we
handle that, but it's a fairly minor exception anyhow. Everything
else in the RTE, with the new and perhaps-unfortunate exception of
security quals, is stuff derived from what's in the query, not the
table. I think it would be good for this to work the same way: the
structural information about the table should be found in the
relcache, not the RTE.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#136

Noah Misch

noah@leadboat.com

over 10 years ago

In reply to: Jeff Janes (#31)

Re: 9.5 release scheduling (was Re: logical column ordering)

On Thu, Dec 11, 2014 at 10:24:20AM -0800, Jeff Janes wrote:

On Thu, Dec 11, 2014 at 8:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

2. The amount of pre-release testing we get from people outside the
hard-core development crowd seems to be continuing to decrease.
We were fortunate that somebody found the JSONB issue before it was
too late to do anything about it.

We are not particularly inviting of feedback for whatever testing has been
done.

The definitive guide seems to be
https://wiki.postgresql.org/wiki/HowToBetaTest, and says:

You can report tests by email. You can subscribe to any PostgreSQL mailing
list from the subscription form <http://www.postgresql.org/community/lists/>
.

- pgsql-bugs: this is the preferred mailing list if you think you have
found a bug in the beta. You can also use the Bug Reporting Form
<http://www.postgresql.org/support/submitbug/>.
- pgsql-hackers: bugs, questions, and successful test reports are
welcome here if you are already subscribed to pgsql-hackers. Note that
pgsql-hackers is a high-traffic mailing list with a lot of development
discussion.

=========

So if you find a bug, you can report it on the bug reporting form (which
doesn't have a drop-down entry for 9.4RC1).

Let's get 9.5 alpha/beta/rc releases into that drop-down as we release them.

If you have positive results rather than negative ones (or even complaints
that are not actually bugs), you can subscribe to a mailing list which
generates a lot of traffic which is probably over your head and not
interesting to you.

Feel welcome to revise that part. Don't messages from non-subscribed people
make it to the list after manual moderation? Testers might want to create a
no-delivery subscription to avoid moderation delay, but the decision to
receive all -hackers mail is separate.

Does the core team keep a mental list of items they want to see tested by
the public, and they will spend their own time testing those things
themselves if they don't hear back on some positive tests for them?

Not sure about the core team. I myself would test essentially the same things
during beta regardless of what end users report having tested, because end
users will pick different test scenarios for the same features.

If we find reports of public testing that yields good results (or at least
no bugs) to be useful, we should be more clear on how to go about doing
it. But are positive reports useful? If I report a bug, I can write down
the steps to reproduce it, and then follow my own instructions to make sure
it does actually reproduce it. If I find no bugs, it is just "I did a
bunch of random stuff and nothing bad happened, that I noticed".

Positive reports have potential to be useful. In particular, mention the new
features you took action to try. Areas like BRIN, pg_rewind, foreign tables,
event triggers, CREATE POLICY, INSERT ... ON CONFLICT, and GROUPING SETS are
either completely new or have new sub-features. If nothing else, we can CC
reporters when considering changes to features they reported upon. Other
analysis would become attractive given a larger corpus of positive reports.

Thanks,
nm

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers