Simple Column reordering

Started by Simon Riggsabout 19 years ago35 messageshackers
Jump to latest
#1Simon Riggs
simon@2ndQuadrant.com

Column storage position is the subject of many long threads in recent
times. Solutions proposed for this have been both fairly complex and
long enough that nothing seems likely to happen for 8.3. If I'm wrong,
then of course this proposal would be superceded.

I propose that at CREATE TABLE time, the column ordering is re-ordered
so that the table columns are packed more efficiently. This would be a
physical re-ordering, so that SELECT * and COPY without explicit column
definitions would differ from the original CREATE TABLE statement.

This would be an optional feature, off by default, controlled by a
USERSET GUC
optimize_column_order = off (default) | on

When the full column ordering proposal is implemented,
optimize_column_ordering would be set to default to on. The feature
would be supported for at least one more release after this to allow bug
analysis.

The proposed ordering would be:
1. All fixed length columns, arranged so that alignment is efficient
2. All variable length columns

All column ordering would stay as close as possible to original order

No changes would be made apart from at CREATE TABLE time.

The ordering would be repeatable, so that the order would not change on
repeated dump/restore of a table with no changes.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#2Andrew Dunstan
andrew@dunslane.net
In reply to: Simon Riggs (#1)
Re: Simple Column reordering

Simon Riggs wrote:

I propose that at CREATE TABLE time, the column ordering is re-ordered
so that the table columns are packed more efficiently. This would be a
physical re-ordering, so that SELECT * and COPY without explicit column
definitions would differ from the original CREATE TABLE statement.

This would be an optional feature, off by default, controlled by a
USERSET GUC
optimize_column_order = off (default) | on

Umm, you want a GUC setting to enable standards-breaking behaviour and
that will be obsolete when we do column ordering right, which is not
likely to be more than one release away, and could even still happen in
this coming release?

I hope I haven't misunderstood.

cheers

andrew

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andrew Dunstan (#2)
Re: Simple Column reordering

Andrew Dunstan wrote:

Simon Riggs wrote:

I propose that at CREATE TABLE time, the column ordering is re-ordered
so that the table columns are packed more efficiently. This would be a
physical re-ordering, so that SELECT * and COPY without explicit column
definitions would differ from the original CREATE TABLE statement.

This would be an optional feature, off by default, controlled by a
USERSET GUC
optimize_column_order = off (default) | on

Umm, you want a GUC setting to enable standards-breaking behaviour and
that will be obsolete when we do column ordering right, which is not
likely to be more than one release away, and could even still happen in
this coming release?

Given that we already seem to have a patch implementing a complete
solution, or part thereof, this would seem a rather shortsighted
proposal. Why not develop the whole thing and be done with it?

I don't understand the reluctance to implementing all of it. The most
serious objection I've seen, from Andreas IIRC, is that it would make
drivers' lives more difficult; but really, drivers have to cope with
dropped columns today which is a pain, and ISTM this proposal (not this
one here, but the three-column proposal) would make that a bit simpler.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#4Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#3)
Re: Simple Column reordering

"Alvaro Herrera" <alvherre@commandprompt.com> writes:

Given that we already seem to have a patch implementing a complete
solution

we do?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#5Simon Riggs
simon@2ndQuadrant.com
In reply to: Alvaro Herrera (#3)
Re: Simple Column reordering

On Thu, 2007-02-22 at 23:49 -0300, Alvaro Herrera wrote:

Andrew Dunstan wrote:

Simon Riggs wrote:

I propose that at CREATE TABLE time, the column ordering is re-ordered
so that the table columns are packed more efficiently. This would be a
physical re-ordering, so that SELECT * and COPY without explicit column
definitions would differ from the original CREATE TABLE statement.

This would be an optional feature, off by default, controlled by a
USERSET GUC
optimize_column_order = off (default) | on

Umm, you want a GUC setting to enable standards-breaking behaviour and
that will be obsolete when we do column ordering right, which is not
likely to be more than one release away, and could even still happen in
this coming release?

Given that we already seem to have a patch implementing a complete
solution, or part thereof, this would seem a rather shortsighted
proposal. Why not develop the whole thing and be done with it?

That would be my preference, but if it doesn't happen, I wanted to have
a clear secondary proposal documented.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#6Simon Riggs
simon@2ndQuadrant.com
In reply to: Andrew Dunstan (#2)
Re: Simple Column reordering

On Thu, 2007-02-22 at 20:07 -0600, Andrew Dunstan wrote:

Simon Riggs wrote:

I propose that at CREATE TABLE time, the column ordering is re-ordered
so that the table columns are packed more efficiently. This would be a
physical re-ordering, so that SELECT * and COPY without explicit column
definitions would differ from the original CREATE TABLE statement.

This would be an optional feature, off by default, controlled by a
USERSET GUC
optimize_column_order = off (default) | on

Umm, you want a GUC setting to enable standards-breaking behaviour and
that will be obsolete when we do column ordering right, which is not
likely to be more than one release away, and could even still happen in
this coming release?

If this is standards-breaking as you say, I would withdraw immediately.
I checked the SQL standard and could not see how this would do so. The
standard states SELECT * would return columns in order; it doesn't say
what that order should be, nor does CREATE TABLE enforce the ordering to
be the same as it has specified, AFAICS. Please correct me and I will
withdraw. Practical issues seem far stronger drivers than standards
issues here, which is why the parameter would default=off.

If the full implementation exists and works, I would welcome it. This
proposal is really aimed at what we do if that doesn't occur; we must
wait to see if it will. I see that many users would want to get
something sooner rather than later. That isn't a commercial perspective,
I see that as a PostgreSQL advocacy perspective.

I also see that we are forcing change into the on-disk format of heaps
in this release. If we defer this to another release then we would be
effectively changing the on-disk format again in next release.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#7Lukas Kahwe Smith
smith@pooteeweet.org
In reply to: Simon Riggs (#6)
Re: Simple Column reordering

Simon Riggs wrote:

If this is standards-breaking as you say, I would withdraw immediately.
I checked the SQL standard and could not see how this would do so. The
standard states SELECT * would return columns in order; it doesn't say
what that order should be, nor does CREATE TABLE enforce the ordering to
be the same as it has specified, AFAICS. Please correct me and I will
withdraw. Practical issues seem far stronger drivers than standards
issues here, which is why the parameter would default=off.

I did not follow the entire thread. I just wanted to point out that IIRC
MS SQL Server (and maybe also Sybase) do automatically optimize the
internal order of how columns are stored to move fixed length (which
also means non NULLable for these two servers) columns to the left.
Maybe this will serve as a reference point (not necessarily for
standards compliance of course).

regards,
Lukas

#8Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Alvaro Herrera (#3)
Re: Simple Column reordering

I don't understand the reluctance to implementing all of it.
The most serious objection I've seen, from Andreas IIRC, is
that it would make drivers' lives more difficult; but really,
drivers have to cope with dropped columns today which is a

Yes, I already said, that my objection is probably moot in face of
drop column.

pain, and ISTM this proposal (not this one here, but the
three-column proposal) would make that a bit simpler.

Do all of them support dropped columns correctly, yet ?

Andreas

#9Guillaume Smet
guillaume.smet@gmail.com
In reply to: Simon Riggs (#6)
Re: Simple Column reordering

Hi Simon,

On 2/23/07, Simon Riggs <simon@2ndquadrant.com> wrote:

If this is standards-breaking as you say, I would withdraw immediately.
I checked the SQL standard and could not see how this would do so. The
standard states SELECT * would return columns in order; it doesn't say
what that order should be, nor does CREATE TABLE enforce the ordering to
be the same as it has specified, AFAICS. Please correct me and I will
withdraw.

Phil Currier seems to have a preliminary proposition so I think we
should wait for his patch. AFAICS in what he did, the physical
position is not tied to the column ordering and it's better IMHO. It
doesn't seem very intrusive and it doesn't change the "visible"
behavior of PostgreSQL.
Personnaly, I really need my column ordering. When I create my table,
I put the columns in a logical order and I need they stay as I created
them (for \d for example - on a large table I make semantic groups so
I can find the field names more easily).
And I think a lot of people do it as well so I'm not sure people would
use a GUC which changes that much their habits.

--
Guillaume

#10Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Lukas Kahwe Smith (#7)
Re: Simple Column reordering

If this is standards-breaking as you say, I would withdraw

immediately.

I checked the SQL standard and could not see how this would do so.

The

standard states SELECT * would return columns in order; it doesn't

say

Imho the create table order is implied. What other order would they mean
with "in order" ?

what that order should be, nor does CREATE TABLE enforce the

ordering

to be the same as it has specified, AFAICS. Please correct me and I
will withdraw. Practical issues seem far stronger drivers than
standards issues here, which is why the parameter would default=off.

I did not follow the entire thread. I just wanted to point
out that IIRC MS SQL Server (and maybe also Sybase) do
automatically optimize the internal order of how columns are
stored to move fixed length (which also means non NULLable
for these two servers) columns to the left.
Maybe this will serve as a reference point (not necessarily
for standards compliance of course).

If you state that, it is imho also important to note that this is not
externally visible. select * is not altered.
I think we can have this based on Phil's patch. Simon would probably be
willing to extend it to choose the best physical order during create
table time. And the first step would be done.

The next step could then be to "fix" drop column and add "add column
before"
eighter using a 3rd column or using the existing 2.

Andreas

#11Simon Riggs
simon@2ndQuadrant.com
In reply to: Guillaume Smet (#9)
Re: Simple Column reordering

On Fri, 2007-02-23 at 09:46 +0100, Guillaume Smet wrote:

Hi Simon,

On 2/23/07, Simon Riggs <simon@2ndquadrant.com> wrote:

If this is standards-breaking as you say, I would withdraw immediately.
I checked the SQL standard and could not see how this would do so. The
standard states SELECT * would return columns in order; it doesn't say
what that order should be, nor does CREATE TABLE enforce the ordering to
be the same as it has specified, AFAICS. Please correct me and I will
withdraw.

Phil Currier seems to have a preliminary proposition so I think we
should wait for his patch.

That is exactly what I've said also, so it is good we agree.

I had read that Phil had declined to work on it further; I hope he
changes his mind on that.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#12Albe Laurenz
all@adv.magwien.gv.at
In reply to: Simon Riggs (#1)
Re: Simple Column reordering

Simon Riggs wrote:

I propose that at CREATE TABLE time, the column ordering is

re-ordered

so that the table columns are packed more efficiently. This would be

a

physical re-ordering, so that SELECT * and COPY without explicit

column

definitions would differ from the original CREATE TABLE statement.

How about INSERTs without column names? Wouldn't that also cease to
work?

In particular, 'pg_dump --format=plain --inserts' would produce unusable
output unless --attribute-inserts is also given, right?

Yours,
Laurenz Albe

#13Guillaume Smet
guillaume.smet@gmail.com
In reply to: Simon Riggs (#11)
Re: Simple Column reordering

On 2/23/07, Simon Riggs <simon@2ndquadrant.com> wrote:

I had read that Phil had declined to work on it further; I hope he
changes his mind on that.

IIRC he just said he wasn't interested to work on the visible ordering
part (as in MySQL) and I don't think it's a problem as even if it's
related it's a different problem.

--
Guillaume

#14Peter Eisentraut
peter_e@gmx.net
In reply to: Simon Riggs (#6)
Re: Simple Column reordering

Am Freitag, 23. Februar 2007 09:08 schrieb Simon Riggs:

If this is standards-breaking as you say, I would withdraw immediately.
I checked the SQL standard and could not see how this would do so. The
standard states SELECT * would return columns in order; it doesn't say
what that order should be,

b) Otherwise, the <select list> “*” is equivalent to a <value expression>
sequence in which each <value expression> is a column reference that
references a column of T and each column of T is referenced exactly once. The
columns are referenced in the ascending sequence of their ordinal position
within T.

nor does CREATE TABLE enforce the ordering to
be the same as it has specified, AFAICS.

b) The column descriptors of every column of T, according to the Syntax Rules
and General Rules of Subclause 11.4, “<column definition>”, applied to the
<column definition>s contained in TEL, in the order in which they were
specified.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#15Simon Riggs
simon@2ndQuadrant.com
In reply to: Peter Eisentraut (#14)
Re: Simple Column reordering

On Fri, 2007-02-23 at 11:25 +0100, Peter Eisentraut wrote:

Am Freitag, 23. Februar 2007 09:08 schrieb Simon Riggs:

If this is standards-breaking as you say, I would withdraw immediately.
I checked the SQL standard and could not see how this would do so. The
standard states SELECT * would return columns in order; it doesn't say
what that order should be,

b) Otherwise, the <select list> “*” is equivalent to a <value expression>
sequence in which each <value expression> is a column reference that
references a column of T and each column of T is referenced exactly once. The
columns are referenced in the ascending sequence of their ordinal position
within T.

Which begs the question: what is their ordinal position? If we change
the ordinal position at CREATE TABLE time then the SELECT * would still
work per standard. So the next point is the important one:

nor does CREATE TABLE enforce the ordering to
be the same as it has specified, AFAICS.

b) The column descriptors of every column of T, according to the Syntax Rules
and General Rules of Subclause 11.4, “<column definition>”, applied to the
<column definition>s contained in TEL, in the order in which they were
specified.

I read both of these phrases prior to posting the original suggestion.

My reading was that this was about constraints on columns, not the
columns themselves, when that phrase was taken in context. I take it you
think that reading was wrong?

I'm not a lawyer, so I'll not argue too close to the edge.

Suggestion withdrawn.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#16Peter Eisentraut
peter_e@gmx.net
In reply to: Simon Riggs (#15)
Re: Simple Column reordering

Am Freitag, 23. Februar 2007 12:25 schrieb Simon Riggs:

My reading was that this was about constraints on columns, not the
columns themselves, when that phrase was taken in context. I take it you
think that reading was wrong?

I see nothing there that speaks of constraints.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#17Andrew Dunstan
andrew@dunslane.net
In reply to: Simon Riggs (#15)
Re: Simple Column reordering

Simon Riggs wrote:

On Fri, 2007-02-23 at 11:25 +0100, Peter Eisentraut wrote:

Am Freitag, 23. Februar 2007 09:08 schrieb Simon Riggs:

If this is standards-breaking as you say, I would withdraw immediately.
I checked the SQL standard and could not see how this would do so. The
standard states SELECT * would return columns in order; it doesn't say
what that order should be,

b) Otherwise, the <select list> “*” is equivalent to a <value expression>
sequence in which each <value expression> is a column reference that
references a column of T and each column of T is referenced exactly once. The
columns are referenced in the ascending sequence of their ordinal position
within T.

Which begs the question: what is their ordinal position? If we change
the ordinal position at CREATE TABLE time then the SELECT * would still
work per standard.

That's quite a stretch. Surely "their ordinal position" can't mean
"their ordinal position as arbitrarily determined at CREATE TABLE time
by the implementation".

I really don't think that we can accept under any circumstances a
situation where something as simple as this breaks:

create table foo (x text, y int);
insert into foo values ('qwerty',1);

Physical storage optimization must not have any SQL level visibility or
consequences, IMNSHO, regardless of what we do about providing mutable
display order.

If you really want an interim solution, what about a builtin function
that would explicitly mutate the definition and table contents (if any)
along the lines you want? (assuming that's lots less work than just
doing the whole thing right to start with). Or even one which just
*displayed* the optimal order might be sufficient assistance to DBAs who
want to take advantage of this.

cheers

andrew

#18Simon Riggs
simon@2ndQuadrant.com
In reply to: Andrew Dunstan (#17)
Re: Simple Column reordering

On Fri, 2007-02-23 at 07:52 -0500, Andrew Dunstan wrote:

I really don't think that we can accept under any circumstances a
situation where something ... breaks:

Yes, I've accepted that, in response to Peter earlier today.

If you really want an interim solution, what about a builtin function
that would explicitly mutate the definition and table contents (if any)
along the lines you want? (assuming that's lots less work than just
doing the whole thing right to start with). Or even one which just
*displayed* the optimal order might be sufficient assistance to DBAs who
want to take advantage of this.

I think the only interim solution now is to put functionality into
PgAdmin et al to optimize the column order.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#6)
Re: Simple Column reordering

"Simon Riggs" <simon@2ndquadrant.com> writes:

If this is standards-breaking as you say, I would withdraw immediately.
I checked the SQL standard and could not see how this would do so. The
standard states SELECT * would return columns in order; it doesn't say
what that order should be, nor does CREATE TABLE enforce the ordering to
be the same as it has specified, AFAICS.

SQL92 7.9 <query specification> defines the meaning of SELECT * as

b) Otherwise, the <select list> "*" is equivalent to a <value
expression> sequence in which each <value expression> is a
<column reference> that references a column of T and each
column of T is referenced exactly once. The columns are ref-
erenced in the ascending sequence of their ordinal position
within T.

11.3 <table definition> says

2) The degree of the table being created is initially set to 0; the
General Rules of Subclause 11.4, "<column definition>" specify
the degree of the table being created during the definition of
columns in that table.

and 11.4 <column definition> says

4) The degree of the table T being defined in the containing <table
definition> or <temporary table declaration> or altered by the
containing <alter table statement> is increased by 1.

5) A column descriptor is created that describes the column being
defined. .... The ordinal position included
in the column descriptor is equal to the degree of T.

Now, I will grant you that it doesn't actually say anywhere that the
column definitions in CREATE TABLE must be processed left to right, but
if they meant this behavior to be implementation-dependent they would
have said so. Given the number of places in the spec in which semantics
are directly dependent on ordinal position, I cannot think that that is
intended --- for example, the behavior of <row comparison> becomes
completely undefined if column ordinal positions aren't fixed.

regards, tom lane

#20Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Simon Riggs (#18)
Re: Simple Column reordering

On Fri, Feb 23, 2007 at 02:09:55PM +0000, Simon Riggs wrote:

If you really want an interim solution, what about a builtin function
that would explicitly mutate the definition and table contents (if any)
along the lines you want? (assuming that's lots less work than just
doing the whole thing right to start with). Or even one which just
*displayed* the optimal order might be sufficient assistance to DBAs who
want to take advantage of this.

I think the only interim solution now is to put functionality into
PgAdmin et al to optimize the column order.

Well, if it comes to that it would be good to have pgAdmin et all driven
by logic in the database, so that people using psql can benefit as well.
Perhaps a function that is passed an existing table and re-creates it in
optimal order (if it's empty...). Or at least spits out a CREATE TABLE
statement for you that's in optimal order.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#21Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jim Nasby (#20)
#22Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Alvaro Herrera (#21)
#23Bruce Momjian
bruce@momjian.us
In reply to: Lukas Kahwe Smith (#7)
#24Lukas Kahwe Smith
smith@pooteeweet.org
In reply to: Bruce Momjian (#23)
#25Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#1)
#26Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#25)
#27Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#26)
#28Joshua D. Drake
jd@commandprompt.com
In reply to: Andrew Dunstan (#26)
#29Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#25)
#30Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#29)
#31Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#30)
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#31)
#33Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#31)
#34Josh Berkus
josh@agliodbs.com
In reply to: Bruce Momjian (#30)
#35Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#34)