logical column position

Started by Neil Conwayover 22 years ago41 messageshackers
Jump to latest
#1Neil Conway
neilc@samurai.com

I'd like to add a new column to pg_attribute that specifies the
attribute's "logical position" within its relation. The idea here is
to separate the logical order of the columns in a relation from the
on-disk storage of the relation's tuples. This allows us to easily &
quickly change column order, add an additional column before or after
an existing column, etc.

At present, attnum basically does three things: identifies an column
within a relation, indicates which columns are system columns, and
defines the order of a relation's columns. I'd like to move this last
functionality into a separate pg_attribute column named "attpos" (or
"attlogicalpos"):

- when the table is created, attnum == attpos. System columns
have attpos < 0, as with attnum. At no point will two
columns of the same relation have the same attpos.

- when returning output to the client and no column ordering
is implied by the query (e.g. "SELECT * ..."), we sort the
columns in ascending attpos order.

- when storing a tuple on disk, we don't consider attpos

- if we want to change the order of the column's in a
relation, we can do so merely by updating pg_attribute; no
changes to the on-disk storage of the relation should be
necessary

A few notes:

(a) ISTM this should also apply to COPY TO and COPY FROM if the user
didn't supply a column list. Is this reasonable? It would break
dumps of the table's contents, but then again, dumps aren't
guaranteed to remain valid over arbitrary changes to the table's
meta-data.

(b) Using the above scheme that attnum == attpos initially, there
won't be any gaps in the sequence of attpos values. That means
that if, for example, we want to move the column in position 50
to position 1, we'll need to change the position's of all the
columns in positions [1..49] (and suffer the resulting MVCC
bloat in pg_attribute). Changing the column order is hardly a
performance critical operation, so that might be acceptable.

If we want to avoid this, one easy (but arguably unclean) way to
do so would be to make the initial value of attpos == attnum *
1000, and make attpos an int4 rather than an int2. Then, we can
do most column reordering operations with only a single
pg_attribute update -- in the worst-case that enough
re-orderings are done that we overflow the 999 "padding"
positions, we can just fall-back to doing multiple pg_attribute
updates. Is this worth doing, and/or is there a better way to
achieve the same effect?

(c) Do I need to consider inheritance?

Comments are welcome.

-Neil

#2Peter Eisentraut
peter_e@gmx.net
In reply to: Neil Conway (#1)
Re: logical column position

Neil Conway writes:

(b) Using the above scheme that attnum == attpos initially, there
won't be any gaps in the sequence of attpos values. That means
that if, for example, we want to move the column in position 50
to position 1, we'll need to change the position's of all the
columns in positions [1..49] (and suffer the resulting MVCC
bloat in pg_attribute). Changing the column order is hardly a
performance critical operation, so that might be acceptable.

I don't think you can speak of "bloat" for pg_attribute. But you can
speak of a problem when you want to do the old col = col + 1 in the
presence of a unique index.

(c) Do I need to consider inheritance?

Inheritance is based on column names, so initially no, but if there is a
command to alter the column order, then it should have an ONLY option.

--
Peter Eisentraut peter_e@gmx.net

#3Jon Jensen
jon@endpoint.com
In reply to: Neil Conway (#1)
Re: logical column position

On Mon, 17 Nov 2003, Neil Conway wrote:

I'd like to add a new column to pg_attribute that specifies the
attribute's "logical position" within its relation. The idea here is
to separate the logical order of the columns in a relation from the
on-disk storage of the relation's tuples. This allows us to easily &
quickly change column order, add an additional column before or after
an existing column, etc.

That sounds excellent!

(a) ISTM this should also apply to COPY TO and COPY FROM if the user
didn't supply a column list. Is this reasonable? It would break
dumps of the table's contents, but then again, dumps aren't
guaranteed to remain valid over arbitrary changes to the table's
meta-data.

You're just saying it'd break old dumps, right? I'd assume COPY FROM would
use attpos ordering when writing out columns, or that every user-visible
interaction with the table pretends the columns are in attpos order. So
dumps would break no more or less than when adding or dropping a column
currently, right?

Jon

#4Neil Conway
neilc@samurai.com
In reply to: Peter Eisentraut (#2)
Re: logical column position

Peter Eisentraut <peter_e@gmx.net> writes:

I don't think you can speak of "bloat" for pg_attribute. But you
can speak of a problem when you want to do the old col = col + 1 in
the presence of a unique index.

I'm sorry, but I'm not sure what either of these comments mean -- can
you elaborate?

-Neil

#5Neil Conway
neilc@samurai.com
In reply to: Jon Jensen (#3)
Re: logical column position

Jon Jensen <jon@endpoint.com> writes:

You're just saying it'd break old dumps, right? I'd assume COPY FROM
would use attpos ordering when writing out columns, or that every
user-visible interaction with the table pretends the columns are in
attpos order. So dumps would break no more or less than when adding
or dropping a column currently, right?

Right -- AFAICS, the only change in COPY compatibility would be if you
COPY TO'd a table and then changed the logical column order in some
fashion, you would no longer be able to restore the dump (unless you
specified a column list for the COPY FROM -- which, btw, pg_dump
does). I don't think it will be a problem, I just thought I'd mention
it.

-Neil

#6Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Neil Conway (#5)
Re: logical column position

Right -- AFAICS, the only change in COPY compatibility would be if you
COPY TO'd a table and then changed the logical column order in some
fashion, you would no longer be able to restore the dump (unless you
specified a column list for the COPY FROM -- which, btw, pg_dump
does). I don't think it will be a problem, I just thought I'd mention
it.

Well it's the same problem as if you'd dropped a column in the middle of
the table.

BTW, one main consideration is that all the postgres admin apps will now
need to support ORDER BY attlognum for 7.5+.

Chris

#7Bruce Momjian
bruce@momjian.us
In reply to: Neil Conway (#4)
Re: logical column position

Neil Conway wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

I don't think you can speak of "bloat" for pg_attribute. But you
can speak of a problem when you want to do the old col = col + 1 in
the presence of a unique index.

I'm sorry, but I'm not sure what either of these comments mean -- can
you elaborate?

Peter is pointing out a problem with our unique indexes that might
cause you a problem. Suppose you have a unique index in attlognum:

test=> create table xx( lognum int);
CREATE TABLE
test=> insert into xx values (1);
INSERT 17145 1
test=> insert into xx values (2);
INSERT 17146 1
test=> update xx set lognum = lognum + 1;
UPDATE 2
test=> create unique index yy on xx (lognum);
CREATE INDEX
test=> update xx set lognum = lognum + 1;
ERROR: duplicate key violates unique constraint "yy"

There is discussion to delay unique constraint failures until commit,
then recheck them to see if they are still valid, sort of like what we
do with deferred triggers. This would fix the problem because on
commit, those values are unique, but aren't while the rows are updated
invidually. If we don't get that working you might want to use the 1000
gap idea because it doesn't cause this problem, and we don't support

1600 columns, so a 1000 gap shouldn't cause a problem and can be

modified later. If they hit 999 updates, just tell them to dump/reload
the table.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#8Rod Taylor
rbt@rbt.ca
In reply to: Christopher Kings-Lynne (#6)
Re: logical column position

On Mon, 2003-11-17 at 20:24, Christopher Kings-Lynne wrote:

Right -- AFAICS, the only change in COPY compatibility would be if you
COPY TO'd a table and then changed the logical column order in some
fashion, you would no longer be able to restore the dump (unless you
specified a column list for the COPY FROM -- which, btw, pg_dump
does). I don't think it will be a problem, I just thought I'd mention
it.

Well it's the same problem as if you'd dropped a column in the middle of
the table.

BTW, one main consideration is that all the postgres admin apps will now
need to support ORDER BY attlognum for 7.5+.

Yeah... how about maintaining attnum for the logical attribute number
and create an attphysnum or something for the physical position instead?

This is more intrusive into the source, but you don't need to teach new
tricks to external entities.

#9Peter Eisentraut
peter_e@gmx.net
In reply to: Christopher Kings-Lynne (#6)
Re: logical column position

Christopher Kings-Lynne writes:

BTW, one main consideration is that all the postgres admin apps will now
need to support ORDER BY attlognum for 7.5+.

But that is only really important if they've also used the ALTER TABLE
RESHUFFLE COLUMNS feature. So if they make one alteration for 7.5, they
need to do another. That seems fair.

--
Peter Eisentraut peter_e@gmx.net

#10Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Peter Eisentraut (#9)
Re: logical column position

BTW, one main consideration is that all the postgres admin apps will now
need to support ORDER BY attlognum for 7.5+.

But that is only really important if they've also used the ALTER TABLE
RESHUFFLE COLUMNS feature. So if they make one alteration for 7.5, they
need to do another. That seems fair.

Good point.

Chris

#11Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Peter Eisentraut (#9)
Re: logical column position

On Tue, 18 Nov 2003, Peter Eisentraut wrote:

Christopher Kings-Lynne writes:

BTW, one main consideration is that all the postgres admin apps will now
need to support ORDER BY attlognum for 7.5+.

But that is only really important if they've also used the ALTER TABLE
RESHUFFLE COLUMNS feature. So if they make one alteration for 7.5, they
need to do another. That seems fair.

The ability to reshuffle and to get the correct ordering in a client app
are separate unless we're going to assume that all access goes through
that particular client. If one user uses psql and shuffles them, a
second user using fooclient may not see the new ordering.

#12Dave Cramer
pg@fastcrypt.com
In reply to: Bruce Momjian (#7)
Re: logical column position

Will adding the logical attribute number break all of the external
tools? pg_dump, etc are all dependent on attnum now?

Would it be possible to keep the meaning of attnum the same externally
and add another column internally to represent the physical number?

Dave

#13Bruce Momjian
bruce@momjian.us
In reply to: Dave Cramer (#12)
Re: logical column position

Dave Cramer wrote:

Will adding the logical attribute number break all of the external
tools? pg_dump, etc are all dependent on attnum now?

Would it be possible to keep the meaning of attnum the same externally
and add another column internally to represent the physical number?

Interesting idea. It would require a lot of code renaming in the
backend, but it could be done.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#14Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Bruce Momjian (#13)
Re: logical column position

Will adding the logical attribute number break all of the external
tools? pg_dump, etc are all dependent on attnum now?

Would it be possible to keep the meaning of attnum the same externally
and add another column internally to represent the physical number?

Interesting idea. It would require a lot of code renaming in the
backend, but it could be done.

Given that the interfaces comprise pgadmin, phppgadmin, jdbc and odbc
and all the main developers for those lists read all these posts, I
think the massive amount of effort to maintain the external interface
isn't worth it.

I can vouch that it would take me exactly 2 minutes to add support for
attlognums in phpPgAdmin.

Chris

#15Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Christopher Kings-Lynne (#14)
Re: logical column position

Christopher Kings-Lynne wrote:

Will adding the logical attribute number break all of the external
tools? pg_dump, etc are all dependent on attnum now?

Would it be possible to keep the meaning of attnum the same externally
and add another column internally to represent the physical number?

Interesting idea. It would require a lot of code renaming in the
backend, but it could be done.

Given that the interfaces comprise pgadmin, phppgadmin, jdbc and odbc
and all the main developers for those lists read all these posts, I
think the massive amount of effort to maintain the external interface
isn't worth it.

I can vouch that it would take me exactly 2 minutes to add support for
attlognums in phpPgAdmin.

Lesson 1 in "SQL for beginners" says "Don't use SELECT * if you rely on
the order of columns". This discussion is about fixing a problem that
only inexperienced programmers have. It's like an extra set of wheels on
top of your car, just in case you drive wrong way...

What's happening if you simply delete a column? Ordering 1,2,3,5? Insert
another column of the same name, as a previously deleted, will it get
the old position number? And so on. IMHO, way too much effort for
working around situations that should be avoided anyway.

Regards,
Andreas

#16Dave Cramer
pg@fastcrypt.com
In reply to: Andreas Pflug (#15)
Re: logical column position

Andreas,

The point of this is to maintain the column position. I don't think that
an alter of a column type should move the column position. It may be
that programmers should not rely on this, but it happens, and in very
large projects. If we can avoid unexpected side-affects like moving the
columns position, then I think we should.

Dave

Show quoted text

On Wed, 2003-11-19 at 12:30, Andreas Pflug wrote:

Christopher Kings-Lynne wrote:

Will adding the logical attribute number break all of the external
tools? pg_dump, etc are all dependent on attnum now?

Would it be possible to keep the meaning of attnum the same externally
and add another column internally to represent the physical number?

Interesting idea. It would require a lot of code renaming in the
backend, but it could be done.

Given that the interfaces comprise pgadmin, phppgadmin, jdbc and odbc
and all the main developers for those lists read all these posts, I
think the massive amount of effort to maintain the external interface
isn't worth it.

I can vouch that it would take me exactly 2 minutes to add support for
attlognums in phpPgAdmin.

Lesson 1 in "SQL for beginners" says "Don't use SELECT * if you rely on
the order of columns". This discussion is about fixing a problem that
only inexperienced programmers have. It's like an extra set of wheels on
top of your car, just in case you drive wrong way...

What's happening if you simply delete a column? Ordering 1,2,3,5? Insert
another column of the same name, as a previously deleted, will it get
the old position number? And so on. IMHO, way too much effort for
working around situations that should be avoided anyway.

Regards,
Andreas

#17Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Dave Cramer (#16)
Re: logical column position

Dave Cramer wrote:

Andreas,

The point of this is to maintain the column position. I don't think that
an alter of a column type should move the column position.

Why should ALTER COLUMN change the column number, i.e. position?

It may be that programmers should not rely on this, but it happens, and in very
large projects. If we can avoid unexpected side-affects like moving the
columns position, then I think we should.

This is *expected* if behaviour if you delete and add columns; is there
any DB system out there that allows to reshuffle the column ordering?

Instead of some order-ordering facility it would be better to support
all kinds of column type changes, not only binary compatible ones. This
would help everybody, not only maintainers of ill-designed software.

Regards,
Andreas

#18Dave Cramer
pg@fastcrypt.com
In reply to: Andreas Pflug (#17)
Re: logical column position

Andreas,

On Wed, 2003-11-19 at 13:07, Andreas Pflug wrote:

Dave Cramer wrote:

Andreas,

The point of this is to maintain the column position. I don't think that
an alter of a column type should move the column position.

Why should ALTER COLUMN change the column number, i.e. position?

Rod's current proposed patch does that if you do an alter column alter
type. This is an artifact of the underlying mechanism. (ren old col, add
new col, update newcol=oldcol::newtype). Which is the point of the
logical column number discussion, and the todo item.

It may be that programmers should not rely on this, but it happens, and in very
large projects. If we can avoid unexpected side-affects like moving the
columns position, then I think we should.

This is *expected* if behaviour if you delete and add columns; is there
any DB system out there that allows to reshuffle the column ordering?

Yes, informix allows you to add the column before|after a column, and
mysql allows for add column after col. those are the only two I know
about.there could be more.

Show quoted text

Instead of some order-ordering facility it would be better to support
all kinds of column type changes, not only binary compatible ones. This
would help everybody, not only maintainers of ill-designed software.

Regards,
Andreas

#19Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Dave Cramer (#18)
Re: logical column position

Dave Cramer wrote:

Andreas,

On Wed, 2003-11-19 at 13:07, Andreas Pflug wrote:

Dave Cramer wrote:

Andreas,

The point of this is to maintain the column position. I don't think that
an alter of a column type should move the column position.

Why should ALTER COLUMN change the column number, i.e. position?

Rod's current proposed patch does that if you do an alter column alter
type. This is an artifact of the underlying mechanism. (ren old col, add
new col, update newcol=oldcol::newtype).

I must have missed that, can't find it in hackers?!?
In this case the old attnum value should simply be reused, to retain the
original ordering. IMHO this is necessary to prevent problems with any
object referencing a column (index, view, ...) The current proposal
invents the attpos for column ordering purpose only, but
views/indexes/etc will reference attnum, and would need updates.

Actually, a column that changes its attnum is just like a table changing
its oid, i.e. it's not the same object any more. This will provoke
problems in administration tools (at least in pgAdmin3, which will try
to refresh its display with the formerly known oid/attnum af ter
executing a change), and maybe other places too.

To put it differently: a ALTER COLUMN command may never-ever change the
identifier of the column, i.e. attrelid/attnum.

Regards,
Andreas

#20Hannu Krosing
hannu@tm.ee
In reply to: Andreas Pflug (#19)
Re: logical column position

Andreas Pflug kirjutas K, 19.11.2003 kell 20:45:

Dave Cramer wrote:

Why should ALTER COLUMN change the column number, i.e. position?

Rod's current proposed patch does that if you do an alter column alter
type. This is an artifact of the underlying mechanism. (ren old col, add
new col, update newcol=oldcol::newtype).

I must have missed that, can't find it in hackers?!?

Was on [PATCHES] IIRC.

In this case the old attnum value should simply be reused, to retain the
original ordering. IMHO this is necessary to prevent problems with any
object referencing a column (index, view, ...)

Actually these have to be recreaqted, especially when changing column
type.

Rod's patchs does that too ;)

The current proposal
invents the attpos for column ordering purpose only,

That's the only place _user_ sees it. The other uses are taken care of
inide database backend.

but
views/indexes/etc will reference attnum, and would need updates.

they also "reference" column type, and thus need to be updated anyway
when column type changes.

Actually, a column that changes its attnum is just like a table changing
its oid, i.e. it's not the same object any more. This will provoke
problems in administration tools (at least in pgAdmin3, which will try
to refresh its display with the formerly known oid/attnum af ter
executing a change), and maybe other places too.

Sure. _any_ change to database structure could break a client not
(designed to be) aware of that change.

To put it differently: a ALTER COLUMN command may never-ever change the
identifier of the column, i.e. attrelid/attnum.

to be even more restirictive: ALTER COLUMN may never-ever change the
type of the column, as this too may break some apps. Nah!

-------------
Hannu

#21Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Hannu Krosing (#20)
#22Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Andreas Pflug (#17)
#23Hannu Krosing
hannu@tm.ee
In reply to: Andreas Pflug (#21)
#24Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Hannu Krosing (#23)
#25Hannu Krosing
hannu@tm.ee
In reply to: Andreas Pflug (#24)
#26Dave Cramer
pg@fastcrypt.com
In reply to: Andreas Pflug (#24)
#27Manfred Koizar
mkoi-pg@aon.at
In reply to: Andreas Pflug (#17)
#28Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Hannu Krosing (#25)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#25)
#30Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#30)
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rod Taylor (#8)
#33Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Tom Lane (#32)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Pflug (#33)
#35Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Tom Lane (#34)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Pflug (#35)
#37Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Tom Lane (#36)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Pflug (#19)
#39Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Tom Lane (#38)
#40Hannu Krosing
hannu@tm.ee
In reply to: Andreas Pflug (#28)
#41Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Hannu Krosing (#40)