DROP COLUMN Progress

Started by Christopher Kings-Lynneover 23 years ago15 messages
#1Christopher Kings-Lynne
chriskl@familyhealth.com.au
1 attachment(s)

OK,

This is the problem I'm having with the DROP COLUMN implementation. Since
I've already incorporated all of Hiroshi's changes, I think this may have
been an issue in his trial implementation as well.

I have attached my current patch, which works fine and compiles properly.

Ok, if you drop a column 'b', then all these work properly:

select * from tab;
select tab.* from tab;
select b from tab;
update tab set b = 3;
select * from tab where b = 3;
insert into tab (b) values (3);

That's all good. However, the issue is that one of the things that happens
when you drop a column is that the column is renamed to 'dropped_%attnum%'.
So, say the 'b' column is renamed to 'dropped_2', then you can do this:

select dropped_2 from tab;
select tab.dropped_2 from tab;
update tab set dropped_2 = 3;
select * from tab where dropped_2 = 3;

Where have I missed the COLUMN_IS_DROPPED checks???

Another thing: I don't want to name dropped columns 'dropped_...' as I
think that's unfair on our non-English speaking users. Should I just use
'xxxx' or something?

Thanks for any help,

Chris

Attachments:

dropcolumn.txt.gzapplication/x-gzip; name=dropcolumn.txt.gzDownload
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Kings-Lynne (#1)
Re: DROP COLUMN Progress

"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:

So, say the 'b' column is renamed to 'dropped_2', then you can do this:

select dropped_2 from tab;
select tab.dropped_2 from tab;
update tab set dropped_2 = 3;
select * from tab where dropped_2 = 3;

Where have I missed the COLUMN_IS_DROPPED checks???

Sounds like you aren't checking in the part of the parser that resolves
simple variable references.

Another thing: I don't want to name dropped columns 'dropped_...' as I
think that's unfair on our non-English speaking users. Should I just use
'xxxx' or something?

Don't be silly --- the system catalogs are completely English-centric
already. Do you want to change all the catalog and column names to
meaningless strings? Since the dropped columns should be invisible to
anyone who's not poking at the catalogs, I don't see that we are adding
any cognitive load ...

regards, tom lane

#3Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Christopher Kings-Lynne (#1)
Re: DROP COLUMN Progress

Christopher Kings-Lynne wrote:

OK,

This is the problem I'm having with the DROP COLUMN implementation. Since
I've already incorporated all of Hiroshi's changes, I think this may have
been an issue in his trial implementation as well.

I have attached my current patch, which works fine and compiles properly.

Ok, if you drop a column 'b', then all these work properly:

select * from tab;
select tab.* from tab;
select b from tab;
update tab set b = 3;
select * from tab where b = 3;
insert into tab (b) values (3);

That's all good. However, the issue is that one of the things that happens
when you drop a column is that the column is renamed to 'dropped_%attnum%'.
So, say the 'b' column is renamed to 'dropped_2', then you can do this:

select dropped_2 from tab;
select tab.dropped_2 from tab;
update tab set dropped_2 = 3;
select * from tab where dropped_2 = 3;

Where have I missed the COLUMN_IS_DROPPED checks???

OK, my guess is that it is checks in parser/. I would issue each of
these queries with a non-existant column name, find the error message in
the code, and add an isdropped check in those places.

Another thing: I don't want to name dropped columns 'dropped_...' as I
think that's unfair on our non-English speaking users. Should I just use
'xxxx' or something?

I think "dropped" is OK. The SQL is still English, e.g. DROP.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#4Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Bruce Momjian (#3)
Re: DROP COLUMN Progress

OK, my guess is that it is checks in parser/. I would issue each of
these queries with a non-existant column name, find the error message in
the code, and add an isdropped check in those places.

OK, I think I've narrowed it down to this block of code in scanRTEForColumn
in parse_relation.c:

/*
* Scan the user column names (or aliases) for a match. Complain if
* multiple matches.
*/
foreach(c, rte->eref->colnames)
{
/* @@ SKIP DROPPED HERE? @@ */
attnum++;
if (strcmp(strVal(lfirst(c)), colname) == 0)
{
if (result)
elog(ERROR, "Column reference \"%s\" is
ambiguous", colname);
result = (Node *) make_var(pstate, rte, attnum);
rte->checkForRead = true;
}
}

I'm thinking that I should put a 'SearchSysCacheCopy' where my @@ comment is
to retrieve the attribute by name, and then do a check to see if it's
dropped. Is that the best/fastest way of doing things? Seems unfortunate
to add a another cache lookup in every parsed query...

Comments?

Chris

#5Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Christopher Kings-Lynne (#4)
Re: DROP COLUMN Progress

Christopher Kings-Lynne wrote:

/*
* Scan the user column names (or aliases) for a match. Complain if
* multiple matches.
*/
foreach(c, rte->eref->colnames)
{
/* @@ SKIP DROPPED HERE? @@ */
attnum++;
if (strcmp(strVal(lfirst(c)), colname) == 0)
{
if (result)
elog(ERROR, "Column reference \"%s\" is
ambiguous", colname);
result = (Node *) make_var(pstate, rte, attnum);
rte->checkForRead = true;
}
}

I'm thinking that I should put a 'SearchSysCacheCopy' where my @@ comment is
to retrieve the attribute by name, and then do a check to see if it's
dropped. Is that the best/fastest way of doing things? Seems unfortunate
to add a another cache lookup in every parsed query...

I am still looking but perhaps you could supress dropped columns from
getting into eref in the first place.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#6Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Bruce Momjian (#5)
Re: DROP COLUMN Progress

I'm thinking that I should put a 'SearchSysCacheCopy' where my

@@ comment is

to retrieve the attribute by name, and then do a check to see if it's
dropped. Is that the best/fastest way of doing things? Seems

unfortunate

to add a another cache lookup in every parsed query...

I am still looking but perhaps you could supress dropped columns from
getting into eref in the first place.

OK, that's done. I'm working on not allowing dropped columns in UPDATE
targets now.

Chris

#7Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Christopher Kings-Lynne (#6)
Re: DROP COLUMN Progress

I am still looking but perhaps you could supress dropped columns from
getting into eref in the first place.

OK, that's done. I'm working on not allowing dropped columns in UPDATE
targets now.

OK, I've fixed it so that dropped columns cannot be targetted in an update
statement, however now I'm running into this problem:

If you issue an INSERT statement without qualifying any field names, ie:

INSERT INTO tab VALUES (3);

Although it will automatically insert NULL for the dropped columns, it still
does cache lookups for the type of the dropped columns, etc. I noticed that
when I tried setting the atttypid of the dropped column to (Oid)-1. Where
is the bit of code that figures out the list of columns to insert into in an
unqualified INSERT statement?

I'm thinking that it would be nice if the dropped columns never even make it
into the list of target attributes, for performance reasons...

Chris

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#5)
Re: DROP COLUMN Progress

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Christopher Kings-Lynne wrote:

I'm thinking that I should put a 'SearchSysCacheCopy' where my @@ comment is
to retrieve the attribute by name, and then do a check to see if it's
dropped. Is that the best/fastest way of doing things? Seems unfortunate
to add a another cache lookup in every parsed query...

I am still looking but perhaps you could supress dropped columns from
getting into eref in the first place.

That was my first thought also, but then the wrong attnum would be used
in the "make_var". Ugh. I think what Chris needs to do is extend the
eref data structure so that there can be placeholders for dropped
attributes. Perhaps NULLs could be included in the list, and then the
code would become like

attnum++;
if (lfirst(c) && strcmp(strVal(lfirst(c)), colname) == 0)
...

This would require changes in quite a number of places :-( but I'm not
sure we have much choice. The eref structure really needs to line up
with attnums.

Another possibility is to enter the dropped attnames in the eref list
normally, and do the drop test *after* hitting a match not before.
This is still slow, but not as horrendously O(N^2) slow as Chris's
original pseudocode. I'm not sure how much work it'd really save though;
you might find yourself hitting all the same places to add tests.

regards, tom lane

#9Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Christopher Kings-Lynne (#7)
Re: DROP COLUMN Progress

Christopher Kings-Lynne wrote:

I am still looking but perhaps you could supress dropped columns from
getting into eref in the first place.

OK, that's done. I'm working on not allowing dropped columns in UPDATE
targets now.

OK, I've fixed it so that dropped columns cannot be targetted in an update
statement, however now I'm running into this problem:

If you issue an INSERT statement without qualifying any field names, ie:

INSERT INTO tab VALUES (3);

Although it will automatically insert NULL for the dropped columns, it still
does cache lookups for the type of the dropped columns, etc. I noticed that
when I tried setting the atttypid of the dropped column to (Oid)-1. Where
is the bit of code that figures out the list of columns to insert into in an
unqualified INSERT statement?

parse_target.c::checkInsertTargets()

I'm thinking that it would be nice if the dropped columns never even make it
into the list of target attributes, for performance reasons...

Yea, just sloppy to have them there.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#10Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#8)
Re: DROP COLUMN Progress

That was my first thought also, but then the wrong attnum would be used
in the "make_var". Ugh. I think what Chris needs to do is extend the
eref data structure so that there can be placeholders for dropped
attributes. Perhaps NULLs could be included in the list, and then the
code would become like

Hmmm... I don't get it - at the moment I'm preventing them from even
getting into the eref and all regression tests pass and every test I try
works as well...

Chris

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Kings-Lynne (#10)
Re: DROP COLUMN Progress

"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:

That was my first thought also, but then the wrong attnum would be used
in the "make_var". Ugh. I think what Chris needs to do is extend the
eref data structure so that there can be placeholders for dropped
attributes. Perhaps NULLs could be included in the list, and then the
code would become like

Hmmm... I don't get it - at the moment I'm preventing them from even
getting into the eref and all regression tests pass and every test I try
works as well...

Are you checking access to columns that're to the right of the one
dropped?

regards, tom lane

#12Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#11)
Re: DROP COLUMN Progress

"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:

That was my first thought also, but then the wrong attnum would be used
in the "make_var". Ugh. I think what Chris needs to do is extend the
eref data structure so that there can be placeholders for dropped
attributes. Perhaps NULLs could be included in the list, and then the
code would become like

Hmmm... I don't get it - at the moment I'm preventing them from even
getting into the eref and all regression tests pass and every test I try
works as well...

Are you checking access to columns that're to the right of the one
dropped?

OK, interesting:

test=# create table test (a int4, b int4, c int4, d int4);
CREATE TABLE
test=# insert into test values (1,2,3,4);
INSERT 16588 1
test=# alter table test drop b;
ALTER TABLE
test=# select * from test;
a | d | d
---+---+---
1 | 3 | 4
(1 row)

It half works, half doesn't. Sigh - how come these things always turn out
harder than I think!?

pg_attribute:

test=# select attrelid,attname,attisdropped from pg_attribute where
attrelid=16586 and attnum > 0;
attrelid | attname | attisdropped
----------+-----------+--------------
16586 | a | f
16586 | dropped_2 | t
16586 | c | f
16586 | d | f
(4 rows)

Chris

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Kings-Lynne (#12)
Re: DROP COLUMN Progress

"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:

Are you checking access to columns that're to the right of the one
dropped?

OK, interesting:

test=# create table test (a int4, b int4, c int4, d int4);
CREATE TABLE
test=# insert into test values (1,2,3,4);
INSERT 16588 1
test=# alter table test drop b;
ALTER TABLE
test=# select * from test;
a | d | d
---+---+---
1 | 3 | 4
(1 row)

What of

SELECT a,c,d FROM test

I'll bet that doesn't work at all...

regards, tom lane

#14Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#13)
Re: DROP COLUMN Progress

test=# create table test (a int4, b int4, c int4, d int4);
CREATE TABLE
test=# insert into test values (1,2,3,4);
INSERT 16588 1
test=# alter table test drop b;
ALTER TABLE
test=# select * from test;
a | d | d
---+---+---
1 | 3 | 4
(1 row)

What of

SELECT a,c,d FROM test

I'll bet that doesn't work at all...

Yeah, broken. Damn.

test=# SELECT a,c,d FROM test;
a | c | d
---+---+---
1 | 2 | 3
(1 row)

test=# SELECT a,d FROM test;
a | d
---+---
1 | 3
(1 row)

test=# SELECT d,c,a FROM test;
d | c | a
---+---+---
3 | 2 | 1
(1 row)

Chris

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Kings-Lynne (#14)
Re: DROP COLUMN Progress

"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:

What of
SELECT a,c,d FROM test
I'll bet that doesn't work at all...

Yeah, broken. Damn.

Yup. That loop we were just looking at needs to derive the correct
attnum when it matches a column name. If you remove deleted columns
from the eref list altogether, you get the wrong answer.

regards, tom lane