varchar() troubles
I've been seeing trouble with varchar() columns for a little while, and
since it is still there with a fresh install of the development tree
it's time to report it:
postgres=> create table t (v varchar(80),i int);
CREATE
postgres=> insert into t values ('hi',1);
INSERT 18122 1
postgres=> select * from t;
v |i
--+-
hi|0
(1 row)
As you can see, the varchar() column apparently trashes the subsequent
column. I ran across it trying to verify the tutorial examples for the
documentation. If the varchar() is the last column in the table, the
problem does not crop up, at least in the simplest case:
postgres=> create table t2 (i int, v varchar(80));
CREATE
postgres=> insert into t2 values (2,'hi');
INSERT 18133 1
postgres=> select * from t2;
i|v
-+--
2|hi
(1 row)
Also, I believe that if the varchar() field is substantially shorter the
problem does not manifest itself:
postgres=> create table t4 (v varchar(4), i int);
CREATE
postgres=> insert into t4 values ('hi',4);
INSERT 18156 1
postgres=> select * from t4;
v |i
--+-
hi|4
(1 row)
but varchar(10) still shows the problem:
postgres=> create table t3 (v varchar(10), i int);
CREATE
postgres=> insert into t3 values ('hi',3);
INSERT 18145 1
postgres=> select * from t3;
v |i
--+-
hi|0
(1 row)
This is from the development source as of around 1998-01-12 14:30 GMT
-
Tom
I've been seeing trouble with varchar() columns for a little while, and
since it is still there with a fresh install of the development tree
it's time to report it:postgres=> create table t (v varchar(80),i int);
CREATE
postgres=> insert into t values ('hi',1);
INSERT 18122 1
postgres=> select * from t;
v |i
--+-
hi|0
(1 row)
Did you see this before or only after the varchar() length change I
made?
--
Bruce Momjian
maillist@candle.pha.pa.us
Bruce Momjian wrote:
I've been seeing trouble with varchar() columns for a little while, and
since it is still there with a fresh install of the development tree
it's time to report it:postgres=> create table t (v varchar(80),i int);
CREATE
postgres=> insert into t values ('hi',1);
INSERT 18122 1
postgres=> select * from t;
v |i
--+-
hi|0
(1 row)Did you see this before or only after the varchar() length change I
made?
Pretty sure only after but it's hard to tell for sure. My trees for 971204
and 971222 both core dump on inserts to varchar, but I can't remember what
else I was doing with the trees at the time. v6.2.1p5 works OK on this:
postgres=> create table t (v varchar(80),i int);
CREATE
postgres=> insert into t values ('hi',1);
INSERT 142735 1
postgres=> select * from t;
v |i
--+-
hi|1
(1 row)
-
Tom
Pretty sure only after but it's hard to tell for sure. My trees for 971204
and 971222 both core dump on inserts to varchar, but I can't remember what
else I was doing with the trees at the time.
I now recall that the varchar code was broken during this time (for who knows
how long?), as I discovered when trying to reproduce the tutorial results for
the documentation.
The problem was that some things were copied using VARSIZE rather than
subtracting out VARHDRSZ first (actually, I think it might have use
sizeof(int) and other dangers too). I patched that near the end of the year
and my 980101.d tree and 980106.d tree do not exhibit the symptom:
postgres=> create table t (v varchar(80),i int);
CREATE
postgres=> insert into t values ('hi', 1);
INSERT 643562 1
postgres=> select * from t;
v |i
--+-
hi|1
(1 row)
Hope this helps :/
- Tom
Hi,
I can confirm that I see the same problem here.
Maybe todo with the recent varchar() storage changes.
Keith.
"Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Show quoted text
I've been seeing trouble with varchar() columns for a little while, and
since it is still there with a fresh install of the development tree
it's time to report it:postgres=> create table t (v varchar(80),i int);
CREATE
postgres=> insert into t values ('hi',1);
INSERT 18122 1
postgres=> select * from t;
v |i
--+-
hi|0
(1 row)As you can see, the varchar() column apparently trashes the subsequent
column. I ran across it trying to verify the tutorial examples for the
documentation. If the varchar() is the last column in the table, the
problem does not crop up, at least in the simplest case:postgres=> create table t2 (i int, v varchar(80));
CREATE
postgres=> insert into t2 values (2,'hi');
INSERT 18133 1
postgres=> select * from t2;
i|v
-+--
2|hi
(1 row)Also, I believe that if the varchar() field is substantially shorter the
problem does not manifest itself:postgres=> create table t4 (v varchar(4), i int);
CREATE
postgres=> insert into t4 values ('hi',4);
INSERT 18156 1
postgres=> select * from t4;
v |i
--+-
hi|4
(1 row)but varchar(10) still shows the problem:
postgres=> create table t3 (v varchar(10), i int);
CREATE
postgres=> insert into t3 values ('hi',3);
INSERT 18145 1
postgres=> select * from t3;
v |i
--+-
hi|0
(1 row)This is from the development source as of around 1998-01-12 14:30 GMT
-
Tom
Import Notes
Resolved by subject fallback
Pretty sure only after but it's hard to tell for sure. My trees for 971204
and 971222 both core dump on inserts to varchar, but I can't remember what
else I was doing with the trees at the time. v6.2.1p5 works OK on this:postgres=> create table t (v varchar(80),i int);
CREATE
postgres=> insert into t values ('hi',1);
INSERT 142735 1
postgres=> select * from t;
v |i
--+-
hi|1
(1 row)
I am working on it. Look at this:
test=> create table t15 (x varchar(7),x1 int, x2 int, x3 int)
test-> ;
CREATE
test=> insert into t15 values ('as',1,2,3);
INSERT 143436 1
test=> select * from t15;
x |x1|x2|x3
--+--+--+--
as| 2| 3| 0
(1 row)
Srange, huh?
--
Bruce Momjian
maillist@candle.pha.pa.us
Pretty sure only after but it's hard to tell for sure. My trees for 971204
and 971222 both core dump on inserts to varchar, but I can't remember what
else I was doing with the trees at the time. v6.2.1p5 works OK on this:postgres=> create table t (v varchar(80),i int);
CREATE
postgres=> insert into t values ('hi',1);
INSERT 142735 1
postgres=> select * from t;
v |i
--+-
hi|1
(1 row)
The data on disk is OK, so it must be the retrieve code.
--
Bruce Momjian
maillist@candle.pha.pa.us
Pretty sure only after but it's hard to tell for sure. My trees for 971204
and 971222 both core dump on inserts to varchar, but I can't remember what
else I was doing with the trees at the time.I now recall that the varchar code was broken during this time (for who knows
how long?), as I discovered when trying to reproduce the tutorial results for
the documentation.The problem was that some things were copied using VARSIZE rather than
subtracting out VARHDRSZ first (actually, I think it might have use
sizeof(int) and other dangers too). I patched that near the end of the year
and my 980101.d tree and 980106.d tree do not exhibit the symptom:postgres=> create table t (v varchar(80),i int);
CREATE
postgres=> insert into t values ('hi', 1);
INSERT 643562 1
postgres=> select * from t;
v |i
--+-
hi|1
(1 row)
I have found that ExecEvalVar() uses a descriptor that has the attr
length set to the maximum, instead of -1. The ExecTypeFromTL() comment
says:
/* ----------------------------------------------------------------
* ExecTypeFromTL
*
* Currently there are about 4 different places where we create
* TupleDescriptors. They should all be merged, or perhaps
* be rewritten to call BuildDesc().
*
Clearly stating that the tuple descriptors in the system are created in
several places. Some places have the length set wrong. I am going to
have to take a look at all those places, and make sure they have
consistent behaviour.
--
Bruce Momjian
maillist@candle.pha.pa.us
Forwarded message:
The problem was that some things were copied using VARSIZE rather than
subtracting out VARHDRSZ first (actually, I think it might have use
sizeof(int) and other dangers too). I patched that near the end of the year
and my 980101.d tree and 980106.d tree do not exhibit the symptom:postgres=> create table t (v varchar(80),i int);
CREATE
postgres=> insert into t values ('hi', 1);
INSERT 643562 1
postgres=> select * from t;
v |i
--+-
hi|1
(1 row)I have found that ExecEvalVar() uses a descriptor that has the attr
length set to the maximum, instead of -1. The ExecTypeFromTL() comment
says:/* ----------------------------------------------------------------
* ExecTypeFromTL
*
* Currently there are about 4 different places where we create
* TupleDescriptors. They should all be merged, or perhaps
* be rewritten to call BuildDesc().
*Clearly stating that the tuple descriptors in the system are created in
several places. Some places have the length set wrong. I am going to
have to take a look at all those places, and make sure they have
consistent behaviour.
Vadim, can you look at this for me. If you set a break at ExecEvalVar
before executing the SELECT, you will see its
tupledescriptor->attrs[0].attlen is the max length, and not -1 as it
should be.
I can't figure out where that is getting set. Can you also check the
other tupledescriptor initializations to see they have the -1 for
varchar too. I am stumped.
--
Bruce Momjian
maillist@candle.pha.pa.us
Import Notes
Resolved by subject fallback
Bruce Momjian wrote:
I have found that ExecEvalVar() uses a descriptor that has the attr
length set to the maximum, instead of -1. The ExecTypeFromTL() comment
...
Vadim, can you look at this for me. If you set a break at ExecEvalVar
before executing the SELECT, you will see its
tupledescriptor->attrs[0].attlen is the max length, and not -1 as it
should be.I can't figure out where that is getting set. Can you also check the
other tupledescriptor initializations to see they have the -1 for
varchar too. I am stumped.
Why attlen should be -1 ?
attlen in pg_attribute for v in table t is 84, why run-time attlen
should be -1 ? How else maxlen constraint could be checked ?
IMHO, you have to change heap_getattr() to check is atttype == VARCHAROID
and use vl_len if yes. Also, other places where attlen is used must be
changed too - e.g. ExecEvalVar():
{
len = tuple_type->attrs[attnum - 1]->attlen;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
byval = tuple_type->attrs[attnum - 1]->attbyval ? true : false;
}
execConstByVal = byval;
execConstLen = len;
^^^^^^^^^^^^^^^^^^ - used in nodeHash.c
Vadim
Bruce Momjian wrote:
I have found that ExecEvalVar() uses a descriptor that has the attr
length set to the maximum, instead of -1. The ExecTypeFromTL() comment...
Vadim, can you look at this for me. If you set a break at ExecEvalVar
before executing the SELECT, you will see its
tupledescriptor->attrs[0].attlen is the max length, and not -1 as it
should be.I can't figure out where that is getting set. Can you also check the
other tupledescriptor initializations to see they have the -1 for
varchar too. I am stumped.Why attlen should be -1 ?
attlen in pg_attribute for v in table t is 84, why run-time attlen
should be -1 ? How else maxlen constraint could be checked ?
IMHO, you have to change heap_getattr() to check is atttype == VARCHAROID
and use vl_len if yes. Also, other places where attlen is used must be
changed too - e.g. ExecEvalVar():{
len = tuple_type->attrs[attnum - 1]->attlen;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
byval = tuple_type->attrs[attnum - 1]->attbyval ? true : false;
}execConstByVal = byval;
execConstLen = len;
^^^^^^^^^^^^^^^^^^ - used in nodeHash.c
The major problem is that TupleDesc comes from several places, and
attlen means several things.
There are some cases where TupleDesc (int numatt, Attrs[]) is created
on-the-fly (tupdesc.c), and the attlen is the length of the type. In
other cases, we get attlen from opening the relation, heap_open(), and
in these cases it is the length as defined for the particular attribute.
Certainly a bad situation. I am not sure about a fix.
--
Bruce Momjian
maillist@candle.pha.pa.us
Forwarded message:
Why attlen should be -1 ?
attlen in pg_attribute for v in table t is 84, why run-time attlen
should be -1 ? How else maxlen constraint could be checked ?
IMHO, you have to change heap_getattr() to check is atttype == VARCHAROID
and use vl_len if yes. Also, other places where attlen is used must be
changed too - e.g. ExecEvalVar():{
len = tuple_type->attrs[attnum - 1]->attlen;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
byval = tuple_type->attrs[attnum - 1]->attbyval ? true : false;
}execConstByVal = byval;
execConstLen = len;
^^^^^^^^^^^^^^^^^^ - used in nodeHash.cThe major problem is that TupleDesc comes from several places, and
attlen means several things.There are some cases where TupleDesc (int numatt, Attrs[]) is created
on-the-fly (tupdesc.c), and the attlen is the length of the type. In
other cases, we get attlen from opening the relation, heap_open(), and
in these cases it is the length as defined for the particular attribute.Certainly a bad situation. I am not sure about a fix.
OK, here is a temporary fix to the problem. It does the heap_open(),
then replaces the attrs for VARCHAR with attlen of -1. You can't just
change the field, because the data is in a cache and you are just
returned a pointer.
Can I add an 'attdeflen' for "attributed defined length" field to
pg_attribute, and change the attlen references needed to the new field?
This is the only proper way to fix it.
---------------------------------------------------------------------------
*** ./backend/executor/execAmi.c.orig Thu Jan 15 22:42:13 1998
--- ./backend/executor/execAmi.c Thu Jan 15 23:54:37 1998
***************
*** 42,47 ****
--- 42,48 ----
#include "access/genam.h"
#include "access/heapam.h"
#include "catalog/heap.h"
+ #include "catalog/pg_type.h"
static Pointer
ExecBeginScan(Relation relation, int nkeys, ScanKey skeys,
***************
*** 124,129 ****
--- 125,155 ----
if (relation == NULL)
elog(DEBUG, "ExecOpenR: relation == NULL, heap_open failed.");
+ {
+ int i;
+ Relation trel = palloc(sizeof(RelationData));
+ TupleDesc tdesc = palloc(sizeof(struct tupleDesc));
+ AttributeTupleForm *tatt =
+ palloc(sizeof(AttributeTupleForm*)*relation->rd_att->natts);
+
+ memcpy(trel, relation, sizeof(RelationData));
+ memcpy(tdesc, relation->rd_att, sizeof(struct tupleDesc));
+ trel->rd_att = tdesc;
+ tdesc->attrs = tatt;
+
+ for (i = 0; i < relation->rd_att->natts; i++)
+ {
+ if (relation->rd_att->attrs[i]->atttypid != VARCHAROID)
+ tdesc->attrs[i] = relation->rd_att->attrs[i];
+ else
+ {
+ tdesc->attrs[i] = palloc(sizeof(FormData_pg_attribute));
+ memcpy(tdesc->attrs[i], relation->rd_att->attrs[i],
+ sizeof(FormData_pg_attribute));
+ tdesc->attrs[i]->attlen = -1;
+ }
+ }
+ }
return relation;
}
--
Bruce Momjian
maillist@candle.pha.pa.us
Import Notes
Resolved by subject fallback
OK, here is a temporary fix to the problem. It does the heap_open(),
then replaces the attrs for VARCHAR with attlen of -1. You can't just
change the field, because the data is in a cache and you are just
returned a pointer.Can I add an 'attdeflen' for "attributed defined length" field to
pg_attribute, and change the attlen references needed to the new field?
This is the only proper way to fix it.
Bruce, does your "temporary fix" seem to repair all known problems with varchar()? If so, would you be interested in
holding off on a "proper fix" and coming back to it after v6.3 is released? At that time, we can try solving the general
problem of retaining column-specific attributes, such as your max len for varchar, declared dimensions for arrays, and
numeric() and decimal() types. Or, if you have time to try a solution now _and_ come back to it later...
- Tom
OK, here is a temporary fix to the problem. It does the heap_open(),
then replaces the attrs for VARCHAR with attlen of -1. You can't just
change the field, because the data is in a cache and you are just
returned a pointer.Can I add an 'attdeflen' for "attributed defined length" field to
pg_attribute, and change the attlen references needed to the new field?
This is the only proper way to fix it.
Bruce, does your "temporary fix" seem to repair all known problems
with varchar()? If so, would you be interested in > holding off on a
"proper fix" and coming back to it after v6.3 is released? At that time,
we can try solving the general > problem of retaining column-specific
attributes, such as your max len for varchar, declared dimensions for
arrays, and > numeric() and decimal() types. Or, if you have time to try
a solution now _and_ come back to it later... > >
[Those wide post really are difficult.]
I don't think my solution is perfect or complete. I only caught one
group of heap_open calls used in the executor. I could funnel all of
them through this patched function, but I can imagine there would be
ones I would miss. Once the structure is gotten from the cache, it
seems to fly around the executor code quite freely, and it is hard to
know when a tuple descriptor is being created, if it is being used for
data creation or data reference. attlen references are much clearer in
their intent.
If I add a new field type to FormData_pg_attribute, I can then check
each attlen reference, and check if it is trying to move through the
on-disk storage (attlen/typlen) or create a new/modify an entry
(attdeflen).
How much time I have depends on what Vadim needs me to do for
subselects.
--
Bruce Momjian
maillist@candle.pha.pa.us
OK, here is a temporary fix to the problem. It does the heap_open(),
then replaces the attrs for VARCHAR with attlen of -1. You can't just
change the field, because the data is in a cache and you are just
returned a pointer.Can I add an 'attdeflen' for "attributed defined length" field to
pg_attribute, and change the attlen references needed to the new field?
This is the only proper way to fix it.Bruce, does your "temporary fix" seem to repair all known problems with varchar()? If so, would you be interested in
holding off on a "proper fix" and coming back to it after v6.3 is released? At that time, we can try solving the general
problem of retaining column-specific attributes, such as your max len for varchar, declared dimensions for arrays, and
numeric() and decimal() types. Or, if you have time to try a solution now _and_ come back to it later...- Tom
In fact, I am inclined to leave attlen unchanged, and add atttyplen that
is a copy of the length of the type. That way, the attlen for varchar()
really contains the defined length, and atttyplen is used for disk
references, and it is very clear what it means.
--
Bruce Momjian
maillist@candle.pha.pa.us
Bruce Momjian wrote:
Can I add an 'attdeflen' for "attributed defined length" field to
pg_attribute, and change the attlen references needed to the new field?
This is the only proper way to fix it.Bruce, does your "temporary fix" seem to repair all known problems with varchar()? If so, would you be interested in
holding off on a "proper fix" and coming back to it after v6.3 is released? At that time, we can try solving the general
problem of retaining column-specific attributes, such as your max len for varchar, declared dimensions for arrays, and
numeric() and decimal() types. Or, if you have time to try a solution now _and_ come back to it later...In fact, I am inclined to leave attlen unchanged, and add atttyplen that
is a copy of the length of the type. That way, the attlen for varchar()
really contains the defined length, and atttyplen is used for disk
references, and it is very clear what it means.
pg_attribute.h:
int2 attlen;
/*
* attlen is a copy of the typlen field from pg_type for this
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* attribute...
^^^^^^^^^
: I would suggest to don't change this to preserve the same meaning
for all data types. attlen = -1 means that attribute is varlena.
We certainly need in new field in pg_attribute! I don't like names like
"attdeflen" or "atttyplen" - bad names for NUMERIC etc. Something like
atttspec (ATTribute Type SPECification) is better, imho.
For the varchar(N) we'll have attlen = -1 and atttspec = N (or N + 4 - what's
better). For the text: attlen = -1 and atttspec = -1. And so on.
Of 'course, it's not so much matter where to put maxlen of varchar.
attlen = -1 for varchar just seems more clear to me.
But in any case we need in new field and, imho, this should be added
in 6.3
Vadim
Bruce Momjian wrote:
Can I add an 'attdeflen' for "attributed defined length" field to
pg_attribute, and change the attlen references needed to the new field?
This is the only proper way to fix it.Bruce, does your "temporary fix" seem to repair all known problems with varchar()? If so, would you be interested in
holding off on a "proper fix" and coming back to it after v6.3 is released? At that time, we can try solving the general
problem of retaining column-specific attributes, such as your max len for varchar, declared dimensions for arrays, and
numeric() and decimal() types. Or, if you have time to try a solution now _and_ come back to it later...In fact, I am inclined to leave attlen unchanged, and add atttyplen that
is a copy of the length of the type. That way, the attlen for varchar()
really contains the defined length, and atttyplen is used for disk
references, and it is very clear what it means.pg_attribute.h:
int2 attlen;
/*
* attlen is a copy of the typlen field from pg_type for this
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* attribute...
^^^^^^^^^
: I would suggest to don't change this to preserve the same meaning
for all data types. attlen = -1 means that attribute is varlena.We certainly need in new field in pg_attribute! I don't like names like
"attdeflen" or "atttyplen" - bad names for NUMERIC etc. Something like
atttspec (ATTribute Type SPECification) is better, imho.For the varchar(N) we'll have attlen = -1 and atttspec = N (or N + 4 - what's
better). For the text: attlen = -1 and atttspec = -1. And so on.Of 'course, it's not so much matter where to put maxlen of varchar.
attlen = -1 for varchar just seems more clear to me.But in any case we need in new field and, imho, this should be added
in 6.3
OK, we have a new pg_attribute column called 'atttypmod' for
'type-specific modifier'. Currently, it is only used to hold the char
and varchar length, but I am sure will be used soon for other types.
Here is the test:
test=> insert into test values ('asdfasdfasdfasdfasdfadsfasdf11',3);
INSERT 18282 1
test=> select * from test;
x |y
--------------------+-
asdfasdfasdfasdfasdf|3
(1 row)
'attlen' was certainly a confusing double-used field that I am glad to
return to single-use status.
I will be installing the patch soon, and will then start on subselects
in the parser. It will probably take me until Monday to finish that.
--
Bruce Momjian
maillist@candle.pha.pa.us
Bruce Momjian wrote:
OK, we have a new pg_attribute column called 'atttypmod' for
'type-specific modifier'. Currently, it is only used to hold the char
and varchar length, but I am sure will be used soon for other types.
Nice!
Here is the test:
test=> insert into test values ('asdfasdfasdfasdfasdfadsfasdf11',3);
INSERT 18282 1
test=> select * from test;
x |y
--------------------+-
asdfasdfasdfasdfasdf|3
(1 row)'attlen' was certainly a confusing double-used field that I am glad to
return to single-use status.I will be installing the patch soon, and will then start on subselects
in the parser. It will probably take me until Monday to finish that.
Ok.
Vadim
Bruce,
The new varchar() stuff looks good, just a minor problem with "select into"
where the new table does not seem to get a copy of the atttypmod value
from the source table.I had a quick look at the code but guess you'll find the problem 10 times
faster than I could.
OK, I have fixed this. The real way to fix this it to add restypmod to
Resdom, and pass the value all the way through the engine, so tupDesc
always has the proper atttypmod value, but it is not clear how to do
this in the parser, so I put the code back in to just do a lookup in
execMain/execUtils when doing an SELECT * INTO TABLE.
If we start using atttypmod more, we will have to do this.
--
Bruce Momjian
maillist@candle.pha.pa.us
Import Notes
Reply to msg id not found: 199801171952.TAA15767@mtcc.demon.co.uk | Resolved by subject fallback
Thanks Bruce,
It was obviously not quite as simple a problem as I had 1st imagined.
I did have a root around in the code but could not work out how the
attributes were copied to the newly created table.
Thanks for the fix,
Keith.
Bruce Momjian <maillist@candle.pha.pa.us>
Show quoted text
emkxp01@mtcc.demon.co.uk
Bruce,
The new varchar() stuff looks good, just a minor problem with "select into"
where the new table does not seem to get a copy of the atttypmod value
from the source table.I had a quick look at the code but guess you'll find the problem 10 times
faster than I could.OK, I have fixed this. The real way to fix this it to add restypmod to
Resdom, and pass the value all the way through the engine, so tupDesc
always has the proper atttypmod value, but it is not clear how to do
this in the parser, so I put the code back in to just do a lookup in
execMain/execUtils when doing an SELECT * INTO TABLE.If we start using atttypmod more, we will have to do this.
Import Notes
Resolved by subject fallback