Re: [SQL] Column name's length

Started by Tom Lanealmost 27 years ago22 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

Zalman Stern <zalman@netcom.com> writes:

Here are the two diffs that up the "name size" from 32 characters to 256
characters. (Once I get bit, I try to fix things real good so I don't get
bit again :-))
-----
diff postgresql-6.4.2/src/include/postgres_ext.h postgres-build/src/include/postgres_ext.h
34c34
< #define NAMEDATALEN 32
---

#define NAMEDATALEN 256

37c37
< #define OIDNAMELEN 36
---

#define OIDNAMELEN 260

-----
diff postgresql-6.4.2/src/include/storage/buf_internals.h postgres-build/src/include/storage/buf_internals.h
87c87
< #define PADDED_SBUFDESC_SIZE 128
---

#define PADDED_SBUFDESC_SIZE 1024

-----

It'd probably be worthwhile to move NAMEDATALEN to config.h and make the
other two symbols be computed off NAMEDATALEN. Any objections if I
sneak that change into 6.5, or is it too close to being a "new feature"?

regards, tom lane

#2Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#1)
Re: [HACKERS] Re: [SQL] Column name's length

Zalman Stern <zalman@netcom.com> writes:

Here are the two diffs that up the "name size" from 32 characters to 256
characters. (Once I get bit, I try to fix things real good so I don't get
bit again :-))
-----
diff postgresql-6.4.2/src/include/postgres_ext.h postgres-build/src/include/postgres_ext.h
34c34
< #define NAMEDATALEN 32
---

#define NAMEDATALEN 256

37c37
< #define OIDNAMELEN 36
---

#define OIDNAMELEN 260

-----
diff postgresql-6.4.2/src/include/storage/buf_internals.h postgres-build/src/include/storage/buf_internals.h
87c87
< #define PADDED_SBUFDESC_SIZE 128
---

#define PADDED_SBUFDESC_SIZE 1024

-----

It'd probably be worthwhile to move NAMEDATALEN to config.h and make the
other two symbols be computed off NAMEDATALEN. Any objections if I
sneak that change into 6.5, or is it too close to being a "new feature"?

Don't they have to be visible to outside apps, so it is in postgres_ext.h?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#2)
Re: [HACKERS] Re: [SQL] Column name's length

Bruce Momjian <maillist@candle.pha.pa.us> writes:

It'd probably be worthwhile to move NAMEDATALEN to config.h and make the
other two symbols be computed off NAMEDATALEN. Any objections if I
sneak that change into 6.5, or is it too close to being a "new feature"?

Don't they have to be visible to outside apps, so it is in postgres_ext.h?

Good point --- I was thinking that postgres_ext.h includes config.h,
but I see it ain't so. You're right, those definitions must stay where
they are.

Still, I wonder why OIDNAMELEN isn't just defined as
(NAMEDATALEN+sizeof(Oid)) rather than putting a comment to that effect.
I will check the uses and see if that is a safe change or not.

regards, tom lane

#4Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#3)
Re: [HACKERS] Re: [SQL] Column name's length

Bruce Momjian <maillist@candle.pha.pa.us> writes:

It'd probably be worthwhile to move NAMEDATALEN to config.h and make the
other two symbols be computed off NAMEDATALEN. Any objections if I
sneak that change into 6.5, or is it too close to being a "new feature"?

Don't they have to be visible to outside apps, so it is in postgres_ext.h?

Good point --- I was thinking that postgres_ext.h includes config.h,
but I see it ain't so. You're right, those definitions must stay where
they are.

Still, I wonder why OIDNAMELEN isn't just defined as
(NAMEDATALEN+sizeof(Oid)) rather than putting a comment to that effect.
I will check the uses and see if that is a safe change or not.

Yes, probably should be changed. The old code did some fancy sed with
it, so maybe it had to be a real number back then, or perhaps initdb
pulls it from the file. Not sure.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#5Peter T Mount
peter@retep.org.uk
In reply to: Bruce Momjian (#2)
Re: [HACKERS] Re: [SQL] Column name's length

On Tue, 1 Jun 1999, Bruce Momjian wrote:

Zalman Stern <zalman@netcom.com> writes:

Here are the two diffs that up the "name size" from 32 characters to 256
characters. (Once I get bit, I try to fix things real good so I don't get
bit again :-))
-----
diff postgresql-6.4.2/src/include/postgres_ext.h postgres-build/src/include/postgres_ext.h
34c34
< #define NAMEDATALEN 32
---

#define NAMEDATALEN 256

37c37
< #define OIDNAMELEN 36
---

#define OIDNAMELEN 260

-----
diff postgresql-6.4.2/src/include/storage/buf_internals.h postgres-build/src/include/storage/buf_internals.h
87c87
< #define PADDED_SBUFDESC_SIZE 128
---

#define PADDED_SBUFDESC_SIZE 1024

-----

It'd probably be worthwhile to move NAMEDATALEN to config.h and make the
other two symbols be computed off NAMEDATALEN. Any objections if I
sneak that change into 6.5, or is it too close to being a "new feature"?

Don't they have to be visible to outside apps, so it is in postgres_ext.h?

We would need to have some way of getting at it from the client - the
DatabaseMetaData method getColumnNameLength() would need to know about
this, and we can't refer to the C header files.

Peter

--
Peter T Mount peter@retep.org.uk
Main Homepage: http://www.retep.org.uk
PostgreSQL JDBC Faq: http://www.retep.org.uk/postgres
Java PDF Generator: http://www.retep.org.uk/pdf

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter T Mount (#5)
Re: [HACKERS] Re: [SQL] Column name's length

We've seen this table-name-plus-column-name-too-long problem before,
and I'm sure we're going to keep hearing about it until we fix it
somehow. Messing with NAMEDATALEN is probably not a very useful
answer for the average user, given the compatibility problems it
creates.

How about something like this: if the code finds that the names are
too long when forming an implicit index name, it truncates the names
to fit, and you are OK as long as the truncated name is unique.
For example

create table averylongtablename (averylongfieldname serial);

would truncate the input names to produce something like

averylongtable_averylongfie_key
averylongtable_averylongfie_seq

and you'd only get a failure if those indexes/sequences already existed.
(Truncating both names as shown above, not just the field name,
should reduce the probability of collisions.)

You could even imagine trying a few different possibilities in order
to find an unused name, but that worries me. I'd rather that it were
completely predictable what name would be used for a given key, and if
it depends on what already exists then it wouldn't be so predictable.
But there's nothing unpredictable about truncation to fit a known
length.

This is obviously not a 100% solution, since there's a risk of name
collisions (averylongfieldname1 and averylongfieldname2) but it's
probably a 95% solution, and it wouldn't take much work or risk.

Comments? Objections? I think I could argue that this is a bug fix
and deserves to be slipped into 6.5 ;-)

regards, tom lane

#7Zalman Stern
zalman@netcom.com
In reply to: Tom Lane (#6)
Re: [HACKERS] Re: [SQL] Column name's length

How about something like this: if the code finds that the names are
too long when forming an implicit index name, it truncates the names
to fit, and you are OK as long as the truncated name is unique.
For example

create table averylongtablename (averylongfieldname serial);

would truncate the input names to produce something like

averylongtable_averylongfie_key
averylongtable_averylongfie_seq

and you'd only get a failure if those indexes/sequences already existed.
(Truncating both names as shown above, not just the field name,
should reduce the probability of collisions.)

This only partially solves the problem and can introduce bugs into code
which is only reading from a database. When someone is setting up the
database to work on the system, they'll in theory get a failure so they
know it won't work. This really isn't true for our software though because
we have functions which dynamically query a table to see what columns it
has. In theory two queries for different longnames can resolve to the same
column name.

It is also a backwards compatibility hassle if you ever want to increase
the number of significant characters in the name. This is because the
existing database only knows the first 32 characters and *must* ignore
anything after that in lookups. You would have to keep track of which names
are "old style" and which are new. Why set yourself up like that?

-Z-

#8Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Tom Lane (#6)
Re: [HACKERS] Re: [SQL] Column name's length

How about something like this: if the code finds that the names are
too long when forming an implicit index name, it truncates the names
to fit, and you are OK as long as the truncated name is unique.
Comments? Objections? I think I could argue that this is a bug fix
and deserves to be slipped into 6.5 ;-)

I understand some folks think this is a problem, but have been
reluctant to include a "randomizer" in the created index name since it
would make the index name less clearly predictable. May as well use
something like "idx_<procid>_<timestamp>" or somesuch...

No real objection though, other than aesthetics. And those only count
for so much...

- Tom

--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California

#9Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#6)
Re: [HACKERS] Re: [SQL] Column name's length

This is obviously not a 100% solution, since there's a risk of name
collisions (averylongfieldname1 and averylongfieldname2) but it's
probably a 95% solution, and it wouldn't take much work or risk.

Comments? Objections? I think I could argue that this is a bug fix
and deserves to be slipped into 6.5 ;-)

Trying to slip it in as a bug fix. Sounds like me, Tom. :-)

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#9)
Re: [HACKERS] Re: [SQL] Column name's length

Zalman Stern <zalman@netcom.com> writes:

How about something like this: if the code finds that the names are
too long when forming an implicit index name, it truncates the names
to fit, and you are OK as long as the truncated name is unique.

This only partially solves the problem and can introduce bugs into code
which is only reading from a database. When someone is setting up the
database to work on the system, they'll in theory get a failure so they
know it won't work. This really isn't true for our software though because
we have functions which dynamically query a table to see what columns it
has. In theory two queries for different longnames can resolve to the same
column name.

Um, no, I don't think this has anything to do with whether you can
distinguish the names of different columns in a table.

What we are talking about is the names generated for indexes and
sequences that are needed to implement PRIMARY KEY and SERIAL column
attributes. Ideally these names are completely invisible to an SQL
application --- there's certainly no direct need for the app to know
about them. We could eliminate the whole issue if we generated names
along the lines of "pg_pkey_idx_48812091". But when you are looking at
the system catalogs it is useful to be able to tell what's what by eye.
So we compromise by generating names that include the table and column
name for which we're creating an implicit index or sequence.

The problem is that this implementation-detail-that-should-be-invisible
*is* visible to an SQL application, because it restricts the SQL app's
choice of table and column names. We need to avoid that restriction,
or at least reduce it as much as we can. I'm willing to sacrifice
a little bit of SQL naming freedom to preserve readability of the names
generated behind the scenes, but putting a hard limit on name length
is too much sacrifice. (This is more a question of designer's taste
than anything else --- you're certainly free to argue for a different
tradeoff point. But it is a tradeoff; there's no perfect solution.)

It is also a backwards compatibility hassle if you ever want to increase
the number of significant characters in the name. This is because the
existing database only knows the first 32 characters and *must* ignore
anything after that in lookups. You would have to keep track of which names
are "old style" and which are new. Why set yourself up like that?

No, because the SQL app should never need to know these names at all.
If they change, it won't affect app code. Increasing NAMEDATALEN
could not cause two table+field names to conflict where they did not
conflict before, so I see no risk there.

regards, tom lane

#11Zalman Stern
zalman@netcom.com
In reply to: Tom Lane (#10)
Re: [HACKERS] Re: [SQL] Column name's length

I misunderstood the context quite a bit. I would consider gluing the entire
full length name together and using 8 bytes or so for a strong hash of the
fullname. If there is another lookup path to get to the correct index name,
then one can just increment the hash until the name is unique. Or whatever.

-Z-

#12Jan Wieck
JanWieck@Yahoo.com
In reply to: Thomas Lockhart (#8)
Re: [HACKERS] Re: [SQL] Column name's length

How about something like this: if the code finds that the names are
too long when forming an implicit index name, it truncates the names
to fit, and you are OK as long as the truncated name is unique.
Comments? Objections? I think I could argue that this is a bug fix
and deserves to be slipped into 6.5 ;-)

I understand some folks think this is a problem, but have been
reluctant to include a "randomizer" in the created index name since it
would make the index name less clearly predictable. May as well use
something like "idx_<procid>_<timestamp>" or somesuch...

No real objection though, other than aesthetics. And those only count
for so much...

I've been wondering for some time why at all to build the
index and sequence names from those table/fieldnames. Only to
make them guessable?

What about building them from the tables OID plus the column
numbers. That way, auto created sequences could also be
automatically removed on a DROP TABLE because the system can
"guess" them.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #

#13Vadim Mikheev
vadim@krs.ru
In reply to: Jan Wieck (#12)
Re: [HACKERS] Re: [SQL] Column name's length

Jan Wieck wrote:

I understand some folks think this is a problem, but have been
reluctant to include a "randomizer" in the created index name since it
would make the index name less clearly predictable. May as well use
something like "idx_<procid>_<timestamp>" or somesuch...

No real objection though, other than aesthetics. And those only count
for so much...

I've been wondering for some time why at all to build the

And me -:)

index and sequence names from those table/fieldnames. Only to
make them guessable?

What about building them from the tables OID plus the column
numbers. That way, auto created sequences could also be
automatically removed on a DROP TABLE because the system can
"guess" them.

Actually, we should use names not allowed in CREATE statements!
So I would use "pg_" prefix...

Vadim

#14Jan Wieck
JanWieck@Yahoo.com
In reply to: Vadim Mikheev (#13)
Re: [HACKERS] Re: [SQL] Column name's length

Jan Wieck wrote:

I understand some folks think this is a problem, but have been
reluctant to include a "randomizer" in the created index name since it
would make the index name less clearly predictable. May as well use
something like "idx_<procid>_<timestamp>" or somesuch...

No real objection though, other than aesthetics. And those only count
for so much...

I've been wondering for some time why at all to build the

And me -:)

index and sequence names from those table/fieldnames. Only to
make them guessable?

What about building them from the tables OID plus the column
numbers. That way, auto created sequences could also be
automatically removed on a DROP TABLE because the system can
"guess" them.

Actually, we should use names not allowed in CREATE statements!
So I would use "pg_" prefix...

This would implicitly deny the user from dropping the created
index for a unique constraint :-) Same for the sequences -
what's good because they are used in the default clauses for
the serial field and dropping the sequence would corrupt the
table though.

I like it.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#14)
Re: [HACKERS] Re: [SQL] Column name's length

wieck@debis.com (Jan Wieck) writes:

Actually, we should use names not allowed in CREATE statements!
So I would use "pg_" prefix...

This would implicitly deny the user from dropping the created
index for a unique constraint :-) Same for the sequences -
what's good because they are used in the default clauses for
the serial field and dropping the sequence would corrupt the
table though.

Well, it's only good if the system will get rid of the objects when
the user drops the owning table. This is true for indexes but AFAIK
it is not yet true for sequences. So if we go with pg_ prefix now,
there will be *no* way short of superuser privilege to get rid of the
sequence object for a deleted table that had a serial field.

Also, this will break pg_dump, which will have no good way to restore
the state of a serial sequence object. (CREATE SEQUENCE pg_xxx will
fail, no?)

I like it.

Perhaps eventually we should wind up using names like "pg_pkey_8381292"
but I think this ought to wait until the system retains an explicit
representation of the relationship between these indexes/sequences and
the owning table, and until we think through the consequences for
pg_dump. For now we had better stick to unprivileged names.

regards, tom lane

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#15)
Re: [HACKERS] Re: [SQL] Column name's length

wieck@debis.com (Jan Wieck) writes:

What about building them from the tables OID plus the column
numbers.

The parser doesn't know what OID will be assigned to the table at the
time it builds the names for the derived objects. I suppose we could
postpone the creation of these names until after the table OID is known,
but that looks like a rather large and risky change to be making at this
stage of the release cycle...

At this point I like Zalman's idea, which if I understood it properly
went like this:

1. If table and column name are short enough, use "table_column_key"
etc (so, no change in the cases that the system accepts now).

2. Otherwise, truncate table and/or column name to fit, leaving room for
a few extra characters that are made from a hash of the removed
characters. The result would look something like "tab_col_5927_key".

This still isn't a 100% solution, but it's probably a 99.5% solution
where the simple truncation idea would be maybe 95%. Not sure that
the additional coverage is worth making the names harder to predict
for a person, though.

regards, tom lane

#17Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#15)
Re: [HACKERS] Re: [SQL] Column name's length

Tom Lane wrote:

I like it.

Perhaps eventually we should wind up using names like "pg_pkey_8381292"
but I think this ought to wait until the system retains an explicit
representation of the relationship between these indexes/sequences and
the owning table, and until we think through the consequences for
pg_dump. For now we had better stick to unprivileged names.

Of course! I didn't meant to do anything on it for v6.5.
Implementing automatic sequence deletion if they got created
due to serial fields is definitely feature. And I agree that
all the odds and ends have to get discussed down first.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #

#18Bruce Momjian
bruce@momjian.us
In reply to: Jan Wieck (#12)
Re: [HACKERS] Re: [SQL] Column name's length

How about something like this: if the code finds that the names are
too long when forming an implicit index name, it truncates the names
to fit, and you are OK as long as the truncated name is unique.
Comments? Objections? I think I could argue that this is a bug fix
and deserves to be slipped into 6.5 ;-)

I understand some folks think this is a problem, but have been
reluctant to include a "randomizer" in the created index name since it
would make the index name less clearly predictable. May as well use
something like "idx_<procid>_<timestamp>" or somesuch...

No real objection though, other than aesthetics. And those only count
for so much...

I've been wondering for some time why at all to build the
index and sequence names from those table/fieldnames. Only to
make them guessable?

What about building them from the tables OID plus the column
numbers. That way, auto created sequences could also be
automatically removed on a DROP TABLE because the system can
"guess" them.

Another idea would be to truncate table and column names equally to fit
in NAMEDATALEN, then if that is not unique, start replacing the last
letters of the string with number until it is unique:

tabnamecolname
tabnamecolnam1
tabnamecolnam2
tabnamecolna32

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#19Nat Howard
Nat.Howard@cello-us.com
In reply to: Thomas Lockhart (#8)
RE: [HACKERS] Re: [SQL] Column name's length

Just a suggestion: use an printably-encoded version of
md5 or sha, which are cryptographic hash algorithms.

It will make the name completely predictable:
if(too_long(name)) {
name = md5(name);
}

It will be *very* unlikely that there are any collisions.

Of course, a person won't say "gee, party_address_relation_code_types_seq is
too
long, I guess that will turn out to be d4420a3105e98e3e2e12c5c73019db59".

Show quoted text

-----Original Message-----
From: owner-pgsql-hackers@postgreSQL.org
[mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Thomas Lockhart
Sent: Wednesday, June 02, 1999 12:26 AM
To: Tom Lane
Cc: pgsql-hackers@postgreSQL.org; Zalman Stern
Subject: Re: [HACKERS] Re: [SQL] Column name's length

How about something like this: if the code finds that the names are
too long when forming an implicit index name, it truncates the names
to fit, and you are OK as long as the truncated name is unique.
Comments? Objections? I think I could argue that this is a bug fix
and deserves to be slipped into 6.5 ;-)

I understand some folks think this is a problem, but have been
reluctant to include a "randomizer" in the created index name since it
would make the index name less clearly predictable. May as well use
something like "idx_<procid>_<timestamp>" or somesuch...

No real objection though, other than aesthetics. And those only count
for so much...

- Tom

--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California

#20Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#15)
Re: [HACKERS] Re: [SQL] Column name's length

At 09:16 2/06/99 -0400, you wrote:

Well, it's only good if the system will get rid of the objects when
the user drops the owning table. This is true for indexes but AFAIK
it is not yet true for sequences. So if we go with pg_ prefix now,
there will be *no* way short of superuser privilege to get rid of the
sequence object for a deleted table that had a serial field.

Also, this will break pg_dump, which will have no good way to restore
the state of a serial sequence object. (CREATE SEQUENCE pg_xxx will
fail, no?)

I know I'm probably out of my depth here, but couldn't pg_dump ignore everything with a pg_* prefix? It can (safely?) assume any 'system' structures will be created as a result of some other user-based definition it is dumping?

[If you beat me about the head, I'll shut up]

Philip Warner.

P.S. I also like the idea of creating the 'system' structures with readily and reliably identifiable names, since it potentially gives the option of the user choosing to 'hide' them. As a user with about 20000 blobs to load, the output of a \d is pretty cumbersome.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: +61-03-5367 7422 | _________ \
Fax: +61-03-5367 7430 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#21)