Type Categories for User-Defined Types

Started by David E. Wheelerover 17 years ago33 messageshackers

Jump to latest

David E. Wheeler

david@kineticode.com

over 17 years ago

On Jul 29, 2008, at 11:41, Tom Lane wrote:

Okay, it's committed with minor revisions --- the biggest thing I
fixed
was the lack of an uninstall script.

Great, thanks!

I saw what you were talking about in terms of still having some
casting
issues: having to put in a quote_literal(citext) alias function seems
like a huge hack,

Yes, and I've been adding more hacks along the lines of:

CREATE OR REPLACE FUNCTION int8(citext)
RETURNS int8
AS 'SELECT int8( $1::text )'
LANGUAGE SQL IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION citext(int8)
RETURNS citext
AS 'SELECT text( $1 )::citext'
LANGUAGE SQL IMMUTABLE STRICT;

CREATE CAST (int8 AS citext) WITH FUNCTION citext(int8) AS ASSIGNMENT;
CREATE CAST (citext AS int8) WITH FUNCTION int8(citext);

I've been doing this for all the types, writing tests to see how text
behaves and replicating it with these hack functions. No, it's not
ideal.

and I notice that cases like

contrib_regression=# select 'a'::text || 'b'::citext;
ERROR: operator is not unique: text || citext

still don't work even though you put in an alias || operator.

Damn, I didn't even notice that! Can that be fixed?

It seems to me that trying to fix these things retail is a losing
proposition. The reason you need these, instead of having everything
"just work" like varchar does, is that citext isn't seen as a member
of the string type category, and so the "preferred type" preference
for
text isn't applied. What we ought to do about that IMHO is make a
way for user-defined types to declare what category they belong to.
This has been foreseen as needed for a *very* long time, but we never
really had a forcing function to make us do it before.

Yes, this would be a *much* nicer way to do it, IMO.

Obviously the solution should involve a new column in pg_type and
a new type property in CREATE TYPE, but what should the representation
be? A full-on approach would make the type categories be real SQL
objects with their own system catalog and reference them by OID,
but I can't help thinking that that's overkill.

It kinda sounds that way, yeah. What happens with DOMAINs, BTW? Do
they need to write hacky functions like the above, or are they aware
of their types because of the types from which they inherit?

Anyway, debating that is probably material for a separate thread ...

Here you go! ;-)

Thanks again for the commit, Tom.

Best,

David

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: David E. Wheeler (#1)

Re: Type Categories for User-Defined Types

"David E. Wheeler" <david@kineticode.com> writes:

On Jul 29, 2008, at 11:41, Tom Lane wrote:

and I notice that cases like
contrib_regression=# select 'a'::text || 'b'::citext;
ERROR: operator is not unique: text || citext
still don't work even though you put in an alias || operator.

Damn, I didn't even notice that! Can that be fixed?

Given the present infrastructure I think the only way would be with
two more alias operators, text||citext and citext||text. But that way
madness lies.

Obviously the solution should involve a new column in pg_type and
a new type property in CREATE TYPE, but what should the representation
be? A full-on approach would make the type categories be real SQL
objects with their own system catalog and reference them by OID,
but I can't help thinking that that's overkill.

It kinda sounds that way, yeah. What happens with DOMAINs, BTW? Do
they need to write hacky functions like the above, or are they aware
of their types because of the types from which they inherit?

Domains are treated as their base types in general. Elein has been
complaining about that for years ;-) ... but I think improving it
is unrelated to this issue.

Anyway, debating that is probably material for a separate thread ...

Here you go! ;-)

After a quick look to verify my recollection: the only two things
that the system does with type categories are

extern CATEGORY TypeCategory(Oid type);

Returns the category a type belongs to.

extern bool IsPreferredType(CATEGORY category, Oid type);

Detects whether a type is a preferred type in its category (there can
be more than one preferred type in a category, and in fact the
traditional setup is that *every* user-defined type is a preferred
type in the USER_TYPE category).

The categories themselves are pretty much opaque values, except that
parse_func.c has special behavior to prefer STRING_TYPE when in doubt.

So this can fairly obviously be replaced by two new pg_type columns,
say "typcategory" and "typpreferred", where the latter is a bool.
Since the list of categories is pretty short and there's no obvious
reason to extend it a lot, I propose that we just represent typcategory
as a "char", using a mapping along the lines of

BITSTRING_TYPE b
BOOLEAN_TYPE B
DATETIME_TYPE D
GENERIC_TYPE P (think "pseudotype")
GEOMETRIC_TYPE G
INVALID_TYPE \0 (not allowed in catalog anyway)
NETWORK_TYPE n
NUMERIC_TYPE N
STRING_TYPE S
TIMESPAN_TYPE T
UNKNOWN_TYPE u
USER_TYPE U

Users would be allowed to select any single ASCII character as the
"category" of a user-defined type, should they have a need to make their
own new category. Of course CREATE TYPE's default is category = U and
preferred = true for backward compatibility reasons. We could put down
a rule that system-defined categories are always upper or lower case
letters (or even always upper, if we wanted to strain some of the
assignments a bit) so that it's clear what can be used for a
user-defined category.

It might possibly be worth making new categories for arrays, composites,
and enums; they're currently effectively USER_TYPE but that doesn't seem
quite right. Also, the rules for domains should likely be "same
category as base type, never a preferred type" instead of the current
behavior where they're user types. (I think the latter doesn't really
matter now, because we always smash a domain to its base type before
inquiring about categories anyway. But it might give Elein a bit more
room to maneuver with the functions-on-domains issue.)

A possible objection is that this will make TypeCategory and
IsPreferredType slower than before, since they'll involve a syscache
lookup instead of a simple switch statement. I don't think this will
be too bad though; all the paths they are used in are full of catalog
lookups anyway, so it's hard to credit that there would be much
percentage slowdown.

Thoughts?

regards, tom lane

David E. Wheeler

david@kineticode.com

over 17 years ago

In reply to: Tom Lane (#2)

Re: Type Categories for User-Defined Types

On Jul 29, 2008, at 13:12, Tom Lane wrote:

Damn, I didn't even notice that! Can that be fixed?

Given the present infrastructure I think the only way would be with
two more alias operators, text||citext and citext||text. But that way
madness lies.

I suppose, then, that you're saying that there are lots of other
functions for which this sort of thing would need to be done? Because
two more aliases for this one operator is no big deal, AFAIC.

It kinda sounds that way, yeah. What happens with DOMAINs, BTW? Do
they need to write hacky functions like the above, or are they aware
of their types because of the types from which they inherit?

Domains are treated as their base types in general. Elein has been
complaining about that for years ;-) ... but I think improving it
is unrelated to this issue.

I see.

After a quick look to verify my recollection: the only two things
that the system does with type categories are

extern CATEGORY TypeCategory(Oid type);

Returns the category a type belongs to.

extern bool IsPreferredType(CATEGORY category, Oid type);

Detects whether a type is a preferred type in its category (there can
be more than one preferred type in a category, and in fact the
traditional setup is that *every* user-defined type is a preferred
type in the USER_TYPE category).

Perhaps tangential: What does it mean for a type to be "preferred"?

The categories themselves are pretty much opaque values, except that
parse_func.c has special behavior to prefer STRING_TYPE when in doubt.

So this can fairly obviously be replaced by two new pg_type columns,
say "typcategory" and "typpreferred", where the latter is a bool.
Since the list of categories is pretty short and there's no obvious
reason to extend it a lot, I propose that we just represent
typcategory
as a "char", using a mapping along the lines of

BITSTRING_TYPE b
BOOLEAN_TYPE B
DATETIME_TYPE D
GENERIC_TYPE P (think "pseudotype")
GEOMETRIC_TYPE G
INVALID_TYPE \0 (not allowed in catalog anyway)
NETWORK_TYPE n
NUMERIC_TYPE N
STRING_TYPE S
TIMESPAN_TYPE T
UNKNOWN_TYPE u
USER_TYPE U

Users would be allowed to select any single ASCII character as the
"category" of a user-defined type, should they have a need to make
their
own new category.

Wouldn't this then limit them to 52 possible categories? Does that
matter? Given your suggestion, I'm assuming that a single character is
somehow more efficient than an enum, yes?

Of course CREATE TYPE's default is category = U and
preferred = true for backward compatibility reasons. We could put
down
a rule that system-defined categories are always upper or lower case
letters (or even always upper, if we wanted to strain some of the
assignments a bit) so that it's clear what can be used for a
user-defined category.

Makes sense.

It might possibly be worth making new categories for arrays,
composites,
and enums; they're currently effectively USER_TYPE but that doesn't
seem
quite right. Also, the rules for domains should likely be "same
category as base type, never a preferred type" instead of the current
behavior where they're user types. (I think the latter doesn't really
matter now, because we always smash a domain to its base type before
inquiring about categories anyway. But it might give Elein a bit more
room to maneuver with the functions-on-domains issue.)

Yes, this all sounds like it'd be an important improvement.

A possible objection is that this will make TypeCategory and
IsPreferredType slower than before, since they'll involve a syscache
lookup instead of a simple switch statement. I don't think this will
be too bad though; all the paths they are used in are full of catalog
lookups anyway, so it's hard to credit that there would be much
percentage slowdown.

Thoughts?

Obviously I don't know much about the internals, but your explanation
here seems very clear to me. I like it. +1

Thank you, Tom.

Best,

David

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: David E. Wheeler (#3)

Re: Type Categories for User-Defined Types

"David E. Wheeler" <david@kineticode.com> writes:

On Jul 29, 2008, at 13:12, Tom Lane wrote:

Given the present infrastructure I think the only way would be with
two more alias operators, text||citext and citext||text. But that way
madness lies.

I suppose, then, that you're saying that there are lots of other
functions for which this sort of thing would need to be done? Because
two more aliases for this one operator is no big deal, AFAIC.

Well, a rough estimate of the places where implicit coercion to text
might be relevant to resolving ambiguity is

select proname from pg_proc
where 'text'::regtype = any(proargtypes)
group by proname having count(*)>1;

select oprname from pg_operator
where oprleft='text'::regtype or oprright='text'::regtype
group by oprname having count(*)> 1;

I count 37 functions and 10 operators as of CVS HEAD. Perhaps not all
would need to be fixed in practical use, but if you wanted seamless
integration of citext it's quite possible that you'd need alias
functions/operators (maybe more than one) in each of those cases.

[ squint... ] Actually, this is an underestimate since these queries
aren't finding cases like quote_literal, where there is ambiguity but
only one of the alternatives takes 'text'. I'm too lazy to work out a
better query though.

Perhaps tangential: What does it mean for a type to be "preferred"?

See the ambiguous-function resolution rules in chapter 10 of the fine
manual ...

Users would be allowed to select any single ASCII character as the
"category" of a user-defined type, should they have a need to make
their own new category.

Wouldn't this then limit them to 52 possible categories?

It'd be either 94 - 26 or 94 - 26 - 26 depending on what the policy is
about lower-case letters (and assuming they wanted to stay away from
control characters, which seems like a good idea). Considering the
world supply of categories up to now has been about ten, it's hard
to imagine that this is really a limitation.

Does that
matter? Given your suggestion, I'm assuming that a single character is
somehow more efficient than an enum, yes?

Marginally so; but an enum wouldn't help anyway unless we are prepared
to invent ALTER ENUM. We'd have to go to an actual new system catalog
if we wanted something noticeably better than the poor-mans-enum
approach, and as I mentioned earlier, that just seems like overkill.
(Besides, we could always add it later if there's suddenly a gold rush
for categories. The only thing we'd be locking ourselves into, if
we view this as a stopgap implementation, is the need to accept
single-character abbreviations in future, even after the system knows
actual names for categories.)

regards, tom lane

David E. Wheeler

david@kineticode.com

over 17 years ago

In reply to: Tom Lane (#4)

Re: Type Categories for User-Defined Types

On Jul 29, 2008, at 14:00, Tom Lane wrote:

Well, a rough estimate of the places where implicit coercion to text
might be relevant to resolving ambiguity is

select proname from pg_proc
where 'text'::regtype = any(proargtypes)
group by proname having count(*)>1;

select oprname from pg_operator
where oprleft='text'::regtype or oprright='text'::regtype
group by oprname having count(*)> 1;

I count 37 functions and 10 operators as of CVS HEAD. Perhaps not all
would need to be fixed in practical use, but if you wanted seamless
integration of citext it's quite possible that you'd need alias
functions/operators (maybe more than one) in each of those cases.

Well, there are already citext aliases for all of those operators, for
this very reason. There are citext aliases for a bunch of the
functions, too (ltrim(), substring(), etc.), so I wouldn't worry about
adding more. I've added more of them since I last sent a patch, mainly
for the regexp functions, replace(), strpos(), etc. I'd guess that I'm
about half-way there already, and there probably are a few I wouldn't
bother with (like timezone()).

Anyway, would this issue then go away once the type stuff was added
and citext was specified as TYPE = 'S'?

[ squint... ] Actually, this is an underestimate since these queries
aren't finding cases like quote_literal, where there is ambiguity but
only one of the alternatives takes 'text'. I'm too lazy to work out a
better query though.

Thanks.

Perhaps tangential: What does it mean for a type to be "preferred"?

See the ambiguous-function resolution rules in chapter 10 of the fine
manual ...

I see this:

C. Run through all candidates and keep those that accept preferred
types (of the input data type's type category) at the most positions
where type conversion will be required. Keep all candidates if none
accept preferred types. If only one candidate remains, use it; else
continue to the next step.

That doesn't exactly explain what "preferred" means, just that it
seems to prioritize the resolution of a function a bit. Which, I
guess, is the point.

Wouldn't this then limit them to 52 possible categories?

It'd be either 94 - 26 or 94 - 26 - 26 depending on what the policy is
about lower-case letters (and assuming they wanted to stay away from
control characters, which seems like a good idea). Considering the
world supply of categories up to now has been about ten, it's hard
to imagine that this is really a limitation.

Okay.

Does that
matter? Given your suggestion, I'm assuming that a single character
is
somehow more efficient than an enum, yes?

Marginally so; but an enum wouldn't help anyway unless we are prepared
to invent ALTER ENUM. We'd have to go to an actual new system catalog
if we wanted something noticeably better than the poor-mans-enum
approach, and as I mentioned earlier, that just seems like overkill.
(Besides, we could always add it later if there's suddenly a gold rush
for categories. The only thing we'd be locking ourselves into, if
we view this as a stopgap implementation, is the need to accept
single-character abbreviations in future, even after the system knows
actual names for categories.)

Makes sense.

Thanks,

David

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: David E. Wheeler (#5)

Re: Type Categories for User-Defined Types

"David E. Wheeler" <david@kineticode.com> writes:

Well, there are already citext aliases for all of those operators, for
this very reason. There are citext aliases for a bunch of the
functions, too (ltrim(), substring(), etc.), so I wouldn't worry about
adding more. I've added more of them since I last sent a patch, mainly
for the regexp functions, replace(), strpos(), etc. I'd guess that I'm
about half-way there already, and there probably are a few I wouldn't
bother with (like timezone()).

That's exactly what I don't really want to do; if you are adding aliases
*only* to get rid of ambiguity-errors, and not to alter functionality,
then I think you're doing the wrong thing. Adding more aliases can
easily make the situation worse.

Anyway, would this issue then go away once the type stuff was added
and citext was specified as TYPE = 'S'?

Yeah, that's the point of the proposal. I think the issue has come up
once or twice before, too, else I'd not be so interested in a general
solution. (digs in archives ... there was some discussion of this
in connection with unsigned integer types, and I seem to recall older
threads but can't find any right now.)

regards, tom lane

David E. Wheeler

david@kineticode.com

over 17 years ago

In reply to: Tom Lane (#6)

Re: Type Categories for User-Defined Types

On Jul 29, 2008, at 14:33, Tom Lane wrote:

That's exactly what I don't really want to do; if you are adding
aliases
*only* to get rid of ambiguity-errors, and not to alter functionality,
then I think you're doing the wrong thing. Adding more aliases can
easily make the situation worse.

Actually, most seem to resolve to text okay. I'm adding aliases to
change behavior (e.g., case-insensitive matching in replace()). It's
the bazillion cast functions I'm having to add that are annoying (see
my previous post with the int8 example).

Anyway, would this issue then go away once the type stuff was added
and citext was specified as TYPE = 'S'?

Yeah, that's the point of the proposal. I think the issue has come up
once or twice before, too, else I'd not be so interested in a general
solution. (digs in archives ... there was some discussion of this
in connection with unsigned integer types, and I seem to recall older
threads but can't find any right now.)

No worries, it seems like a really good idea to me, regardless.

Thanks!

David

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Tom Lane (#6)

Re: Type Categories for User-Defined Types

I wrote:

"David E. Wheeler" <david@kineticode.com> writes:

Anyway, would this issue then go away once the type stuff was added
and citext was specified as TYPE = 'S'?

Yeah, that's the point of the proposal.

BTW, I did confirm (by temporarily hacking up TypeCategory()) that
causing citext to appear to be of STRING category eliminates the need
for the extra || operator and quote_literal() function that are in
the current citext code. So the proposed solution really will work.

You might still want to keep the aliases in cases where the point is to
have the function or operator output resolve as citext rather than text.
I'm not sure how many of these cases that's really important for,
though.

regards, tom lane

David E. Wheeler

david@kineticode.com

over 17 years ago

In reply to: Tom Lane (#8)

Re: Type Categories for User-Defined Types

On Jul 29, 2008, at 15:35, Tom Lane wrote:

BTW, I did confirm (by temporarily hacking up TypeCategory()) that
causing citext to appear to be of STRING category eliminates the need
for the extra || operator and quote_literal() function that are in
the current citext code. So the proposed solution really will work.

Nice! Does it also allow all of the casts to and from text to
implicitly work, e.g,

SELECT 4::int8::citext = '4' AS t;
SELECT '4'::citext::int8 = 4 AS t;
SELECT 4::bigint::citext = '4' AS t;
SELECT '4'::citext::bigint = 4 AS t;

You might still want to keep the aliases in cases where the point is
to
have the function or operator output resolve as citext rather than
text.
I'm not sure how many of these cases that's really important for,
though.

Not many, I should think.

Thanks,

David

#10

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: David E. Wheeler (#9)

Re: Type Categories for User-Defined Types

"David E. Wheeler" <david@kineticode.com> writes:

On Jul 29, 2008, at 15:35, Tom Lane wrote:

BTW, I did confirm (by temporarily hacking up TypeCategory()) that
causing citext to appear to be of STRING category eliminates the need
for the extra || operator and quote_literal() function that are in
the current citext code. So the proposed solution really will work.

Nice! Does it also allow all of the casts to and from text to
implicitly work, e.g,

No, but we could think about that. Do we really want that to work for
any member of the string category? It seems a bit overly broad to me
...

regards, tom lane

#11

David E. Wheeler

david@kineticode.com

over 17 years ago

In reply to: Tom Lane (#10)

Re: Type Categories for User-Defined Types

On Jul 29, 2008, at 16:45, Tom Lane wrote:

Nice! Does it also allow all of the casts to and from text to
implicitly work, e.g,

No, but we could think about that. Do we really want that to work for
any member of the string category? It seems a bit overly broad to me
...

I was thinking about other text-style types, like UUID and UPC, and
was thinking probably not, at least for them. OTOH, some strings
should work that way, like citext or enums, the difference being that,
in most contexts, they can be treated as plain text, since they're
usually thought of that way. Or so it seems to me. Might there be a
way to create that distinction?

Best,

David

#12

Ryan Bradetich

rbradetich@gmail.com

over 17 years ago

In reply to: Tom Lane (#6)

Re: Type Categories for User-Defined Types

Tom,

On Tue, Jul 29, 2008 at 2:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, that's the point of the proposal. I think the issue has come up
once or twice before, too, else I'd not be so interested in a general
solution. (digs in archives ... there was some discussion of this
in connection with unsigned integer types, and I seem to recall older
threads but can't find any right now.)

Anything I should be looking into and/or testing for unsigned integer support?

Thanks!

- Ryan

P.S. I have most of the uint2 and uint4 types implemented. I am currently
working on the unit and performance testing. I need to upgrade one of
my Linux boxes first before I can complete my testing. I am hoping to post
my code and performance testing later this week.

#13

Bruce Momjian

bruce@momjian.us

over 17 years ago

In reply to: Tom Lane (#2)

Re: Type Categories for User-Defined Types

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

So this can fairly obviously be replaced by two new pg_type columns,
say "typcategory" and "typpreferred", where the latter is a bool.
Since the list of categories is pretty short and there's no obvious
reason to extend it a lot, I propose that we just represent typcategory
as a "char", using a mapping along the lines of

BITSTRING_TYPE b
BOOLEAN_TYPE B
DATETIME_TYPE D
GENERIC_TYPE P (think "pseudotype")
GEOMETRIC_TYPE G
INVALID_TYPE \0 (not allowed in catalog anyway)
NETWORK_TYPE n

...

I had a different thought when you posted the original prompt earlier. Instead
of a static list of type categories we could re-use type oids. So if you
belong to a type category you store the oid of the preferred type of that
category in typcategory.

I can't help thinking from the list above that there's nothing special about
datetime, geometric, and network data types that some user defined set of
types wouldn't necessarily want to define.

I do agree that having SQL commands to create new type categories, even a new
catalog table is overkill, but not because we wouldn't want to create new
ones. Just because there isn't really any other meta data we want to store
about type categories. Aside from the preferred type and the members there
isn't anything more to say about them.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

#14

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Bruce Momjian (#13)

Re: Type Categories for User-Defined Types

Gregory Stark <stark@enterprisedb.com> writes:

I had a different thought when you posted the original prompt earlier. Instead
of a static list of type categories we could re-use type oids. So if you
belong to a type category you store the oid of the preferred type of that
category in typcategory.

The problem with that is it assumes that each category has one and only
one preferred type. This isn't the case in NUMERIC_TYPE for instance.

What is actually going on in NUMERIC_TYPE is that the OID and OID-alias
types form a sub-category; but you have to look at the set of available
implicit casts to realize that. We could maybe get away with one
preferred type per category if we wanted to invent an explicit
representation of sub-categories, but that seems too complicated.
In any case it's not clear to me that the flexibility to have more than
one preferred type might not be helpful in other, less clearly
structured categories.

I can't help thinking from the list above that there's nothing special about
datetime, geometric, and network data types that some user defined set of
types wouldn't necessarily want to define.

Well, that's why the set of categories needs to be extensible. I agree
that some of the existing categories aren't really pulling their weight:
unless you have enough implicit casts within the category to cause
ambiguity, you don't really need preferred types and hence not your own
category.

For the moment I'm just trying to model the behavior of the existing
functions exactly; but once the catalog infrastructure is in, we could
revisit some of those old decisions and try to eliminate useless
categories. In particular I'm thinking that the historical behavior of
treating all user-defined types as preferred might be backwards.

I do agree that having SQL commands to create new type categories, even a new
catalog table is overkill, but not because we wouldn't want to create new
ones. Just because there isn't really any other meta data we want to store
about type categories. Aside from the preferred type and the members there
isn't anything more to say about them.

Agreed.

regards, tom lane

#15

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: Ryan Bradetich (#12)

Re: Type Categories for User-Defined Types

"Ryan Bradetich" <rbradetich@gmail.com> writes:

On Tue, Jul 29, 2008 at 2:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, that's the point of the proposal. I think the issue has come up
once or twice before, too, else I'd not be so interested in a general
solution. (digs in archives ... there was some discussion of this
in connection with unsigned integer types, and I seem to recall older
threads but can't find any right now.)

Anything I should be looking into and/or testing for unsigned integer support?

Dunno, I forget what the conclusion was about implicit casting for the
unsigned types in your proposal. Have you experimented with seeing
whether, eg, UNION'ing an unsigned with some signed-integer value
behaves sensibly?

The thread I mention above was a year or so back and was originated by
someone who wanted to duplicate mysql's behavior. Your proposal is
a lot more limited and might not really need to try to put the unsigned
types into the numeric category.

regards, tom lane

#16

Ryan Bradetich

rbradetich@gmail.com

over 17 years ago

In reply to: Tom Lane (#15)

Re: Type Categories for User-Defined Types

Hello Tom,

On Wed, Jul 30, 2008 at 7:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dunno, I forget what the conclusion was about implicit casting for the
unsigned types in your proposal. Have you experimented with seeing
whether, eg, UNION'ing an unsigned with some signed-integer value
behaves sensibly?

That was not one of my tests, but I will add it :)

Here are the performance tests I am planning / working on:

1. Loading of both integer and unsigned integers into a table. This test is
mainly a sanity check to ensure the unsigned integers do not add
significant time during inserts. In a perfect world, I believe they should
take the same time when the unsigned integers and integers have equal
widths. Experimentation is showing the unsigned integer to take slightly
longer. I suspect this is due to the additional integer->unsigned
integer cast? I am still digging into this for my personal curiosity.

2. Testing the < operator. The unsigned integer type provides a native cast
for the < operator.

3. Testing the & operator. The unsigned integer type does not provide a
native cast for the & operator, so they are cast up to the next larger size.

I am testing this for both signed and unsigned integers with data sets of the
following sizes: 100, 1000, 10000, 100000, 500000, 1M, 5M, and 10M rows.
I am planning to test on both 32-bit and 64-bit x86 platforms. If there is
interest, I can also run these tests on 32-bit and 64-bit PowerPC platforms.

I will add the union test to my test plan. Anything else I should add or any
holes I am missing with this test plan?

The thread I mention above was a year or so back and was originated by
someone who wanted to duplicate mysql's behavior. Your proposal is
a lot more limited and might not really need to try to put the unsigned
types into the numeric category.

Ah, ok. I will not worry about it for now.

Thanks!

- Ryan

#17

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: David E. Wheeler (#11)

Re: Type Categories for User-Defined Types

"David E. Wheeler" <david@kineticode.com> writes:

On Jul 29, 2008, at 16:45, Tom Lane wrote:

[ move preferred-type info into the system catalogs ]

I've committed changes along this line. Please look at CVS HEAD and
re-evaluate which alias functions/operators you still really need.

Nice! Does it also allow all of the casts to and from text to
implicitly work, e.g,

No, but we could think about that. Do we really want that to work for
any member of the string category? It seems a bit overly broad to me
...

I was thinking about other text-style types, like UUID and UPC, and
was thinking probably not, at least for them. OTOH, some strings
should work that way, like citext or enums, the difference being that,
in most contexts, they can be treated as plain text, since they're
usually thought of that way. Or so it seems to me. Might there be a
way to create that distinction?

UUID and so on aren't considered part of the string category, and
shouldn't be IMHO ... any type that has semantics significantly
different from "arbitrary string of text" doesn't belong. At the
same time I'm not entirely sure that we want the I/O conversions
to work for everything in the category. As of CVS HEAD, what we've
got in string category by default are

template1=# select oid::regtype, typtype, typispreferred from pg_type where typcategory = 'S';
oid | typtype | typispreferred
-----------------------------------+---------+----------------
"char" | b | f
name | b | f
text | b | t
character | b | f
character varying | b | f
information_schema.character_data | d | f
information_schema.sql_identifier | d | f
(7 rows)

and you have to remember that *any* domain created over a string type
will also be considered to be of string category.

The behavior that's hard-wired into parse_coerce.c at the moment
is that only text, varchar, bpchar can be sources or targets of
I/O conversions. While opening it up to citext sounds reasonable,
I'm a lot less sure about domains.

[ pokes at it ... ] Oh, I hadn't realized this: find_coercion_pathway
is looking at types that it's already smashed to base types, so
actually you can get an I/O conversion for a domain over one of these
types already!

regression=# create domain d2 as varchar(2);
CREATE DOMAIN
regression=# select 123::int4::d2;
d2
----
12
(1 row)

So maybe the domain issue isn't so important.

Comments anyone?

regards, tom lane

#18

David E. Wheeler

david@kineticode.com

over 17 years ago

In reply to: Tom Lane (#17)

Re: Type Categories for User-Defined Types

On Jul 30, 2008, at 10:34, Tom Lane wrote:

[ move preferred-type info into the system catalogs ]

I've committed changes along this line. Please look at CVS HEAD and
re-evaluate which alias functions/operators you still really need.

Okay, I'll hunt down some tuits today to hack on that.

UUID and so on aren't considered part of the string category, and
shouldn't be IMHO ... any type that has semantics significantly
different from "arbitrary string of text" doesn't belong.

Yes, that was essentially my point. "arbitrary string of text" types
are probably fairly rare, since one can just use text or citext or
varchar.

At the
same time I'm not entirely sure that we want the I/O conversions
to work for everything in the category. As of CVS HEAD, what we've
got in string category by default are

template1=# select oid::regtype, typtype, typispreferred from
pg_type where typcategory = 'S';
oid | typtype | typispreferred
-----------------------------------+---------+----------------
"char" | b | f
name | b | f
text | b | t
character | b | f
character varying | b | f
information_schema.character_data | d | f
information_schema.sql_identifier | d | f
(7 rows)

and you have to remember that *any* domain created over a string type
will also be considered to be of string category.

Right, that all makes sense.

The behavior that's hard-wired into parse_coerce.c at the moment
is that only text, varchar, bpchar can be sources or targets of
I/O conversions. While opening it up to citext sounds reasonable,
I'm a lot less sure about domains.

So who might we open it up so that citext can take advantage of it?
I'd love to get rid of all these ugly cast functions I've been writing.

[ pokes at it ... ] Oh, I hadn't realized this: find_coercion_pathway
is looking at types that it's already smashed to base types, so
actually you can get an I/O conversion for a domain over one of these
types already!

regression=# create domain d2 as varchar(2);
CREATE DOMAIN
regression=# select 123::int4::d2;
d2
----
12
(1 row)

So maybe the domain issue isn't so important.

What about enums?

Best,

David

#19

Kevin Grittner

Kevin.Grittner@wicourts.gov

over 17 years ago

In reply to: Tom Lane (#17)

Re: Type Categories for User-Defined Types

Tom Lane <tgl@sss.pgh.pa.us> wrote:

and you have to remember that *any* domain created over a string

type

will also be considered to be of string category.

find_coercion_pathway
is looking at types that it's already smashed to base types, so
actually you can get an I/O conversion for a domain over one of

these

types already!

regression=# create domain d2 as varchar(2);
CREATE DOMAIN
regression=# select 123::int4::d2;
d2
----
12
(1 row)

I would find is surprising if a domain didn't share these properties
with its underlying type.

-Kevin

#20

Tom Lane

tgl@sss.pgh.pa.us

over 17 years ago

In reply to: David E. Wheeler (#18)

Re: Type Categories for User-Defined Types

"David E. Wheeler" <david@kineticode.com> writes:

On Jul 30, 2008, at 10:34, Tom Lane wrote:

The behavior that's hard-wired into parse_coerce.c at the moment
is that only text, varchar, bpchar can be sources or targets of
I/O conversions. While opening it up to citext sounds reasonable,
I'm a lot less sure about domains.
[ uh, no, actually domains do work here ]
So maybe the domain issue isn't so important.