[RFC] Unsigned integer support.

Started by Ryan Bradetichover 17 years ago25 messages
#1Ryan Bradetich
rbradetich@gmail.com

Hello hackers,

I know the development community is in the middle of the July 2008
commit-fest, so I apologize if this design proposals are in
appropriate at this time.

I am looking to take advantage of PostgreSQL extensible type system
and implement unsigned integer support. The data I am dealing with
contains
many unsigned data types and I am planning on using this type to
reduce storage consumption. I am initially looking to add support for
16-bit and
32-bit unsigned integers with the potential to add 8-bit and 64-bit
unsigned integers if needed or desired by the community.

Searching through the list archives, I found two objections raised in the past:

1. Implicit casts between different data types.

I am hoping the removal of many of the implicit casts in
PostgreSQL 8.3 will simplify this task to where this objection can be
removed.

My plan (without much experimentation) is to have maybe a handful
of casts (something like):
* uint4 -> integer
* integer -> uint4
* bigint -> uint4
* integer -> uint2
* uint2 -> smallint

and then provide operators to provide a reasonable set of
functionality. My initial thought for this functionality is to
provide default operators on any
type that is not implicitly casted on the psql command-line.

As an example, I am planning for the following SQL statements to
work correctly:

3000000000::uint4 + 10 and
3000000000::uint4 > 10

My understanding is the SQL standard does not provide support for
unsigned integers, so I am planning on making all casts from unsigned
integers
to other data types explicit. Is this acceptable to the community?

Another question for the community is should we allow the following cast?
-1::uint4

Even though this is acceptable c-code, I am leaning towards
throwing an out-of-range error when this occurs.

Are there some areas I am missing or should investigate further
before working on this project?

2. There is not much demand for unsigned integer types.

Not much I can do about that :) I am willing to post my work as
a PgFoundry project.

PgFoundry already has an uint project:
http://pgfoundry.org/projects/uint/

Unfortunately this project seems to have not gone anywhere. Last
activity was late 2006 and there are not any files checked into the
SCM repository.
Is it acceptable to hijack this PgFoundry project? Or should I
start a new project (assuming there is any interest in publishing this
work).

Although I am not targeting inclusion for this type in the core
PostgreSQL code, I would like to post code for review and receive
feedback from the
community on this work. As I understand this RFC is the first step in
the process :) Once I have some code ready for review, is it
acceptable to use the
commit-fest wiki for this project?

Thanks much for your time!

- Ryan

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ryan Bradetich (#1)
Re: [RFC] Unsigned integer support.

"Ryan Bradetich" <rbradetich@gmail.com> writes:

I am looking to take advantage of PostgreSQL extensible type system
and implement unsigned integer support.

This has been proposed before, and foundered before on the question
of implicit coercions. If you're willing to make all coercions *to*
unsigned types be explicit (or at most assignment), then I think
it can be made to work without breaking anything. But usually the
folk who ask for this feature are hoping that bare integer literals
like "42" will get interpreted as unsigned when they want them to be.
The problem with that wish is illustrated by

select 1500000000 + 1500000000;

These literals might be either int4 or uint4, therefore this command
might yield either an integer-overflow error or 3000000000::uint4.
That's not a distinction you can fuzz over --- it's got to be one
or the other, and backwards compatibility says it'd better be the
first.

I am hoping the removal of many of the implicit casts in
PostgreSQL 8.3 will simplify this task to where this objection can be
removed.

The implicit casts we removed were cross-type-category cases.
If you hope for unsigned types to be considered part of the numeric
category, there's no guidance for you there. In fact, the real nub
of the problem is what type shall be initially assigned to an
integer-looking literal, and how will you get things to behave sanely
if that initial choice wasn't what was desired. We still have some
issues around the fact that "42" isn't considered a smallint. Throwing
in another possible meaning isn't going to help.

My understanding is the SQL standard does not provide support for
unsigned integers, so I am planning on making all casts from unsigned
integers to other data types explicit.

It's really the other direction that would be contentious ...

regards, tom lane

#3Ryan Bradetich
rbradetich@gmail.com
In reply to: Tom Lane (#2)
Re: [RFC] Unsigned integer support.

Hello Tom,

On Thu, Jul 24, 2008 at 10:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Ryan Bradetich" <rbradetich@gmail.com> writes:

I am looking to take advantage of PostgreSQL extensible type system
and implement unsigned integer support.

This has been proposed before, and foundered before on the question
of implicit coercions. If you're willing to make all coercions *to*
unsigned types be explicit (or at most assignment), then I think
it can be made to work without breaking anything. But usually the
folk who ask for this feature are hoping that bare integer literals
like "42" will get interpreted as unsigned when they want them to be.
The problem with that wish is illustrated by

select 1500000000 + 1500000000;

These literals might be either int4 or uint4, therefore this command
might yield either an integer-overflow error or 3000000000::uint4.
That's not a distinction you can fuzz over --- it's got to be one
or the other, and backwards compatibility says it'd better be the
first.

I am in agreement with you on this. Since SQL does not specify
unsigned types, I was assuming only explicit and assignment casts.
I should have probably mentioned that in the RFC. Thanks for
pointing this out.

My main goal for this type is the reduced storage space. I am fine
with people needing to cast to the unsigned types to benefit from the
reduced storage space.

My plans for the example above would be:

1. SELECT 1500000000 + 1500000000 --> Throws overflow error.
2. SELECT 1500000000::uint4 + 1500000000 --> Returns 3000000000::uint4.

I am hoping the removal of many of the implicit casts in
PostgreSQL 8.3 will simplify this task to where this objection can be
removed.

The implicit casts we removed were cross-type-category cases.
If you hope for unsigned types to be considered part of the numeric
category, there's no guidance for you there. In fact, the real nub
of the problem is what type shall be initially assigned to an
integer-looking literal, and how will you get things to behave sanely
if that initial choice wasn't what was desired. We still have some
issues around the fact that "42" isn't considered a smallint. Throwing
in another possible meaning isn't going to help.

My understanding is the SQL standard does not provide support for
unsigned integers, so I am planning on making all casts from unsigned
integers to other data types explicit.

It's really the other direction that would be contentious ...

regards, tom lane

Thanks for your comments! I have already started to play around a bit with
the types and will hopefully have some code ready for review / feedback soon.

- Ryan

#4Gregory Stark
stark@enterprisedb.com
In reply to: Ryan Bradetich (#3)
Re: [RFC] Unsigned integer support.

"Ryan Bradetich" <rbradetich@gmail.com> writes:

My plans for the example above would be:

1. SELECT 1500000000 + 1500000000 --> Throws overflow error.
2. SELECT 1500000000::uint4 + 1500000000 --> Returns 3000000000::uint4.

I think that wouldn't actually work. Postgres's parser immediately assigns a
type to the bare unquoted integral constant so it would end up with a int4
type. Then when it has to pick an operator for uint4+int4 it wouldn't be able
to cast the int4 to uint4 because there would be no implicit cast.

You could make it work by having a uint4+int4 operator which returns uint4 but
then you're going to need a *lot* of operators....

One other idea that's been mentioned before is treating integral constants
like 150000 as type "unknown" like the quoted '150000' constant is. That way
the parser would see uint4+unknown and could pick the uint4 operator. But that
would be a pretty massive semantics change.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!

#5Peter Eisentraut
peter_e@gmx.net
In reply to: Ryan Bradetich (#1)
Re: [RFC] Unsigned integer support.

Am Friday, 25. July 2008 schrieb Ryan Bradetich:

PgFoundry already has an uint project:
� � � � http://pgfoundry.org/projects/uint/

� � Unfortunately this project seems to have not gone anywhere. �Last
activity was late 2006 and there are not any files checked into the
SCM repository.
� � Is it acceptable to hijack this PgFoundry project? �Or should I
start a new project (assuming there is any interest in publishing this
work).

Please hijack the project and develop your code there. Of course you can
always ask for advice here.

#6Ryan Bradetich
rbradetich@gmail.com
In reply to: Gregory Stark (#4)
Re: [RFC] Unsigned integer support.

Hello Greg,

On Fri, Jul 25, 2008 at 3:57 AM, Gregory Stark <stark@enterprisedb.com> wrote:

"Ryan Bradetich" <rbradetich@gmail.com> writes:

My plans for the example above would be:

1. SELECT 1500000000 + 1500000000 --> Throws overflow error.
2. SELECT 1500000000::uint4 + 1500000000 --> Returns 3000000000::uint4.

I think that wouldn't actually work. Postgres's parser immediately assigns a
type to the bare unquoted integral constant so it would end up with a int4
type. Then when it has to pick an operator for uint4+int4 it wouldn't be able
to cast the int4 to uint4 because there would be no implicit cast.

You could make it work by having a uint4+int4 operator which returns uint4 but
then you're going to need a *lot* of operators....

This was my plan. I performed some testing last night to verify that
bare literals
are considered plain integers and would not be implicitly casted to a
different type
(i.e. smallint or bigint). I am seeing three operators for most operations:

1. uint4 -> uint4 = uint4
2. int4 -> uint4 = uint4
3. uint4 -> int4 = uint4

Is there something I need to watch out for when adding this number of
operators (i.e.
performance impact, etc)? Some tests I should be running to measure the impact
of adding these operators?

One other idea that's been mentioned before is treating integral constants
like 150000 as type "unknown" like the quoted '150000' constant is. That way
the parser would see uint4+unknown and could pick the uint4 operator. But that
would be a pretty massive semantics change.

This would require changes to the core PostgreSQL code correct? My
goal for this
type was to have it as an external project on PgFoundry since there
does not appear
to be much demand for it and unsigned types are not specified in the
SQL standard.
If the community decides this support would be better in core
PostgreSQL code, then
I am willing to help with that work, but I will need a significant
amount of guidance :)

With my limited knowledge, the best (and easiest) path seems to take
advantage of
the extensible type system in PostgreSQL and support unsigned integers as a
PgFoundry project.

Thanks for your review and comments!

- Ryan

Show quoted text

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!

#7Ryan Bradetich
rbradetich@gmail.com
In reply to: Peter Eisentraut (#5)
Re: [RFC] Unsigned integer support.

Hello Peter,

On Fri, Jul 25, 2008 at 5:14 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

Am Friday, 25. July 2008 schrieb Ryan Bradetich:

PgFoundry already has an uint project:
http://pgfoundry.org/projects/uint/

Unfortunately this project seems to have not gone anywhere. Last
activity was late 2006 and there are not any files checked into the
SCM repository.
Is it acceptable to hijack this PgFoundry project? Or should I
start a new project (assuming there is any interest in publishing this
work).

Please hijack the project and develop your code there. Of course you can
always ask for advice here.

I will work on getting the PgFoundry project setup.

Thanks!

- Ryan

#8Alvaro Herrera
alvherre@commandprompt.com
In reply to: Gregory Stark (#4)
Re: [RFC] Unsigned integer support.

Gregory Stark escribi�:

One other idea that's been mentioned before is treating integral constants
like 150000 as type "unknown" like the quoted '150000' constant is. That way
the parser would see uint4+unknown and could pick the uint4 operator. But that
would be a pretty massive semantics change.

Hmm, if we do that, how would the system resolve something like this?

select 1000 + 1000

There would be no clue as to what + operator to pick, since both
operands are unknown. This is in fact what happens today with

alvherre=# select '100' + '100';
ERROR: operator is not unique: unknown + unknown at character 14
HINT: Could not choose a best candidate operator. You might need to add explicit type casts.
STATEMENT: select '100' + '100';

I think this is a nonstarter.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#9Gregory Stark
stark@enterprisedb.com
In reply to: Alvaro Herrera (#8)
Re: [RFC] Unsigned integer support.

"Alvaro Herrera" <alvherre@commandprompt.com> writes:

Hmm, if we do that, how would the system resolve something like this?

select 1000 + 1000

Well we have the same problem with 'foo' || 'bar'. The question I think is
whether the solution there scales to having two different fallback types.

There would be no clue as to what + operator to pick, since both
operands are unknown. This is in fact what happens today with

alvherre=# select '100' + '100';
ERROR: operator is not unique: unknown + unknown at character 14
HINT: Could not choose a best candidate operator. You might need to add explicit type casts.
STATEMENT: select '100' + '100';

Perhaps we could kill two birds with one stone...

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!

#10Alvaro Herrera
alvherre@commandprompt.com
In reply to: Gregory Stark (#9)
Re: [RFC] Unsigned integer support.

Gregory Stark escribi�:

"Alvaro Herrera" <alvherre@commandprompt.com> writes:

Hmm, if we do that, how would the system resolve something like this?

select 1000 + 1000

Well we have the same problem with 'foo' || 'bar'. The question I think is
whether the solution there scales to having two different fallback types.

Hmm, right. But you need more than two: consider

alvherre=# select 0.42 + 1;
?column?
----------
1.42
(1 ligne)

However, it would be neat if this behaved the same as

alvherre=# select '0.42' + 1;
ERROR: invalid input syntax for integer: "0.42"
STATEMENT: select '0.42' + 1;

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#11Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Alvaro Herrera (#10)
Re: [RFC] Unsigned integer support.

Alvaro Herrera <alvherre@commandprompt.com> wrote:

consider

alvherre=# select 0.42 + 1;
?column?
----------
1.42
(1 ligne)

However, it would be neat if this behaved the same as

alvherre=# select '0.42' + 1;
ERROR: invalid input syntax for integer: "0.42"
STATEMENT: select '0.42' + 1;

I wouldn't want the former to fail.

I also wouldn't like these to fail:

select 5000000000 + 1;
select 'abc'::text || 'def'::varchar(3);

-Kevin

#12Alvaro Herrera
alvherre@commandprompt.com
In reply to: Kevin Grittner (#11)
Re: [RFC] Unsigned integer support.

Kevin Grittner escribi�:

Alvaro Herrera <alvherre@commandprompt.com> wrote:

consider

alvherre=# select 0.42 + 1;
?column?
----------
1.42
(1 ligne)

However, it would be neat if this behaved the same as

alvherre=# select '0.42' + 1;
ERROR: invalid input syntax for integer: "0.42"
STATEMENT: select '0.42' + 1;

I wouldn't want the former to fail.

Sorry, I was unclear. What I meant was that both 0.42 + 1 and
'0.42' + 1 should be treated the same, and they should both produce a
numeric output.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ryan Bradetich (#6)
Re: [RFC] Unsigned integer support.

"Ryan Bradetich" <rbradetich@gmail.com> writes:

On Fri, Jul 25, 2008 at 3:57 AM, Gregory Stark <stark@enterprisedb.com> wrote:

"Ryan Bradetich" <rbradetich@gmail.com> writes:

My plans for the example above would be:

1. SELECT 1500000000 + 1500000000 --> Throws overflow error.
2. SELECT 1500000000::uint4 + 1500000000 --> Returns 3000000000::uint4.

You could make it work by having a uint4+int4 operator which returns uint4 but
then you're going to need a *lot* of operators....

This was my plan.

Like he says, it's a *lot* of operators, and the point doesn't seem
entirely clear to me. You'll still have overflow cases, they'll just be
in different places.

Consider the idea of not having any uint4-specific arithmetic operators,
but instead providing the following:

* assignment casts from int4 and int8 to uint4
(these throw error if out of range, of course)
* implicit cast from uint4 to int8 (can never fail)

The effect of providing the latter cast would be that any arithmetic
involving a uint4 column would automatically be done in int8. Which
would make it a shade slower than a native implementation, but probably
not enough slower to be a problem --- and you'd avoid having to write
dozens of operators and underlying support functions. Storing into the
uint4 column would work fine with no extra notation because of the
assignment casts.

Moreover, you'd avoid cluttering the system with a pile of cross-type
operators, which we have recently realized are not a good thing, because
they increase the likelihood of "ambiguous operator" problems --- see
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00750.php

For uint8 you'd have to promote to numeric to guarantee no failure
in the implicit cast; which is going to be a rather bigger performance
hit, but I don't really see uint8 as being a type with huge demand.

Now you probably *will* want cross-type comparison operators, if you
are going to support indexing of unsigned columns, so that something
like
uint4col > 42
can be indexed without any casting. But limiting yourself to the six
basic comparison operators certainly makes it a much less bulky project.

regards, tom lane

#14Dann Corbit
DCorbit@connx.com
In reply to: Tom Lane (#13)
Re: [RFC] Unsigned integer support.

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
owner@postgresql.org] On Behalf Of Tom Lane
Sent: Friday, July 25, 2008 12:32 PM
To: Ryan Bradetich
Cc: Gregory Stark; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [RFC] Unsigned integer support.

"Ryan Bradetich" <rbradetich@gmail.com> writes:

On Fri, Jul 25, 2008 at 3:57 AM, Gregory Stark

<stark@enterprisedb.com> wrote:

"Ryan Bradetich" <rbradetich@gmail.com> writes:

My plans for the example above would be:

1. SELECT 1500000000 + 1500000000 --> Throws overflow error.
2. SELECT 1500000000::uint4 + 1500000000 --> Returns

3000000000::uint4.

You could make it work by having a uint4+int4 operator which

returns

uint4 but

then you're going to need a *lot* of operators....

This was my plan.

Like he says, it's a *lot* of operators, and the point doesn't seem
entirely clear to me. You'll still have overflow cases, they'll just
be
in different places.

Consider the idea of not having any uint4-specific arithmetic
operators,
but instead providing the following:

* assignment casts from int4 and int8 to uint4
(these throw error if out of range, of course)
* implicit cast from uint4 to int8 (can never fail)

The effect of providing the latter cast would be that any arithmetic
involving a uint4 column would automatically be done in int8. Which
would make it a shade slower than a native implementation, but

probably

not enough slower to be a problem --- and you'd avoid having to write
dozens of operators and underlying support functions. Storing into

the

uint4 column would work fine with no extra notation because of the
assignment casts.

Moreover, you'd avoid cluttering the system with a pile of cross-type
operators, which we have recently realized are not a good thing,
because
they increase the likelihood of "ambiguous operator" problems --- see
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00750.php

For uint8 you'd have to promote to numeric to guarantee no failure
in the implicit cast; which is going to be a rather bigger performance
hit, but I don't really see uint8 as being a type with huge demand.

Now you probably *will* want cross-type comparison operators, if you
are going to support indexing of unsigned columns, so that something
like
uint4col > 42
can be indexed without any casting. But limiting yourself to the six
basic comparison operators certainly makes it a much less bulky
project.

At the cost of one bit of storage, you have compatible types using
CREATE DOMAIN:

CREATE DOMAIN name [ AS ] data_type
[ DEFAULT expression ]
[ constraint [ ... ] ]

where constraint is:

[ CONSTRAINT constraint_name ]
{ NOT NULL | NULL | CHECK (expression) }

More specifically:

CREATE DOMAIN usmallint AS SMALLINT CHECK(VALUE > 0);
CREATE DOMAIN uinteger AS INTEGER CHECK(VALUE > 0);
CREATE DOMAIN ubigint AS BIGINT CHECK(VALUE > 0);
CREATE DOMAIN unumeric AS NUMERIC CHECK(VALUE > 0);

Seems like a heck of a lot less work to me. Not to mention very easy to
use.

C:\Program Files (x86)\PostgreSQL\8.3\bin>psql -h localhost -U postgres
domaintest
Password for user postgres:
Welcome to psql 8.3.3, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

Warning: Console code page (437) differs from Windows code page (1252)
8-bit characters might not work correctly. See psql reference
page "Notes for Windows users" for details.

domaintest=# CREATE DOMAIN usmallint AS SMALLINT CHECK(VALUE > 0);
CREATE DOMAIN
domaintest=# CREATE DOMAIN uinteger AS INTEGER CHECK(VALUE > 0);
CREATE DOMAIN
domaintest=# CREATE DOMAIN ubigint AS BIGINT CHECK(VALUE > 0);
CREATE DOMAIN
domaintest=# CREATE DOMAIN unumeric AS NUMERIC CHECK(VALUE > 0);
CREATE DOMAIN
domaintest=#
domaintest=# create table integer_types (
domaintest(# usCol usmallint,
domaintest(# sCol smallint,
domaintest(# uiCol uinteger,
domaintest(# iCol integer,
domaintest(# ubCol ubigint,
domaintest(# bCol bigint,
domaintest(# unCol unumeric,
domaintest(# nCol numeric
domaintest(# );
CREATE TABLE
domaintest=# create index i1 on integer_types(usCol);
CREATE INDEX
domaintest=# create index i2 on integer_types(sCol);
CREATE INDEX
domaintest=# create index i3 on integer_types(uiCol);
CREATE INDEX
domaintest=# create index i4 on integer_types(iCol);
CREATE INDEX
domaintest=# create index i5 on integer_types(ubCol);
CREATE INDEX
domaintest=# create index i6 on integer_types(bCol);
CREATE INDEX
domaintest=# create index i7 on integer_types(unCol);
CREATE INDEX
domaintest=# create index i8 on integer_types(nCol);
CREATE INDEX
domaintest=# insert into integer_types values(1,1,1,1,1,1,1,1);
INSERT 0 1
domaintest=# select * from integer_types;
uscol | scol | uicol | icol | ubcol | bcol | uncol | ncol
-------+------+-------+------+-------+------+-------+------
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
(1 row)

domaintest=# insert into integer_types (usCol) values (-1);
ERROR: value for domain usmallint violates check constraint
"usmallint_check"
domaintest=#

#15Andrew Dunstan
andrew@dunslane.net
In reply to: Dann Corbit (#14)
Re: [RFC] Unsigned integer support.

Dann Corbit wrote:

CREATE DOMAIN usmallint AS SMALLINT CHECK(VALUE > 0);
CREATE DOMAIN uinteger AS INTEGER CHECK(VALUE > 0);
CREATE DOMAIN ubigint AS BIGINT CHECK(VALUE > 0);
CREATE DOMAIN unumeric AS NUMERIC CHECK(VALUE > 0);

s/>/>=/g

cheers

andrew

#16Dann Corbit
DCorbit@connx.com
In reply to: Andrew Dunstan (#15)
Re: [RFC] Unsigned integer support.

-----Original Message-----
From: Andrew Dunstan [mailto:andrew@dunslane.net]
Sent: Friday, July 25, 2008 1:11 PM
To: Dann Corbit
Cc: Tom Lane; Ryan Bradetich; Gregory Stark; pgsql-
hackers@postgresql.org
Subject: Re: [HACKERS] [RFC] Unsigned integer support.

Dann Corbit wrote:

CREATE DOMAIN usmallint AS SMALLINT CHECK(VALUE > 0);
CREATE DOMAIN uinteger AS INTEGER CHECK(VALUE > 0);
CREATE DOMAIN ubigint AS BIGINT CHECK(VALUE > 0);
CREATE DOMAIN unumeric AS NUMERIC CHECK(VALUE > 0);

s/>/>=/g

I turned off the default "option" to remove extra line breaks.
Future posts should not be quite as even and bletcherous.
God willing, and the crick don't rise.

#17Andrew Dunstan
andrew@dunslane.net
In reply to: Dann Corbit (#16)
Re: [RFC] Unsigned integer support.

Dann Corbit wrote:

-----Original Message-----
From: Andrew Dunstan [mailto:andrew@dunslane.net]
Sent: Friday, July 25, 2008 1:11 PM
To: Dann Corbit
Cc: Tom Lane; Ryan Bradetich; Gregory Stark; pgsql-
hackers@postgresql.org
Subject: Re: [HACKERS] [RFC] Unsigned integer support.

Dann Corbit wrote:

CREATE DOMAIN usmallint AS SMALLINT CHECK(VALUE > 0);
CREATE DOMAIN uinteger AS INTEGER CHECK(VALUE > 0);
CREATE DOMAIN ubigint AS BIGINT CHECK(VALUE > 0);
CREATE DOMAIN unumeric AS NUMERIC CHECK(VALUE > 0);

s/>/>=/g

I turned off the default "option" to remove extra line breaks.
Future posts should not be quite as even and bletcherous.
God willing, and the crick don't rise.

I suspect you're missing my point, namely that 0 should be an allowed
value for unsigned types.

cheers

andrew

#18Dann Corbit
DCorbit@connx.com
In reply to: Andrew Dunstan (#17)
Re: [RFC] Unsigned integer support.

-----Original Message-----
From: Andrew Dunstan [mailto:andrew@dunslane.net]
Sent: Friday, July 25, 2008 1:28 PM
To: Dann Corbit
Cc: Tom Lane; Ryan Bradetich; Gregory Stark; pgsql-
hackers@postgresql.org
Subject: Re: [HACKERS] [RFC] Unsigned integer support.

Dann Corbit wrote:

-----Original Message-----
From: Andrew Dunstan [mailto:andrew@dunslane.net]
Sent: Friday, July 25, 2008 1:11 PM
To: Dann Corbit
Cc: Tom Lane; Ryan Bradetich; Gregory Stark; pgsql-
hackers@postgresql.org
Subject: Re: [HACKERS] [RFC] Unsigned integer support.

Dann Corbit wrote:

CREATE DOMAIN usmallint AS SMALLINT CHECK(VALUE > 0);
CREATE DOMAIN uinteger AS INTEGER CHECK(VALUE > 0);
CREATE DOMAIN ubigint AS BIGINT CHECK(VALUE > 0);
CREATE DOMAIN unumeric AS NUMERIC CHECK(VALUE > 0);

s/>/>=/g

I turned off the default "option" to remove extra line breaks.
Future posts should not be quite as even and bletcherous.
God willing, and the crick don't rise.

I suspect you're missing my point, namely that 0 should be an allowed
value for unsigned types.

Quite right. The domains I created were really the 'natural numbers'
rather than unsigned types.

#19Ryan Bradetich
rbradetich@gmail.com
In reply to: Tom Lane (#13)
Re: [RFC] Unsigned integer support.

Tom,

On Fri, Jul 25, 2008 at 12:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Consider the idea of not having any uint4-specific arithmetic operators,
but instead providing the following:

* assignment casts from int4 and int8 to uint4
(these throw error if out of range, of course)
* implicit cast from uint4 to int8 (can never fail)

The effect of providing the latter cast would be that any arithmetic
involving a uint4 column would automatically be done in int8. Which
would make it a shade slower than a native implementation, but probably
not enough slower to be a problem --- and you'd avoid having to write
dozens of operators and underlying support functions. Storing into the
uint4 column would work fine with no extra notation because of the
assignment casts.

This is an interesting idea that I will test out tonight. I did have
the following
concern looking through src/backend/utils/adt/int8.c: There is code that is
optionally compiled based on the INT64_IS_BUSTED pre-processor define.
Is this pre-processor define something I should worry about for portability
with this plan?

After I get uint types implemented, for fun I might try some benchmarks
to see if I can detect the int8 overhead on a 32-bit system.

Moreover, you'd avoid cluttering the system with a pile of cross-type
operators, which we have recently realized are not a good thing, because
they increase the likelihood of "ambiguous operator" problems --- see
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00750.php

Good to know. Thanks for the link.

For uint8 you'd have to promote to numeric to guarantee no failure
in the implicit cast; which is going to be a rather bigger performance
hit, but I don't really see uint8 as being a type with huge demand.

Hopefully I will not need the uint8 type. Right now for a project I am
looking at I need the uint2 and uint4 types. uint8 support can come
later if it is needed or requested.

Now you probably *will* want cross-type comparison operators, if you
are going to support indexing of unsigned columns, so that something
like
uint4col > 42
can be indexed without any casting. But limiting yourself to the six
basic comparison operators certainly makes it a much less bulky project.

This sounds excellent! Hopefully by using these operators I will be able to
avoid most of the casting to int8 for my use, while still providing the
complete functionality for this type.

Thanks again for your review and feedback!

- Ryan

#20Ryan Bradetich
rbradetich@gmail.com
In reply to: Dann Corbit (#14)
Re: [RFC] Unsigned integer support.

Hello Dann,

On Fri, Jul 25, 2008 at 1:06 PM, Dann Corbit <DCorbit@connx.com> wrote:

At the cost of one bit of storage, you have compatible types using

Thanks for your review and feedback! Unfortunately, I do need the full range
of the unsigned types for the project I am looking at. The reason I started
working on these types is because it seemed wasteful to use the next size
larger signed integer for the storage type of the unsigned integer.

Thanks for the suggestion!

- Ryan

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ryan Bradetich (#19)
Re: [RFC] Unsigned integer support.

"Ryan Bradetich" <rbradetich@gmail.com> writes:

... I did have the following
concern looking through src/backend/utils/adt/int8.c: There is code that is
optionally compiled based on the INT64_IS_BUSTED pre-processor define.
Is this pre-processor define something I should worry about for portability
with this plan?

I wouldn't worry, really ;-). Five or more years ago, it seemed
important for PG to work on machines without functional int64 support,
but there is little if any evidence that anyone is using current PG
releases on such platforms. I might well be the last active PG hacker
who gives a damn about that case at all, and even I long ago stopped
expecting anything beyond core functionality to work on such a machine.
Since your proposed unsigned types certainly aren't core functionality,
I see no reason that they should need to work on INT64_IS_BUSTED
platforms.

After I get uint types implemented, for fun I might try some benchmarks
to see if I can detect the int8 overhead on a 32-bit system.

Right, you need to check that before drinking the kool-aid ...

regards, tom lane

In reply to: Ryan Bradetich (#20)
Re: [RFC] Unsigned integer support.

Hi Ryan,

I agree, I have had applications use uint types to avoid using
a larger data type. I have actually had to patch an application
developed for MySQL uint8 to signed int8 on PostgreSQL. In that
case, the only operations that were performed where assignment
and lookup. If we need to use the numeric type for calculations,
that would be okay as long as the data is actually stored as
8-bytes, not numeric. It would certainly increase the ease of
moving an application from MySQL to PostgreSQL.

Cheers,
Ken

Show quoted text

On Fri, Jul 25, 2008 at 08:50:02PM -0700, Ryan Bradetich wrote:

Hello Dann,

On Fri, Jul 25, 2008 at 1:06 PM, Dann Corbit <DCorbit@connx.com> wrote:

At the cost of one bit of storage, you have compatible types using

Thanks for your review and feedback! Unfortunately, I do need the full range
of the unsigned types for the project I am looking at. The reason I started
working on these types is because it seemed wasteful to use the next size
larger signed integer for the storage type of the unsigned integer.

Thanks for the suggestion!

- Ryan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23Decibel!
decibel@decibel.org
In reply to: Alvaro Herrera (#10)
1 attachment(s)
Re: [RFC] Unsigned integer support.

On Jul 25, 2008, at 11:44 AM, Alvaro Herrera wrote:

However, it would be neat if this behaved the same as

alvherre=# select '0.42' + 1;
ERROR: invalid input syntax for integer: "0.42"
STATEMENT: select '0.42' + 1;

Do we really want to be making it easier for people to wrap numbers
in quotes?
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Attachments:

smime.p7sapplication/pkcs7-signature; name=smime.p7sDownload
#24Gregory Stark
stark@enterprisedb.com
In reply to: Decibel! (#23)
Re: [RFC] Unsigned integer support.

"Decibel!" <decibel@decibel.org> writes:

On Jul 25, 2008, at 11:44 AM, Alvaro Herrera wrote:

However, it would be neat if this behaved the same as

alvherre=# select '0.42' + 1;
ERROR: invalid input syntax for integer: "0.42"
STATEMENT: select '0.42' + 1;

Do we really want to be making it easier for people to wrap numbers in quotes?

Currently wrapping numbers in quotes is really the way Postgres expects to get
them. Quoted constants in Postgres are just "unknown" type which are parsed
according to context. Unquoted constants in Postgres are assigned a type by
the parser based on what they look like and then have to be cast to something
else if that turns out to be the wrong data type.

I think people from other languages expect quoted constants to be strings and
they often go out of their way to avoid them (causing themselves headaches
when they then need to deal with casting rules).

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#24)
Re: [RFC] Unsigned integer support.

Gregory Stark <stark@enterprisedb.com> writes:

"Decibel!" <decibel@decibel.org> writes:

Do we really want to be making it easier for people to wrap numbers in quotes?

Currently wrapping numbers in quotes is really the way Postgres expects to get
them.

Really?

regression=# select '2' + '2';
ERROR: operator is not unique: unknown + unknown
LINE 1: select '2' + '2';
^
HINT: Could not choose a best candidate operator. You might need to add explicit type casts.

We've worked really hard to get to a point where interpretation of
literals mostly behaves unsurprisingly. I think most people would
find it surprising to think that quoting a number is the preferred
way to represent it.

regards, tom lane