Unsigned integer types

Started by Maciej Gajewskiover 12 years ago16 messages
#1Maciej Gajewski
maciej.gajewski0@gmail.com

Hi all

I know this topic was discussed before, but there doesn't seem to be
any conclusion.

The lack of unsigned integer types is one of the biggest sources of
grief in my daily work with pgsql.

Before I go and start hacking, I'd like to discuss few points:

1. Is there a strong objection against merging this kind of patch?

I can provide numerous reasons why using bigger int or numeric type
just doesn't cut.

2. How/if should the behaviour of numeric literals change?

The minimalistic solution is: it shouldn't, literals should be assumed
signed by default. More complex solution could involve using C-style
suffix ('123456u').

3. How/if should comparing singed and unsigned types work?

IMO they shouldn't be allowed and explicit cast should be required.

Thanks in advance!

Maciek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Albe Laurenz
laurenz.albe@wien.gv.at
In reply to: Maciej Gajewski (#1)
Re: Unsigned integer types

Maciej Gajewski wrote:

I know this topic was discussed before, but there doesn't seem to be
any conclusion.

The lack of unsigned integer types is one of the biggest sources of
grief in my daily work with pgsql.

Before I go and start hacking, I'd like to discuss few points:

1. Is there a strong objection against merging this kind of patch?

I can provide numerous reasons why using bigger int or numeric type
just doesn't cut.

It would be interesting to know these reasons.

Yours,
Laurenz Albe

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Maciej Gajewski (#1)
Re: Unsigned integer types

Maciej Gajewski <maciej.gajewski0@gmail.com> writes:

The lack of unsigned integer types is one of the biggest sources of
grief in my daily work with pgsql.
Before I go and start hacking, I'd like to discuss few points:
1. Is there a strong objection against merging this kind of patch?

Basically, there is zero chance this will happen unless you can find
a way of fitting them into the numeric promotion hierarchy that doesn't
break a lot of existing applications. We have looked at this more than
once, if memory serves, and failed to come up with a workable design
that didn't seem to violate the POLA.

2. How/if should the behaviour of numeric literals change?

The minimalistic solution is: it shouldn't, literals should be assumed
signed by default. More complex solution could involve using C-style
suffix ('123456u').

Well, if you don't do that, there is no need for you to merge anything:
you can build unsigned types as an external extension if they aren't
affecting the core parser's behavior. As long as it's external, you
don't need to satisfy anybody else's idea of what reasonable behavior
is ...

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Maciej Gajewski
maciej.gajewski0@gmail.com
In reply to: Tom Lane (#3)
Re: Unsigned integer types

The reasons are: performance, storage and frustration.

I think the frustration comes from the fact that unsigned integers are
universally available, except in PostgreSQL. I work with a really
complex system, with many moving parts, and Postgres really is one of
the components that causes the least trouble (compared to other
opens-source and closed-sourced systems, which I shall leave unnamed),
except for the unsigned integers.

Let me give you few examples:

1. SMALLINT

Probably the most popular unsigned short int on the planet: IP port
number. I had to store some network traffic data in DB; I
instinctively started to prototyping it like this:

CREATE TABLE packets (addr INET, port SMALLINT, ... );

Of course it failed quickly and I had to bump the size to INTEGER. No
real harm here, as the 2 bytes will probably go into some padding
anyway, but somehow it feels wrong.

2. INTEGER

I had to store a record with several uint32. I had to store an awful
lot of them; hundreds GB of data per day. Roughly half of the record
consists of uint32 fields.
Increasing the data type to bigint would mean that I could store 3
instead of 4 days worth of data on available storage.
Continuing with int4 meant that I would have to deal with the data in
special way when in enters and leaves the DB. It's easy in C: just
cast uint32_t to int32_t. But python code requires more complex
changes. And the web backend too...

It's suffering either way!

Just imagine the conversation I had to have with my boss: "Either
we'll increase budged for storage, or we need to touch every bit of
the system".

3 .BIGINT

There is no escape from bigint. Numeric (or TEXT!) is the only thing
that can keep uint64, but when you have 10^9 and more records, and you
need to do some arithmetic on it, numeric it's just too slow.

We use uint64 all across our system as unique event identifier. It
works fine, it's fast, and it's very convenient. Passing uint64
around, storing it, looking it up. We use it everywhere, including UI
and log files. So once I decided to use BIGINT to store it, I had to
guard all the inputs and outputs and make sure it is handled
correctly. Or so I though.

It turned out that some guys from different department are parsing
some logs with perl parser and they store it in DB. They choose to
store the uint64 id as TEXT. They probably tried BIGINT and failed and
decided that - since they have low volume and they are not doing any
arithmetics - to store it as TEXT.

And now someone came up with an idea to join one table with another,
bigint with text. I did it. Initially I wrote function that converted
the text to numeric, then rotated it around 2^64 if necessary. It was
too slow. Too slow for something that should be a simple
reinterpretation of data.

Eventually I ended up writing a C function, that first scanf(
"%llu")'d the text into uint64_t, and then PG_RETURN_INT64-ed the
uint64_t value. Works fast, but operations hate for increasing the
complexity of DB deployment.

---

I know some cynical people that love this kind of problems, they feel
that the constant struggle is what keeps them employed :) But I'm
ready to use my private time to solve it once and for all.

I'm afraid that implementing uints as and extension would introduce
some performance penalty (I may be wrong). I'm also afraid that with
the extension I'd be left on my own maintaining it forever. While if
this could go into the core product, it would live forever.

As for the POLA violation: programmers experienced with statically
typed languages shouldn't have problems dealing with all the issues
surrounding signed/unsigned integers (like the ones described here:
http://c-faq.com/expr/preservingrules.html). Others don't need to use
them.

Maciek

On 27 May 2013 16:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Maciej Gajewski <maciej.gajewski0@gmail.com> writes:

The lack of unsigned integer types is one of the biggest sources of
grief in my daily work with pgsql.
Before I go and start hacking, I'd like to discuss few points:
1. Is there a strong objection against merging this kind of patch?

Basically, there is zero chance this will happen unless you can find
a way of fitting them into the numeric promotion hierarchy that doesn't
break a lot of existing applications. We have looked at this more than
once, if memory serves, and failed to come up with a workable design
that didn't seem to violate the POLA.

2. How/if should the behaviour of numeric literals change?

The minimalistic solution is: it shouldn't, literals should be assumed
signed by default. More complex solution could involve using C-style
suffix ('123456u').

Well, if you don't do that, there is no need for you to merge anything:
you can build unsigned types as an external extension if they aren't
affecting the core parser's behavior. As long as it's external, you
don't need to satisfy anybody else's idea of what reasonable behavior
is ...

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Maciej Gajewski (#4)
Re: Unsigned integer types

On 05/28/2013 05:17 AM, Maciej Gajewski wrote:

I'm afraid that implementing uints as and extension would introduce
some performance penalty (I may be wrong).

You are.

I'm also afraid that with
the extension I'd be left on my own maintaining it forever. While if
this could go into the core product, it would live forever.

This is an argument against ever doing anything as an extension.

You have not at all addressed the real problem with doing what you are
asking for, the one that Tom Lane stated:

Basically, there is zero chance this will happen unless you can find
a way of fitting them into the numeric promotion hierarchy that doesn't
break a lot of existing applications. We have looked at this more than
once, if memory serves, and failed to come up with a workable design
that didn't seem to violate the POLA.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Bruce Momjian
bruce@momjian.us
In reply to: Maciej Gajewski (#4)
Re: Unsigned integer types

On Tue, May 28, 2013 at 11:17:42AM +0200, Maciej Gajewski wrote:

2. INTEGER

I had to store a record with several uint32. I had to store an awful
lot of them; hundreds GB of data per day. Roughly half of the record
consists of uint32 fields.
Increasing the data type to bigint would mean that I could store 3
instead of 4 days worth of data on available storage.
Continuing with int4 meant that I would have to deal with the data in
special way when in enters and leaves the DB. It's easy in C: just
cast uint32_t to int32_t. But python code requires more complex
changes. And the web backend too...

It's suffering either way!

Just imagine the conversation I had to have with my boss: "Either
we'll increase budged for storage, or we need to touch every bit of
the system".

Did you try 'oid' as an unsigned int4?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Jim Nasby
jim@nasby.net
In reply to: Bruce Momjian (#6)
Re: Unsigned integer types

On 5/28/13 4:07 PM, Bruce Momjian wrote:

On Tue, May 28, 2013 at 11:17:42AM +0200, Maciej Gajewski wrote:

2. INTEGER

I had to store a record with several uint32. I had to store an awful
lot of them; hundreds GB of data per day. Roughly half of the record
consists of uint32 fields.
Increasing the data type to bigint would mean that I could store 3
instead of 4 days worth of data on available storage.
Continuing with int4 meant that I would have to deal with the data in
special way when in enters and leaves the DB. It's easy in C: just
cast uint32_t to int32_t. But python code requires more complex
changes. And the web backend too...

It's suffering either way!

Just imagine the conversation I had to have with my boss: "Either
we'll increase budged for storage, or we need to touch every bit of
the system".

Did you try 'oid' as an unsigned int4?

Using an internal catalog type for user data seems like a horrible idea to me...

I'll also add that Maciej hasn't explained why these types couldn't be an extension (in fact, I'm pretty sure there's already code for this out there, though possibly not utilizing the extension framework).

If you don't need implicit casting it should actually be pretty easy to do this externally, and I don't think maintenance would be an issue (it's not like uint's change...).
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8David Johnston
polobo@yahoo.com
In reply to: Maciej Gajewski (#4)
Re: Unsigned integer types

Maciej Gajewski wrote

I'm also afraid that with
the extension I'd be left on my own maintaining it forever. While if
this could go into the core product, it would live forever.

Clarification from the gallery: are we talking an extension or a custom
PostgreSQL build/fork?

If it is an extension the stick it up on GitHub and let whomever finds it
valuable help contribute to keeping it relevant.

No use letting perfection stand in the way of usability. If the current
solutions are too slow then exploring the extension aspect - even if it
falls short - is worthwhile. At minimum you learn from the experience and
maybe someone else (or even yourself) can build on that foundation.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Unsigned-integer-types-tp5756994p5757234.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Bruce Momjian
bruce@momjian.us
In reply to: Jim Nasby (#7)
Re: Unsigned integer types

On Tue, May 28, 2013 at 05:57:41PM -0500, Jim Nasby wrote:

On 5/28/13 4:07 PM, Bruce Momjian wrote:

On Tue, May 28, 2013 at 11:17:42AM +0200, Maciej Gajewski wrote:

2. INTEGER

I had to store a record with several uint32. I had to store an awful
lot of them; hundreds GB of data per day. Roughly half of the record
consists of uint32 fields.
Increasing the data type to bigint would mean that I could store 3
instead of 4 days worth of data on available storage.
Continuing with int4 meant that I would have to deal with the data in
special way when in enters and leaves the DB. It's easy in C: just
cast uint32_t to int32_t. But python code requires more complex
changes. And the web backend too...

It's suffering either way!

Just imagine the conversation I had to have with my boss: "Either
we'll increase budged for storage, or we need to touch every bit of
the system".

Did you try 'oid' as an unsigned int4?

Using an internal catalog type for user data seems like a horrible idea to me...

Uh, not sure if we can say oid is only an internal catalog type. It is
certainly used for storing large object references.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#9)
Re: Unsigned integer types

On 05/28/2013 07:00 PM, Bruce Momjian wrote:

On Tue, May 28, 2013 at 05:57:41PM -0500, Jim Nasby wrote:

Did you try 'oid' as an unsigned int4?

Using an internal catalog type for user data seems like a horrible idea to me...

Uh, not sure if we can say oid is only an internal catalog type. It is
certainly used for storing large object references.

pg_largeobject has oids. I don't thing the fact that we use oids to
store references to pg_largeobject should blind us to the fact that oid
should be an opaque type. Using them as substitute unsigned ints seems
like a horrible idea to me too.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Maciej Gajewski
maciej.gajewski0@gmail.com
In reply to: Andrew Dunstan (#10)
Re: Unsigned integer types

I will implement it as an extension then.

My feeling is that PostgreSQL extensions tend to fall into obscurity.
As an ordinary user it took me really long time to find out that
interesting features are available in form of extensions; they are
certainly under-marketed. But this is a topic for separate discussion.

You have not at all addressed the real problem with doing what you are asking for, the one that Tom Lane stated:

Basically, there is zero chance this will happen unless you can find
a way of fitting them into the numeric promotion hierarchy that doesn't
break a lot of existing applications. We have looked at this more than
once, if memory serves, and failed to come up with a workable design
that didn't seem to violate the POLA.

I'm sorry, I thought my proposal was clear.

I propose to not integrate the unsigned types into existing promotion
hierarchy, and behave just like gcc would with -Werror: require
explicit cast. Between them, the unsigned types would be automatically
converted up (uint2 > uint4 > uint8).

Maciek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Robert Haas
robertmhaas@gmail.com
In reply to: Maciej Gajewski (#11)
Re: Unsigned integer types

On Wed, May 29, 2013 at 4:33 AM, Maciej Gajewski
<maciej.gajewski0@gmail.com> wrote:

I propose to not integrate the unsigned types into existing promotion
hierarchy, and behave just like gcc would with -Werror: require
explicit cast. Between them, the unsigned types would be automatically
converted up (uint2 > uint4 > uint8).

Seems pretty sensible to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Hannu Krosing
hannu@2ndQuadrant.com
In reply to: Maciej Gajewski (#11)
Re: Unsigned integer types

On 05/29/2013 11:33 AM, Maciej Gajewski wrote:

I will implement it as an extension then.

My feeling is that PostgreSQL extensions tend to fall into obscurity.
As an ordinary user it took me really long time to find out that
interesting features are available in form of extensions; they are
certainly under-marketed. But this is a topic for separate discussion.

You have not at all addressed the real problem with doing what you are asking for, the one that Tom Lane stated:

Basically, there is zero chance this will happen unless you can find
a way of fitting them into the numeric promotion hierarchy that doesn't
break a lot of existing applications. We have looked at this more than
once, if memory serves, and failed to come up with a workable design
that didn't seem to violate the POLA.

I'm sorry, I thought my proposal was clear.

I propose to not integrate the unsigned types into existing promotion
hierarchy, and behave just like gcc would with -Werror: require
explicit cast. Between them, the unsigned types would be automatically
converted up (uint2 > uint4 > uint8).

+1

(And we could even put some possible sample cast sets in contrib for those
who need automatic casts and are willing to do the required debugging )

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic O�

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Maciej Gajewski (#11)
Re: Unsigned integer types

I agree that extensions are undermarketed. Although pgxn is a good step,
I could not find it from "postgresql.org":-(

I propose to not integrate the unsigned types into existing promotion
hierarchy, and behave just like gcc would with -Werror: require
explicit cast. Between them, the unsigned types would be automatically
converted up (uint2 > uint4 > uint8).

If you do it, having uint1 (1 byte) would be nice as well.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15David E. Wheeler
david@justatheory.com
In reply to: Fabien COELHO (#14)
Re: Unsigned integer types

On May 29, 2013, at 10:48 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

If you do it, having uint1 (1 byte) would be nice as well.

There is a signed 1byte int on PGXN, FWIW:

http://pgxn.org/extension/tinyint

Best,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Fabien COELHO
coelho@cri.ensmp.fr
In reply to: David E. Wheeler (#15)
Re: Unsigned integer types

If you do it, having uint1 (1 byte) would be nice as well.

There is a signed 1byte int on PGXN, FWIW:
http://pgxn.org/extension/tinyint

That's good, thanks for the pointer!

However, it is a signed tinyint (-128..127 range), not an unsigned one.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers