[RFC] new digest datatypes, or generic fixed-len hex types?

Started by Alvaro Herrera over 16 years ago, 10 messages
#1 Alvaro Herrera
alvherre@commandprompt.com

Hi,

We've developed some code to implement fixed-length datatypes for well
known digest function output (MD5, SHA1 and the various SHA2 types).
These types have minimal overhead and are quite complete, including
btree and hash opclasses.

We're wondering about proposing them for inclusion in pgcrypto. I asked
Marko Kreen but he is not sure about it; according to him it would be
better to have general fixed-length hex types. (I guess it would be
possible to implement the digest types as domains over those.)

So basically we have sha1, sha-256, sha-512 etc on one hand, and hex8,
hex16, hex32 on the other hand. In both cases there is a single body of
code that is compiled with a macro definition that provides the data
length for every separate case. (Actually, in the digest code we
refactored the common routines so that each type has a light wrapper
calling a function that works on any length; the same could be done to
the fixed-len hex code, which is pretty grotty at the moment.)
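A minimal sketch of the "single body of code plus light per-length wrappers" layout described above. It is written as standalone C rather than as PostgreSQL fmgr functions: the PG_FUNCTION_ARGS glue, the input/output routines and the opclass SQL are all omitted. The names fixed_cmp, fixed_eq and DEFINE_FIXED_TYPE are hypothetical and are not taken from the patch under discussion.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length-generic routines shared by every fixed-length type. */
static int fixed_cmp(const unsigned char *a, const unsigned char *b, size_t len)
{
    /* <0, 0 or >0: the sign convention a btree comparison function uses */
    return memcmp(a, b, len);
}

static bool fixed_eq(const unsigned char *a, const unsigned char *b, size_t len)
{
    return memcmp(a, b, len) == 0;
}

/*
 * Hypothetical macro: stamps out a struct holding exactly nbytes of data
 * plus thin wrappers that pass the length on to the generic routines.
 */
#define DEFINE_FIXED_TYPE(name, nbytes)                                  \
    typedef struct { unsigned char data[nbytes]; } name##_t;             \
    static inline int name##_cmp(const name##_t *a, const name##_t *b)   \
    { return fixed_cmp(a->data, b->data, (nbytes)); }                    \
    static inline bool name##_eq(const name##_t *a, const name##_t *b)   \
    { return fixed_eq(a->data, b->data, (nbytes)); }

DEFINE_FIXED_TYPE(md5, 16)
DEFINE_FIXED_TYPE(sha1, 20)
DEFINE_FIXED_TYPE(sha256, 32)
```

With this shape, each additional digest length costs one macro line. The shared routines hold the actual logic, which is the kind of refactoring the message describes for the digest code.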

Of these two choices, which one is likely to have better acceptance
around here?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: [RFC] new digest datatypes, or generic fixed-len hex types?

Alvaro Herrera <alvherre@commandprompt.com> writes:

> We've developed some code to implement fixed-length datatypes for well
> known digest function output (MD5, SHA1 and the various SHA2 types).
> These types have minimal overhead and are quite complete, including
> btree and hash opclasses.
>
> We're wondering about proposing them for inclusion in pgcrypto.

Wasn't this proposed and rejected before? (Or more to the point,
why'd you bother? The advantage over bytea seems negligible.)

regards, tom lane

#3 Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#2)
Re: [RFC] new digest datatypes, or generic fixed-len hex types?

On Mon, Jul 27, 2009 at 10:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Alvaro Herrera <alvherre@commandprompt.com> writes:
>
>> We've developed some code to implement fixed-length datatypes for well
>> known digest function output (MD5, SHA1 and the various SHA2 types).
>> These types have minimal overhead and are quite complete, including
>> btree and hash opclasses.
>>
>> We're wondering about proposing them for inclusion in pgcrypto.
>
> Wasn't this proposed and rejected before?  (Or more to the point,
> why'd you bother?  The advantage over bytea seems negligible.)

well, one nice thing about the fixed length types is that you can
keep your table from needing a toast table when you have a bytea in
it.

merlin

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#3)
Re: [RFC] new digest datatypes, or generic fixed-len hex types?

Merlin Moncure <mmoncure@gmail.com> writes:

> On Mon, Jul 27, 2009 at 10:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Wasn't this proposed and rejected before?  (Or more to the point,
>> why'd you bother?  The advantage over bytea seems negligible.)
>
> well, one nice thing about the fixed length types is that you can
> keep your table from needing a toast table when you have a bytea in
> it.

If you don't actually use the toast table, it doesn't cost anything very
noticeable ...

regards, tom lane

#5 Andrew Dunstan
andrew@dunslane.net
In reply to: Merlin Moncure (#3)
Re: [RFC] new digest datatypes, or generic fixed-len hex types?

Merlin Moncure wrote:

> On Mon, Jul 27, 2009 at 10:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>
>>> We've developed some code to implement fixed-length datatypes for well
>>> known digest function output (MD5, SHA1 and the various SHA2 types).
>>> These types have minimal overhead and are quite complete, including
>>> btree and hash opclasses.
>>>
>>> We're wondering about proposing them for inclusion in pgcrypto.
>>
>> Wasn't this proposed and rejected before? (Or more to the point,
>> why'd you bother? The advantage over bytea seems negligible.)
>
> well, one nice thing about the fixed length types is that you can
> keep your table from needing a toast table when you have a bytea in
> it.

Can't you just set storage on the column to MAIN to stop it being stored
in a toast table?

cheers

andrew

#6 Merlin Moncure
mmoncure@gmail.com
In reply to: Andrew Dunstan (#5)
Re: [RFC] new digest datatypes, or generic fixed-len hex types?

On Mon, Jul 27, 2009 at 12:02 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

> Merlin Moncure wrote:
>
>> On Mon, Jul 27, 2009 at 10:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>>
>>>> We've developed some code to implement fixed-length datatypes for well
>>>> known digest function output (MD5, SHA1 and the various SHA2 types).
>>>> These types have minimal overhead and are quite complete, including
>>>> btree and hash opclasses.
>>>>
>>>> We're wondering about proposing them for inclusion in pgcrypto.
>>>
>>> Wasn't this proposed and rejected before?  (Or more to the point,
>>> why'd you bother?  The advantage over bytea seems negligible.)
>>
>> well, one nice thing about the fixed length types is that you can
>> keep your table from needing a toast table when you have a bytea in
>> it.
>
> Can't you just set storage on the column to MAIN to stop it being stored in
> a toast table?

of course.

hm. would the input/output functions for the fixed length types be
faster? what is the advantage of the proposal?

merlin

#7 Peter Eisentraut
peter_e@gmx.net
In reply to: Alvaro Herrera (#1)
Re: [RFC] new digest datatypes, or generic fixed-len hex types?

On Monday 27 July 2009 14:50:30 Alvaro Herrera wrote:

> We've developed some code to implement fixed-length datatypes for well
> known digest function output (MD5, SHA1 and the various SHA2 types).
> These types have minimal overhead and are quite complete, including
> btree and hash opclasses.
>
> We're wondering about proposing them for inclusion in pgcrypto. I asked
> Marko Kreen but he is not sure about it; according to him it would be
> better to have general fixed-length hex types. (I guess it would be
> possible to implement the digest types as domains over those.)

I think equipping bytea with a length restriction would be a very natural,
simple, and useful addition. If we ever want to move the bytea type closer to
the SQL standard blob type, this will need to happen anyway.

The case for separate fixed-length data types seems very dubious, unless you
can show very impressive performance numbers. For one thing, they would make
the whole type system more complicated, or in the alternative, would have
little function and operator support.

#8 Merlin Moncure
mmoncure@gmail.com
In reply to: Peter Eisentraut (#7)
Re: [RFC] new digest datatypes, or generic fixed-len hex types?

On Tue, Jul 28, 2009 at 7:15 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

> On Monday 27 July 2009 14:50:30 Alvaro Herrera wrote:
>
>> We've developed some code to implement fixed-length datatypes for well
>> known digest function output (MD5, SHA1 and the various SHA2 types).
>> These types have minimal overhead and are quite complete, including
>> btree and hash opclasses.
>
> I think equipping bytea with a length restriction would be a very natural,
> simple, and useful addition.  If we ever want to move the bytea type closer to
> the SQL standard blob type, this will need to happen anyway.

+1

merlin

#9 decibel
decibel@decibel.org
In reply to: Peter Eisentraut (#7)
Re: [RFC] new digest datatypes, or generic fixed-len hex types?

On Jul 28, 2009, at 6:15 AM, Peter Eisentraut wrote:

> On Monday 27 July 2009 14:50:30 Alvaro Herrera wrote:
>
>> We've developed some code to implement fixed-length datatypes for well
>> known digest function output (MD5, SHA1 and the various SHA2 types).
>> These types have minimal overhead and are quite complete, including
>> btree and hash opclasses.
>>
>> We're wondering about proposing them for inclusion in pgcrypto. I asked
>> Marko Kreen but he is not sure about it; according to him it would be
>> better to have general fixed-length hex types. (I guess it would be
>> possible to implement the digest types as domains over those.)
>
> I think equipping bytea with a length restriction would be a very natural,
> simple, and useful addition. If we ever want to move the bytea type closer to
> the SQL standard blob type, this will need to happen anyway.
>
> The case for separate fixed-length data types seems very dubious, unless you
> can show very impressive performance numbers. For one thing, they would make
> the whole type system more complicated, or in the alternative, would have
> little function and operator support.

bytea doesn't cast well to and from text when you're dealing with hex
data; you end up using the same amount of space as a varchar. What
would probably work well is a hex datatype that internally works like
bytea but requires that the input data is hex (I know you can use
encode/decode, but that added step is a pain). A similar argument
could be made for base64 encoded data.
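One way to picture the strict input check suggested above: accept exactly 2*len hex digits, reject anything else, and store only the decoded bytes. For scale, a 20-byte SHA-1 value kept as hex text takes 40 characters. This is a standalone C sketch; hex_in_strict and hex_digit_value are hypothetical names and are not part of bytea or pgcrypto.

```c
#include <stddef.h>

/* Returns 0-15 for a hex digit (either case), or -1 for anything else. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical strict input routine: decodes exactly 2*len hex digits
 * from s into out[0..len-1]. Returns 0 on success, or -1 if s is too
 * short, too long, or contains a non-hex character.
 */
static int hex_in_strict(const char *s, unsigned char *out, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        int hi = hex_digit_value(s[2 * i]);
        if (hi < 0)
            return -1;          /* also catches a premature '\0' */
        int lo = hex_digit_value(s[2 * i + 1]);
        if (lo < 0)
            return -1;
        out[i] = (unsigned char) ((hi << 4) | lo);
    }
    return s[2 * len] == '\0' ? 0 : -1;   /* reject trailing characters */
}
```

Rejecting bad input at parse time keeps the stored value binary while sparing callers the explicit encode/decode step the message complains about.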
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

#10 Peter Eisentraut
peter_e@gmx.net
In reply to: decibel (#9)
Re: [RFC] new digest datatypes, or generic fixed-len hex types?

On Wednesday 29 July 2009 20:16:48 decibel wrote:

> bytea doesn't cast well to and from text when you're dealing with hex
> data; you end up using the same amount of space as a varchar. What
> would probably work well is a hex datatype that internally works like
> bytea but requires that the input data is hex (I know you can use
> encode/decode, but that added step is a pain). A similar argument
> could be made for base64 encoded data.

There is a patch in the queue that adds hex input and output to bytea.
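For reference, the hex format that bytea gained in PostgreSQL 9.0 writes a value as \x followed by two lowercase hex digits per byte. The standalone C sketch below produces that text layout. to_bytea_hex_text is a hypothetical helper, not the backend's own output function.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: renders len bytes as "\x" followed by two lowercase
 * hex digits per byte. The caller must free() the result. Returns NULL if
 * allocation fails.
 */
static char *to_bytea_hex_text(const unsigned char *data, size_t len)
{
    static const char digits[] = "0123456789abcdef";
    char *out = malloc(2 + 2 * len + 1);   /* prefix + digits + NUL */

    if (out == NULL)
        return NULL;
    out[0] = '\\';
    out[1] = 'x';
    for (size_t i = 0; i < len; i++) {
        out[2 + 2 * i] = digits[data[i] >> 4];     /* high nibble */
        out[3 + 2 * i] = digits[data[i] & 0x0F];   /* low nibble */
    }
    out[2 + 2 * len] = '\0';
    return out;
}
```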