sha1, sha2 functions into core?
I would like to see whether there is support for adding sha1 and sha2
functions into the core. These are obviously well-known and widely used
functions, but currently the only way to get them is either through
pgcrypto or one of the PLs. We could say that's OK, but then we do
support md5 in core, which then encourages people to use that, when they
really shouldn't use that for new applications. Another weirdness is
that md5() doesn't return bytea but instead the result hex-encoded in a
string, which makes it weird to use in some cases.
One thing that might be reasonable would be to move the digest()
functions
digest(data text, type text) returns bytea
digest(data bytea, type text) returns bytea
from pgcrypto into core, so that pgcrypto is mostly restricted to
encryption, and can be kept at arm's length for those who need to do
that.
(Side note: Would the extension mechanism be able to easily cope with a
move like that?)
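The hex-versus-bytea point can be illustrated outside the database. The following is only a sketch, using Python's hashlib as a stand-in for the SQL functions (an assumption made for illustration; it is not how md5() is implemented in core). It shows that a hex-encoded digest is the same information as the raw bytes, at twice the length.

```python
import hashlib

data = b"hello"

# Hex-encoded text, analogous to what md5() returns in SQL: 32 characters.
md5_hex = hashlib.md5(data).hexdigest()

# Raw bytes, analogous to a bytea result: 16 bytes for MD5.
md5_raw = hashlib.md5(data).digest()

# Both forms carry the same information; decoding the hex gives the raw bytes.
assert bytes.fromhex(md5_hex) == md5_raw

# A SHA-256 digest as raw bytes is 32 bytes (64 characters when hex-encoded).
sha256_raw = hashlib.sha256(data).digest()

print(len(md5_hex), len(md5_raw), len(sha256_raw))  # prints: 32 16 32
```

A caller that wants to store or compare digests compactly would presumably prefer the raw form, which is the argument for returning bytea.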
Peter Eisentraut <peter_e@gmx.net> writes:
I would like to see whether there is support for adding sha1 and sha2
functions into the core.
I can't get excited about that, but could put up with it as long as
there wasn't scope creep ...
One thing that might be reasonable would be to move the digest()
functions
digest(data text, type text) returns bytea
digest(data bytea, type text) returns bytea
from pgcrypto into core,
... which this approach would create, because digest() isn't restricted
to just those algorithms. I think it'd be better to just invent two
new functions, which also avoids issues for applications that currently
expect the digest functions to be installed in pgcrypto's schema.
regards, tom lane
On 08/10/2011 02:06 PM, Peter Eisentraut wrote:
I would like to see whether there is support for adding sha1 and sha2
functions into the core. These are obviously well-known and widely used
functions, but currently the only way to get them is either through
pgcrypto or one of the PLs. We could say that's OK, but then we do
support md5 in core, which then encourages people to use that, when they
really shouldn't use that for new applications. Another weirdness is
that md5() doesn't return bytea but instead the result hex-encoded in a
string, which makes it weird to use in some cases.
One thing that might be reasonable would be to move the digest()
functions
digest(data text, type text) returns bytea
digest(data bytea, type text) returns bytea
from pgcrypto into core, so that pgcrypto is mostly restricted to
encryption, and can be kept at arm's length for those who need to do
that.
(Side note: Would the extension mechanism be able to easily cope with a
move like that?)
It's come up before:
<http://archives.postgresql.org/pgsql-hackers/2009-09/msg01293.php>
+1 for returning bytea though.
cheers
andrew
On Wed, Aug 10, 2011 at 2:24 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
It's come up before:
<http://archives.postgresql.org/pgsql-hackers/2009-09/msg01293.php>
I was about to wonder out loud if we might be trying to hit a moving target....
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Aug 10, 2011 at 7:06 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
I would like to see whether there is support for adding sha1 and sha2
functions into the core. These are obviously well-known and widely used
functions, but currently the only way to get them is either through
pgcrypto or one of the PLs. We could say that's OK, but then we do
support md5 in core, which then encourages people to use that, when they
really shouldn't use that for new applications.
Slightly different, but related - I've seen complaints that we only
use md5 for password storage/transmission, which is apparently not
acceptable under some government security standards. In the most
recent case, they wanted to be able to use sha256 for password storage
(transmission isn't really an issue where SSL can be used of course).
If we're ready to move more hashing functions into core, then it seems
reasonable to add more options for password storage to help those who
need to meet mandated standards.
--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, 2011-08-10 at 19:29 +0100, Dave Page wrote:
On Wed, Aug 10, 2011 at 7:06 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
I would like to see whether there is support for adding sha1 and sha2
functions into the core. These are obviously well-known and widely used
functions, but currently the only way to get them is either through
pgcrypto or one of the PLs. We could say that's OK, but then we do
support md5 in core, which then encourages people to use that, when they
really shouldn't use that for new applications.
Slightly different, but related - I've seen complaints that we only
use md5 for password storage/transmission, which is apparently not
acceptable under some government security standards. In the most
recent case, they wanted to be able to use sha256 for password storage
(transmission isn't really an issue where SSL can be used of course).
Yeah, that's one of those things. These days, using md5 for anything
raises red flags, so it would be better to slowly move some alternatives
into place.
If we're ready to move more hashing functions into core, then it seems
reasonable to add more options for password storage to help those who
need to meet mandated standards.
Yes, that would be good.
On Wed, 2011-08-10 at 14:26 -0400, Robert Haas wrote:
On Wed, Aug 10, 2011 at 2:24 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
It's come up before:
<http://archives.postgresql.org/pgsql-hackers/2009-09/msg01293.php>
I was about to wonder out loud if we might be trying to hit a moving target....
I think we are dealing with a lot more moving targets than adding a new
version of SHA every 12 to 15 years.
On 10.08.2011 21:45, Peter Eisentraut wrote:
On Wed, 2011-08-10 at 14:26 -0400, Robert Haas wrote:
On Wed, Aug 10, 2011 at 2:24 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
It's come up before:
<http://archives.postgresql.org/pgsql-hackers/2009-09/msg01293.php>
I was about to wonder out loud if we might be trying to hit a moving target....
I think we are dealing with a lot more moving targets than adding a new
version of SHA every 12 to 15 years.
Moving to something more modern for internal use is one thing. But
regarding the user-visible md5() function, how about we jump off this
treadmill and remove it altogether? And provide a backwards-compatible
function in pgcrypto.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Wed, Aug 10, 2011 at 21:02, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
On 10.08.2011 21:45, Peter Eisentraut wrote:
On Wed, 2011-08-10 at 14:26 -0400, Robert Haas wrote:
On Wed, Aug 10, 2011 at 2:24 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
It's come up before:
<http://archives.postgresql.org/pgsql-hackers/2009-09/msg01293.php>
I was about to wonder out loud if we might be trying to hit a moving
target....
I think we are dealing with a lot more moving targets than adding a new
version of SHA every 12 to 15 years.
Moving to something more modern for internal use is one thing. But
regarding the user-visible md5() function, how about we jump off this
treadmill and remove it altogether? And provide a backwards-compatible
function in pgcrypto.
-1.
There are certainly a number of perfectly valid use-cases for md5, and
it would probably break a *lot* of applications to remove it.
+1 for adding the SHA functions to core as choices, of course.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Wed, 2011-08-10 at 14:19 -0400, Tom Lane wrote:
One thing that might be reasonable would be to move the digest()
functions
digest(data text, type text) returns bytea
digest(data bytea, type text) returns bytea
from pgcrypto into core,
... which this approach would create, because digest() isn't restricted
to just those algorithms. I think it'd be better to just invent two
new functions, which also avoids issues for applications that currently
expect the digest functions to be installed in pgcrypto's schema.
I would also prefer to simply add sha1(bytea/text) => bytea, but the
existing md5 function is md5(bytea/text) => text, so either the new
functions would be inconsistent, or we make the new functions broken
like the old one, or we invent a different naming system, such as
digest().
On Thu, Aug 11, 2011 at 09:06, Peter Eisentraut <peter_e@gmx.net> wrote:
On Wed, 2011-08-10 at 14:19 -0400, Tom Lane wrote:
One thing that might be reasonable would be to move the digest()
functions
digest(data text, type text) returns bytea
digest(data bytea, type text) returns bytea
from pgcrypto into core,
... which this approach would create, because digest() isn't restricted
to just those algorithms. I think it'd be better to just invent two
new functions, which also avoids issues for applications that currently
expect the digest functions to be installed in pgcrypto's schema.
I would also prefer to simply add sha1(bytea/text) => bytea, but the
existing md5 function is md5(bytea/text) => text, so either the new
functions would be inconsistent, or we make the new functions broken
like the old one, or we invent a different naming system, such as
digest().
You could always combine them and create digest_sha1(bytea/text) =>
bytea, etc. That still won't have the "open ended" problem of just
digest().
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Wed, Aug 10, 2011 at 9:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
I would like to see whether there is support for adding sha1 and sha2
functions into the core.
I can't get excited about that, but could put up with it as long as
there wasn't scope creep ...
One thing that might be reasonable would be to move the digest()
functions
digest(data text, type text) returns bytea
digest(data bytea, type text) returns bytea
from pgcrypto into core,
... which this approach would create, because digest() isn't restricted
to just those algorithms. I think it'd be better to just invent two
new functions, which also avoids issues for applications that currently
expect the digest functions to be installed in pgcrypto's schema.
I would suggest digest() with a fixed list of algorithms: md5, sha1, sha2.
The uncommon/obsolete algorithms that can be used
from digest() if compiled with openssl are not something we
need to worry about. In fact we have never "supported" them,
as no testing has been done.
Then we could also add hexdigest(), which would fix the whole bytea/hex
confusion without bloating pg_proc.
--
marko
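The shape of that proposal, one digest() restricted to a fixed list plus a hexdigest() companion, can be sketched as follows. This is a hedged illustration in Python rather than the C/SQL it would be in core; the ALLOWED tuple and both function bodies are assumptions made for the example, not pgcrypto's actual code.

```python
import hashlib

# Hypothetical fixed allowlist: md5, sha1 and the sha2 family, as suggested above.
ALLOWED = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def digest(data: bytes, algo: str) -> bytes:
    """Return the raw digest, rejecting any algorithm outside the fixed list."""
    name = algo.lower()
    if name not in ALLOWED:
        raise ValueError(f"unsupported digest algorithm: {algo!r}")
    return hashlib.new(name, data).digest()


def hexdigest(data: bytes, algo: str) -> str:
    """Return the same digest hex-encoded as text."""
    return digest(data, algo).hex()


print(hexdigest(b"abc", "sha1"))
# prints: a9993e364706816aba3e25717850c26c9cd0d89d
```

Two entry points cover every listed algorithm, which is the "without bloating pg_proc" argument; the cost is that the algorithm name is only checked at run time.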
Marko Kreen <markokr@gmail.com> writes:
On Wed, Aug 10, 2011 at 9:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... which this approach would create, because digest() isn't restricted
to just those algorithms. I think it'd be better to just invent two
new functions, which also avoids issues for applications that currently
expect the digest functions to be installed in pgcrypto's schema.
I would suggest digest() with fixed list of algorithms: md5, sha1, sha2.
The uncommon/obsolete algorithms that can be used
from digest() if compiled with openssl, are not something we
need to worry over. In fact we have never "supported" them,
as no testing has been done.
Hmm ... they may be untested by us, but I feel sure that if we remove
that functionality from pgcrypto, *somebody* is gonna complain.
I don't see anything much wrong with sha1(bytea/text) -> bytea.
There's no law that says it has to work exactly like md5() does.
regards, tom lane
On 08/11/2011 10:46 AM, Tom Lane wrote:
Marko Kreen <markokr@gmail.com> writes:
On Wed, Aug 10, 2011 at 9:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... which this approach would create, because digest() isn't restricted
to just those algorithms. I think it'd be better to just invent two
new functions, which also avoids issues for applications that currently
expect the digest functions to be installed in pgcrypto's schema.
I would suggest digest() with fixed list of algorithms: md5, sha1, sha2.
The uncommon/obsolete algorithms that can be used
from digest() if compiled with openssl, are not something we
need to worry over. In fact we have never "supported" them,
as no testing has been done.
Hmm ... they may be untested by us, but I feel sure that if we remove
that functionality from pgcrypto, *somebody* is gonna complain.
Yeah. Maybe we should add a test or two.
I don't see anything much wrong with sha1(bytea/text) -> bytea.
There's no law that says it has to work exactly like md5() does.
I agree. We could provide an md5_b(text/bytea) -> bytea if people are
really concerned about orthogonality.
cheers
andrew
On Thu, Aug 11, 2011 at 5:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marko Kreen <markokr@gmail.com> writes:
On Wed, Aug 10, 2011 at 9:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... which this approach would create, because digest() isn't restricted
to just those algorithms. I think it'd be better to just invent two
new functions, which also avoids issues for applications that currently
expect the digest functions to be installed in pgcrypto's schema.
I would suggest digest() with fixed list of algorithms: md5, sha1, sha2.
The uncommon/obsolete algorithms that can be used
from digest() if compiled with openssl, are not something we
need to worry over. In fact we have never "supported" them,
as no testing has been done.
Hmm ... they may be untested by us, but I feel sure that if we remove
that functionality from pgcrypto, *somebody* is gonna complain.
Well, if you are worried about that, you can duplicate current
pgcrypto behaviour - if postgres is compiled against openssl,
you get whatever algorithms are available in that particular
version of openssl.
My point was that offering such an open-ended list of algorithms
was a bad idea, but there is no problem keeping the old behaviour.
I don't see anything much wrong with sha1(bytea/text) -> bytea.
There's no law that says it has to work exactly like md5() does.
The problem is that the list of must-have algorithms is getting
quite long: md5, sha1, sha224, sha256, sha384, sha512,
plus at least 4 from the upcoming sha3.
Another problem is that generic hashes are a bad way of storing passwords:
identical passwords will look identical, and it's easy to brute-force
passwords because the algorithms are very fast.
So the question is: is there an actual *good* reason to add each algorithm
separately to core, without a uniform API?
If the user requests are about storing passwords, and we want to make
that easier, then we should import crypt() as well, as that is the secure
way to store passwords. Then md5(), md5_b() plus a bunch of
sha-s will look silly.
--
marko
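The two password problems named above (identical passwords hash identically, and fast hashes are cheap to brute-force) can be shown with a small sketch. It uses Python's PBKDF2-HMAC-SHA256 as a stand-in for crypt(), which is a substitution made for illustration; the function names, salt length and iteration count are all assumptions, not recommendations from the thread.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def fast_hash(password: str) -> str:
    # Unsalted, fast hash: identical passwords always give identical output.
    return hashlib.sha256(password.encode()).hexdigest()


def slow_salted_hash(password: str, salt: Optional[bytes] = None,
                     rounds: int = 200_000) -> Tuple[bytes, bytes]:
    # A random per-password salt plus an iterated KDF. PBKDF2 stands in for
    # crypt() here; the iteration count is an illustrative assumption.
    if salt is None:
        salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, rounds)
    return salt, derived


def verify(password: str, salt: bytes, expected: bytes,
           rounds: int = 200_000) -> bool:
    # Recompute with the stored salt and compare in constant time.
    _, derived = slow_salted_hash(password, salt, rounds)
    return hmac.compare_digest(derived, expected)
```

With fast_hash(), two users who pick the same password end up with the same stored value; with slow_salted_hash(), the random salt makes the stored values differ and the iteration count makes each guess more expensive.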
On Aug 12, 2011, at 5:02 AM, Marko Kreen wrote:
My point was that giving such open-ended list of algorithms
was bad idea, but there is no problem keeping old behaviour.
I don't see anything much wrong with sha1(bytea/text) -> bytea.
There's no law that says it has to work exactly like md5() does.
The problem is that list of must-have algorithms is getting
quite long: md5, sha1, sha224, sha256, sha384, sha512,
+ at least 4 from upcoming sha3.
+1
I think some sort of digest() function that takes a parameter naming the algorithm would be the way to go. That's not to say that the existing named functions couldn't continue to exist -- md5() in core and sha1() in pgcrypto. But it sure seems to me like we ought to have just one function for digests (or 2, if we also have hexdigest()).
Best,
David
On Thu, Aug 11, 2011 at 5:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marko Kreen <markokr@gmail.com> writes:
On Wed, Aug 10, 2011 at 9:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... which this approach would create, because digest() isn't restricted
to just those algorithms. I think it'd be better to just invent two
new functions, which also avoids issues for applications that currently
expect the digest functions to be installed in pgcrypto's schema.
I would suggest digest() with fixed list of algorithms: md5, sha1, sha2.
The uncommon/obsolete algorithms that can be used
from digest() if compiled with openssl, are not something we
need to worry over. In fact we have never "supported" them,
as no testing has been done.
Hmm ... they may be untested by us, but I feel sure that if we remove
that functionality from pgcrypto, *somebody* is gonna complain.
If you don't want to break digest() but do not want such behaviour in core,
we could go with hash(data, algo) that has a fixed number of digests,
but also a couple of non-cryptographic hashes like crc32, lookup2/3.
This would also fix the problem of people using hashtext() in user code.
--
marko
On Fri, Aug 12, 2011 at 10:14:58PM +0300, Marko Kreen wrote:
On Thu, Aug 11, 2011 at 5:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marko Kreen <markokr@gmail.com> writes:
On Wed, Aug 10, 2011 at 9:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... which this approach would create, because digest() isn't restricted
to just those algorithms. I think it'd be better to just invent two
new functions, which also avoids issues for applications that currently
expect the digest functions to be installed in pgcrypto's schema.
I would suggest digest() with fixed list of algorithms: md5, sha1, sha2.
The uncommon/obsolete algorithms that can be used
from digest() if compiled with openssl, are not something we
need to worry over. In fact we have never "supported" them,
as no testing has been done.
Hmm ... they may be untested by us, but I feel sure that if we remove
that functionality from pgcrypto, *somebody* is gonna complain.
If you don't want to break digest() but do not want such behaviour in core,
we could go with hash(data, algo) that has fixed number of digests,
but also couple non-cryptographic hashes like crc32, lookup2/3.
This would also fix the problem of people using hashtext() in user code.
Hmm, this thread seems to have petered out without a conclusion. Just
wanted to comment that there _are_ non-password storage uses for these
digests: I use them in a context of storing large files in a bytea
column, as a means to doing data deduplication, and avoiding pushing
files from clients to server and back.
Ross
--
Ross Reedstrom, Ph.D. reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist phone: 713-348-6166
Connexions http://cnx.org fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE
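The deduplication use case described above can be sketched roughly as follows. This is a toy, in-memory illustration in Python; ContentStore and upload_if_missing are hypothetical names standing in for a table with a bytea column keyed by digest and for the client logic, and SHA-256 is an assumed choice of algorithm.

```python
import hashlib
from typing import Dict, Tuple


class ContentStore:
    """Toy in-memory stand-in for a table keyed by a SHA-256 digest."""

    def __init__(self) -> None:
        self._blobs: Dict[bytes, bytes] = {}

    def has(self, digest: bytes) -> bool:
        return digest in self._blobs

    def put(self, digest: bytes, data: bytes) -> None:
        # Store each distinct content only once.
        self._blobs.setdefault(digest, data)

    def get(self, digest: bytes) -> bytes:
        return self._blobs[digest]

    def __len__(self) -> int:
        return len(self._blobs)


def upload_if_missing(store: ContentStore, data: bytes) -> Tuple[bytes, bool]:
    """Hash on the client; send the file only if the store lacks that digest."""
    digest = hashlib.sha256(data).digest()
    if store.has(digest):
        return digest, False  # nothing transferred
    store.put(digest, data)
    return digest, True
```

In this sketch the digest is computed on the client, so only the short digest has to cross the wire to find out whether the large file is already stored.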
On Wed, Aug 31, 2011 at 11:12 AM, Ross J. Reedstrom <reedstrm@rice.edu> wrote:
Hmm, this thread seems to have petered out without a conclusion. Just
wanted to comment that there _are_ non-password storage uses for these
digests: I use them in a context of storing large files in a bytea
column, as a means to doing data deduplication, and avoiding pushing
files from clients to server and back.
Yes, agreed: there is no decent content-addressing type in PostgreSQL,
so one rolls their own using shas and joins; I've seen this more than
once. It's a useful way to get a non-bloated index on a series of
(larger than sha1) values where one only cares about the equality
operator (hash indexes, as unattractive as they were before in
PostgreSQL's implementation, are even less so now with streaming
replication).
When that content to be addressed can be submitted from another
source, anything with md5 is correctly met with suspicion. We have
gone to the trouble of using pgcrypto to get sha1 access, but I know
of other applications that would have preferred to use sha but settle
for md5 simply because it's known to be bundled in core everywhere.
CREATE EXTENSION -- particularly if there is *any* way (is there? even
with ugliness like utility statement hooks) to configure it on the
provider end to not require superuser for common extensions like
'pgcrypto' -- could ablate this issue and one could get off the hash
"treadmill", including md5 -- but I think that would be a mistake.
Applications need a high quality digest to enable any kind of
principled content addressing use case, and I think making that any
harder than a builtin is going to negatively impact the state of
things at large. As a compromise, I'd also be happy with making
CREATE EXTENSION so trivial that everyone who has that use case can
get pgcrypto on any hosting provider.
--
fdr
On Wed, 2011-08-31 at 13:12 -0500, Ross J. Reedstrom wrote:
Hmm, this thread seems to have petered out without a conclusion. Just
wanted to comment that there _are_ non-password storage uses for these
digests: I use them in a context of storing large files in a bytea
column, as a means to doing data deduplication, and avoiding pushing
files from clients to server and back.
But I suppose you don't need the hash function in the database system
for that.