CRC, hash & Co.
There have been some misconceptions in previous mails.
1.) A CRC is _not_ stronger than a hash. CRC is a subset of the hash domain,
defined as "a fast error-check hash based on mod 2 polynomial operations"
which has typically no crypto strength (and does not need it either for most
purposes).
2.) Theoretically, an optimal MD5 implementation can't be faster than an
optimal CRC-32 implementation. Check it yourself: download openssl
(www.openssl.org) or Peter Gutmans cryptlib where all sorts of hashes and
CRC-32 are implemented in a very reasonable way. Write a tiny routine
generating random strings, popping them through the hash function. You will
see, CRC-32 is typically several times faster.
3.) There are many domains where you need to protect yout database not only
against random accidental glitches, but also against malicious attacks. In
these cases, CRC-32 (and other CRCs without any cryptographic strength) are
no help.
The majority will probably be more happy with fast CRCs, but there will
always be some heavy weight users (such as in medical, legal and financial
domains) where strong hashes are required. Thus, it should be user-definable
at runtime which one to choose.
4.) Without CRC/hash facility, we will have no means of checking our data
integrity at all. At least in my domain (medical) most developers are
craving for database backends where we don't have to re-implement the
integrity checking stuff again and again. If postgres could provide this, it
would be a strong argument in favour of postgres.
5.) As opposed to a previous posting (Bruce ?), MD5 has been shown to be
"crackable" (deliberate collison feasible withavailable technology) - that
was one of the main reasons for the development of RIPEMD-160 (check the
RIPEMD home page for more information)
Once again, I am happy to implement any number of CRC/hash methods in
postgres if anybody (especially theone who implemented the SERIAL data type)
points me into the right direction within the postgres source code. No
takers so far :-(
Horst
On Sat, Dec 09, 2000 at 01:58:29PM +1100, Horst Herb wrote:
There have been some misconceptions in previous mails.
1.) A CRC is _not_ stronger than a hash. CRC is a subset of the hash domain,
defined as "a fast error-check hash based on mod 2 polynomial operations"
which has typically no crypto strength (and does not need it either for most
purposes).
Not true, unless your definition of strength is different than mine.
The point under question is if different data can produce the same hash
as correct data. A CRC will always be different if the difference is a
burst error of N-1 bits or less (N being the size of the CRC), and has a
2^N chance of being different for all other errors. Cryptographic
hashes can only claim the 2^N factor, with no guarantees.
2.) Theoretically, an optimal MD5 implementation can't be faster than an
optimal CRC-32 implementation. Check it yourself: download openssl
(www.openssl.org) or Peter Gutmans cryptlib where all sorts of hashes and
CRC-32 are implemented in a very reasonable way. Write a tiny routine
generating random strings, popping them through the hash function. You will
see, CRC-32 is typically several times faster.
You check it yourself. I'll send you a copy of my benchmarking code
under seperate cover. All the core hashes except for CRC were taken
from openssl. As per my last message, CRC on Celeron/P2/P3 sucks, and
in the worst case would only be 1.5 times faster than MD5. MD4 would be
near par with CRC.
3.) There are many domains where you need to protect yout database not only
against random accidental glitches, but also against malicious attacks. In
these cases, CRC-32 (and other CRCs without any cryptographic strength) are
no help.
If you have malicious attackers who can deliberately modify live data in
a database, you have problems beyond what any kind of hash can protect
against.
4.) Without CRC/hash facility, we will have no means of checking our data
integrity at all. At least in my domain (medical) most developers are
craving for database backends where we don't have to re-implement the
integrity checking stuff again and again. If postgres could provide this, it
would be a strong argument in favour of postgres.
I agree that it would be useful to CRC data blocks to protect against
bad data errors. If you're data is really that sensitive, though, you
may be looking for ECC, not CRC or hash facilities.
5.) As opposed to a previous posting (Bruce ?), MD5 has been shown to be
"crackable" (deliberate collison feasible withavailable technology)
No, it hasn't, unless you can provide us a reference to a paper showing
that. I've seen references that there are internal collisions in the
MD5 reduction function, but still no way to produce collisions on the
final digest.
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/
On Sunday 10 December 2000 17:35, you wrote:
1.) A CRC is _not_ stronger than a hash. CRC is a subset of the hash
domain, defined as "a fast error-check hash based on mod 2 polynomial
operations" which has typically no crypto strength (and does not need it
either for most purposes).Not true, unless your definition of strength is different than mine.
It is not my definition, but the definition found in any technical / IT
dictionary I could grab. Examples:
3.) There are many domains where you need to protect yout database not
only against random accidental glitches, but also against malicious
attacks. In these cases, CRC-32 (and other CRCs without any cryptographic
strength) are no help.If you have malicious attackers who can deliberately modify live data in
a database, you have problems beyond what any kind of hash can protect
against.
In the medical domain, the "malicious attacker" is often the user himself.
For medico-legal reasons, we need a complete audit trail proofing that no
alterations have been made to medical records. For practical reasons, the
quickest means (AFAIK) to achieve this is digestig the digests of all entries
(or at least those of the change log) and store these externally on a trusted
authentication server. No matter how unlikely such a manipulation is; for a
court case you always need the state-of-the-art precautions.
5.) As opposed to a previous posting (Bruce ?), MD5 has been shown to be
"crackable" (deliberate collison feasible withavailable technology)No, it hasn't, unless you can provide us a reference to a paper showing
that. I've seen references that there are internal collisions in the
MD5 reduction function, but still no way to produce collisions on the
final digest.
You are partially right. It was only the compression function of MD5. But
that's enough.
"An iterated hash function is thus in this regard at most as strong as its
compression function" ( A.J.Menezes, P.C.van Oorschot, S.A.Vanstone "Handbook
of Applied Cryptography", CRC Press, 1999, link to online version:
http://www.cacr.math.uwaterloo.ca/hac/ ).
Read Cryptobytes Vol.2 No.2 Summer 1996; Hans Dobbertin: The status of MD5
after a recent attack
(ftp://ftp.rsasecurity.com/pub/cryptobytes/crypto2n2.pdf).
and
RSA Data security recommended already 1996 that MD4 should no longer be used
and that MD5 "should not be used for future applications that require the
hash function to be collision resistant"
(ftp://ftp.rsasecurity.com/pub/pdfs/bulletn4.pdf)
Even in S/MIME MD5 "is only provided for backwards compatibility" for that
very reason
(http://web.elastic.org/~fche/mirrors/www.jya.com/pgpfaq-ss.htm#SubMD5Broke)
and Bruce Schneier stated that he is "wary of MD5" ( B.Schneier, "Applied
Cryptography, Second Edition", Wiley, 1996 (cited at
http://web.elastic.org/~fche/mirrors/www.jya.com/pgpfaq-ss.htm, I am still
trying to find the original quote in the book))
For further reference I recommend the "Handbook of applied cryptography"
which surprisingly is available online (full text) at
http://www.cacr.math.uwaterloo.ca/hac/
Please remember that the whole reason for my reasoning is that we need a
run-time definable choice of CRCs/digests as no one single hash will suit all
needs.
Horst
Sorry, but I just found out that many of my mails bounced because I was using
my secondary email address =8-0
---------- Forwarded Message ----------
Subject: Re: [HACKERS] CRC, hash & Co.
Date: Mon, 11 Dec 2000 07:13:31 +1100
From: Horst Herb <horst@hherb.com>
To: pgsql-hackers@postgresql.org
On Sunday 10 December 2000 17:35, you wrote:
1.) A CRC is _not_ stronger than a hash. CRC is a subset of the hash
domain, defined as "a fast error-check hash based on mod 2 polynomial
operations" which has typically no crypto strength (and does not need it
either for most purposes).Not true, unless your definition of strength is different than mine.
It is not my definition, but the definition found in any technical / IT
dictionary I could grab. Examples:
3.) There are many domains where you need to protect your database not
only against random accidental glitches, but also against malicious
attacks. In these cases, CRC-32 (and other CRCs without any cryptographic
strength) are no help.If you have malicious attackers who can deliberately modify live data in
a database, you have problems beyond what any kind of hash can protect
against.
In the medical domain, the "malicious attacker" is often the user himself.
For medico-legal reasons, we need a complete audit trail proofing that no
alterations have been made to medical records. For practical reasons, the
quickest means (AFAIK) to achieve this is digesting the digests of all entries
(or at least those of the change log) and store these externally on a trusted
authentication server. No matter how unlikely such a manipulation is; for a
court case you always need the state-of-the-art precautions.
5.) As opposed to a previous posting (Bruce ?), MD5 has been shown to be
"crackable" (deliberate collision feasible with available technology)No, it hasn't, unless you can provide us a reference to a paper showing
that. I've seen references that there are internal collisions in the
MD5 reduction function, but still no way to produce collisions on the
final digest.
You are partially right. It was only the compression function of MD5. But
that's enough.
"An iterated hash function is thus in this regard at most as strong as its
compression function" ( A.J.Menezes, P.C.van Oorschot, S.A.Vanstone "Handbook
of Applied Cryptography", CRC Press, 1999, link to online version:
http://www.cacr.math.uwaterloo.ca/hac/ ).
Read Cryptobytes Vol.2 No.2 Summer 1996; Hans Dobbertin: The status of MD5
after a recent attack
(ftp://ftp.rsasecurity.com/pub/cryptobytes/crypto2n2.pdf).
and
RSA Data security recommended already 1996 that MD4 should no longer be used
and that MD5 "should not be used for future applications that require the
hash function to be collision resistant"
(ftp://ftp.rsasecurity.com/pub/pdfs/bulletn4.pdf)
Even in S/MIME MD5 "is only provided for backwards compatibility" for that
very reason
(http://web.elastic.org/~fche/mirrors/www.jya.com/pgpfaq-ss.htm#SubMD5Broke)
and Bruce Schneier stated that he is "wary of MD5" ( B.Schneier, "Applied
Cryptography, Second Edition", Wiley, 1996 (cited at
http://web.elastic.org/~fche/mirrors/www.jya.com/pgpfaq-ss.htm, I am still
trying to find the original quote in the book))
For further reference I recommend the "Handbook of applied cryptography"
which surprisingly is available online (full text) at
http://www.cacr.math.uwaterloo.ca/hac/
Please remember that the whole reason for my reasoning is that we need a
run-time definable choice of CRCs/digests as no one single hash will suit all
needs.
Horst
-------------------------------------------------------
Import Notes
Resolved by subject fallback