storing binary data

Started by Neil Conwayover 25 years ago6 messagesgeneral
Jump to latest
#1Neil Conway
neilc@samurai.com

I want to store some binary data in Postgres. The data is an
MD5 checksum of the user's password, in binary. It will be
exactly 16 bytes (since it is a one-way hash).

Can I store this safely in a CHAR column? Can the data be
treated normally (i.e. compare a binary checksum of the
password the user entered with the binary value stored
in the DB, etc).

Are there any issues I'll need to watch out for when
storing/manipulating binary data?

(I was going to RTFM and search the mailing list archives,
but they don't seem to be working.)

TIA,

Neil

--
Neil Conway <neilconway@home.com>
Get my GnuPG key from: http://klamath.dyndns.org/mykey.asc
Encrypted mail welcomed

Secrecy is the beginning of tyranny.
-- Heinlein

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neil Conway (#1)
Re: storing binary data

Neil Conway <nconway@klamath.dyndns.org> writes:

I want to store some binary data in Postgres. The data is an
MD5 checksum of the user's password, in binary. It will be
exactly 16 bytes (since it is a one-way hash).

Can I store this safely in a CHAR column?

No. CHAR and friends assume there are no null (zero) bytes.
In MULTIBYTE setups there are probably additional constraints.

You could use bytea, but I would recommend converting the checksum
to a hex digit string and then storing that in a char-type field.
Hex is the usual textual representation for MD5 values, no?

regards, tom lane

#3Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#2)
Re: storing binary data

On Mon, Oct 16, 2000 at 11:22:40PM -0400, Tom Lane wrote:

Neil Conway <nconway@klamath.dyndns.org> writes:

I want to store some binary data in Postgres. The data is an
MD5 checksum of the user's password, in binary. It will be
exactly 16 bytes (since it is a one-way hash).

Can I store this safely in a CHAR column?

No. CHAR and friends assume there are no null (zero) bytes.
In MULTIBYTE setups there are probably additional constraints.

You could use bytea, but I would recommend converting the checksum
to a hex digit string and then storing that in a char-type field.
Hex is the usual textual representation for MD5 values, no?

It is, but (IMHO) it's a big waste of space. The actual MD5 digest is
128 bits. If stored in binary form, it's 16 bytes. If stored in hex
form (as ASCII), it's 32 characters @ 1 byte per character = 32 bytes.
In Unicode, that's 64 bytes (correct me if I'm wrong).

It's not a huge deal, but it would be nice to store this efficiently.
Is this possible?

TIA,

Neil

--
Neil Conway <neilconway@home.com>
Get my GnuPG key from: http://klamath.dyndns.org/mykey.asc
Encrypted mail welcomed

Being able to break into computers doesn't make you a hacker any more
than being able to hotwire cars makes you an automotive engineer.

#4Alfred Perlstein
bright@wintelcom.net
In reply to: Neil Conway (#3)
Re: storing binary data

* Neil Conway <nconway@klamath.dyndns.org> [001016 20:41] wrote:

On Mon, Oct 16, 2000 at 11:22:40PM -0400, Tom Lane wrote:

Neil Conway <nconway@klamath.dyndns.org> writes:

I want to store some binary data in Postgres. The data is an
MD5 checksum of the user's password, in binary. It will be
exactly 16 bytes (since it is a one-way hash).

Can I store this safely in a CHAR column?

No. CHAR and friends assume there are no null (zero) bytes.
In MULTIBYTE setups there are probably additional constraints.

You could use bytea, but I would recommend converting the checksum
to a hex digit string and then storing that in a char-type field.
Hex is the usual textual representation for MD5 values, no?

It is, but (IMHO) it's a big waste of space. The actual MD5 digest is
128 bits. If stored in binary form, it's 16 bytes. If stored in hex
form (as ASCII), it's 32 characters @ 1 byte per character = 32 bytes.
In Unicode, that's 64 bytes (correct me if I'm wrong).

It's not a huge deal, but it would be nice to store this efficiently.
Is this possible?

Why not use base64? It's pretty gross but might work for you.

-Alfred

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neil Conway (#3)
Re: storing binary data

Neil Conway <nconway@klamath.dyndns.org> writes:

You could use bytea, but I would recommend converting the checksum
to a hex digit string and then storing that in a char-type field.
Hex is the usual textual representation for MD5 values, no?

It is, but (IMHO) it's a big waste of space. The actual MD5 digest is
128 bits. If stored in binary form, it's 16 bytes. If stored in hex
form (as ASCII), it's 32 characters @ 1 byte per character =3D 32 bytes.

You're worried about 16 bytes per pg_shadow entry? Get real. I'd
have recommended bytea if the amount of storage involved were actually
significant, but for this application readability seems more important.

regards, tom lane

#6Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#5)
Re: storing binary data

On Mon, Oct 16, 2000 at 11:57:36PM -0400, Tom Lane wrote:

Neil Conway <nconway@klamath.dyndns.org> writes:

It is, but (IMHO) it's a big waste of space. The actual MD5 digest is
128 bits. If stored in binary form, it's 16 bytes. If stored in hex
form (as ASCII), it's 32 characters @ 1 byte per character =3D 32 bytes.

You're worried about 16 bytes per pg_shadow entry? Get real. I'd
have recommended bytea if the amount of storage involved were actually
significant, but for this application readability seems more important.

To clarify, these are 'application users', not Postgres users. So
the info is stored in one of my own tables, not pg_shadow. Although
I agree, this isn't a big deal either way.

Alfred: thanks for the tip. Looks like base64 will solve my problems!

Thanks to everyone who responded,

Neil

--
Neil Conway <neilconway@home.com>
Get my GnuPG key from: http://klamath.dyndns.org/mykey.asc
Encrypted mail welcomed

Violence is to dictatorship as propaganda is to democracy.
-- Noam Chomsky