storing binary data

Started by Jason Orendorffabout 24 years ago15 messages
#1Jason Orendorff
jason@jorendorff.com

Reply-To: sender

Hi. I was surprised to discover today that postgres's
character types don't support zero bytes. That is,
Postgres isn't 8-bit clean. Why is that?

More to the point, I need to store about 1k bytes per row
of varying-length 8-bit binary data. I have a few options:

+ BLOBs. PostgreSQL BLOBs make me nervous. I worry about
the BLOB not being deleted when the corresponding row in
the table is deleted. The documentation is vague.

+ What I really need is a binary *short* object type.
I have heard rumors of a legendary "bytea" type that might
help me, but it doesn't appear to be documented anywhere,
so I hesitate to use it.

+ I can base64-encode the data and store it in a "text"
field. But postgres is a great big data-storage system;
surely it can store binary data without resorting to
this kind of hack.

What should I do? Please help. Thanks!

--
Jason Orendorff

P.S. I would love to help improve PostgreSQL's documentation.
Whom should I contact?

#2Barry Lind
barry@xythos.com
In reply to: Jason Orendorff (#1)
Re: storing binary data

Jason,

BLOBs as you have correctly inferred do not get automatically deleted.
You can add triggers to your tables to delete them automatically if you
so desire.

However 'bytea' is the datatype that is most appropriate for your needs.
It has been around for a long time, but not well documented. I have
been using it in my code since 7.0 of postgres and it works fine. In
fact many of the internal postgres tables use it.

The problem with bytea is that many of the client interfaces don't
support it well or at all. So depending on how you intend to access the
data you may not be able to use the bytea datatype. The situation is
much improved in 7.2 with bytea documented and better support for it in
the client interfaces (jdbc especially).

Encoding the data into a text format will certainly work, if you can't
work around the current limitations of the above two options. And I
believe there is some contrib code to help in this area.

thanks,
--Barry

Jason Orendorff wrote:

Show quoted text

Reply-To: sender

Hi. I was surprised to discover today that postgres's
character types don't support zero bytes. That is,
Postgres isn't 8-bit clean. Why is that?

More to the point, I need to store about 1k bytes per row
of varying-length 8-bit binary data. I have a few options:

+ BLOBs. PostgreSQL BLOBs make me nervous. I worry about
the BLOB not being deleted when the corresponding row in
the table is deleted. The documentation is vague.

+ What I really need is a binary *short* object type.
I have heard rumors of a legendary "bytea" type that might
help me, but it doesn't appear to be documented anywhere,
so I hesitate to use it.

+ I can base64-encode the data and store it in a "text"
field. But postgres is a great big data-storage system;
surely it can store binary data without resorting to
this kind of hack.

What should I do? Please help. Thanks!

#3Doug McNaught
doug@wireboard.com
In reply to: Jason Orendorff (#1)
Re: storing binary data

"Jason Orendorff" <jason@jorendorff.com> writes:

Reply-To: sender

Just to be nice, I'll do this. ;)

Hi. I was surprised to discover today that postgres's
character types don't support zero bytes. That is,
Postgres isn't 8-bit clean. Why is that?

As I understand it, the storage system itself is 8-bit clean; it's the
parser layer that isn't (as it uses C strings everywhere).

More to the point, I need to store about 1k bytes per row
of varying-length 8-bit binary data. I have a few options:

+ BLOBs. PostgreSQL BLOBs make me nervous. I worry about
the BLOB not being deleted when the corresponding row in
the table is deleted. The documentation is vague.

This is an issue. There is definitely no automatic deletion of
LOs. There is a 'vacuumlo' program in contrib/ that may be useful, or
you can roll your own, or you can use triggers to make sure LOs get
deleted.

FWIW, I've been using LOBs in a couple of applications and haven't had
too much trouble.

+ What I really need is a binary *short* object type.
I have heard rumors of a legendary "bytea" type that might
help me, but it doesn't appear to be documented anywhere,
so I hesitate to use it.

It is in 7.1, but is more fully documented in 7.2 (which is entering
beta). See:

http://candle.pha.pa.us/main/writings/pgsql/sgml/datatype-binary.html

+ I can base64-encode the data and store it in a "text"
field. But postgres is a great big data-storage system;
surely it can store binary data without resorting to
this kind of hack.

Since the only way to store or retrieve non-LOB data is to go through
the SQL parser, you always have to do some escaping. The link above
tells you how to do it for 'bytea' without having to go the base64
route.

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863

#4Alex Pilosov
alex@pilosoft.com
In reply to: Jason Orendorff (#1)
Re: storing binary data

Use bytea. Search archives.

On Sun, 21 Oct 2001, Jason Orendorff wrote:

Show quoted text

Reply-To: sender

Hi. I was surprised to discover today that postgres's
character types don't support zero bytes. That is,
Postgres isn't 8-bit clean. Why is that?

More to the point, I need to store about 1k bytes per row
of varying-length 8-bit binary data. I have a few options:

+ BLOBs. PostgreSQL BLOBs make me nervous. I worry about
the BLOB not being deleted when the corresponding row in
the table is deleted. The documentation is vague.

+ What I really need is a binary *short* object type.
I have heard rumors of a legendary "bytea" type that might
help me, but it doesn't appear to be documented anywhere,
so I hesitate to use it.

+ I can base64-encode the data and store it in a "text"
field. But postgres is a great big data-storage system;
surely it can store binary data without resorting to
this kind of hack.

What should I do? Please help. Thanks!

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jason Orendorff (#1)
Re: storing binary data

"Jason Orendorff" <jason@jorendorff.com> writes:

Hi. I was surprised to discover today that postgres's
character types don't support zero bytes. That is,
Postgres isn't 8-bit clean. Why is that?

(a) because all our datatype I/O interfaces are based on C-style
(null terminated) strings

(b) because comparison of character datatypes is based on strcoll()
(at least if you compiled with locale support)

Fixing either of these is far more pain than is justified to allow
people to store non-textual data in textual datatypes. I don't foresee
it happening.

+ What I really need is a binary *short* object type.
I have heard rumors of a legendary "bytea" type that might
help me, but it doesn't appear to be documented anywhere,
so I hesitate to use it.

It's real and it's not going away. It is pretty poorly documented
and doesn't have a wide variety of functions ... but hey, you can help
improve that situation. This is an open source project after all ;-)

regards, tom lane

#6Joe Conway
joseph.conway@home.com
In reply to: Jason Orendorff (#1)
Re: storing binary data

+ What I really need is a binary *short* object type.
I have heard rumors of a legendary "bytea" type that might
help me, but it doesn't appear to be documented anywhere,
so I hesitate to use it.

It's real and it's not going away. It is pretty poorly documented
and doesn't have a wide variety of functions ... but hey, you can help
improve that situation. This is an open source project after all ;-)

regards, tom lane

I'll take a shot at improving the documentation for bytea. I'm hoping
documentation patches are accepted during beta though ;-)

Also, FWIW, 7.2 includes bytea support for LIKE, NOT LIKE, LIKE ESCAPE,
||, trim(), substring(), position(), length(), indexing, and various
comparators.

Joe

#7Thomas Lockhart
lockhart@fourpalms.org
In reply to: Jason Orendorff (#1)
Re: storing binary data

...

I'll take a shot at improving the documentation for bytea. I'm hoping
documentation patches are accepted during beta though ;-)

Always. At least up until a week or so before release, when we need to
firm up the docs and work on final cleanup etc. There are several
announcements leading up to that point, so it will not be a suprise.

- Thomas

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#6)
Re: storing binary data

Joe Conway <joseph.conway@home.com> writes:

I'll take a shot at improving the documentation for bytea. I'm hoping
documentation patches are accepted during beta though ;-)

Of course. The only limitation we place during beta is "no new features
added". I plan to spend a good deal of time on the docs during beta
myself.

regards, tom lane

#9Peter Eisentraut
peter_e@gmx.net
In reply to: Jason Orendorff (#1)
Re: storing binary data

Jason Orendorff writes:

Hi. I was surprised to discover today that postgres's
character types don't support zero bytes. That is,
Postgres isn't 8-bit clean. Why is that?

PostgreSQL is 8-bit clean. The character types don't support zero bytes
because the character types store characters, not bytes.

--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter

#10Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#8)
Re: storing binary data

Quick question - I couldn't find this in the docs:

What exactly is the advantage in using VIEWs? I get the impression that the
SELECT query it is based on is cached (ie. a cached query plan).

But, is this cached between db restarts, between connections, etc. Is it
cached upon the first use of the view for a db instance for a particular
connection, etc?

Chris

Show quoted text

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Tom Lane
Sent: Thursday, 25 October 2001 1:00 AM
To: Joe Conway
Cc: Jason Orendorff; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] storing binary data

Joe Conway <joseph.conway@home.com> writes:

I'll take a shot at improving the documentation for bytea. I'm hoping
documentation patches are accepted during beta though ;-)

Of course. The only limitation we place during beta is "no new features
added". I plan to spend a good deal of time on the docs during beta
myself.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Kings-Lynne (#10)
Re: storing binary data

"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:

What exactly is the advantage in using VIEWs?

A level of logical indirection between the application and the physical
data schema. There are no performance benefits.

I get the impression that the
SELECT query it is based on is cached (ie. a cached query plan).

Nope. If there's something in the docs that makes you think so,
point out so I can fix it ;-)

regards, tom lane

#12Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#11)
Re: storing binary data

I get the impression that the
SELECT query it is based on is cached (ie. a cached query plan).

Nope. If there's something in the docs that makes you think so,
point out so I can fix it ;-)

Hmmm...I could have sworn that you mentioned in passing something about
cached query plans and VIEWs - I must have been in dream land.

Chris

#13Lincoln Yeoh
lyeoh@pop.jaring.my
In reply to: Joe Conway (#6)
Re: storing binary data

I'll take a shot at improving the documentation for bytea. I'm hoping
documentation patches are accepted during beta though ;-)

Also, FWIW, 7.2 includes bytea support for LIKE, NOT LIKE, LIKE ESCAPE,
||, trim(), substring(), position(), length(), indexing, and various
comparators.

Cool!

Would it be practical to use substring for retrieving chunks of binary data
in manageable sizes? Or would the overheads be too high?

Cheerio,
Link.

#14Alessio Bragadini
alessio@albourne.com
In reply to: Christopher Kings-Lynne (#10)
Re: storing binary data

Christopher Kings-Lynne wrote:

What exactly is the advantage in using VIEWs? I get the impression that the
SELECT query it is based on is cached (ie. a cached query plan).

I had the same impression but I've been told (with explanations) that
the query plan for a view is not cached in any way.

--
Alessio F. Bragadini alessio@albourne.com
APL Financial Services http://village.albourne.com
Nicosia, Cyprus phone: +357-2-755750

"It is more complicated than you think"
-- The Eighth Networking Truth from RFC 1925

#15Joe Conway
joseph.conway@home.com
In reply to: Jason Orendorff (#1)
Re: storing binary data

Lincoln Yeoh wrote:

Also, FWIW, 7.2 includes bytea support for LIKE, NOT LIKE, LIKE ESCAPE,
||, trim(), substring(), position(), length(), indexing, and various
comparators.

Cool!

Would it be practical to use substring for retrieving chunks of binary data
in manageable sizes? Or would the overheads be too high?

Cheerio,
Link.

I haven't done any performance testing, but it should be no different
than the substring function used on TEXT fields. Try it out and let us
know ;-)

-- Joe