COPY BINARY is broken...

Started by Tom Laneabout 25 years ago14 messages

tgl@sss.pgh.pa.us

about 25 years ago

I've just noticed that COPY BINARY is pretty thoroughly broken by TOAST,
because what it does is to dump out verbatim the bytes making up each
tuple of the relation. In the case of a moved-off value, you'll get
the toast reference, which is not going to be too helpful for reloading
the table data. In the case of a compressed-in-line datum, you'll at
least have all the data there, but the COPY BINARY reader will crash
and burn when it sees it.

Fixing this while retaining backwards compatibility with the existing
COPY BINARY file format is possible, but it seems rather a headache:
we'd need to detoast all the toasted columns, then heap_formtuple a
new tuple containing the expanded data, and finally write that out.
(Can't do it on a field-by-field basis because the file format requires
the total tuple size to precede the tuple data.) Kind of ugly.

The existing COPY BINARY file format is entirely brain-dead anyway; for
example, it wants the number of tuples to be stored at the front, which
means we have to scan the whole relation an extra time to get that info.
Its handling of nulls is bizarre, too. I'm thinking this might be a
good time to abandon backwards compatibility and switch to a format
that's a little easier to read and write. Does anyone have an opinion
pro or con about that?

regards, tom lane

Mikheev, Vadim

vmikheev@SECTORBASE.COM

about 25 years ago

In reply to: Tom Lane (#1)

RE: COPY BINARY is broken...

The existing COPY BINARY file format is entirely brain-dead
anyway; for example, it wants the number of tuples to be stored
at the front, which means we have to scan the whole relation an
extra time to get that info. Its handling of nulls is bizarre, too.
I'm thinking this might be a good time to abandon backwards
compatibility and switch to a format that's a little easier to read
and write. Does anyone have an opinion pro or con about that?

Switch to new format.

Vadim

Import Notes

Resolved by subject fallback

Mikheev, Vadim

vmikheev@SECTORBASE.COM

about 25 years ago

In reply to: Mikheev, Vadim (#2)

RE: COPY BINARY is broken...

Alfred Perlstein <bright@wintelcom.net> writes:

I would rip it out.

I thought about that too, but was afraid to suggest it ;-)

How many people are actually using COPY BINARY?

It could be useful if only single scan would be required.
But I have no strong opinion about keeping it.

Vadim

Import Notes

Resolved by subject fallback

Alfred Perlstein

bright@wintelcom.net

about 25 years ago

In reply to: Tom Lane (#1)

Re: COPY BINARY is broken...

* Tom Lane <tgl@sss.pgh.pa.us> [001201 14:42] wrote:

I've just noticed that COPY BINARY is pretty thoroughly broken by TOAST,
because what it does is to dump out verbatim the bytes making up each
tuple of the relation. In the case of a moved-off value, you'll get
the toast reference, which is not going to be too helpful for reloading
the table data. In the case of a compressed-in-line datum, you'll at
least have all the data there, but the COPY BINARY reader will crash
and burn when it sees it.

Fixing this while retaining backwards compatibility with the existing
COPY BINARY file format is possible, but it seems rather a headache:
we'd need to detoast all the toasted columns, then heap_formtuple a
new tuple containing the expanded data, and finally write that out.
(Can't do it on a field-by-field basis because the file format requires
the total tuple size to precede the tuple data.) Kind of ugly.

The existing COPY BINARY file format is entirely brain-dead anyway; for
example, it wants the number of tuples to be stored at the front, which
means we have to scan the whole relation an extra time to get that info.
Its handling of nulls is bizarre, too. I'm thinking this might be a
good time to abandon backwards compatibility and switch to a format
that's a little easier to read and write. Does anyone have an opinion
pro or con about that?

BINARY COPY scared the bejeezus out of me, anyone using the interface
is asking for trouble and supporting it seems like a nightmare, I
would rip it out.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

Tom Lane

tgl@sss.pgh.pa.us

about 25 years ago

In reply to: Alfred Perlstein (#4)

Re: COPY BINARY is broken...

Alfred Perlstein <bright@wintelcom.net> writes:

I would rip it out.

I thought about that too, but was afraid to suggest it ;-)

How many people are actually using COPY BINARY?

regards, tom lane

Alfred Perlstein

bright@wintelcom.net

about 25 years ago

In reply to: Tom Lane (#5)

Re: COPY BINARY is broken...

* Tom Lane <tgl@sss.pgh.pa.us> [001201 14:57] wrote:

Alfred Perlstein <bright@wintelcom.net> writes:

I would rip it out.

I thought about that too, but was afraid to suggest it ;-)

I think you'd agree that you have more fun and important things to
do than to deal with this yucky interface. :)

How many people are actually using COPY BINARY?

I'm not using it. :)

How about adding COPY XML?

(kidding of course about the XML, but it would make postgresql more
buzzword compliant :) )

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

Don Baccus

dhogaza@pacifier.com

about 25 years ago

In reply to: Alfred Perlstein (#6)

Re: COPY BINARY is broken...

At 03:05 PM 12/1/00 -0800, Alfred Perlstein wrote:

How about adding COPY XML?
(kidding of course about the XML, but it would make postgresql more
buzzword compliant :) )

Hey, we could add a parser and call the module MyXML ...

- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.

Samuel Sieb

samuel@sieb.net

about 25 years ago

In reply to: Tom Lane (#5)

Re: COPY BINARY is broken...

On Fri, Dec 01, 2000 at 05:56:57PM -0500, Tom Lane wrote:

Alfred Perlstein <bright@wintelcom.net> writes:

I would rip it out.

I thought about that too, but was afraid to suggest it ;-)

How many people are actually using COPY BINARY?

I have used it, I don't think I'm actually using right now. But, it was
very handy. (Once I finally figured out the format through trial and error,
ouch!) It is very nice to be able to just dump tuples in, instead of having
to format them to text, then the database has to put them back to binary
again. So, an alternative with a clean interface would be very much
appreciated.

Adriaan Joubert

a.joubert@albourne.com

about 25 years ago

In reply to: Mikheev, Vadim (#2)

Re: COPY BINARY is broken...

Hi,

I would very much like some way of writing binary data to a database.
Copy binary recently broke on me after upgrading to 7.0. I have large
simulation codes and writing lots of floats to the database by
converting them to text first is 1) a real pain, 2) slow and 3) can lead
to unexpected loss in precision.

I think binary writes would actually be solved better and safer through
some type of CORBA interface, but previous discussions seemed to
indicate that that is even more of a pain than fixing the current binary
interface.

So I agree that the current version is a problem, but I do think
something needs to be put in place. Not everybody only writes a few
numbers from a web page into the database -- some have masses of data to
dump into a database. For all I care it doesn't even have to look like
SQL, but can be purely accessible through libpq.

Adriaan

#10

Tom Lane

tgl@sss.pgh.pa.us

about 25 years ago

In reply to: Adriaan Joubert (#9)

Re: COPY BINARY is broken...

Adriaan Joubert <a.joubert@albourne.com> writes:

Copy binary recently broke on me after upgrading to 7.0.

I think you're talking about binary copy via the frontend, which has a
different set of problems. To fix that, we need to make some protocol
changes, which would (preferably) also apply to non-binary frontend
copy, which would create a compatibility problem. (The reason the
protocol is broken is there's no reasonable way to find or signal the
end of the COPY data stream after an error.)

I think that's worth doing, but there's no time to design and implement
it for 7.1. Maybe for 7.2.

I think binary writes would actually be solved better and safer through
some type of CORBA interface,

CORBA would provide a more machine-independent interface, but migrating
to CORBA would be a huge task, and I'm not sure the payoff is worth
it...

regards, tom lane

#11

Bruce Momjian

pgman@candle.pha.pa.us

about 25 years ago

In reply to: Alfred Perlstein (#4)

Re: COPY BINARY is broken...

Its handling of nulls is bizarre, too. I'm thinking this might be a
good time to abandon backwards compatibility and switch to a format
that's a little easier to read and write. Does anyone have an opinion
pro or con about that?

BINARY COPY scared the bejeezus out of me, anyone using the interface
is asking for trouble and supporting it seems like a nightmare, I
would rip it out.

Tom, just keep in mind that the format is documented in copy.sgml.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#12

Tom Lane

tgl@sss.pgh.pa.us

about 25 years ago

In reply to: Bruce Momjian (#11)

Re: COPY BINARY is broken...

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Its handling of nulls is bizarre, too. I'm thinking this might be a
good time to abandon backwards compatibility and switch to a format
that's a little easier to read and write. Does anyone have an opinion
pro or con about that?

BINARY COPY scared the bejeezus out of me, anyone using the interface
is asking for trouble and supporting it seems like a nightmare, I
would rip it out.

Tom, just keep in mind that the format is documented in copy.sgml.

Not documented *correctly*, I notice. There are at least two errors,
plus the rather major omission that <tuple data> is not explained.

regards, tom lane

#13

Bruce Momjian

pgman@candle.pha.pa.us

almost 25 years ago

In reply to: Tom Lane (#1)

Re: COPY BINARY is broken...

Tom, this is fixed, right?

I've just noticed that COPY BINARY is pretty thoroughly broken by TOAST,
because what it does is to dump out verbatim the bytes making up each
tuple of the relation. In the case of a moved-off value, you'll get
the toast reference, which is not going to be too helpful for reloading
the table data. In the case of a compressed-in-line datum, you'll at
least have all the data there, but the COPY BINARY reader will crash
and burn when it sees it.

Fixing this while retaining backwards compatibility with the existing
COPY BINARY file format is possible, but it seems rather a headache:
we'd need to detoast all the toasted columns, then heap_formtuple a
new tuple containing the expanded data, and finally write that out.
(Can't do it on a field-by-field basis because the file format requires
the total tuple size to precede the tuple data.) Kind of ugly.

The existing COPY BINARY file format is entirely brain-dead anyway; for
example, it wants the number of tuples to be stored at the front, which
means we have to scan the whole relation an extra time to get that info.
Its handling of nulls is bizarre, too. I'm thinking this might be a
good time to abandon backwards compatibility and switch to a format
that's a little easier to read and write. Does anyone have an opinion
pro or con about that?

regards, tom lane

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#14

Tom Lane

tgl@sss.pgh.pa.us

almost 25 years ago

In reply to: Bruce Momjian (#13)

Re: COPY BINARY is broken...

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom, this is fixed, right?

Yes.

regards, tom lane