Loading binary data into the database
Hi,
I'm having serious problems trying to load large amounts of data
into the database. I have this data in binary, database compatible,
form, but there seems no way left to upload this data except to turn it
all back into strings (since 7.0.2 binary copy has been disabled, making
quite a few libpq functions superfluous). This is quite a serious
deficiency in my view.
So, I want to know what my options are. For this type of thing I would
have thought that Karel's stored queries would be useful, combined with
some way of uploading binary data to the database. Something along the
lines of
prepare insert into my_table (col1, col2) values (?,?);
execute <handle to the query> 3, 4;
To upload binary data there would have to be a libpq call that uploads
the data in the execute statement as binary (there is a specification
for this already in the current libpq), and executes the prepared plan.
For any application that generates a lot of data (especially floating
point data), this would be a huge win. An added advantage would be that
this type of schema would allow a serial value on the table to be
incremented as in any normal insert, which has always been annoying when
using copy.
I have no idea how hard this is and whether I'm the only person in the
world that will find this useful. I seem to be the only one who moaned
about the binary copy vanishing, and breaking code, so perhaps nobody
else sees this as a problem?
Adriaan
Adriaan Joubert <a.joubert@albourne.com> writes:
since 7.0.2 binary copy has been disabled, making
quite a few libpq functions superfluous
Such as? IIRC, the reason we disabled it was precisely that there was
no support on the client side. (What's worse, there's no support in
the FE/BE protocol either. I don't see how you could have made this
work...)
Cross-machine binary copy is a dangerous thing anyway, since it opens
you up to all sorts of compatibility problems. If your app is running
on the same machine as the server, you can write data to a file and
then send a command to do a binary copy from that file.
regards, tom lane
Tom Lane wrote:
Adriaan Joubert <a.joubert@albourne.com> writes:
since 7.0.2 binary copy has been disabled, making
quite a few libpq functions superfluousSuch as? IIRC, the reason we disabled it was precisely that there was
no support on the client side. (What's worse, there's no support in
the FE/BE protocol either. I don't see how you could have made this
work...)
I issued a 'copy binary <table> from stdin;' and then sent the data with
PQputnbytes (this is now obsolete, isn't it?) and as this was from a
CORBA server running on the same machine as the database this worked fine
and was very fast (not being able to update a serial was a pain, and I
ended up doing it by hand in the server). As some of the data I have to
write goes into bytea fields, i now have to convert all non-printable
characters to octet codes, which is a total pain in the neck.
Cross-machine binary copy is a dangerous thing anyway, since it opens
you up to all sorts of compatibility problems. If your app is running
on the same machine as the server, you can write data to a file and
then send a command to do a binary copy from that file.
Yes sure, if you write from a machine with a different architecture it is
going to cause trouble. Reading and writing binary files on the host
machine seems kind-of a slow solution to the problem and leads to yet
another load of permission problems (Ok, they can be resolved, but it is
yet another place where things can go wrong).
Perhaps libpq is not the answer. I've even been thinking about writing a
SPI function that acts as a CORBA server -- but decided that that is just
too ugly to contemplate. So what is the solution?
Adriaan
Adriaan Joubert <a.joubert@albourne.com> writes:
Such as? IIRC, the reason we disabled it was precisely that there was
no support on the client side. (What's worse, there's no support in
the FE/BE protocol either. I don't see how you could have made this
work...)
I issued a 'copy binary <table> from stdin;' and then sent the data with
PQputnbytes
How did you get out of COPY state? In binary mode CopyFrom will only
recognize EOF as end of data, and there's no provision in the FE/BE
protocol for making it see an EOF. You'd have had to break the
connection to get out of that --- and I'd have expected the loss of
connection to cause a transaction abort, preventing your data from
getting committed. (If it didn't abort, that's a bug that needs to be
fixed... if the line drops halfway through a copy, you don't want it
to commit do you?)
The real bottom line here is that the FE/BE protocol would need to be
changed to support binary copy properly, and no one's excited about
putting more work into the existing protocol, nor about the ensuing
compatibility problems.
Perhaps libpq is not the answer. I've even been thinking about writing a
SPI function that acts as a CORBA server -- but decided that that is just
too ugly to contemplate. So what is the solution?
A CORBA-based replacement protocol has been discussed seriously, though
I haven't noticed any work getting done on it lately. Feel free to
pitch in if you think it's a good idea.
regards, tom lane
Tom Lane wrote:
Adriaan Joubert <a.joubert@albourne.com> writes:
Such as? IIRC, the reason we disabled it was precisely that there was
no support on the client side. (What's worse, there's no support in
the FE/BE protocol either. I don't see how you could have made this
work...)I issued a 'copy binary <table> from stdin;' and then sent the data with
PQputnbytesHow did you get out of COPY state? In binary mode CopyFrom will only
recognize EOF as end of data, and there's no provision in the FE/BE
protocol for making it see an EOF. You'd have had to break the
connection to get out of that --- and I'd have expected the loss of
connection to cause a transaction abort, preventing your data from
getting committed. (If it didn't abort, that's a bug that needs to be
fixed... if the line drops halfway through a copy, you don't want it
to commit do you?)
Don't know. I first sent the length of the binary buffer, then the buffer (I
just stored the whole thing in an STL vector) and PQendcopy to terminate it.
But, as you said, libpq is probably not the right way to go about it. Also,
the docs for the binary structure were not quite correct, but it took only a
little bit of fiddling to get the structure right. I could not find the
description of the binary structure back in the current docs on
postgresql.org, so I guess this really has been ripped out.
A CORBA-based replacement protocol has been discussed seriously, though
I haven't noticed any work getting done on it lately. Feel free to
pitch in if you think it's a good idea.
Yes, I've been looking through the mailing list. Problem is to settle on a
CORBA system that runs everywhere. And it is much more natural to program
CORBA in C++, but if I see the problems people have had just compiling the C++
interface to postgres, this looks like a no-go. I'll look around at the
various bits and pieces floating around the net.
If anybody is working on a CORBA interface to postgres, please let me know!
Adriaan
Adriaan Joubert <a.joubert@albourne.com> writes:
since 7.0.2 binary copy has been disabled, making
quite a few libpq functions superfluousSuch as? IIRC, the reason we disabled it was precisely that there was
no support on the client side. (What's worse, there's no support in
the FE/BE protocol either. I don't see how you could have made this
work...)Cross-machine binary copy is a dangerous thing anyway, since it opens
you up to all sorts of compatibility problems. If your app is running
on the same machine as the server, you can write data to a file and
then send a command to do a binary copy from that file.
We disabled binary copy? Using \copy or COPY?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026