populating a table via the COPY command using C code.

Started by Mak, Jason | about 21 years ago | 5 messages | general

jason.mak@ngc.com

about 21 years ago

hi,

I'm writing an application in C that basically converts binary data into something meaningful. My first attempt was to parse the binary and insert directly into the database in one step, but this proved to be very slow. So I decided to go with a two-step process. The first step is to parse the data and create a flat file with tab-delimited fields. The second step is to load this data using the COPY command. I don't quite understand how this is done within C. Can someone provide me with some examples? I've already done some searches on the internet, but the examples that I found don't match what I'm trying to do. Please help!
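For the first step, here is a minimal sketch of writing rows in the text format that COPY reads by default: tab-separated fields, one row per line, with backslash, tab, newline and carriage return escaped inside a field and \N standing for NULL. The function names write_copy_field and write_copy_row are made up for illustration; they are not part of libpq.

```c
#include <stdio.h>

/*
 * Write one field in COPY text format. Characters that COPY would
 * otherwise read as a delimiter or row terminator are backslash-escaped.
 * A NULL pointer is written as \N, the default null marker.
 */
void write_copy_field(FILE *out, const char *field)
{
    if (field == NULL) {
        fputs("\\N", out);
        return;
    }
    for (const char *p = field; *p != '\0'; p++) {
        switch (*p) {
        case '\\': fputs("\\\\", out); break;
        case '\t': fputs("\\t", out);  break;
        case '\n': fputs("\\n", out);  break;
        case '\r': fputs("\\r", out);  break;
        default:   fputc(*p, out);     break;
        }
    }
}

/* Write one row: fields separated by tabs, terminated by a newline. */
void write_copy_row(FILE *out, const char *const *fields, int nfields)
{
    for (int i = 0; i < nfields; i++) {
        if (i > 0)
            fputc('\t', out);
        write_copy_field(out, fields[i]);
    }
    fputc('\n', out);
}
```

Without the escaping, a tab or newline inside a parsed value would shift the columns or split the row when the file is loaded.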

thanks,
jason.

Michael Fuhr

mike@fuhr.org

about 21 years ago

In reply to: Mak, Jason (#1)

Re: populating a table via the COPY command using C code.

On Wed, Apr 27, 2005 at 01:12:42PM -0400, Mak, Jason wrote:

The second step is to load this data using the COPY command.
I don't quite understand how this is done within C.

Are you writing a client application that uses libpq? If so, have
you seen "Functions Associated with the COPY Command" in the libpq
chapter of the documentation?

http://www.postgresql.org/docs/8.0/interactive/libpq-copy.html

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Michael Fuhr

mike@fuhr.org

about 21 years ago

In reply to: Michael Fuhr (#2)

Re: populating a table via the COPY command using C code.

[Please copy the mailing list on replies so others can contribute
to and learn from the discussion.]

On Wed, Apr 27, 2005 at 02:34:26PM -0400, Mak, Jason wrote:

Yes, my application is a client application that uses the libpq API, i.e.
PQexec, etc. I have looked at the "Functions Associated with the COPY
Command", but I still don't understand. What I really need is an
example of how those APIs (PQputCopyData) are used, other than the
"simple" example that's provided.

What example are you looking at and what don't you understand about it?

This "dataload" should be relatively simple. I already have a flat
file created. I should be able to use some API and say: here is the
pointer to my db connection and here is a pointer to the flat file,
now do your thing. Perhaps you can explain this to me.

libpq provides the primitives that you could use to implement such
an API: it would be a trivial matter to write a function that opens
the indicated file, reads its contents, and sends them to the
database. As the documentation indicates, you'd use PQexec() or
its ilk to send a COPY FROM STDIN command (see the COPY documentation
for the exact syntax), then PQputCopyData() or PQputline() to send
the data (probably in a loop), then PQputCopyEnd() or PQendcopy()
to indicate that you're finished. Add the necessary file I/O
statements and there's your function.

Do you have a reason for using an intermediate file? Instead of
writing data to the file and then reading it back, you could use
PQputCopyData() or PQputline() to send the data directly to the
database.

Another possibility: if the file resides somewhere the backend can
read, and if you can connect to the database as a superuser, then
you could use COPY tablename FROM 'filename'.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Mak, Jason

jason.mak@ngc.com

about 21 years ago

In reply to: Michael Fuhr (#3)

Re: populating a table via the COPY command using C code.

What example are you looking at and what don't you understand about it?

Some of the examples that I looked over are either from the internet or from the Postgres manual. The API I'm referring to is PQputCopyData. However, with the explanation given below, I'm starting to understand.

libpq provides the primitives that you could use to implement such
an API: it would be a trivial matter to write a function that opens
the indicated file, reads its contents, and sends them to the
database. As the documentation indicates, you'd use PQexec() or
its ilk to send a COPY FROM STDIN command (see the COPY documentation
for the exact syntax), then PQputCopyData() or PQputline() to send
the data (probably in a loop), then PQputCopyEnd() or PQendcopy()
to indicate that you're finished. Add the necessary file I/O
statements and there's your function.

So basically, in C, I would open the file using fopen and, in a loop, read a line into a buffer with some byte count and send that to the database using PQputCopyData. Is this correct?

Do you have a reason for using an intermediate file? Instead of
writing data to the file and then reading it back, you could use
PQputCopyData() or PQputline() to send the data directly to the
database.

For the project I'm working on, we have basically set up a Postgres data warehouse. We have a large set of binary data that needs to be parsed and translated into something meaningful. We intend to load this processed data into 3 tables using the quickest means possible. I've already tried parsing and doing inserts, but this proved to be very slow. So I figured on a two-step automated process: the first step would be to parse the data and create 3 separate files, and the second would be to load each file into the warehouse. I never considered using PQputCopyData in real time. I'm not sure how this would work given 3 different tables that hold different data, or how fast it's going to be. But I have tried the last approach. It works fairly well. The only problem is the lack of insight into where it is during the load processing.

What are your thoughts? Which approach would be the fastest?
1) Two-step process.
2) Real-time PQputCopyData - not sure how this would work with 3 different tables.
3) COPY tablename FROM 'filename'

thanks,
jason.

Sean Davis

sdavis2@mail.nih.gov

about 21 years ago

In reply to: Mak, Jason (#4)

Re: populating a table via the COPY command using C code.

On Apr 27, 2005, at 4:48 PM, Mak, Jason wrote:

What are your thoughts? Which approach would be the fastest?
1) Two-step process.
2) Real-time PQputCopyData - not sure how this would work with 3
different tables.
3) COPY tablename FROM 'filename'

thanks,
jason.

COPY tablename FROM 'filename'

is VERY fast. I think people generally load the data into
postgres using COPY (perhaps into a "loader" table that isn't in the
same format that the final tables will be in) and then do data
manipulation and cleaning within the database using database tools.
This paradigm may or may not work for you, but it seems to be pretty
general.

Sean