Load data from a csv file without using COPY

Started by Ravi Krishna · almost 8 years ago · 29 messages · general
#1Ravi Krishna
srkrishna@yahoo.com

In order to test a real-life scenario (and use it for benchmarking) I want to load a large amount of data from CSV files.
The requirement is that the load should happen like an application writing to the database (that is, no COPY command).
Is there a tool which can do the job? Basically, parse the CSV file and insert it into the database row by row.

thanks

#2Nicolas Paris
niparisco@gmail.com
In reply to: Ravi Krishna (#1)
Re: Load data from a csv file without using COPY

hi

AFAIK you can use COPY from a JDBC connection, since COPY allows streaming data
(the FROM STDIN version).
However, while faster than INSERT INTO, this might lock the target table
during the process.
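For example, a minimal sketch in Python of the same idea: parse the CSV into an in-memory buffer and hand it to whatever COPY-from-stdin call the driver exposes. The `copy_expert` call in the comment is psycopg2's; the function name here is my own:

```python
import csv
import io

def csv_to_copy_buffer(csv_file):
    """Convert a CSV stream into a buffer that a driver's
    COPY ... FROM STDIN API can consume (CSV format)."""
    # COPY's CSV mode accepts the rows largely as-is, so we
    # just normalize the quoting and collect the lines.
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in csv.reader(csv_file):
        writer.writerow(row)
    buf.seek(0)
    return buf

# With a psycopg2-style driver the buffer then streams straight
# into the table, e.g.:
#   cur.copy_expert("COPY t1 (f1, f2) FROM STDIN WITH (FORMAT csv)", buf)

buf = csv_to_copy_buffer(io.StringIO("3,Joe\n7,Jane\n"))
print(buf.read())
```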

2018-06-19 22:16 GMT+02:00 Ravi Krishna <srkrishna@yahoo.com>:

Show quoted text


#3Hans Schou
hans.schou@gmail.com
In reply to: Ravi Krishna (#1)
Re: Load data from a csv file without using COPY

On Tue, Jun 19, 2018 at 10:17 PM Ravi Krishna <srkrishna@yahoo.com> wrote:

In order to test a real life scenario (and use it for benchmarking) I want
to load large number of data from csv files.
The requirement is that the load should happen like an application writing
to the database ( that is, no COPY command).

Once you have parsed the data, it is fairly easy to use PostgreSQL's "COPY
FROM stdin" format. If you have all data separated by tabs, a
simple table (t1) could look like:

COPY t1 (f1,f2) FROM stdin;
3<tab>Joe
7<tab>Jane
\.

These data can be piped directly to psql and it will be fast.

Note: NULL should be '\N'; see the manual:
https://www.postgresql.org/docs/current/static/sql-copy.html

It is the same kind of data you get with pg_dump.
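The escaping rules for that text format (tab, backslash, newline, and \N for NULL) can be sketched in a few lines of Python; the function names are illustrative:

```python
def copy_text_field(value):
    """Render one field in COPY's text format: NULL becomes \\N,
    and backslash, tab, newline, carriage return are escaped."""
    if value is None:
        return r"\N"
    # Backslash must be escaped first so the other escapes
    # don't get double-processed.
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

def copy_text_line(row):
    """Join the escaped fields of one row with tabs."""
    return "\t".join(copy_text_field(v) for v in row)

print(copy_text_line(["3", "Joe"]))   # 3<tab>Joe
print(copy_text_line(["7", None]))    # 7<tab>\N
```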

./hans

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Ravi Krishna (#1)
Re: Load data from a csv file without using COPY

On Tue, Jun 19, 2018 at 1:16 PM, Ravi Krishna <srkrishna@yahoo.com> wrote:

In order to test a real life scenario (and use it for benchmarking) I want
to load large number of data from csv files.
The requirement is that the load should happen like an application writing
to the database ( that is, no COPY command).
Is there a tool which can do the job. Basically parse the csv file and
insert it to the database row by row.

I'm skeptical that ingesting CSV of any form, even if you intentionally
blow things up by converting it into:

BEGIN;
INSERT INTO tbl VALUES ('','','');
COMMIT;
BEGIN;
INSERT INTO tbl VALUES ('','','');
COMMIT;

(which is what auto-commit mode looks like)

is going to provide a meaningful benchmark for application-like usage
patterns.

But anyway, I'm not familiar with any tools that make doing this
particularly simple. In most situations like this I'll just import the CSV
into a spreadsheet and create a formula that builds out the individual SQL
commands. Whether that's useful depends a lot on how often the source CSV
is updated.
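The spreadsheet-formula approach amounts to string-building; a rough Python equivalent might look like the following. The table and column names are placeholders, and this naive quoting only handles text values:

```python
import csv
import io

def csv_to_inserts(csv_file, table, columns):
    """Turn each CSV row into a standalone INSERT statement,
    doubling single quotes the way SQL string literals require."""
    stmts = []
    col_list = ", ".join(columns)
    for row in csv.reader(csv_file):
        vals = ", ".join("'" + v.replace("'", "''") + "'" for v in row)
        stmts.append(f"INSERT INTO {table} ({col_list}) VALUES ({vals});")
    return stmts

for stmt in csv_to_inserts(io.StringIO("3,Joe\n7,O'Hara\n"), "t1", ["f1", "f2"]):
    print(stmt)
```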

That said, I have found the following tool to be generally helpful in this
area, though I'm thinking it doesn't do what you want here.

http://csvkit.readthedocs.io/en/1.0.3/scripts/csvsql.html

David J.

#5Steve Atkins
steve@blighty.com
In reply to: Ravi Krishna (#1)
Re: Load data from a csv file without using COPY

On Jun 19, 2018, at 9:16 PM, Ravi Krishna <srkrishna@yahoo.com> wrote:

In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
Is there a tool which can do the job. Basically parse the csv file and insert it to the database row by row.

If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's
supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be
an appropriate benchmark.

Cheers,
Steve

#6Alban Hertroys
haramrae@gmail.com
In reply to: Ravi Krishna (#1)
Re: Load data from a csv file without using COPY

On 19 Jun 2018, at 22:16, Ravi Krishna <srkrishna@yahoo.com> wrote:

In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
Is there a tool which can do the job. Basically parse the csv file and insert it to the database row by row.

thanks

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

#7Ravi Krishna
srkrishna@yahoo.com
In reply to: Steve Atkins (#5)
Re: Load data from a csv file without using COPY

If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's
supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be
an appropriate benchmark.

Actually, the reluctance to use COPY is to keep the benchmark the same across two different RDBMSs in
two different environments.

#8Ravi Krishna
srkrishna@yahoo.com
In reply to: Alban Hertroys (#6)
Re: Load data from a csv file without using COPY

I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then pg_dump that as separate insert statements with pg_dump --inserts.

This was my first thought too. However, as I understand it, pg_dump --inserts basically runs an INSERT INTO ... statement for every row.
In other words, each row is un-prepared and executed individually. That is also not a real-life scenario.
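By contrast, an application-style load would typically prepare one parameterized statement and bind each row to it. A sketch of that shape; the `executemany` call in the comment is the standard DB-API name, and the helper is my own:

```python
import csv
import io

def prepare_batch(csv_file, table, columns):
    """One parameterized statement plus a list of row tuples --
    the shape a driver's executemany()/prepared-statement API takes,
    in contrast to pg_dump --inserts emitting a literal INSERT per row."""
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    params = [tuple(row) for row in csv.reader(csv_file)]
    return sql, params

sql, params = prepare_batch(io.StringIO("3,Joe\n7,Jane\n"), "t1", ["f1", "f2"])
# A driver would then run the batch in one call (hypothetical cursor):
#   cur.executemany(sql, params)
print(sql)
print(params)
```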

#9Steve Atkins
steve@blighty.com
In reply to: Ravi Krishna (#7)
Re: Load data from a csv file without using COPY

On Jun 19, 2018, at 10:14 PM, Ravi Krishna <srkrishna@yahoo.com> wrote:

If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's
supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be
an appropriate benchmark.

Actually the reluctance to not use COPY is to make the benchmark same across two different RDBMS in
two diff env.

That's something I'd only do if I intended to rig a benchmark between an RDBMS with good bulk import
and one without. If that's not your goal, your approach doesn't seem to make sense and is unlikely
to provide performance metrics that are useful or related to your app's performance, unless you intend
to hamstring your app in exactly the same way you're running the benchmark.

Maybe use your app, or write ten minutes worth of code that'll interact with the database in much the
same way as your app will?

Cheers,
Steve

#10Rob Sargent
robjsargent@gmail.com
In reply to: Ravi Krishna (#7)
Re: Load data from a csv file without using COPY

On 06/19/2018 03:14 PM, Ravi Krishna wrote:

If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's
supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be
an appropriate benchmark.

Actually the reluctance to not use COPY is to make the benchmark same across two different RDBMS in
two diff env.

If bulk loading is the actual production target, all your RDBMS choices
have their own loaders. I suggest that is what you ought to compare.

#11David G. Johnston
david.g.johnston@gmail.com
In reply to: Ravi Krishna (#8)
Re: Load data from a csv file without using COPY

On Tue, Jun 19, 2018 at 2:17 PM, Ravi Krishna <srkrishna@yahoo.com> wrote:

I think an easy approach would be to COPY the CSV files into a separate

database using psql's \copy command and then pg_dump that as separate
insert statements with pg_dump —inserts.

This was my first thought too. However, as I understand, pg_dump --insert
basically runs INSERT INTO ... sql for every row.
In other words, each row is un-prepared and executed individually. That
is also not real life scenario.

You really need to describe what you consider to be a "real life
scenario", and probably give a better idea of the creation and number of these
CSV files, in addition to describing the relevant behavior of the
application you are testing.

If you want maximum realism you should probably write integration tests for
your application and then execute those at high volume.

Or at minimum give an example of the output you would want from this
unknown program...

David J.

#12Tim Cross
theophilusx@gmail.com
In reply to: Ravi Krishna (#1)
Re: Load data from a csv file without using COPY

Ravi Krishna <srkrishna@yahoo.com> writes:

In order to test a real life scenario (and use it for benchmarking) I want to load large number of data from csv files.
The requirement is that the load should happen like an application writing to the database ( that is, no COPY command).
Is there a tool which can do the job. Basically parse the csv file and insert it to the database row by row.

Not clear what you mean by 'real world scenario', but you could possibly
use PG's foreign table support and define a csv file as a foreign table
and then have scripts which read from there and do whatever
insert/update etc you need. However, this has a high level of 'fakery'
going on and probably not testing what you really want.

There are lots of ways that applications write to the database -
different drivers (e.g. jdbc, odbc, pg etc), different commit
and transaction strategies and even different ways to handle things like
an update or insert process. You can even use streams and copy from an
application.

To get 'real world' equivalence, you really need to use the same
interface as the application you are comparing. Most languages have
support for processing CSV files, so you may be better off writing a
small 'wrapper' app which uses the same drivers and assuming your
database connectivity has been abstracted into some sort of
module/library/class, use the same interface to write to the database
that the application uses.
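Such a wrapper can be tiny if the database call is passed in, so the benchmark runs through exactly the driver path the application already uses. A minimal sketch, with illustrative names; the real `insert_row` would be something like `lambda row: cur.execute(sql, row)`:

```python
import csv
import io

def load_csv(csv_file, insert_row):
    """Row-by-row loader: parse the CSV and hand each row to
    whatever insert function the application already uses, so the
    benchmark exercises the same driver path as the app."""
    n = 0
    for row in csv.reader(csv_file):
        insert_row(row)  # e.g. cur.execute(insert_sql, row) in the real app
        n += 1
    return n

# Stand-in for the application's insert path (hypothetical):
seen = []
count = load_csv(io.StringIO("3,Joe\n7,Jane\n"), seen.append)
print(count, seen)
```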

Tim
--
Tim Cross

#13Asif Ali
asif2k@hotmail.com
In reply to: Tim Cross (#12)
Re: Load data from a csv file without using COPY

how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day

Show quoted text

#14Asif Ali
asif2k@hotmail.com
In reply to: Alban Hertroys (#6)
Re: Load data from a csv file without using COPY

how the fuck i unsubscribe to this mailing list , i get more than 100 emails a day

Bye

Show quoted text

#15James Keener
jim@jimkeener.com
In reply to: Asif Ali (#14)
Re: Load data from a csv file without using COPY

Seriously, stop spamming the list and stop cursing and acting like a petulant child. Go to the site and unsubscribe or use a mail client that understands the standard list headers.

On June 19, 2018 6:06:59 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:

Show quoted text

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#16Asif Ali
asif2k@hotmail.com
In reply to: James Keener (#15)
Re: Load data from a csv file without using COPY

please just tell me the site i will do it right away and i have marked it junked so many times , i will keep spamming it until my email address is removed from the list

Bye

Show quoted text

#17David G. Johnston
david.g.johnston@gmail.com
In reply to: Asif Ali (#16)
Re: Load data from a csv file without using COPY

https://lists.postgresql.org/unsubscribe/

On Tuesday, June 19, 2018, Asif Ali <asif2k@hotmail.com> wrote:

Show quoted text


#18James Keener
jim@jimkeener.com
In reply to: Asif Ali (#16)
Re: Load data from a csv file without using COPY

It's people like you who make spam filters worse for the rest of us to the point they need to be checked daily for false positives. I'm sure you could have found it in less time than it took to spam the list with obscenities.

On June 19, 2018 6:13:49 PM EDT, Asif Ali <asif2k@hotmail.com> wrote:

Show quoted text

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#19Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Asif Ali (#16)
Re: Load data from a csv file without using COPY

On 06/19/2018 03:13 PM, Asif Ali wrote:

please just tell me the site i will do it right away and i have marked
it junked so many times , i will keep spamming it until my email address
is removed from the list

https://lists.postgresql.org/unsubscribe/

Show quoted text

--
Adrian Klaver
adrian.klaver@aklaver.com

#20Asif Ali
asif2k@hotmail.com
In reply to: James Keener (#18)
Re: Load data from a csv file without using COPY

just tell me the site , i dont have time to waste on shitty things , i will program a spammer to send email to this list

Bye

Show quoted text

#21Asif Ali
asif2k@hotmail.com
In reply to: Asif Ali (#20)
#22Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Asif Ali (#20)
#23Stephen Frost
sfrost@snowman.net
In reply to: Adrian Klaver (#22)
#24Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Ravi Krishna (#1)
#25Ravi Krishna
srkrishna@yahoo.com
In reply to: Nicolas Paris (#2)
#26Michael Paquier
michael@paquier.xyz
In reply to: David G. Johnston (#11)
#27Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Ravi Krishna (#25)
#28Kevin Brannen
KBrannen@efji.com
In reply to: Ravi Krishna (#7)
#29Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Kevin Brannen (#28)