Load data from a csv file without using COPY
In order to test a real-life scenario (and use it for benchmarking) I want to load a large number of rows from CSV files.
The requirement is that the load should happen like an application writing to the database (that is, no COPY command).
Is there a tool which can do the job? Basically, parse the CSV file and insert it into the database row by row.
Thanks
Hi,
AFAIK you can use COPY from a JDBC command, since COPY allows streaming
data (the stdin variant).
However, while faster than INSERT INTO, this might lock the target table
during the process.
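The stdin-streaming variant is driver-agnostic: the application only has to turn each parsed CSV row into a line of COPY's text format. A minimal Python sketch of that conversion (the helper name `copy_lines` is my own, not from the thread; a driver's copy-from-stdin API would consume the lines it yields):

```python
import csv


def copy_lines(rows):
    """Turn parsed CSV rows into lines in COPY's tab-separated text format.

    Backslashes, tabs and newlines inside fields are escaped the way COPY
    expects, and empty fields become \\N (COPY's NULL marker).
    """
    for row in rows:
        fields = []
        for field in row:
            if field == "":
                fields.append("\\N")  # COPY's NULL marker
            else:
                fields.append(field.replace("\\", "\\\\")
                                   .replace("\t", "\\t")
                                   .replace("\n", "\\n"))
        yield "\t".join(fields) + "\n"
```

This keeps the load streaming: rows are converted one at a time rather than materializing the whole file in memory.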
2018-06-19 22:16 GMT+02:00 Ravi Krishna <srkrishna@yahoo.com>:
Once you have parsed the data it is fairly easy to use PostgreSQL's
"COPY ... FROM stdin" format, provided all fields are separated by a tab
character. For a simple table (t1) it could look like:
COPY t1 (f1,f2) FROM stdin;
3<tab>Joe
7<tab>Jane
\.
These data can be piped directly to psql and it will be fast.
Note: NULL should be written as '\N'; see the manual:
https://www.postgresql.org/docs/current/static/sql-copy.html
It is the same kind of data you get with pg_dump.
./hans
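The format shown above can be generated mechanically from parsed rows. A small Python sketch (the function name is my own; field values are assumed to contain no tabs or newlines, so no escaping is done here) that renders a complete, psql-pipeable COPY block with a `\.` terminator and `\N` for NULLs:

```python
def copy_script(table, columns, rows):
    """Render rows as a COPY ... FROM stdin block that psql can execute.

    Matches the format shown above: tab separators, \\N for NULL values,
    and a terminating \\. line.
    """
    out = ["COPY %s (%s) FROM stdin;" % (table, ",".join(columns))]
    for row in rows:
        out.append("\t".join("\\N" if f is None else str(f) for f in row))
    out.append("\\.")
    return "\n".join(out) + "\n"
```

Piping the result to `psql dbname` would load the rows in one round trip.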
I'm skeptical that ingesting CSV in any form, even if you intentionally
blow things up by converting it into:
BEGIN;
INSERT INTO tbl VALUES ('','','');
COMMIT;
BEGIN;
INSERT INTO tbl VALUES ('','','');
COMMIT;
(which is what auto-commit mode looks like)
is going to provide a meaningful benchmark for application-like usage
patterns.
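The auto-commit pattern above can itself be generated mechanically from a CSV. A rough Python sketch (the helper name is my own), including the single-quote doubling that SQL string literals require:

```python
def autocommit_statements(table, rows):
    """Emit one BEGIN/INSERT/COMMIT triple per row, i.e. what a naive
    auto-committing application effectively sends for each CSV line."""
    for row in rows:
        # SQL string literals escape a quote by doubling it: ' -> ''
        values = ", ".join("'%s'" % f.replace("'", "''") for f in row)
        yield "BEGIN;\nINSERT INTO %s VALUES (%s);\nCOMMIT;" % (table, values)
```

Feeding the output to psql reproduces the one-transaction-per-row behavior described here, which is usually the slowest realistic load path.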
But anyway, I'm not familiar with any tools that make doing this
particularly simple. In most situations like this I'll just import the CSV
into a spreadsheet and create a formula that builds out the individual SQL
commands. Whether that's useful depends a lot on how often the source CSV
is updated.
That said, I have found the following tool to be generally helpful in
this area, though I'm thinking it doesn't do what you want here.
http://csvkit.readthedocs.io/en/1.0.3/scripts/csvsql.html
David J.
If performance is relevant then your app should probably be using the
COPY protocol, not line-by-line inserts. It's supported by most
PostgreSQL access libraries. If your app does that, then using "\copy"
from psql would be an appropriate benchmark.
Cheers,
Steve
I think an easy approach would be to COPY the CSV files into a separate database using psql's \copy command and then dump that as separate INSERT statements with pg_dump --inserts.
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
Actually the reluctance to use COPY is to keep the benchmark the same
across two different RDBMS in two different environments.
This was my first thought too. However, as I understand it, pg_dump
--inserts basically runs an INSERT INTO ... statement for every row.
In other words, each row is unprepared and executed individually. That
is also not a real-life scenario.
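The per-row planning overhead being objected to here can be avoided with server-side prepared statements: PREPARE the INSERT once, then EXECUTE it per row. A minimal sketch that emits such a script for psql (the statement name `ins` and the helper are hypothetical; values are quote-doubled for SQL literals):

```python
def prepared_insert_script(table, ncols, rows):
    """Prepare the INSERT once, then EXECUTE it per row, so the server
    plans the statement a single time instead of once per row."""
    params = ", ".join("$%d" % (i + 1) for i in range(ncols))
    yield "PREPARE ins AS INSERT INTO %s VALUES (%s);" % (table, params)
    for row in rows:
        args = ", ".join("'%s'" % f.replace("'", "''") for f in row)
        yield "EXECUTE ins(%s);" % args
```

Most drivers do the equivalent automatically when you reuse a parameterized statement, which is closer to how a real application writes.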
That's something I'd only do if I intended to rig a benchmark between an
RDBMS with good bulk import and one without. If that's not your goal,
your approach doesn't seem to make sense and is unlikely to provide
performance metrics that are useful or related to your app's
performance, unless you intend to hamstring your app in exactly the same
way you're running the benchmark.
Maybe use your app, or write ten minutes' worth of code that will
interact with the database in much the same way as your app will?
Cheers,
Steve
If bulk loading is the actual production target, all your RDBMS choices
have their own loaders. I suggest that is what you ought to compare.
You really need to describe what you consider to be a "real life
scenario", and probably give a better idea of the creation and number of
these CSV files, in addition to describing the relevant behavior of the
application you are testing.
If you want maximum realism you should probably write integration tests
for your application and then execute them at high volume.
Or, at minimum, give an example of the output you would want from this
unknown program...
David J.
Not clear what you mean by 'real world scenario', but you could possibly
use PG's foreign table support and define a csv file as a foreign table
and then have scripts which read from there and do whatever
insert/update etc you need. However, this has a high level of 'fakery'
going on and probably not testing what you really want.
There are lots of ways that applications write to the database:
different drivers (e.g. JDBC, ODBC, pg, etc.), different commit and
transaction strategies, and even different ways to handle things like an
update-or-insert process. You can even use streams and COPY from an
application.
To get 'real world' equivalence, you really need to use the same
interface as the application you are comparing against. Most languages
have support for processing CSV files, so you may be better off writing
a small 'wrapper' app which uses the same drivers and, assuming your
database connectivity has been abstracted into some sort of
module/library/class, uses the same interface to write to the database
that the application uses.
Tim
--
Tim Cross
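Tim's wrapper-app suggestion can be sketched in a few lines of Python. Here `execute` stands in for whatever call your abstraction layer exposes (cursor.execute, a DAO method, etc.); the function and parameter names are illustrative, not from the thread:

```python
import csv


def load_csv(path, table, execute):
    """Read a CSV file and hand each row to `execute` as a parameterized
    INSERT, the way an application using its normal driver would.

    `execute(sql, params)` is injected so the same loader works with any
    database abstraction layer.
    """
    with open(path, newline="") as f:
        for row in csv.reader(f):
            placeholders = ", ".join(["%s"] * len(row))
            execute("INSERT INTO %s VALUES (%s)" % (table, placeholders), row)
```

Because the database call is injected, the benchmark exercises exactly the driver and interface the real application uses, which is the point being made above.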
How the fuck do I unsubscribe from this mailing list? I get more than 100 emails a day.
Seriously, stop spamming the list and stop cursing and acting like a petulant child. Go to the site and unsubscribe or use a mail client that understands the standard list headers.
Please just tell me the site, I will do it right away. I have marked it junked so many times; I will keep spamming it until my email address is removed from the list.
Bye
https://lists.postgresql.org/unsubscribe/
It's people like you who make spam filters worse for the rest of us to the point they need to be checked daily for false positives. I'm sure you could have found it in less time than it took to spam the list with obscenities.
https://lists.postgresql.org/unsubscribe/
--
Adrian Klaver
adrian.klaver@aklaver.com
Just tell me the site, I don't have time to waste on shitty things. I will program a spammer to send email to this list.
Bye