batch insert/update

Started by blackwater dev, over 18 years ago · 7 messages · general
#1blackwater dev
blackwaterdev@gmail.com

I have some PHP code that will be pulling in a file via FTP. This file will
contain 20,000+ records that I then need to pump into the Postgres db.
These records will represent a subset of the records in a certain table. I
basically need an efficient way to pump these rows into the table, replacing
matching rows (based on id) already there and inserting ones that aren't.
Short of looping through the result and inserting or updating based on the
presence of each row, what is the best way to handle this? This is something
that will run nightly.

Thanks!

#2Andreas Kretschmer
akretschmer@spamfence.net
In reply to: blackwater dev (#1)
Re: batch insert/update

blackwater dev <blackwaterdev@gmail.com> schrieb:

I have some php code that will be pulling in a file via ftp. This file will
contain 20,000+ records that I then need to pump into the postgres db. These
records will represent a subset of the records in a certain table. I basically
need an efficient way to pump these rows into the table, replacing matching
rows (based on id) already there and inserting ones that aren't. Short of
looping through the result and inserting or updating based on the presence of
the row, what is the best way to handle this? This is something that will run
nightly.

Insert your data into an extra table and use regular SQL to
insert/update the destination table. You can use COPY to load the data
into the extra table; this works very fast, but you need a suitable
file format for it.
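A minimal sketch of the staging-table approach described above, using Python's sqlite3 module as a self-contained stand-in for Postgres (table and column names are invented for illustration; in Postgres the staging load would be a COPY rather than executemany):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Destination table plus a staging table with the same shape.
cur.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, val TEXT)")
cur.execute("CREATE TABLE items_stage (id INTEGER PRIMARY KEY, val TEXT)")
cur.executemany("INSERT INTO items VALUES (?, ?)", [(1, "old"), (2, "old")])

# Bulk-load the nightly file into the staging table
# (in Postgres this would be a COPY into items_stage).
batch = [(2, "new"), (3, "new")]
cur.executemany("INSERT INTO items_stage VALUES (?, ?)", batch)

# Update rows that already exist in the destination ...
cur.execute("""
    UPDATE items
    SET val = (SELECT s.val FROM items_stage s WHERE s.id = items.id)
    WHERE id IN (SELECT id FROM items_stage)
""")
# ... then insert the rows that don't exist yet.
cur.execute("""
    INSERT INTO items
    SELECT s.id, s.val FROM items_stage s
    WHERE s.id NOT IN (SELECT id FROM items)
""")
con.commit()

print(cur.execute("SELECT id, val FROM items ORDER BY id").fetchall())
# [(1, 'old'), (2, 'new'), (3, 'new')]
```

The update-before-insert ordering matters: running the INSERT first would make every staged row "already present" when the UPDATE runs.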

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°

#3Ivan Sergio Borgonovo
mail@webthatworks.it
In reply to: Andreas Kretschmer (#2)
Re: batch insert/update

On Wed, 26 Dec 2007 20:48:27 +0100
Andreas Kretschmer <akretschmer@spamfence.net> wrote:

blackwater dev <blackwaterdev@gmail.com> schrieb:

I have some php code that will be pulling in a file via ftp.
This file will contain 20,000+ records that I then need to pump
into the postgres db. These records will represent a subset of
the records in a certain table. I basically need an efficient
way to pump these rows into the table, replacing matching rows
(based on id) already there and inserting ones that aren't. Short
of looping through the result and inserting or updating based on
the presence of the row, what is the best way to handle this?
This is something that will run nightly.

Insert your data into an extra table and use regular SQL to
insert/update the destination table. You can use COPY to load the
data into the extra table; this works very fast, but you need a
suitable file format for it.

What if you know in advance which rows should be inserted and you
have a batch of rows that should be updated?

Is it still the fastest approach to insert them all into a temp
table with COPY?

What about the rows that have to be updated, if you have all the
columns, not just the changed ones?
Is it faster to delete & insert, or to update?

The updates come with the same pk as the destination table.
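For reference, the delete & insert variant of that merge step can be sketched as follows (sqlite3 again standing in for Postgres, with invented names; whether it actually beats an UPDATE depends on indexes and dead-tuple overhead, so it would need benchmarking on the real table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, val TEXT)")
cur.execute("CREATE TABLE items_stage (id INTEGER PRIMARY KEY, val TEXT)")
cur.executemany("INSERT INTO items VALUES (?, ?)", [(1, "old"), (2, "old")])
cur.executemany("INSERT INTO items_stage VALUES (?, ?)", [(2, "new"), (3, "new")])

# Since the staged rows carry every column (not just the changed ones),
# matching destination rows can be deleted and replaced wholesale,
# which also covers the rows that are brand new.
cur.execute("DELETE FROM items WHERE id IN (SELECT id FROM items_stage)")
cur.execute("INSERT INTO items SELECT id, val FROM items_stage")
con.commit()

rows = cur.execute("SELECT id, val FROM items ORDER BY id").fetchall()
print(rows)  # [(1, 'old'), (2, 'new'), (3, 'new')]
```

Note this only works because every column is available in the batch; with partial rows, only an UPDATE would preserve the untouched columns.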

thx

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

#4Ivan Sergio Borgonovo
mail@webthatworks.it
In reply to: Andreas Kretschmer (#2)
[TLM] Re: batch insert/update

On Wed, 26 Dec 2007 20:48:27 +0100
Andreas Kretschmer <akretschmer@spamfence.net> wrote:

blackwater dev <blackwaterdev@gmail.com> schrieb:

I have some php code that will be pulling in a file via ftp.
This file will contain 20,000+ records that I then need to pump
into the postgres db. These records will represent a subset of
the records in a certain table. I basically need an efficient
way to pump these rows into the table, replacing matching rows
(based on id) already there and inserting ones that aren't. Short
of looping through the result and inserting or updating based on
the presence of the row, what is the best way to handle this?
This is something that will run nightly.

Insert your data into an extra table and use regular SQL to
insert/update the destination table. You can use COPY to load the
data into the extra table; this works very fast, but you need a
suitable file format for it.

What if you know in advance which rows should be inserted and you
have a batch of rows that should be updated?

Is it still the fastest approach to insert them all into a temp
table with COPY?

What about the rows that have to be updated, if you have all the
columns, not just the changed ones?
Is it faster to delete & insert, or to update?

The updates come with the same pk as the destination table.

thx

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

#5Tom Hart
tomhart@coopfed.org
In reply to: blackwater dev (#1)
Re: batch insert/update

Ivan Sergio Borgonovo wrote:

What if you know in advance which rows should be inserted and you
have a batch of rows that should be updated?

Is it still the fastest approach to insert them all into a temp
table with COPY?

What about the rows that have to be updated, if you have all the
columns, not just the changed ones?
Is it faster to delete & insert, or to update?

The updates come with the same pk as the destination table.

thx

We heard you the first time

--
Tom Hart
IT Specialist
Cooperative Federal
723 Westcott St.
Syracuse, NY 13210
(315) 471-1116 ext. 202
(315) 476-0567 (fax)

#6Ivan Sergio Borgonovo
mail@webthatworks.it
In reply to: Andreas Kretschmer (#2)
Re: batch insert/update

On Wed, 26 Dec 2007 20:48:27 +0100
Andreas Kretschmer <akretschmer@spamfence.net> wrote:

blackwater dev <blackwaterdev@gmail.com> schrieb:

I have some php code that will be pulling in a file via ftp.
This file will contain 20,000+ records that I then need to pump
into the postgres db. These records will represent a subset of
the records in a certain table. I basically need an efficient
way to pump these rows into the table, replacing matching rows
(based on id) already there and inserting ones that aren't. Short
of looping through the result and inserting or updating based on
the presence of the row, what is the best way to handle this?
This is something that will run nightly.

Insert your data into an extra table and use regular SQL to
insert/update the destination table. You can use COPY to load the
data into the extra table; this works very fast, but you need a
suitable file format for it.

What if you know in advance which rows should be inserted and you
have a batch of rows that should be updated?

Is it still the fastest approach to insert them all into a temp
table with COPY?

What about the rows that have to be updated, if you have all the
columns, not just the changed ones?
Is it faster to delete & insert, or to update?

The updates come with the same pk as the destination table.

thx

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

#7blackwater dev
blackwaterdev@gmail.com
In reply to: Ivan Sergio Borgonovo (#4)
Re: [TLM] Re: batch insert/update

I was also thinking about adding an 'is_new' column to the table, which I
would flag as 0, then do a basic COPY of all the new rows in with is_new set
to 1. I'd then run a delete statement to remove all the rows that are
duplicated and have a flag of 0, since the copy should leave me with two
rows for some ids: one with is_new of 1 and one with 0. I just don't know
if this would be the best approach.
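A rough sketch of this is_new idea (sqlite3 standing in for Postgres, invented names; note the table can't keep a primary key on id during the load, since the copy deliberately creates duplicate ids):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# No PRIMARY KEY on id here: the load intentionally duplicates ids,
# so a unique constraint would reject the incoming rows.
cur.execute("CREATE TABLE items (id INTEGER, val TEXT, is_new INTEGER)")
cur.executemany("INSERT INTO items VALUES (?, ?, 0)", [(1, "old"), (2, "old")])

# Bulk-copy the nightly batch straight into the table, flagged is_new = 1
# (in Postgres this would be the COPY step).
cur.executemany("INSERT INTO items VALUES (?, ?, 1)", [(2, "new"), (3, "new")])

# Drop the stale copy of every id that also arrived in the new batch,
# then clear the flag ready for the next run.
cur.execute("""
    DELETE FROM items
    WHERE is_new = 0
      AND id IN (SELECT id FROM items WHERE is_new = 1)
""")
cur.execute("UPDATE items SET is_new = 0")
con.commit()

rows = cur.execute("SELECT id, val FROM items ORDER BY id").fetchall()
print(rows)  # [(1, 'old'), (2, 'new'), (3, 'new')]
```

The main cost versus a separate staging table is that the destination table briefly holds duplicates and loses its uniqueness guarantee during the load, which matters if anything else reads it overnight.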

On Dec 26, 2007 3:13 PM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:

On Wed, 26 Dec 2007 20:48:27 +0100
Andreas Kretschmer <akretschmer@spamfence.net> wrote:

blackwater dev <blackwaterdev@gmail.com> schrieb:

I have some php code that will be pulling in a file via ftp.
This file will contain 20,000+ records that I then need to pump
into the postgres db. These records will represent a subset of
the records in a certain table. I basically need an efficient
way to pump these rows into the table, replacing matching rows
(based on id) already there and inserting ones that aren't. Short
of looping through the result and inserting or updating based on
the presence of the row, what is the best way to handle this?
This is something that will run nightly.

Insert your data into an extra table and use regular SQL to
insert/update the destination table. You can use COPY to load the
data into the extra table; this works very fast, but you need a
suitable file format for it.

What if you know in advance which rows should be inserted and you
have a batch of rows that should be updated?

Is it still the fastest approach to insert them all into a temp
table with COPY?

What about the rows that have to be updated, if you have all the
columns, not just the changed ones?
Is it faster to delete & insert, or to update?

The updates come with the same pk as the destination table.

thx

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match