Fastest way to import only ONE column into a table? (COPY doesn't work)
What is the fastest way to import the values of *only one* column into
an already existing table? Say the table looks like this:
id (primary key)
description
created_on
I want to import only a new column so the table looks like this:
id (primary key)
title
description
created_on
So I have a large-ish text file with the following format:
id1||title1
id2||title2
id3||title3
id4||title4
id5||title5
id6||title6
I try to import it in with
COPY table1 (id, title) from "/usr/tmp/titles.txt"
But this doesn't seem to be working, because COPY doesn't know how to
UPDATE values, it just inserts. Appreciate any tips, because it would
be nasty to have to do this with millions of UPDATE statements!
TIA
On Aug 15, 11:46 pm, phoenix.ki...@gmail.com ("Phoenix Kiula") wrote:
Appreciate any tips, because it would
be nasty to have to do this with millions of UPDATE statements!
- Create an interim table
- COPY the data into it
- Do an UPDATE ... FROM ...
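The three steps above could be sketched roughly as follows. This is only a sketch under assumptions not stated in the thread: the staging table name titles_import and its filler column are made up for illustration, id is assumed to be an integer, title is assumed to be text and not yet present in table1, and the titles are assumed to contain no | character. COPY accepts only a single-character delimiter, so the || separator is read here as two | delimiters with an empty field between them. The file name is a single-quoted string, and the file is read by the server process.

```sql
-- Add the new, initially NULL column (skip if it already exists).
ALTER TABLE table1 ADD COLUMN title text;

-- Interim table (hypothetical name). The "filler" column absorbs the empty
-- field between the two | characters of the || separator.
CREATE TEMP TABLE titles_import (
    id     integer,  -- assumed type; should match table1.id
    filler text,
    title  text
);

-- Server-side load; the path is the one from the original post.
COPY titles_import (id, filler, title)
FROM '/usr/tmp/titles.txt' WITH DELIMITER '|';

-- Give the planner row counts for the freshly loaded table.
ANALYZE titles_import;

-- One set-based UPDATE instead of millions of single-row statements.
UPDATE table1
SET title = titles_import.title
FROM titles_import
WHERE table1.id = titles_import.id;
```

The temporary table disappears at the end of the session, so nothing needs to be cleaned up afterwards.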
On 16/08/07, Rodrigo De León <rdeleonp@gmail.com> wrote:
- Create an interim table
- COPY the data into it
- Do an UPDATE ... FROM ...
Thanks! I thought about it and then gave up because the SQL stumped me.
Could you please suggest what the query should look like?
Based on this:
http://www.postgresql.org/docs/8.1/static/sql-update.html
I tried this:
UPDATE
t1 SET title = title FROM t2
WHERE
t1.id = t2.id;
But I am miles from what it needs to be! The example in the docs does
not refer to a column in the other table, it just increments a value
in the same table based on the join ("UPDATE employees SET sales_count
= sales_count + 1 FROM accounts....").
TIA!
On 16/08/07, Richard Broersma Jr <rabroersma@yahoo.com> wrote:
UPDATE T1
SET title = T2.title
FROM T2
WHERE T1.id = T2.id
AND T1.title IS NULL;
or
UPDATE T1
SET title = ( SELECT title
FROM T2
WHERE T2.id = T1.id )
WHERE T1.title IS NULL;
Thanks much Richard, but neither of those works. For me table t1 has
over 6 million rows, and table t2 has about 600,000. In both of the
queries above I suppose it is going through each and every row of
table t1 and taking its own sweet time. I've dropped all indexes on
t1, but the query has still been running for over 45 minutes as I
write! Any other suggestions?
Phoenix Kiula wrote:
Thanks much Richard, but neither of those works. For me table t1 has
over 6 million rows, and table t2 has about 600,000. In both of the
queries above I suppose it is going through each and every row of
table t1 and taking its own sweet time. I've dropped all indexes on
t1, but the query has still been running for over 45 minutes as I
write! Any other suggestions?
I'm not sure whether it would be faster, but you can try creating a
function that builds a new, empty table and then fills it with the
result of a SELECT query. Something like this:
CREATE OR REPLACE FUNCTION add_column () RETURNS INTEGER AS $$
DECLARE
    r RECORD;
BEGIN
    -- New table that receives every id from t1 with its title, if any.
    CREATE TABLE new_table (id integer, title varchar);
    -- The LEFT OUTER JOIN keeps t1 rows without a match in t2 (title is NULL).
    FOR r IN SELECT t1.id, t2.title FROM t1 LEFT OUTER JOIN t2 ON
    (t1.id = t2.id) LOOP
        INSERT INTO new_table VALUES (r.id, r.title);
    END LOOP;
    RETURN 0;
END;
$$ LANGUAGE plpgsql;
Try this function, and if its running time is acceptable, you'll need to
drop the existing table and rename the newly created one.
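The same result can probably be had without a row-by-row loop. A minimal sketch, assuming t2.id is unique and that t1 has exactly the columns listed in the first post; new_table and t1_old are illustrative names only. CREATE TABLE ... AS copies only the data, so the primary key, any other indexes, defaults and permissions would have to be recreated by hand.

```sql
-- Build the replacement table in one set-based statement, carrying over
-- the existing columns of t1 alongside the new title column.
CREATE TABLE new_table AS
SELECT t1.id, t2.title, t1.description, t1.created_on
FROM t1
LEFT OUTER JOIN t2 ON t1.id = t2.id;

-- Recreate the primary key (this fails if t2.id was not unique,
-- because the join would then have duplicated some t1 rows).
ALTER TABLE new_table ADD PRIMARY KEY (id);

-- Swap the tables in one transaction; drop t1_old only after checking
-- the result.
BEGIN;
ALTER TABLE t1 RENAME TO t1_old;
ALTER TABLE new_table RENAME TO t1;
COMMIT;
```

Keeping the old table under a new name until the result has been verified makes the swap easy to undo.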
--- Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
UPDATE T1
SET title = T2.title
FROM T2
WHERE T1.id = T2.id
AND T1.title IS NULL;
Thanks much Richard, but neither of those works. For me table t1 has
over 6 million rows, and table t2 has about 600,000. In both of the
queries above I suppose it is going through each and every row of
table t1 and taking its own sweet time. I've dropped all indexes on
t1, but the query has still been running for over 45 minutes as I
write! Any other suggestions?
You could post the EXPLAIN plan for the query above to verify your theory. It might also give
ideas about whether carefully placed indexes would help.
My guess is that you will need indexes on at least T1.id and T2.id to help reduce the join
time. Also, since you mention that you already attempted this query, I would check the
PostgreSQL logs to see whether you need to increase your checkpoint size temporarily, just for
this operation.
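To make the two suggestions above concrete, here is a hedged sketch. The index name t2_id_idx is made up for illustration; t1.id already has an index if its primary key is still in place. Plain EXPLAIN only shows the planned join strategy and does not run the UPDATE, whereas EXPLAIN ANALYZE would actually execute it.

```sql
-- Index the join column of the smaller table and refresh its statistics.
CREATE INDEX t2_id_idx ON t2 (id);
ANALYZE t2;

-- Show the plan without executing the statement.
EXPLAIN
UPDATE t1
SET title = t2.title
FROM t2
WHERE t1.id = t2.id
  AND t1.title IS NULL;
```

If the plan shows a sequential scan over all of t1 for each row of t2, that would support the theory that the join, rather than the write itself, is the slow part.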
Also, since this is hopefully a one-time update, you could temporarily turn off fsync during the
update. When the update is done, you should turn fsync back on.
My guess is that a join and update on 6 million records is just going to take a while. Hopefully
this isn't an operation that you will need to perform regularly.
Regards,
Richard Broersma Jr.