Performance problem with query

Started by Christian Rengstlover 19 years ago2 messagesgeneral
Jump to latest
#1Christian Rengstl
Christian.Rengstl@klinik.uni-regensburg.de

Good morning list,

the following query takes about 15 to 20 minutes for around 2 million lines in the file myfile.txt, but with 8 million lines it takes around 5 hours and i just don't understand why there is such a huge discrepancy in performance.

COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';

INSERT INTO public.master(pid,smid, val1, val2, chr)
SELECT pid, smid, val1, val12, CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

INSERT INTO public.values(smid, pos, chr)
SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2) as int2)
FROM public.temp_table;

I came up with this query, because i wanted to use the COPY command to load huge files into the db, but i don't want to load all the columns contained in the file in only one table but copy some of them into one table and some in a second table. As i found out with logging, the data is loaded into temp_table within 15 minutes, but to transfer it from the temp_table to public.master took around 5 hours and from temp_table to public.values took again only something like 10 minutes. Can it be that the cast takes up so much more time than when reading and transferring 2 million lines?

Thanks for any advice!

--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

#2Merlin Moncure
mmoncure@gmail.com
In reply to: Christian Rengstl (#1)
Re: Performance problem with query

On 7/13/06, Christian Rengstl
<Christian.Rengstl@klinik.uni-regensburg.de> wrote:

Good morning list,

the following query takes about 15 to 20 minutes for around 2 million lines in the file myfile.txt, but with 8 million lines it takes around 5 hours and i just don't understand why there is such a huge discrepancy in performance.

on the surface it doesn't make sense, can you post an explain analyze?

merlin