commiting in/between functions

Started by Magnus Månssonabout 23 years ago4 messagesgeneral
Jump to latest
#1Magnus Månsson
ganja@0x63.nu

Hey ppl,
I am trying to convert some huge tables in a quite huge database (about
50gigs). The convertion requires me to loop through 30 tables with 30
million rows each with a script an I chose plpgsql to do the job though I
have worked with oracles pl/sql before.

Making the script wasnt any big problem, the problem is the amount of memory
it wants to use, if I had a terabyte of memory I do not think that would be
enough, if I try to convert a 100th of the database at a time it uses all
the swap, 1G and the 512M memory.
Though all tables are statistics and static I know that noone changes in
them, so I would like to commit from time to time, but what I have found
commits are not allowed in a function, it has to be one transaction.

Is there any way to pass my problem besides making my plpgsql take arguments
and call the function about 500 times with different arguments? (that would
work if I could commit between to functions, one function to call and one to
do the work, but not even that is allowed)

And if this still isnt supported, is it planned in some how?

thanks in advance..

--
Magnus M�nsson

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Månsson (#1)
Re: commiting in/between functions

Magnus =?iso-8859-1?Q?M=E5nsson?= <ganja@0x63.nu> writes:

Making the script wasnt any big problem, the problem is the amount of memory
it wants to use, if I had a terabyte of memory I do not think that would be
enough, if I try to convert a 100th of the database at a time it uses all
the swap, 1G and the 512M memory.

It would help if you showed us the function and told us what version of
Postgres you're talking about. This could just be a garden-variety
memory leakage problem.

regards, tom lane

#3Magnus Månsson
ganja@0x63.nu
In reply to: Tom Lane (#2)
Re: commiting in/between functions

On Tue, Mar 11, 2003 at 10:34:08AM -0500, Tom Lane wrote:

Magnus =?iso-8859-1?Q?M=E5nsson?= <ganja@0x63.nu> writes:

Making the script wasnt any big problem, the problem is the amount of memory
it wants to use, if I had a terabyte of memory I do not think that would be
enough, if I try to convert a 100th of the database at a time it uses all
the swap, 1G and the 512M memory.

It would help if you showed us the function and told us what version of
Postgres you're talking about. This could just be a garden-variety
memory leakage problem.

regards, tom lane

I have included my function below, a short explaination:
The connectionstats_yyyy_mm_dd tables contains information about connections
on the network, one tables for every day. hosts_yyyy_mm_dd contains
information about the hosts that made the connections, also one table for
every day.
connectionstats_2003_01_01 contains 1.7 million rows and hosts_2003_01_01
98.000 rows.

My goal is to get all connectionstats in one table and all hosts in one
table, though the connectionstats are referring to hosts using the id I have
to recreate the ids and give the connection-row the right id. I also have to
check that I haven't already inserted the host in the hosts-table from
another day (ip and hostname together should be a unique key).

If I run this function it will in some minutes eat up all the avail memory
(512+1024M) and after that the postmaster-process will be killed by the
kernel.

My workaround that I have made now is a version where I have starthour as an
parameter and it runs it in 1 hour-intervals to us less memory (1 day takes
24 function-calls).

The version of postgresql running on the server is 7.1 and if needed I can
test it on data I have duplic�ted to another server with 7.3.

--create or replace function constats()
drop function constats()
create function constats()
returns int2 as '
declare
rec1_ RECORD;
rec2_ RECORD;
rec3_ RECORD;
client_ int;
server_ int;
begin
for rec1_ in select * from connectionstats_2003_01_01 loop
select into rec2_ address, hostname, is_local from hosts_2003_01_01 where id=rec1_.client;
if not found then
raise notice ''foo: %'', rec1_.client;
end if;
select into rec3_ id from new_hosts where address=rec2_.address and hostname=rec2_.hostname;
if not found then
--The host was not found in the new table, insert it.
insert into new_hosts (address, hostname, is_local) values (rec2_.address, rec2_.hostname, rec2_.is_local);
get diagnostics client_ = result_oid;
else
client_ := rec3_.id;
end if;
select into rec2_ address, hostname, is_local from hosts_2003_01_01 where id=rec1_.server;
if not found then
raise notice ''bar'';
end if;
select into rec3_ id from new_hosts where address=rec2_.address and hostname=rec2_.hostname;
if not found then
--The host was not found in the new table, insert it.
insert into new_hosts (address, hostname, is_local) values (rec2_.address, rec2_.hostname, rec2_.is_local);
get diagnostics server_ = result_oid;
else
server_ := rec3_.id;
end if;
insert into new_connectionstats (time, client, server, server_port, protocol, service, inbound, outbound) values
(rec1_.time, client_, server_, rec1_.server_port, rec1_.protocol, rec1_.service, rec1_.inbound, rec1_.outbound);
end loop;
return (0);
end;'
LANGUAGE 'plpgsql';

--
Magnus M�nsson

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Månsson (#3)
Re: commiting in/between functions

Magnus =?iso-8859-1?Q?M=E5nsson?= <ganja@0x63.nu> writes:

On Tue, Mar 11, 2003 at 10:34:08AM -0500, Tom Lane wrote:

It would help if you showed us the function and told us what version of
Postgres you're talking about. This could just be a garden-variety
memory leakage problem.

The version of postgresql running on the server is 7.1 and if needed I can
test it on data I have duplic�ted to another server with 7.3.

Please try 7.3 before we look any further; 7.1 was a long time back and
we've fixed a number of intra-function memory leaks since then.

If you still see it in 7.3, please send along the CREATE TABLE commands
for the tables referenced by the function, and I'll try to duplicate the
problem here.

regards, tom lane