DBI/AutoCommit/Postgres
Hello all,
I'm trying to speed up some insert statements. I have been tinkering
with the postmaster and DBI parameters I did some timings on my insert
and copy commands. Here is a sample insert query:
010430.18:31:18.199 [2604] query: insert into log values
(0,0,lower('blah.blah.mydomain.com'),lower('foo'),lower('bar'),lower('blah'),upper('Me'),
upper('Myself'), upper('I'), upper('INFO'), 'String Here', '20010430
16:00:00')
Pretty straightforward. Table log looks like:
Table "log"
Attribute | Type | Modifier
--------------+-----------+----------
site_id | bigint |
host_id | bigint |
fqdn | varchar() | not null
site | varchar() | not null
region | varchar() | not null
hostname | varchar() | not null
product | varchar() | not null
class | varchar() | not null
subclass | varchar() | not null
status | varchar() | not null
msg | varchar() | not null
remote_stamp | timestamp | not null
tstamp | timestamp | not null
Here are my non-scientific timings:
with AutoCommit on, using DBI across TCP/IP: 1.3 INSERTS/second
with AutoCommit off, DBI, TCP/IP, committing after every 100: 1.6
INSERTS/second
using psql -h host -U user -c "copy log from stdin" dbname < datafile
1.73 rows/second
using COPY LOG FROM 'filename' on the db machine itself: 1.73
rows/second
Another crucial piece of information is that each insert kicks off a
trigger. I did not write the trigger, and do not know how to write
triggers, but I think that might be the contributing factor to the
slowness. Here is the text file used to create the trigger:
drop function update_host_table();
drop trigger incoming_trigger on incoming ;
create function update_host_table()
returns opaque
as 'declare
myrec record;
new_hostid int4;
begin
new.timestamp := now() ;
/* check to see if we have see this machine before */
select * into myrec
from knownhosts k
where k.fqdn = new.fqdn and
k.hostname = new.hostname ;
/* -- if we have not found the machine name we are going to
insert a new record into the knownhosts table and set the init_contact
to now
*/
if not found
then
insert into knownhosts
values (new.fqdn,new.hostname,new.timestamp,new.timestamp) ;
else
update knownhosts
set last_contact = new.timestamp
where knownhosts.fqdn = new.fqdn ;
end if ;
/* now we are going to update the status table with the new record */
select * into myrec
from status s where
s.fqdn = new.fqdn and s.hostname=new.hostname
and s.class=new.class and s.sub_class=new.sub_class ;
if not found
then
insert into status
values (new.fqdn,new.hostname,new.class,
new.sub_class,new.level,new.msg,new.timestamp) ;
else
update status
set level = new.level,
timestamp = new.timestamp
where fqdn=new.fqdn and hostname=new.hostname and
class = new.class and sub_class = new.sub_class ;
end if;
return new;
end ;'
language 'plpgsql';
create trigger incoming_trigger
before insert on incoming
for each row
execute procedure update_host_table();
1.73 INSERTS/second seems awfully slow, but maybe I have set my
expectations too high. Now that you all can see the table and the kind
of data I am trying to put into it, do you have any suggestions? The
hardware specs of the database machine are: Pentium III 733Mhz, 512
megs memory, 7 gigs free on the partition. Seems like I should be
getting a lot more horsepower. I really need to speed this up somehow.
Does anyone see anything in the trigger or otherwise that would cause
this to be so slow?
Thank you very much,
Fran
Fran Fabrizio <ffabrizio@Exchange.WebMD.net> writes:
Another crucial piece of information is that each insert kicks off a
trigger. I did not write the trigger, and do not know how to write
triggers, but I think that might be the contributing factor to the
slowness.
Bingo. Your trigger is adding four complete queries (two selects and
two inserts or updates) for every row inserted.
Think about ways to avoid some or all of those. (For example, does the
knownhosts table really need a last_contact column, or could you obtain
that on-the-fly from a query over the incoming or status table when
needed?) Also look to see if these are being done with reasonable query
plans ... perhaps you need some indexes and/or vacuum operations ...
regards, tom lane