INSERT extremely slow with large data sets
Hi Everyone,
This is my first post here so please tell me to go somewhere else if this
is the wrong place to post questions like this.
I am using PostgreSQL 7.3.2 and have used earlier versions (7.1.x onwards),
and with all of them I noticed the same problem with INSERTs once there is a
large data set. Just so you can compare, here is the time it takes to insert
one row into a table when there are only a few rows present and when there
are thousands:
Rows Present     Start Time        Finish Time
------------------------------------------------------------
     100         1068790804.12     1068790804.12
    1000         1068790807.87     1068790807.87
    5000         1068790839.26     1068790839.27
   10000         1068790909.24     1068790909.26
   20000         1068791172.82     1068791172.85
   30000         1068791664.06     1068791664.09
   40000         1068792369.94     1068792370.0
   50000         1068793317.53     1068793317.6
   60000         1068794369.38     1068794369.47
As you can see, it takes an awfully long time just to have those values
inserted. To make the picture a bit clearer: this table holds a lot of
information, about 25 columns. There are also a few indexes that I created so
that selecting values from the table is faster, which works fine; selecting
anything takes under 5 seconds.
Any help would be greatly appreciated, even just pointing me in the right
direction on where to ask this question. By the way, I designed the database
this way because my application uses PGSQL heavily during execution, so there
was a huge need for fast SELECTs. Our experiments are getting larger and
larger every day, so fast inserts would be good as well.
Just to note, the times above are for the INSERTs only; nothing else is
included in them. The machine was otherwise idle, this was the only process
running the whole time, and the machine was an Intel(R) Pentium(R) 4 CPU at
2.40GHz.
Regards,
Slavisa
There is a pgsql-performance list, which was created for questions like yours.
Your problem was brought up many times before, so searching the archives is
an alternative.
Regards, Christoph
-----Original Message-----
From: Slavisa Garic [mailto:Slavisa.Garic@infotech.monash.edu.au]
Sent: Thursday, November 13, 2003 11:37 PM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] INSERT extremely slow with large data sets
[snip]
You should post the schema for the table in question when you ask a
question like this.
The behavior is not surprising in the least bit. Every database will
perform in this way, since you have mentioned that you have indexes on
the table.
The depth of the tree will be proportional to the log of the row count.
As the tree gets deeper, inserts will be more and more expensive.
If you have a giant pile of stuff to insert, consider the COPY command
or API if it is time critical.
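A minimal sketch of the COPY approach Dann mentions (the table name and the
file path are hypothetical, and the server-side form shown here normally
requires database superuser privileges):

-- Load many rows in one statement instead of issuing one INSERT per row.
-- '/tmp/mytable.dat' is a made-up server-side data file with one row per
-- line and columns separated by the default delimiter (tab).
COPY mytable FROM '/tmp/mytable.dat';

A single COPY avoids the per-statement overhead of thousands of individual
INSERTs, which is why it is the usual answer for bulk loads.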
On Fri, Nov 14, 2003 at 06:36:41PM +1100, Slavisa Garic wrote:
Rows Present     Start Time        Finish Time
------------------------------------------------------------
     100         1068790804.12     1068790804.12
    1000         1068790807.87     1068790807.87
    5000         1068790839.26     1068790839.27
   10000         1068790909.24     1068790909.26
   20000         1068791172.82     1068791172.85
   30000         1068791664.06     1068791664.09
   40000         1068792369.94     1068792370.0
   50000         1068793317.53     1068793317.6
   60000         1068794369.38     1068794369.47
[too slow]
Ok, so inserting 60000 rows seems to take 0.09 seconds, and inserting
5000 takes only 0.01. And your problem is exactly what?
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Coge la flor que hoy nace alegre, ufana. �Qui�n sabe si nacera otra ma�ana?"
On Fri, 14 Nov 2003, Alvaro Herrera wrote:
[snip]
Ok, so inserting 60000 rows seems to take 0.09 seconds, and inserting
5000 takes only 0.01. And your problem is exactly what?
You didn't understand the question. Inserting ONE ROW when there are already
5000 ROWS present takes 0.01 seconds. Inserting ONE ROW when there are
already 60000 ROWS present takes 0.09 seconds. In other words, it takes about
9 times longer to insert ONE ROW when there is a larger set of data already
in the database. As my experiments grow and more data gets inserted, this is
starting to take too long: inserting 70000 rows takes just over an hour,
while inserting 5000 takes about a minute and a half.
I don't know if this is the behaviour to be expected, which is why I posted
the question; that is my problem. I also wanted to know what can be done to
improve this, if it can be.
Regards,
Slavisa
Hi Dann,
Here is the schema. Could you also be more specific about the COPY command?
And does talking directly to the API speed things up? (I am new to databases
but I am learning quickly.)
-- NimrodEnfJob --
create table NimrodEnfJob(
exp_id INTEGER not null references NimrodEnfExperiment,
task_id INTEGER not null references NimrodTask,
pgroup_id INTEGER not null references NimrodParameterGroup,
agent_id INTEGER references NimrodAgent on delete set null,
jobname varchar(255) not null,
admin char(1) not null default 'F'
check (admin in ('F', 'T')),
taskname varchar(255) not null,
param_text TEXT not null,
open char(1) not null default 'F'
check (open in ('F', 'T')),
control varchar(8) not null default 'start'
check (control in ('nostart', 'start', 'stop')),
status varchar(16) not null default 'ready'
check (status in ('ready', 'executing', 'failed',
'done')),
cpulength real not null,
sleeptime real not null,
filesize real not null,
cputime real,
waittime real,
filetime real,
filebytes integer,
priority integer not null default 100,
create_time timestamp not null default CURRENT_TIMESTAMP,
start_time timestamp,
finish_time timestamp,
budget real not null default 0.0,
servername varchar(255),
error_info varchar(255) not null default '',
more_info TEXT not null default '',
primary key (exp_id, jobname),
foreign key (exp_id, taskname) references NimrodEnfTask
);
Also, these are the indexes on this table. I created them on the columns
that are most commonly accessed:
create unique index nej_idx
ON NimrodEnfJob (exp_id, pgroup_id);
create unique index nej_idx1
ON NimrodEnfJob (pgroup_id);
create index nej_idx2
ON NimrodEnfJob (status);
create unique index nej_idx3
ON NimrodEnfJob (status, pgroup_id);
create index nej_idx4
ON NimrodEnfJob (status, agent_id);
create index nej_idx5
ON NimrodEnfJob (agent_id);
I did notice that removing those indexes doesn't improve things by much.
Similar behaviour is observed; it just takes a bit less time to insert
(about 0.01 seconds less than usual at 60000 records).
Regards,
Slavisa
On Fri, 14 Nov 2003, Dann Corbit wrote:
[snip]
-----Original Message-----
From: Slavisa Garic [mailto:Slavisa.Garic@infotech.monash.edu.au]
Sent: Friday, November 14, 2003 5:12 PM
To: Dann Corbit
Cc: Slavisa Garic; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] INSERT extremely slow with large data sets

Hi Dann,
Here is the schema. Could you also be more specific about the COPY command?
http://www.postgresql.org/docs/7.3/static/sql-copy.html
And
http://techdocs.postgresql.org/techdocs/usingcopy.php
May be helpful.
Also, does talking directly to the API speed things up? (I am new to
databases but I am learning quickly.)
Not particularly. This is the copy command API:
http://www.postgresql.org/docs/7.3/static/libpq-copy.html
What the API can allow you to do (for instance) would be to never let
the data touch the ground. Instead of writing to a text file or even a
binary format copy input file, you use the API to take the incoming data
and insert it directly.
Like everything else, there is a dark side. Read the documents and they
will explain it. But if you need to move a giant pile of data into the
database as fast as possible, it is the copy command that is the most
efficient.
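To make that concrete, here is a rough sketch of the psql-scriptable form of
copy-in (the column values and the '|' delimiter are invented, and the
referenced experiment, task, and parameter-group rows are assumed to exist):

-- Feed rows directly over the connection instead of through a file on
-- disk; a line containing only \. ends the data stream.
COPY NimrodEnfJob (exp_id, task_id, pgroup_id, jobname, taskname,
                   param_text, cpulength, sleeptime, filesize)
    FROM stdin WITH DELIMITER '|';
1|1|1|job-00001|compute|x=1|10.0|0.0|1024
1|1|2|job-00002|compute|x=2|10.0|0.0|1024
\.

When a script like this is run through psql, the data lines go straight to
the server on the existing connection, so the rows never have to be written
to an intermediate file first.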
[snip]
I did notice that removing those indexes doesn't improve things by much.
Similar behaviour is observed; it just takes a bit less time to insert
(about 0.01 seconds less than usual at 60000 records).
I am quite surprised that removing the indexes does not have a large
impact on insert speed, especially, since you have 6 of them. Most of
the other costs that I can think of are fixed for inserts into a "bare
table". Perhaps someone with more intimate knowledge of the inner
working may know why inserts into a table without any index will trail
off in speed as the table grows.
[snip]
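One related bulk-load trick that nobody proposes explicitly in this thread,
but which is standard practice: drop the secondary indexes before a big
load, COPY the data in, and recreate the indexes afterwards, so each btree
is built once at the end instead of being updated for every row. A rough
sketch using the index definitions Slavisa posted (the data file is
hypothetical, and note that the unique indexes are not enforced while they
are dropped):

-- Drop the secondary indexes before the load ...
DROP INDEX nej_idx;
DROP INDEX nej_idx1;
DROP INDEX nej_idx2;
DROP INDEX nej_idx3;
DROP INDEX nej_idx4;
DROP INDEX nej_idx5;

-- ... load the data in bulk (hypothetical server-side file) ...
COPY NimrodEnfJob FROM '/tmp/nimrod_jobs.dat';

-- ... then rebuild the indexes once.
CREATE UNIQUE INDEX nej_idx ON NimrodEnfJob (exp_id, pgroup_id);
CREATE UNIQUE INDEX nej_idx1 ON NimrodEnfJob (pgroup_id);
CREATE INDEX nej_idx2 ON NimrodEnfJob (status);
CREATE UNIQUE INDEX nej_idx3 ON NimrodEnfJob (status, pgroup_id);
CREATE INDEX nej_idx4 ON NimrodEnfJob (status, agent_id);
CREATE INDEX nej_idx5 ON NimrodEnfJob (agent_id);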
Slavisa Garic <Slavisa.Garic@infotech.monash.edu.au> writes:
You didn't understand the question. Inserting ONE ROW when there are already
5000 ROWS present takes 0.01 seconds. Inserting ONE ROW when there are
already 60000 ROWS present takes 0.09 seconds.
The numbers you presented didn't really offer any strong grounds for
believing that there's an O(N) growth rate --- as far as I can see your
results are only barely out of the measurement-noise category. Can you
run some tests where the issue is not in the least significant digit
of the available numbers?
But assuming for the moment that you've got hold of a real problem...
The actual insertion of a row should be essentially a constant-time
operation, since we just stick it into the last page of the table
(or any page with sufficient free space). Insertion of index entries
for the row would have cost that depends on the number of existing
table entries, but for btree indexes I'd expect the cost to vary as
O(log2(N)) not O(N). I do not think you've presented enough evidence
to prove that you're seeing linear rather than log-N cost growth.
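A back-of-the-envelope check of that log-N expectation (this arithmetic is
not part of the original message):

log2(5000)  ~= 12.3
log2(60000) ~= 15.9

so btree maintenance should cost roughly 30% more at 60000 rows than at
5000, nowhere near the reported 9x. If the growth really is linear,
ordinary index insertion is unlikely to be the explanation.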
Most of the serious insertion-cost problems we've seen lately have
to do with the costs of checking foreign key references ... but those
normally vary with the size of the referenced table, not the table into
which you're inserting. Besides which you mentioned nothing about
foreign keys ... or any other schema details as far as I saw ...
regards, tom lane