INSERT performance

Started by Razvan Surdulescu · over 22 years ago · 5 messages · general
#1Razvan Surdulescu
surdules@yahoo.com

I'm running PostgreSQL 7.3.4 on Windows 2000/Cygwin (AMD Athlon XP
1800+, 512MB RAM).

I want to insert about 500 records into a table. The table is heavily
indexed (has about 10-12 indices created on it). The insert is performed
in a transaction block.

If I keep the indices on the table, the insert takes about 12 seconds.
If I drop the indices, do the insert, and then re-create the indices,
the entire process takes about 3-4 seconds.
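The drop / insert / re-create sequence described above can be sketched as the list of statements sent inside a single transaction. This is only an illustration: the table and index names (`data`, `idx_field1`) and the helper itself are hypothetical, not taken from the actual schema.

```python
# Sketch of the drop / insert / re-create approach: build the SQL
# statement sequence to send inside one transaction. All names here
# are illustrative -- the real schema is not shown in this thread.

def bulk_insert_statements(table, indexes, value_lists):
    """Return the statements: drop each index, insert every row,
    re-create the indexes, all between BEGIN and COMMIT."""
    stmts = ["BEGIN"]
    stmts += [f"DROP INDEX {name}" for name, _column in indexes]
    stmts += [f"INSERT INTO {table} VALUES ({values})"
              for values in value_lists]
    stmts += [f"CREATE INDEX {name} ON {table}({column})"
              for name, column in indexes]
    stmts.append("COMMIT")
    return stmts
```

One nice property of doing this in a single transaction is that PostgreSQL's DDL is transactional, so a failure anywhere rolls back the index drops along with the inserts.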

I am somewhat surprised at both of those performance numbers above -- I
would have expected both of them to be a lot smaller (about half of what
they are). 500 records is not that big of a number! In particular, the
transacted insert without indices should be very fast!

Are these numbers about in range to what I should expect from
PostgreSQL? I looked through the various performance FAQs and I cannot
use the COPY command since the data needs to be processed first, and
this can only be done in memory.

Any suggestions/pointers would be much appreciated.

Thanks,

Razvan.

#2Dann Corbit
DCorbit@connx.com
In reply to: Razvan Surdulescu (#1)
Re: INSERT performance

-----Original Message-----
From: Razvan Surdulescu [mailto:surdules@yahoo.com]
Sent: Wednesday, October 29, 2003 8:41 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] INSERT performance

Cygwin?
It will be faster under Linux.

500 records in 12 seconds is about 42/second. Hard to know if that is
good or bad. Is the machine under heavy use? Are the records extremely
long?

You can still use the copy command as an API. It will be faster than
the inserts, but there are (of course) caveats with its use.
http://developer.postgresql.org/docs/postgres/libpq-copy.html
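One way to reconcile "the data needs to be processed first, in memory" with COPY: process the rows first, then format the results as COPY text data (tab-separated columns with backslash escapes) and feed the whole buffer through a COPY FROM STDIN. A sketch of just the formatting step, in Python for concreteness (the driver call that actually sends the buffer is omitted, and the function name is illustrative):

```python
import io

def copy_text(rows):
    """Format already-processed rows as PostgreSQL COPY text data:
    one line per row, tab-separated columns, NULL written as \\N,
    and backslash/tab/newline/carriage-return escaped."""
    buf = io.StringIO()
    for row in rows:
        fields = []
        for value in row:
            if value is None:
                fields.append(r"\N")
            else:
                s = str(value)
                s = (s.replace("\\", "\\\\")   # escape backslash first
                      .replace("\t", "\\t")
                      .replace("\n", "\\n")
                      .replace("\r", "\\r"))
                fields.append(s)
        buf.write("\t".join(fields) + "\n")
    return buf.getvalue()
```

The resulting buffer can then be handed to the COPY API in one shot, which avoids the per-statement overhead of 500 separate INSERTs.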

Provide the SQL that defines the table and its indexes.

Provide the cardinality of the table.

Provide the average machine load during the insert operation.

Probably, you can get better answers if you provide more information.

What kind of disk drives do you have on your machine? (More
importantly, where does PostgreSQL data reside?)

#3Razvan Surdulescu
surdules@yahoo.com
In reply to: Dann Corbit (#2)
Re: INSERT performance

DCorbit@connx.com ("Dann Corbit") wrote in message news:<D90A5A6C612A39408103E6ECDD77B829408C2E@voyager.corporate.connx.com>...

Cygwin? It will be faster under Linux.

I agree, I would also expect it to be faster under Linux. Would you
expect that the performance under Linux should be an order of
magnitude faster? I know that Cygwin I/O goes through something akin
to a translation layer, but I don't know how much of a performance hit
I should expect from that.

500 records in 12 seconds is about 42/second. Hard to know if that is
good or bad. Is the machine under heavy use? Are the records extremely
long?

No one else is using the machine, and the records are short (at most
around 1k each).

You can still use the copy command as an API. It will be faster than
the inserts, but there are (of course) caveats with its use.
http://developer.postgresql.org/docs/postgres/libpq-copy.html

Thanks, I'll look into it.

Provide the SQL that defines the table and its indexes.

Here is the approximate SQL statement (I cannot provide the original
statement for intellectual property reasons):

CREATE TABLE data (
    id char(32) NOT NULL, -- auto-generated from PHP using md5(...)

    -- the fieldN fields below have different lengths
    field1 varchar(5),
    field2 varchar(50),
    field3 varchar(10),
    ...
    field20 varchar(255),

    PRIMARY KEY (id)
);

CREATE INDEX idx_field1 ON data(field1);
CREATE INDEX idx_field2 ON data(field2);
...
CREATE INDEX idx_field20 ON data(field20);

Provide the cardinality of the table.

The table is empty (cardinality = 0).

Provide the average machine load during the insert operation.

I will have to measure this and get back to you with it in a future
post. Here is what I can say from memory right now:

* If I do the INSERT with the indexes enabled, the HDD thrashes
visibly (audibly?) and the operation takes about 12 seconds.

* If I drop the indices, do the INSERT, and re-create the indices, the
HDD no longer thrashes, and the operation takes about 3-4 seconds.

Probably, you can get better answers if you provide more information.

I agree -- I hope the information above is more illuminating.

What kind of disk drives do you have on your machine? (More
importantly, where does PostgreSQL data reside?)

I have an ATA-100 7200 RPM HDD. The PostgreSQL data resides on this
drive (which also contains the Cygwin installation).

Thanks again,

Razvan.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Razvan Surdulescu (#3)
Re: INSERT performance

surdules@yahoo.com (Razvan Surdulescu) writes:

DCorbit@connx.com ("Dann Corbit") wrote in message news:<D90A5A6C612A39408103E6ECDD77B829408C2E@voyager.corporate.connx.com>...

500 records in 12 seconds is about 42/second. Hard to know if that is
good or bad. Is the machine under heavy use? Are the records extremely
long?

Provide the SQL that defines the table and its indexes.

CREATE INDEX idx_field1 ON data(field1);
CREATE INDEX idx_field2 ON data(field2);
...
CREATE INDEX idx_field20 ON data(field20);

Uh, do you actually need an index on every column?

It's obvious that the index insertions are where the time is going.
You're getting close to 900 index insertions per second, which is not
bad at all on consumer-grade ATA disk hardware, if you ask me. It might
help to raise shared_buffers, if you didn't already do that ... but the
real solution here is to only index the columns that you are actually
intending to search on.
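The "close to 900" figure can be double-checked from the numbers earlier in the thread: 500 rows, each updating the primary-key index plus roughly 20 secondary indexes, in about 12 seconds.

```python
# Arithmetic behind the index-insertion rate estimate.
rows = 500
indexes = 21                       # 20 secondary indexes + the primary key
seconds = 12

index_insertions = rows * indexes  # 10,500 individual index entries
rate = index_insertions / seconds  # 875 per second, i.e. "close to 900"
print(index_insertions, round(rate))
```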

regards, tom lane

#5Razvan Surdulescu
surdules@yahoo.com
In reply to: Tom Lane (#4)
Re: INSERT performance
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

surdules@yahoo.com (Razvan Surdulescu) writes:

CREATE INDEX idx_field1 ON data(field1);
CREATE INDEX idx_field2 ON data(field2);
...
CREATE INDEX idx_field20 ON data(field20);

Uh, do you actually need an index on every column?

Yes -- I need to search on all these columns, the size of the table is
expected to get very large, and each column contains very heterogeneous
data (so indexing makes sense).

It's obvious that the index insertions are where the time is going.
You're getting close to 900 index insertions per second, which is not
bad at all on consumer-grade ATA disk hardware, if you ask me.

That's helpful to know -- thanks.

It might help to raise shared_buffers, if you didn't already do that
... but the real solution here is to only index the columns that you
are actually intending to search on.

I will look into the shared_buffers setting -- I have
not done anything with it thus far.
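For reference, in this era of PostgreSQL the setting lives in postgresql.conf and is counted in 8 kB pages; the value below is illustrative only (roughly 64 MB on a 512 MB machine), not a recommendation from this thread:

```
# postgresql.conf -- illustrative value; shared_buffers is in 8 kB pages
shared_buffers = 8192        # about 64 MB; requires a server restart
```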

Razvan.
