slow inserts and updates on large tables

Started by Jim Mercer | about 27 years ago | 9 messages | general
#1Jim Mercer
jim@reptiles.org

i'm developing a database for one of my clients.

unfortunately, the combination of PostgreSQL 6.4 and FreeBSD 3.0 don't
seem to be giving me the performance i would expect.

i have a table which has some 60 fields, largely fixed length strings
(of a variety of sizes) and indexed on a 20 character string field.

currently i have some 5,866,667 records in the table and when i use
a c program to insert records, i get a max of 102 inserts per second.

updates average about 40/second, sometimes much slower.

i intend to have a sliding group of about 10-15 million records in this
table (deleting some each night, while constantly adding new records).

if 100 inserts/second and 40 updates/second is all i can realistically
expect, then i will have to investigate alternate databases (likely
commercial ones like Progress on SCO, yech!).

if anyone has any pointers as to why this is so slow, lemme know.

the system is:
FreeBSD 3.0-RELEASE
CPU: Pentium II (299.17-MHz 686-class CPU)
avail memory = 62836736 (61364K bytes)
ahc0: <Adaptec aic7880 Ultra SCSI adapter> rev 0x00 int a irq 15 on pci1.4.0
da1: <IBM DDRS-34560W S71D> Fixed Direct Access SCSI2 device
da1: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled
da2: <IBM DDRS-34560W S71D> Fixed Direct Access SCSI2 device
da2: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled
da3: <QUANTUM VIKING 4.5 WSE 880P> Fixed Direct Access SCSI2 device
da3: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled
da4: <QUANTUM VIKING 4.5 WSE 880P> Fixed Direct Access SCSI2 device
da4: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled

postgres v6.4 (not 6.4.2 yet)

the databases live on a 16 gig striped (not RAID) array across the
above 4 drives.

the operating system lives on a separate drive.

we are going to upgrade to 512M RAM soon, but i don't think that RAM is the
issue on this beast.

bonnie and iozone show that the array is kicking ass as far as throughput goes.

--
[ Jim Mercer Reptilian Research jim@reptiles.org +1 416 410-5633 ]
[ The telephone, for those of you who have forgotten, was a commonly used ]
[ communications technology in the days before electronic mail. ]
[ They're still easy to find in most large cities. -- Nathaniel Borenstein ]

#2Vadim Mikheev
vadim@krs.ru
In reply to: Jim Mercer (#1)
Re: [GENERAL] slow inserts and updates on large tables

Jim Mercer wrote:

i'm developing a database for one of my clients.

unfortunately, the combination of PostgreSQL 6.4 and FreeBSD 3.0 don't
seem to be giving me the performance i would expect.

i have a table which has some 60 fields, largely fixed length strings
(of a variety of sizes) and indexed on a 20 character string field.

currently i have some 5,866,667 records in the table and when i use
a c program to insert records, i get a max of 102 inserts per second.

updates average about 40/second, sometimes much slower.

i intend to have a sliding group of about 10-15 million records in this
table (deleting some each night, while constantly adding new records).

if 100 inserts/second and 40 updates/second is all i can realistically
expect, then i will have to investigate alternate databases (likely
commercial ones like Progress on SCO, yech!).

if anyone has any pointers as to why this is so slow, lemme know.

What's your -B setting (the postmaster's shared-buffers option)?
Do you use BEGIN/END to run _many_ updates/inserts in
_single_ transaction ?

Vadim

#3Jim Mercer
jim@reptiles.org
In reply to: Vadim Mikheev (#2)
Re: [GENERAL] slow inserts and updates on large tables

if 100 inserts/second and 40 updates/second is all i can realistically
expect, then i will have to investigate alternate databases (likely
commercial ones like Progress on SCO, yech!).

if anyone has any pointers as to why this is so slow, lemme know.

What's your -B setting (the postmaster's shared-buffers option)?

it is whatever the default is.

Do you use BEGIN/END to run _many_ updates/inserts in
_single_ transaction ?

i tried that and it didn't seem to make much difference.

--
[ Jim Mercer Reptilian Research jim@reptiles.org +1 416 410-5633 ]
[ The telephone, for those of you who have forgotten, was a commonly used ]
[ communications technology in the days before electronic mail. ]
[ They're still easy to find in most large cities. -- Nathaniel Borenstein ]

#4Herouth Maoz
herouth@oumail.openu.ac.il
In reply to: Jim Mercer (#1)
Re: [GENERAL] slow inserts and updates on large tables

At 5:02 +0200 on 17/2/99, Jim Mercer wrote:

if anyone has any pointers as to why this is so slow, lemme know.

Have you checked the usual stuff:

1) Each insert and update creates its own transaction. That is, an
insert is in fact:

BEGIN TRANSACTION;
INSERT...;
COMMIT;

So, to make things faster you should BEGIN TRANSACTION explicitly
before all the inserts and COMMIT after them. Or separate into
manageable bulks if you run into a memory problem.

2) Indexing a table means faster SELECTs at the expense of slower INSERTs
and UPDATEs. There is no magic. The reasoning is that normally you
query the data a lot more than you change it.

Thus, it is preferable, before doing bulk inserts, to drop the
indices and recreate them afterwards. This is true when you are not
expecting the database to be queried at the same time the inserts are
made.

As for updates, it's trickier, because you actually use the index
for the WHERE part of the update. If speed is of the essence, I would
probably try the following:

SELECT * INTO TABLE temp_table FROM your_table WHERE update_condition;
DELETE FROM your_table WHERE update_condition;
DROP INDEX...;
INSERT INTO your_table SELECT ... FROM temp_table; -- apply the updated values in the SELECT list
CREATE INDEX...;

3) Back to the issue of INSERTS - copies are faster. If you can transform
the data into tab-delimited format as required by COPY, you save a lot
of time on parsing, planning etc.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma

#5Jim Mercer
jim@reptiles.org
In reply to: Herouth Maoz (#4)
Re: [GENERAL] slow inserts and updates on large tables

if anyone has any pointers as to why this is so slow, lemme know.

Have you checked the usual stuff:

1) Each insert and update creates its own transaction. That is, an
insert is in fact:

BEGIN TRANSACTION;
INSERT...;
COMMIT;

So, to make things faster you should BEGIN TRANSACTION explicitly
before all the inserts and COMMIT after them. Or separate into
manageable bulks if you run into a memory problem.

i have a script which times various methods:

------
data is 6047 records
insertama time was 56 seconds
psqlinsert time was 51 seconds
traninsert time was 51 seconds
psqlselect time was 0
------

the script drops and recreates the testtable before each test.

insertama reads the data from a text file and uses PQexec to insert each record.
it does not use begin/commit.

psqlinsert is "psql -f insert.sql" where insert.sql is a file of insert
commands (created by printf'ing the PQexec args from insertama).

traninsert is the same as psqlinsert, except it has "begin transaction" and
"commit transaction" at the start and end.

psqlselect is "psql -c 'select count(callid) from testtable'".

so, according to my tests, the begin/commit stuff doesn't do me any good.

2) Indexing a table means faster SELECTs at the expense of slower INSERTs
and UPDATEs. There is no magic. The reasoning is that normally you
query the data a lot more than you change it.

Thus, it is preferable, before doing bulk inserts, to drop the
indices and recreate them afterwards. This is true when you are not
expecting the database to be queried at the same time the inserts are
made.

the production database currently has over 5 million records, and is expected
to run at 10-15 million in full production.

the inserts happen in batches, every 10 minutes or so.

recreating the index takes some 30 minutes as it is.

however, if i use the above bench test script, but with the index removed,
i get:

------
data is 6047 records
insertama time was 53 seconds
psqlinsert time was 47 seconds
traninsert time was 48 seconds
psqlselect time was 0 seconds
------

as you can see, it didn't make much difference.

As for updates, it's trickier, because you actually use the index
for the WHERE part of the update. If speed is of the essence, I would
probably try the following:

SELECT * INTO TABLE temp_table FROM your_table WHERE update_condition;
DELETE FROM your_table WHERE update_condition;
DROP INDEX...;
INSERT INTO your_table SELECT ... FROM temp_table; -- apply the updated values in the SELECT list
CREATE INDEX...;

as stated above, the size of the table means i can't do this on every update.

3) Back to the issue of INSERTS - copies are faster. If you can transform
the data into tab-delimited format as required by COPY, you save a lot
of time on parsing, planning etc.

this sorta defeats the purpose of putting the data in an SQL database. 8^)

--
[ Jim Mercer Reptilian Research jim@reptiles.org +1 416 410-5633 ]
[ The telephone, for those of you who have forgotten, was a commonly used ]
[ communications technology in the days before electronic mail. ]
[ They're still easy to find in most large cities. -- Nathaniel Borenstein ]

#6Jim Mercer
jim@reptiles.org
In reply to: Jim Mercer (#5)
Re: [GENERAL] slow inserts and updates on large tables

How about -o -F ?

the startup script is:

[ -x /usr/local/pgsql/bin/postmaster ] && {
su -l pgsql -c 'exec /usr/local/pgsql/bin/postmaster -i -S -o -F -d 3 \
-D/usr/local/pgsql/data' >> /usr/local/pgsql/errlog 2>&1
echo -n ' pgsql'
}

--
[ Jim Mercer Reptilian Research jim@reptiles.org +1 416 410-5633 ]
[ The telephone, for those of you who have forgotten, was a commonly used ]
[ communications technology in the days before electronic mail. ]
[ They're still easy to find in most large cities. -- Nathaniel Borenstein ]

#7Herouth Maoz
herouth@oumail.openu.ac.il
In reply to: Jim Mercer (#5)
Re: [GENERAL] slow inserts and updates on large tables

At 16:10 +0200 on 17/2/99, Jim Mercer wrote:

3) Back to the issue of INSERTS - copies are faster. If you can transform
the data into tab-delimited format as required by COPY, you save a lot
of time on parsing, planning etc.

this sorta defeats the purpose of putting the data in an SQL database. 8^)

You probably didn't understand me. If you convert it to tab delimited text
and then use COPY table_name FROM filename/stdin instead of INSERT, it will
be much faster, because you don't have to do the parsing and planning on
each line, but only on the whole copy.

I didn't tell you to use the data directly from those text files...

In fact, it doesn't require using text files at all, just reformatting your
program. If until now it did

- - - -

while (data_still_coming) {

sprintf( command, "INSERT INTO table1 VALUES( %s, %s, %s )",
item1, item2, item3 );

PQexec( con, command );
}

- - - -

Now you have to do instead

- - - -

PQexec( con, "COPY table1 FROM stdin" );

while (data_still_coming) {

sprintf( line, "%s\t%s\t%s\n" , item1, item2, item3 );
PQputline( con, line );

}

PQputline( con, "\\.\n" );   /* end-of-data marker is a line containing \. */
PQendcopy(con);

- - - -

It's simply a different formatting to your data insertion.

Herouth

#8Jim Mercer
jim@reptiles.org
In reply to: Herouth Maoz (#7)
Re: [GENERAL] slow inserts and updates on large tables

At 16:10 +0200 on 17/2/99, Jim Mercer wrote:
You probably didn't understand me. If you convert it to tab delimited text
and then use COPY table_name FROM filename/stdin instead of INSERT, it will
be much faster, because you don't have to do the parsing and planning on
each line, but only on the whole copy.

I didn't tell you to use the data directly from those text files...

PQexec( con, "COPY table1 FROM stdin" );

while (data_still_coming) {

sprintf( line, "%s\t%s\t%s\n" , item1, item2, item3 );
PQputline( con, line );

}

PQputline( con, "\\.\n" );
PQendcopy(con);

i will test this with my insertama program, but i see some problems with this.

firstly, it assumes that all of your applications programs are updated each
time you modify the structure of the table.

i am using "insert into testtable (fieldname1, fieldname2) values ('1', '2');"

this allows the applications to remain unchanged if new fields are added.

also, it doesn't seem to address the issue of updates, which suffer from worse
performance than inserts.

what is a realistic number of inserts per second under postgresql, with or
without an index?

--
[ Jim Mercer Reptilian Research jim@reptiles.org +1 416 410-5633 ]
[ The telephone, for those of you who have forgotten, was a commonly used ]
[ communications technology in the days before electronic mail. ]
[ They're still easy to find in most large cities. -- Nathaniel Borenstein ]

#9Herouth Maoz
herouth@oumail.openu.ac.il
In reply to: Jim Mercer (#8)
Re: [GENERAL] slow inserts and updates on large tables

At 16:47 +0200 on 17/2/99, Jim Mercer wrote:

i will test this with my insertama program, but i see some problems with
this.

firstly, it assumes that all of your applications programs are updated each
time you modify the structure of the table.

This is true. That's the trade-off you make for COPY's fast transfers.

also, it doesn't seem to address the issue of updates, which suffer
from worse performance than inserts.

Did you try my trick, but without removing the indices? Move the data over
to a temporary table, delete from the original, insert updated data back?
(Assuming you don't have a separate update for each line).

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma