DELETE performance issues

Started by Schwenker, Stephenover 19 years ago5 messagesgeneral
Jump to latest
#1Schwenker, Stephen
SSchwenker@thestar.ca

Hello,

I'm having a performance issue with postgresql. I'm using it with JBoss
BPM api and when I use the api to delete a process instance from the
database, it seems like the delete requests are taking quite a long time
to delete. I've also noticed that when I execute the delete, the cpu
load on the postgresql process on the server ramps up to 100%. Can
someone tell me what logging I can turn on so I can figure out what is
causing the high load and slow deletes? I can then use that info to tune
the runtime parameters.

Thank you,

Steve.

#2Michael Fuhr
mike@fuhr.org
In reply to: Schwenker, Stephen (#1)
Re: DELETE performance issues

On Fri, Oct 27, 2006 at 03:40:51PM -0400, Schwenker, Stephen wrote:

I'm having a performance issue with postgresql. I'm using it with JBoss
BPM api and when I use the api to delete a process instance from the
database, it seems like the delete requests are taking quite a long time
to delete.

How long is "quite a long time"? How long are you expecting the
delete to take? Do you get better performance if you don't use the
API, like when executing the same statement in psql? How many rows
are you deleting? How many rows are in the table? Is the delete
able to use an index? Do you have unindexed foreign key references
to the table you're deleting from? What does EXPLAIN ANALYZE show?
Are you regularly vacuuming and analyzing the table? What version
of PostgreSQL are you using? What OS and hardware?

Can someone tell me what logging I can turn on so I can figure out
what is causing the high load and slow deletes? I can then use that
info to tune the runtime parameters.

What non-default settings are you currently using?

--
Michael Fuhr

#3Schwenker, Stephen
SSchwenker@thestar.ca
In reply to: Michael Fuhr (#2)
Re: DELETE performance issues

Hey,

The api uses hibernate to do all the work. Here is a bit of the log of
the sql queries it does. The JBPM_bytelock queries seem to execute
quite fast but when the api starts switching back and forth from
JBPM_BYTEARRAY and JBPM_VARIABLEINSTANCE it seems to run slow. Like
probably about 1 second for each set of 2 deletes. Which means that
they are taking approximately 500ms to execute each one which seems slow
because the application can create records faster than then can be
deleted. I'm working on a way to bulk purge the database but I haven't
got there yet so I'm just trying to tune the current process.

Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_MODULEINSTANCE where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_TOKENVARIABLEMAP where ID_=?
Hibernate: delete from JBPM_MODULEINSTANCE where ID_=?
Hibernate: delete from JBPM_PROCESSINSTANCE where ID_=? and VERSION_=?
Hibernate: delete from JBPM_TOKEN where ID_=? and VERSION_=?

Now, to answer your questions,

All the tables have primary keys and when I explain the queries, they
use the primary key index. I have increased the shared memory which
only seems to help when the same table is accessed synchronously but
this switching back and forth between tables seems to be slow. There
are approximately 600,000 rows in the tables. And these tables do not
have any references point to them but they do reference other tables.
The autovacuum is on.

We're using Postgresql 8.1.4 and the server is a dual 3.2ghz cpu linux
system.

I'm guessing that the problem lies in loading the indexes off the disk
each time the table switches. Is there a way to view what Postgres is
doing in the back end. Ie, loading index from disk or using it in
memory.... Is there a way to see what memory is being used and what's
in it?

Here is my current config changes...
shared_buffers = 24000 # 192MB
work_mem = 50240
max_fsm_pages = 40000
max_fsm_relations = 2000

bgwriter_lru_percent = 50.0
bgwriter_lru_maxpages = 50

wal_buffers = 64

I hope you can all help.

Thank you,

Steve.

-----Original Message-----
From: Michael Fuhr [mailto:mike@fuhr.org]
Sent: Sunday, October 29, 2006 11:01 AM
To: Schwenker, Stephen
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] DELETE performance issues

On Fri, Oct 27, 2006 at 03:40:51PM -0400, Schwenker, Stephen wrote:

I'm having a performance issue with postgresql. I'm using it with
JBoss BPM api and when I use the api to delete a process instance from

the database, it seems like the delete requests are taking quite a
long time to delete.

How long is "quite a long time"? How long are you expecting the delete
to take? Do you get better performance if you don't use the API, like
when executing the same statement in psql? How many rows are you
deleting? How many rows are in the table? Is the delete able to use an
index? Do you have unindexed foreign key references to the table you're
deleting from? What does EXPLAIN ANALYZE show?
Are you regularly vacuuming and analyzing the table? What version of
PostgreSQL are you using? What OS and hardware?

Can someone tell me what logging I can turn on so I can figure out
what is causing the high load and slow deletes? I can then use that
info to tune the runtime parameters.

What non-default settings are you currently using?

--
Michael Fuhr

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Schwenker, Stephen (#3)
Re: DELETE performance issues

"Schwenker, Stephen" <SSchwenker@thestar.ca> writes:

All the tables have primary keys and when I explain the queries, they
use the primary key index. I have increased the shared memory which
only seems to help when the same table is accessed synchronously but
this switching back and forth between tables seems to be slow. There
are approximately 600,000 rows in the tables. And these tables do not
have any references point to them but they do reference other tables.

Are you absolutely sure about that last? Unindexed foreign key
references are the explanation nine times out of ten when someone
complains about deletes being slow.

I'm guessing that the problem lies in loading the indexes off the disk
each time the table switches.

For tables with only 600K rows the indexes should be plenty small enough
to fit in memory on any modern machine. I'm thinking maybe the problem
is enormous index bloat ... does REINDEXing the tables improve matters?

max_fsm_pages = 40000

That seems mighty small :-(. Do a database-wide VACUUM VERBOSE and see
what it says about FSM usage in the last few lines of output.

regards, tom lane

#5Reece Hart
reece@harts.net
In reply to: Tom Lane (#4)
Re: DELETE performance issues

On Thu, 2006-11-02 at 11:27 -0500, Tom Lane wrote:

Are you absolutely sure about that last? Unindexed foreign key
references are the explanation nine times out of ten when someone
complains about deletes being slow.

This was certainly the major cause when I had slow deletes. (The other
problem was chaining of cascading deletes.) In order to help correct
such problems, I wrote some views to identify unindexed, cascading
foreign keys. An example:

rkh@csb-dev=> select * from pgutils.foreign_keys_missing_indexes ;
fk_namespace | fk_relation | fk_column | fk_indexed | pk_namespace | pk_relation | pk_column | pk_indexed | ud ...
--------------+--------------+---------------------+------------+--------------+-------------+-------------+------------+--- ...
gong | node | alias_id | f | gong | alias | alias_id | t | cn ...
taxonomy | node | division_id | f | taxonomy | division | division_id | t | cc ...
gong | alias | go_id | f | gong | node | go_id | t | cc ...

etc...

ud is an abbreviation for update and delete constraint type (cascade,
set null, restrict, etc).

In this view, "indexed" means that the column is the first or only
column in some index, i.e., pg_index.indkey[0] = pg_attribute.attnum. I
suppose that one might want to distinguish the indexing cases more
precisely as unindexed, sole-column index, first col of mult-col index,
second col of multi-col index, etc, but I didn't do so. The views were
originally written for 7.4 and I don't know what's appropriate for
current multicolumn index behavior.

The code is in http://harts.net/reece/pgutils/ .

-Reece

--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0