explicit cursor vs. for loop in pl/pgsql

Started by David Parker · almost 21 years ago · 3 messages · general
#1David Parker
dparker@tazznetworks.com

I need to process a large table a few "chunks" at a time, committing in
between chunks so that another process can pick up and start processing
the data.

I am using a pl/pgsql procedure with a "FOR rec in Select * from tab
order by...." statement. The chunksize is passed in to the procedure,
and in the FOR loop I iterate until I reach chunksize. The procedure
then returns and the calling code issues the commit, etc.
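A minimal sketch of that pattern (table, column, and function names here are placeholders, not from the original post):

```plpgsql
-- Process up to chunksize rows per call; the caller commits between calls.
CREATE OR REPLACE FUNCTION process_chunk(chunksize integer)
RETURNS integer AS $$
DECLARE
    rec tab%ROWTYPE;
    n   integer := 0;
BEGIN
    FOR rec IN SELECT * FROM tab ORDER BY id LOOP
        -- ... process rec here ...
        DELETE FROM tab WHERE id = rec.id;   -- remove what we've handled
        n := n + 1;
        EXIT WHEN n >= chunksize;            -- stop after one chunk
    END LOOP;
    RETURN n;   -- 0 means nothing left to do
END;
$$ LANGUAGE plpgsql;
```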

I know from the documentation that the FOR implicitly opens a cursor,
but I'm wondering if there would be any performance advantages to
explicitly declaring a cursor and moving through it with FETCH commands?

I have to use the ORDER BY, so I imagine I'm taking the hit of
processing all the records in the table anyway, regardless of how many I
ultimately fetch. The nature of the data is that chunksize doesn't
necessarily match up one-for-one with rows, so I can't use it as a LIMIT
value.

The table in question gets inserted pretty heavily, and my procedure
processes rows then deletes those it has processed. My main concern is
to keep the processing fairly smooth, i.e., not have it choke on a
select when the table gets huge.

Any suggestions appreciated!

- DAP
----------------------------------------------------------------------
David Parker    Tazz Networks    (401) 709-5130

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Parker (#1)
Re: explicit cursor vs. for loop in pl/pgsql

"David Parker" <dparker@tazznetworks.com> writes:

> I know from the documentation that the FOR implicitly opens a cursor,
> but I'm wondering if there would be any performance advantages to
> explicitly declaring a cursor and moving through it with FETCH commands?

AFAICS it'd be exactly the same. Might as well stick with the simpler
notation.
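For comparison, the explicit-cursor spelling of the same loop looks like this (names again placeholders); it does the same work as the FOR form, just more verbosely:

```plpgsql
DECLARE
    cur CURSOR FOR SELECT * FROM tab ORDER BY id;
    rec tab%ROWTYPE;
BEGIN
    OPEN cur;
    LOOP
        FETCH cur INTO rec;
        EXIT WHEN NOT FOUND;   -- no more rows
        -- ... same per-row work as in the FOR version ...
    END LOOP;
    CLOSE cur;
END;
```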

> I have to use the ORDER BY, so I imagine I'm taking the hit of
> processing all the records in the table anyway, regardless of how many I
> ultimately fetch.

Not if the ORDER BY can be implemented using an index. Perhaps what you
need is to make sure that an indexscan gets used.

> The nature of the data is that chunksize doesn't necessarily match up
> one-for-one with rows, so I can't use it as a LIMIT value.

Can you set an upper bound on how many rows you need? If you can put a
LIMIT into the select, it'll encourage the planner to use an indexscan,
even if you break out of the loop before the limit is reached.
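Illustrating the two suggestions above (index so the ORDER BY needs no sort, LIMIT to encourage the indexscan); the table and index names are hypothetical:

```sql
-- An index on the ORDER BY column lets the planner walk rows in
-- order instead of sorting the whole table first.
CREATE INDEX tab_id_idx ON tab (id);
```

```plpgsql
-- Inside the procedure: an upper bound in the LIMIT nudges the planner
-- toward the indexscan; exiting the loop early is still fine.
FOR rec IN SELECT * FROM tab ORDER BY id LIMIT upper_bound LOOP
    -- ... per-row work ...
    EXIT WHEN n >= chunksize;
END LOOP;
```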

regards, tom lane

#3David Parker
dparker@tazznetworks.com
In reply to: Tom Lane (#2)
Re: explicit cursor vs. for loop in pl/pgsql

Thanks for the info. I've got an index, so I guess it's as good as it
gets!

The data is actually copied over from the Slony transaction log table,
and there's no way to know how many statements (= rows) there might be
for any given transaction, so assigning an arbitrary limit seems too
risky, and I don't want to take the hit of a SELECT count().

Thanks again.

- DAP
