Explicit cursor vs. FOR loop in PL/pgSQL
I need to process a large table a few "chunks" at a time, committing in
between chunks so that another process can pick up and start processing
the data.
I am using a PL/pgSQL procedure with a "FOR rec IN SELECT * FROM tab
ORDER BY ...." statement. The chunksize is passed in to the procedure,
and in the FOR loop I iterate until I reach chunksize. The procedure
then returns and the calling code issues the commit, etc.
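Roughly, the shape of the thing (a minimal sketch only; the table name
"tab", key column "id", and the one-row-per-chunk counting here are
stand-ins for the real logic):

    CREATE OR REPLACE FUNCTION process_chunk(chunksize integer)
    RETURNS integer AS $$
    DECLARE
        rec   RECORD;
        nrows integer := 0;
    BEGIN
        FOR rec IN SELECT * FROM tab ORDER BY id LOOP
            -- ... process rec here ...
            DELETE FROM tab WHERE id = rec.id;
            nrows := nrows + 1;
            -- real code checks a chunk boundary, not a raw row count
            EXIT WHEN nrows >= chunksize;
        END LOOP;
        RETURN nrows;   -- caller commits, then calls again
    END;
    $$ LANGUAGE plpgsql;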
I know from the documentation that the FOR implicitly opens a cursor,
but I'm wondering if there would be any performance advantages to
explicitly declaring a cursor and moving through it with FETCH commands?
I have to use the ORDER BY, so I imagine I'm taking the hit of
processing all the records in the table anyway, regardless of how many I
ultimately fetch. The nature of the data is that chunksize doesn't
necessarily match up one-for-one with rows, so I can't use it as a LIMIT
value.
The table in question gets inserted into pretty heavily, and my
procedure processes rows and then deletes those it has processed. My
main concern is
to keep the processing fairly smooth, i.e., not have it choke on a
select when the table gets huge.
Any suggestions appreciated!
- DAP
----------------------------------------------------------------------
David Parker | Tazz Networks | (401) 709-5130
"David Parker" <dparker@tazznetworks.com> writes:
> I know from the documentation that the FOR implicitly opens a cursor,
> but I'm wondering if there would be any performance advantages to
> explicitly declaring a cursor and moving through it with FETCH commands?
AFAICS it'd be exactly the same. Might as well stick with the simpler
notation.
> I have to use the ORDER BY, so I imagine I'm taking the hit of
> processing all the records in the table anyway, regardless of how many I
> ultimately fetch.
Not if the ORDER BY can be implemented using an index. Perhaps what you
need is to make sure that an indexscan gets used.
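For instance (made-up names, just a quick sanity check, not your actual
schema):

    CREATE INDEX tab_id_idx ON tab (id);

    EXPLAIN SELECT * FROM tab ORDER BY id;
    -- want:  Index Scan using tab_id_idx on tab
    -- avoid: Sort over a Seq Scan on tab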
> The nature of the data is that chunksize doesn't necessarily match up
> one-for-one with rows, so I can't use it as a LIMIT value.
Can you set an upper bound on how many rows you need? If you can put a
LIMIT into the select, it'll encourage the planner to use an indexscan,
even if you break out of the loop before the limit is reached.
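Something like this in the loop header, as a sketch (10000 is an
arbitrary stand-in for whatever upper bound applies in your case):

    FOR rec IN SELECT * FROM tab ORDER BY id LIMIT 10000 LOOP
        -- ... process rec ...
        nrows := nrows + 1;
        EXIT WHEN nrows >= chunksize;   -- can fire well before the LIMIT
    END LOOP;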
regards, tom lane
Thanks for the info. I've got an index, so I guess it's as good as it
gets!
The data is actually copied over from the Slony transaction log table,
and there's no way to know how many statements (= rows) there might be
for any given transaction, so assigning an arbitrary limit seems too
risky, and I don't want to take the hit of a SELECT COUNT.
Thanks again.
- DAP