Read table rows in chunks

Started by Sushrut Shivaswamy over 1 year ago · 3 messages
#1 Sushrut Shivaswamy
sushrut.shivaswamy@gmail.com

Hey,

I"m trying to read the rows of a table in chunks to process them in a
background worker.
I want to ensure that each row is processed only once.

I was thinking of using the `SELECT * ... OFFSET {offset_size} LIMIT
{limit_size}` functionality for this but I'm running into issues.
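
Roughly this pattern (the table name and sizes are just placeholders):

    -- e.g., the third chunk of 100 rows
    SELECT * FROM my_table OFFSET 200 LIMIT 100;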

Some approaches I had in mind that aren't working out:
- Try to use the transaction id to query rows created since the last
processed transaction id
- It seems Postgres does not expose row transaction ids so this
approach is not feasible
- Rely on OFFSET / LIMIT combination to query the next chunk of data
- SELECT * does not guarantee ordering of rows so it's possible older
rows repeat or newer rows are missed in a chunk

Can you please suggest an alternative for periodically reading rows from a
table in chunks while processing each row exactly once?

Thanks,
Sushrut

#2 Kashif Zeeshan
kashi.zeeshan@gmail.com
In reply to: Sushrut Shivaswamy (#1)
Re: Read table rows in chunks

Hi

You can also use the following approaches (rough sketches below).

1. Cursors
2. FETCH with an OFFSET clause
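
For example, with a cursor (a minimal sketch; the table and ordering
column are placeholders):

    BEGIN;
    -- server-side cursor over a stable ordering
    DECLARE row_chunk CURSOR FOR
        SELECT * FROM my_table ORDER BY id;
    -- fetch the next 1000 rows; repeat until nothing is returned
    FETCH FORWARD 1000 FROM row_chunk;
    CLOSE row_chunk;
    COMMIT;

And the SQL-standard paging form of LIMIT/OFFSET:

    SELECT * FROM my_table
    ORDER BY id
    OFFSET 200 ROWS FETCH FIRST 100 ROWS ONLY;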

Regards
Kashif Zeeshan
Bitnine Global


#3 David G. Johnston
david.g.johnston@gmail.com
In reply to: Sushrut Shivaswamy (#1)
Re: Read table rows in chunks

On Sat, Apr 27, 2024 at 12:47 AM Sushrut Shivaswamy <
sushrut.shivaswamy@gmail.com> wrote:

I"m trying to read the rows of a table in chunks to process them in a
background worker.

This list really isn't the place for this kind of discussion. You are
doing application-level stuff, not working on patches for core. General
discussions and questions like this should be directed to the -general
mailing list.

> I want to ensure that each row is processed only once.

Is failure during processing possible?

> I was thinking of using the `SELECT * ... OFFSET {offset_size} LIMIT
> {limit_size}` functionality for this but I'm running into issues.

FOR UPDATE and SKIP LOCKED clauses usually come into play for this use
case.
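
Something along these lines (a sketch; the table, id, and processed
columns are made up for illustration):

    BEGIN;
    -- each worker claims a batch; SKIP LOCKED keeps workers from
    -- blocking on rows another worker has already locked
    SELECT * FROM my_table
    WHERE processed = false
    ORDER BY id
    LIMIT 100
    FOR UPDATE SKIP LOCKED;
    -- ... process the rows, mark them processed ...
    COMMIT;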

> Can you please suggest an alternative for periodically reading rows from a
> table in chunks while processing each row exactly once?

I think you are fooling yourself if you think you can do this without
writing back to the row the fact that it has been processed. At that
point, ensuring that you only retrieve and process unprocessed rows is
trivial: just don't select rows with a status of processed.
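
For example (again a sketch with made-up names), claiming and marking a
batch in a single statement:

    UPDATE my_table
    SET processed = true
    WHERE id IN (
        SELECT id FROM my_table
        WHERE processed = false
        ORDER BY id
        LIMIT 100
        FOR UPDATE SKIP LOCKED
    )
    RETURNING *;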

If adding a column to the data is not possible, record processed row
identifiers into a separate table and inspect that.
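
Something like (names illustrative):

    CREATE TABLE processed_rows (row_id bigint PRIMARY KEY);

    -- fetch only rows with no record of having been processed
    SELECT t.*
    FROM my_table t
    WHERE NOT EXISTS (
        SELECT 1 FROM processed_rows p WHERE p.row_id = t.id
    );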

David J.