Batching up data into groups of n rows

Started by Andy Chambers over 14 years ago (3 messages, general)
#1Andy Chambers
achambers@mcna.net

Hi,

I have a need to write a query that batches up rows of a table into
groups of n records. I feel like this should be possible using the
existing window functions but I can't figure out how to do it from the
examples.

So I have some table, let's say

create table addresses (
  line_1 text,
  line_2 text,
  city text,
  state text,
  zip text
);

...and I want to select the data from that table, adding a "group_id"
column, and a "record_id" column. The "group_id" would start at 1,
and increment by 1 every 100 rows, and the "record_id" would increment
by 1 every row, but restart at 1 for each new group_id.

Thanks,
Andy

#2Marti Raudsepp
marti@juffo.org
In reply to: Andy Chambers (#1)
Re: Batching up data into groups of n rows

On Mon, Sep 26, 2011 at 18:59, Andy Chambers <achambers@mcna.net> wrote:

...and I want to select the data from that table, adding a "group_id"
column, and a "record_id" column.  The "group_id" would start at 1,
and increment by 1 every 100 rows, and the "record_id" would increment
by 1 every row, but restart at 1 for each new group_id.

I can't see why you would want this, but regardless...

Since you didn't list a primary key for the table, I'm using "ctid" as
a substitute. But that's a PostgreSQL-specific hack, normally you
would use the real primary key instead.

update addresses
   set group_id = (nr / 100) + 1,
       record_id = (nr % 100) + 1
  from (select ctid, row_number() over () - 1 as nr
          from addresses) as subq
 where subq.ctid = addresses.ctid;

This isn't going to be fast for a large table as it will effectively
make 3 passes over the table, but it will get the job done.

If you want the numbers to be ordered by certain columns, you'd use
row_number() OVER (ORDER BY col1, col2)

Regards,
Marti

#3Andy Chambers
achambers@mcna.net
In reply to: Marti Raudsepp (#2)
Re: Batching up data into groups of n rows

On Mon, Sep 26, 2011 at 4:22 PM, Marti Raudsepp <marti@juffo.org> wrote:

On Mon, Sep 26, 2011 at 18:59, Andy Chambers <achambers@mcna.net> wrote:

...and I want to select the data from that table, adding a "group_id"
column, and a "record_id" column.  The "group_id" would start at 1,
and increment by 1 every 100 rows, and the "record_id" would increment
by 1 every row, but restart at 1 for each new group_id.

I can't see why you would want this, but regardless...

The addresses need to be sent to a third-party web service for
canonicalization. The web service accepts batches of <100 addresses.
I was wondering how I'd get Postgres to generate the XML for sending
100 addresses at a time to this web service.
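
For what it's worth, one way this might be sketched with PostgreSQL's
SQL/XML functions (xmlelement, xmlagg, xmlforest) is below. The element
names "batch" and "address", and the ordering by zip, are illustrative
assumptions only, not the web service's actual format:

```sql
-- Sketch: one XML document per batch of 100 addresses.
-- Element names and ordering are assumptions for illustration.
select group_id,
       xmlelement(name batch,
                  xmlagg(xmlelement(name address,
                                    xmlforest(line_1, line_2, city, state, zip))))
  from (select a.*,
               (row_number() over (order by zip) - 1) / 100 + 1 as group_id
          from addresses a) sub
 group by group_id
 order by group_id;
```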

Since you didn't list a primary key for the table, I'm using "ctid" as
a substitute. But that's a PostgreSQL-specific hack, normally you
would use the real primary key instead.

update addresses set group_id=(nr/100)+1, record_id=(nr%100)+1 from
 (select ctid, row_number() over () -1 as nr from addresses) as subq
 where subq.ctid=addresses.ctid;

Cool! I don't need to actually store these ids in the database; they
just need to be generated on the fly and forgotten, but I think I can
adapt the example to do what I need.
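
For the archives, one way the example might be adapted to compute both
ids on the fly, without adding columns to the table (a sketch; swap in
an ORDER BY inside the OVER () if a stable ordering matters):

```sql
-- Sketch: derive group_id and record_id in a plain SELECT,
-- no schema change and nothing stored.
select (nr / 100) + 1 as group_id,
       (nr % 100) + 1 as record_id,
       line_1, line_2, city, state, zip
  from (select *,
               row_number() over () - 1 as nr
          from addresses) sub;
```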

Thanks,
Andy