Batching up data into groups of n rows
Hi,
I have a need to write a query that batches up rows of a table into
groups of n records. I feel like this should be possible using the
existing window functions but I can't figure out how to do it from the
examples.
So I have some table, let's say
create table addresses (
line_1 text,
line_2 text,
city text,
state text,
zip text);
...and I want to select the data from that table, adding a "group_id"
column, and a "record_id" column. The "group_id" would start at 1,
and increment by 1 every 100 rows, and the "record_id" would increment
by 1 every row, but restart at 1 for each new group_id.
Thanks,
Andy
On Mon, Sep 26, 2011 at 18:59, Andy Chambers <achambers@mcna.net> wrote:
...and I want to select the data from that table, adding a "group_id"
column, and a "record_id" column. The "group_id" would start at 1,
and increment by 1 every 100 rows, and the "record_id" would increment
by 1 every row, but restart at 1 for each new group_id.
I can't see why you would want this, but regardless...
Since you didn't list a primary key for the table, I'm using "ctid" as
a substitute. But that's a PostgreSQL-specific hack, normally you
would use the real primary key instead.
update addresses set group_id=(nr/100)+1, record_id=(nr%100)+1 from
(select ctid, row_number() over () -1 as nr from addresses) as subq
where subq.ctid=addresses.ctid;
This isn't going to be fast for a large table as it will effectively
make 3 passes over the table, but it will get the job done.
If you want the numbers to be ordered by certain columns, you'd use
row_number() OVER (ORDER BY col1, col2)
Regards,
Marti
On Mon, Sep 26, 2011 at 4:22 PM, Marti Raudsepp <marti@juffo.org> wrote:
On Mon, Sep 26, 2011 at 18:59, Andy Chambers <achambers@mcna.net> wrote:
...and I want to select the data from that table, adding a "group_id"
column, and a "record_id" column. The "group_id" would start at 1,
and increment by 1 every 100 rows, and the "record_id" would increment
by 1 every row, but restart at 1 for each new group_id.I can't see why you would want this, but regardless...
The addresses need to be sent to a 3rd party web-service for
canonicalization. The web service accepts batches of <100 addresses.
I was wondering how I'd get Postgres to generate the XML for sending
100 addresses at a time to this web service.
Since you didn't list a primary key for the table, I'm using "ctid" as
a substitute. But that's a PostgreSQL-specific hack, normally you
would use the real primary key instead.update addresses set group_id=(nr/100)+1, record_id=(nr%100)+1 from
(select ctid, row_number() over () -1 as nr from addresses) as subq
where subq.ctid=addresses.ctid;
Cool! I don't need to actually store these ids in the database, they
just need to be generated on the fly and forgotten but I think I can
adapt the example to do what I need.
Thanks,
Andy