Updating a specific number of rows in pl/pgsql

Started by Peter Headlandover 16 years ago11 messagesgeneral
Jump to latest
#1Peter Headland
pheadland@actuate.com

I'm working on a queuing application. As part of this I'm trying to
write a pl/pgsql function that updates a specific number of rows in the
most efficient way possible. Multiple queues are contained within a
single table.

I can get the rows I want to update like this:

SELECT *
FROM queue
WHERE id = p_queue_id
ORDER BY rank
LIMIT p_number_of_items;

Of course, there may not be p_number_of_items available in the queue.

I want to update all the rows in the cursor in the same way:

UPDATE queue SET assigned = TRUE;

The "obvious" solution is to get a cursor on the query and attempt to
MOVE through that cursor in a loop, using the row count from the SELECT
to tell me when I am done. I can then use UPDATE ... WHERE CURRENT OF
... to do the updates. This seems cumbersome and inefficient to me. Is
there a better way?

Ideally, I'd like to do something like:

UPDATE (SELECT ... ) ...

--
Peter Headland
Architect
Actuate Corporation

#2D'Arcy J.M. Cain
darcy@druid.net
In reply to: Peter Headland (#1)
Re: Updating a specific number of rows in pl/pgsql

On Mon, 10 Aug 2009 17:52:36 -0700
"Peter Headland" <pheadland@actuate.com> wrote:

I can get the rows I want to update like this:

SELECT *
FROM queue
WHERE id = p_queue_id
ORDER BY rank
LIMIT p_number_of_items;

Of course, there may not be p_number_of_items available in the queue.

I want to update all the rows in the cursor in the same way:

UPDATE queue SET assigned = TRUE;

Assuming that there is a unique identifier on queue, let's call it
queue_id, you should be able to do something like this:

UPDATE queue SET assigned = TRUE
WHERE queue_id IN (SELECT queue_id
FROM queue
WHERE id = p_queue_id
ORDER BY rank
LIMIT p_number_of_items);

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.
#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: D'Arcy J.M. Cain (#2)
Re: Updating a specific number of rows in pl/pgsql

2009/8/11 D'Arcy J.M. Cain <darcy@druid.net>:

On Mon, 10 Aug 2009 17:52:36 -0700
"Peter Headland" <pheadland@actuate.com> wrote:

I can get the rows I want to update like this:

  SELECT *
   FROM queue
   WHERE id = p_queue_id
   ORDER BY rank
   LIMIT p_number_of_items;

Of course, there may not be p_number_of_items available in the queue.

I want to update all the rows in the cursor in the same way:

  UPDATE queue SET assigned = TRUE;

Assuming that there is a unique identifier on queue, let's call it
queue_id, you should be able to do something like this:

 UPDATE queue SET assigned = TRUE
 WHERE queue_id IN (SELECT queue_id
   FROM queue
   WHERE id = p_queue_id
   ORDER BY rank
   LIMIT p_number_of_items);

there are one fast trick
http://www.postgres.cz/index.php/PostgreSQL_SQL_Tricks#Fast_first_n_rows_removing

p.s. replace DELETE by UPDATE
regards
Pavel Stehule

Show quoted text
--
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

#4Peter Headland
pheadland@actuate.com
In reply to: D'Arcy J.M. Cain (#2)
Re: Updating a specific number of rows in pl/pgsql

Assuming that there is a unique identifier on queue

Alas, there is not. The PK is made up of 4 columns.

--
Peter Headland
Architect
Actuate Corporation

-----Original Message-----
From: D'Arcy J.M. Cain [mailto:darcy@druid.net]
Sent: Tuesday, August 11, 2009 03:25
To: Peter Headland
Cc: pgsql-sql@postgresql.org
Subject: Re: [SQL] Updating a specific number of rows in pl/pgsql

On Mon, 10 Aug 2009 17:52:36 -0700
"Peter Headland" <pheadland@actuate.com> wrote:

I can get the rows I want to update like this:

SELECT *
FROM queue
WHERE id = p_queue_id
ORDER BY rank
LIMIT p_number_of_items;

Of course, there may not be p_number_of_items available in the queue.

I want to update all the rows in the cursor in the same way:

UPDATE queue SET assigned = TRUE;

Assuming that there is a unique identifier on queue, let's call it
queue_id, you should be able to do something like this:

UPDATE queue SET assigned = TRUE
WHERE queue_id IN (SELECT queue_id
FROM queue
WHERE id = p_queue_id
ORDER BY rank
LIMIT p_number_of_items);

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.
#5Peter Headland
pheadland@actuate.com
In reply to: Pavel Stehule (#3)
Re: Updating a specific number of rows in pl/pgsql

there are one fast trick
http://www.postgres.cz/index.php/PostgreSQL_SQL_Tricks#Fast_first_n_rows_removing

Thanks - that's a very useful page!

Unfortunately, there is no single column that provides a unique id, and I am reluctant to add one (for example, using a sequence and a new index) for performance reasons.

Given that additional constraint, is my original plan using a loop to iterate over a cursor reasonable? I don't anticipate p_number_of_items being more than 20.

--
Peter Headland
Architect
Actuate Corporation

-----Original Message-----
From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
Sent: Tuesday, August 11, 2009 03:55
To: D'Arcy J.M. Cain
Cc: Peter Headland; pgsql-sql@postgresql.org
Subject: Re: [SQL] Updating a specific number of rows in pl/pgsql

2009/8/11 D'Arcy J.M. Cain <darcy@druid.net>:

On Mon, 10 Aug 2009 17:52:36 -0700
"Peter Headland" <pheadland@actuate.com> wrote:

I can get the rows I want to update like this:

  SELECT *
   FROM queue
   WHERE id = p_queue_id
   ORDER BY rank
   LIMIT p_number_of_items;

Of course, there may not be p_number_of_items available in the queue.

I want to update all the rows in the cursor in the same way:

  UPDATE queue SET assigned = TRUE;

Assuming that there is a unique identifier on queue, let's call it
queue_id, you should be able to do something like this:

 UPDATE queue SET assigned = TRUE
 WHERE queue_id IN (SELECT queue_id
   FROM queue
   WHERE id = p_queue_id
   ORDER BY rank
   LIMIT p_number_of_items);

there are one fast trick
http://www.postgres.cz/index.php/PostgreSQL_SQL_Tricks#Fast_first_n_rows_removing

p.s. replace DELETE by UPDATE
regards
Pavel Stehule

Show quoted text
--
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Peter Headland (#5)
Re: Updating a specific number of rows in pl/pgsql

2009/8/11 Peter Headland <pheadland@actuate.com>:

there are one fast trick
http://www.postgres.cz/index.php/PostgreSQL_SQL_Tricks#Fast_first_n_rows_removing

Thanks - that's a very useful page!

Unfortunately, there is no single column that provides a unique id, and I am reluctant to add one (for example, using a sequence and a new index) for performance reasons.

ctid is unique system column in every table.

postgres=# create table x(a int);
CREATE TABLE
Time: 655,062 ms
postgres=# insert into x values(10);
INSERT 0 1
Time: 49,237 ms
postgres=# insert into x values(10);
INSERT 0 1
Time: 1,740 ms
postgres=# select ctid, a from x;
ctid | a
-------+----
(0,1) | 10
(0,2) | 10
(2 rows)

Given that additional constraint, is my original plan using a loop to iterate over a cursor reasonable? I don't anticipate p_number_of_items being more than 20.

why not? for small number of iteration is loop over cursor good solution.

Pavel Stehule

Show quoted text

--
Peter Headland
Architect
Actuate Corporation

-----Original Message-----
From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
Sent: Tuesday, August 11, 2009 03:55
To: D'Arcy J.M. Cain
Cc: Peter Headland; pgsql-sql@postgresql.org
Subject: Re: [SQL] Updating a specific number of rows in pl/pgsql

2009/8/11 D'Arcy J.M. Cain <darcy@druid.net>:

On Mon, 10 Aug 2009 17:52:36 -0700
"Peter Headland" <pheadland@actuate.com> wrote:

I can get the rows I want to update like this:

  SELECT *
   FROM queue
   WHERE id = p_queue_id
   ORDER BY rank
   LIMIT p_number_of_items;

Of course, there may not be p_number_of_items available in the queue.

I want to update all the rows in the cursor in the same way:

  UPDATE queue SET assigned = TRUE;

Assuming that there is a unique identifier on queue, let's call it
queue_id, you should be able to do something like this:

 UPDATE queue SET assigned = TRUE
 WHERE queue_id IN (SELECT queue_id
   FROM queue
   WHERE id = p_queue_id
   ORDER BY rank
   LIMIT p_number_of_items);

there are one fast trick
http://www.postgres.cz/index.php/PostgreSQL_SQL_Tricks#Fast_first_n_rows_removing

p.s. replace DELETE by UPDATE
regards
Pavel Stehule

--
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

#7Peter Headland
pheadland@actuate.com
In reply to: Peter Headland (#1)
Re: Updating a specific number of rows in pl/pgsql

Unfortunately, there is no single column that provides a unique id.

Correction - I did not understand what ctid was, but now I do, so I will try your tip.

--
Peter Headland
Architect
Actuate Corporation

-----Original Message-----
From: Peter Headland
Sent: Tuesday, August 11, 2009 10:05
To: 'Pavel Stehule'; D'Arcy J.M. Cain
Cc: pgsql-sql@postgresql.org
Subject: RE: [SQL] Updating a specific number of rows in pl/pgsql

there are one fast trick
http://www.postgres.cz/index.php/PostgreSQL_SQL_Tricks#Fast_first_n_rows_removing

Thanks - that's a very useful page!

Unfortunately, there is no single column that provides a unique id, and I am reluctant to add one (for example, using a sequence and a new index) for performance reasons.

Given that additional constraint, is my original plan using a loop to iterate over a cursor reasonable? I don't anticipate p_number_of_items being more than 20.

--
Peter Headland
Architect
Actuate Corporation

-----Original Message-----
From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
Sent: Tuesday, August 11, 2009 03:55
To: D'Arcy J.M. Cain
Cc: Peter Headland; pgsql-sql@postgresql.org
Subject: Re: [SQL] Updating a specific number of rows in pl/pgsql

2009/8/11 D'Arcy J.M. Cain <darcy@druid.net>:

On Mon, 10 Aug 2009 17:52:36 -0700
"Peter Headland" <pheadland@actuate.com> wrote:

I can get the rows I want to update like this:

  SELECT *
   FROM queue
   WHERE id = p_queue_id
   ORDER BY rank
   LIMIT p_number_of_items;

Of course, there may not be p_number_of_items available in the queue.

I want to update all the rows in the cursor in the same way:

  UPDATE queue SET assigned = TRUE;

Assuming that there is a unique identifier on queue, let's call it
queue_id, you should be able to do something like this:

 UPDATE queue SET assigned = TRUE
 WHERE queue_id IN (SELECT queue_id
   FROM queue
   WHERE id = p_queue_id
   ORDER BY rank
   LIMIT p_number_of_items);

there are one fast trick
http://www.postgres.cz/index.php/PostgreSQL_SQL_Tricks#Fast_first_n_rows_removing

p.s. replace DELETE by UPDATE
regards
Pavel Stehule

Show quoted text
--
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

#8Peter Headland
pheadland@actuate.com
In reply to: Pavel Stehule (#6)
Race hazard deleting using CTID?

I believe that the following statement was originally suggested by Tom Lane; I got it from Pavel Stehule's PostgreSQL Tips page.
My question is, does this code contain a race hazard, because the list from the SELECT might get changed by another session before the DELETE uses it?

delete from del where ctid = any(array(select ctid from del limit 10))

If so, am I correct to think that adding FOR UPDATE to create the version below would eliminate the hazard?

delete from del where ctid = any(array(select ctid from del limit 10 for update))

--
Peter Headland
Architect
Actuate Corporation

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Headland (#8)
Re: Race hazard deleting using CTID?

"Peter Headland" <pheadland@actuate.com> writes:

My question is, does this code contain a race hazard, because the list from the SELECT might get changed by another session before the DELETE uses it?

delete from del where ctid = any(array(select ctid from del limit 10))

Well, the CTID of a row you can see can't be changed by another
transaction while your transaction is still live. However, if someone
else does modify/delete one of those rows concurrently, it will fail the
outer WHERE check and thus silently not be deleted. Net effect is that
you might delete fewer than 10 rows. Not sure if you'd consider that a
race hazard or not.

If so, am I correct to think that adding FOR UPDATE to create the version below would eliminate the hazard?

delete from del where ctid = any(array(select ctid from del limit 10 for update))

If you'd bothered to try that before asking the list, you'd know the
system won't take it --- FOR UPDATE is only supported at top level.
You could probably do something equivalent using a plpgsql loop, or
pulling the CTIDs back to the client side.

regards, tom lane

#10Peter Headland
pheadland@actuate.com
In reply to: Tom Lane (#9)
Re: Race hazard deleting using CTID?

Well, the CTID of a row you can see can't be changed by another
transaction while your transaction is still live. However, if
someone else does modify/delete one of those rows concurrently,
it will fail the outer WHERE check and thus silently not be
deleted. Net effect is that you might delete fewer than 10
rows. Not sure if you'd consider that a race hazard or not.

Thanks for the explanation. My actual use-case is an UPDATE, and
updating fewer rows than I wanted to is definitely not the desired
behavior.

It's a race hazard condition by definition if the behavior of a system
is non-deterministic due to timing of another process.

This is all in a pl/pgsql function.

I was trying to avoid a cursor and loop solution in the hope of better
performance. Sounds like I need to separate the SELECT into a top-level
statement and get the list of ctids into an array variable that I then
use in the DELETE (actually an UPDATE in my case, but that's beside the
point). I know this question would be better on new users, but since we
already have all the context established, I'd be grateful to know how to
get the ctids into an array variable in pl/pgsql.

--
Peter Headland
Architect
Actuate Corporation

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, August 11, 2009 12:01
To: Peter Headland
Cc: pgsql-general@postgresql.org; Pavel Stehule
Subject: Re: Race hazard deleting using CTID?

"Peter Headland" <pheadland@actuate.com> writes:

My question is, does this code contain a race hazard, because the list

from the SELECT might get changed by another session before the DELETE
uses it?

delete from del where ctid = any(array(select ctid from del limit

10))

Well, the CTID of a row you can see can't be changed by another
transaction while your transaction is still live. However, if someone
else does modify/delete one of those rows concurrently, it will fail the
outer WHERE check and thus silently not be deleted. Net effect is that
you might delete fewer than 10 rows. Not sure if you'd consider that a
race hazard or not.

If so, am I correct to think that adding FOR UPDATE to create the

version below would eliminate the hazard?

delete from del where ctid = any(array(select ctid from del limit 10

for update))

If you'd bothered to try that before asking the list, you'd know the
system won't take it --- FOR UPDATE is only supported at top level.
You could probably do something equivalent using a plpgsql loop, or
pulling the CTIDs back to the client side.

regards, tom lane

#11Xharon
sahapasci@gmail.com
In reply to: Peter Headland (#1)
Re: Updating a specific number of rows in pl/pgsql

ctid is the physical location of the row version, so it's not
static..
check http://www.postgresql.org/docs/8.2/static/ddl-system-columns.html