allow LIMIT in UPDATE and DELETE
Hi all,
Currently the LIMIT clause is not allowed in UPDATE or DELETE
statements. I wonder how easy it would be to allow it, and what people
think about it ? For our application it would help a lot when processing
things chunk-wise to avoid long running queries.
The fact that the actual rows processed would be unpredictable does not
make it less useful for us. We actually don't care which rows are
processed, we process them all anyway, we just want to make sure it is a
limited number at a time. A lot of our processes do take large amounts
of time (hours up to days), and we cannot allow that to be in one
transaction, the system does on-line processing too...
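To make the chunk-wise pattern concrete: a minimal sketch of the loop we run, here using Python's sqlite3 module as a stand-in for a real Postgres connection (the jobs table and its columns are made up for illustration). The point is that each limited batch is its own short transaction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO jobs(payload) VALUES (?)",
                 [("job-%d" % i,) for i in range(2500)])
conn.commit()

CHUNK = 1000
total = 0
while True:
    # Delete at most CHUNK rows per transaction; which rows is irrelevant here.
    cur = conn.execute(
        "DELETE FROM jobs WHERE id IN (SELECT id FROM jobs LIMIT ?)",
        (CHUNK,))
    conn.commit()  # keep each transaction short
    total += cur.rowcount
    if cur.rowcount < CHUNK:
        break

print(total)  # all 2500 rows processed, in chunks of at most 1000
```

With a native LIMIT on DELETE, the inner statement would simply be "DELETE FROM jobs LIMIT 1000", with no subquery needed.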
I guess the low-level infrastructure is already there (from what I
understood from earlier postings, but I may be wrong), and the question
is more whether this feature is worth including or not... and the syntax
must be adjusted of course if yes.
All comments welcome...
Thanks,
Csaba.
On 5/19/06, Csaba Nagy <nagy@ecircle-ag.com> wrote:
Hi all,
Currently the LIMIT clause is not allowed in UPDATE or DELETE
statements. I wonder how easy it would be to allow it, and what people
think about it ? For our application it would help a lot when processing
things chunk-wise to avoid long running queries.
I asked that question a while ago..
http://archives.postgresql.org/pgsql-general/2005-04/msg00386.php
and got this response:
http://archives.postgresql.org/pgsql-general/2005-04/msg00387.php
Works quite well :)
--
Postgresql & php tutorials
http://www.designmagick.com/
On 5/19/06, Csaba Nagy <nagy@ecircle-ag.com> wrote:
Hi all,
Currently the LIMIT clause is not allowed in UPDATE or DELETE
statements. I wonder how easy it would be to allow it, and what people
think about it ? For our application it would help a lot when processing
things chunk-wise to avoid long running queries.
The fact that the actual rows processed would be unpredictable does not
make it less useful for us. We actually don't care which rows are
processed, we process them all anyway, we just want to make sure it is a
limited number at a time. A lot of our processes do take large amounts
of time (hours up to days), and we cannot allow that to be in one
transaction, the system does on-line processing too...
I guess the low-level infrastructure is already there (from what I
understood from earlier postings, but I may be wrong), and the question
is more whether this feature is worth including or not... and the syntax
must be adjusted of course if yes.
-- sample data
CREATE TEMP TABLE tab (id serial primary key, n int, t text);
INSERT INTO tab(n) SELECT * FROM generate_series(1,1000);
-- say, you want such an update:
UPDATE tab SET t = 'aqq' WHERE n > 10;
-- but with limit:
UPDATE tab SET t = 'aqq' WHERE id in (SELECT id FROM tab WHERE n > 10
LIMIT 100);
-- or this way (join):
UPDATE tab SET t = 'aqq' FROM (SELECT id FROM tab WHERE n > 10 LIMIT
100) AS tab_ids WHERE tab.id = tab_ids.id;
...this of course assumes that you have a primary key you can use
to "target" the update.
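The keyed-subquery workaround above is portable across engines. A minimal runnable version of the same idea, using Python's sqlite3 module (the tab schema is simplified since SQLite has no serial type or generate_series):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, n INT, t TEXT)")
conn.executemany("INSERT INTO tab(n) VALUES (?)", [(i,) for i in range(1, 1001)])

# Update at most 100 of the rows matching n > 10, targeted via the primary key.
cur = conn.execute(
    "UPDATE tab SET t = 'aqq' "
    "WHERE id IN (SELECT id FROM tab WHERE n > 10 LIMIT 100)")

print(cur.rowcount)  # 100 rows updated; which 100 is unspecified
```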
Then again, there are places where there is no primary key, like:
CREATE TABLE foo (t text);
INSERT INTO foo VALUES('aaa');
INSERT INTO foo VALUES('aaa');
...and you want to update the first 'aaa' to 'bbb'.
But you can handle it this way:
CREATE TEMP SEQUENCE aaa_temp_seq;
UPDATE foo SET t='bbb' WHERE t='aaa' AND nextval('aaa_temp_seq') <= 1;
-- LIMIT 1
...this of course will suck for big queries (all matching rows will be
searched, but not updated);
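The temp-sequence trick can be imitated elsewhere too. A sketch with sqlite3, where a Python closure registered as an SQL function (named nextval_aaa here, my invention) plays the role of the sequence; the same caveat applies, all matching rows may still be scanned:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (t TEXT)")
conn.execute("INSERT INTO foo VALUES ('aaa')")
conn.execute("INSERT INTO foo VALUES ('aaa')")

# Stand-in for the temp sequence: a stateful Python function exposed to SQL.
state = {"n": 0}
def nextval():
    state["n"] += 1
    return state["n"]
conn.create_function("nextval_aaa", 0, nextval)

# "LIMIT 1": only the first matching row sees a counter value <= 1.
cur = conn.execute(
    "UPDATE foo SET t = 'bbb' WHERE t = 'aaa' AND nextval_aaa() <= 1")
print(cur.rowcount)  # 1
```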
Regards,
Dawid
I asked that question a while ago..
http://archives.postgresql.org/pgsql-general/2005-04/msg00386.php
and got this response:
http://archives.postgresql.org/pgsql-general/2005-04/msg00387.php
Works quite well :)
I don't doubt that it works well, in fact that's what I plan to use
myself... I was not asking for a workaround, that I can figure out
myself :-)
Regarding the arguments of the post you linked:
"It isn't in the SQL standard" - neither is LIMIT on SELECT, and still it
is a very useful feature;
"it would have undefined behavior" - so does LIMIT on SELECT without
order by, and it is still useful when you don't care about which
entry you get, you just want one of them. It certainly provides a
planning advantage in some cases where ordering would mean a sort;
The subquery delete will definitely have a more involved plan than a
delete with limit. On some of my tables that would make a difference,
even if not that big a one due to caching effects (the same rows are
deleted/updated which were already visited by the subquery). I can't say
for sure how big the penalty of doing the subquery is, but the plans I
have seen involve something like:
db=# prepare test_001(bigint, bigint, smallint) as
db-# DELETE FROM big_table
db-# WHERE (col1, col2) IN
db-# (SELECT col1, col2 FROM big_table
db(# WHERE col1=$2
db(# AND col3 IS NOT NULL
db(# AND col4 =$3
db(# AND col5 <> 'o'
db(# LIMIT 1000);
PREPARE
db=# explain execute test_001(1,1,1);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2166.10..2282.99 rows=1 width=6)
   ->  HashAggregate  (cost=2166.10..2166.10 rows=29 width=16)
         ->  Subquery Scan "IN_subquery"  (cost=0.00..2165.95 rows=29 width=16)
               ->  Limit  (cost=0.00..2165.66 rows=29 width=16)
                     ->  Index Scan using idx_big_table_col3 on big_table  (cost=0.00..2165.66 rows=29 width=16)
                           Index Cond: (col1 = $2)
                           Filter: ((col3 IS NOT NULL) AND (col4 = $3) AND ("col5" <> 'o'::bpchar))
   ->  Index Scan using pk_big_table on big_table  (cost=0.00..4.02 rows=1 width=22)
         Index Cond: ((big_table.col1 = "outer".col1) AND (big_table.col2 = "outer".col2))
(9 rows)
idx_big_table_col3 is a partial index where col3 is not null, which
covers ~10% of the total rows. The estimates are generic; the worst
case is that there are a few 100K rows selected by the subselect
before the limit. So I guess the worst case can cost as much as double
what a plan for DELETE with LIMIT would.
With the LIMIT allowed on DELETE it would be something like:
   ->  Limit  (cost=0.00..2165.66 rows=29 width=16)
         ->  Index Scan using idx_big_table_col3 on big_table  (cost=0.00..2165.66 rows=29 width=16)
               Index Cond: (col1 = $2)
               Filter: ((col3 IS NOT NULL) AND (col4 = $3) AND ("col5" <> 'o'::bpchar))
That would spare a HashAggregate and an index scan. The index scan would
very likely not be a problem, as the same index entries are visited in
the subquery and are likely cached, and the HashAggregate should also be
fast for the max 1000 rows it has to handle, but they are still
completely unnecessary for my purpose, so I still think LIMIT on
DELETE and UPDATE would make perfect sense.
Cheers,
Csaba.
-- sample data
CREATE TEMP TABLE tab (id serial primary key, n int, t text);
INSERT INTO tab(n) SELECT * FROM generate_series(1,1000);
-- say, you want such an update:
UPDATE tab SET t = 'aqq' WHERE n > 10;
-- but with limit:
UPDATE tab SET t = 'aqq' WHERE id in (SELECT id FROM tab WHERE n > 10
LIMIT 100);
-- or this way (join):
UPDATE tab SET t = 'aqq' FROM (SELECT id FROM tab WHERE n > 10 LIMIT
100) AS tab_ids WHERE tab.id = tab_ids.id;
...this of course assumes that you have a primary key you can use
to "target" the update.
Then again, there are places where there is no primary key, like:
CREATE TABLE foo (t text);
INSERT INTO foo VALUES('aaa');
INSERT INTO foo VALUES('aaa');
...and you want to update the first 'aaa' to 'bbb'.
But you can handle it this way:
CREATE TEMP SEQUENCE aaa_temp_seq;
UPDATE foo SET t='bbb' WHERE t='aaa' AND nextval('aaa_temp_seq') <= 1;
-- LIMIT 1
...this of course will suck for big queries (all matching rows will be
searched, but not updated);
Regards,
Dawid
Like I said in a previous post, I can figure out the workarounds, but it
would be nice not to need them, not to mention that the LIMIT would work
faster.
I specifically seek opinions about the LIMIT on DELETE/UPDATE
feature, not workarounds.
Cheers,
Csaba.
That would spare a HashAggregate and an index scan. The index scan would
very likely not be a problem, as the same index entries are visited in
the subquery and are likely cached, and the HashAggregate should also be
fast for the max 1000 rows it has to handle, but they are still
completely unnecessary for my purpose, so I still think LIMIT on
DELETE and UPDATE would make perfect sense.
Oh, it just occurred to me: the subquery is using a different index than
the outer loop, so we can forget about caching. Considering that the
outer loop uses an index 10x bigger than the subquery's, which means
~50 million entries, and that the typical situation where this would be
used mostly selects 1000 rows indeed, LIMIT on DELETE for this case
would surely mean a ~50% speedup.
Cheers,
Csaba.
Csaba Nagy <nagy@ecircle-ag.com> writes:
Currently the LIMIT clause is not allowed in UPDATE or DELETE
statements. I wonder how easy it would be to allow it, and what people
think about it ?
This has been proposed before, and rejected before, and the arguments
against are just as strong as they were before. See the archives.
regards, tom lane
On Fri, 2006-05-19 at 15:51, Tom Lane wrote:
Csaba Nagy <nagy@ecircle-ag.com> writes:
Currently the LIMIT clause is not allowed in UPDATE or DELETE
statements. I wonder how easy it would be to allow it, and what people
think about it ?
This has been proposed before, and rejected before, and the arguments
against are just as strong as they were before. See the archives.
Tom, I guess you refer to the following:
http://archives.postgresql.org/pgsql-patches/2002-09/msg00256.php
Well, let me address your arguments.
The first one here is not the first in your mail, but I guess the most
important:
"Just because MySQL is willing to implement nonstandard bad ideas doesn't
mean we are. In any case the idea that this might provide some amount
of compatibility is illusory: the odds are good that we'd delete or
update a different tuple than they do, because of implementation
differences. An application that actually depends on MySQL's behavior
would surely be broken."
Well, first of all, you're not competing here with MySQL in this case, but with Oracle. Our application does this using Oracle's ROWNUM trick and it works perfectly fine. Now I guess you think Oracle's ROWNUM is also stupid in this case, but it certainly helps us write cleaner SQL, and a missing postgres alternative which is easy to use won't help you in attracting Oracle users.
Regarding the compatibility, I do not expect that the rows deleted/updated will be the same as on Oracle or other DB, I simply expect that only a limited number of rows will be processed at a time... the rest will be processed in the next runs, on all DBs... Is this so outrageous ?
Regarding non-standard: LIMIT is non-standard in SELECTs as well, and I don't see how it would be any more non-standard in DELETE/UPDATE.
Regarding "bad ideas", "depending on MySQLs behavior", I think you're simply overreacting here... for me LIMIT in DELETE/UPDATE would have saved a few days of compatibility work on our application WITH ORACLE, not MySQL, and a few contrived queries.
"And how exactly do you control *which* tuple(s) get deleted or updated,
if the WHERE clause selects more than the limit?"
I DO NOT CARE about which rows are deleted. The fact that it is nondeterministic can be very clearly specified in the documentation if you think it is such a bad thing, but nondeterministic is perfectly fine sometimes. There are lots of nondeterminisms in the database world, starting with the ordering of selects if you don't use order by, so why don't we force everybody to use order by ? Why don't you force the use of order by on a select with limit ? Why is it enough there to say in the docs that it WILL BE NON-DETERMINISTIC ?
"Then use ctid."
For the problem at hand in your post it is a good solution, except that it will cause a full table scan, because I guess few people have indexes on ctid. Or you have to write your queries really contrived, by duplicating most of your query conditions so that it can use some indexes. I'm not sure if you'll get away without at least 2 full table scans if using ctid and no indexes, one for the subquery and one for the delete itself... not to mention the need for something like a HashAggregate on the subquery results... all this is speculation, but for sure you'll spend 10x the time optimizing the subquery than you would writing a simple DELETE with LIMIT.
"Have you got any evidence that there's a meaningful speedup?"
No, but from examining the query plans I see it would mean up to a 50% speedup for my use case.
"We have a zero-tolerance policy on yacc warnings."
"This just seems like a really bad idea ..."
You simply have prejudices against this feature. If you weren't so against it I'm sure the problems could be solved. You're one of the most influential people on where postgres is going, and it's a pity when you're so against something you don't like based on gut feelings...
Cheers,
Csaba.
Csaba Nagy <nagy@ecircle-ag.com> writes:
I DO NOT CARE about which rows are deleted.
You can't possibly think that that holds true in general.
The fact that it is
nondeterministic can be very clearly specified in the documentation if
you think it is such a bad thing, but nondeterministic is perfectly
fine sometimes. There are lots of nondeterminisms in the database
world, starting with the ordering of selects if you don't use order
by, so why don't we force everybody to use order by ? Why don't you
force the use of order by on a select with limit ? Why is it enough
there to say in the docs that it WILL BE NON-DETERMINISTIC ?
I can tolerate nondeterminism in SELECT because it doesn't change the
data. If you get it wrong you can always do it over. UPDATE/DELETE
need to have higher standards though.
regards, tom lane
On Fri, 2006-05-19 at 16:31, Tom Lane wrote:
Csaba Nagy <nagy@ecircle-ag.com> writes:
I DO NOT CARE about which rows are deleted.
You can't possibly think that that holds true in general.
I agree that it is not true in the general case, but then I also don't
want to use DELETE with LIMIT in the general case. I only want to use it
in the very specific cases where it makes sense, and it results in
cleaner SQL, and it would likely result in a better execution plan.
I can tolerate nondeterminism in SELECT because it doesn't change the
data. If you get it wrong you can always do it over. UPDATE/DELETE
need to have higher standards though.
Please Tom, there are so many ways you can shoot your feet already in
there... I don't see why this one would be a bigger foot-gun than the
subquery style. It is functionally equivalent. It's only easier to
write... if somebody wants to shoot himself, he can do it one way or the
other. Placing a big warning in the docs should be enough... <rant>
except if postgres is really targeting the MySql users instead of the
Oracle folks. Those guys already have this foot-gun readily loaded...
where's the American spirit where you are allowed to carry guns and
expected to act responsibly ?</rant>
Cheers,
Csaba.
----- Original Message ----
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Csaba Nagy <nagy@ecircle-ag.com>
Cc: Postgres general mailing list <pgsql-general@postgresql.org>
Sent: Friday, May 19, 2006 9:31:24 AM
Subject: Re: [GENERAL] allow LIMIT in UPDATE and DELETE
You can't possibly think that that holds true in general.
I can tolerate nondeterminism in SELECT because it doesn't change the
data. If you get it wrong you can always do it over. UPDATE/DELETE
need to have higher standards though.
regards, tom lane
The usage Csaba is referring to seems to be pretty common practice in the world of Oracle. If I need to purge 5-10 million rows from a non-partitioned table on a regular basis (e.g: archiving) I'm going to use delete in conjunction with an appropriate where clause (typically something like less than some sequence number or date) and tack a "rownum<X" (where X is some fairly large constant) on the end so that the delete is done in chunks. I'll commit immediately afterwards and loop until sql%rowcount<X indicating that I'm finished.
Now the question... why would you do that instead of doing everything in one big transaction on Oracle? I guess performance is one reason. Oracle's (at least with 8/8i) performance seems to tank very quickly on deletes as you increase the number of records you delete in a single transaction. The other (at least with my understanding of Oracle internals) is that using smaller transactions will mean less rollback segment space used, which reduces the likelihood of your transaction getting killed due to Oracle running out of rollback space on a database that has heavy usage.
Since Postgresql's MVCC system doesn't require rollback segments I guess the performance question is the more important of the two. If there could be a performance gain in Postgresql with "delete from X where Y limit Z" vs "delete from X where Y in (some subselect limit Z)" I'd think Csaba suggestion has some merit.
Regards,
Shelby Cain
Now the question... why would you do that instead of doing everything in one big transaction on Oracle? I guess performance is one reason. Oracle's (at least with 8/8i) performance seems to tank very quickly on deletes as you increase the number of records you delete in a single transaction. The other (at least with my understanding of Oracle internals) is that using smaller transactions will mean less rollback segment space used, which reduces the likelihood of your transaction getting killed due to Oracle running out of rollback space on a database that has heavy usage.
Running out of rollback segments is the answer in our case. It happened
more than once...
Since Postgresql's MVCC system doesn't require rollback segments I guess the performance question is the more important of the two. If there could be a performance gain in Postgresql with "delete from X where Y limit Z" vs "delete from X where Y in (some subselect limit Z)" I'd think Csaba suggestion has some merit.
But postgres in turn has the problem that vacuum is not effective when
you have long-running transactions. That is a problem for heavily
recycled tables like queue tables.
Now recently I have solved the vacuum problem by regularly CLUSTER-ing
our most heavily used queue table, so long-running transactions are no
longer such a huge problem for us, but we still have the case of some
user-triggered operations which time out on the web before finishing on
the DB. Some of those would make perfect sense to be done partially, and
the user could then restart the operation from where it left off... now
we often have 5 minutes of updates rolled back, leaving nothing useful
but only dead rows.
And yes, not everything we do must be perfectly transactional...
sometimes partial work is worth more than partial work rolled back.
Cheers,
Csaba.
On Fri, May 19, 2006 at 04:22:42PM +0200, Csaba Nagy wrote:
"Then use ctid."
For the problem at hand in your post it is a good solution, except
that it will cause a full table scan, because I guess few people have
indexes on ctid. Or you have to write your queries really contrived,
by duplicating most of your query conditions so that it can use some
indexes. I'm not sure if you'll get away without at least 2 full
table scans if using ctid and no indexes, one for the subquery and
one for the delete itself... not to mention the need for something
like a HashAggregate on the subquery results... all this is
speculation, but for sure you'll spend 10x the time optimizing
the subquery than you would writing a simple DELETE with LIMIT.
Err, you don't need an index on ctid because the ctid represents the
physical location of the tuple on disk. ctids are what indexes use to
refer to tuples...
# explain DELETE FROM t WHERE ctid=(SELECT ctid FROM t WHERE pronargs=1 LIMIT 1);
QUERY PLAN
----------------------------------------------------------------
Tid Scan on t (cost=3.75..7.76 rows=2 width=6)
Filter: (ctid = $0)
InitPlan
-> Limit (cost=0.00..3.75 rows=1 width=6)
-> Seq Scan on t (cost=0.00..22.50 rows=6 width=6)
Filter: (pronargs = 1)
(6 rows)
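For comparison, the same row-address targeting can be imitated in engines that expose a physical row identifier. A sketch using SQLite's rowid, which, like ctid, needs no user-created index (the t table with a pronargs column is made up here to mirror the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pronargs INT)")
conn.executemany("INSERT INTO t VALUES (?)", [(i % 3,) for i in range(30)])

# Delete a single matching row by its row address, as in the ctid example.
cur = conn.execute(
    "DELETE FROM t WHERE rowid = (SELECT rowid FROM t WHERE pronargs = 1 LIMIT 1)")
print(cur.rowcount)  # 1
```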
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
Err, you don't need an index on ctid because the ctid represents the
physical location of the tuple on disk. ctids are what indexes use to
refer to tuples...
OK, then how do you explain this:
db=# prepare test_001(bigint, bigint, smallint) as
db-# DELETE FROM big_table
db-# WHERE ctid IN
db-# (SELECT ctid FROM big_table
db(# WHERE col1=$2
db(# AND col2 IS NOT NULL
db(# AND col3 =$3
db(# AND col4 <> 'o'
db(# LIMIT 1000);
PREPARE
db=# explain execute test_001(1,1,1);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop IN Join  (cost=2165.98..24570725.13 rows=29 width=6)
   Join Filter: ("outer".ctid = "inner".ctid)
   ->  Seq Scan on big_table  (cost=0.00..1037188.04 rows=36063404 width=6)
   ->  Materialize  (cost=2165.98..2166.27 rows=29 width=6)
         ->  Subquery Scan "IN_subquery"  (cost=0.00..2165.95 rows=29 width=6)
               ->  Limit  (cost=0.00..2165.66 rows=29 width=6)
                     ->  Index Scan using idx_big_table_col2 on big_table  (cost=0.00..2165.66 rows=29 width=6)
                           Index Cond: (col1 = $2)
                           Filter: ((col2 IS NOT NULL) AND (col3 = $3) AND ("col4" <> 'o'::bpchar))
(9 rows)
Cheers,
Csaba.
Regarding Oracle's ROWNUM - since they have to use that instead of OFFSET
and LIMIT, that isn't much of an argument for the Oracle way. When
converting queries into Oracle SQL, I always _really_ miss OFFSET and
LIMIT. They are much easier to use than ROWNUM, especially with ORDER BY.
I think that more databases support OFFSET and LIMIT than ROWNUM (the
Oracle way).
Personally, I have never wanted a DELETE or UPDATE with LIMIT. The one
time I did something similar in Oracle, I used partitions, and just dropped
or truncated the partition containing the old data.
Susan
From: Csaba Nagy <nagy@ecircle-ag.com>
To: Postgres general mailing list <pgsql-general@postgresql.org>
Sent by: pgsql-general-owner@postgresql.org
Subject: Re: [GENERAL] allow LIMIT in UPDATE and DELETE
Date: 05/19/2006 07:22 AM
Well, first of all, you're not competing here with MySQL in this case, but
with Oracle. Our application does this using Oracle's ROWNUM trick and it
works perfectly fine. Now I guess you think Oracle's ROWNUM is also stupid
in this case, but it certainly helps us write cleaner SQL, and a missing
postgres alternative which is easy to use won't help you in attracting
Oracle users.
Cheers,
Csaba.
On Fri, 2006-05-19 at 17:43, SCassidy@overlandstorage.com wrote:
Regarding Oracle's ROWNUM - since they have to use that instead of OFFSET
and LIMIT, that isn't much of an argument for the Oracle way. When
converting queries into Oracle SQL, I always _really_ miss OFFSET and
LIMIT. They are much easier to use than ROWNUM, especially with ORDER BY.
I would agree here, but for my purpose it would have been a lot easier
to parameterize only the limit syntax than to completely rewrite the
query for postgres.
I think that more databases support OFFSET and LIMIT than ROWNUM (the
Oracle way).
I actually don't care what the syntax is as long as I can create a query
with the syntactically equivalent parts in the same place of the query
and the same number and order of parameters. Then it's fairly easy to
parameterize it and have the same code handle it... otherwise I have to
write special code for each database. Sometimes I have to do that
anyway for different reasons, but it is a PITA to maintain, so I try
to minimize the number of places where I have to do it.
Personally, I have never wanted a DELETE or UPDATE with LIMIT. The one
time I did something similar in Oracle, I used partitions, and just dropped
or truncated the partition containing the old data.
Perfectly fine, I also don't use many of the features available :-) That
doesn't mean it is not useful for others...
Cheers,
Csaba.
----- Original Message ----
From: SCassidy@overlandstorage.com
To: Csaba Nagy <nagy@ecircle-ag.com>
Cc: Postgres general mailing list <pgsql-general@postgresql.org>; pgsql-general-owner@postgresql.org
Sent: Friday, May 19, 2006 10:43:43 AM
Subject: Re: [GENERAL] allow LIMIT in UPDATE and DELETE
Personally, I have never wanted a DELETE or UPDATE with LIMIT. The one
time I did something similar in Oracle, I used partitions, and just dropped
or truncated the partition containing the old data.
Yeah, that’s the proper way to handle the issue assuming that sufficient forethought was put into the design of the database. It becomes trivial to drop a partition as part of your weekly/monthly maintenance.
Unfortunately, when dealing with legacy database systems that have grown in a very organic way over a span of 5-10 years you don't have much control over the schema, and there is significant inertia present to leave things as they are. As an example, the DBA group at the client I'm currently working with has their databases managed by IBM global services, and they will push back on altering a table definition to use partitioning as it costs the organization money to have those changes put into production. Having an internal developer use a script to perform such maintenance as a batch process is considered "free", so you can guess how many such processes have been created over the years.
Regards,
Shelby Cain
Personally, I have never wanted a DELETE or UPDATE with LIMIT. The one
time I did something similar in Oracle, I used partitions, and just dropped
or truncated the partition containing the old data.Yeah, that’s the proper way to handle the issue assuming that sufficient forethought was put into the design of the database. It becomes trivial to drop a partition part of your weekly/monthly maintenance.
Well, sometimes it's not that easy. How would you handle a batch
processing system which stores the incoming requests in a queue table in
the database, and then periodically processes a batch of them, with the
additional constraint that it is allowed to process at most 1000 at a
time so it won't produce too long-running a transaction ? Suppose the
processing is quite costly, and the queue can have bursts of incoming
requests which then have to be slowly processed... the requests are
coming from the web and must be processed asynchronously, and the insert
into the database must be very fast.
Partitioning would be able to solve this kind of problem I guess, if one
processing chunk is one partition, and it is dropped after processed,
but it needs a whole lot of setup and I'm not sure how well it would
work with largely variable bursts of data... the number of partitions
could grow indefinitely, and there would be a race condition when
there's low traffic and we need to process an incomplete chunk (after a
maximum sleep timeout, for example) while there are still some incoming
requests we don't want to lose.
So what I'm talking about is not maintenance, but on-line operation...
Cheers,
Csaba.
Martijn van Oosterhout <kleptog@svana.org> writes:
Err, you don't need an index on ctid because the ctid represents the
physical location of the tuple on disk. ctids are what indexes use to
refer to tuples...
# explain DELETE FROM t WHERE ctid=(SELECT ctid FROM t WHERE pronargs=1 LIMIT 1);
This doesn't currently work for more than one tuple, though: the natural
locution would be "WHERE ctid IN (SELECT returning more than one tid)"
but the planner/executor don't cope with doing that efficiently. Might
be worth trying to fix that.
regards, tom lane
----- Original Message ----
From: Csaba Nagy <nagy@ecircle-ag.com>
To: Shelby Cain <alyandon@yahoo.com>
Cc: SCassidy@overlandstorage.com; Postgres general mailing list <pgsql-general@postgresql.org>; pgsql-general-owner@postgresql.org
Sent: Friday, May 19, 2006 11:46:42 AM
Subject: Re: [GENERAL] allow LIMIT in UPDATE and DELETE
Well, sometimes it's not that easy. How would you handle a batch
processing system which stores the incoming requests in a queue table in
the database, and then periodically processes a batch of them, with the
additional constraint that it is allowed to process at most 1000 at a
time so it won't produce too long-running a transaction ? Suppose the
processing is quite costly, and the queue can have bursts of incoming
requests which then have to be slowly processed... the requests are
coming from the web and must be processed asynchronously, and the insert
into the database must be very fast.
I can't imagine a case where a properly tuned Postgresql installation with appropriate hardware backing it couldn't handle that particular kind of workload pattern. However, I usually work with Oracle so tables used as queues don't have the same performance issues you'd run into with Postgresql.
Regardless, this type of queue problem can also be tackled by having your data layer persist the input from the web in memory (which maintains a low perceived response time to the client) and post to the table as fast as the database allows.
So what I'm talking about is not maintenance, but on-line operation...
Different problems will always require different solutions. In the case you present I don't really think partitioning is the answer.
Regards,
Shelby Cain