Fast way to delete big table?
Hi All,
I have a big table (200 GB, 728 million records) that slows lots of things down. It's time to clean the data up. The requirement is that when I delete, I must summarise a few columns into a new table for backup purposes. That means I need to go through each row and add the columns' values to the summary table (under the corresponding category).
The table used to have multi-column indexes, but I removed them because I read on a forum that indexes heavily reduce delete speed. That's true if I only delete from the table, but in my case I first SELECT the row, then add a few values to the summary table, and then DELETE the row by its composite primary key. This now seems to have been a bad idea, as it takes a very long time to DELETE each row (the delete has to find the row first).
Here are the two versions of the delete function; please help me work out how to speed them up.
1.
CREATE OR REPLACE FUNCTION summary_delete_table()
    RETURNS integer AS
$BODY$
DECLARE
    rec RECORD;
BEGIN
    FOR rec IN SELECT * FROM tableA LIMIT 100 LOOP
        -- add this row's counts onto the matching category row
        UPDATE summaryTable
           SET count1 = count1 + rec.count1, count2 = ...
         WHERE category match;  -- pseudocode placeholder
        -- then delete the row by its composite primary key
        DELETE FROM tableA
         WHERE tableA.primaryKeyA = rec.primaryKeyA
           AND tableA.primaryKeyB = rec.primaryKeyB;
    END LOOP;
    RETURN 1;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
And then I have a .bat script that calls the above function in a loop, millions of times.
2.
CREATE OR REPLACE FUNCTION summary_delete_table()
    RETURNS integer AS
$BODY$
DECLARE
    rec RECORD;
    td_cursor CURSOR FOR SELECT * FROM tableA LIMIT 100;
BEGIN
    FOR rec IN td_cursor LOOP
        -- add this row's counts onto the matching category row
        UPDATE summaryTable
           SET count1 = count1 + rec.count1, count2 = ...
         WHERE category match;  -- pseudocode placeholder
        -- delete the row the cursor is currently positioned on
        DELETE FROM tableA WHERE CURRENT OF td_cursor;
    END LOOP;
    RETURN 1;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
Method 2 is a bit faster, but not by much: the delete speed is 2478 rows/s for method 2 versus 2008 rows/s for method 1.
Any suggestions are welcome.
BTW, I guess a reindex may take a few days to finish.
Also, I tried deleting 100, 1000, and 2000 rows at a time; 1000 at a time turned out to be faster than either 100 or 2000.
Thanks and Regards,
Haiming
Haiming Zhang | Engineer | Redflex Group
T +61 3 9674 1868 | F +61 3 9699 3566 | E Haiming.Zhang@redflex.com.au
31 Market Street, (PO Box 720), South Melbourne VIC 3205, Australia
On 16/05/16 17:09, Haiming Zhang wrote:
[...]
In one transaction:
1. populate the summary table
2. drop indexes on tableA
3. delete selected entries in tableA
4. recreate indexes for tableA
If deleting all entries, then simply truncate it!
N.B. I have NOT checked the fine print in the documentation, nor tested
this - so be warned! :-)
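For illustration, the whole sequence might look something like the sketch below. The summary columns, index name, and the INSERT-style population are guesses based on the pseudocode upthread (if the category rows already exist in summaryTable, step 1 would be an UPDATE ... FROM a grouped sub-select instead):

BEGIN;

-- 1. populate the summary table from the rows about to go away
INSERT INTO summaryTable (category, count1, count2)
SELECT category, sum(count1), sum(count2)
  FROM tableA
 GROUP BY category;

-- 2. drop indexes on tableA (name is a placeholder)
DROP INDEX tablea_some_idx;

-- 3. delete the selected entries (add a WHERE clause to keep some rows)
DELETE FROM tableA;

-- 4. recreate the indexes
CREATE INDEX tablea_some_idx ON tableA (primaryKeyA, primaryKeyB);

COMMIT;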
Cheers,
Gavin
On 5/15/2016 10:23 PM, Gavin Flower wrote:
4. recreate indexes for tableA
note that on a large table, this step can take a LONG time. It's greatly facilitated by setting maintenance_work_mem = 1GB beforehand.
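For example (the setting is local to the session; the index definition is a placeholder):

SET maintenance_work_mem = '1GB';  -- give the index build plenty of sort memory
CREATE INDEX tablea_some_idx ON tableA (primaryKeyA, primaryKeyB);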
--
john r pierce, recycling bits in santa cruz
Hi Gavin,
Thanks for the suggestion. What you said is what I tried to do (except the final reindex, which I planned to do once the table is cleaned up), but it was too slow. It ran for two days and not much happened.
Truncate does not work for my purpose.
Regards,
Haiming
Hi John,
Thanks. Agreed, the reindex will take forever to finish. Do you think increasing temp_buffers would help too?
Regards,
Haiming
Hello
I am not sure if this is an option, but would partial indexes be of help? You could create your indexes in such a way that they exclude the rows that are not needed for your current queries, i.e. those that you would summarize in a separate table and delete, using the same condition that you use for selecting which rows must be deleted.
CREATE INDEX ... ON tablea (...) WHERE ...;
This would not make deletion quicker, but your other operations should see a gain.
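For example, assuming (hypothetically) that an insert_date column drives the cleanup cutoff:

-- index only the rows the application still queries;
-- rows due for summarizing and deletion stay out of the index
CREATE INDEX tablea_active_idx ON tablea (primaryKeyA, primaryKeyB)
    WHERE insert_date >= DATE '2016-01-01';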
What I also see is that you don't have any condition on the SELECT that drives the loop, i.e. you actually go through all rows and decide on the fly which to delete. Maybe reducing the number of rows to be processed by the function would help.
Finally, I don't know the structure of the table, but you could select only the columns you need for your summary instead of *. If you have many columns, this could have an impact on the performance of the SELECT statement.
Regards
Charles
--
Swiss PostgreSQL Users Group
c/o Charles Clavadetscher
Motorenstrasse 18
CH - 8005 Zürich
On 05/15/2016 10:33 PM, Haiming Zhang wrote:
Hi Gavin,
Thanks for the suggestion. What you said is what I tried to do (except the final reindex, which I planned to do once the table is cleaned up), but it was too slow. It ran for two days and not much happened.
From your original post I gathered that you did not do what Gavin suggested, in that you dropped the indexes first and then combined the UPDATE of the summary and the DELETE of the row into one operation. So I am a little confused about what you are asking: are you looking for suggestions on what to do in the future, on how to make the existing setup (no indexes on the big table) work better, or both?
--
Adrian Klaver
adrian.klaver@aklaver.com
On Sun, May 15, 2016 at 10:09 PM, Haiming Zhang <Haiming.Zhang@redflex.com.au> wrote:
Hi All,
I have a big table (200 GB, 728 million records) that slows lots of things down. It's time to clean the data up. The requirement is that when I delete, I must summarise a few columns into a new table for backup purposes. That means I need to go through each row and add the columns' values to the summary table (under the corresponding category).
Is the performance slow-down an emergency? If not, then delete and summarize the rows in bite-size chunks, over weeks or months. It took a long time to get into this situation; spend some time calmly getting out of it.
The table used to have (multi-column) indexes, but I removed them because I read on a forum that indexes *heavily* reduce the speed of deleting.
That generally isn't true in PostgreSQL. Deletes do not directly perform any index maintenance (though triggers might cause them to do some indirectly). Doing the deletes does build up "vacuum debt" which must eventually be paid, and more indexes do make that more expensive. But that happens in the background.
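For example, after a large batch of deletes has committed, paying that debt explicitly rather than waiting for autovacuum could look like:

VACUUM ANALYZE tableA;  -- mark dead rows reusable and refresh planner statistics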
That's true if I only delete from the table, but in my case I first SELECT the row, then add a few values to the summary table, and then DELETE the row by its composite primary key. This now seems to have been a bad idea, as it takes a very long time to DELETE each row (the delete has to find the row first).
This doesn't match your example, as in your examples there are no WHERE
clauses on the main table select, so no index could be used to find the
rows.
Your examples also have no selection criteria or stopping criteria (other
than when your outer driving script stops calling the function). That
means you might be deleting any arbitrary rows in the master table (not
just the oldest ones) and might proceed with these deletions until the
table is empty. Given that, why not just summarize the entire table and
then truncate the whole thing?
...
You could try to rewrite this into a single SQL query, using a CTE with "with t as (delete .... returning * )". The hardest part is that PostgreSQL alas doesn't allow LIMIT on DELETE statements, so you have to go through some ugly contortions with a sub-select, or a WHERE clause on your delete using, for example, a date to do the limiting, with your script incrementing the date on each invocation. I've not done this with summarization, but I have done it with partitioning, where I had to move tuples from the parent table to their new partition (or the reverse, moving from an ill-conceived partition back to the parent) with a minimum of disruption.
It should be faster, but I don't know by how much. That would depend a lot on whether your bottleneck is CPU or IO, amongst other things.
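A sketch of that pattern, using the category/count columns from the pseudocode upthread and a ctid array as one flavour of the sub-select contortion (standing in for the missing LIMIT):

WITH doomed AS (
    DELETE FROM tableA
     WHERE ctid = ANY (ARRAY(SELECT ctid FROM tableA LIMIT 1000))
    RETURNING category, count1, count2
)
UPDATE summaryTable s
   SET count1 = s.count1 + d.count1,
       count2 = s.count2 + d.count2
  FROM (SELECT category,
               sum(count1) AS count1,
               sum(count2) AS count2
          FROM doomed
         GROUP BY category) d
 WHERE s.category = d.category;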
Method 2 is a bit faster but not by much: the delete speed is 2478 rows/s for method 2 and 2008 rows/s for method 1.
Any suggestions are welcome.
Why is this not fast enough? So, it would take 4 days. How fast do you
need it to be?
But there is the possibility that it will slow down over time, as each
execution might have to scan over the debris of all of the previously
deleted rows before it gets to the ones you want to work on.
Cheers,
Jeff
Hi Charles,
Thanks.
The reason I don't have a condition on the SELECT is that it's faster than having one. My aim is to go through every row, take the values of a few columns and append them to a summary table, then delete the row from the table. So finding the rows on the fly is fine for me.
I have tried reducing the number of rows to be processed; even if I process 1 row on the production machine, it takes 24 hours to finish.
Regards,
Haiming
Hi Jeff,
Thank you very much.

> Your examples also have no selection criteria or stopping criteria (other
> than when your outer driving script stops calling the function). That
> means you might be deleting any arbitrary rows in the master table (not
> just the oldest ones) and might proceed with these deletions until the
> table is empty. Given that, why not just summarize the entire table and
> then truncate the whole thing?

Yeah, that's my aim: go through every row and delete the entire table. The problem with summarizing first and then truncating is that I can't guarantee the database and system keep running during the summarizing, as the customer sometimes stops the application and postgres manually. Plus, new data keeps going into the table, so I need to process it chunk by chunk until it finishes.

> You could try to rewrite this into a single SQL query, using a CTE with
> "with t as (delete .... returning * )". [...]

Worth a try, e.g. delete the oldest 100 rows at a time. But won't the sorting take more time than just selecting on the fly?

> Why is this not fast enough? So, it would take 4 days. How fast do you
> need it to be?

It works OK in my testing environment but not on the production machine:
Select * from tableA limit 1; takes milliseconds to finish.
Updating the summary table from the previous select result takes milliseconds.
Delete from tableA where primaryKey = ... takes 2.9 hours to finish.

Regards,
Haiming
Hello
The reason I don't have a condition on the SELECT is that it's faster than having one. My aim is to go through every row, take the values of a few columns and append them to a summary table, then delete the row from the table. So finding the rows on the fly is fine for me.
I have tried reducing the number of rows to be processed; even if I process 1 row on the production machine, it takes 24 hours to finish.
This is quite weird... Is there any other process locking the record or the table?
Bye
Charles
On Mon, May 16, 2016 at 10:22 PM, hmzha2 <Haiming.Zhang@redflex.com.au> wrote:
Works ok in my testing environment but not on the production machine:
Select * from tableA limit 1; takes milliseconds to finish.
Updating the summary table from the previous select result takes milliseconds.
Delete from tableA where primaryKey = ... takes 2.9 hours to finish.
Perhaps you have a foreign key constraint pointing to your tableA from another table, and that other table is un-indexed on those columns (in production). Then every delete from tableA leads to a full scan of the other table to make sure no offending rows exist.
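To check, something like this should list any foreign keys that reference tableA (then verify that each referencing column is indexed):

SELECT conname,
       conrelid::regclass AS referencing_table
  FROM pg_constraint
 WHERE contype = 'f'
   AND confrelid = 'tablea'::regclass;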
If that is not the case, then I'd try to do a "strace -ttt -T -y -p
<PID>" on the process which is taking so long, and see what it is
doing. Unfortunately, you have to do that on production, as that is
the only system where the problem appears.
Cheers,
Jeff
Charles Clavadetscher wrote:
This is quite weird... Is there any other process locking the record or the table?

Yes, weird. The table is being used by another process (which keeps inserting rows into it) at the same time, but nothing should be locking the rows, as we don't touch them after they've been inserted.
Hi Jeff and all,
Thank you for your help. Just to update on my progress: I finally got it working in production using the second solution. The extra thing I did was restart postgres, after which the script started deleting 2558 records per second. I guess something must have been holding locks on records or otherwise blocking the deletes, and the restart removed the obstacle.
Thanks again for the help.
Regards,
Haiming