Improving performance of select query
Hi,
I have a postgres(v9.5) table named customer holding 1 billion rows. It is not partitioned but it has an index against the primary key (integer). I need to keep a very few records (say, about 10k rows) and remove everything else.
insert into customer_backup select * from customer where customer_id in (<id1>,<id2>,..);
If I go for something like above I'm afraid the insert-select may take a very long time as when I ran
select count(*) from customer;
it is taking about 45 minutes to return the count.
Are there ways to improve the efficiency of the insert-select by , say, tuning some configurations related to memory to improve the efficiency ?
This is a box with 96GB of RAM overall and I can stop all the data load and DML operations if needed. But need a way to run this query as much efficiently as possible
Thanks and Regards,
Karthik
Disclaimer:
This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is considered confidential, proprietary, sensitive and/or otherwise legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.
Karthik Shivashankar schrieb am 14.12.2020 um 12:38:
I have a postgres(v9.5) table named customer holding 1 billion rows.
It is not partitioned but it has an index against the primary key
(integer). I need to keep a very few records (say, about 10k rows)
and remove everything else./insert into customer_backup select * from customer where customer_id in (<id1>,<id2>,..); /
If I go for something like above I'm afraid the insert-select may take a very long time as when I ran
/select count(*) from customer;/
it is taking about 45 minutes to return the count.
Well, you need to compare the time with the same condition you use in your
CREATE TABLE .. AS SELECT statement,
e.g.:
select count(*)
from customer
where id in (....);
Or:
explain (analyze)
select *
from customer
where id in (....);
Regards
Thomas
On Dec 14, 2020, at 4:47 AM, Thomas Kellerer <shammat@gmx.net> wrote:
Karthik Shivashankar schrieb am 14.12.2020 um 12:38:
I have a postgres(v9.5) table named customer holding 1 billion rows.
It is not partitioned but it has an index against the primary key
(integer). I need to keep a very few records (say, about 10k rows)
and remove everything else./insert into customer_backup select * from customer where customer_id in (<id1>,<id2>,..); /
If I go for something like above I'm afraid the insert-select may take a very long time as when I ran
/select count(*) from customer;/
it is taking about 45 minutes to return the count.
Well, you need to compare the time with the same condition you use in your
CREATE TABLE .. AS SELECT statement,e.g.:
select count(*)
from customer
where id in (....);Or:
explain (analyze)
select *
from customer
where id in (....);Regards
Thomas
As for the actually copy of the specific records, I would ‘where exists’ (even possibly with a temp table of ids) rather than in(id1..id10000)
Show quoted text
I think you can also increase the query performance by creating indexes?
On Mon, 14 Dec 2020 at 11:36, Rob Sargent <robjsargent@gmail.com> wrote:
Show quoted text
On Dec 14, 2020, at 4:47 AM, Thomas Kellerer <shammat@gmx.net> wrote:
Karthik Shivashankar schrieb am 14.12.2020 um 12:38:
I have a postgres(v9.5) table named customer holding 1 billion rows.
It is not partitioned but it has an index against the primary key
(integer). I need to keep a very few records (say, about 10k rows)
and remove everything else./insert into customer_backup select * from customer where customer_id
in (<id1>,<id2>,..); /
If I go for something like above I'm afraid the insert-select may take
a very long time as when I ran
/select count(*) from customer;/
it is taking about 45 minutes to return the count.
Well, you need to compare the time with the same condition you use in
your
CREATE TABLE .. AS SELECT statement,
e.g.:
select count(*)
from customer
where id in (....);Or:
explain (analyze)
select *
from customer
where id in (....);Regards
ThomasAs for the actually copy of the specific records, I would ‘where exists’
(even possibly with a temp table of ids) rather than in(id1..id10000)
select count(*) from <table> is probably not using the index that your
insert/select would, so I would not use that as a test for performance.
If customer_backup has an index, the insert-select will be
performance-limited by updating that index.
If you can do a *create table customer_backup* as
*select * from customer where customer_id in (<id1>,<id2>,..); *
I expect it to run quite fast as long as customer_id is indexed and
analyzed.
On Mon, Dec 14, 2020 at 9:37 AM Muhammad Bilal Jamil <mbjamil92@gmail.com>
wrote:
Show quoted text
I think you can also increase the query performance by creating indexes?
On Mon, 14 Dec 2020 at 11:36, Rob Sargent <robjsargent@gmail.com> wrote:
On Dec 14, 2020, at 4:47 AM, Thomas Kellerer <shammat@gmx.net> wrote:
Karthik Shivashankar schrieb am 14.12.2020 um 12:38:
I have a postgres(v9.5) table named customer holding 1 billion rows.
It is not partitioned but it has an index against the primary key
(integer). I need to keep a very few records (say, about 10k rows)
and remove everything else./insert into customer_backup select * from customer where customer_id
in (<id1>,<id2>,..); /
If I go for something like above I'm afraid the insert-select may take
a very long time as when I ran
/select count(*) from customer;/
it is taking about 45 minutes to return the count.
Well, you need to compare the time with the same condition you use in
your
CREATE TABLE .. AS SELECT statement,
e.g.:
select count(*)
from customer
where id in (....);Or:
explain (analyze)
select *
from customer
where id in (....);Regards
ThomasAs for the actually copy of the specific records, I would ‘where exists’
(even possibly with a temp table of ids) rather than in(id1..id10000)
On Dec 14, 2020, at 10:37 AM, Muhammad Bilal Jamil <mbjamil92@gmail.com> wrote:
I think you can also increase the query performance by creating indexes?
OP said there was a key on the target (large) table. I’m not sure there’s much of a win in indexing 10K ids.