reorder table data
Hi,
I have a 40GB database with a few tables containing approx. 10 million rows.
Most of the data in these tables is inactive, and only a few rows get used
for our benchmark run each day. We cannot delete the inactive data since it
might be required for a particular run some day, even though it is the
active data that is used mostly. The problem is that the update and select
queries are very slow because of this background data. I am looking for some
way to reorganize the data in such a way that the active data gets accessed
much faster than the inactive data. I am using btree indices, which
assume that all rows have the same access probability. Is there an index or
some other way to order this data so that the active data can be accessed most
efficiently (I want to reduce the effect of the background data as much as
possible)?
Also, what is the best way to find out which rows are getting accessed in a
table during a particular run?
Regards,
Vinita Bansal
vinita bansal wrote:
Hi,
I have a 40GB database with a few tables containing approx 10 million
rows. Most of the data in these tables is inactive and there are only a
few rows that get used for our benchmark run each day. We cannot delete
the inactive data since it might be required for a particular run some
day while it is active data that is used mostly.
Is there some pattern to which data is being accessed? For example, in
an accounts system it might be rows with paid=false.
--
Richard Huxton
Archonet Ltd
Richard Huxton wrote:
vinita bansal wrote:
Hi,
I have a 40GB database with a few tables containing approx 10 million
rows. Most of the data in these tables is inactive and there are only
a few rows that get used for our benchmark run each day. We cannot
delete the inactive data since it might be required for a particular
run some day while it is active data that is used mostly.
Is there some pattern to which data is being accessed? For example, in
an accounts system it might be rows with paid=false.
If the above is the case, you could create an expression index
specifically for your clause.
You could also archive the old information out into another schema and
access it when required using UNIONs.
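What Joshua describes is, in PostgreSQL terms, a partial index (an index with a WHERE clause restricting it to the rows you actually touch). A sketch of both ideas, following Richard's paid=false example; the table and column names here are hypothetical, not from the thread:

```sql
-- A partial index covers only the active rows, so it stays small
-- no matter how much inactive data the table accumulates:
CREATE INDEX accounts_unpaid_idx ON accounts (id) WHERE paid = false;

-- Archive variant: move inactive rows into a separate schema...
CREATE SCHEMA archive;
CREATE TABLE archive.accounts (LIKE accounts);

-- ...and recombine the two only for the occasional run that
-- needs everything:
CREATE VIEW accounts_all AS
    SELECT * FROM accounts
    UNION ALL
    SELECT * FROM archive.accounts;
```

The everyday queries would hit only the small accounts table; pointing the rare full runs at accounts_all avoids touching the application's normal query paths.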
Sincerely,
Joshua D. Drake
Command Prompt, Inc.
--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
Hi,
There is no particular pattern, but it is generally the update queries of the
form "update tableName set colName='something'" that are taking a lot of
time when there is a lot of background data. Also, I would not like to
change my application to access data from another schema when required. I
want this to be handled at the database level, where everything in the
database itself is organised to make access faster.
Regards,
Vinita Bansal
From: "Joshua D. Drake" <jd@commandprompt.com>
To: Richard Huxton <dev@archonet.com>
CC: vinita bansal <sagivini@hotmail.com>, pgsql-general@postgresql.org
Subject: Re: [GENERAL] reorder table data
Date: Tue, 19 Apr 2005 11:25:06 -0700
vinita bansal wrote:
Hi,
There is no particular pattern, but it is generally the update queries of
the form "update tableName set colName='something'" that are taking a
lot of time when there is a lot of background data.
Well, that query will obviously update the entire table, but if you
can't predict which rows will be changed all you can do is index the
appropriate column(s) you select against.
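Concretely, "index the appropriate column(s)" means indexing whatever column the updates filter on, not the column being set. A minimal sketch, with hypothetical names (the thread never shows the real where clause):

```sql
-- Index the column the slow updates restrict on:
CREATE INDEX tablename_status_idx ON tablename (status);

-- An update with a selective filter on that column can then locate
-- its rows via the index instead of scanning all 10 million:
UPDATE tablename SET colname = 'something' WHERE status = 'active';
```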
Also, I would not
like to change my application to access data from another schema when
required. I want this to be handled at database level wherein everything
in database itself is organised to make access faster.
Not without some pattern to work to - if you can't say which rows will
be accessed next, then how can your database know?
Can you provide an actual example of a query you find too slow, how long
it takes, and what its EXPLAIN ANALYSE output is?
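For reference, gathering that output means prefixing the slow statement with EXPLAIN ANALYSE (using the hypothetical names from the earlier example). Note that EXPLAIN ANALYSE actually executes the statement, so a write should be wrapped in a transaction and rolled back:

```sql
BEGIN;
EXPLAIN ANALYSE
    UPDATE tablename SET colname = 'something' WHERE status = 'active';
ROLLBACK;
```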
--
Richard Huxton
Archonet Ltd
Oh, sorry!
I gave a wrong example for the update. It includes a where clause as well,
which essentially means that not all the rows will be modified each time.
Supposing I already know (assuming that I can find out) which rows will be
accessed, is there a way to organize the table data such that updates and
selects become fast?
We cannot make changes to the application to handle such a situation (by
creating two different tables, one for active and another for inactive data,
wherein it will access active data mostly and access inactive data from the
other table only when required) since it is very complex and changing it will
require lots of effort.
Regards,
Vinita
From: Richard Huxton <dev@archonet.com>
To: vinita bansal <sagivini@hotmail.com>
CC: jd@commandprompt.com, pgsql-general@postgresql.org
Subject: Re: [GENERAL] reorder table data
Date: Wed, 20 Apr 2005 11:03:19 +0100
vinita bansal wrote:
Oh, sorry!
I gave a wrong example for the update. It includes a where clause as well,
which essentially means that not all the rows will be modified each time.
Supposing I already know (assuming that I can find out) which rows will be
accessed, is there a way to organize the table data such that updates and
selects become fast?
We cannot make changes to the application to handle such a situation (by
creating two different tables, one for active and another for inactive
data, wherein it will access active data mostly and access inactive data
from the other table only when required) since it is very complex and
changing it will require lots of effort.
OK - so show one or more typical queries and their EXPLAIN ANALYSE
outputs and we can see where the bottleneck is.
--
Richard Huxton
Archonet Ltd