At what point does a big table start becoming too big?

Started by Nick over 13 years ago · 14 messages · general
#1 Nick
nboutelier@gmail.com

I have a table with 40 million rows and haven't had any performance issues yet.

Are there any rules of thumb as to when a table starts getting too big?

For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?
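
For reference, these numbers are easy to check (a sketch; the table
name is a placeholder):

    -- compare index and table sizes against RAM / total disk space
    SELECT pg_size_pretty(pg_indexes_size('mytable'))        AS index_size,
           pg_size_pretty(pg_table_size('mytable'))          AS table_size,
           pg_size_pretty(pg_total_relation_size('mytable')) AS total_size;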

#2 Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Nick (#1)
Re: At what point does a big table start becoming too big?

On 23/08/12 11:06, Nick wrote:

I have a table with 40 million rows and haven't had any performance issues yet.

Are there any rules of thumb as to when a table starts getting too big?

For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?

I think it would be good to specify the context.

For example:
A database required to support a ship-based anti-missile system would
have far more stringent timing requirements than a database used to
retrieve scientific images based on complicated criteria.

The size of records, how often they are updated/deleted, the types of
queries, etc. would also be useful.

Unfortunately it might simply be a case of "It depends..."!

Cheers,
Gavin

#3 Martin French
Martin.French@romaxtech.com
In reply to: Nick (#1)
Re: At what point does a big table start becoming too big?

I have a table with 40 million rows and haven't had any performance issues yet.

Are there any rules of thumb as to when a table starts getting too big?

For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?

My rule here is that a table is too big when performance starts degrading beyond an acceptable level.

If the database and server are delivering consistent and acceptable performance levels despite an index being 6x RAM or a table consuming 10% of disk, then I tend to leave it be until an issue is raised.

Cheers

Martin

#4 Chris Travers
chris.travers@gmail.com
In reply to: Nick (#1)
Re: At what point does a big table start becoming too big?

On Wed, Aug 22, 2012 at 4:06 PM, Nick <nboutelier@gmail.com> wrote:

I have a table with 40 million rows and haven't had any performance issues yet.

Are there any rules of thumb as to when a table starts getting too big?

For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?

Performance on a big table is going to depend very heavily on the
sorts of queries executed against it. I don't think you can come up
with a rule of thumb of that sort.

Best Wishes,
Chris Travers

#5 Jasen Betts
jasen@xnet.co.nz
In reply to: Nick (#1)
Re: At what point does a big table start becoming too big?

On 2012-08-22, Nick <nboutelier@gmail.com> wrote:

I have a table with 40 million rows and haven't had any performance issues yet.

Are there any rules of thumb as to when a table starts getting too big?

when you need to run a query that needs to fetch too many rows.

For example, maybe if the index size is 6x the amount of ram,
if the table is 10% of total disk space, etc?

If you only need one row at a time and you have the indices for it,
no size is too big; the larger the table, the more impressive indices
are. O(log(n)) beats O(n) by more and more as n grows.
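
For illustration, a one-row fetch through a btree index (a sketch;
the names are made up):

    -- a single-row lookup is O(log(n)) with a suitable index
    CREATE INDEX events_id_idx ON events (event_id);
    EXPLAIN SELECT * FROM events WHERE event_id = 12345;
    -- on a large table, expect an Index Scan rather than a Seq Scan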

--
⚂⚃ 100% natural

#6 Bill Moran
wmoran@potentialtech.com
In reply to: Martin French (#3)
Re: At what point does a big table start becoming too big?

In response to "Martin French" <Martin.French@romaxtech.com>:

I have a table with 40 million rows and haven't had any performance
issues yet.

Are there any rules of thumb as to when a table starts getting too big?

For example, maybe if the index size is 6x the amount of ram, if the
table is 10% of total disk space, etc?

My rule here is that a table is too big when performance starts degrading beyond an acceptable level.

The challenge there is that if you wait until performance degrades
beyond an acceptable level, you've allowed yourself to get into a
situation where clients are upset and frustrated, and fixing the
problem is difficult because there's so much data to manipulate to
rearrange things.

And the advice I have along those lines is to establish now what
constitutes unacceptable performance, and put some sort of monitoring
and tracking in place to know what your performance degradation looks
like and predict when you'll have to react. For example, an MRTG
graph that runs an experimental query once a day during off hours and
graphs the time it takes vs. the # of rows in the table will prove
a valuable tool that can sometimes predict exactly when you'll have
to change things before it becomes a problem. Other tricks work as
well, such as having the application send an email any time a process
takes more than 50% of the allowable maximum time.
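
A minimal sketch of that kind of tracking (the table, column, and
canary query here are hypothetical, not a prescription):

    -- record how long a canary query takes, next to the row estimate
    CREATE TABLE IF NOT EXISTS perf_log (
        run_at       timestamptz DEFAULT now(),
        row_estimate bigint,
        duration_ms  numeric
    );

    DO $$
    DECLARE
        t0 timestamptz := clock_timestamp();  -- advances mid-transaction
        n  bigint;
    BEGIN
        PERFORM * FROM big_table WHERE some_indexed_col = 42;  -- canary
        SELECT reltuples::bigint INTO n
        FROM pg_class WHERE relname = 'big_table';
        INSERT INTO perf_log (row_estimate, duration_ms)
        VALUES (n, extract(epoch FROM clock_timestamp() - t0) * 1000);
    END $$;

Run it from cron during off hours and graph duration_ms against
row_estimate.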

The key is to have visibility into what's going on so your guesses
are at least informed. People will often point out that no monitoring
or trend tracking is 100% accurate, but if it allows you to predict
and plan for 90% of the future issues, you'll have that much more time
available to deal with the 10% that you don't expect.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

#7 Merlin Moncure
mmoncure@gmail.com
In reply to: Nick (#1)
Re: At what point does a big table start becoming too big?

On Wed, Aug 22, 2012 at 6:06 PM, Nick <nboutelier@gmail.com> wrote:

I have a table with 40 million rows and haven't had any performance issues yet.

Are there any rules of thumb as to when a table starts getting too big?

For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?

Well, that raises the question: ...and do what? I guess you probably
mean partitioning.

Partitioning doesn't reduce index size -- it makes total index size
*bigger* since you have to duplicate higher nodes in the index --
unless you can exploit the table structure around the partition so
that fewer fields have to be indexed.

Where partitioning helps is by speeding certain classes of bulk
operations like deleting a bunch of rows -- maybe you can set it up so
that a partition can be dropped instead for a huge efficiency win.
Partitioning also helps by breaking up administrative operations such
as vacuum, analyze, cluster, create index, reindex, etc. So I'd argue
that it's time to start thinking about plan 'b' when you find yourself
getting concerned about performance of those operations.
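
A sketch of that drop-instead-of-delete win, using the
inheritance-style partitioning of the day (names are illustrative):

    -- one child table per month of log data
    CREATE TABLE logs (logged_at timestamptz, payload text);
    CREATE TABLE logs_2012_08 (
        CHECK (logged_at >= '2012-08-01' AND logged_at < '2012-09-01')
    ) INHERITS (logs);
    -- retiring a month is near-instant metadata work:
    -- no huge DELETE, no vacuum debt
    DROP TABLE logs_2012_08;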

Partitioning aside, the way to reduce the number of rows you're
dealing with is to explore reorganizing your data: classic
normalization or use of arrays are a couple of examples of things you
can try.
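
For instance, folding one-row-per-sample data into an array column
cuts the row count substantially (a hypothetical schema):

    -- 24 hourly readings stored as one row instead of 24
    CREATE TABLE samples_daily (
        sensor_id int,
        day       date,
        vals      float8[],  -- one array element per hour
        PRIMARY KEY (sensor_id, day)
    );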

merlin

#8 Martin French
Martin.French@romaxtech.com
In reply to: Bill Moran (#6)
Re: At what point does a big table start becoming too big?

I have a table with 40 million rows and haven't had any performance issues yet.

Are there any rules of thumb as to when a table starts getting too big?

For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?

My rule here is that a table is too big when performance starts degrading beyond an acceptable level.

The challenge there is that if you wait until performance degrades beyond an acceptable level, you've allowed yourself to get into a situation where clients are upset and frustrated, and fixing the problem is difficult because there's so much data to manipulate to rearrange things.

Apologies, I could/should have phrased that better.

My rule here is that a table is too big when performance starts degrading beyond a MEASURABLE level. :)

Cheers

Martin

#9 Chris Travers
chris.travers@gmail.com
In reply to: Merlin Moncure (#7)
Re: At what point does a big table start becoming too big?

On Thu, Aug 23, 2012 at 6:46 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Wed, Aug 22, 2012 at 6:06 PM, Nick <nboutelier@gmail.com> wrote:

I have a table with 40 million rows and haven't had any performance issues yet.

Are there any rules of thumb as to when a table starts getting too big?

For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?

Well, that raises the question: ...and do what? I guess you probably
mean partitioning.

Partitioning doesn't reduce index size -- it makes total index size
*bigger* since you have to duplicate higher nodes in the index --
unless you can exploit the table structure around the partition so
that fewer fields have to be indexed.

Depending on the operation it may make the effective index size bigger
or smaller. For example if querying only one child table your
effective index size is much smaller.

However, if you are worried about that, partial indexes rock :-D
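
For example (names are illustrative), a partial index covers only the
slice of the table that a hot query actually touches:

    -- index just the small set of open orders in a big orders table
    CREATE INDEX orders_open_idx ON orders (customer_id)
        WHERE status = 'open';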

Best Wishes,
Chris Travers

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Travers (#9)
Re: At what point does a big table start becoming too big?

Chris Travers <chris.travers@gmail.com> writes:

On Thu, Aug 23, 2012 at 6:46 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

Partitioning doesn't reduce index size -- it makes total index size
*bigger* since you have to duplicate higher nodes in the index --
unless you can exploit the table structure around the partition so
that fewer fields have to be indexed.

Depending on the operation it may make the effective index size bigger
or smaller. For example if querying only one child table your
effective index size is much smaller.

I tend to think of it like this: partitioning means *manually* replacing
the first level of index search.

As such, it is almost never a win for either complexity or performance
of simple searches and updates. As Merlin said, pretty much the only
compelling reason to do it is if you can match up the partition
boundaries with bulk tasks that are common in your application, such as
dropping a month's worth of data at a time.

regards, tom lane

#11 Jeff Janes
jeff.janes@gmail.com
In reply to: Nick (#1)
Re: At what point does a big table start becoming too big?

On Wed, Aug 22, 2012 at 4:06 PM, Nick <nboutelier@gmail.com> wrote:

I have a table with 40 million rows and haven't had any performance issues yet.

Are there any rules of thumb as to when a table starts getting too big?

No. Assuming you decided it were "too big", what could you do about it?

If there are chunks of data that you don't need anymore, why wait for
the table to become too big before removing it?

And partitioning very often isn't the answer, either. There are very
few problems that ill-conceived partitioning won't make worse. And
there are very many problems which even the best-conceived
partitioning will fail to improve. If you have one of the cases where
partitioning is a good solution, don't wait for the table to become
'too big'. Just go do it.

For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?

If you need to be able to rapidly insert new rows in bulk, and you
can't sort them before inserting because there are multiple indexes
with completely different sort order and they cover the entire key
range of at least some of the indexes, then your performance will
collapse long before you get to 6x the amount of RAM. But, what can
you do about it? Maybe partitioning will fix this, maybe it won't.
If it will, why wait for a rule of thumb to be met? If it won't, what
do you actually do once the rule of thumb is met?

I guess one rule of thumb I would have is, if for some reason I had to
cluster or reindex the table, how long would it take to do so? If
that is much longer than I can reasonably schedule as a maintenance
window, I would be worried.
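
Per-index sizes give a first-order feel for that (a sketch; the table
name is a placeholder):

    -- how much data would CLUSTER / REINDEX have to rewrite?
    SELECT indexrelname,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM pg_stat_user_indexes
    WHERE relname = 'big_table';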

Cheers,

Jeff

#12 Jeff Janes
jeff.janes@gmail.com
In reply to: Bill Moran (#6)
Re: At what point does a big table start becoming too big?

On Thu, Aug 23, 2012 at 6:37 AM, Bill Moran <wmoran@potentialtech.com> wrote:

In response to "Martin French" <Martin.French@romaxtech.com>:

I have a table with 40 million rows and haven't had any performance
issues yet.

Are there any rules of thumb as to when a table starts getting too big?

For example, maybe if the index size is 6x the amount of ram, if the
table is 10% of total disk space, etc?

My rule here is that a table is too big when performance starts degrading beyond an acceptable level.

The challenge there is that if you wait until performance degrades
beyond an acceptable level, you've allowed yourself to get into a
situation where clients are upset and frustrated, and fixing the
problem is difficult because there's so much data to manipulate to
rearrange things.

Yes, I agree with that.

And the advice I have along those lines is to establish now what
constitutes unacceptable performance, and put some sort of monitoring
and tracking in place to know what your performance degradation looks
like and predict when you'll have to react. For example, an MRTG
graph that runs an experimental query once a day during off hours and
graphs the time it takes vs. the # of rows in the table will prove
a valuable tool that can sometimes predict exactly when you'll have
to change things before it becomes a problem.

This seems inconsistent with your previous advice. By the time your
experimental query shows a problem, you no longer have any maintenance
windows left large enough to fix it. Unless your experimental query
was a reindex or something non-production like that, in which case
running it on a production server, even off-hours, doesn't seem like a
good idea.

Cheers,

Jeff

#13 Bill Moran
wmoran@potentialtech.com
In reply to: Jeff Janes (#12)
Re: At what point does a big table start becoming too big?

On Thu, 23 Aug 2012 17:56:37 -0700 Jeff Janes <jeff.janes@gmail.com> wrote:

On Thu, Aug 23, 2012 at 6:37 AM, Bill Moran <wmoran@potentialtech.com> wrote:

In response to "Martin French" <Martin.French@romaxtech.com>:

I have a table with 40 million rows and haven't had any performance
issues yet.

Are there any rules of thumb as to when a table starts getting too big?

For example, maybe if the index size is 6x the amount of ram, if the
table is 10% of total disk space, etc?

My rule here is that a table is too big when performance starts degrading beyond an acceptable level.

The challenge there is that if you wait until performance degrades
beyond an acceptable level, you've allowed yourself to get into a
situation where clients are upset and frustrated, and fixing the
problem is difficult because there's so much data to manipulate to
rearrange things.

Yes, I agree with that.

And the advice I have along those lines is to establish now what
constitutes unacceptable performance, and put some sort of monitoring
and tracking in place to know what your performance degradation looks
like and predict when you'll have to react. For example, an MRTG
graph that runs an experimental query once a day during off hours and
graphs the time it takes vs. the # of rows in the table will prove
a valuable tool that can sometimes predict exactly when you'll have
to change things before it becomes a problem.

This seems inconsistent with your previous advice. By the time your
experimental query shows a problem, you no longer have any maintenance
windows left large enough to fix it. Unless your experimental query
was a reindex or something non-production like that, in which case
running it on a production server, even off-hours, doesn't seem like a
good idea.

Perhaps I didn't explain the approach sufficiently.

If you can establish something like, "This specific SELECT has to run
in under 5 minutes to meet the client's expectations" you can then
time how long that query takes each time it's run (by capturing that
information in the application, for example ... or by running it in
some automated fashion ... possibly other methods as well).

If you capture that runtime on a regular basis and put the results
on a graph in concert with other relevant data, such as the number
of rows in the related tables, size of the data, etc, you quickly
get a good picture of how fast things are growing, and frequently
you can project the line out into the future and say things like
"if we don't come up with a better way to do this by Sept of next
year, we're going to exceed our allowed run time." You can then
take that very detailed information to business planners and point
out that they need to schedule developer time _before_ then if they
don't want the application to slow down below the allowable level.

Unless you work for somewhere that has unlimited resources, your
time is always split between feature requests, day to day operations,
firefighting, etc. In my experience, keeping things like this
under control is often a matter of having enough information to
justify why your optimization project is more important than
whizbang feature x that marketing wants so bad.

Of course, if you work somewhere with unlimited resources, you
should let me know so I can send in my resume.

And none of what I'm suggesting is intended to belittle the other
suggestions either -- if you know of a way to optimize the data
better, why not do it now? If you can be purging old data, why
wait until performance is a problem to start purging, etc.

It's just another trick to have in your bag.

--
Bill Moran <wmoran@potentialtech.com>

#14 Ondrej Ivanič
ondrej.ivanic@gmail.com
In reply to: Bill Moran (#6)
Re: At what point does a big table start becoming too big?

Hi,

On 23 August 2012 23:37, Bill Moran <wmoran@potentialtech.com> wrote:

And the advice I have along those lines is to establish now what
constitutes unacceptable performance, and put some sort of monitoring
and tracking in place to know what your performance degradation looks
like and predict when you'll have to react. For example, an MRTG
graph that runs an experimental query once a day during off hours and
graphs the time it takes vs. the # of rows in the table will prove
a valuable tool that can sometimes predict exactly when you'll have
to change things before it becomes a problem. Other tricks work as
well, such as having the application send an email any time a process
takes more than 50% of the allowable maximum time.

I like to use Apdex (http://apdex.org/specs.html). You can configure
your database to log the execution time of every statement and then
calculate an Apdex score over the last N log entries (N > 10). The
Apdex score is a weighted score based on the number of datapoints
within three zones:

0..T          Satisfied zone
T..4T (= F)   Tolerating zone
4T..          Frustrated zone

You choose T (or F; then T = F/4), i.e. under normal circumstances
all queries should finish in under 20ms (T = 20ms, F = 4T = 80ms).
The Apdex score is:

score = (Satisfied count + Tolerating count / 2) / Total samples

You can compute this number, for example, every minute and plot it
using Ganglia / MRTG / ...
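
As a sketch, assuming the durations have been captured into a
hypothetical query_log table:

    -- Apdex with T = 20ms, F = 4T = 80ms over the last minute
    SELECT (sum(CASE WHEN duration_ms <= 20 THEN 1 ELSE 0 END)
          + sum(CASE WHEN duration_ms > 20
                      AND duration_ms <= 80 THEN 1 ELSE 0 END) / 2.0)
          / count(*) AS apdex_score
    FROM query_log
    WHERE logged_at > now() - interval '1 minute';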

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)