Performance impact of hundreds of partitions

Started by Leonardo Francalanci, about 16 years ago · 5 messages · pgsql-general
#1Leonardo Francalanci
m_lists@yahoo.it

Hi,

increasing shared_buffers has improved the number of inserts/second *a lot*,
so my "problem" [1] is fixed.

But now I'm worried because of the sentence (Tom Lane):

"The partitioning code isn't designed to scale beyond a few dozen partitions"

Is it mainly a planning problem or an execution time problem?

I did a (very quick) test with 3000 empty partitions, and it took 0.5 secs to do
the planning for a simple select with a very simple where condition.
With 300 partitions, planning takes about 30ms.
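For reference, a test like this can be reproduced with the inheritance-based partitioning of that era (pre-declarative partitioning); the table and partition names below are made up for illustration:

```sql
-- Parent table plus empty children, one CHECK constraint per child
-- so constraint exclusion can prune them at plan time.
CREATE TABLE events (id int, ts timestamptz, payload text);

CREATE TABLE events_p001 (
    CHECK (ts >= '2010-01-01' AND ts < '2010-01-02')
) INHERITS (events);
-- ... repeat for as many partitions as you want to test ...

SET constraint_exclusion = on;

-- With \timing enabled in psql, EXPLAIN shows essentially pure
-- planning cost, since the partitions are empty:
EXPLAIN SELECT * FROM events WHERE ts = '2010-01-01 12:00';
```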

That's fine for my case, as I don't expect more than 300 partitions; and I could
actually afford to wait 0.5 secs more if that helps with such large tables, and I
won't be doing joins.

So: would the "scaling" problem be more evident if joins were involved? Or is
there something else I didn't get?

[1]: http://archives.postgresql.org/pgsql-general/2010-04/msg00611.php

#2Rene Schickbauer
rene.schickbauer@gmail.com
In reply to: Leonardo Francalanci (#1)
Re: Performance impact of hundreds of partitions

Leonardo F wrote:

Is it mainly a planning problem or an execution time problem?

I have here a database with a table partitioned across 400 sub-tables.

I'm using a trigger-based solution with constraint exclusion. The thing
that takes the longest is planning queries. I made THAT problem mostly go
away by using cached queries (they only work within the same database
connection, but that's no problem for me).

I also tried rule-based partitioning; that indeed breaks down quickly
performance-wise.

Also, the trigger is an ON INSERT OR UPDATE trigger, and only on the main table.
If your main insert/update process knows which partition to insert into,
you can speed that up even more, while still being able to use the
automated partitioning for everything else.
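A minimal sketch of such a routing trigger, assuming the hypothetical `events` table partitioned by day (one ELSIF branch per partition in a real setup):

```sql
CREATE OR REPLACE FUNCTION events_route() RETURNS trigger AS $$
BEGIN
    IF NEW.ts >= '2010-01-01' AND NEW.ts < '2010-01-02' THEN
        INSERT INTO events_p001 VALUES (NEW.*);
    -- ELSIF ... one branch per partition ...
    ELSE
        RAISE EXCEPTION 'no partition for ts %', NEW.ts;
    END IF;
    RETURN NULL;  -- row was stored in the child; don't insert into the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_route_trg
    BEFORE INSERT ON events
    FOR EACH ROW EXECUTE PROCEDURE events_route();
```

Inserting directly into the correct child table bypasses this trigger entirely, which is the speedup Rene describes.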

LG
Rene

#3Vick Khera
vivek@khera.org
In reply to: Leonardo Francalanci (#1)
Re: Performance impact of hundreds of partitions

On Wed, Apr 21, 2010 at 6:45 AM, Leonardo F <m_lists@yahoo.it> wrote:

"The partitioning code isn't designed to scale beyond a few dozen partitions"

Is it mainly a planning problem or an execution time problem?

I'll bet that is related to the planning and constraint exclusion
parts. I have a couple of tables split into 100 partitions, and they
work extremely well. However, I was able to alter my application so that
it almost always references the correct partition directly; the only
time it does not is when it requires a full scan of all partitions. All
inserts go directly to the proper partition.

In my view, it is a big win to partition large tables such that each
partition holds no more than 5 million rows. This keeps the indexes
small, and the query engine can quite easily skip huge hunks of them
on many queries. Also, reindexes can be done pretty quickly and, in my
case, without seriously disrupting the application -- each partition
reindexes in 5 to 10 seconds. When this was all one table, a reindex op
would lock up the application for upwards of two hours.

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Vick Khera (#3)
Re: Performance impact of hundreds of partitions

Vick Khera wrote:

On Wed, Apr 21, 2010 at 6:45 AM, Leonardo F <m_lists@yahoo.it> wrote:

"The partitioning code isn't designed to scale beyond a few dozen partitions"

Is it mainly a planning problem or an execution time problem?

I'll bet that is related to the planning and constraint exclusion
parts. I have a couple of tables split into 100 partitions, and they
work extremely well.

Keep in mind that 100 is only 8 dozen ...

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#5Leonardo Francalanci
m_lists@yahoo.it
In reply to: Rene Schickbauer (#2)
Re: Performance impact of hundreds of partitions

The thing that takes the longest is planning queries. I made THAT problem mostly
go away by using cached queries (they only work within the same database
connection, but that's no problem for me).

What do you mean by "cached queries"? Prepared statements?
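If "cached queries" does mean prepared statements, a sketch of how they sidestep repeated planning (reusing the hypothetical `events` table; the plan is built once per connection and reused on each EXECUTE):

```sql
PREPARE get_events (timestamptz) AS
    SELECT * FROM events WHERE ts = $1;

EXECUTE get_events('2010-01-01 12:00');
```

One caveat worth noting: a plan prepared without knowing the parameter value may not benefit from constraint exclusion the way a plan for a literal value would, so this trades planning time against pruning quality.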