diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index 13a1001761..45f369806a 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -2833,8 +2833,9 @@ VALUES ('Albany', NULL, NULL, 'NY'); - These deficiencies will probably be fixed in some future release, - but in the meantime considerable care is needed in deciding whether + Some functionality not implemented for inheritance hierarchies is + implemented for declarative partitioning. + Considerable care is needed in deciding whether partitioning with legacy inheritance is useful for your application. @@ -3927,6 +3928,83 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01'; + + + Declarative Partitioning Best Practices + + + The choice of how to partition a table should be made carefully as the + performance of query planning and execution can be negatively affected by + poorly made design decisions. + + + + One of the most critical design decisions will be the column or columns + by which you partition your data. Often the best choice will be to + partition by the column or set of columns which most commonly appear in + WHERE clauses of queries being executed on the + partitioned table. WHERE clause items that match and + are compatible with the partition key can be used to prune away unneeded + partitions. Removal of unwanted data is also a factor to consider when + planning your partitioning strategy as an entire partition can be removed + fairly quickly. However, if data that you want to keep exists in that + partition then that means having to resort to using + DELETE instead of removing the partition. + + + + Choosing the target number of partitions by which the table should be + divided into is also a critical decision to make. Not having enough + partitions may mean that indexes remain too large and that data locality + remains poor which could result in poor cache hit ratios. However, + dividing the table into too many partitions can also cause issues. + Too many partitions can mean slower query planning times and higher memory + consumption during both query planning and execution. It's also important + to consider what changes may occur in the future when choosing how to + partition your table. For example, if you choose to have one partition + per customer and you currently have a small number of large customers, + what will the implications be if in several years you obtain a large + number of small customers. In this case, it may be better to choose to + partition by RANGE and choose a reasonable number of + partitions rather than trying to partition by LIST and + hoping that the number of customers does not increase significantly over + time. + + + + Sub-partitioning can be useful to further divide partitions that are + expected to become larger than other partitions, although excessive + sub-partitioning can easily lead to large numbers of partitions and can + cause the problems mentioned in the preceding paragraph. + + + + It is also important to consider the overhead of partitioning during + query planning and execution. The query planner is generally able to + handle partition hierarchies up a few hundred partitions. With larger + numbers of partitions planning times become slower and memory consumption + becomes higher. This is particularly true for the + UPDATE and DELETE commands. It also + may be undesirable to have a large number of partitions as each partition + requires metadata about the partition to be stored in each session that + touches it. If each session touches a large number of partitions over a + period of time then the memory consumption for this may become + significant. + + + + With data warehouse type workloads it can make sense to use a larger + number of partitions than with an OLTP type workload. Generally, in data + warehouses, query planning time is less of a concern as the majority of + processing time is spent during query execution. With either of these two + types of workload it is important to make the right decisions early as + re-partitioning large quantities of data can be painfully slow. + Simulations of the intended workload are often beneficial for optimizing + the partitioning strategy. Never assume that more partitions are better + than fewer partitions and vice-versa. + + +