diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index cce1618fc1..be2ca3be48 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -3450,8 +3450,9 @@ VALUES ('Albany', NULL, NULL, 'NY'); - These deficiencies will probably be fixed in some future release, - but in the meantime considerable care is needed in deciding whether + Some functionality not implemented for inheritance hierarchies is + implemented for declarative partitioning. + Considerable care is needed in deciding whether partitioning with legacy inheritance is useful for your application. @@ -4674,6 +4675,88 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01'; + + + Declarative Partitioning Best Practices + + + The choice of how to partition a table should be made carefully as the + performance of query planning and execution can be negatively affected by + poorly made design decisions. + + + + One of the most critical design decisions will be the column or columns + by which you partition your data. Often the best choice will be to + partition by the column or set of columns which most commonly appear in + WHERE clauses of queries being executed on the + partitioned table. WHERE clause items that match and + are compatible with the partition key can be used to prune away unneeded + partitions. However, you may be forced into making other decisions by + requirements for the PRIMARY KEY or a + UNIQUE constraint. Removal of unwanted data is also + a factor to consider when planning your partitioning strategy as an entire + partition can be removed fairly quickly. However, if data that you want + to keep exists in that partition then that means having to resort to using + DELETE instead of removing the partition. + + + + Choosing the target number of partitions by which the table should be + divided into is also a critical decision to make. Not having enough + partitions may mean that indexes remain too large and that data locality + remains poor which could result in poor cache hit ratios. However, + dividing the table into too many partitions can also cause issues. + Too many partitions can mean slower query planning times and higher memory + consumption during both query planning and execution. It's also important + to consider what changes may occur in the future when choosing how to + partition your table. For example, if you choose to have one partition + per customer and you currently have a small number of large customers, + what will the implications be if in several years you obtain a large + number of small customers. In this case, it may be better to choose to + partition by HASH and choose a reasonable number of + partitions rather than trying to partition by LIST and + hoping that the number of customers does not increase significantly over + time. + + + + Sub-partitioning can be useful to further divide partitions that are + expected to become larger than other partitions, although excessive + sub-partitioning can easily lead to large numbers of partitions and can + cause the problems mentioned in the preceding paragraph. + + + + It is also important to consider the overhead of partitioning during + query planning and execution. The query planner is generally able to + handle partition hierarchies up a few thousand partitions fairly well, + provided that typical queries prune all but a small number of partitions + during query planning. Planning times become slower and memory + consumption becomes higher when more partitions remain after the planner + performs partition pruning. This is particularly true for the + UPDATE and DELETE commands. Also, + even if most queries are able to prune a large number of partitions during + query planning, it still may be undesirable to have a large number of + partitions as each partition requires metadata about the partition to be + stored in each session that touches it. If each session touches a large + number of partitions over a period of time then the memory consumption for + this may become significant. + + + + With data warehouse type workloads it can make sense to use a larger + number of partitions than with an OLTP type workload. Generally, in data + warehouses, query planning time is less of a concern as the majority of + processing time is spent during query execution. With either of these two + types of workload it is important to make the right decisions early as + re-partitioning large quantities of data can be painfully slow. + Simulations of the intended workload are often beneficial for optimizing + the partitioning strategy. Never assume that more partitions are better + than fewer partitions and vice-versa. + + +