[POC] hash partitioning
Hi all,
Now we have a declarative partitioning, but hash partitioning is not
implemented yet. Attached is a POC patch to add the hash partitioning
feature. I know we will need more discussions about the syntax and other
specifications before going ahead the project, but I think this runnable
code might help to discuss what and how we implement this.
* Description
In this patch, the hash partitioning implementation is basically based
on the list partitioning mechanism. However, partition bounds cannot be
specified explicitly, but this is used internally as hash partition
index, which is calculated when a partition is created or attached.
The tentative syntax to create a partitioned table is as bellow;
CREATE TABLE h (i int) PARTITION BY HASH(i) PARTITIONS 3 USING hashint4;
The number of partitions is specified by PARTITIONS, which is currently
constant and cannot be changed, but I think this is needed to be changed in
some manner. A hash function is specified by USING. Maybe, specifying hash
function may be ommitted, and in this case, a default hash function
corresponding to key type will be used.
A partition table can be create as bellow;
CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;
FOR VALUES clause cannot be used, and the partition bound is
calclulated automatically as partition index of single integer value.
When trying create partitions more than the number specified
by PARTITIONS, it gets an error.
postgres=# create table h4 partition of h;
ERROR: cannot create hash partition more than 3 for h
An inserted record is stored in a partition whose index equals
abs(hashfunc(key)) % <number_of_partitions>. In the above
example, this is abs(hashint4(i))%3.
postgres=# insert into h (select generate_series(0,20));
INSERT 0 21
postgres=# select *,tableoid::regclass from h;
i | tableoid
----+----------
0 | h1
1 | h1
2 | h1
4 | h1
8 | h1
10 | h1
11 | h1
14 | h1
15 | h1
17 | h1
20 | h1
5 | h2
12 | h2
13 | h2
16 | h2
19 | h2
3 | h3
6 | h3
7 | h3
9 | h3
18 | h3
(21 rows)
* Todo / discussions
In this patch, we cannot change the number of partitions specified
by PARTITIONS. I we can change this, the partitioning rule
(<partition index> = abs(hashfunc(key)) % <number_of_partitions>)
is also changed and then we need reallocatiing records between
partitions.
In this patch, user can specify a hash function USING. However,
we migth need default hash functions which are useful and
proper for hash partitioning.
Currently, even when we issue SELECT query with a condition,
postgres looks into all partitions regardless of each partition's
constraint, because this is complicated such like "abs(hashint4(i))%3 = 0".
postgres=# explain select * from h where i = 10;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..125.62 rows=40 width=4)
-> Seq Scan on h (cost=0.00..0.00 rows=1 width=4)
Filter: (i = 10)
-> Seq Scan on h1 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
-> Seq Scan on h2 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
-> Seq Scan on h3 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
(9 rows)
However, if we modify a condition into a same expression
as the partitions constraint, postgres can exclude unrelated
table from search targets. So, we might avoid the problem
by converting the qual properly before calling predicate_refuted_by().
postgres=# explain select * from h where abs(hashint4(i))%3 = abs(hashint4(10))%3;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..61.00 rows=14 width=4)
-> Seq Scan on h (cost=0.00..0.00 rows=1 width=4)
Filter: ((abs(hashint4(i)) % 3) = 2)
-> Seq Scan on h3 (cost=0.00..61.00 rows=13 width=4)
Filter: ((abs(hashint4(i)) % 3) = 2)
(5 rows)
Best regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
hash_partition.patchtext/x-diff; name=hash_partition.patchDownload+307-19
Hi, Yugo.
Looks like a great feature! I'm going to take a closer look on your code
and write a feedback shortly. For now I can only tell that you forgot
to include some documentation in the patch.
I've added a corresponding entry to current commitfest [1]https://commitfest.postgresql.org/13/1059/. Hope you
don't mind. If it's not too much trouble could you please register on a
commitfest site and add yourself to this entry as an author? I'm pretty
sure someone is using this information for writing release notes or
something like this.
[1]: https://commitfest.postgresql.org/13/1059/
On Tue, Feb 28, 2017 at 11:33:13PM +0900, Yugo Nagata wrote:
Hi all,
Now we have a declarative partitioning, but hash partitioning is not
implemented yet. Attached is a POC patch to add the hash partitioning
feature. I know we will need more discussions about the syntax and other
specifications before going ahead the project, but I think this runnable
code might help to discuss what and how we implement this.* Description
In this patch, the hash partitioning implementation is basically based
on the list partitioning mechanism. However, partition bounds cannot be
specified explicitly, but this is used internally as hash partition
index, which is calculated when a partition is created or attached.The tentative syntax to create a partitioned table is as bellow;
CREATE TABLE h (i int) PARTITION BY HASH(i) PARTITIONS 3 USING hashint4;
The number of partitions is specified by PARTITIONS, which is currently
constant and cannot be changed, but I think this is needed to be changed in
some manner. A hash function is specified by USING. Maybe, specifying hash
function may be ommitted, and in this case, a default hash function
corresponding to key type will be used.A partition table can be create as bellow;
CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;FOR VALUES clause cannot be used, and the partition bound is
calclulated automatically as partition index of single integer value.When trying create partitions more than the number specified
by PARTITIONS, it gets an error.postgres=# create table h4 partition of h;
ERROR: cannot create hash partition more than 3 for hAn inserted record is stored in a partition whose index equals
abs(hashfunc(key)) % <number_of_partitions>. In the above
example, this is abs(hashint4(i))%3.postgres=# insert into h (select generate_series(0,20));
INSERT 0 21postgres=# select *,tableoid::regclass from h;
i | tableoid
----+----------
0 | h1
1 | h1
2 | h1
4 | h1
8 | h1
10 | h1
11 | h1
14 | h1
15 | h1
17 | h1
20 | h1
5 | h2
12 | h2
13 | h2
16 | h2
19 | h2
3 | h3
6 | h3
7 | h3
9 | h3
18 | h3
(21 rows)* Todo / discussions
In this patch, we cannot change the number of partitions specified
by PARTITIONS. I we can change this, the partitioning rule
(<partition index> = abs(hashfunc(key)) % <number_of_partitions>)
is also changed and then we need reallocatiing records between
partitions.In this patch, user can specify a hash function USING. However,
we migth need default hash functions which are useful and
proper for hash partitioning.Currently, even when we issue SELECT query with a condition,
postgres looks into all partitions regardless of each partition's
constraint, because this is complicated such like "abs(hashint4(i))%3 = 0".postgres=# explain select * from h where i = 10;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..125.62 rows=40 width=4)
-> Seq Scan on h (cost=0.00..0.00 rows=1 width=4)
Filter: (i = 10)
-> Seq Scan on h1 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
-> Seq Scan on h2 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
-> Seq Scan on h3 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
(9 rows)However, if we modify a condition into a same expression
as the partitions constraint, postgres can exclude unrelated
table from search targets. So, we might avoid the problem
by converting the qual properly before calling predicate_refuted_by().postgres=# explain select * from h where abs(hashint4(i))%3 = abs(hashint4(10))%3;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..61.00 rows=14 width=4)
-> Seq Scan on h (cost=0.00..0.00 rows=1 width=4)
Filter: ((abs(hashint4(i)) % 3) = 2)
-> Seq Scan on h3 (cost=0.00..61.00 rows=13 width=4)
Filter: ((abs(hashint4(i)) % 3) = 2)
(5 rows)Best regards,
Yugo Nagata--
Yugo Nagata <nagata@sraoss.co.jp>
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 41c0056..3820920 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -3074,7 +3074,7 @@ StorePartitionKey(Relation rel, AttrNumber *partattrs, List *partexprs, Oid *partopclass, - Oid *partcollation) + Oid *partcollation, int16 partnparts, Oid hashfunc) { int i; int2vector *partattrs_vec; @@ -3121,6 +3121,8 @@ StorePartitionKey(Relation rel, values[Anum_pg_partitioned_table_partrelid - 1] = ObjectIdGetDatum(RelationGetRelid(rel)); values[Anum_pg_partitioned_table_partstrat - 1] = CharGetDatum(strategy); values[Anum_pg_partitioned_table_partnatts - 1] = Int16GetDatum(partnatts); + values[Anum_pg_partitioned_table_partnparts - 1] = Int16GetDatum(partnparts); + values[Anum_pg_partitioned_table_parthashfunc - 1] = ObjectIdGetDatum(hashfunc); values[Anum_pg_partitioned_table_partattrs - 1] = PointerGetDatum(partattrs_vec); values[Anum_pg_partitioned_table_partclass - 1] = PointerGetDatum(partopclass_vec); values[Anum_pg_partitioned_table_partcollation - 1] = PointerGetDatum(partcollation_vec); diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 4bcef58..24e69c6 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -36,6 +36,8 @@ #include "optimizer/clauses.h" #include "optimizer/planmain.h" #include "optimizer/var.h" +#include "parser/parse_func.h" +#include "parser/parse_oper.h" #include "rewrite/rewriteManip.h" #include "storage/lmgr.h" #include "utils/array.h" @@ -120,6 +122,7 @@ static int32 qsort_partition_rbound_cmp(const void *a, const void *b,static List *get_qual_for_list(PartitionKey key, PartitionBoundSpec *spec); static List *get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec); +static List *get_qual_for_hash(PartitionKey key, PartitionBoundSpec *spec); static Oid get_partition_operator(PartitionKey key, int col, StrategyNumber strategy, bool *need_relabel); static List *generate_partition_qual(Relation rel); @@ -236,7 +239,8 @@ RelationBuildPartitionDesc(Relation rel) oids[i++] = lfirst_oid(cell);/* Convert from node to the internal representation */ - if (key->strategy == PARTITION_STRATEGY_LIST) + if (key->strategy == PARTITION_STRATEGY_LIST || + key->strategy == PARTITION_STRATEGY_HASH) { List *non_null_values = NIL;@@ -251,7 +255,7 @@ RelationBuildPartitionDesc(Relation rel)
ListCell *c;
PartitionBoundSpec *spec = lfirst(cell);- if (spec->strategy != PARTITION_STRATEGY_LIST) + if (spec->strategy != key->strategy) elog(ERROR, "invalid strategy in partition bound spec");foreach(c, spec->listdatums)
@@ -464,6 +468,7 @@ RelationBuildPartitionDesc(Relation rel)
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_HASH:
{
boundinfo->has_null = found_null;
boundinfo->indexes = (int *) palloc(ndatums * sizeof(int));
@@ -829,6 +834,18 @@ check_new_partition_bound(char *relname, Relation parent, Node *bound)
break;
}+ case PARTITION_STRATEGY_HASH: + { + Assert(spec->strategy == PARTITION_STRATEGY_HASH); + + if (partdesc->nparts + 1 > key->partnparts) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("cannot create hash partition more than %d for %s", + key->partnparts, RelationGetRelationName(parent)))); + break; + } + default: elog(ERROR, "unexpected partition strategy: %d", (int) key->strategy); @@ -916,6 +933,11 @@ get_qual_from_partbound(Relation rel, Relation parent, Node *bound) my_qual = get_qual_for_range(key, spec); break;+ case PARTITION_STRATEGY_HASH: + Assert(spec->strategy == PARTITION_STRATEGY_LIST); + my_qual = get_qual_for_hash(key, spec); + break; + default: elog(ERROR, "unexpected partition strategy: %d", (int) key->strategy); @@ -1146,6 +1168,84 @@ RelationGetPartitionDispatchInfo(Relation rel, int lockmode, return pd; }+/* + * convert_expr_for_hash + * + * Converts a expr for a hash partition's constraint. + * expr is converted into 'abs(hashfunc(expr)) % npart". + * + * npart: number of partitions + * hashfunc: OID of hash function + */ +Expr * +convert_expr_for_hash(Expr *expr, int npart, Oid hashfunc) +{ + FuncExpr *func, + *abs; + Expr *modexpr; + Oid modoid; + Oid int4oid[1] = {INT4OID}; + + ParseState *pstate = make_parsestate(NULL); + Value *val_npart = makeInteger(npart); + Node *const_npart = (Node *) make_const(pstate, val_npart, -1); + + /* hash function */ + func = makeFuncExpr(hashfunc, + INT4OID, + list_make1(expr), + 0, + 0, + COERCE_EXPLICIT_CALL); + + /* Abs */ + abs = makeFuncExpr(LookupFuncName(list_make1(makeString("abs")), 1, int4oid, false), + INT4OID, + list_make1(func), + 0, + 0, + COERCE_EXPLICIT_CALL); + + /* modulo by npart */ + modoid = LookupOperName(pstate, list_make1(makeString("%")), INT4OID, INT4OID, false, -1); + modexpr = make_opclause(modoid, INT4OID, false, (Expr*)abs, (Expr*)const_npart, 0, 0); + + return modexpr; +} + + +/* + * get_next_hash_partition_index + * + * Returns the minimal index which is not used for hash partition. + */ +int +get_next_hash_partition_index(Relation parent) +{ + PartitionKey key = RelationGetPartitionKey(parent); + PartitionDesc partdesc = RelationGetPartitionDesc(parent); + + int i; + bool *used = palloc0(sizeof(int) * key->partnparts); + + /* mark used for existing partition indexs */ + for (i = 0; i < partdesc->boundinfo->ndatums; i++) + { + Datum* datum = partdesc->boundinfo->datums[i]; + int idx = DatumGetInt16(datum[0]); + + if (!used[idx]) + used[idx] = true; + } + + /* find the minimal unused index */ + for (i = 0; i < key->partnparts; i++) + if (!used[i]) + break; + + return i; +} + /* Module-local functions *//*
@@ -1467,6 +1567,43 @@ get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec)
}/* + * get_qual_for_hash + * + * Returns a list of expressions to use as a hash partition's constraint. + */ +static List * +get_qual_for_hash(PartitionKey key, PartitionBoundSpec *spec) +{ + List *result; + Expr *keyCol; + Expr *expr; + Expr *opexpr; + Oid operoid; + ParseState *pstate = make_parsestate(NULL); + + /* Left operand */ + if (key->partattrs[0] != 0) + keyCol = (Expr *) makeVar(1, + key->partattrs[0], + key->parttypid[0], + key->parttypmod[0], + key->parttypcoll[0], + 0); + else + keyCol = (Expr *) copyObject(linitial(key->partexprs)); + + expr = convert_expr_for_hash(keyCol, key->partnparts, key->parthashfunc); + + /* equals the listdaums value */ + operoid = LookupOperName(pstate, list_make1(makeString("=")), INT4OID, INT4OID, false, -1); + opexpr = make_opclause(operoid, BOOLOID, false, expr, linitial(spec->listdatums), 0, 0); + + result = list_make1(opexpr); + + return result; +} + +/* * get_partition_operator * * Return oid of the operator of given strategy for a given partition key @@ -1730,6 +1867,11 @@ get_partition_for_tuple(PartitionDispatch *pd, (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), errmsg("range partition key of row contains null"))); } + else if (key->strategy == PARTITION_STRATEGY_HASH) + { + values[0] = OidFunctionCall1(key->parthashfunc, values[0]); + values[0] = Int16GetDatum(Abs(DatumGetInt16(values[0])) % key->partnparts); + }if (partdesc->boundinfo->has_null && isnull[0]) /* Tuple maps to the null-accepting list partition */ @@ -1744,6 +1886,7 @@ get_partition_for_tuple(PartitionDispatch *pd, switch (key->strategy) { case PARTITION_STRATEGY_LIST: + case PARTITION_STRATEGY_HASH: if (cur_offset >= 0 && equal) cur_index = partdesc->boundinfo->indexes[cur_offset]; else @@ -1968,6 +2111,7 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, switch (key->strategy) { case PARTITION_STRATEGY_LIST: + case PARTITION_STRATEGY_HASH: cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0], key->partcollation[0], bound_datums[0], diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 3cea220..5a28cc0 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -41,6 +41,7 @@ #include "catalog/pg_inherits_fn.h" #include "catalog/pg_namespace.h" #include "catalog/pg_opclass.h" +#include "catalog/pg_proc.h" #include "catalog/pg_tablespace.h" #include "catalog/pg_trigger.h" #include "catalog/pg_type.h" @@ -77,6 +78,7 @@ #include "parser/parse_oper.h" #include "parser/parse_relation.h" #include "parser/parse_type.h" +#include "parser/parse_func.h" #include "parser/parse_utilcmd.h" #include "parser/parser.h" #include "pgstat.h" @@ -450,7 +452,7 @@ static void RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); static bool is_partition_attr(Relation rel, AttrNumber attnum, bool *used_in_expr); static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy); -static void ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, +static void ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, Oid *partatttypes, List **partexprs, Oid *partopclass, Oid *partcollation); static void CreateInheritance(Relation child_rel, Relation parent_rel); static void RemoveInheritance(Relation child_rel, Relation parent_rel); @@ -799,8 +801,10 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, AttrNumber partattrs[PARTITION_MAX_KEYS]; Oid partopclass[PARTITION_MAX_KEYS]; Oid partcollation[PARTITION_MAX_KEYS]; + Oid partatttypes[PARTITION_MAX_KEYS]; List *partexprs = NIL; List *cmds = NIL; + Oid hashfuncOid = InvalidOid;/* * We need to transform the raw parsetrees corresponding to partition @@ -811,15 +815,40 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, stmt->partspec = transformPartitionSpec(rel, stmt->partspec, &strategy); ComputePartitionAttrs(rel, stmt->partspec->partParams, - partattrs, &partexprs, partopclass, + partattrs, partatttypes, &partexprs, partopclass, partcollation);partnatts = list_length(stmt->partspec->partParams); + + if (strategy == PARTITION_STRATEGY_HASH) + { + Oid funcrettype; + + if (partnatts != 1) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of partition key must be 1 for hash partition"))); + + hashfuncOid = LookupFuncName(stmt->partspec->hashfunc, 1, partatttypes, false); + funcrettype = get_func_rettype(hashfuncOid); + if (funcrettype != INT4OID) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("hash function for partitioning must return integer"))); + + if (func_volatile(hashfuncOid) != PROVOLATILE_IMMUTABLE) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("hash function for partitioning must be marked IMMUTABLE"))); + + } + StorePartitionKey(rel, strategy, partnatts, partattrs, partexprs, - partopclass, partcollation); + partopclass, partcollation, stmt->partspec->partnparts, hashfuncOid);- /* Force key columns to be NOT NULL when using range partitioning */ - if (strategy == PARTITION_STRATEGY_RANGE) + /* Force key columns to be NOT NULL when using range or hash partitioning */ + if (strategy == PARTITION_STRATEGY_RANGE || + strategy == PARTITION_STRATEGY_HASH) { for (i = 0; i < partnatts; i++) { @@ -12783,18 +12812,51 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) newspec->strategy = partspec->strategy; newspec->location = partspec->location; newspec->partParams = NIL; + newspec->partnparts = partspec->partnparts; + newspec->hashfunc = partspec->hashfunc;/* Parse partitioning strategy name */ if (!pg_strcasecmp(partspec->strategy, "list")) *strategy = PARTITION_STRATEGY_LIST; else if (!pg_strcasecmp(partspec->strategy, "range")) *strategy = PARTITION_STRATEGY_RANGE; + else if (!pg_strcasecmp(partspec->strategy, "hash")) + *strategy = PARTITION_STRATEGY_HASH; else ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("unrecognized partitioning strategy \"%s\"", partspec->strategy)));+ if (*strategy == PARTITION_STRATEGY_HASH) + { + if (partspec->partnparts < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of partitions must be specified for hash partition"))); + else if (partspec->partnparts == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of partitions must be greater than 0"))); + + if (list_length(partspec->hashfunc) == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("hash function must be specified for hash partition"))); + } + else + { + if (partspec->partnparts >= 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of partitions can be specified only for hash partition"))); + + if (list_length(partspec->hashfunc) > 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("hash function can be specified only for hash partition"))); + } + /* * Create a dummy ParseState and insert the target relation as its sole * rangetable entry. We need a ParseState for transformExpr. @@ -12843,7 +12905,7 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) * Compute per-partition-column information from a list of PartitionElem's */ static void -ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, +ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, Oid *partatttypes, List **partexprs, Oid *partopclass, Oid *partcollation) { int attn; @@ -13010,6 +13072,7 @@ ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, "btree", BTREE_AM_OID);+ partatttypes[attn] = atttype; attn++; } } diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 05d8538..f4febc9 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -4232,6 +4232,8 @@ _copyPartitionSpec(const PartitionSpec *from)COPY_STRING_FIELD(strategy);
COPY_NODE_FIELD(partParams);
+ COPY_SCALAR_FIELD(partnparts);
+ COPY_NODE_FIELD(hashfunc);
COPY_LOCATION_FIELD(location);return newnode; diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index d595cd7..d589eac 100644 --- a/src/backend/nodes/equalfuncs.c +++ b/src/backend/nodes/equalfuncs.c @@ -2725,6 +2725,8 @@ _equalPartitionSpec(const PartitionSpec *a, const PartitionSpec *b) { COMPARE_STRING_FIELD(strategy); COMPARE_NODE_FIELD(partParams); + COMPARE_SCALAR_FIELD(partnparts); + COMPARE_NODE_FIELD(hashfunc); COMPARE_LOCATION_FIELD(location);return true; diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index b3802b4..d6db80e 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -3318,6 +3318,8 @@ _outPartitionSpec(StringInfo str, const PartitionSpec *node)WRITE_STRING_FIELD(strategy);
WRITE_NODE_FIELD(partParams);
+ WRITE_INT_FIELD(partnparts);
+ WRITE_NODE_FIELD(hashfunc);
WRITE_LOCATION_FIELD(location);
}diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index e833b2e..b67140d 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -574,6 +574,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type <list> partbound_datum_list %type <partrange_datum> PartitionRangeDatum %type <list> range_datum_list +%type <ival> hash_partitions +%type <list> hash_function/*
* Non-keyword token types. These are hard-wired into the "flex" lexer.
@@ -627,7 +629,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);GLOBAL GRANT GRANTED GREATEST GROUP_P GROUPING
- HANDLER HAVING HEADER_P HOLD HOUR_P + HANDLER HASH HAVING HEADER_P HOLD HOUR_PIDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
@@ -651,7 +653,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
OBJECT_P OF OFF OFFSET OIDS OLD ON ONLY OPERATOR OPTION OPTIONS OR
ORDER ORDINALITY OUT_P OUTER_P OVER OVERLAPS OVERLAY OWNED OWNER- PARALLEL PARSER PARTIAL PARTITION PASSING PASSWORD PLACING PLANS POLICY + PARALLEL PARSER PARTIAL PARTITION PARTITIONS PASSING PASSWORD PLACING PLANS POLICY POSITION PRECEDING PRECISION PRESERVE PREPARE PREPARED PRIMARY PRIOR PRIVILEGES PROCEDURAL PROCEDURE PROGRAM PUBLICATION@@ -2587,6 +2589,16 @@ ForValues:
$$ = (Node *) n; } + + /* a HASH partition */ + | /*EMPTY*/ + { + PartitionBoundSpec *n = makeNode(PartitionBoundSpec); + + n->strategy = PARTITION_STRATEGY_HASH; + + $$ = (Node *) n; + } ;partbound_datum:
@@ -3666,7 +3678,7 @@ OptPartitionSpec: PartitionSpec { $$ = $1; }
| /*EMPTY*/ { $$ = NULL; }
;-PartitionSpec: PARTITION BY part_strategy '(' part_params ')' +PartitionSpec: PARTITION BY part_strategy '(' part_params ')' hash_partitions hash_function { PartitionSpec *n = makeNode(PartitionSpec);@@ -3674,10 +3686,21 @@ PartitionSpec: PARTITION BY part_strategy '(' part_params ')'
n->partParams = $5;
n->location = @1;+ n->partnparts = $7; + n->hashfunc = $8; + $$ = n; } ;+hash_partitions: PARTITIONS Iconst { $$ = $2; } + | /*EMPTY*/ { $$ = -1; } + ; + +hash_function: USING handler_name { $$ = $2; } + | /*EMPTY*/ { $$ = NULL; } + ; + part_strategy: IDENT { $$ = $1; } | unreserved_keyword { $$ = pstrdup($1); } ; @@ -14377,6 +14400,7 @@ unreserved_keyword: | GLOBAL | GRANTED | HANDLER + | HASH | HEADER_P | HOLD | HOUR_P @@ -14448,6 +14472,7 @@ unreserved_keyword: | PARSER | PARTIAL | PARTITION + | PARTITIONS | PASSING | PASSWORD | PLANS diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c index ff2bab6..8e1be31 100644 --- a/src/backend/parser/parse_utilcmd.c +++ b/src/backend/parser/parse_utilcmd.c @@ -40,6 +40,7 @@ #include "catalog/pg_opclass.h" #include "catalog/pg_operator.h" #include "catalog/pg_type.h" +#include "catalog/partition.h" #include "commands/comment.h" #include "commands/defrem.h" #include "commands/tablecmds.h" @@ -3252,6 +3253,24 @@ transformPartitionBound(ParseState *pstate, Relation parent, Node *bound) ++i; } } + else if (strategy == PARTITION_STRATEGY_HASH) + { + Value *conval; + Node *value; + int index; + + if (spec->strategy != PARTITION_STRATEGY_HASH) + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("invalid bound specification for a hash partition"))); + + index = get_next_hash_partition_index(parent); + + /* store the partition index as a listdatums value */ + conval = makeInteger(index); + value = (Node *) make_const(pstate, conval, -1); + result_spec->listdatums = list_make1(value); + } else elog(ERROR, "unexpected partition strategy: %d", (int) strategy);diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index b27b77d..fab6eea 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -1423,7 +1423,7 @@ pg_get_indexdef_worker(Oid indexrelid, int colno, * * Returns the partition key specification, ie, the following: * - * PARTITION BY { RANGE | LIST } (column opt_collation opt_opclass [, ...]) + * PARTITION BY { RANGE | LIST | HASH } (column opt_collation opt_opclass [, ...]) */ Datum pg_get_partkeydef(PG_FUNCTION_ARGS) @@ -1513,6 +1513,9 @@ pg_get_partkeydef_worker(Oid relid, int prettyFlags) case PARTITION_STRATEGY_RANGE: appendStringInfo(&buf, "RANGE"); break; + case PARTITION_STRATEGY_HASH: + appendStringInfo(&buf, "HASH"); + break; default: elog(ERROR, "unexpected partition strategy: %d", (int) form->partstrat); @@ -8520,6 +8523,9 @@ get_rule_expr(Node *node, deparse_context *context, appendStringInfoString(buf, ")"); break;+ case PARTITION_STRATEGY_HASH: + break; + default: elog(ERROR, "unrecognized partition strategy: %d", (int) spec->strategy); diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index 9001e20..829e4d2 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -855,6 +855,9 @@ RelationBuildPartitionKey(Relation relation) key->strategy = form->partstrat; key->partnatts = form->partnatts;+ key->partnparts = form->partnparts; + key->parthashfunc = form->parthashfunc; + /* * We can rely on the first variable-length attribute being mapped to the * relevant field of the catalog's C struct, because all previous @@ -999,6 +1002,9 @@ copy_partition_key(PartitionKey fromkey) newkey->strategy = fromkey->strategy; newkey->partnatts = n = fromkey->partnatts;+ newkey->partnparts = fromkey->partnparts; + newkey->parthashfunc = fromkey->parthashfunc; + newkey->partattrs = (AttrNumber *) palloc(n * sizeof(AttrNumber)); memcpy(newkey->partattrs, fromkey->partattrs, n * sizeof(AttrNumber));diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h index 1187797..367e2f8 100644 --- a/src/include/catalog/heap.h +++ b/src/include/catalog/heap.h @@ -141,7 +141,7 @@ extern void StorePartitionKey(Relation rel, AttrNumber *partattrs, List *partexprs, Oid *partopclass, - Oid *partcollation); + Oid *partcollation, int16 partnparts, Oid hashfunc); extern void RemovePartitionKeyByRelId(Oid relid); extern void StorePartitionBound(Relation rel, Relation parent, Node *bound);diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h index b195d1a..80f4b0e 100644 --- a/src/include/catalog/partition.h +++ b/src/include/catalog/partition.h @@ -89,4 +89,6 @@ extern int get_partition_for_tuple(PartitionDispatch *pd, TupleTableSlot *slot, EState *estate, Oid *failed_at); +extern Expr *convert_expr_for_hash(Expr *expr, int npart, Oid hashfunc); +extern int get_next_hash_partition_index(Relation parent); #endif /* PARTITION_H */ diff --git a/src/include/catalog/pg_partitioned_table.h b/src/include/catalog/pg_partitioned_table.h index bdff36a..69e509c 100644 --- a/src/include/catalog/pg_partitioned_table.h +++ b/src/include/catalog/pg_partitioned_table.h @@ -33,6 +33,9 @@ CATALOG(pg_partitioned_table,3350) BKI_WITHOUT_OIDS char partstrat; /* partitioning strategy */ int16 partnatts; /* number of partition key columns */+ int16 partnparts;
+ Oid parthashfunc;
+
/*
* variable-length fields start here, but we allow direct access to
* partattrs via the C struct. That's because the first variable-length
@@ -49,6 +52,8 @@ CATALOG(pg_partitioned_table,3350) BKI_WITHOUT_OIDS
pg_node_tree partexprs; /* list of expressions in the partition key;
* one item for each zero entry in partattrs[] */
#endif
+
+
} FormData_pg_partitioned_table;/* ---------------- @@ -62,13 +67,15 @@ typedef FormData_pg_partitioned_table *Form_pg_partitioned_table; * compiler constants for pg_partitioned_table * ---------------- */ -#define Natts_pg_partitioned_table 7 +#define Natts_pg_partitioned_table 9 #define Anum_pg_partitioned_table_partrelid 1 #define Anum_pg_partitioned_table_partstrat 2 #define Anum_pg_partitioned_table_partnatts 3 -#define Anum_pg_partitioned_table_partattrs 4 -#define Anum_pg_partitioned_table_partclass 5 -#define Anum_pg_partitioned_table_partcollation 6 -#define Anum_pg_partitioned_table_partexprs 7 +#define Anum_pg_partitioned_table_partnparts 4 +#define Anum_pg_partitioned_table_parthashfunc 5 +#define Anum_pg_partitioned_table_partattrs 6 +#define Anum_pg_partitioned_table_partclass 7 +#define Anum_pg_partitioned_table_partcollation 8 +#define Anum_pg_partitioned_table_partexprs 9#endif /* PG_PARTITIONED_TABLE_H */ diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 5afc3eb..1c3474f 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -730,11 +730,14 @@ typedef struct PartitionSpec NodeTag type; char *strategy; /* partitioning strategy ('list' or 'range') */ List *partParams; /* List of PartitionElems */ + int partnparts; + List *hashfunc; int location; /* token location, or -1 if unknown */ } PartitionSpec;#define PARTITION_STRATEGY_LIST 'l'
#define PARTITION_STRATEGY_RANGE 'r'
+#define PARTITION_STRATEGY_HASH 'h'/* * PartitionBoundSpec - a partition bound specification diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h index 985d650..0597939 100644 --- a/src/include/parser/kwlist.h +++ b/src/include/parser/kwlist.h @@ -180,6 +180,7 @@ PG_KEYWORD("greatest", GREATEST, COL_NAME_KEYWORD) PG_KEYWORD("group", GROUP_P, RESERVED_KEYWORD) PG_KEYWORD("grouping", GROUPING, COL_NAME_KEYWORD) PG_KEYWORD("handler", HANDLER, UNRESERVED_KEYWORD) +PG_KEYWORD("hash", HASH, UNRESERVED_KEYWORD) PG_KEYWORD("having", HAVING, RESERVED_KEYWORD) PG_KEYWORD("header", HEADER_P, UNRESERVED_KEYWORD) PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD) @@ -291,6 +292,7 @@ PG_KEYWORD("parallel", PARALLEL, UNRESERVED_KEYWORD) PG_KEYWORD("parser", PARSER, UNRESERVED_KEYWORD) PG_KEYWORD("partial", PARTIAL, UNRESERVED_KEYWORD) PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD) +PG_KEYWORD("partitions", PARTITIONS, UNRESERVED_KEYWORD) PG_KEYWORD("passing", PASSING, UNRESERVED_KEYWORD) PG_KEYWORD("password", PASSWORD, UNRESERVED_KEYWORD) PG_KEYWORD("placing", PLACING, RESERVED_KEYWORD) diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h index a617a7c..660adfb 100644 --- a/src/include/utils/rel.h +++ b/src/include/utils/rel.h @@ -62,6 +62,9 @@ typedef struct PartitionKeyData Oid *partopcintype; /* OIDs of opclass declared input data types */ FmgrInfo *partsupfunc; /* lookup info for support funcs */+ int16 partnparts; /* number of hash partitions */ + Oid parthashfunc; /* OID of hash function */ + /* Partitioning collation per attribute */ Oid *partcollation;
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Best regards,
Aleksander Alekseev
Nagata-san,
On 2017/02/28 23:33, Yugo Nagata wrote:
Hi all,
Now we have a declarative partitioning, but hash partitioning is not
implemented yet. Attached is a POC patch to add the hash partitioning
feature. I know we will need more discussions about the syntax and other
specifications before going ahead the project, but I think this runnable
code might help to discuss what and how we implement this.
Great!
* Description
In this patch, the hash partitioning implementation is basically based
on the list partitioning mechanism. However, partition bounds cannot be
specified explicitly, but this is used internally as hash partition
index, which is calculated when a partition is created or attached.The tentative syntax to create a partitioned table is as bellow;
CREATE TABLE h (i int) PARTITION BY HASH(i) PARTITIONS 3 USING hashint4;
The number of partitions is specified by PARTITIONS, which is currently
constant and cannot be changed, but I think this is needed to be changed in
some manner. A hash function is specified by USING. Maybe, specifying hash
function may be ommitted, and in this case, a default hash function
corresponding to key type will be used.A partition table can be create as bellow;
CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;FOR VALUES clause cannot be used, and the partition bound is
calclulated automatically as partition index of single integer value.When trying create partitions more than the number specified
by PARTITIONS, it gets an error.postgres=# create table h4 partition of h;
ERROR: cannot create hash partition more than 3 for h
Instead of having to create each partition individually, wouldn't it be
better if the following command
CREATE TABLE h (i int) PARTITION BY HASH (i) PARTITIONS 3;
created the partitions *automatically*?
It makes sense to provide a way to create individual list and range
partitions separately, because users can specify custom bounds for each.
We don't need that for hash partitions, so why make users run separate
commands (without the FOR VALUES clause) anyway? We may perhaps need to
offer a way to optionally specify a user-defined name for each partition
in the same command, along with tablespace, storage options, etc. By
default, the names would be generated internally and the user can ALTER
individual partitions after the fact to specify tablespace, etc.
Thanks,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Feb 28, 2017 at 8:03 PM, Yugo Nagata <nagata@sraoss.co.jp> wrote:
Hi all,
Now we have a declarative partitioning, but hash partitioning is not
implemented yet. Attached is a POC patch to add the hash partitioning
feature. I know we will need more discussions about the syntax and other
specifications before going ahead the project, but I think this runnable
code might help to discuss what and how we implement this.* Description
In this patch, the hash partitioning implementation is basically based
on the list partitioning mechanism. However, partition bounds cannot be
specified explicitly, but this is used internally as hash partition
index, which is calculated when a partition is created or attached.The tentative syntax to create a partitioned table is as bellow;
CREATE TABLE h (i int) PARTITION BY HASH(i) PARTITIONS 3 USING hashint4;
The number of partitions is specified by PARTITIONS, which is currently
constant and cannot be changed, but I think this is needed to be changed in
some manner. A hash function is specified by USING. Maybe, specifying hash
function may be ommitted, and in this case, a default hash function
corresponding to key type will be used.A partition table can be create as bellow;
CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;FOR VALUES clause cannot be used, and the partition bound is
calclulated automatically as partition index of single integer value.When trying create partitions more than the number specified
by PARTITIONS, it gets an error.postgres=# create table h4 partition of h;
ERROR: cannot create hash partition more than 3 for hAn inserted record is stored in a partition whose index equals
abs(hashfunc(key)) % <number_of_partitions>. In the above
example, this is abs(hashint4(i))%3.postgres=# insert into h (select generate_series(0,20));
INSERT 0 21postgres=# select *,tableoid::regclass from h;
i | tableoid
----+----------
0 | h1
1 | h1
2 | h1
4 | h1
8 | h1
10 | h1
11 | h1
14 | h1
15 | h1
17 | h1
20 | h1
5 | h2
12 | h2
13 | h2
16 | h2
19 | h2
3 | h3
6 | h3
7 | h3
9 | h3
18 | h3
(21 rows)
This is good, I will have closer look into the patch, but here are
few quick comments.
- CREATE HASH partition syntax adds two new keywords and ideally
we should try to avoid adding additional keywords. Also I can see that
HASH keyword been added, but I don't see any use of newly added
keyword in gram.y.
- Also I didn't like the idea of fixing number of partitions during the
CREATE
TABLE syntax. Thats something that needs to be able to changes.
* Todo / discussions
In this patch, we cannot change the number of partitions specified
by PARTITIONS. I we can change this, the partitioning rule
(<partition index> = abs(hashfunc(key)) % <number_of_partitions>)
is also changed and then we need reallocatiing records between
partitions.In this patch, user can specify a hash function USING. However,
we migth need default hash functions which are useful and
proper for hash partitioning.
+1
- With fixing default hash function and not specifying number of partitions
during CREATE TABLE - don't need two new additional columns into
pg_partitioned_table catalog.
Currently, even when we issue SELECT query with a condition,
postgres looks into all partitions regardless of each partition's
constraint, because this is complicated such like "abs(hashint4(i))%3 = 0".postgres=# explain select * from h where i = 10;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..125.62 rows=40 width=4)
-> Seq Scan on h (cost=0.00..0.00 rows=1 width=4)
Filter: (i = 10)
-> Seq Scan on h1 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
-> Seq Scan on h2 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
-> Seq Scan on h3 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
(9 rows)However, if we modify a condition into a same expression
as the partitions constraint, postgres can exclude unrelated
table from search targets. So, we might avoid the problem
by converting the qual properly before calling predicate_refuted_by().postgres=# explain select * from h where abs(hashint4(i))%3 =
abs(hashint4(10))%3;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..61.00 rows=14 width=4)
-> Seq Scan on h (cost=0.00..0.00 rows=1 width=4)
Filter: ((abs(hashint4(i)) % 3) = 2)
-> Seq Scan on h3 (cost=0.00..61.00 rows=13 width=4)
Filter: ((abs(hashint4(i)) % 3) = 2)
(5 rows)Best regards,
Yugo Nagata--
Yugo Nagata <nagata@sraoss.co.jp>--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Regards,
Rushabh Lathia
On Tue, Feb 28, 2017 at 8:03 PM, Yugo Nagata <nagata@sraoss.co.jp> wrote:
Hi all,
Now we have a declarative partitioning, but hash partitioning is not
implemented yet. Attached is a POC patch to add the hash partitioning
feature. I know we will need more discussions about the syntax and other
specifications before going ahead the project, but I think this runnable
code might help to discuss what and how we implement this.
Great.
* Description
In this patch, the hash partitioning implementation is basically based
on the list partitioning mechanism. However, partition bounds cannot be
specified explicitly, but this is used internally as hash partition
index, which is calculated when a partition is created or attached.The tentative syntax to create a partitioned table is as bellow;
CREATE TABLE h (i int) PARTITION BY HASH(i) PARTITIONS 3 USING hashint4;
The number of partitions is specified by PARTITIONS, which is currently
constant and cannot be changed, but I think this is needed to be changed
in
some manner. A hash function is specified by USING. Maybe, specifying hash
function may be ommitted, and in this case, a default hash function
corresponding to key type will be used.A partition table can be create as bellow;
CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;FOR VALUES clause cannot be used, and the partition bound is
calclulated automatically as partition index of single integer value.When trying create partitions more than the number specified
by PARTITIONS, it gets an error.postgres=# create table h4 partition of h;
ERROR: cannot create hash partition more than 3 for hAn inserted record is stored in a partition whose index equals
abs(hashfunc(key)) % <number_of_partitions>. In the above
example, this is abs(hashint4(i))%3.postgres=# insert into h (select generate_series(0,20));
INSERT 0 21postgres=# select *,tableoid::regclass from h;
i | tableoid
----+----------
0 | h1
1 | h1
2 | h1
4 | h1
8 | h1
10 | h1
11 | h1
14 | h1
15 | h1
17 | h1
20 | h1
5 | h2
12 | h2
13 | h2
16 | h2
19 | h2
3 | h3
6 | h3
7 | h3
9 | h3
18 | h3
(21 rows)* Todo / discussions
In this patch, we cannot change the number of partitions specified
by PARTITIONS. I we can change this, the partitioning rule
(<partition index> = abs(hashfunc(key)) % <number_of_partitions>)
is also changed and then we need reallocatiing records between
partitions.In this patch, user can specify a hash function USING. However,
we migth need default hash functions which are useful and
proper for hash partitioning.
IMHO, we should try to keep create partition syntax simple and aligned
with other partition strategy. For e.g:
CREATE TABLE h (i int) PARTITION BY HASH(i);
I Agree that it is unavoidable partitions number in modulo hashing,
but we can do in other hashing technique. Have you had thought about
Linear hashing[1]https://en.wikipedia.org/wiki/Linear_hashing or Consistent hashing[2]https://en.wikipedia.org/wiki/Consistent_hashing? This will allow us to
add/drop
partition with minimal row moment.
+1 for the default hash function corresponding to partitioning key type.
Regards,
Amul
[1]: https://en.wikipedia.org/wiki/Linear_hashing
[2]: https://en.wikipedia.org/wiki/Consistent_hashing
Hi Aleksander,
On Tue, 28 Feb 2017 18:05:36 +0300
Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote:
Hi, Yugo.
Looks like a great feature! I'm going to take a closer look on your code
and write a feedback shortly. For now I can only tell that you forgot
to include some documentation in the patch.
Thank you for looking into it. I'm forward to your feedback.
This is a proof of concept patch and additional documentation
is not included. I'll add this after reaching a consensus
on the specification of the feature.
I've added a corresponding entry to current commitfest [1]. Hope you
don't mind. If it's not too much trouble could you please register on a
commitfest site and add yourself to this entry as an author? I'm pretty
sure someone is using this information for writing release notes or
something like this.
Thank you for registering it to the commitfest. I have added me as an auther.
[1] https://commitfest.postgresql.org/13/1059/
On Tue, Feb 28, 2017 at 11:33:13PM +0900, Yugo Nagata wrote:
Hi all,
Now we have a declarative partitioning, but hash partitioning is not
implemented yet. Attached is a POC patch to add the hash partitioning
feature. I know we will need more discussions about the syntax and other
specifications before going ahead the project, but I think this runnable
code might help to discuss what and how we implement this.* Description
In this patch, the hash partitioning implementation is basically based
on the list partitioning mechanism. However, partition bounds cannot be
specified explicitly, but this is used internally as hash partition
index, which is calculated when a partition is created or attached.The tentative syntax to create a partitioned table is as bellow;
CREATE TABLE h (i int) PARTITION BY HASH(i) PARTITIONS 3 USING hashint4;
The number of partitions is specified by PARTITIONS, which is currently
constant and cannot be changed, but I think this is needed to be changed in
some manner. A hash function is specified by USING. Maybe, specifying hash
function may be ommitted, and in this case, a default hash function
corresponding to key type will be used.A partition table can be create as bellow;
CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;FOR VALUES clause cannot be used, and the partition bound is
calclulated automatically as partition index of single integer value.When trying create partitions more than the number specified
by PARTITIONS, it gets an error.postgres=# create table h4 partition of h;
ERROR: cannot create hash partition more than 3 for hAn inserted record is stored in a partition whose index equals
abs(hashfunc(key)) % <number_of_partitions>. In the above
example, this is abs(hashint4(i))%3.postgres=# insert into h (select generate_series(0,20));
INSERT 0 21postgres=# select *,tableoid::regclass from h;
i | tableoid
----+----------
0 | h1
1 | h1
2 | h1
4 | h1
8 | h1
10 | h1
11 | h1
14 | h1
15 | h1
17 | h1
20 | h1
5 | h2
12 | h2
13 | h2
16 | h2
19 | h2
3 | h3
6 | h3
7 | h3
9 | h3
18 | h3
(21 rows)* Todo / discussions
In this patch, we cannot change the number of partitions specified
by PARTITIONS. I we can change this, the partitioning rule
(<partition index> = abs(hashfunc(key)) % <number_of_partitions>)
is also changed and then we need reallocatiing records between
partitions.In this patch, user can specify a hash function USING. However,
we migth need default hash functions which are useful and
proper for hash partitioning.Currently, even when we issue SELECT query with a condition,
postgres looks into all partitions regardless of each partition's
constraint, because this is complicated such like "abs(hashint4(i))%3 = 0".postgres=# explain select * from h where i = 10;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..125.62 rows=40 width=4)
-> Seq Scan on h (cost=0.00..0.00 rows=1 width=4)
Filter: (i = 10)
-> Seq Scan on h1 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
-> Seq Scan on h2 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
-> Seq Scan on h3 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
(9 rows)However, if we modify a condition into a same expression
as the partitions constraint, postgres can exclude unrelated
table from search targets. So, we might avoid the problem
by converting the qual properly before calling predicate_refuted_by().postgres=# explain select * from h where abs(hashint4(i))%3 = abs(hashint4(10))%3;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..61.00 rows=14 width=4)
-> Seq Scan on h (cost=0.00..0.00 rows=1 width=4)
Filter: ((abs(hashint4(i)) % 3) = 2)
-> Seq Scan on h3 (cost=0.00..61.00 rows=13 width=4)
Filter: ((abs(hashint4(i)) % 3) = 2)
(5 rows)Best regards,
Yugo Nagata--
Yugo Nagata <nagata@sraoss.co.jp>diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 41c0056..3820920 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -3074,7 +3074,7 @@ StorePartitionKey(Relation rel, AttrNumber *partattrs, List *partexprs, Oid *partopclass, - Oid *partcollation) + Oid *partcollation, int16 partnparts, Oid hashfunc) { int i; int2vector *partattrs_vec; @@ -3121,6 +3121,8 @@ StorePartitionKey(Relation rel, values[Anum_pg_partitioned_table_partrelid - 1] = ObjectIdGetDatum(RelationGetRelid(rel)); values[Anum_pg_partitioned_table_partstrat - 1] = CharGetDatum(strategy); values[Anum_pg_partitioned_table_partnatts - 1] = Int16GetDatum(partnatts); + values[Anum_pg_partitioned_table_partnparts - 1] = Int16GetDatum(partnparts); + values[Anum_pg_partitioned_table_parthashfunc - 1] = ObjectIdGetDatum(hashfunc); values[Anum_pg_partitioned_table_partattrs - 1] = PointerGetDatum(partattrs_vec); values[Anum_pg_partitioned_table_partclass - 1] = PointerGetDatum(partopclass_vec); values[Anum_pg_partitioned_table_partcollation - 1] = PointerGetDatum(partcollation_vec); diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 4bcef58..24e69c6 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -36,6 +36,8 @@ #include "optimizer/clauses.h" #include "optimizer/planmain.h" #include "optimizer/var.h" +#include "parser/parse_func.h" +#include "parser/parse_oper.h" #include "rewrite/rewriteManip.h" #include "storage/lmgr.h" #include "utils/array.h" @@ -120,6 +122,7 @@ static int32 qsort_partition_rbound_cmp(const void *a, const void *b,static List *get_qual_for_list(PartitionKey key, PartitionBoundSpec *spec); static List *get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec); +static List *get_qual_for_hash(PartitionKey key, PartitionBoundSpec *spec); static Oid get_partition_operator(PartitionKey key, int col, StrategyNumber strategy, bool *need_relabel); static List *generate_partition_qual(Relation rel); @@ -236,7 +239,8 @@ RelationBuildPartitionDesc(Relation rel) oids[i++] = lfirst_oid(cell);/* Convert from node to the internal representation */ - if (key->strategy == PARTITION_STRATEGY_LIST) + if (key->strategy == PARTITION_STRATEGY_LIST || + key->strategy == PARTITION_STRATEGY_HASH) { List *non_null_values = NIL;@@ -251,7 +255,7 @@ RelationBuildPartitionDesc(Relation rel)
ListCell *c;
PartitionBoundSpec *spec = lfirst(cell);- if (spec->strategy != PARTITION_STRATEGY_LIST) + if (spec->strategy != key->strategy) elog(ERROR, "invalid strategy in partition bound spec");foreach(c, spec->listdatums)
@@ -464,6 +468,7 @@ RelationBuildPartitionDesc(Relation rel)
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_HASH:
{
boundinfo->has_null = found_null;
boundinfo->indexes = (int *) palloc(ndatums * sizeof(int));
@@ -829,6 +834,18 @@ check_new_partition_bound(char *relname, Relation parent, Node *bound)
break;
}+ case PARTITION_STRATEGY_HASH: + { + Assert(spec->strategy == PARTITION_STRATEGY_HASH); + + if (partdesc->nparts + 1 > key->partnparts) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("cannot create hash partition more than %d for %s", + key->partnparts, RelationGetRelationName(parent)))); + break; + } + default: elog(ERROR, "unexpected partition strategy: %d", (int) key->strategy); @@ -916,6 +933,11 @@ get_qual_from_partbound(Relation rel, Relation parent, Node *bound) my_qual = get_qual_for_range(key, spec); break;+ case PARTITION_STRATEGY_HASH: + Assert(spec->strategy == PARTITION_STRATEGY_LIST); + my_qual = get_qual_for_hash(key, spec); + break; + default: elog(ERROR, "unexpected partition strategy: %d", (int) key->strategy); @@ -1146,6 +1168,84 @@ RelationGetPartitionDispatchInfo(Relation rel, int lockmode, return pd; }+/* + * convert_expr_for_hash + * + * Converts a expr for a hash partition's constraint. + * expr is converted into 'abs(hashfunc(expr)) % npart". + * + * npart: number of partitions + * hashfunc: OID of hash function + */ +Expr * +convert_expr_for_hash(Expr *expr, int npart, Oid hashfunc) +{ + FuncExpr *func, + *abs; + Expr *modexpr; + Oid modoid; + Oid int4oid[1] = {INT4OID}; + + ParseState *pstate = make_parsestate(NULL); + Value *val_npart = makeInteger(npart); + Node *const_npart = (Node *) make_const(pstate, val_npart, -1); + + /* hash function */ + func = makeFuncExpr(hashfunc, + INT4OID, + list_make1(expr), + 0, + 0, + COERCE_EXPLICIT_CALL); + + /* Abs */ + abs = makeFuncExpr(LookupFuncName(list_make1(makeString("abs")), 1, int4oid, false), + INT4OID, + list_make1(func), + 0, + 0, + COERCE_EXPLICIT_CALL); + + /* modulo by npart */ + modoid = LookupOperName(pstate, list_make1(makeString("%")), INT4OID, INT4OID, false, -1); + modexpr = make_opclause(modoid, INT4OID, false, (Expr*)abs, (Expr*)const_npart, 0, 0); + + return modexpr; +} + + +/* + * get_next_hash_partition_index + * + * Returns the minimal index which is not used for hash partition. + */ +int +get_next_hash_partition_index(Relation parent) +{ + PartitionKey key = RelationGetPartitionKey(parent); + PartitionDesc partdesc = RelationGetPartitionDesc(parent); + + int i; + bool *used = palloc0(sizeof(int) * key->partnparts); + + /* mark used for existing partition indexs */ + for (i = 0; i < partdesc->boundinfo->ndatums; i++) + { + Datum* datum = partdesc->boundinfo->datums[i]; + int idx = DatumGetInt16(datum[0]); + + if (!used[idx]) + used[idx] = true; + } + + /* find the minimal unused index */ + for (i = 0; i < key->partnparts; i++) + if (!used[i]) + break; + + return i; +} + /* Module-local functions *//*
@@ -1467,6 +1567,43 @@ get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec)
}/* + * get_qual_for_hash + * + * Returns a list of expressions to use as a hash partition's constraint. + */ +static List * +get_qual_for_hash(PartitionKey key, PartitionBoundSpec *spec) +{ + List *result; + Expr *keyCol; + Expr *expr; + Expr *opexpr; + Oid operoid; + ParseState *pstate = make_parsestate(NULL); + + /* Left operand */ + if (key->partattrs[0] != 0) + keyCol = (Expr *) makeVar(1, + key->partattrs[0], + key->parttypid[0], + key->parttypmod[0], + key->parttypcoll[0], + 0); + else + keyCol = (Expr *) copyObject(linitial(key->partexprs)); + + expr = convert_expr_for_hash(keyCol, key->partnparts, key->parthashfunc); + + /* equals the listdaums value */ + operoid = LookupOperName(pstate, list_make1(makeString("=")), INT4OID, INT4OID, false, -1); + opexpr = make_opclause(operoid, BOOLOID, false, expr, linitial(spec->listdatums), 0, 0); + + result = list_make1(opexpr); + + return result; +} + +/* * get_partition_operator * * Return oid of the operator of given strategy for a given partition key @@ -1730,6 +1867,11 @@ get_partition_for_tuple(PartitionDispatch *pd, (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), errmsg("range partition key of row contains null"))); } + else if (key->strategy == PARTITION_STRATEGY_HASH) + { + values[0] = OidFunctionCall1(key->parthashfunc, values[0]); + values[0] = Int16GetDatum(Abs(DatumGetInt16(values[0])) % key->partnparts); + }if (partdesc->boundinfo->has_null && isnull[0]) /* Tuple maps to the null-accepting list partition */ @@ -1744,6 +1886,7 @@ get_partition_for_tuple(PartitionDispatch *pd, switch (key->strategy) { case PARTITION_STRATEGY_LIST: + case PARTITION_STRATEGY_HASH: if (cur_offset >= 0 && equal) cur_index = partdesc->boundinfo->indexes[cur_offset]; else @@ -1968,6 +2111,7 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, switch (key->strategy) { case PARTITION_STRATEGY_LIST: + case PARTITION_STRATEGY_HASH: cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0], key->partcollation[0], bound_datums[0], diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 3cea220..5a28cc0 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -41,6 +41,7 @@ #include "catalog/pg_inherits_fn.h" #include "catalog/pg_namespace.h" #include "catalog/pg_opclass.h" +#include "catalog/pg_proc.h" #include "catalog/pg_tablespace.h" #include "catalog/pg_trigger.h" #include "catalog/pg_type.h" @@ -77,6 +78,7 @@ #include "parser/parse_oper.h" #include "parser/parse_relation.h" #include "parser/parse_type.h" +#include "parser/parse_func.h" #include "parser/parse_utilcmd.h" #include "parser/parser.h" #include "pgstat.h" @@ -450,7 +452,7 @@ static void RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); static bool is_partition_attr(Relation rel, AttrNumber attnum, bool *used_in_expr); static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy); -static void ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, +static void ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, Oid *partatttypes, List **partexprs, Oid *partopclass, Oid *partcollation); static void CreateInheritance(Relation child_rel, Relation parent_rel); static void RemoveInheritance(Relation child_rel, Relation parent_rel); @@ -799,8 +801,10 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, AttrNumber partattrs[PARTITION_MAX_KEYS]; Oid partopclass[PARTITION_MAX_KEYS]; Oid partcollation[PARTITION_MAX_KEYS]; + Oid partatttypes[PARTITION_MAX_KEYS]; List *partexprs = NIL; List *cmds = NIL; + Oid hashfuncOid = InvalidOid;/* * We need to transform the raw parsetrees corresponding to partition @@ -811,15 +815,40 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, stmt->partspec = transformPartitionSpec(rel, stmt->partspec, &strategy); ComputePartitionAttrs(rel, stmt->partspec->partParams, - partattrs, &partexprs, partopclass, + partattrs, partatttypes, &partexprs, partopclass, partcollation);partnatts = list_length(stmt->partspec->partParams); + + if (strategy == PARTITION_STRATEGY_HASH) + { + Oid funcrettype; + + if (partnatts != 1) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of partition key must be 1 for hash partition"))); + + hashfuncOid = LookupFuncName(stmt->partspec->hashfunc, 1, partatttypes, false); + funcrettype = get_func_rettype(hashfuncOid); + if (funcrettype != INT4OID) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("hash function for partitioning must return integer"))); + + if (func_volatile(hashfuncOid) != PROVOLATILE_IMMUTABLE) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("hash function for partitioning must be marked IMMUTABLE"))); + + } + StorePartitionKey(rel, strategy, partnatts, partattrs, partexprs, - partopclass, partcollation); + partopclass, partcollation, stmt->partspec->partnparts, hashfuncOid);- /* Force key columns to be NOT NULL when using range partitioning */ - if (strategy == PARTITION_STRATEGY_RANGE) + /* Force key columns to be NOT NULL when using range or hash partitioning */ + if (strategy == PARTITION_STRATEGY_RANGE || + strategy == PARTITION_STRATEGY_HASH) { for (i = 0; i < partnatts; i++) { @@ -12783,18 +12812,51 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) newspec->strategy = partspec->strategy; newspec->location = partspec->location; newspec->partParams = NIL; + newspec->partnparts = partspec->partnparts; + newspec->hashfunc = partspec->hashfunc;/* Parse partitioning strategy name */ if (!pg_strcasecmp(partspec->strategy, "list")) *strategy = PARTITION_STRATEGY_LIST; else if (!pg_strcasecmp(partspec->strategy, "range")) *strategy = PARTITION_STRATEGY_RANGE; + else if (!pg_strcasecmp(partspec->strategy, "hash")) + *strategy = PARTITION_STRATEGY_HASH; else ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("unrecognized partitioning strategy \"%s\"", partspec->strategy)));+ if (*strategy == PARTITION_STRATEGY_HASH) + { + if (partspec->partnparts < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of partitions must be specified for hash partition"))); + else if (partspec->partnparts == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of partitions must be greater than 0"))); + + if (list_length(partspec->hashfunc) == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("hash function must be specified for hash partition"))); + } + else + { + if (partspec->partnparts >= 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of partitions can be specified only for hash partition"))); + + if (list_length(partspec->hashfunc) > 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("hash function can be specified only for hash partition"))); + } + /* * Create a dummy ParseState and insert the target relation as its sole * rangetable entry. We need a ParseState for transformExpr. @@ -12843,7 +12905,7 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) * Compute per-partition-column information from a list of PartitionElem's */ static void -ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, +ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, Oid *partatttypes, List **partexprs, Oid *partopclass, Oid *partcollation) { int attn; @@ -13010,6 +13072,7 @@ ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, "btree", BTREE_AM_OID);+ partatttypes[attn] = atttype; attn++; } } diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 05d8538..f4febc9 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -4232,6 +4232,8 @@ _copyPartitionSpec(const PartitionSpec *from)COPY_STRING_FIELD(strategy);
COPY_NODE_FIELD(partParams);
+ COPY_SCALAR_FIELD(partnparts);
+ COPY_NODE_FIELD(hashfunc);
COPY_LOCATION_FIELD(location);return newnode; diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index d595cd7..d589eac 100644 --- a/src/backend/nodes/equalfuncs.c +++ b/src/backend/nodes/equalfuncs.c @@ -2725,6 +2725,8 @@ _equalPartitionSpec(const PartitionSpec *a, const PartitionSpec *b) { COMPARE_STRING_FIELD(strategy); COMPARE_NODE_FIELD(partParams); + COMPARE_SCALAR_FIELD(partnparts); + COMPARE_NODE_FIELD(hashfunc); COMPARE_LOCATION_FIELD(location);return true; diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index b3802b4..d6db80e 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -3318,6 +3318,8 @@ _outPartitionSpec(StringInfo str, const PartitionSpec *node)WRITE_STRING_FIELD(strategy);
WRITE_NODE_FIELD(partParams);
+ WRITE_INT_FIELD(partnparts);
+ WRITE_NODE_FIELD(hashfunc);
WRITE_LOCATION_FIELD(location);
}diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index e833b2e..b67140d 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -574,6 +574,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type <list> partbound_datum_list %type <partrange_datum> PartitionRangeDatum %type <list> range_datum_list +%type <ival> hash_partitions +%type <list> hash_function/*
* Non-keyword token types. These are hard-wired into the "flex" lexer.
@@ -627,7 +629,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);GLOBAL GRANT GRANTED GREATEST GROUP_P GROUPING
- HANDLER HAVING HEADER_P HOLD HOUR_P + HANDLER HASH HAVING HEADER_P HOLD HOUR_PIDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
@@ -651,7 +653,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
OBJECT_P OF OFF OFFSET OIDS OLD ON ONLY OPERATOR OPTION OPTIONS OR
ORDER ORDINALITY OUT_P OUTER_P OVER OVERLAPS OVERLAY OWNED OWNER- PARALLEL PARSER PARTIAL PARTITION PASSING PASSWORD PLACING PLANS POLICY + PARALLEL PARSER PARTIAL PARTITION PARTITIONS PASSING PASSWORD PLACING PLANS POLICY POSITION PRECEDING PRECISION PRESERVE PREPARE PREPARED PRIMARY PRIOR PRIVILEGES PROCEDURAL PROCEDURE PROGRAM PUBLICATION@@ -2587,6 +2589,16 @@ ForValues:
$$ = (Node *) n; } + + /* a HASH partition */ + | /*EMPTY*/ + { + PartitionBoundSpec *n = makeNode(PartitionBoundSpec); + + n->strategy = PARTITION_STRATEGY_HASH; + + $$ = (Node *) n; + } ;partbound_datum:
@@ -3666,7 +3678,7 @@ OptPartitionSpec: PartitionSpec { $$ = $1; }
| /*EMPTY*/ { $$ = NULL; }
;-PartitionSpec: PARTITION BY part_strategy '(' part_params ')' +PartitionSpec: PARTITION BY part_strategy '(' part_params ')' hash_partitions hash_function { PartitionSpec *n = makeNode(PartitionSpec);@@ -3674,10 +3686,21 @@ PartitionSpec: PARTITION BY part_strategy '(' part_params ')'
n->partParams = $5;
n->location = @1;+ n->partnparts = $7; + n->hashfunc = $8; + $$ = n; } ;+hash_partitions: PARTITIONS Iconst { $$ = $2; } + | /*EMPTY*/ { $$ = -1; } + ; + +hash_function: USING handler_name { $$ = $2; } + | /*EMPTY*/ { $$ = NULL; } + ; + part_strategy: IDENT { $$ = $1; } | unreserved_keyword { $$ = pstrdup($1); } ; @@ -14377,6 +14400,7 @@ unreserved_keyword: | GLOBAL | GRANTED | HANDLER + | HASH | HEADER_P | HOLD | HOUR_P @@ -14448,6 +14472,7 @@ unreserved_keyword: | PARSER | PARTIAL | PARTITION + | PARTITIONS | PASSING | PASSWORD | PLANS diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c index ff2bab6..8e1be31 100644 --- a/src/backend/parser/parse_utilcmd.c +++ b/src/backend/parser/parse_utilcmd.c @@ -40,6 +40,7 @@ #include "catalog/pg_opclass.h" #include "catalog/pg_operator.h" #include "catalog/pg_type.h" +#include "catalog/partition.h" #include "commands/comment.h" #include "commands/defrem.h" #include "commands/tablecmds.h" @@ -3252,6 +3253,24 @@ transformPartitionBound(ParseState *pstate, Relation parent, Node *bound) ++i; } } + else if (strategy == PARTITION_STRATEGY_HASH) + { + Value *conval; + Node *value; + int index; + + if (spec->strategy != PARTITION_STRATEGY_HASH) + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("invalid bound specification for a hash partition"))); + + index = get_next_hash_partition_index(parent); + + /* store the partition index as a listdatums value */ + conval = makeInteger(index); + value = (Node *) make_const(pstate, conval, -1); + result_spec->listdatums = list_make1(value); + } else elog(ERROR, "unexpected partition strategy: %d", (int) strategy);diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index b27b77d..fab6eea 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -1423,7 +1423,7 @@ pg_get_indexdef_worker(Oid indexrelid, int colno, * * Returns the partition key specification, ie, the following: * - * PARTITION BY { RANGE | LIST } (column opt_collation opt_opclass [, ...]) + * PARTITION BY { RANGE | LIST | HASH } (column opt_collation opt_opclass [, ...]) */ Datum pg_get_partkeydef(PG_FUNCTION_ARGS) @@ -1513,6 +1513,9 @@ pg_get_partkeydef_worker(Oid relid, int prettyFlags) case PARTITION_STRATEGY_RANGE: appendStringInfo(&buf, "RANGE"); break; + case PARTITION_STRATEGY_HASH: + appendStringInfo(&buf, "HASH"); + break; default: elog(ERROR, "unexpected partition strategy: %d", (int) form->partstrat); @@ -8520,6 +8523,9 @@ get_rule_expr(Node *node, deparse_context *context, appendStringInfoString(buf, ")"); break;+ case PARTITION_STRATEGY_HASH: + break; + default: elog(ERROR, "unrecognized partition strategy: %d", (int) spec->strategy); diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index 9001e20..829e4d2 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -855,6 +855,9 @@ RelationBuildPartitionKey(Relation relation) key->strategy = form->partstrat; key->partnatts = form->partnatts;+ key->partnparts = form->partnparts; + key->parthashfunc = form->parthashfunc; + /* * We can rely on the first variable-length attribute being mapped to the * relevant field of the catalog's C struct, because all previous @@ -999,6 +1002,9 @@ copy_partition_key(PartitionKey fromkey) newkey->strategy = fromkey->strategy; newkey->partnatts = n = fromkey->partnatts;+ newkey->partnparts = fromkey->partnparts; + newkey->parthashfunc = fromkey->parthashfunc; + newkey->partattrs = (AttrNumber *) palloc(n * sizeof(AttrNumber)); memcpy(newkey->partattrs, fromkey->partattrs, n * sizeof(AttrNumber));diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h index 1187797..367e2f8 100644 --- a/src/include/catalog/heap.h +++ b/src/include/catalog/heap.h @@ -141,7 +141,7 @@ extern void StorePartitionKey(Relation rel, AttrNumber *partattrs, List *partexprs, Oid *partopclass, - Oid *partcollation); + Oid *partcollation, int16 partnparts, Oid hashfunc); extern void RemovePartitionKeyByRelId(Oid relid); extern void StorePartitionBound(Relation rel, Relation parent, Node *bound);diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h index b195d1a..80f4b0e 100644 --- a/src/include/catalog/partition.h +++ b/src/include/catalog/partition.h @@ -89,4 +89,6 @@ extern int get_partition_for_tuple(PartitionDispatch *pd, TupleTableSlot *slot, EState *estate, Oid *failed_at); +extern Expr *convert_expr_for_hash(Expr *expr, int npart, Oid hashfunc); +extern int get_next_hash_partition_index(Relation parent); #endif /* PARTITION_H */ diff --git a/src/include/catalog/pg_partitioned_table.h b/src/include/catalog/pg_partitioned_table.h index bdff36a..69e509c 100644 --- a/src/include/catalog/pg_partitioned_table.h +++ b/src/include/catalog/pg_partitioned_table.h @@ -33,6 +33,9 @@ CATALOG(pg_partitioned_table,3350) BKI_WITHOUT_OIDS char partstrat; /* partitioning strategy */ int16 partnatts; /* number of partition key columns */+ int16 partnparts;
+ Oid parthashfunc;
+
/*
* variable-length fields start here, but we allow direct access to
* partattrs via the C struct. That's because the first variable-length
@@ -49,6 +52,8 @@ CATALOG(pg_partitioned_table,3350) BKI_WITHOUT_OIDS
pg_node_tree partexprs; /* list of expressions in the partition key;
* one item for each zero entry in partattrs[] */
#endif
+
+
} FormData_pg_partitioned_table;/* ---------------- @@ -62,13 +67,15 @@ typedef FormData_pg_partitioned_table *Form_pg_partitioned_table; * compiler constants for pg_partitioned_table * ---------------- */ -#define Natts_pg_partitioned_table 7 +#define Natts_pg_partitioned_table 9 #define Anum_pg_partitioned_table_partrelid 1 #define Anum_pg_partitioned_table_partstrat 2 #define Anum_pg_partitioned_table_partnatts 3 -#define Anum_pg_partitioned_table_partattrs 4 -#define Anum_pg_partitioned_table_partclass 5 -#define Anum_pg_partitioned_table_partcollation 6 -#define Anum_pg_partitioned_table_partexprs 7 +#define Anum_pg_partitioned_table_partnparts 4 +#define Anum_pg_partitioned_table_parthashfunc 5 +#define Anum_pg_partitioned_table_partattrs 6 +#define Anum_pg_partitioned_table_partclass 7 +#define Anum_pg_partitioned_table_partcollation 8 +#define Anum_pg_partitioned_table_partexprs 9#endif /* PG_PARTITIONED_TABLE_H */ diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 5afc3eb..1c3474f 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -730,11 +730,14 @@ typedef struct PartitionSpec NodeTag type; char *strategy; /* partitioning strategy ('list' or 'range') */ List *partParams; /* List of PartitionElems */ + int partnparts; + List *hashfunc; int location; /* token location, or -1 if unknown */ } PartitionSpec;#define PARTITION_STRATEGY_LIST 'l'
#define PARTITION_STRATEGY_RANGE 'r'
+#define PARTITION_STRATEGY_HASH 'h'/* * PartitionBoundSpec - a partition bound specification diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h index 985d650..0597939 100644 --- a/src/include/parser/kwlist.h +++ b/src/include/parser/kwlist.h @@ -180,6 +180,7 @@ PG_KEYWORD("greatest", GREATEST, COL_NAME_KEYWORD) PG_KEYWORD("group", GROUP_P, RESERVED_KEYWORD) PG_KEYWORD("grouping", GROUPING, COL_NAME_KEYWORD) PG_KEYWORD("handler", HANDLER, UNRESERVED_KEYWORD) +PG_KEYWORD("hash", HASH, UNRESERVED_KEYWORD) PG_KEYWORD("having", HAVING, RESERVED_KEYWORD) PG_KEYWORD("header", HEADER_P, UNRESERVED_KEYWORD) PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD) @@ -291,6 +292,7 @@ PG_KEYWORD("parallel", PARALLEL, UNRESERVED_KEYWORD) PG_KEYWORD("parser", PARSER, UNRESERVED_KEYWORD) PG_KEYWORD("partial", PARTIAL, UNRESERVED_KEYWORD) PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD) +PG_KEYWORD("partitions", PARTITIONS, UNRESERVED_KEYWORD) PG_KEYWORD("passing", PASSING, UNRESERVED_KEYWORD) PG_KEYWORD("password", PASSWORD, UNRESERVED_KEYWORD) PG_KEYWORD("placing", PLACING, RESERVED_KEYWORD) diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h index a617a7c..660adfb 100644 --- a/src/include/utils/rel.h +++ b/src/include/utils/rel.h @@ -62,6 +62,9 @@ typedef struct PartitionKeyData Oid *partopcintype; /* OIDs of opclass declared input data types */ FmgrInfo *partsupfunc; /* lookup info for support funcs */+ int16 partnparts; /* number of hash partitions */ + Oid parthashfunc; /* OID of hash function */ + /* Partitioning collation per attribute */ Oid *partcollation;--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers--
Best regards,
Aleksander Alekseev
--
Yugo Nagata <nagata@sraoss.co.jp>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Ammit,
On Wed, 1 Mar 2017 11:14:15 +0900
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Nagata-san,
On 2017/02/28 23:33, Yugo Nagata wrote:
Hi all,
Now we have a declarative partitioning, but hash partitioning is not
implemented yet. Attached is a POC patch to add the hash partitioning
feature. I know we will need more discussions about the syntax and other
specifications before going ahead the project, but I think this runnable
code might help to discuss what and how we implement this.Great!
Thank you!
* Description
In this patch, the hash partitioning implementation is basically based
on the list partitioning mechanism. However, partition bounds cannot be
specified explicitly, but this is used internally as hash partition
index, which is calculated when a partition is created or attached.The tentative syntax to create a partitioned table is as bellow;
CREATE TABLE h (i int) PARTITION BY HASH(i) PARTITIONS 3 USING hashint4;
The number of partitions is specified by PARTITIONS, which is currently
constant and cannot be changed, but I think this is needed to be changed in
some manner. A hash function is specified by USING. Maybe, specifying hash
function may be ommitted, and in this case, a default hash function
corresponding to key type will be used.A partition table can be create as bellow;
CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;FOR VALUES clause cannot be used, and the partition bound is
calclulated automatically as partition index of single integer value.When trying create partitions more than the number specified
by PARTITIONS, it gets an error.postgres=# create table h4 partition of h;
ERROR: cannot create hash partition more than 3 for hInstead of having to create each partition individually, wouldn't it be
better if the following commandCREATE TABLE h (i int) PARTITION BY HASH (i) PARTITIONS 3;
created the partitions *automatically*?
It makes sense to provide a way to create individual list and range
partitions separately, because users can specify custom bounds for each.
We don't need that for hash partitions, so why make users run separate
commands (without the FOR VALUES clause) anyway? We may perhaps need to
offer a way to optionally specify a user-defined name for each partition
in the same command, along with tablespace, storage options, etc. By
default, the names would be generated internally and the user can ALTER
individual partitions after the fact to specify tablespace, etc.
I though that creating each partition individually is needed because some
user will want to specify a tablespce to each partition. However, as you
say, that isn't need for many cases because use can move a partition
to other tablespaces afterward by ALTER.
Thanks,
Yugo Nagata
Thanks,
Amit--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Yugo Nagata <nagata@sraoss.co.jp>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, 1 Mar 2017 10:30:09 +0530
Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
On Tue, Feb 28, 2017 at 8:03 PM, Yugo Nagata <nagata@sraoss.co.jp> wrote:
Hi all,
Now we have a declarative partitioning, but hash partitioning is not
implemented yet. Attached is a POC patch to add the hash partitioning
feature. I know we will need more discussions about the syntax and other
specifications before going ahead the project, but I think this runnable
code might help to discuss what and how we implement this.* Description
In this patch, the hash partitioning implementation is basically based
on the list partitioning mechanism. However, partition bounds cannot be
specified explicitly, but this is used internally as hash partition
index, which is calculated when a partition is created or attached.The tentative syntax to create a partitioned table is as bellow;
CREATE TABLE h (i int) PARTITION BY HASH(i) PARTITIONS 3 USING hashint4;
The number of partitions is specified by PARTITIONS, which is currently
constant and cannot be changed, but I think this is needed to be changed in
some manner. A hash function is specified by USING. Maybe, specifying hash
function may be ommitted, and in this case, a default hash function
corresponding to key type will be used.A partition table can be create as bellow;
CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;FOR VALUES clause cannot be used, and the partition bound is
calclulated automatically as partition index of single integer value.When trying create partitions more than the number specified
by PARTITIONS, it gets an error.postgres=# create table h4 partition of h;
ERROR: cannot create hash partition more than 3 for hAn inserted record is stored in a partition whose index equals
abs(hashfunc(key)) % <number_of_partitions>. In the above
example, this is abs(hashint4(i))%3.postgres=# insert into h (select generate_series(0,20));
INSERT 0 21postgres=# select *,tableoid::regclass from h;
i | tableoid
----+----------
0 | h1
1 | h1
2 | h1
4 | h1
8 | h1
10 | h1
11 | h1
14 | h1
15 | h1
17 | h1
20 | h1
5 | h2
12 | h2
13 | h2
16 | h2
19 | h2
3 | h3
6 | h3
7 | h3
9 | h3
18 | h3
(21 rows)This is good, I will have closer look into the patch, but here are
few quick comments.
Thanks. I'm looking forward to your comments.
- CREATE HASH partition syntax adds two new keywords and ideally
we should try to avoid adding additional keywords. Also I can see that
HASH keyword been added, but I don't see any use of newly added
keyword in gram.y.
Yes, you are right. HASH keyword is not necessary. I'll remove it
from the patch.
- Also I didn't like the idea of fixing number of partitions during the
CREATE
TABLE syntax. Thats something that needs to be able to changes.
I agree. The number specified by PARTIONS should be the *initial* number
of partitions and this should be abelt to be changed. I'm investigating
the way.
* Todo / discussions
In this patch, we cannot change the number of partitions specified
by PARTITIONS. I we can change this, the partitioning rule
(<partition index> = abs(hashfunc(key)) % <number_of_partitions>)
is also changed and then we need reallocatiing records between
partitions.In this patch, user can specify a hash function USING. However,
we migth need default hash functions which are useful and
proper for hash partitioning.+1
- With fixing default hash function and not specifying number of partitions
during CREATE TABLE - don't need two new additional columns into
pg_partitioned_table catalog.
I think the option to specify a hash function is needed because
user may want to use a user-defined hash function for some reasons,
for example, when a user-defined type is used as a partition key.
Currently, even when we issue SELECT query with a condition,
postgres looks into all partitions regardless of each partition's
constraint, because this is complicated such like "abs(hashint4(i))%3 = 0".postgres=# explain select * from h where i = 10;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..125.62 rows=40 width=4)
-> Seq Scan on h (cost=0.00..0.00 rows=1 width=4)
Filter: (i = 10)
-> Seq Scan on h1 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
-> Seq Scan on h2 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
-> Seq Scan on h3 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
(9 rows)However, if we modify a condition into a same expression
as the partitions constraint, postgres can exclude unrelated
table from search targets. So, we might avoid the problem
by converting the qual properly before calling predicate_refuted_by().postgres=# explain select * from h where abs(hashint4(i))%3 =
abs(hashint4(10))%3;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..61.00 rows=14 width=4)
-> Seq Scan on h (cost=0.00..0.00 rows=1 width=4)
Filter: ((abs(hashint4(i)) % 3) = 2)
-> Seq Scan on h3 (cost=0.00..61.00 rows=13 width=4)
Filter: ((abs(hashint4(i)) % 3) = 2)
(5 rows)Best regards,
Yugo Nagata--
Yugo Nagata <nagata@sraoss.co.jp>--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackersRegards,
Rushabh Lathia
--
Yugo Nagata <nagata@sraoss.co.jp>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, 1 Mar 2017 10:52:58 +0530
amul sul <sulamul@gmail.com> wrote:
On Tue, Feb 28, 2017 at 8:03 PM, Yugo Nagata <nagata@sraoss.co.jp> wrote:
Hi all,
Now we have a declarative partitioning, but hash partitioning is not
implemented yet. Attached is a POC patch to add the hash partitioning
feature. I know we will need more discussions about the syntax and other
specifications before going ahead the project, but I think this runnable
code might help to discuss what and how we implement this.Great.
Thanks.
* Description
In this patch, the hash partitioning implementation is basically based
on the list partitioning mechanism. However, partition bounds cannot be
specified explicitly, but this is used internally as hash partition
index, which is calculated when a partition is created or attached.The tentative syntax to create a partitioned table is as bellow;
CREATE TABLE h (i int) PARTITION BY HASH(i) PARTITIONS 3 USING hashint4;
The number of partitions is specified by PARTITIONS, which is currently
constant and cannot be changed, but I think this is needed to be changedin
some manner. A hash function is specified by USING. Maybe, specifying hash
function may be ommitted, and in this case, a default hash function
corresponding to key type will be used.A partition table can be create as bellow;
CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;FOR VALUES clause cannot be used, and the partition bound is
calclulated automatically as partition index of single integer value.When trying create partitions more than the number specified
by PARTITIONS, it gets an error.postgres=# create table h4 partition of h;
ERROR: cannot create hash partition more than 3 for hAn inserted record is stored in a partition whose index equals
abs(hashfunc(key)) % <number_of_partitions>. In the above
example, this is abs(hashint4(i))%3.postgres=# insert into h (select generate_series(0,20));
INSERT 0 21postgres=# select *,tableoid::regclass from h;
i | tableoid
----+----------
0 | h1
1 | h1
2 | h1
4 | h1
8 | h1
10 | h1
11 | h1
14 | h1
15 | h1
17 | h1
20 | h1
5 | h2
12 | h2
13 | h2
16 | h2
19 | h2
3 | h3
6 | h3
7 | h3
9 | h3
18 | h3
(21 rows)* Todo / discussions
In this patch, we cannot change the number of partitions specified
by PARTITIONS. I we can change this, the partitioning rule
(<partition index> = abs(hashfunc(key)) % <number_of_partitions>)
is also changed and then we need reallocatiing records between
partitions.In this patch, user can specify a hash function USING. However,
we migth need default hash functions which are useful and
proper for hash partitioning.IMHO, we should try to keep create partition syntax simple and aligned
with other partition strategy. For e.g:
CREATE TABLE h (i int) PARTITION BY HASH(i);I Agree that it is unavoidable partitions number in modulo hashing,
but we can do in other hashing technique. Have you had thought about
Linear hashing[1] or Consistent hashing[2]? This will allow us to
add/drop
partition with minimal row moment.
Thank you for your information of hash technique. I'll see them
and try to allowing the number of partitions to be changed.
Thanks,
Yugo Nagata
+1 for the default hash function corresponding to partitioning key type.
Regards,
Amul
[1] https://en.wikipedia.org/wiki/Linear_hashing
[2] https://en.wikipedia.org/wiki/Consistent_hashing
--
Yugo Nagata <nagata@sraoss.co.jp>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi, Yugo.
Today I've had an opportunity to take a closer look on this patch. Here are
a few things that bother me.
1a) There are missing commends here:
```
--- a/src/include/catalog/pg_partitioned_table.h
+++ b/src/include/catalog/pg_partitioned_table.h
@@ -33,6 +33,9 @@ CATALOG(pg_partitioned_table,3350) BKI_WITHOUT_OIDS
char partstrat; /* partitioning strategy */
int16 partnatts; /* number of partition key columns */
+ int16 partnparts;
+ Oid parthashfunc;
+
```
1b) ... and here:
```
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -730,11 +730,14 @@ typedef struct PartitionSpec
NodeTag type;
char *strategy; /* partitioning strategy ('list' or 'range') */
List *partParams; /* List of PartitionElems */
+ int partnparts;
+ List *hashfunc;
int location; /* token location, or -1 if unknown */
} PartitionSpec;
```
2) I believe new empty lines in patches are generally not welcomed by
community:
```
@@ -49,6 +52,8 @@ CATALOG(pg_partitioned_table,3350) BKI_WITHOUT_OIDS
pg_node_tree partexprs; /* list of expressions in the partition key;
* one item for each zero entry in partattrs[] */
#endif
+
+
} FormData_pg_partitioned_table;
```
3) One test fails on my laptop (Arch Linux, x64) [1]http://afiskon.ru/s/dd/20cbe21934_regression.diffs.txt:
```
***************
*** 344,350 ****
CREATE TABLE partitioned (
a int
) PARTITION BY HASH (a);
! ERROR: unrecognized partitioning strategy "hash"
-- specified column must be present in the table
CREATE TABLE partitioned (
a int
--- 344,350 ----
CREATE TABLE partitioned (
a int
) PARTITION BY HASH (a);
! ERROR: number of partitions must be specified for hash partition
-- specified column must be present in the table
CREATE TABLE partitioned (
a int
```
Exact script I'm using for building and testing PostgreSQL could be
found here [2]http://afiskon.ru/s/76/a4fb71739c_full-build.sh.txt.
4) As I already mentioned - missing documentation.
In general patch looks quite good to me. I personally believe it has all
the changes to be accepted in current commitfest. Naturally if community
will come to a consensus regarding keywords, whether all partitions
should be created automatically, etc :)
[1]: http://afiskon.ru/s/dd/20cbe21934_regression.diffs.txt
[2]: http://afiskon.ru/s/76/a4fb71739c_full-build.sh.txt
On Wed, Mar 01, 2017 at 06:10:10PM +0900, Yugo Nagata wrote:
Hi Aleksander,
On Tue, 28 Feb 2017 18:05:36 +0300
Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote:Hi, Yugo.
Looks like a great feature! I'm going to take a closer look on your code
and write a feedback shortly. For now I can only tell that you forgot
to include some documentation in the patch.Thank you for looking into it. I'm forward to your feedback.
This is a proof of concept patch and additional documentation
is not included. I'll add this after reaching a consensus
on the specification of the feature.I've added a corresponding entry to current commitfest [1]. Hope you
don't mind. If it's not too much trouble could you please register on a
commitfest site and add yourself to this entry as an author? I'm pretty
sure someone is using this information for writing release notes or
something like this.Thank you for registering it to the commitfest. I have added me as an auther.
[1] https://commitfest.postgresql.org/13/1059/
On Tue, Feb 28, 2017 at 11:33:13PM +0900, Yugo Nagata wrote:
Hi all,
Now we have a declarative partitioning, but hash partitioning is not
implemented yet. Attached is a POC patch to add the hash partitioning
feature. I know we will need more discussions about the syntax and other
specifications before going ahead the project, but I think this runnable
code might help to discuss what and how we implement this.* Description
In this patch, the hash partitioning implementation is basically based
on the list partitioning mechanism. However, partition bounds cannot be
specified explicitly, but this is used internally as hash partition
index, which is calculated when a partition is created or attached.The tentative syntax to create a partitioned table is as bellow;
CREATE TABLE h (i int) PARTITION BY HASH(i) PARTITIONS 3 USING hashint4;
The number of partitions is specified by PARTITIONS, which is currently
constant and cannot be changed, but I think this is needed to be changed in
some manner. A hash function is specified by USING. Maybe, specifying hash
function may be ommitted, and in this case, a default hash function
corresponding to key type will be used.A partition table can be create as bellow;
CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;FOR VALUES clause cannot be used, and the partition bound is
calclulated automatically as partition index of single integer value.When trying create partitions more than the number specified
by PARTITIONS, it gets an error.postgres=# create table h4 partition of h;
ERROR: cannot create hash partition more than 3 for hAn inserted record is stored in a partition whose index equals
abs(hashfunc(key)) % <number_of_partitions>. In the above
example, this is abs(hashint4(i))%3.postgres=# insert into h (select generate_series(0,20));
INSERT 0 21postgres=# select *,tableoid::regclass from h;
i | tableoid
----+----------
0 | h1
1 | h1
2 | h1
4 | h1
8 | h1
10 | h1
11 | h1
14 | h1
15 | h1
17 | h1
20 | h1
5 | h2
12 | h2
13 | h2
16 | h2
19 | h2
3 | h3
6 | h3
7 | h3
9 | h3
18 | h3
(21 rows)* Todo / discussions
In this patch, we cannot change the number of partitions specified
by PARTITIONS. I we can change this, the partitioning rule
(<partition index> = abs(hashfunc(key)) % <number_of_partitions>)
is also changed and then we need reallocatiing records between
partitions.In this patch, user can specify a hash function USING. However,
we migth need default hash functions which are useful and
proper for hash partitioning.Currently, even when we issue SELECT query with a condition,
postgres looks into all partitions regardless of each partition's
constraint, because this is complicated such like "abs(hashint4(i))%3 = 0".postgres=# explain select * from h where i = 10;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..125.62 rows=40 width=4)
-> Seq Scan on h (cost=0.00..0.00 rows=1 width=4)
Filter: (i = 10)
-> Seq Scan on h1 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
-> Seq Scan on h2 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
-> Seq Scan on h3 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
(9 rows)However, if we modify a condition into a same expression
as the partitions constraint, postgres can exclude unrelated
table from search targets. So, we might avoid the problem
by converting the qual properly before calling predicate_refuted_by().postgres=# explain select * from h where abs(hashint4(i))%3 = abs(hashint4(10))%3;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..61.00 rows=14 width=4)
-> Seq Scan on h (cost=0.00..0.00 rows=1 width=4)
Filter: ((abs(hashint4(i)) % 3) = 2)
-> Seq Scan on h3 (cost=0.00..61.00 rows=13 width=4)
Filter: ((abs(hashint4(i)) % 3) = 2)
(5 rows)Best regards,
Yugo Nagata--
Yugo Nagata <nagata@sraoss.co.jp>diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 41c0056..3820920 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -3074,7 +3074,7 @@ StorePartitionKey(Relation rel, AttrNumber *partattrs, List *partexprs, Oid *partopclass, - Oid *partcollation) + Oid *partcollation, int16 partnparts, Oid hashfunc) { int i; int2vector *partattrs_vec; @@ -3121,6 +3121,8 @@ StorePartitionKey(Relation rel, values[Anum_pg_partitioned_table_partrelid - 1] = ObjectIdGetDatum(RelationGetRelid(rel)); values[Anum_pg_partitioned_table_partstrat - 1] = CharGetDatum(strategy); values[Anum_pg_partitioned_table_partnatts - 1] = Int16GetDatum(partnatts); + values[Anum_pg_partitioned_table_partnparts - 1] = Int16GetDatum(partnparts); + values[Anum_pg_partitioned_table_parthashfunc - 1] = ObjectIdGetDatum(hashfunc); values[Anum_pg_partitioned_table_partattrs - 1] = PointerGetDatum(partattrs_vec); values[Anum_pg_partitioned_table_partclass - 1] = PointerGetDatum(partopclass_vec); values[Anum_pg_partitioned_table_partcollation - 1] = PointerGetDatum(partcollation_vec); diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 4bcef58..24e69c6 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -36,6 +36,8 @@ #include "optimizer/clauses.h" #include "optimizer/planmain.h" #include "optimizer/var.h" +#include "parser/parse_func.h" +#include "parser/parse_oper.h" #include "rewrite/rewriteManip.h" #include "storage/lmgr.h" #include "utils/array.h" @@ -120,6 +122,7 @@ static int32 qsort_partition_rbound_cmp(const void *a, const void *b,static List *get_qual_for_list(PartitionKey key, PartitionBoundSpec *spec); static List *get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec); +static List *get_qual_for_hash(PartitionKey key, PartitionBoundSpec *spec); static Oid get_partition_operator(PartitionKey key, int col, StrategyNumber strategy, bool *need_relabel); static List *generate_partition_qual(Relation rel); @@ -236,7 +239,8 @@ RelationBuildPartitionDesc(Relation rel) oids[i++] = lfirst_oid(cell);/* Convert from node to the internal representation */ - if (key->strategy == PARTITION_STRATEGY_LIST) + if (key->strategy == PARTITION_STRATEGY_LIST || + key->strategy == PARTITION_STRATEGY_HASH) { List *non_null_values = NIL;@@ -251,7 +255,7 @@ RelationBuildPartitionDesc(Relation rel)
ListCell *c;
PartitionBoundSpec *spec = lfirst(cell);- if (spec->strategy != PARTITION_STRATEGY_LIST) + if (spec->strategy != key->strategy) elog(ERROR, "invalid strategy in partition bound spec");foreach(c, spec->listdatums)
@@ -464,6 +468,7 @@ RelationBuildPartitionDesc(Relation rel)
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_HASH:
{
boundinfo->has_null = found_null;
boundinfo->indexes = (int *) palloc(ndatums * sizeof(int));
@@ -829,6 +834,18 @@ check_new_partition_bound(char *relname, Relation parent, Node *bound)
break;
}+ case PARTITION_STRATEGY_HASH: + { + Assert(spec->strategy == PARTITION_STRATEGY_HASH); + + if (partdesc->nparts + 1 > key->partnparts) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("cannot create hash partition more than %d for %s", + key->partnparts, RelationGetRelationName(parent)))); + break; + } + default: elog(ERROR, "unexpected partition strategy: %d", (int) key->strategy); @@ -916,6 +933,11 @@ get_qual_from_partbound(Relation rel, Relation parent, Node *bound) my_qual = get_qual_for_range(key, spec); break;+ case PARTITION_STRATEGY_HASH: + Assert(spec->strategy == PARTITION_STRATEGY_LIST); + my_qual = get_qual_for_hash(key, spec); + break; + default: elog(ERROR, "unexpected partition strategy: %d", (int) key->strategy); @@ -1146,6 +1168,84 @@ RelationGetPartitionDispatchInfo(Relation rel, int lockmode, return pd; }+/* + * convert_expr_for_hash + * + * Converts a expr for a hash partition's constraint. + * expr is converted into 'abs(hashfunc(expr)) % npart". + * + * npart: number of partitions + * hashfunc: OID of hash function + */ +Expr * +convert_expr_for_hash(Expr *expr, int npart, Oid hashfunc) +{ + FuncExpr *func, + *abs; + Expr *modexpr; + Oid modoid; + Oid int4oid[1] = {INT4OID}; + + ParseState *pstate = make_parsestate(NULL); + Value *val_npart = makeInteger(npart); + Node *const_npart = (Node *) make_const(pstate, val_npart, -1); + + /* hash function */ + func = makeFuncExpr(hashfunc, + INT4OID, + list_make1(expr), + 0, + 0, + COERCE_EXPLICIT_CALL); + + /* Abs */ + abs = makeFuncExpr(LookupFuncName(list_make1(makeString("abs")), 1, int4oid, false), + INT4OID, + list_make1(func), + 0, + 0, + COERCE_EXPLICIT_CALL); + + /* modulo by npart */ + modoid = LookupOperName(pstate, list_make1(makeString("%")), INT4OID, INT4OID, false, -1); + modexpr = make_opclause(modoid, INT4OID, false, (Expr*)abs, (Expr*)const_npart, 0, 0); + + return modexpr; +} + + +/* + * get_next_hash_partition_index + * + * Returns the minimal index which is not used for hash partition. + */ +int +get_next_hash_partition_index(Relation parent) +{ + PartitionKey key = RelationGetPartitionKey(parent); + PartitionDesc partdesc = RelationGetPartitionDesc(parent); + + int i; + bool *used = palloc0(sizeof(int) * key->partnparts); + + /* mark used for existing partition indexs */ + for (i = 0; i < partdesc->boundinfo->ndatums; i++) + { + Datum* datum = partdesc->boundinfo->datums[i]; + int idx = DatumGetInt16(datum[0]); + + if (!used[idx]) + used[idx] = true; + } + + /* find the minimal unused index */ + for (i = 0; i < key->partnparts; i++) + if (!used[i]) + break; + + return i; +} + /* Module-local functions *//*
@@ -1467,6 +1567,43 @@ get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec)
}/* + * get_qual_for_hash + * + * Returns a list of expressions to use as a hash partition's constraint. + */ +static List * +get_qual_for_hash(PartitionKey key, PartitionBoundSpec *spec) +{ + List *result; + Expr *keyCol; + Expr *expr; + Expr *opexpr; + Oid operoid; + ParseState *pstate = make_parsestate(NULL); + + /* Left operand */ + if (key->partattrs[0] != 0) + keyCol = (Expr *) makeVar(1, + key->partattrs[0], + key->parttypid[0], + key->parttypmod[0], + key->parttypcoll[0], + 0); + else + keyCol = (Expr *) copyObject(linitial(key->partexprs)); + + expr = convert_expr_for_hash(keyCol, key->partnparts, key->parthashfunc); + + /* equals the listdaums value */ + operoid = LookupOperName(pstate, list_make1(makeString("=")), INT4OID, INT4OID, false, -1); + opexpr = make_opclause(operoid, BOOLOID, false, expr, linitial(spec->listdatums), 0, 0); + + result = list_make1(opexpr); + + return result; +} + +/* * get_partition_operator * * Return oid of the operator of given strategy for a given partition key @@ -1730,6 +1867,11 @@ get_partition_for_tuple(PartitionDispatch *pd, (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), errmsg("range partition key of row contains null"))); } + else if (key->strategy == PARTITION_STRATEGY_HASH) + { + values[0] = OidFunctionCall1(key->parthashfunc, values[0]); + values[0] = Int16GetDatum(Abs(DatumGetInt16(values[0])) % key->partnparts); + }if (partdesc->boundinfo->has_null && isnull[0]) /* Tuple maps to the null-accepting list partition */ @@ -1744,6 +1886,7 @@ get_partition_for_tuple(PartitionDispatch *pd, switch (key->strategy) { case PARTITION_STRATEGY_LIST: + case PARTITION_STRATEGY_HASH: if (cur_offset >= 0 && equal) cur_index = partdesc->boundinfo->indexes[cur_offset]; else @@ -1968,6 +2111,7 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, switch (key->strategy) { case PARTITION_STRATEGY_LIST: + case PARTITION_STRATEGY_HASH: cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0], key->partcollation[0], bound_datums[0], diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 3cea220..5a28cc0 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -41,6 +41,7 @@ #include "catalog/pg_inherits_fn.h" #include "catalog/pg_namespace.h" #include "catalog/pg_opclass.h" +#include "catalog/pg_proc.h" #include "catalog/pg_tablespace.h" #include "catalog/pg_trigger.h" #include "catalog/pg_type.h" @@ -77,6 +78,7 @@ #include "parser/parse_oper.h" #include "parser/parse_relation.h" #include "parser/parse_type.h" +#include "parser/parse_func.h" #include "parser/parse_utilcmd.h" #include "parser/parser.h" #include "pgstat.h" @@ -450,7 +452,7 @@ static void RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); static bool is_partition_attr(Relation rel, AttrNumber attnum, bool *used_in_expr); static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy); -static void ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, +static void ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, Oid *partatttypes, List **partexprs, Oid *partopclass, Oid *partcollation); static void CreateInheritance(Relation child_rel, Relation parent_rel); static void RemoveInheritance(Relation child_rel, Relation parent_rel); @@ -799,8 +801,10 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, AttrNumber partattrs[PARTITION_MAX_KEYS]; Oid partopclass[PARTITION_MAX_KEYS]; Oid partcollation[PARTITION_MAX_KEYS]; + Oid partatttypes[PARTITION_MAX_KEYS]; List *partexprs = NIL; List *cmds = NIL; + Oid hashfuncOid = InvalidOid;/* * We need to transform the raw parsetrees corresponding to partition @@ -811,15 +815,40 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, stmt->partspec = transformPartitionSpec(rel, stmt->partspec, &strategy); ComputePartitionAttrs(rel, stmt->partspec->partParams, - partattrs, &partexprs, partopclass, + partattrs, partatttypes, &partexprs, partopclass, partcollation);partnatts = list_length(stmt->partspec->partParams); + + if (strategy == PARTITION_STRATEGY_HASH) + { + Oid funcrettype; + + if (partnatts != 1) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of partition key must be 1 for hash partition"))); + + hashfuncOid = LookupFuncName(stmt->partspec->hashfunc, 1, partatttypes, false); + funcrettype = get_func_rettype(hashfuncOid); + if (funcrettype != INT4OID) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("hash function for partitioning must return integer"))); + + if (func_volatile(hashfuncOid) != PROVOLATILE_IMMUTABLE) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("hash function for partitioning must be marked IMMUTABLE"))); + + } + StorePartitionKey(rel, strategy, partnatts, partattrs, partexprs, - partopclass, partcollation); + partopclass, partcollation, stmt->partspec->partnparts, hashfuncOid);- /* Force key columns to be NOT NULL when using range partitioning */ - if (strategy == PARTITION_STRATEGY_RANGE) + /* Force key columns to be NOT NULL when using range or hash partitioning */ + if (strategy == PARTITION_STRATEGY_RANGE || + strategy == PARTITION_STRATEGY_HASH) { for (i = 0; i < partnatts; i++) { @@ -12783,18 +12812,51 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) newspec->strategy = partspec->strategy; newspec->location = partspec->location; newspec->partParams = NIL; + newspec->partnparts = partspec->partnparts; + newspec->hashfunc = partspec->hashfunc;/* Parse partitioning strategy name */ if (!pg_strcasecmp(partspec->strategy, "list")) *strategy = PARTITION_STRATEGY_LIST; else if (!pg_strcasecmp(partspec->strategy, "range")) *strategy = PARTITION_STRATEGY_RANGE; + else if (!pg_strcasecmp(partspec->strategy, "hash")) + *strategy = PARTITION_STRATEGY_HASH; else ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("unrecognized partitioning strategy \"%s\"", partspec->strategy)));+ if (*strategy == PARTITION_STRATEGY_HASH) + { + if (partspec->partnparts < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of partitions must be specified for hash partition"))); + else if (partspec->partnparts == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of partitions must be greater than 0"))); + + if (list_length(partspec->hashfunc) == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("hash function must be specified for hash partition"))); + } + else + { + if (partspec->partnparts >= 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of partitions can be specified only for hash partition"))); + + if (list_length(partspec->hashfunc) > 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("hash function can be specified only for hash partition"))); + } + /* * Create a dummy ParseState and insert the target relation as its sole * rangetable entry. We need a ParseState for transformExpr. @@ -12843,7 +12905,7 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) * Compute per-partition-column information from a list of PartitionElem's */ static void -ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, +ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, Oid *partatttypes, List **partexprs, Oid *partopclass, Oid *partcollation) { int attn; @@ -13010,6 +13072,7 @@ ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, "btree", BTREE_AM_OID);+ partatttypes[attn] = atttype; attn++; } } diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 05d8538..f4febc9 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -4232,6 +4232,8 @@ _copyPartitionSpec(const PartitionSpec *from)COPY_STRING_FIELD(strategy);
COPY_NODE_FIELD(partParams);
+ COPY_SCALAR_FIELD(partnparts);
+ COPY_NODE_FIELD(hashfunc);
COPY_LOCATION_FIELD(location);return newnode; diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index d595cd7..d589eac 100644 --- a/src/backend/nodes/equalfuncs.c +++ b/src/backend/nodes/equalfuncs.c @@ -2725,6 +2725,8 @@ _equalPartitionSpec(const PartitionSpec *a, const PartitionSpec *b) { COMPARE_STRING_FIELD(strategy); COMPARE_NODE_FIELD(partParams); + COMPARE_SCALAR_FIELD(partnparts); + COMPARE_NODE_FIELD(hashfunc); COMPARE_LOCATION_FIELD(location);return true; diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index b3802b4..d6db80e 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -3318,6 +3318,8 @@ _outPartitionSpec(StringInfo str, const PartitionSpec *node)WRITE_STRING_FIELD(strategy);
WRITE_NODE_FIELD(partParams);
+ WRITE_INT_FIELD(partnparts);
+ WRITE_NODE_FIELD(hashfunc);
WRITE_LOCATION_FIELD(location);
}diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index e833b2e..b67140d 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -574,6 +574,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type <list> partbound_datum_list %type <partrange_datum> PartitionRangeDatum %type <list> range_datum_list +%type <ival> hash_partitions +%type <list> hash_function/*
* Non-keyword token types. These are hard-wired into the "flex" lexer.
@@ -627,7 +629,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);GLOBAL GRANT GRANTED GREATEST GROUP_P GROUPING
- HANDLER HAVING HEADER_P HOLD HOUR_P + HANDLER HASH HAVING HEADER_P HOLD HOUR_PIDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
@@ -651,7 +653,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
OBJECT_P OF OFF OFFSET OIDS OLD ON ONLY OPERATOR OPTION OPTIONS OR
ORDER ORDINALITY OUT_P OUTER_P OVER OVERLAPS OVERLAY OWNED OWNER- PARALLEL PARSER PARTIAL PARTITION PASSING PASSWORD PLACING PLANS POLICY + PARALLEL PARSER PARTIAL PARTITION PARTITIONS PASSING PASSWORD PLACING PLANS POLICY POSITION PRECEDING PRECISION PRESERVE PREPARE PREPARED PRIMARY PRIOR PRIVILEGES PROCEDURAL PROCEDURE PROGRAM PUBLICATION@@ -2587,6 +2589,16 @@ ForValues:
$$ = (Node *) n; } + + /* a HASH partition */ + | /*EMPTY*/ + { + PartitionBoundSpec *n = makeNode(PartitionBoundSpec); + + n->strategy = PARTITION_STRATEGY_HASH; + + $$ = (Node *) n; + } ;partbound_datum:
@@ -3666,7 +3678,7 @@ OptPartitionSpec: PartitionSpec { $$ = $1; }
| /*EMPTY*/ { $$ = NULL; }
;-PartitionSpec: PARTITION BY part_strategy '(' part_params ')' +PartitionSpec: PARTITION BY part_strategy '(' part_params ')' hash_partitions hash_function { PartitionSpec *n = makeNode(PartitionSpec);@@ -3674,10 +3686,21 @@ PartitionSpec: PARTITION BY part_strategy '(' part_params ')'
n->partParams = $5;
n->location = @1;+ n->partnparts = $7; + n->hashfunc = $8; + $$ = n; } ;+hash_partitions: PARTITIONS Iconst { $$ = $2; } + | /*EMPTY*/ { $$ = -1; } + ; + +hash_function: USING handler_name { $$ = $2; } + | /*EMPTY*/ { $$ = NULL; } + ; + part_strategy: IDENT { $$ = $1; } | unreserved_keyword { $$ = pstrdup($1); } ; @@ -14377,6 +14400,7 @@ unreserved_keyword: | GLOBAL | GRANTED | HANDLER + | HASH | HEADER_P | HOLD | HOUR_P @@ -14448,6 +14472,7 @@ unreserved_keyword: | PARSER | PARTIAL | PARTITION + | PARTITIONS | PASSING | PASSWORD | PLANS diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c index ff2bab6..8e1be31 100644 --- a/src/backend/parser/parse_utilcmd.c +++ b/src/backend/parser/parse_utilcmd.c @@ -40,6 +40,7 @@ #include "catalog/pg_opclass.h" #include "catalog/pg_operator.h" #include "catalog/pg_type.h" +#include "catalog/partition.h" #include "commands/comment.h" #include "commands/defrem.h" #include "commands/tablecmds.h" @@ -3252,6 +3253,24 @@ transformPartitionBound(ParseState *pstate, Relation parent, Node *bound) ++i; } } + else if (strategy == PARTITION_STRATEGY_HASH) + { + Value *conval; + Node *value; + int index; + + if (spec->strategy != PARTITION_STRATEGY_HASH) + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("invalid bound specification for a hash partition"))); + + index = get_next_hash_partition_index(parent); + + /* store the partition index as a listdatums value */ + conval = makeInteger(index); + value = (Node *) make_const(pstate, conval, -1); + result_spec->listdatums = list_make1(value); + } else elog(ERROR, "unexpected partition strategy: %d", (int) strategy);diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index b27b77d..fab6eea 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -1423,7 +1423,7 @@ pg_get_indexdef_worker(Oid indexrelid, int colno, * * Returns the partition key specification, ie, the following: * - * PARTITION BY { RANGE | LIST } (column opt_collation opt_opclass [, ...]) + * PARTITION BY { RANGE | LIST | HASH } (column opt_collation opt_opclass [, ...]) */ Datum pg_get_partkeydef(PG_FUNCTION_ARGS) @@ -1513,6 +1513,9 @@ pg_get_partkeydef_worker(Oid relid, int prettyFlags) case PARTITION_STRATEGY_RANGE: appendStringInfo(&buf, "RANGE"); break; + case PARTITION_STRATEGY_HASH: + appendStringInfo(&buf, "HASH"); + break; default: elog(ERROR, "unexpected partition strategy: %d", (int) form->partstrat); @@ -8520,6 +8523,9 @@ get_rule_expr(Node *node, deparse_context *context, appendStringInfoString(buf, ")"); break;+ case PARTITION_STRATEGY_HASH: + break; + default: elog(ERROR, "unrecognized partition strategy: %d", (int) spec->strategy); diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index 9001e20..829e4d2 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -855,6 +855,9 @@ RelationBuildPartitionKey(Relation relation) key->strategy = form->partstrat; key->partnatts = form->partnatts;+ key->partnparts = form->partnparts; + key->parthashfunc = form->parthashfunc; + /* * We can rely on the first variable-length attribute being mapped to the * relevant field of the catalog's C struct, because all previous @@ -999,6 +1002,9 @@ copy_partition_key(PartitionKey fromkey) newkey->strategy = fromkey->strategy; newkey->partnatts = n = fromkey->partnatts;+ newkey->partnparts = fromkey->partnparts; + newkey->parthashfunc = fromkey->parthashfunc; + newkey->partattrs = (AttrNumber *) palloc(n * sizeof(AttrNumber)); memcpy(newkey->partattrs, fromkey->partattrs, n * sizeof(AttrNumber));diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h index 1187797..367e2f8 100644 --- a/src/include/catalog/heap.h +++ b/src/include/catalog/heap.h @@ -141,7 +141,7 @@ extern void StorePartitionKey(Relation rel, AttrNumber *partattrs, List *partexprs, Oid *partopclass, - Oid *partcollation); + Oid *partcollation, int16 partnparts, Oid hashfunc); extern void RemovePartitionKeyByRelId(Oid relid); extern void StorePartitionBound(Relation rel, Relation parent, Node *bound);diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h index b195d1a..80f4b0e 100644 --- a/src/include/catalog/partition.h +++ b/src/include/catalog/partition.h @@ -89,4 +89,6 @@ extern int get_partition_for_tuple(PartitionDispatch *pd, TupleTableSlot *slot, EState *estate, Oid *failed_at); +extern Expr *convert_expr_for_hash(Expr *expr, int npart, Oid hashfunc); +extern int get_next_hash_partition_index(Relation parent); #endif /* PARTITION_H */ diff --git a/src/include/catalog/pg_partitioned_table.h b/src/include/catalog/pg_partitioned_table.h index bdff36a..69e509c 100644 --- a/src/include/catalog/pg_partitioned_table.h +++ b/src/include/catalog/pg_partitioned_table.h @@ -33,6 +33,9 @@ CATALOG(pg_partitioned_table,3350) BKI_WITHOUT_OIDS char partstrat; /* partitioning strategy */ int16 partnatts; /* number of partition key columns */+ int16 partnparts;
+ Oid parthashfunc;
+
/*
* variable-length fields start here, but we allow direct access to
* partattrs via the C struct. That's because the first variable-length
@@ -49,6 +52,8 @@ CATALOG(pg_partitioned_table,3350) BKI_WITHOUT_OIDS
pg_node_tree partexprs; /* list of expressions in the partition key;
* one item for each zero entry in partattrs[] */
#endif
+
+
} FormData_pg_partitioned_table;/* ---------------- @@ -62,13 +67,15 @@ typedef FormData_pg_partitioned_table *Form_pg_partitioned_table; * compiler constants for pg_partitioned_table * ---------------- */ -#define Natts_pg_partitioned_table 7 +#define Natts_pg_partitioned_table 9 #define Anum_pg_partitioned_table_partrelid 1 #define Anum_pg_partitioned_table_partstrat 2 #define Anum_pg_partitioned_table_partnatts 3 -#define Anum_pg_partitioned_table_partattrs 4 -#define Anum_pg_partitioned_table_partclass 5 -#define Anum_pg_partitioned_table_partcollation 6 -#define Anum_pg_partitioned_table_partexprs 7 +#define Anum_pg_partitioned_table_partnparts 4 +#define Anum_pg_partitioned_table_parthashfunc 5 +#define Anum_pg_partitioned_table_partattrs 6 +#define Anum_pg_partitioned_table_partclass 7 +#define Anum_pg_partitioned_table_partcollation 8 +#define Anum_pg_partitioned_table_partexprs 9#endif /* PG_PARTITIONED_TABLE_H */ diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 5afc3eb..1c3474f 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -730,11 +730,14 @@ typedef struct PartitionSpec NodeTag type; char *strategy; /* partitioning strategy ('list' or 'range') */ List *partParams; /* List of PartitionElems */ + int partnparts; + List *hashfunc; int location; /* token location, or -1 if unknown */ } PartitionSpec;#define PARTITION_STRATEGY_LIST 'l'
#define PARTITION_STRATEGY_RANGE 'r'
+#define PARTITION_STRATEGY_HASH 'h'/* * PartitionBoundSpec - a partition bound specification diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h index 985d650..0597939 100644 --- a/src/include/parser/kwlist.h +++ b/src/include/parser/kwlist.h @@ -180,6 +180,7 @@ PG_KEYWORD("greatest", GREATEST, COL_NAME_KEYWORD) PG_KEYWORD("group", GROUP_P, RESERVED_KEYWORD) PG_KEYWORD("grouping", GROUPING, COL_NAME_KEYWORD) PG_KEYWORD("handler", HANDLER, UNRESERVED_KEYWORD) +PG_KEYWORD("hash", HASH, UNRESERVED_KEYWORD) PG_KEYWORD("having", HAVING, RESERVED_KEYWORD) PG_KEYWORD("header", HEADER_P, UNRESERVED_KEYWORD) PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD) @@ -291,6 +292,7 @@ PG_KEYWORD("parallel", PARALLEL, UNRESERVED_KEYWORD) PG_KEYWORD("parser", PARSER, UNRESERVED_KEYWORD) PG_KEYWORD("partial", PARTIAL, UNRESERVED_KEYWORD) PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD) +PG_KEYWORD("partitions", PARTITIONS, UNRESERVED_KEYWORD) PG_KEYWORD("passing", PASSING, UNRESERVED_KEYWORD) PG_KEYWORD("password", PASSWORD, UNRESERVED_KEYWORD) PG_KEYWORD("placing", PLACING, RESERVED_KEYWORD) diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h index a617a7c..660adfb 100644 --- a/src/include/utils/rel.h +++ b/src/include/utils/rel.h @@ -62,6 +62,9 @@ typedef struct PartitionKeyData Oid *partopcintype; /* OIDs of opclass declared input data types */ FmgrInfo *partsupfunc; /* lookup info for support funcs */+ int16 partnparts; /* number of hash partitions */ + Oid parthashfunc; /* OID of hash function */ + /* Partitioning collation per attribute */ Oid *partcollation;--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers--
Best regards,
Aleksander Alekseev--
Yugo Nagata <nagata@sraoss.co.jp>
--
Best regards,
Aleksander Alekseev
On 01.03.2017 05:14, Amit Langote wrote:
Nagata-san,
A partition table can be create as bellow;
CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;FOR VALUES clause cannot be used, and the partition bound is
calclulated automatically as partition index of single integer value.When trying create partitions more than the number specified
by PARTITIONS, it gets an error.postgres=# create table h4 partition of h;
ERROR: cannot create hash partition more than 3 for hInstead of having to create each partition individually, wouldn't it be
better if the following commandCREATE TABLE h (i int) PARTITION BY HASH (i) PARTITIONS 3;
created the partitions *automatically*?
It's a good idea but in this case we can't create hash-partition that is
also partitioned table, and as a consequence we are unable to create
subpartitions. My understanding is that the table can be partitioned
only using CREATE TABLE statement, not ALTER TABLE. For this reason the
new created partitions are only regular tables.
We can achieve desired result through creating a separate partitioned
table and making the DETACH/ATTACH manipulation, though. But IMO it's
not flexible case.
It would be a good thing if a regular table could be partitioned through
separate command. Then your idea would not be restrictive.
--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
We can achieve desired result through creating a separate partitioned table
and making the DETACH/ATTACH manipulation, though. But IMO it's not flexible
case.
I think it would be great to allow end user to decide. If user is
not interested in subpartitions he or she can use syntax like 'CREATE
TABLE ... PARTITION BY HAHS(i) PARTITIONS 3 CREATE AUTOMATICALLY;' or
maybe a build-in procedure for this. Otherwise there is also
ATTACH/DETACH syntax available.
Anyway all of this is something that could be discussed infinitely and
not necessarily should be included in this concrete patch. We could
probably agree that 3 or 4 separately discussed, reviewed and tested
patches are better than one huge patch that will be moved to the next
commitfest because of disagreements regarding a syntax.
On Wed, Mar 01, 2017 at 05:10:34PM +0300, Maksim Milyutin wrote:
On 01.03.2017 05:14, Amit Langote wrote:
Nagata-san,
A partition table can be create as bellow;
CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;FOR VALUES clause cannot be used, and the partition bound is
calclulated automatically as partition index of single integer value.When trying create partitions more than the number specified
by PARTITIONS, it gets an error.postgres=# create table h4 partition of h;
ERROR: cannot create hash partition more than 3 for hInstead of having to create each partition individually, wouldn't it be
better if the following commandCREATE TABLE h (i int) PARTITION BY HASH (i) PARTITIONS 3;
created the partitions *automatically*?
It's a good idea but in this case we can't create hash-partition that is
also partitioned table, and as a consequence we are unable to create
subpartitions. My understanding is that the table can be partitioned only
using CREATE TABLE statement, not ALTER TABLE. For this reason the new
created partitions are only regular tables.We can achieve desired result through creating a separate partitioned table
and making the DETACH/ATTACH manipulation, though. But IMO it's not flexible
case.It would be a good thing if a regular table could be partitioned through
separate command. Then your idea would not be restrictive.--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Best regards,
Aleksander Alekseev
Hi Aleksander ,
Thank you for reviewing the patch.
On Wed, 1 Mar 2017 17:08:49 +0300
Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote:
Hi, Yugo.
Today I've had an opportunity to take a closer look on this patch. Here are
a few things that bother me.1a) There are missing commends here:
``` --- a/src/include/catalog/pg_partitioned_table.h +++ b/src/include/catalog/pg_partitioned_table.h @@ -33,6 +33,9 @@ CATALOG(pg_partitioned_table,3350) BKI_WITHOUT_OIDS char partstrat; /* partitioning strategy */ int16 partnatts; /* number of partition key columns */+ int16 partnparts; + Oid parthashfunc; + ```1b) ... and here:
``` --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -730,11 +730,14 @@ typedef struct PartitionSpec NodeTag type; char *strategy; /* partitioning strategy ('list' or 'range') */ List *partParams; /* List of PartitionElems */ + int partnparts; + List *hashfunc; int location; /* token location, or -1 if unknown */ } PartitionSpec; ```
ok, I'll add comments for these members;
2) I believe new empty lines in patches are generally not welcomed by
community:```
@@ -49,6 +52,8 @@ CATALOG(pg_partitioned_table,3350) BKI_WITHOUT_OIDS
pg_node_tree partexprs; /* list of expressions in the partition key;
* one item for each zero entry in partattrs[] */
#endif
+
+
} FormData_pg_partitioned_table;
```
I'll remove it from the patch.
3) One test fails on my laptop (Arch Linux, x64) [1]:
``` *************** *** 344,350 **** CREATE TABLE partitioned ( a int ) PARTITION BY HASH (a); ! ERROR: unrecognized partitioning strategy "hash" -- specified column must be present in the table CREATE TABLE partitioned ( a int --- 344,350 ---- CREATE TABLE partitioned ( a int ) PARTITION BY HASH (a); ! ERROR: number of partitions must be specified for hash partition -- specified column must be present in the table CREATE TABLE partitioned ( a int ```
These are expected behaviors in the current patch. However, there
are some discussions on the specification about CREATE TABLE, so
it may be changed.
Exact script I'm using for building and testing PostgreSQL could be
found here [2].4) As I already mentioned - missing documentation.
I think writing the documentation should be waited fo the specification
getting a consensus.
In general patch looks quite good to me. I personally believe it has all
the changes to be accepted in current commitfest. Naturally if community
will come to a consensus regarding keywords, whether all partitions
should be created automatically, etc :)[1] http://afiskon.ru/s/dd/20cbe21934_regression.diffs.txt
[2] http://afiskon.ru/s/76/a4fb71739c_full-build.sh.txtOn Wed, Mar 01, 2017 at 06:10:10PM +0900, Yugo Nagata wrote:
Hi Aleksander,
On Tue, 28 Feb 2017 18:05:36 +0300
Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote:Hi, Yugo.
Looks like a great feature! I'm going to take a closer look on your code
and write a feedback shortly. For now I can only tell that you forgot
to include some documentation in the patch.Thank you for looking into it. I'm forward to your feedback.
This is a proof of concept patch and additional documentation
is not included. I'll add this after reaching a consensus
on the specification of the feature.I've added a corresponding entry to current commitfest [1]. Hope you
don't mind. If it's not too much trouble could you please register on a
commitfest site and add yourself to this entry as an author? I'm pretty
sure someone is using this information for writing release notes or
something like this.Thank you for registering it to the commitfest. I have added me as an auther.
[1] https://commitfest.postgresql.org/13/1059/
On Tue, Feb 28, 2017 at 11:33:13PM +0900, Yugo Nagata wrote:
Hi all,
Now we have a declarative partitioning, but hash partitioning is not
implemented yet. Attached is a POC patch to add the hash partitioning
feature. I know we will need more discussions about the syntax and other
specifications before going ahead the project, but I think this runnable
code might help to discuss what and how we implement this.* Description
In this patch, the hash partitioning implementation is basically based
on the list partitioning mechanism. However, partition bounds cannot be
specified explicitly, but this is used internally as hash partition
index, which is calculated when a partition is created or attached.The tentative syntax to create a partitioned table is as bellow;
CREATE TABLE h (i int) PARTITION BY HASH(i) PARTITIONS 3 USING hashint4;
The number of partitions is specified by PARTITIONS, which is currently
constant and cannot be changed, but I think this is needed to be changed in
some manner. A hash function is specified by USING. Maybe, specifying hash
function may be ommitted, and in this case, a default hash function
corresponding to key type will be used.A partition table can be create as bellow;
CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;FOR VALUES clause cannot be used, and the partition bound is
calclulated automatically as partition index of single integer value.When trying create partitions more than the number specified
by PARTITIONS, it gets an error.postgres=# create table h4 partition of h;
ERROR: cannot create hash partition more than 3 for hAn inserted record is stored in a partition whose index equals
abs(hashfunc(key)) % <number_of_partitions>. In the above
example, this is abs(hashint4(i))%3.postgres=# insert into h (select generate_series(0,20));
INSERT 0 21postgres=# select *,tableoid::regclass from h;
i | tableoid
----+----------
0 | h1
1 | h1
2 | h1
4 | h1
8 | h1
10 | h1
11 | h1
14 | h1
15 | h1
17 | h1
20 | h1
5 | h2
12 | h2
13 | h2
16 | h2
19 | h2
3 | h3
6 | h3
7 | h3
9 | h3
18 | h3
(21 rows)* Todo / discussions
In this patch, we cannot change the number of partitions specified
by PARTITIONS. I we can change this, the partitioning rule
(<partition index> = abs(hashfunc(key)) % <number_of_partitions>)
is also changed and then we need reallocatiing records between
partitions.In this patch, user can specify a hash function USING. However,
we migth need default hash functions which are useful and
proper for hash partitioning.Currently, even when we issue SELECT query with a condition,
postgres looks into all partitions regardless of each partition's
constraint, because this is complicated such like "abs(hashint4(i))%3 = 0".postgres=# explain select * from h where i = 10;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..125.62 rows=40 width=4)
-> Seq Scan on h (cost=0.00..0.00 rows=1 width=4)
Filter: (i = 10)
-> Seq Scan on h1 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
-> Seq Scan on h2 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
-> Seq Scan on h3 (cost=0.00..41.88 rows=13 width=4)
Filter: (i = 10)
(9 rows)However, if we modify a condition into a same expression
as the partitions constraint, postgres can exclude unrelated
table from search targets. So, we might avoid the problem
by converting the qual properly before calling predicate_refuted_by().postgres=# explain select * from h where abs(hashint4(i))%3 = abs(hashint4(10))%3;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..61.00 rows=14 width=4)
-> Seq Scan on h (cost=0.00..0.00 rows=1 width=4)
Filter: ((abs(hashint4(i)) % 3) = 2)
-> Seq Scan on h3 (cost=0.00..61.00 rows=13 width=4)
Filter: ((abs(hashint4(i)) % 3) = 2)
(5 rows)Best regards,
Yugo Nagata--
Yugo Nagata <nagata@sraoss.co.jp>diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 41c0056..3820920 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -3074,7 +3074,7 @@ StorePartitionKey(Relation rel, AttrNumber *partattrs, List *partexprs, Oid *partopclass, - Oid *partcollation) + Oid *partcollation, int16 partnparts, Oid hashfunc) { int i; int2vector *partattrs_vec; @@ -3121,6 +3121,8 @@ StorePartitionKey(Relation rel, values[Anum_pg_partitioned_table_partrelid - 1] = ObjectIdGetDatum(RelationGetRelid(rel)); values[Anum_pg_partitioned_table_partstrat - 1] = CharGetDatum(strategy); values[Anum_pg_partitioned_table_partnatts - 1] = Int16GetDatum(partnatts); + values[Anum_pg_partitioned_table_partnparts - 1] = Int16GetDatum(partnparts); + values[Anum_pg_partitioned_table_parthashfunc - 1] = ObjectIdGetDatum(hashfunc); values[Anum_pg_partitioned_table_partattrs - 1] = PointerGetDatum(partattrs_vec); values[Anum_pg_partitioned_table_partclass - 1] = PointerGetDatum(partopclass_vec); values[Anum_pg_partitioned_table_partcollation - 1] = PointerGetDatum(partcollation_vec); diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 4bcef58..24e69c6 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -36,6 +36,8 @@ #include "optimizer/clauses.h" #include "optimizer/planmain.h" #include "optimizer/var.h" +#include "parser/parse_func.h" +#include "parser/parse_oper.h" #include "rewrite/rewriteManip.h" #include "storage/lmgr.h" #include "utils/array.h" @@ -120,6 +122,7 @@ static int32 qsort_partition_rbound_cmp(const void *a, const void *b,static List *get_qual_for_list(PartitionKey key, PartitionBoundSpec *spec); static List *get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec); +static List *get_qual_for_hash(PartitionKey key, PartitionBoundSpec *spec); static Oid get_partition_operator(PartitionKey key, int col, StrategyNumber strategy, bool *need_relabel); static List *generate_partition_qual(Relation rel); @@ -236,7 +239,8 @@ RelationBuildPartitionDesc(Relation rel) oids[i++] = lfirst_oid(cell);/* Convert from node to the internal representation */ - if (key->strategy == PARTITION_STRATEGY_LIST) + if (key->strategy == PARTITION_STRATEGY_LIST || + key->strategy == PARTITION_STRATEGY_HASH) { List *non_null_values = NIL;@@ -251,7 +255,7 @@ RelationBuildPartitionDesc(Relation rel)
ListCell *c;
PartitionBoundSpec *spec = lfirst(cell);- if (spec->strategy != PARTITION_STRATEGY_LIST) + if (spec->strategy != key->strategy) elog(ERROR, "invalid strategy in partition bound spec");foreach(c, spec->listdatums)
@@ -464,6 +468,7 @@ RelationBuildPartitionDesc(Relation rel)
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_HASH:
{
boundinfo->has_null = found_null;
boundinfo->indexes = (int *) palloc(ndatums * sizeof(int));
@@ -829,6 +834,18 @@ check_new_partition_bound(char *relname, Relation parent, Node *bound)
break;
}+ case PARTITION_STRATEGY_HASH: + { + Assert(spec->strategy == PARTITION_STRATEGY_HASH); + + if (partdesc->nparts + 1 > key->partnparts) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("cannot create hash partition more than %d for %s", + key->partnparts, RelationGetRelationName(parent)))); + break; + } + default: elog(ERROR, "unexpected partition strategy: %d", (int) key->strategy); @@ -916,6 +933,11 @@ get_qual_from_partbound(Relation rel, Relation parent, Node *bound) my_qual = get_qual_for_range(key, spec); break;+ case PARTITION_STRATEGY_HASH: + Assert(spec->strategy == PARTITION_STRATEGY_LIST); + my_qual = get_qual_for_hash(key, spec); + break; + default: elog(ERROR, "unexpected partition strategy: %d", (int) key->strategy); @@ -1146,6 +1168,84 @@ RelationGetPartitionDispatchInfo(Relation rel, int lockmode, return pd; }+/* + * convert_expr_for_hash + * + * Converts a expr for a hash partition's constraint. + * expr is converted into 'abs(hashfunc(expr)) % npart". + * + * npart: number of partitions + * hashfunc: OID of hash function + */ +Expr * +convert_expr_for_hash(Expr *expr, int npart, Oid hashfunc) +{ + FuncExpr *func, + *abs; + Expr *modexpr; + Oid modoid; + Oid int4oid[1] = {INT4OID}; + + ParseState *pstate = make_parsestate(NULL); + Value *val_npart = makeInteger(npart); + Node *const_npart = (Node *) make_const(pstate, val_npart, -1); + + /* hash function */ + func = makeFuncExpr(hashfunc, + INT4OID, + list_make1(expr), + 0, + 0, + COERCE_EXPLICIT_CALL); + + /* Abs */ + abs = makeFuncExpr(LookupFuncName(list_make1(makeString("abs")), 1, int4oid, false), + INT4OID, + list_make1(func), + 0, + 0, + COERCE_EXPLICIT_CALL); + + /* modulo by npart */ + modoid = LookupOperName(pstate, list_make1(makeString("%")), INT4OID, INT4OID, false, -1); + modexpr = make_opclause(modoid, INT4OID, false, (Expr*)abs, (Expr*)const_npart, 0, 0); + + return modexpr; +} + + +/* + * get_next_hash_partition_index + * + * Returns the minimal index which is not used for hash partition. + */ +int +get_next_hash_partition_index(Relation parent) +{ + PartitionKey key = RelationGetPartitionKey(parent); + PartitionDesc partdesc = RelationGetPartitionDesc(parent); + + int i; + bool *used = palloc0(sizeof(int) * key->partnparts); + + /* mark used for existing partition indexs */ + for (i = 0; i < partdesc->boundinfo->ndatums; i++) + { + Datum* datum = partdesc->boundinfo->datums[i]; + int idx = DatumGetInt16(datum[0]); + + if (!used[idx]) + used[idx] = true; + } + + /* find the minimal unused index */ + for (i = 0; i < key->partnparts; i++) + if (!used[i]) + break; + + return i; +} + /* Module-local functions *//*
@@ -1467,6 +1567,43 @@ get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec)
}/* + * get_qual_for_hash + * + * Returns a list of expressions to use as a hash partition's constraint. + */ +static List * +get_qual_for_hash(PartitionKey key, PartitionBoundSpec *spec) +{ + List *result; + Expr *keyCol; + Expr *expr; + Expr *opexpr; + Oid operoid; + ParseState *pstate = make_parsestate(NULL); + + /* Left operand */ + if (key->partattrs[0] != 0) + keyCol = (Expr *) makeVar(1, + key->partattrs[0], + key->parttypid[0], + key->parttypmod[0], + key->parttypcoll[0], + 0); + else + keyCol = (Expr *) copyObject(linitial(key->partexprs)); + + expr = convert_expr_for_hash(keyCol, key->partnparts, key->parthashfunc); + + /* equals the listdaums value */ + operoid = LookupOperName(pstate, list_make1(makeString("=")), INT4OID, INT4OID, false, -1); + opexpr = make_opclause(operoid, BOOLOID, false, expr, linitial(spec->listdatums), 0, 0); + + result = list_make1(opexpr); + + return result; +} + +/* * get_partition_operator * * Return oid of the operator of given strategy for a given partition key @@ -1730,6 +1867,11 @@ get_partition_for_tuple(PartitionDispatch *pd, (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), errmsg("range partition key of row contains null"))); } + else if (key->strategy == PARTITION_STRATEGY_HASH) + { + values[0] = OidFunctionCall1(key->parthashfunc, values[0]); + values[0] = Int16GetDatum(Abs(DatumGetInt16(values[0])) % key->partnparts); + }if (partdesc->boundinfo->has_null && isnull[0]) /* Tuple maps to the null-accepting list partition */ @@ -1744,6 +1886,7 @@ get_partition_for_tuple(PartitionDispatch *pd, switch (key->strategy) { case PARTITION_STRATEGY_LIST: + case PARTITION_STRATEGY_HASH: if (cur_offset >= 0 && equal) cur_index = partdesc->boundinfo->indexes[cur_offset]; else @@ -1968,6 +2111,7 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo, switch (key->strategy) { case PARTITION_STRATEGY_LIST: + case PARTITION_STRATEGY_HASH: cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0], key->partcollation[0], bound_datums[0], diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 3cea220..5a28cc0 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -41,6 +41,7 @@ #include "catalog/pg_inherits_fn.h" #include "catalog/pg_namespace.h" #include "catalog/pg_opclass.h" +#include "catalog/pg_proc.h" #include "catalog/pg_tablespace.h" #include "catalog/pg_trigger.h" #include "catalog/pg_type.h" @@ -77,6 +78,7 @@ #include "parser/parse_oper.h" #include "parser/parse_relation.h" #include "parser/parse_type.h" +#include "parser/parse_func.h" #include "parser/parse_utilcmd.h" #include "parser/parser.h" #include "pgstat.h" @@ -450,7 +452,7 @@ static void RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); static bool is_partition_attr(Relation rel, AttrNumber attnum, bool *used_in_expr); static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy); -static void ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, +static void ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, Oid *partatttypes, List **partexprs, Oid *partopclass, Oid *partcollation); static void CreateInheritance(Relation child_rel, Relation parent_rel); static void RemoveInheritance(Relation child_rel, Relation parent_rel); @@ -799,8 +801,10 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, AttrNumber partattrs[PARTITION_MAX_KEYS]; Oid partopclass[PARTITION_MAX_KEYS]; Oid partcollation[PARTITION_MAX_KEYS]; + Oid partatttypes[PARTITION_MAX_KEYS]; List *partexprs = NIL; List *cmds = NIL; + Oid hashfuncOid = InvalidOid;/* * We need to transform the raw parsetrees corresponding to partition @@ -811,15 +815,40 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, stmt->partspec = transformPartitionSpec(rel, stmt->partspec, &strategy); ComputePartitionAttrs(rel, stmt->partspec->partParams, - partattrs, &partexprs, partopclass, + partattrs, partatttypes, &partexprs, partopclass, partcollation);partnatts = list_length(stmt->partspec->partParams); + + if (strategy == PARTITION_STRATEGY_HASH) + { + Oid funcrettype; + + if (partnatts != 1) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of partition key must be 1 for hash partition"))); + + hashfuncOid = LookupFuncName(stmt->partspec->hashfunc, 1, partatttypes, false); + funcrettype = get_func_rettype(hashfuncOid); + if (funcrettype != INT4OID) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("hash function for partitioning must return integer"))); + + if (func_volatile(hashfuncOid) != PROVOLATILE_IMMUTABLE) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("hash function for partitioning must be marked IMMUTABLE"))); + + } + StorePartitionKey(rel, strategy, partnatts, partattrs, partexprs, - partopclass, partcollation); + partopclass, partcollation, stmt->partspec->partnparts, hashfuncOid);- /* Force key columns to be NOT NULL when using range partitioning */ - if (strategy == PARTITION_STRATEGY_RANGE) + /* Force key columns to be NOT NULL when using range or hash partitioning */ + if (strategy == PARTITION_STRATEGY_RANGE || + strategy == PARTITION_STRATEGY_HASH) { for (i = 0; i < partnatts; i++) { @@ -12783,18 +12812,51 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) newspec->strategy = partspec->strategy; newspec->location = partspec->location; newspec->partParams = NIL; + newspec->partnparts = partspec->partnparts; + newspec->hashfunc = partspec->hashfunc;/* Parse partitioning strategy name */ if (!pg_strcasecmp(partspec->strategy, "list")) *strategy = PARTITION_STRATEGY_LIST; else if (!pg_strcasecmp(partspec->strategy, "range")) *strategy = PARTITION_STRATEGY_RANGE; + else if (!pg_strcasecmp(partspec->strategy, "hash")) + *strategy = PARTITION_STRATEGY_HASH; else ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("unrecognized partitioning strategy \"%s\"", partspec->strategy)));+ if (*strategy == PARTITION_STRATEGY_HASH) + { + if (partspec->partnparts < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of partitions must be specified for hash partition"))); + else if (partspec->partnparts == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of partitions must be greater than 0"))); + + if (list_length(partspec->hashfunc) == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("hash function must be specified for hash partition"))); + } + else + { + if (partspec->partnparts >= 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of partitions can be specified only for hash partition"))); + + if (list_length(partspec->hashfunc) > 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("hash function can be specified only for hash partition"))); + } + /* * Create a dummy ParseState and insert the target relation as its sole * rangetable entry. We need a ParseState for transformExpr. @@ -12843,7 +12905,7 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) * Compute per-partition-column information from a list of PartitionElem's */ static void -ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, +ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, Oid *partatttypes, List **partexprs, Oid *partopclass, Oid *partcollation) { int attn; @@ -13010,6 +13072,7 @@ ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs, "btree", BTREE_AM_OID);+ partatttypes[attn] = atttype; attn++; } } diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 05d8538..f4febc9 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -4232,6 +4232,8 @@ _copyPartitionSpec(const PartitionSpec *from)COPY_STRING_FIELD(strategy);
COPY_NODE_FIELD(partParams);
+ COPY_SCALAR_FIELD(partnparts);
+ COPY_NODE_FIELD(hashfunc);
COPY_LOCATION_FIELD(location);return newnode; diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index d595cd7..d589eac 100644 --- a/src/backend/nodes/equalfuncs.c +++ b/src/backend/nodes/equalfuncs.c @@ -2725,6 +2725,8 @@ _equalPartitionSpec(const PartitionSpec *a, const PartitionSpec *b) { COMPARE_STRING_FIELD(strategy); COMPARE_NODE_FIELD(partParams); + COMPARE_SCALAR_FIELD(partnparts); + COMPARE_NODE_FIELD(hashfunc); COMPARE_LOCATION_FIELD(location);return true; diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index b3802b4..d6db80e 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -3318,6 +3318,8 @@ _outPartitionSpec(StringInfo str, const PartitionSpec *node)WRITE_STRING_FIELD(strategy);
WRITE_NODE_FIELD(partParams);
+ WRITE_INT_FIELD(partnparts);
+ WRITE_NODE_FIELD(hashfunc);
WRITE_LOCATION_FIELD(location);
}diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index e833b2e..b67140d 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -574,6 +574,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type <list> partbound_datum_list %type <partrange_datum> PartitionRangeDatum %type <list> range_datum_list +%type <ival> hash_partitions +%type <list> hash_function/*
* Non-keyword token types. These are hard-wired into the "flex" lexer.
@@ -627,7 +629,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);GLOBAL GRANT GRANTED GREATEST GROUP_P GROUPING
- HANDLER HAVING HEADER_P HOLD HOUR_P + HANDLER HASH HAVING HEADER_P HOLD HOUR_PIDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
@@ -651,7 +653,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
OBJECT_P OF OFF OFFSET OIDS OLD ON ONLY OPERATOR OPTION OPTIONS OR
ORDER ORDINALITY OUT_P OUTER_P OVER OVERLAPS OVERLAY OWNED OWNER- PARALLEL PARSER PARTIAL PARTITION PASSING PASSWORD PLACING PLANS POLICY + PARALLEL PARSER PARTIAL PARTITION PARTITIONS PASSING PASSWORD PLACING PLANS POLICY POSITION PRECEDING PRECISION PRESERVE PREPARE PREPARED PRIMARY PRIOR PRIVILEGES PROCEDURAL PROCEDURE PROGRAM PUBLICATION@@ -2587,6 +2589,16 @@ ForValues:
$$ = (Node *) n; } + + /* a HASH partition */ + | /*EMPTY*/ + { + PartitionBoundSpec *n = makeNode(PartitionBoundSpec); + + n->strategy = PARTITION_STRATEGY_HASH; + + $$ = (Node *) n; + } ;partbound_datum:
@@ -3666,7 +3678,7 @@ OptPartitionSpec: PartitionSpec { $$ = $1; }
| /*EMPTY*/ { $$ = NULL; }
;-PartitionSpec: PARTITION BY part_strategy '(' part_params ')' +PartitionSpec: PARTITION BY part_strategy '(' part_params ')' hash_partitions hash_function { PartitionSpec *n = makeNode(PartitionSpec);@@ -3674,10 +3686,21 @@ PartitionSpec: PARTITION BY part_strategy '(' part_params ')'
n->partParams = $5;
n->location = @1;+ n->partnparts = $7; + n->hashfunc = $8; + $$ = n; } ;+hash_partitions: PARTITIONS Iconst { $$ = $2; } + | /*EMPTY*/ { $$ = -1; } + ; + +hash_function: USING handler_name { $$ = $2; } + | /*EMPTY*/ { $$ = NULL; } + ; + part_strategy: IDENT { $$ = $1; } | unreserved_keyword { $$ = pstrdup($1); } ; @@ -14377,6 +14400,7 @@ unreserved_keyword: | GLOBAL | GRANTED | HANDLER + | HASH | HEADER_P | HOLD | HOUR_P @@ -14448,6 +14472,7 @@ unreserved_keyword: | PARSER | PARTIAL | PARTITION + | PARTITIONS | PASSING | PASSWORD | PLANS diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c index ff2bab6..8e1be31 100644 --- a/src/backend/parser/parse_utilcmd.c +++ b/src/backend/parser/parse_utilcmd.c @@ -40,6 +40,7 @@ #include "catalog/pg_opclass.h" #include "catalog/pg_operator.h" #include "catalog/pg_type.h" +#include "catalog/partition.h" #include "commands/comment.h" #include "commands/defrem.h" #include "commands/tablecmds.h" @@ -3252,6 +3253,24 @@ transformPartitionBound(ParseState *pstate, Relation parent, Node *bound) ++i; } } + else if (strategy == PARTITION_STRATEGY_HASH) + { + Value *conval; + Node *value; + int index; + + if (spec->strategy != PARTITION_STRATEGY_HASH) + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("invalid bound specification for a hash partition"))); + + index = get_next_hash_partition_index(parent); + + /* store the partition index as a listdatums value */ + conval = makeInteger(index); + value = (Node *) make_const(pstate, conval, -1); + result_spec->listdatums = list_make1(value); + } else elog(ERROR, "unexpected partition strategy: %d", (int) strategy);diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index b27b77d..fab6eea 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -1423,7 +1423,7 @@ pg_get_indexdef_worker(Oid indexrelid, int colno, * * Returns the partition key specification, ie, the following: * - * PARTITION BY { RANGE | LIST } (column opt_collation opt_opclass [, ...]) + * PARTITION BY { RANGE | LIST | HASH } (column opt_collation opt_opclass [, ...]) */ Datum pg_get_partkeydef(PG_FUNCTION_ARGS) @@ -1513,6 +1513,9 @@ pg_get_partkeydef_worker(Oid relid, int prettyFlags) case PARTITION_STRATEGY_RANGE: appendStringInfo(&buf, "RANGE"); break; + case PARTITION_STRATEGY_HASH: + appendStringInfo(&buf, "HASH"); + break; default: elog(ERROR, "unexpected partition strategy: %d", (int) form->partstrat); @@ -8520,6 +8523,9 @@ get_rule_expr(Node *node, deparse_context *context, appendStringInfoString(buf, ")"); break;+ case PARTITION_STRATEGY_HASH: + break; + default: elog(ERROR, "unrecognized partition strategy: %d", (int) spec->strategy); diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index 9001e20..829e4d2 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -855,6 +855,9 @@ RelationBuildPartitionKey(Relation relation) key->strategy = form->partstrat; key->partnatts = form->partnatts;+ key->partnparts = form->partnparts; + key->parthashfunc = form->parthashfunc; + /* * We can rely on the first variable-length attribute being mapped to the * relevant field of the catalog's C struct, because all previous @@ -999,6 +1002,9 @@ copy_partition_key(PartitionKey fromkey) newkey->strategy = fromkey->strategy; newkey->partnatts = n = fromkey->partnatts;+ newkey->partnparts = fromkey->partnparts; + newkey->parthashfunc = fromkey->parthashfunc; + newkey->partattrs = (AttrNumber *) palloc(n * sizeof(AttrNumber)); memcpy(newkey->partattrs, fromkey->partattrs, n * sizeof(AttrNumber));diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h index 1187797..367e2f8 100644 --- a/src/include/catalog/heap.h +++ b/src/include/catalog/heap.h @@ -141,7 +141,7 @@ extern void StorePartitionKey(Relation rel, AttrNumber *partattrs, List *partexprs, Oid *partopclass, - Oid *partcollation); + Oid *partcollation, int16 partnparts, Oid hashfunc); extern void RemovePartitionKeyByRelId(Oid relid); extern void StorePartitionBound(Relation rel, Relation parent, Node *bound);diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h index b195d1a..80f4b0e 100644 --- a/src/include/catalog/partition.h +++ b/src/include/catalog/partition.h @@ -89,4 +89,6 @@ extern int get_partition_for_tuple(PartitionDispatch *pd, TupleTableSlot *slot, EState *estate, Oid *failed_at); +extern Expr *convert_expr_for_hash(Expr *expr, int npart, Oid hashfunc); +extern int get_next_hash_partition_index(Relation parent); #endif /* PARTITION_H */ diff --git a/src/include/catalog/pg_partitioned_table.h b/src/include/catalog/pg_partitioned_table.h index bdff36a..69e509c 100644 --- a/src/include/catalog/pg_partitioned_table.h +++ b/src/include/catalog/pg_partitioned_table.h @@ -33,6 +33,9 @@ CATALOG(pg_partitioned_table,3350) BKI_WITHOUT_OIDS char partstrat; /* partitioning strategy */ int16 partnatts; /* number of partition key columns */+ int16 partnparts;
+ Oid parthashfunc;
+
/*
* variable-length fields start here, but we allow direct access to
* partattrs via the C struct. That's because the first variable-length
@@ -49,6 +52,8 @@ CATALOG(pg_partitioned_table,3350) BKI_WITHOUT_OIDS
pg_node_tree partexprs; /* list of expressions in the partition key;
* one item for each zero entry in partattrs[] */
#endif
+
+
} FormData_pg_partitioned_table;/* ---------------- @@ -62,13 +67,15 @@ typedef FormData_pg_partitioned_table *Form_pg_partitioned_table; * compiler constants for pg_partitioned_table * ---------------- */ -#define Natts_pg_partitioned_table 7 +#define Natts_pg_partitioned_table 9 #define Anum_pg_partitioned_table_partrelid 1 #define Anum_pg_partitioned_table_partstrat 2 #define Anum_pg_partitioned_table_partnatts 3 -#define Anum_pg_partitioned_table_partattrs 4 -#define Anum_pg_partitioned_table_partclass 5 -#define Anum_pg_partitioned_table_partcollation 6 -#define Anum_pg_partitioned_table_partexprs 7 +#define Anum_pg_partitioned_table_partnparts 4 +#define Anum_pg_partitioned_table_parthashfunc 5 +#define Anum_pg_partitioned_table_partattrs 6 +#define Anum_pg_partitioned_table_partclass 7 +#define Anum_pg_partitioned_table_partcollation 8 +#define Anum_pg_partitioned_table_partexprs 9#endif /* PG_PARTITIONED_TABLE_H */ diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 5afc3eb..1c3474f 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -730,11 +730,14 @@ typedef struct PartitionSpec NodeTag type; char *strategy; /* partitioning strategy ('list' or 'range') */ List *partParams; /* List of PartitionElems */ + int partnparts; + List *hashfunc; int location; /* token location, or -1 if unknown */ } PartitionSpec;#define PARTITION_STRATEGY_LIST 'l'
#define PARTITION_STRATEGY_RANGE 'r'
+#define PARTITION_STRATEGY_HASH 'h'/* * PartitionBoundSpec - a partition bound specification diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h index 985d650..0597939 100644 --- a/src/include/parser/kwlist.h +++ b/src/include/parser/kwlist.h @@ -180,6 +180,7 @@ PG_KEYWORD("greatest", GREATEST, COL_NAME_KEYWORD) PG_KEYWORD("group", GROUP_P, RESERVED_KEYWORD) PG_KEYWORD("grouping", GROUPING, COL_NAME_KEYWORD) PG_KEYWORD("handler", HANDLER, UNRESERVED_KEYWORD) +PG_KEYWORD("hash", HASH, UNRESERVED_KEYWORD) PG_KEYWORD("having", HAVING, RESERVED_KEYWORD) PG_KEYWORD("header", HEADER_P, UNRESERVED_KEYWORD) PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD) @@ -291,6 +292,7 @@ PG_KEYWORD("parallel", PARALLEL, UNRESERVED_KEYWORD) PG_KEYWORD("parser", PARSER, UNRESERVED_KEYWORD) PG_KEYWORD("partial", PARTIAL, UNRESERVED_KEYWORD) PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD) +PG_KEYWORD("partitions", PARTITIONS, UNRESERVED_KEYWORD) PG_KEYWORD("passing", PASSING, UNRESERVED_KEYWORD) PG_KEYWORD("password", PASSWORD, UNRESERVED_KEYWORD) PG_KEYWORD("placing", PLACING, RESERVED_KEYWORD) diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h index a617a7c..660adfb 100644 --- a/src/include/utils/rel.h +++ b/src/include/utils/rel.h @@ -62,6 +62,9 @@ typedef struct PartitionKeyData Oid *partopcintype; /* OIDs of opclass declared input data types */ FmgrInfo *partsupfunc; /* lookup info for support funcs */+ int16 partnparts; /* number of hash partitions */ + Oid parthashfunc; /* OID of hash function */ + /* Partitioning collation per attribute */ Oid *partcollation;--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers--
Best regards,
Aleksander Alekseev--
Yugo Nagata <nagata@sraoss.co.jp>--
Best regards,
Aleksander Alekseev
--
Yugo Nagata <nagata@sraoss.co.jp>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Mar 1, 2017 at 3:50 PM, Yugo Nagata <nagata@sraoss.co.jp> wrote:
[....]
I Agree that it is unavoidable partitions number in modulo hashing,
but we can do in other hashing technique. Have you had thought about
Linear hashing[1] or Consistent hashing[2]? This will allow us to
add/drop
partition with minimal row moment. Thank you for your information of hash technique. I'll see them
and try to allowing the number of partitions to be changed.
Thanks for showing interest, I was also talking about this with Robert Haas
and
hacking on this, here is what we came up with this.
If we want to introduce hash partitioning without syntax contort and minimal
movement while changing hash partitions (ADD-DROP/ATTACH-DETACH operation),
at start I thought we could pick up linear hashing, because of in both the
hashing we might need to move approx tot_num_of_tuple/tot_num_of_partitions
tuples at adding new partition and no row moment required at dropping
partitioning.
With further thinking and talking through the idea of using linear hashing
with my team, we realized that has some problems specially during pg_dump
and pg_upgrade. Both a regular pg_dump and the binary-upgrade version of
pg_dump which is used by pg_restore need to maintain the identity of the
partitions. We can't rely on things like OID order which may be unstable, or
name order which might not match the order in which partitions were added.
So
somehow the partition position would need to be specified explicitly.
So later we came up with some syntax like this (just fyi, this doesn't add
any new keywords):
create table foo (a integer, b text) partition by hash (a);
create table foo1 partition of foo with (modulus 4, remainder 0);
create table foo2 partition of foo with (modulus 8, remainder 1); --
legal, modulus doesn't need to match
create table foo3 partition of foo with (modulus 8, remainder 4); --
illegal, overlaps foo1
Here we need to enforce a rule that every modulus must be a factor of the
next
larger modulus. So, for example, if you have a bunch of partitions that all
have
modulus 5, you can add a new partition with modulus 10 or a new partition
with
modulus 15, but you cannot add both a partition with modulus 10 and a
partition
with modulus 15, because 10 is not a factor of 15. However, you could
simultaneously use modulus 4, modulus 8, modulus 16, and modulus 32 if you
wished, because each modulus is a factor of the next larger one. You could
also use modulus 10, modulus 20, and modulus 60. But you could not use
modulus
10, modulus 15, and modulus 60, because while both of the smaller module are
factors of 60, it is not true that each is a factor of the next.
Other advantages with this rule are:
1. Dropping (or detaching) and adding (or attaching) a partition can never
cause the rule to be violated.
2. We can easily build a tuple-routing data structure based on the largest
modulus.
For example: If the user has
partition 1 with (modulus 2, remainder 1),
partition 2 with (modulus 4, remainder 2),
partition 3 with (modulus 8, remainder 0) and
partition 4 with (modulus 8, remainder 4),
then we can build the following tuple routing array in the relcache:
== lookup table for hashvalue % 8 ==
0 => p3
1 => p1
2 => p2
3 => p1
4 => p4
5 => p1
6 => p2
7 => p1
3. It's also quite easy to test with a proposed new partition overlaps with
any
existing partition. Just build the mapping array and see if you ever end up
trying to assign a partition to a slot that's already been assigned to some
other partition.
We can still work on the proposed syntax - and I am open for suggestions.
One
more thought is to use FOR VALUES HAVING like:
CREATE TABLE foo1 PARTITION OF foo FOR VALUES HAVING (modulus 2, remainder
1);
But still more thoughts/inputs welcome here.
Attached patch implements former syntax, here is quick demonstration:
1.CREATE :
create table foo (a integer, b text) partition by hash (a);
create table foo1 partition of foo with (modulus 2, remainder 1);
create table foo2 partition of foo with (modulus 4, remainder 2);
create table foo3 partition of foo with (modulus 8, remainder 0);
create table foo4 partition of foo with (modulus 8, remainder 4);
2. Display parent table info:
postgres=# \d+ foo
Table "public.foo"
Column | Type | Collation | Nullable | Default | Storage | Stats
target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
a | integer | | | | plain |
|
b | text | | | | extended |
|
Partition key: HASH (a)
Partitions: foo1 WITH (modulus 2, remainder 1),
foo2 WITH (modulus 4, remainder 2),
foo3 WITH (modulus 8, remainder 0),
foo4 WITH (modulus 8, remainder 4)
3. Display child table info:
postgres=# \d+ foo1
Table "public.foo1"
Column | Type | Collation | Nullable | Default | Storage | Stats
target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
a | integer | | | | plain |
|
b | text | | | | extended |
|
Partition of: foo WITH (modulus 2, remainder 1)
4. INSERT:
postgres=# insert into foo select i, 'abc' from generate_series(1,10) i;
INSERT 0 10
postgres=# select tableoid::regclass as part, * from foo;
part | a | b
------+----+-----
foo1 | 3 | abc
foo1 | 4 | abc
foo1 | 7 | abc
foo1 | 10 | abc
foo2 | 1 | abc
foo2 | 2 | abc
foo2 | 9 | abc
foo3 | 6 | abc
foo4 | 5 | abc
foo4 | 8 | abc
(10 rows)
TODOs.
1. Maybe need some work in the CREATE TABLE .. PARTITION OF .. syntax.
2. Trim regression tests (if require).
3. Documentation
Thoughts/Comments?
Attachments:
hash-partitioning_another_design-v1.patchapplication/octet-stream; name=hash-partitioning_another_design-v1.patchDownload+1264-86
On Thu, 2 Mar 2017 18:33:42 +0530
amul sul <sulamul@gmail.com> wrote:
Thank you for the patch. This is very interesting. I'm going to look
into your code and write a feedback later.
On Wed, Mar 1, 2017 at 3:50 PM, Yugo Nagata <nagata@sraoss.co.jp> wrote:
[....]
I Agree that it is unavoidable partitions number in modulo hashing,
but we can do in other hashing technique. Have you had thought about
Linear hashing[1] or Consistent hashing[2]? This will allow us to
add/drop
partition with minimal row moment. Thank you for your information of hash technique. I'll see them
and try to allowing the number of partitions to be changed.
Thanks for showing interest, I was also talking about this with Robert Haas
and
hacking on this, here is what we came up with this.If we want to introduce hash partitioning without syntax contort and minimal
movement while changing hash partitions (ADD-DROP/ATTACH-DETACH operation),
at start I thought we could pick up linear hashing, because of in both the
hashing we might need to move approx tot_num_of_tuple/tot_num_of_partitions
tuples at adding new partition and no row moment required at dropping
partitioning.With further thinking and talking through the idea of using linear hashing
with my team, we realized that has some problems specially during pg_dump
and pg_upgrade. Both a regular pg_dump and the binary-upgrade version of
pg_dump which is used by pg_restore need to maintain the identity of the
partitions. We can't rely on things like OID order which may be unstable, or
name order which might not match the order in which partitions were added.
So
somehow the partition position would need to be specified explicitly.So later we came up with some syntax like this (just fyi, this doesn't add
any new keywords):create table foo (a integer, b text) partition by hash (a);
create table foo1 partition of foo with (modulus 4, remainder 0);
create table foo2 partition of foo with (modulus 8, remainder 1); --
legal, modulus doesn't need to match
create table foo3 partition of foo with (modulus 8, remainder 4); --
illegal, overlaps foo1Here we need to enforce a rule that every modulus must be a factor of the
next
larger modulus. So, for example, if you have a bunch of partitions that all
have
modulus 5, you can add a new partition with modulus 10 or a new partition
with
modulus 15, but you cannot add both a partition with modulus 10 and a
partition
with modulus 15, because 10 is not a factor of 15. However, you could
simultaneously use modulus 4, modulus 8, modulus 16, and modulus 32 if you
wished, because each modulus is a factor of the next larger one. You could
also use modulus 10, modulus 20, and modulus 60. But you could not use
modulus
10, modulus 15, and modulus 60, because while both of the smaller module are
factors of 60, it is not true that each is a factor of the next.Other advantages with this rule are:
1. Dropping (or detaching) and adding (or attaching) a partition can never
cause the rule to be violated.2. We can easily build a tuple-routing data structure based on the largest
modulus.For example: If the user has
partition 1 with (modulus 2, remainder 1),
partition 2 with (modulus 4, remainder 2),
partition 3 with (modulus 8, remainder 0) and
partition 4 with (modulus 8, remainder 4),then we can build the following tuple routing array in the relcache:
== lookup table for hashvalue % 8 ==
0 => p3
1 => p1
2 => p2
3 => p1
4 => p4
5 => p1
6 => p2
7 => p13. It's also quite easy to test with a proposed new partition overlaps with
any
existing partition. Just build the mapping array and see if you ever end up
trying to assign a partition to a slot that's already been assigned to some
other partition.We can still work on the proposed syntax - and I am open for suggestions.
One
more thought is to use FOR VALUES HAVING like:
CREATE TABLE foo1 PARTITION OF foo FOR VALUES HAVING (modulus 2, remainder
1);But still more thoughts/inputs welcome here.
Attached patch implements former syntax, here is quick demonstration:
1.CREATE :
create table foo (a integer, b text) partition by hash (a);
create table foo1 partition of foo with (modulus 2, remainder 1);
create table foo2 partition of foo with (modulus 4, remainder 2);
create table foo3 partition of foo with (modulus 8, remainder 0);
create table foo4 partition of foo with (modulus 8, remainder 4);2. Display parent table info:
postgres=# \d+ foo
Table "public.foo"
Column | Type | Collation | Nullable | Default | Storage | Stats
target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
a | integer | | | | plain |
|
b | text | | | | extended |
|
Partition key: HASH (a)
Partitions: foo1 WITH (modulus 2, remainder 1),
foo2 WITH (modulus 4, remainder 2),
foo3 WITH (modulus 8, remainder 0),
foo4 WITH (modulus 8, remainder 4)3. Display child table info:
postgres=# \d+ foo1
Table "public.foo1"
Column | Type | Collation | Nullable | Default | Storage | Stats
target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
a | integer | | | | plain |
|
b | text | | | | extended |
|
Partition of: foo WITH (modulus 2, remainder 1)4. INSERT:
postgres=# insert into foo select i, 'abc' from generate_series(1,10) i;
INSERT 0 10postgres=# select tableoid::regclass as part, * from foo;
part | a | b
------+----+-----
foo1 | 3 | abc
foo1 | 4 | abc
foo1 | 7 | abc
foo1 | 10 | abc
foo2 | 1 | abc
foo2 | 2 | abc
foo2 | 9 | abc
foo3 | 6 | abc
foo4 | 5 | abc
foo4 | 8 | abc
(10 rows)TODOs.
1. Maybe need some work in the CREATE TABLE .. PARTITION OF .. syntax.
2. Trim regression tests (if require).
3. DocumentationThoughts/Comments?
--
Yugo Nagata <nagata@sraoss.co.jp>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2 March 2017 at 13:03, amul sul <sulamul@gmail.com> wrote:
create table foo (a integer, b text) partition by hash (a);
create table foo1 partition of foo with (modulus 4, remainder 0);
create table foo2 partition of foo with (modulus 8, remainder 1); -- legal,
modulus doesn't need to match
create table foo3 partition of foo with (modulus 8, remainder 4); --
illegal, overlaps foo1
Instead of using modulus, why not just divide up the range of hash
keys using ranges? That should be just as good for a good hash
function (effectively using the high bits instead of the low bits of
the hash value). And it would mean you could reuse the machinery for
list partitioning for partition exclusion.
It also has the advantage that it's easier to see how to add more
partitions. You just split all the ranges and (and migrate the
data...). There's even the possibility of having uneven partitions if
you have a data distribution skew -- which can happen even if you have
a good hash function. In a degenerate case you could have a partition
for a single hash of a particularly common value then a reasonable
number of partitions for the remaining hash ranges.
--
greg
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Mar 3, 2017 at 5:00 PM, Greg Stark <stark@mit.edu> wrote:
On 2 March 2017 at 13:03, amul sul <sulamul@gmail.com> wrote:
create table foo (a integer, b text) partition by hash (a);
create table foo1 partition of foo with (modulus 4, remainder 0);
create table foo2 partition of foo with (modulus 8, remainder 1); --legal,
modulus doesn't need to match
create table foo3 partition of foo with (modulus 8, remainder 4); --
illegal, overlaps foo1Instead of using modulus, why not just divide up the range of hash
keys using ranges?
That should be just as good for a good hash
function (effectively using the high bits instead of the low bits of
the hash value). And it would mean you could reuse the machinery for
list partitioning for partition exclusion.
It also has the advantage that it's easier to see how to add more
partitions. You just split all the ranges and (and migrate the
data...). There's even the possibility of having uneven partitions if
you have a data distribution skew -- which can happen even if you have
a good hash function. In a degenerate case you could have a partition
for a single hash of a particularly common value then a reasonable
number of partitions for the remaining hash ranges.
Initially
we
had
to have
somewhat similar thought to make a range of hash
values for
each partition, using the same half-open interval syntax we use in general:
create table foo (a integer, b text) partition by hash (a);
create table foo1 partition of foo for values from (0) to (1073741824);
create table foo2 partition of foo for values from (1073741824) to
(-2147483648);
create table foo3 partition of foo for values from (-2147483648) to
(-1073741824);
create table foo4 partition of foo for values from (-1073741824) to (0);
That's really nice for the system, but not so much for the users. The
system can
now generate each partition constraint correctly immediately upon seeing
the SQL
statement for the corresponding table, which is very desirable. However,
users are
not likely to know that the magic numbers to distribute keys equally across
four
partitions are 1073741824, -2147483648, and -1073741824.
So it's pretty
user-unfriendly.
Regards,
Amul
On 3/3/17 8:33 AM, amul sul wrote:
On Fri, Mar 3, 2017 at 5:00 PM, Greg Stark <stark@mit.edu
It also has the advantage that it's easier to see how to add more
partitions. You just split all the ranges and (and migrate the
data...). There's even the possibility of having uneven partitions if
you have a data distribution skew -- which can happen even if you have
a good hash function. In a degenerate case you could have a partition
for a single hash of a particularly common value then a reasonable
number of partitions for the remaining hash ranges.Initially
we
had
to have
somewhat similar thought to make a range of hash
values for
each partition, using the same half-open interval syntax we use in general:
<...>
So it's pretty
user-unfriendly.
This patch is marked as POC and after a read-through I agree that's
exactly what it is. As such, I'm not sure it belongs in the last
commitfest. Furthermore, there has not been any activity or a new patch
in a while and we are halfway through the CF.
Please post an explanation for the delay and a schedule for the new
patch. If no patch or explanation is posted by 2017-03-17 AoE I will
mark this submission "Returned with Feedback".
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Mar 14, 2017 at 10:08 AM, David Steele <david@pgmasters.net> wrote:
This patch is marked as POC and after a read-through I agree that's
exactly what it is.
Just out of curiosity, were you looking at Nagata-san's patch, or Amul's?
As such, I'm not sure it belongs in the last
commitfest. Furthermore, there has not been any activity or a new patch
in a while and we are halfway through the CF.Please post an explanation for the delay and a schedule for the new
patch. If no patch or explanation is posted by 2017-03-17 AoE I will
mark this submission "Returned with Feedback".
Regrettably, I do think it's too late to squeeze hash partitioning
into v10, but I plan to try to get something committed for v11. I was
heavily involved in the design of Amul's patch, and I think that
design solves several problems that would be an issue for us if we did
as Nagata-san is proposing. For example, he proposed this:
CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;
That looks OK if you are thinking of typing this in interactively, but
if you're doing a pg_dump, maybe with --binary-upgrade, you don't want
the meaning of a series of nearly-identical SQL commands to depend on
the dump ordering. You want it to be explicit in the SQL command
which partition is which, and Amul's patch solves that problem. Also,
Nagata-san's proposal doesn't provide any way to increase the number
of partitions later, and Amul's approach gives you some options there.
I'm not sure those options are as good as we'd like them to be, and if
not then we may need to revise the approach, but I'm pretty sure
having no strategy at all for changing the partition count is not good
enough.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 3/15/17 12:25 PM, Robert Haas wrote:
On Tue, Mar 14, 2017 at 10:08 AM, David Steele <david@pgmasters.net> wrote:
This patch is marked as POC and after a read-through I agree that's
exactly what it is.Just out of curiosity, were you looking at Nagata-san's patch, or Amul's?
Both - what I was looking for was some kind of reconciliation between
the two patches and I didn't find that. It seemed from the thread that
Yugo intended to pull Amul's changes/idea into his patch.
As such, I'm not sure it belongs in the last
commitfest. Furthermore, there has not been any activity or a new patch
in a while and we are halfway through the CF.Please post an explanation for the delay and a schedule for the new
patch. If no patch or explanation is posted by 2017-03-17 AoE I will
mark this submission "Returned with Feedback".Regrettably, I do think it's too late to squeeze hash partitioning
into v10, but I plan to try to get something committed for v11.
It would certainly be a nice feature to have.
I was
heavily involved in the design of Amul's patch, and I think that
design solves several problems that would be an issue for us if we did
as Nagata-san is proposing. For example, he proposed this:CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;That looks OK if you are thinking of typing this in interactively, but
if you're doing a pg_dump, maybe with --binary-upgrade, you don't want
the meaning of a series of nearly-identical SQL commands to depend on
the dump ordering. You want it to be explicit in the SQL command
which partition is which, and Amul's patch solves that problem.
OK, it wasn't clear to me that this was the case because of the stated
user-unfriendliness.
Also,
Nagata-san's proposal doesn't provide any way to increase the number
of partitions later, and Amul's approach gives you some options there.
I'm not sure those options are as good as we'd like them to be, and if
not then we may need to revise the approach, but I'm pretty sure
having no strategy at all for changing the partition count is not good
enough.
Agreed. Perhaps both types of syntax should be supported, one that is
friendly to users and one that is precise for dump tools and those who
care get in the weeds.
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers