Multi-Column List Partitioning

Started by Nitin Jadhavalmost 5 years ago52 messageshackers
Jump to latest
#1Nitin Jadhav
nitinjadhavpostgres@gmail.com

Hi,

While reviewing one of the 'Table partitioning' related patches, I found
that Postgres does not support multiple column based LIST partitioning.
Based on this understanding, I have started working on this feature. I also
feel that 'Multi-Column List Partitioning' can be benefited to the Postgres
users in future.

I am attaching the WIP patch for this feature here. It supports
'Multi-Column List Partitioning', however some tasks are still pending. I
would like to know your thoughts about this, So that I can continue the
work with improvising the current patch.

Following things are handled in the patch.
1. Syntax

CREATE TABLE table_name (attrs) PARTITION BY LIST(list_of_columns);

Earlier there was no provision to mention multiple columns as part of the
'list_of_columns' clause. Now we can mention the list of columns separated
by comma.

CREATE TABLE table_name_p1 PARTITION OF table_name FOR VALUES IN
list_of_values.

Whereas list_of_columns can be
a. (value [,...])
b. (value [,...]) [,...]

I would like to list a few examples here for better understanding.
Ex-1:
CREATE TABLE t1(a int) PARTITION BY LIST(a);
CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN (1, 2, 10, 5, 7);

Ex-2:
CREATE TABLE t2(a int, b int) PARTITION BY LIST(a,b);
CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2, 2),(2,
10);

Please share if any changes are required in the above syntax.

2. Modified transformation logic to support above syntax.

3. Modified the data structures to store the information caused by above
syntax. Also modified the searching logic to route the tuple to the
appropriate partition.

4. Done a few basic testing and verified CREATE TABLE, INSERT INTO and
SELECT are working fine.

Following items are pending and I am working on it.

1. Handling of 'NULL' values.

2. Support multi column case in partition pruning.

3. Add test cases to the regression test suite.

Please share your thoughts.

Thanks & Regards,
Nitin Jadhav

Attachments:

v0_multi_column_list_partitioning.patchapplication/octet-stream; name=v0_multi_column_list_partitioning.patchDownload+312-104
#2Jeevan Ladhe
jeevan.ladhe@enterprisedb.com
In reply to: Nitin Jadhav (#1)
Re: Multi-Column List Partitioning

While reviewing one of the 'Table partitioning' related patches,
I found that Postgres does not support multiple column based LIST
partitioning. Based on this understanding, I have started working on
this feature. I also feel that 'Multi-Column List Partitioning' can
be benefited to the Postgres users in future.

+1 for the feature. I also think this can help users deal with some
useful cases.

CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2,
2),(2, 10);

IMHO, listing every single tuple like this might be a bit cumbersome for
the user. What about something like this:

...FOR VALUES IN (1, 2, 3, 4), (11, 22, 33, 44), where the first set
is the list for values of column A and second list is for column B. We
can treat these lists as A X B possible values or simply (a1, b1), (a2,
b2) internally. However I see other proprietary databases already have
syntax something similar that you are proposing here. So, I leave it
open for the thoughts from experts. Also, though what I propose might be
easy from a user perspective, but might not be that easy for
implementation, given that for a larger number of columns in partition list
e.g. A X B X C X D lists become unmanageable.

I did not review the patch in detail, but a quick look at it leaves me
with following comments:

1.

+ * list. Then this function will continue the serach and return the

index of
Typo:
s/serach/search

2.
A compiler warning:
partprune.c: In function ‘get_matching_list_bounds’:
partprune.c:2731:20: error: passing argument 5 of ‘partition_list_bsearch’
makes pointer from integer without a cast [-Werror=int-conversion]
2731 | nvalues, value, &is_equal);
| ^~~~~
| |
| Datum {aka long unsigned int}
In file included from partprune.c:53:
../../../src/include/partitioning/partbounds.h:120:32: note: expected
‘Datum *’ {aka ‘long unsigned int *’} but argument is of type ‘Datum’ {aka
‘long unsigned int’}
120 | int nvalues, Datum *value, bool *is_equal);
| ~~~~~~~^~~~~

3.
And, a server crash with following case:
postgres=# CREATE TABLE t1 (a int) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TABLE t1p1 PARTITION OF t1 FOR VALUES IN (1, 2, 3);
CREATE TABLE
postgres=# \d+ t1p1
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?>

Stacktrace:
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f5d273c5859 in __GI_abort () at abort.c:79
#2 0x000055779d2eb69d in ExceptionalCondition
(conditionName=0x55779d4978d8 "ptr == NULL || nodeTag(ptr) == type",
errorType=0x55779d4978c3 "FailedAssertion",
fileName=0x55779d4978a0 "../../../src/include/nodes/nodes.h",
lineNumber=603) at assert.c:69
#3 0x000055779d03a684 in castNodeImpl (type=T_Const, ptr=0x55779e457b18)
at ../../../src/include/nodes/nodes.h:603
#4 0x000055779d04368a in get_qual_for_list (parent=0x7f5d1df829b8,
spec=0x55779e457950) at partbounds.c:4155
#5 0x000055779d03ac60 in get_qual_from_partbound (rel=0x7f5d1df82570,
parent=0x7f5d1df829b8, spec=0x55779e457950) at partbounds.c:272
#6 0x000055779d2cf630 in generate_partition_qual (rel=0x7f5d1df82570) at
partcache.c:379
#7 0x000055779d2cf468 in get_partition_qual_relid (relid=32771) at
partcache.c:308
#8 0x000055779d2592bf in pg_get_partition_constraintdef
(fcinfo=0x55779e44ee50) at ruleutils.c:2019
#9 0x000055779cec7221 in ExecInterpExpr (state=0x55779e44dfb0,
econtext=0x55779e407fe8, isnull=0x7ffddf9b109f) at execExprInterp.c:744
#10 0x000055779cec954f in ExecInterpExprStillValid (state=0x55779e44dfb0,
econtext=0x55779e407fe8, isNull=0x7ffddf9b109f) at execExprInterp.c:1819
#11 0x000055779cf1d58a in ExecEvalExprSwitchContext (state=0x55779e44dfb0,
econtext=0x55779e407fe8, isNull=0x7ffddf9b109f)
at ../../../src/include/executor/executor.h:338
#12 0x000055779cf1d602 in ExecProject (projInfo=0x55779e44dfa8) at
../../../src/include/executor/executor.h:372
#13 0x000055779cf1db2f in ExecNestLoop (pstate=0x55779e407ed0) at
nodeNestloop.c:241
#14 0x000055779cedf136 in ExecProcNodeFirst (node=0x55779e407ed0) at
execProcnode.c:462
#15 0x000055779ced3053 in ExecProcNode (node=0x55779e407ed0) at
../../../src/include/executor/executor.h:257
#16 0x000055779ced5a87 in ExecutePlan (estate=0x55779e407c80,
planstate=0x55779e407ed0, use_parallel_mode=false, operation=CMD_SELECT,
sendTuples=true, numberTuples=0,
direction=ForwardScanDirection, dest=0x55779e425a88, execute_once=true)
at execMain.c:1551
#17 0x000055779ced372d in standard_ExecutorRun (queryDesc=0x55779e453520,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:361
#18 0x000055779ced353c in ExecutorRun (queryDesc=0x55779e453520,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:305
#19 0x000055779d13d287 in PortalRunSelect (portal=0x55779e398800,
forward=true, count=0, dest=0x55779e425a88) at pquery.c:912
#20 0x000055779d13cec0 in PortalRun (portal=0x55779e398800,
count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x55779e425a88, altdest=0x55779e425a88,
qc=0x7ffddf9b14f0) at pquery.c:756
#21 0x000055779d1361ce in exec_simple_query (
query_string=0x55779e3367a0 "SELECT inhparent::pg_catalog.regclass,\n
pg_catalog.pg_get_expr(c.relpartbound, c.oid),\n inhdetachpending,\n
pg_catalog.pg_get_partition_constraintdef(c.oid)\nFROM pg_catalog.pg_class
c JOIN pg_catalo"...) at postgres.c:1214
#22 0x000055779d13ad8b in PostgresMain (argc=1, argv=0x7ffddf9b1710,
dbname=0x55779e3626f8 "postgres", username=0x55779e3626d8 "hadoop") at
postgres.c:4476
#23 0x000055779d0674d3 in BackendRun (port=0x55779e358380) at
postmaster.c:4488
#24 0x000055779d066d8c in BackendStartup (port=0x55779e358380) at
postmaster.c:4210
#25 0x000055779d062f9b in ServerLoop () at postmaster.c:1742
#26 0x000055779d062734 in PostmasterMain (argc=3, argv=0x55779e3308b0) at
postmaster.c:1414
#27 0x000055779cf5805f in main (argc=3, argv=0x55779e3308b0) at main.c:209

Regards,
Jeevan Ladhe

On Thu, May 6, 2021 at 7:33 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com>
wrote:

Show quoted text

Hi,

While reviewing one of the 'Table partitioning' related patches, I found
that Postgres does not support multiple column based LIST partitioning.
Based on this understanding, I have started working on this feature. I also
feel that 'Multi-Column List Partitioning' can be benefited to the Postgres
users in future.

I am attaching the WIP patch for this feature here. It supports
'Multi-Column List Partitioning', however some tasks are still pending. I
would like to know your thoughts about this, So that I can continue the
work with improvising the current patch.

Following things are handled in the patch.
1. Syntax

CREATE TABLE table_name (attrs) PARTITION BY LIST(list_of_columns);

Earlier there was no provision to mention multiple columns as part of the
'list_of_columns' clause. Now we can mention the list of columns separated
by comma.

CREATE TABLE table_name_p1 PARTITION OF table_name FOR VALUES IN
list_of_values.

Whereas list_of_columns can be
a. (value [,...])
b. (value [,...]) [,...]

I would like to list a few examples here for better understanding.
Ex-1:
CREATE TABLE t1(a int) PARTITION BY LIST(a);
CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN (1, 2, 10, 5, 7);

Ex-2:
CREATE TABLE t2(a int, b int) PARTITION BY LIST(a,b);
CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2, 2),(2,
10);

Please share if any changes are required in the above syntax.

2. Modified transformation logic to support above syntax.

3. Modified the data structures to store the information caused by above
syntax. Also modified the searching logic to route the tuple to the
appropriate partition.

4. Done a few basic testing and verified CREATE TABLE, INSERT INTO and
SELECT are working fine.

Following items are pending and I am working on it.

1. Handling of 'NULL' values.

2. Support multi column case in partition pruning.

3. Add test cases to the regression test suite.

Please share your thoughts.

Thanks & Regards,
Nitin Jadhav

#3Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Jeevan Ladhe (#2)
Re: Multi-Column List Partitioning

Thanks Jeevan for looking into this thread.

I did not review the patch in detail, but a quick look at it leaves me
with following comments:

I will incorporate these changes.

...FOR VALUES IN (1, 2, 3, 4), (11, 22, 33, 44), where the first set
is the list for values of column A and second list is for column B. We
can treat these lists as A X B possible values or simply (a1, b1), (a2,
b2) internally. However I see other proprietary databases already have
syntax something similar that you are proposing here. So, I leave it
open for the thoughts from experts. Also, though what I propose might be
easy from a user perspective, but might not be that easy for
implementation, given that for a larger number of columns in partition

list

e.g. A X B X C X D lists become unmanageable.

I feel this is also not easy from a user's perspective. For example
for a partition
with 2 partition keys (a,b) for values like (1,1), (1,2), (1,3),
(1,4),(1,5). This
would be converted to (1,1,1,1,1), (1,2,3,4,5). It is difficult to match
the values
of column 'a' to 'b'. Anyways let's wait for the other's opinion about this.

Thanks & Regards,
Nitin Jadhav

On Fri, May 7, 2021 at 7:36 PM Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
wrote:

Show quoted text

While reviewing one of the 'Table partitioning' related patches,
I found that Postgres does not support multiple column based LIST
partitioning. Based on this understanding, I have started working on
this feature. I also feel that 'Multi-Column List Partitioning' can
be benefited to the Postgres users in future.

+1 for the feature. I also think this can help users deal with some
useful cases.

CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2,
2),(2, 10);

IMHO, listing every single tuple like this might be a bit cumbersome for
the user. What about something like this:

...FOR VALUES IN (1, 2, 3, 4), (11, 22, 33, 44), where the first set
is the list for values of column A and second list is for column B. We
can treat these lists as A X B possible values or simply (a1, b1), (a2,
b2) internally. However I see other proprietary databases already have
syntax something similar that you are proposing here. So, I leave it
open for the thoughts from experts. Also, though what I propose might be
easy from a user perspective, but might not be that easy for
implementation, given that for a larger number of columns in partition list
e.g. A X B X C X D lists become unmanageable.

I did not review the patch in detail, but a quick look at it leaves me
with following comments:

1.

+ * list. Then this function will continue the serach and return the

index of
Typo:
s/serach/search

2.
A compiler warning:
partprune.c: In function ‘get_matching_list_bounds’:
partprune.c:2731:20: error: passing argument 5 of ‘partition_list_bsearch’
makes pointer from integer without a cast [-Werror=int-conversion]
2731 | nvalues, value, &is_equal);
| ^~~~~
| |
| Datum {aka long unsigned int}
In file included from partprune.c:53:
../../../src/include/partitioning/partbounds.h:120:32: note: expected
‘Datum *’ {aka ‘long unsigned int *’} but argument is of type ‘Datum’ {aka
‘long unsigned int’}
120 | int nvalues, Datum *value, bool *is_equal);
| ~~~~~~~^~~~~

3.
And, a server crash with following case:
postgres=# CREATE TABLE t1 (a int) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TABLE t1p1 PARTITION OF t1 FOR VALUES IN (1, 2, 3);
CREATE TABLE
postgres=# \d+ t1p1
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?>

Stacktrace:
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f5d273c5859 in __GI_abort () at abort.c:79
#2 0x000055779d2eb69d in ExceptionalCondition
(conditionName=0x55779d4978d8 "ptr == NULL || nodeTag(ptr) == type",
errorType=0x55779d4978c3 "FailedAssertion",
fileName=0x55779d4978a0 "../../../src/include/nodes/nodes.h",
lineNumber=603) at assert.c:69
#3 0x000055779d03a684 in castNodeImpl (type=T_Const, ptr=0x55779e457b18)
at ../../../src/include/nodes/nodes.h:603
#4 0x000055779d04368a in get_qual_for_list (parent=0x7f5d1df829b8,
spec=0x55779e457950) at partbounds.c:4155
#5 0x000055779d03ac60 in get_qual_from_partbound (rel=0x7f5d1df82570,
parent=0x7f5d1df829b8, spec=0x55779e457950) at partbounds.c:272
#6 0x000055779d2cf630 in generate_partition_qual (rel=0x7f5d1df82570) at
partcache.c:379
#7 0x000055779d2cf468 in get_partition_qual_relid (relid=32771) at
partcache.c:308
#8 0x000055779d2592bf in pg_get_partition_constraintdef
(fcinfo=0x55779e44ee50) at ruleutils.c:2019
#9 0x000055779cec7221 in ExecInterpExpr (state=0x55779e44dfb0,
econtext=0x55779e407fe8, isnull=0x7ffddf9b109f) at execExprInterp.c:744
#10 0x000055779cec954f in ExecInterpExprStillValid (state=0x55779e44dfb0,
econtext=0x55779e407fe8, isNull=0x7ffddf9b109f) at execExprInterp.c:1819
#11 0x000055779cf1d58a in ExecEvalExprSwitchContext (state=0x55779e44dfb0,
econtext=0x55779e407fe8, isNull=0x7ffddf9b109f)
at ../../../src/include/executor/executor.h:338
#12 0x000055779cf1d602 in ExecProject (projInfo=0x55779e44dfa8) at
../../../src/include/executor/executor.h:372
#13 0x000055779cf1db2f in ExecNestLoop (pstate=0x55779e407ed0) at
nodeNestloop.c:241
#14 0x000055779cedf136 in ExecProcNodeFirst (node=0x55779e407ed0) at
execProcnode.c:462
#15 0x000055779ced3053 in ExecProcNode (node=0x55779e407ed0) at
../../../src/include/executor/executor.h:257
#16 0x000055779ced5a87 in ExecutePlan (estate=0x55779e407c80,
planstate=0x55779e407ed0, use_parallel_mode=false, operation=CMD_SELECT,
sendTuples=true, numberTuples=0,
direction=ForwardScanDirection, dest=0x55779e425a88,
execute_once=true) at execMain.c:1551
#17 0x000055779ced372d in standard_ExecutorRun (queryDesc=0x55779e453520,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:361
#18 0x000055779ced353c in ExecutorRun (queryDesc=0x55779e453520,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:305
#19 0x000055779d13d287 in PortalRunSelect (portal=0x55779e398800,
forward=true, count=0, dest=0x55779e425a88) at pquery.c:912
#20 0x000055779d13cec0 in PortalRun (portal=0x55779e398800,
count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x55779e425a88, altdest=0x55779e425a88,
qc=0x7ffddf9b14f0) at pquery.c:756
#21 0x000055779d1361ce in exec_simple_query (
query_string=0x55779e3367a0 "SELECT inhparent::pg_catalog.regclass,\n
pg_catalog.pg_get_expr(c.relpartbound, c.oid),\n inhdetachpending,\n
pg_catalog.pg_get_partition_constraintdef(c.oid)\nFROM pg_catalog.pg_class
c JOIN pg_catalo"...) at postgres.c:1214
#22 0x000055779d13ad8b in PostgresMain (argc=1, argv=0x7ffddf9b1710,
dbname=0x55779e3626f8 "postgres", username=0x55779e3626d8 "hadoop") at
postgres.c:4476
#23 0x000055779d0674d3 in BackendRun (port=0x55779e358380) at
postmaster.c:4488
#24 0x000055779d066d8c in BackendStartup (port=0x55779e358380) at
postmaster.c:4210
#25 0x000055779d062f9b in ServerLoop () at postmaster.c:1742
#26 0x000055779d062734 in PostmasterMain (argc=3, argv=0x55779e3308b0) at
postmaster.c:1414
#27 0x000055779cf5805f in main (argc=3, argv=0x55779e3308b0) at main.c:209

Regards,
Jeevan Ladhe

On Thu, May 6, 2021 at 7:33 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com>
wrote:

Hi,

While reviewing one of the 'Table partitioning' related patches, I found
that Postgres does not support multiple column based LIST partitioning.
Based on this understanding, I have started working on this feature. I also
feel that 'Multi-Column List Partitioning' can be benefited to the Postgres
users in future.

I am attaching the WIP patch for this feature here. It supports
'Multi-Column List Partitioning', however some tasks are still pending. I
would like to know your thoughts about this, So that I can continue the
work with improvising the current patch.

Following things are handled in the patch.
1. Syntax

CREATE TABLE table_name (attrs) PARTITION BY LIST(list_of_columns);

Earlier there was no provision to mention multiple columns as part of the
'list_of_columns' clause. Now we can mention the list of columns separated
by comma.

CREATE TABLE table_name_p1 PARTITION OF table_name FOR VALUES IN
list_of_values.

Whereas list_of_columns can be
a. (value [,...])
b. (value [,...]) [,...]

I would like to list a few examples here for better understanding.
Ex-1:
CREATE TABLE t1(a int) PARTITION BY LIST(a);
CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN (1, 2, 10, 5, 7);

Ex-2:
CREATE TABLE t2(a int, b int) PARTITION BY LIST(a,b);
CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2,
2),(2, 10);

Please share if any changes are required in the above syntax.

2. Modified transformation logic to support above syntax.

3. Modified the data structures to store the information caused by above
syntax. Also modified the searching logic to route the tuple to the
appropriate partition.

4. Done a few basic testing and verified CREATE TABLE, INSERT INTO and
SELECT are working fine.

Following items are pending and I am working on it.

1. Handling of 'NULL' values.

2. Support multi column case in partition pruning.

3. Add test cases to the regression test suite.

Please share your thoughts.

Thanks & Regards,
Nitin Jadhav

#4Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Nitin Jadhav (#1)
Re: Multi-Column List Partitioning

Hello Nitin,

On Thu, May 6, 2021 at 11:03 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

Hi,

While reviewing one of the 'Table partitioning' related patches, I found that Postgres does not support multiple column based LIST partitioning. Based on this understanding, I have started working on this feature. I also feel that 'Multi-Column List Partitioning' can be benefited to the Postgres users in future.

Yes, it would be nice to have this. Thanks for picking this up.

I am attaching the WIP patch for this feature here. It supports 'Multi-Column List Partitioning', however some tasks are still pending. I would like to know your thoughts about this, So that I can continue the work with improvising the current patch.

Following things are handled in the patch.
1. Syntax

CREATE TABLE table_name (attrs) PARTITION BY LIST(list_of_columns);

Earlier there was no provision to mention multiple columns as part of the 'list_of_columns' clause. Now we can mention the list of columns separated by comma.

CREATE TABLE table_name_p1 PARTITION OF table_name FOR VALUES IN list_of_values.

Whereas list_of_columns can be
a. (value [,...])
b. (value [,...]) [,...]

I would like to list a few examples here for better understanding.
Ex-1:
CREATE TABLE t1(a int) PARTITION BY LIST(a);
CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN (1, 2, 10, 5, 7);

Ex-2:
CREATE TABLE t2(a int, b int) PARTITION BY LIST(a,b);
CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2, 2),(2, 10);

Hmm, why not have parentheses around these lists, that is: (
(list_of_values) [, ...] )

So your example would look like this:

CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN ((1, 2), (1, 5), (2,
2), (2, 10));

IMO, it is not such a bad syntax from a user's PoV. It's not hard to
understand from this syntax that the partition constraint is something
like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs
row-wise comparison.

I will now take a look at the patch itself.

--
Amit Langote
EDB: http://www.enterprisedb.com

#5Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#4)
Re: Multi-Column List Partitioning

On Fri, May 21, 2021 at 1:02 PM Amit Langote <amitlangote09@gmail.com> wrote:

I will now take a look at the patch itself.

Some quick observations:

* I get a lot of instances of the following 2 warnings when compiling
the patched code:

Warning #1:

partprune.c: In function ‘get_matching_list_bounds’:
partprune.c:2731:11: warning: passing argument 5 of
‘partition_list_bsearch’ makes pointer from integer without a cast
[enabled by default]
nvalues, value, &is_equal);
^
In file included from partprune.c:53:0:
../../../src/include/partitioning/partbounds.h:117:12: note: expected
‘Datum *’ but argument is of type ‘Datum’
extern int partition_list_bsearch(FmgrInfo *partsupfunc,

Warning #2:

partprune.c:2781:12: warning: incompatible integer to pointer
conversion passing 'Datum'
(aka 'unsigned long') to parameter of type 'Datum *' (aka
'unsigned long *'); take the
address with & [-Wint-conversion]

value, &is_equal);

^~~~~

&
../../../src/include/partitioning/partbounds.h:120:32: note: passing
argument to parameter 'value'
here
...int nvalues, Datum *value, bool *is_equal);

* I think this code:

===
/* Get the only column's name in case we need to output an error */
if (key->partattrs[0] != 0)
colname = get_attname(RelationGetRelid(parent),
key->partattrs[0], false);
else
colname = deparse_expression((Node *) linitial(partexprs),

deparse_context_for(RelationGetRelationName(parent),

RelationGetRelid(parent)),
false, false);
/* Need its type data too */
coltype = get_partition_col_typid(key, 0);
coltypmod = get_partition_col_typmod(key, 0);
partcollation = get_partition_col_collation(key, 0);
===

belongs in the new function transformPartitionListBounds that you
added, because without doing so, any errors having to do with
partitioning columns other than the first one will report the first
column's name in the error message:

postgres=# create table foo (a bool, b bool) partition by list (a, b);
CREATE TABLE

-- this is fine!
postgres=# create table foo_true_true partition of foo for values in (1, true);
ERROR: specified value cannot be cast to type boolean for column "a"
LINE 1: ...able foo_true_true partition of foo for values in (1, true);

-- not this!
postgres=# create table foo_true_true partition of foo for values in (true, 1);
ERROR: specified value cannot be cast to type boolean for column "a"
LINE 1: ...able foo_true_true partition of foo for values in (true, 1);

* The following prototype of transformPartitionListBounds() means that
all values in a given bound list are analyzed with the first
partitioning column's colname, type, typmod, etc., which is wrong:

+static List *
+transformPartitionListBounds(ParseState *pstate, PartitionBoundSpec *spec,
+                            char *colname, Oid coltype, int32 coltypmod,
+                            Oid partcollation, int partnatts)
+{

An example of wrong behavior because of that:

postgres=# create table foo (a bool, b text) partition by list (a, b);
CREATE TABLE
Time: 3.967 ms
postgres=# create table foo_true_true partition of foo for values in
(true, 'whatever');
ERROR: invalid input syntax for type boolean: "whatever"
LINE 1: ...o_true_true partition of foo for values in (true, 'whatever'...

"whatever" should've been accepted but because it's checked with a's
type, it is wrongly flagged.

Please take a look at how transformPartitionRangeBound() handles this,
especially how it uses the correct partitioning column's info to
analyze the corresponding bound value expression.

I will continue looking next week.

--
Amit Langote
EDB: http://www.enterprisedb.com

#6Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Amit Langote (#5)
Re: Multi-Column List Partitioning

Yes, it would be nice to have this. Thanks for picking this up.

Thanks for confirming.

Some quick observations:

Thanks for providing the comments. I will handle these cases.

Hmm, why not have parentheses around these lists, that is: (
(list_of_values) [, ...] )

So your example would look like this:

CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN ((1, 2), (1, 5), (2,
2), (2, 10));

I am ok with this syntax. This would be more appropriate.

IMO, it is not such a bad syntax from a user's PoV. It's not hard to
understand from this syntax that the partition constraint is something
like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs
row-wise comparison.

Thanks for suggesting to use row-wise comparison. I have few queries
with respect to handling of NULL values.

1. What should be the partition constraint for the above case. AFAIK,
row-wise comparison wont work with NULL values as shown in [1]postgres@15890=#SELECT ROW(1, 2) = ROW(1, 2); ?column? ---------- t (1 row). I mean
two rows are considered equal if all their corresponding members are
non-null and equal. The rows are unequal if any corresponding members
are non-null and unequal. Otherwise the result of the row comparison
is unknown (null). So we should generate different types of
constraints for NULL values.

Ex:
CREATE TABLE t(a int, b int) PARTITION BY LIST(a,b);
CREATE TABLE t_1 PARTITION OF t FOR VALUES IN (1, 1), (1, NULL),
(NULL, 1), (NULL, NULL);

As per my knowledge, we should consider creating partition constraints
for the above example as given below.

(a, b) = (1, 1) OR ((a = 1) AND (b IS NULL)) OR ((a IS NULL) AND (b =
1)) OR ((a is NULL) AND (b is NULL)).

Kindly correct me if I am wrong.

2. In the current code we don't put the NULL value in the 'datums'
field of 'PartitionBoundInfoData' structure [2]typedef struct PartitionBoundInfoData { char strategy; /* hash, list or range? */ int ndatums; /* Length of the datums[] array */ Datum **datums; PartitionRangeDatumKind **kind; /* The kind of each range bound datum; * NULL for hash and list partitioned * tables */ int nindexes; /* Length of the indexes[] array */ int *indexes; /* Partition indexes */ int null_index; /* Index of the null-accepting partition; -1 * if there isn't one */ int default_index; /* Index of the default partition; -1 if there * isn't one */ } PartitionBoundInfoData;. Since there can be
only one NULL value, we directly store the corresponding index value
in the 'null_index' field. Now we have to handle multiple NULL values
in case of Multi-Column List Partitioning. So the question is how to
handle this scenario. Following are the 2 approaches to handle this.

Approach-1:
Add another field 'bool **isnull' in [2]typedef struct PartitionBoundInfoData { char strategy; /* hash, list or range? */ int ndatums; /* Length of the datums[] array */ Datum **datums; PartitionRangeDatumKind **kind; /* The kind of each range bound datum; * NULL for hash and list partitioned * tables */ int nindexes; /* Length of the indexes[] array */ int *indexes; /* Partition indexes */ int null_index; /* Index of the null-accepting partition; -1 * if there isn't one */ int default_index; /* Index of the default partition; -1 if there * isn't one */ } PartitionBoundInfoData; and mark the corresponding
element to TRUE if it has NULL value and the corresponding location in
'datums' contains empty/No value. For example, If a partition bound is
(1, NULL), then

datums[0][0] = 1
datums[0][1]postgres@15890=#SELECT ROW(1, 2) = ROW(1, 2); ?column? ---------- t (1 row) = Not assigned any value
isnull[0][0] = FALSE
is null[0][1]postgres@15890=#SELECT ROW(1, 2) = ROW(1, 2); ?column? ---------- t (1 row) = TRUE

So now we have an entry in the 'datums' field for a bound containing
NULL value, so we should handle this in all the scenarios where we are
manipulating 'datums' in order to support NULL values and avoid crash.

Approach-2:
Don't add the bound information to 'datums' field of [2]typedef struct PartitionBoundInfoData { char strategy; /* hash, list or range? */ int ndatums; /* Length of the datums[] array */ Datum **datums; PartitionRangeDatumKind **kind; /* The kind of each range bound datum; * NULL for hash and list partitioned * tables */ int nindexes; /* Length of the indexes[] array */ int *indexes; /* Partition indexes */ int null_index; /* Index of the null-accepting partition; -1 * if there isn't one */ int default_index; /* Index of the default partition; -1 if there * isn't one */ } PartitionBoundInfoData; if any of the
value is NULL. Store this information separately in the structures
mentioned in [3]typedef struct NullBoundDatumInfo { Datum *datum; int col_index; int. bound_index; } NullBoundDatumInfo; and process accordingly.

I feel approach-1 is the better solution as this requires less code
changes and easy to implement than approach-2. Kindly share your
thoughts about the approaches and please share if you have any better
solution than the above 2.

[1]: postgres@15890=#SELECT ROW(1, 2) = ROW(1, 2); ?column? ---------- t (1 row)
postgres@15890=#SELECT ROW(1, 2) = ROW(1, 2);
?column?
----------
t
(1 row)

postgres@15890=#SELECT ROW(1, 2) = ROW(1, 1);
?column?
----------
f
(1 row)

postgres@15890=#SELECT ROW(1, NULL) = ROW(1, NULL);
?column?
----------

(1 row)

postgres@15890=#SELECT ROW(1, 2) = ROW(1, NULL);
?column?
----------

(1 row)

[2]: typedef struct PartitionBoundInfoData { char strategy; /* hash, list or range? */ int ndatums; /* Length of the datums[] array */ Datum **datums; PartitionRangeDatumKind **kind; /* The kind of each range bound datum; * NULL for hash and list partitioned * tables */ int nindexes; /* Length of the indexes[] array */ int *indexes; /* Partition indexes */ int null_index; /* Index of the null-accepting partition; -1 * if there isn't one */ int default_index; /* Index of the default partition; -1 if there * isn't one */ } PartitionBoundInfoData;
typedef struct PartitionBoundInfoData
{
char strategy; /* hash, list or range? */
int ndatums; /* Length of the datums[] array */
Datum **datums;
PartitionRangeDatumKind **kind; /* The kind of each range bound datum;
* NULL for hash and list partitioned
* tables */
int nindexes; /* Length of the indexes[] array */
int *indexes; /* Partition indexes */
int null_index; /* Index of the null-accepting partition; -1
* if there isn't one */
int default_index; /* Index of the default partition; -1 if there
* isn't one */
} PartitionBoundInfoData;

[3]: typedef struct NullBoundDatumInfo { Datum *datum; int col_index; int. bound_index; } NullBoundDatumInfo;
typedef struct NullBoundDatumInfo
{
Datum *datum;
int col_index;
int. bound_index;
} NullBoundDatumInfo;

typedef struct NullBoundIsNullInfo
{
int col_index;
int. bound_index;
} NullBoundIsNullInfo;

Add 2 fields of type 'NullBoundDatumInfo' and 'NullBoundIsNullInfo' to
the structure [2]typedef struct PartitionBoundInfoData { char strategy; /* hash, list or range? */ int ndatums; /* Length of the datums[] array */ Datum **datums; PartitionRangeDatumKind **kind; /* The kind of each range bound datum; * NULL for hash and list partitioned * tables */ int nindexes; /* Length of the indexes[] array */ int *indexes; /* Partition indexes */ int null_index; /* Index of the null-accepting partition; -1 * if there isn't one */ int default_index; /* Index of the default partition; -1 if there * isn't one */ } PartitionBoundInfoData;.

--
Thanks & Regards,
Nitin Jadhav

Show quoted text

On Fri, May 21, 2021 at 5:47 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Fri, May 21, 2021 at 1:02 PM Amit Langote <amitlangote09@gmail.com> wrote:

I will now take a look at the patch itself.

Some quick observations:

* I get a lot of instances of the following 2 warnings when compiling
the patched code:

Warning #1:

partprune.c: In function ‘get_matching_list_bounds’:
partprune.c:2731:11: warning: passing argument 5 of
‘partition_list_bsearch’ makes pointer from integer without a cast
[enabled by default]
nvalues, value, &is_equal);
^
In file included from partprune.c:53:0:
../../../src/include/partitioning/partbounds.h:117:12: note: expected
‘Datum *’ but argument is of type ‘Datum’
extern int partition_list_bsearch(FmgrInfo *partsupfunc,

Warning #2:

partprune.c:2781:12: warning: incompatible integer to pointer
conversion passing 'Datum'
(aka 'unsigned long') to parameter of type 'Datum *' (aka
'unsigned long *'); take the
address with & [-Wint-conversion]

value, &is_equal);

^~~~~

&
../../../src/include/partitioning/partbounds.h:120:32: note: passing
argument to parameter 'value'
here
...int nvalues, Datum *value, bool *is_equal);

* I think this code:

===
/* Get the only column's name in case we need to output an error */
if (key->partattrs[0] != 0)
colname = get_attname(RelationGetRelid(parent),
key->partattrs[0], false);
else
colname = deparse_expression((Node *) linitial(partexprs),

deparse_context_for(RelationGetRelationName(parent),

RelationGetRelid(parent)),
false, false);
/* Need its type data too */
coltype = get_partition_col_typid(key, 0);
coltypmod = get_partition_col_typmod(key, 0);
partcollation = get_partition_col_collation(key, 0);
===

belongs in the new function transformPartitionListBounds that you
added, because without doing so, any errors having to do with
partitioning columns other than the first one will report the first
column's name in the error message:

postgres=# create table foo (a bool, b bool) partition by list (a, b);
CREATE TABLE

-- this is fine!
postgres=# create table foo_true_true partition of foo for values in (1, true);
ERROR: specified value cannot be cast to type boolean for column "a"
LINE 1: ...able foo_true_true partition of foo for values in (1, true);

-- not this!
postgres=# create table foo_true_true partition of foo for values in (true, 1);
ERROR: specified value cannot be cast to type boolean for column "a"
LINE 1: ...able foo_true_true partition of foo for values in (true, 1);

* The following prototype of transformPartitionListBounds() means that
all values in a given bound list are analyzed with the first
partitioning column's colname, type, typmod, etc., which is wrong:

+static List *
+transformPartitionListBounds(ParseState *pstate, PartitionBoundSpec *spec,
+                            char *colname, Oid coltype, int32 coltypmod,
+                            Oid partcollation, int partnatts)
+{

An example of wrong behavior because of that:

postgres=# create table foo (a bool, b text) partition by list (a, b);
CREATE TABLE
Time: 3.967 ms
postgres=# create table foo_true_true partition of foo for values in
(true, 'whatever');
ERROR: invalid input syntax for type boolean: "whatever"
LINE 1: ...o_true_true partition of foo for values in (true, 'whatever'...

"whatever" should've been accepted but because it's checked with a's
type, it is wrongly flagged.

Please take a look at how transformPartitionRangeBound() handles this,
especially how it uses the correct partitioning column's info to
analyze the corresponding bound value expression.

I will continue looking next week.

--
Amit Langote
EDB: http://www.enterprisedb.com

#7Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Nitin Jadhav (#6)
Re: Multi-Column List Partitioning

On Sun, May 23, 2021 at 6:49 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

IMO, it is not such a bad syntax from a user's PoV. It's not hard to
understand from this syntax that the partition constraint is something
like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs
row-wise comparison.

Thanks for suggesting to use row-wise comparison.

Actually, I was just describing how the *users* may want to visualize
the partition constraint...

I have few queries
with respect to handling of NULL values.

1. What should be the partition constraint for the above case. AFAIK,
row-wise comparison wont work with NULL values as shown in [1]. I mean
two rows are considered equal if all their corresponding members are
non-null and equal. The rows are unequal if any corresponding members
are non-null and unequal. Otherwise the result of the row comparison
is unknown (null). So we should generate different types of
constraints for NULL values.

Ex:
CREATE TABLE t(a int, b int) PARTITION BY LIST(a,b);
CREATE TABLE t_1 PARTITION OF t FOR VALUES IN (1, 1), (1, NULL),
(NULL, 1), (NULL, NULL);

As per my knowledge, we should consider creating partition constraints
for the above example as given below.

(a, b) = (1, 1) OR ((a = 1) AND (b IS NULL)) OR ((a IS NULL) AND (b =
1)) OR ((a is NULL) AND (b is NULL)).

Yeah, something like that should do the trick.

Again, I was not actually suggesting that you write code to implement
the constraint using something like RowCompareExpr, only that the
users might want to view the constraint as doing row-wise comparison
of the partitioning columns and the specified value lists.

2. In the current code we don't put the NULL value in the 'datums'
field of 'PartitionBoundInfoData' structure [2]. Since there can be
only one NULL value, we directly store the corresponding index value
in the 'null_index' field. Now we have to handle multiple NULL values
in case of Multi-Column List Partitioning. So the question is how to
handle this scenario. Following are the 2 approaches to handle this.

Approach-1:
Add another field 'bool **isnull' in [2] and mark the corresponding
element to TRUE if it has NULL value and the corresponding location in
'datums' contains empty/No value. For example, If a partition bound is
(1, NULL), then

datums[0][0] = 1
datums[0][1] = Not assigned any value
isnull[0][0] = FALSE
is null[0][1] = TRUE

So now we have an entry in the 'datums' field for a bound containing
NULL value, so we should handle this in all the scenarios where we are
manipulating 'datums' in order to support NULL values and avoid crash.

Approach-2:
Don't add the bound information to 'datums' field of [2] if any of the
value is NULL. Store this information separately in the structures
mentioned in [3] and process accordingly.

I feel approach-1 is the better solution as this requires less code
changes and easy to implement than approach-2. Kindly share your
thoughts about the approaches and please share if you have any better
solution than the above 2.

Approach 1 sounds better. It sounds like approach 1 might help us
implement support for allowing NULLs in range partition bounds in the
future, if at all. For now, it might be better to not allocate the
isnull array except for list partitioning.

I'll wait for you to post a new patch addressing at least the comments
in my earlier email. Also, please make sure to run `make check`
successfully before posting the patch. :)

Thanks.

--
Amit Langote
EDB: http://www.enterprisedb.com

#8Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Amit Langote (#7)
Re: Multi-Column List Partitioning

Approach 1 sounds better. It sounds like approach 1 might help us
implement support for allowing NULLs in range partition bounds in the
future, if at all. For now, it might be better to not allocate the
isnull array except for list partitioning.

Thanks for confirming.

I'll wait for you to post a new patch addressing at least the comments
in my earlier email. Also, please make sure to run `make check`
successfully before posting the patch. :)

I have fixed all of the review comments given by you and Jeevan in the
attached patch and also the attached patch contains more changes
compared to the previous patch. Following are the implementation
details.

1. Regarding syntax, the existing syntax will work fine for the
single-column list partitioning. However I have used the new syntax
for the multi-column list partitioning as we discussed earlier. I have
used a combination of 'AND' and 'OR' logic for the partition
constraints as given in the below example.

postgres@17503=#create table t(a int, b text) partition by list(a,b);
CREATE TABLE
postgres@17503=#create table t1 partition of t for values in ((1,'a'),
(NULL,'b'));
CREATE TABLE
postgres@17503=#\d+ t
Partitioned table "public.t"
Column | Type | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
a | integer | | | | plain |
| |
b | text | | | | extended |
| |
Partition key: LIST (a, b)
Partitions: t1 FOR VALUES IN ((1, 'a'), (NULL, 'b'))

postgres@17503=#\d+ t1
Table "public.t1"
Column | Type | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
a | integer | | | | plain |
| |
b | text | | | | extended |
| |
Partition of: t FOR VALUES IN ((1, 'a'), (NULL, 'b'))
Partition constraint: (((a = 1) AND (b = 'a'::text)) OR ((a IS NULL)
AND (b = 'b'::text)))
Access method: heap

2. In the existing code, NULL values were handled differently. It was
not added to the 'datums' variable, rather used to store the partition
index directly in the 'null_index' variable. Now there is a
possibility of multiple NULL values, hence introducing a new member
'isnulls' in the 'PartitionBoundInfoData' struct which indicates
whether the corresponding element in the 'datums' is NULL. Now
'null_index' cannot be used directly to store the partition index, so
removed it and made the necessary changes in multiple places.

3. I have added test cases for 'create table' and 'insert' statements
related to multi-column list partitioning and these are working fine
with 'make check'.

4. Handled the partition pruning code to accommodate these changes for
single-column list partitioning. However it is pending for
multi-column list partitioning.

5. I have done necessary changes in partition wise join related code
to accommodate for single-column list partitioning. However it is
pending for multi-column list partitioning.

Kindly review the patch and let me know if any changes are required.

Pending items:
1. Support of partition pruning for multi-column list partitioning.
2. Support of partition wise join for multi-column list partitioning.

I will continue to work on the above 2 items.
Kindly let me know if I am missing something.

Thanks & Regards,
Nitin Jadhav

Show quoted text

On Wed, May 26, 2021 at 10:27 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Sun, May 23, 2021 at 6:49 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

IMO, it is not such a bad syntax from a user's PoV. It's not hard to
understand from this syntax that the partition constraint is something
like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs
row-wise comparison.

Thanks for suggesting to use row-wise comparison.

Actually, I was just describing how the *users* may want to visualize
the partition constraint...

I have few queries
with respect to handling of NULL values.

1. What should be the partition constraint for the above case. AFAIK,
row-wise comparison wont work with NULL values as shown in [1]. I mean
two rows are considered equal if all their corresponding members are
non-null and equal. The rows are unequal if any corresponding members
are non-null and unequal. Otherwise the result of the row comparison
is unknown (null). So we should generate different types of
constraints for NULL values.

Ex:
CREATE TABLE t(a int, b int) PARTITION BY LIST(a,b);
CREATE TABLE t_1 PARTITION OF t FOR VALUES IN (1, 1), (1, NULL),
(NULL, 1), (NULL, NULL);

As per my knowledge, we should consider creating partition constraints
for the above example as given below.

(a, b) = (1, 1) OR ((a = 1) AND (b IS NULL)) OR ((a IS NULL) AND (b =
1)) OR ((a is NULL) AND (b is NULL)).

Yeah, something like that should do the trick.

Again, I was not actually suggesting that you write code to implement
the constraint using something like RowCompareExpr, only that the
users might want to view the constraint as doing row-wise comparison
of the partitioning columns and the specified value lists.

2. In the current code we don't put the NULL value in the 'datums'
field of 'PartitionBoundInfoData' structure [2]. Since there can be
only one NULL value, we directly store the corresponding index value
in the 'null_index' field. Now we have to handle multiple NULL values
in case of Multi-Column List Partitioning. So the question is how to
handle this scenario. Following are the 2 approaches to handle this.

Approach-1:
Add another field 'bool **isnull' in [2] and mark the corresponding
element to TRUE if it has NULL value and the corresponding location in
'datums' contains empty/No value. For example, If a partition bound is
(1, NULL), then

datums[0][0] = 1
datums[0][1] = Not assigned any value
isnull[0][0] = FALSE
is null[0][1] = TRUE

So now we have an entry in the 'datums' field for a bound containing
NULL value, so we should handle this in all the scenarios where we are
manipulating 'datums' in order to support NULL values and avoid crash.

Approach-2:
Don't add the bound information to 'datums' field of [2] if any of the
value is NULL. Store this information separately in the structures
mentioned in [3] and process accordingly.

I feel approach-1 is the better solution as this requires less code
changes and easy to implement than approach-2. Kindly share your
thoughts about the approaches and please share if you have any better
solution than the above 2.

Approach 1 sounds better. It sounds like approach 1 might help us
implement support for allowing NULLs in range partition bounds in the
future, if at all. For now, it might be better to not allocate the
isnull array except for list partitioning.

I'll wait for you to post a new patch addressing at least the comments
in my earlier email. Also, please make sure to run `make check`
successfully before posting the patch. :)

Thanks.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v1_multi_column_list_partitioning.patchapplication/octet-stream; name=v1_multi_column_list_partitioning.patchDownload+1037-220
#9Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Nitin Jadhav (#8)
Re: Multi-Column List Partitioning

Hi Nitin,

On Thu, Jun 3, 2021 at 11:45 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

I'll wait for you to post a new patch addressing at least the comments
in my earlier email. Also, please make sure to run `make check`
successfully before posting the patch. :)

I have fixed all of the review comments given by you and Jeevan in the
attached patch and also the attached patch contains more changes
compared to the previous patch. Following are the implementation
details.

Thanks for the updated version.

1. Regarding syntax, the existing syntax will work fine for the
single-column list partitioning. However I have used the new syntax
for the multi-column list partitioning as we discussed earlier. I have
used a combination of 'AND' and 'OR' logic for the partition
constraints as given in the below example.

postgres@17503=#create table t(a int, b text) partition by list(a,b);
CREATE TABLE
postgres@17503=#create table t1 partition of t for values in ((1,'a'),
(NULL,'b'));
CREATE TABLE
postgres@17503=#\d+ t
Partitioned table "public.t"
Column | Type | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
a | integer | | | | plain |
| |
b | text | | | | extended |
| |
Partition key: LIST (a, b)
Partitions: t1 FOR VALUES IN ((1, 'a'), (NULL, 'b'))

postgres@17503=#\d+ t1
Table "public.t1"
Column | Type | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
a | integer | | | | plain |
| |
b | text | | | | extended |
| |
Partition of: t FOR VALUES IN ((1, 'a'), (NULL, 'b'))
Partition constraint: (((a = 1) AND (b = 'a'::text)) OR ((a IS NULL)
AND (b = 'b'::text)))
Access method: heap

The constraint expressions seem to come out correctly, though I
haven't checked your implementation closely yet.

2. In the existing code, NULL values were handled differently. It was
not added to the 'datums' variable, rather used to store the partition
index directly in the 'null_index' variable. Now there is a
possibility of multiple NULL values, hence introducing a new member
'isnulls' in the 'PartitionBoundInfoData' struct which indicates
whether the corresponding element in the 'datums' is NULL. Now
'null_index' cannot be used directly to store the partition index, so
removed it and made the necessary changes in multiple places.

3. I have added test cases for 'create table' and 'insert' statements
related to multi-column list partitioning and these are working fine
with 'make check'.

4. Handled the partition pruning code to accommodate these changes for
single-column list partitioning. However it is pending for
multi-column list partitioning.

5. I have done necessary changes in partition wise join related code
to accommodate for single-column list partitioning. However it is
pending for multi-column list partitioning.

Kindly review the patch and let me know if any changes are required.

The new list bound binary search and related comparison support
function look a bit too verbose to me. I was expecting
partition_list_bsearch() to look very much like
partition_range_datum_bsearch(), but that is not the case. The
special case code that you wrote in partition_list_bsearch() seems
unnecessary, at least in that function. I'm talking about the code
fragment starting with this comment:

/*
* Once we find the matching for the first column but if it does not
* match for the any of the other columns, then the binary search
* will not work in all the cases. We should traverse just below
* and above the mid index until we find the match or we reach the
* first mismatch.
*/

I guess you're perhaps trying to address the case where the caller
does not specify the values for all of the partition key columns,
which can happen when the partition pruning code needs to handle a set
of clauses matching only some of the partition key columns. But
that's a concern of the partition pruning code and so the special case
should be handled there (if at all), not in the binary search function
that is shared with other callers. Regarding that, I'm wondering if
we should require clauses matching all of the partition key columns to
be found for the pruning code to call the binary search, so do
something like get_matching_hash_bounds() does:

/*
* For hash partitioning we can only perform pruning based on equality
* clauses to the partition key or IS NULL clauses. We also can only
* prune if we got values for all keys.
*/
if (nvalues + bms_num_members(nullkeys) == partnatts)
{
/* code to compute matching hash bound offset */
}
else
{
/* Report all valid offsets into the boundinfo->indexes array. */
result->bound_offsets = bms_add_range(NULL, 0,
boundinfo->nindexes - 1);
}

Do you think that trying to match list partitions even with fewer keys
is worth the complexity of the implementation? That is, is the use
case to search for only a subset of partition key columns common
enough with list partitioning?

If we do decide to implement the special case, remember that to do
that efficiently, we'd need to require that the subset of matched key
columns constitutes a prefix, because of the way the datums are
sorted. That is, match all partitions when the query only contains a
clause for b when the partition key is (a, b, c), but engage the
special case of pruning if the query contains clauses for a, or for a
and b.

I will look at other parts of the patch next week hopefully. For
now, attached is a delta patch that applies on top of your v1, which
does:

* Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
* Make qsort_partition_list_value_cmp simply call
partition_lbound_datum_cmp() instead of having its own logic to
compare input bounds
* Move partition_lbound_datum_cmp() into partbounds.c as a static
function (export seems unnecessary)
* Add a comment for PartitionBoundInfo.isnulls and remove that for null_index

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v1_delta_amit.patchapplication/octet-stream; name=v1_delta_amit.patchDownload+58-126
#10Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#9)
Re: Multi-Column List Partitioning

On Fri, Jun 11, 2021 at 12:37 PM Amit Langote <amitlangote09@gmail.com> wrote:

I will look at other parts of the patch next week hopefully. For
now, attached is a delta patch that applies on top of your v1, which
does:

* Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
* Make qsort_partition_list_value_cmp simply call
partition_lbound_datum_cmp() instead of having its own logic to
compare input bounds
* Move partition_lbound_datum_cmp() into partbounds.c as a static
function (export seems unnecessary)
* Add a comment for PartitionBoundInfo.isnulls and remove that for null_index

One more:

* Add all columns of newly added test query in insert.sql to the order
by clause to get predictably ordered output

--
Amit Langote
EDB: http://www.enterprisedb.com

#11Zhihong Yu
zyu@yugabyte.com
In reply to: Amit Langote (#9)
Re: Multi-Column List Partitioning

On Thu, Jun 10, 2021 at 8:38 PM Amit Langote <amitlangote09@gmail.com>
wrote:

Hi Nitin,

On Thu, Jun 3, 2021 at 11:45 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

I'll wait for you to post a new patch addressing at least the comments
in my earlier email. Also, please make sure to run `make check`
successfully before posting the patch. :)

I have fixed all of the review comments given by you and Jeevan in the
attached patch and also the attached patch contains more changes
compared to the previous patch. Following are the implementation
details.

Thanks for the updated version.

1. Regarding syntax, the existing syntax will work fine for the
single-column list partitioning. However I have used the new syntax
for the multi-column list partitioning as we discussed earlier. I have
used a combination of 'AND' and 'OR' logic for the partition
constraints as given in the below example.

postgres@17503=#create table t(a int, b text) partition by list(a,b);
CREATE TABLE
postgres@17503=#create table t1 partition of t for values in ((1,'a'),
(NULL,'b'));
CREATE TABLE
postgres@17503=#\d+ t
Partitioned table "public.t"
Column | Type | Collation | Nullable | Default | Storage |
Compression | Stats target | Description

--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------

a | integer | | | | plain |
| |
b | text | | | | extended |
| |
Partition key: LIST (a, b)
Partitions: t1 FOR VALUES IN ((1, 'a'), (NULL, 'b'))

postgres@17503=#\d+ t1
Table "public.t1"
Column | Type | Collation | Nullable | Default | Storage |
Compression | Stats target | Description

--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------

a | integer | | | | plain |
| |
b | text | | | | extended |
| |
Partition of: t FOR VALUES IN ((1, 'a'), (NULL, 'b'))
Partition constraint: (((a = 1) AND (b = 'a'::text)) OR ((a IS NULL)
AND (b = 'b'::text)))
Access method: heap

The constraint expressions seem to come out correctly, though I
haven't checked your implementation closely yet.

2. In the existing code, NULL values were handled differently. It was
not added to the 'datums' variable, rather used to store the partition
index directly in the 'null_index' variable. Now there is a
possibility of multiple NULL values, hence introducing a new member
'isnulls' in the 'PartitionBoundInfoData' struct which indicates
whether the corresponding element in the 'datums' is NULL. Now
'null_index' cannot be used directly to store the partition index, so
removed it and made the necessary changes in multiple places.

3. I have added test cases for 'create table' and 'insert' statements
related to multi-column list partitioning and these are working fine
with 'make check'.

4. Handled the partition pruning code to accommodate these changes for
single-column list partitioning. However it is pending for
multi-column list partitioning.

5. I have done necessary changes in partition wise join related code
to accommodate for single-column list partitioning. However it is
pending for multi-column list partitioning.

Kindly review the patch and let me know if any changes are required.

The new list bound binary search and related comparison support
function look a bit too verbose to me. I was expecting
partition_list_bsearch() to look very much like
partition_range_datum_bsearch(), but that is not the case. The
special case code that you wrote in partition_list_bsearch() seems
unnecessary, at least in that function. I'm talking about the code
fragment starting with this comment:

/*
* Once we find the matching for the first column but if it does
not
* match for the any of the other columns, then the binary search
* will not work in all the cases. We should traverse just below
* and above the mid index until we find the match or we reach
the
* first mismatch.
*/

I guess you're perhaps trying to address the case where the caller
does not specify the values for all of the partition key columns,
which can happen when the partition pruning code needs to handle a set
of clauses matching only some of the partition key columns. But
that's a concern of the partition pruning code and so the special case
should be handled there (if at all), not in the binary search function
that is shared with other callers. Regarding that, I'm wondering if
we should require clauses matching all of the partition key columns to
be found for the pruning code to call the binary search, so do
something like get_matching_hash_bounds() does:

/*
* For hash partitioning we can only perform pruning based on equality
* clauses to the partition key or IS NULL clauses. We also can only
* prune if we got values for all keys.
*/
if (nvalues + bms_num_members(nullkeys) == partnatts)
{
/* code to compute matching hash bound offset */
}
else
{
/* Report all valid offsets into the boundinfo->indexes array. */
result->bound_offsets = bms_add_range(NULL, 0,
boundinfo->nindexes - 1);
}

Do you think that trying to match list partitions even with fewer keys
is worth the complexity of the implementation? That is, is the use
case to search for only a subset of partition key columns common
enough with list partitioning?

If we do decide to implement the special case, remember that to do
that efficiently, we'd need to require that the subset of matched key
columns constitutes a prefix, because of the way the datums are
sorted. That is, match all partitions when the query only contains a
clause for b when the partition key is (a, b, c), but engage the
special case of pruning if the query contains clauses for a, or for a
and b.

I will look at other parts of the patch next week hopefully. For
now, attached is a delta patch that applies on top of your v1, which
does:

* Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
* Make qsort_partition_list_value_cmp simply call
partition_lbound_datum_cmp() instead of having its own logic to
compare input bounds
* Move partition_lbound_datum_cmp() into partbounds.c as a static
function (export seems unnecessary)
* Add a comment for PartitionBoundInfo.isnulls and remove that for
null_index

--
Amit Langote
EDB: http://www.enterprisedb.com

Hi, Amit:

+ * isnulls is an array of boolean-tuples with key->partnatts booleans
values
+ * each.  Currently only used for list partitioning, it stores whether a

I think 'booleans' should be 'boolean'.
The trailing word 'each' is unnecessary.

Cheers

#12Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Zhihong Yu (#11)
Re: Multi-Column List Partitioning

The new list bound binary search and related comparison support
function look a bit too verbose to me. I was expecting
partition_list_bsearch() to look very much like
partition_range_datum_bsearch(), but that is not the case. The
special case code that you wrote in partition_list_bsearch() seems
unnecessary, at least in that function. I'm talking about the code
fragment starting with this comment:

I will look at other parts of the patch next week hopefully. For
now, attached is a delta patch that applies on top of your v1, which
does:

* Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
* Make qsort_partition_list_value_cmp simply call
partition_lbound_datum_cmp() instead of having its own logic to
compare input bounds
* Move partition_lbound_datum_cmp() into partbounds.c as a static
function (export seems unnecessary)
* Add a comment for PartitionBoundInfo.isnulls and remove that for null_index

Yes. You are right. The extra code added in partition_list_bsearch()
is not required and thanks for sharing the delta patch. It looks good
to me and I have incorporated the changes in the attached patch.

I guess you're perhaps trying to address the case where the caller
does not specify the values for all of the partition key columns,
which can happen when the partition pruning code needs to handle a set
of clauses matching only some of the partition key columns. But
that's a concern of the partition pruning code and so the special case
should be handled there (if at all), not in the binary search function
that is shared with other callers. Regarding that, I'm wondering if
we should require clauses matching all of the partition key columns to
be found for the pruning code to call the binary search, so do
something like get_matching_hash_bounds() does:

Do you think that trying to match list partitions even with fewer keys
is worth the complexity of the implementation? That is, is the use
case to search for only a subset of partition key columns common
enough with list partitioning?

If we do decide to implement the special case, remember that to do
that efficiently, we'd need to require that the subset of matched key
columns constitutes a prefix, because of the way the datums are
sorted. That is, match all partitions when the query only contains a
clause for b when the partition key is (a, b, c), but engage the
special case of pruning if the query contains clauses for a, or for a
and b.

Thanks for the suggestion. Below is the implementation details for the
partition pruning for multi column list partitioning.

In the existing code (For single column list partitioning)
1. In gen_partprune_steps_internal(), we try to match the where
clauses provided by the user with the partition key data using
match_clause_to_partition_key(). Based on the match, this function can
return many values like PARTCLAUSE_MATCH_CLAUSE,
PARTCLAUSE_MATCH_NULLNESS, PARTCLAUSE_NOMATCH, etc.
2. In case of PARTCLAUSE_MATCH_CLAUSE, we generate steps using
gen_prune_steps_from_opexps() (strategy-2) which generate and return a
list of PartitionPruneStepOp that are based on OpExpr and BooleanTest
clauses that have been matched to the partition key and it also takes
care handling prefix of the partition keys.
3. In case of PARTCLAUSE_MATCH_NULLNESS, we generate steps using
gen_prune_step_op() (strategy-1) which generates single
PartitionPruneStepOp since the earlier list partitioning supports
single column and there can be only one NULL value. In
get_matching_list_bounds(), if the nullkeys is not empty, we fetch the
partition index which accepts null and we used to return from here.

In case of multi column list partitioning, we have columns more than
one and hence there is a possibility of more than one NULL values in
the where clauses. The above mentioned steps are modified like below.

1. Modified the match_clause_to_partition_key() to generate an object
of PartClauseInfo structure and return PARTCLAUSE_MATCH_CLAUSE even in
case of clauses related to NULL. The information required to generate
PartClauseInfo is populated here like the constant expression
consisting of (Datum) 0, op_strategy, op_is_ne, etc.
2. Since I am returning PARTCLAUSE_MATCH_CLAUSE, now we use strategy-2
(gen_prune_steps_from_opexps) to generate partition pruning steps.
This function takes care of generating a list of pruning steps if
there are multiple clauses and also takes care of handling prefixes.
3. Modified perform_pruning_base_step() to generate the datum values
and isnulls data of the where clauses. In case if any of the key
contains NULL value then the corresponding datum value is 0.
4. Modified get_matching_list_bounds() to generate the minimum offset
and/or maximum offset of the matched values based on the difference
operation strategies. Now since the NULL containing bound values are
part of 'boundinfo', changed the code accordingly to include the NULL
containing partitions or not in different scenarios like
InvalidStrategy, etc.

I have done some cosmetic changes to
v1_multi_column_list_partitioning.patch. So all the above code changes
related to partition pruning are merged with the previous patch and
also included the delta patch shared by you. Hence sharing a single
patch.

Kindly have a look and share your thoughts.

Show quoted text

On Fri, Jun 11, 2021 at 10:57 PM Zhihong Yu <zyu@yugabyte.com> wrote:

On Thu, Jun 10, 2021 at 8:38 PM Amit Langote <amitlangote09@gmail.com> wrote:

Hi Nitin,

On Thu, Jun 3, 2021 at 11:45 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

I'll wait for you to post a new patch addressing at least the comments
in my earlier email. Also, please make sure to run `make check`
successfully before posting the patch. :)

I have fixed all of the review comments given by you and Jeevan in the
attached patch and also the attached patch contains more changes
compared to the previous patch. Following are the implementation
details.

Thanks for the updated version.

1. Regarding syntax, the existing syntax will work fine for the
single-column list partitioning. However I have used the new syntax
for the multi-column list partitioning as we discussed earlier. I have
used a combination of 'AND' and 'OR' logic for the partition
constraints as given in the below example.

postgres@17503=#create table t(a int, b text) partition by list(a,b);
CREATE TABLE
postgres@17503=#create table t1 partition of t for values in ((1,'a'),
(NULL,'b'));
CREATE TABLE
postgres@17503=#\d+ t
Partitioned table "public.t"
Column | Type | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
a | integer | | | | plain |
| |
b | text | | | | extended |
| |
Partition key: LIST (a, b)
Partitions: t1 FOR VALUES IN ((1, 'a'), (NULL, 'b'))

postgres@17503=#\d+ t1
Table "public.t1"
Column | Type | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
a | integer | | | | plain |
| |
b | text | | | | extended |
| |
Partition of: t FOR VALUES IN ((1, 'a'), (NULL, 'b'))
Partition constraint: (((a = 1) AND (b = 'a'::text)) OR ((a IS NULL)
AND (b = 'b'::text)))
Access method: heap

The constraint expressions seem to come out correctly, though I
haven't checked your implementation closely yet.

2. In the existing code, NULL values were handled differently. It was
not added to the 'datums' variable, rather used to store the partition
index directly in the 'null_index' variable. Now there is a
possibility of multiple NULL values, hence introducing a new member
'isnulls' in the 'PartitionBoundInfoData' struct which indicates
whether the corresponding element in the 'datums' is NULL. Now
'null_index' cannot be used directly to store the partition index, so
removed it and made the necessary changes in multiple places.

3. I have added test cases for 'create table' and 'insert' statements
related to multi-column list partitioning and these are working fine
with 'make check'.

4. Handled the partition pruning code to accommodate these changes for
single-column list partitioning. However it is pending for
multi-column list partitioning.

5. I have done necessary changes in partition wise join related code
to accommodate for single-column list partitioning. However it is
pending for multi-column list partitioning.

Kindly review the patch and let me know if any changes are required.

The new list bound binary search and related comparison support
function look a bit too verbose to me. I was expecting
partition_list_bsearch() to look very much like
partition_range_datum_bsearch(), but that is not the case. The
special case code that you wrote in partition_list_bsearch() seems
unnecessary, at least in that function. I'm talking about the code
fragment starting with this comment:

/*
* Once we find the matching for the first column but if it does not
* match for the any of the other columns, then the binary search
* will not work in all the cases. We should traverse just below
* and above the mid index until we find the match or we reach the
* first mismatch.
*/

I guess you're perhaps trying to address the case where the caller
does not specify the values for all of the partition key columns,
which can happen when the partition pruning code needs to handle a set
of clauses matching only some of the partition key columns. But
that's a concern of the partition pruning code and so the special case
should be handled there (if at all), not in the binary search function
that is shared with other callers. Regarding that, I'm wondering if
we should require clauses matching all of the partition key columns to
be found for the pruning code to call the binary search, so do
something like get_matching_hash_bounds() does:

/*
* For hash partitioning we can only perform pruning based on equality
* clauses to the partition key or IS NULL clauses. We also can only
* prune if we got values for all keys.
*/
if (nvalues + bms_num_members(nullkeys) == partnatts)
{
/* code to compute matching hash bound offset */
}
else
{
/* Report all valid offsets into the boundinfo->indexes array. */
result->bound_offsets = bms_add_range(NULL, 0,
boundinfo->nindexes - 1);
}

Do you think that trying to match list partitions even with fewer keys
is worth the complexity of the implementation? That is, is the use
case to search for only a subset of partition key columns common
enough with list partitioning?

If we do decide to implement the special case, remember that to do
that efficiently, we'd need to require that the subset of matched key
columns constitutes a prefix, because of the way the datums are
sorted. That is, match all partitions when the query only contains a
clause for b when the partition key is (a, b, c), but engage the
special case of pruning if the query contains clauses for a, or for a
and b.

I will look at other parts of the patch next week hopefully. For
now, attached is a delta patch that applies on top of your v1, which
does:

* Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
* Make qsort_partition_list_value_cmp simply call
partition_lbound_datum_cmp() instead of having its own logic to
compare input bounds
* Move partition_lbound_datum_cmp() into partbounds.c as a static
function (export seems unnecessary)
* Add a comment for PartitionBoundInfo.isnulls and remove that for null_index

--
Amit Langote
EDB: http://www.enterprisedb.com

Hi, Amit:

+ * isnulls is an array of boolean-tuples with key->partnatts booleans values
+ * each.  Currently only used for list partitioning, it stores whether a

I think 'booleans' should be 'boolean'.
The trailing word 'each' is unnecessary.

Cheers

Attachments:

v2-0001-multi-column-list-partitioning.patchapplication/octet-stream; name=v2-0001-multi-column-list-partitioning.patchDownload+1752-338
#13Zhihong Yu
zyu@yugabyte.com
In reply to: Nitin Jadhav (#12)
Re: Multi-Column List Partitioning

On Wed, Aug 25, 2021 at 5:41 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com>
wrote:

The new list bound binary search and related comparison support
function look a bit too verbose to me. I was expecting
partition_list_bsearch() to look very much like
partition_range_datum_bsearch(), but that is not the case. The
special case code that you wrote in partition_list_bsearch() seems
unnecessary, at least in that function. I'm talking about the code
fragment starting with this comment:

I will look at other parts of the patch next week hopefully. For
now, attached is a delta patch that applies on top of your v1, which
does:

* Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
* Make qsort_partition_list_value_cmp simply call
partition_lbound_datum_cmp() instead of having its own logic to
compare input bounds
* Move partition_lbound_datum_cmp() into partbounds.c as a static
function (export seems unnecessary)
* Add a comment for PartitionBoundInfo.isnulls and remove that for

null_index

Yes. You are right. The extra code added in partition_list_bsearch()
is not required and thanks for sharing the delta patch. It looks good
to me and I have incorporated the changes in the attached patch.

I guess you're perhaps trying to address the case where the caller
does not specify the values for all of the partition key columns,
which can happen when the partition pruning code needs to handle a set
of clauses matching only some of the partition key columns. But
that's a concern of the partition pruning code and so the special case
should be handled there (if at all), not in the binary search function
that is shared with other callers. Regarding that, I'm wondering if
we should require clauses matching all of the partition key columns to
be found for the pruning code to call the binary search, so do
something like get_matching_hash_bounds() does:

Do you think that trying to match list partitions even with fewer keys
is worth the complexity of the implementation? That is, is the use
case to search for only a subset of partition key columns common
enough with list partitioning?

If we do decide to implement the special case, remember that to do
that efficiently, we'd need to require that the subset of matched key
columns constitutes a prefix, because of the way the datums are
sorted. That is, match all partitions when the query only contains a
clause for b when the partition key is (a, b, c), but engage the
special case of pruning if the query contains clauses for a, or for a
and b.

Thanks for the suggestion. Below is the implementation details for the
partition pruning for multi column list partitioning.

In the existing code (For single column list partitioning)
1. In gen_partprune_steps_internal(), we try to match the where
clauses provided by the user with the partition key data using
match_clause_to_partition_key(). Based on the match, this function can
return many values like PARTCLAUSE_MATCH_CLAUSE,
PARTCLAUSE_MATCH_NULLNESS, PARTCLAUSE_NOMATCH, etc.
2. In case of PARTCLAUSE_MATCH_CLAUSE, we generate steps using
gen_prune_steps_from_opexps() (strategy-2) which generate and return a
list of PartitionPruneStepOp that are based on OpExpr and BooleanTest
clauses that have been matched to the partition key and it also takes
care handling prefix of the partition keys.
3. In case of PARTCLAUSE_MATCH_NULLNESS, we generate steps using
gen_prune_step_op() (strategy-1) which generates single
PartitionPruneStepOp since the earlier list partitioning supports
single column and there can be only one NULL value. In
get_matching_list_bounds(), if the nullkeys is not empty, we fetch the
partition index which accepts null and we used to return from here.

In case of multi column list partitioning, we have columns more than
one and hence there is a possibility of more than one NULL values in
the where clauses. The above mentioned steps are modified like below.

1. Modified the match_clause_to_partition_key() to generate an object
of PartClauseInfo structure and return PARTCLAUSE_MATCH_CLAUSE even in
case of clauses related to NULL. The information required to generate
PartClauseInfo is populated here like the constant expression
consisting of (Datum) 0, op_strategy, op_is_ne, etc.
2. Since I am returning PARTCLAUSE_MATCH_CLAUSE, now we use strategy-2
(gen_prune_steps_from_opexps) to generate partition pruning steps.
This function takes care of generating a list of pruning steps if
there are multiple clauses and also takes care of handling prefixes.
3. Modified perform_pruning_base_step() to generate the datum values
and isnulls data of the where clauses. In case if any of the key
contains NULL value then the corresponding datum value is 0.
4. Modified get_matching_list_bounds() to generate the minimum offset
and/or maximum offset of the matched values based on the difference
operation strategies. Now since the NULL containing bound values are
part of 'boundinfo', changed the code accordingly to include the NULL
containing partitions or not in different scenarios like
InvalidStrategy, etc.

I have done some cosmetic changes to
v1_multi_column_list_partitioning.patch. So all the above code changes
related to partition pruning are merged with the previous patch and
also included the delta patch shared by you. Hence sharing a single
patch.

Kindly have a look and share your thoughts.

Hi,

bq. Supported new syantx to allow mentioning multiple key information.

syantx -> syntax

+       isDuplicate = checkForDuplicates(result, values);
+       if (isDuplicate)
+           continue;

It seems the variable isDuplicate is not needed. The if statement can
directly check the return value from checkForDuplicates().

+       //TODO: Handle for multi-column cases
+       for (j = 0; j < 1; j++)

Is this part going to be updated in the next patch?

Cheers

#14Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Zhihong Yu (#13)
Re: Multi-Column List Partitioning
+ * isnulls is an array of boolean-tuples with key->partnatts booleans values
+ * each.  Currently only used for list partitioning, it stores whether a

I think 'booleans' should be 'boolean'.
The trailing word 'each' is unnecessary.

bq. Supported new syantx to allow mentioning multiple key information.

syantx -> syntax

+       isDuplicate = checkForDuplicates(result, values);
+       if (isDuplicate)
+           continue;

It seems the variable isDuplicate is not needed. The if statement can directly check the return value from checkForDuplicates().

I agree that isDuplicate is not required.
Thanks for sharing the comments. I will take care of these comments in
the next patch.

+       //TODO: Handle for multi-column cases
+       for (j = 0; j < 1; j++)

Is this part going to be updated in the next patch?

Yes. The code changes related to partition-wise join are in progress.
I will handle these in the next patch.

Thanks & Regards,
Nitin Jadhav

Show quoted text

On Thu, Aug 26, 2021 at 2:40 AM Zhihong Yu <zyu@yugabyte.com> wrote:

On Wed, Aug 25, 2021 at 5:41 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:

The new list bound binary search and related comparison support
function look a bit too verbose to me. I was expecting
partition_list_bsearch() to look very much like
partition_range_datum_bsearch(), but that is not the case. The
special case code that you wrote in partition_list_bsearch() seems
unnecessary, at least in that function. I'm talking about the code
fragment starting with this comment:

I will look at other parts of the patch next week hopefully. For
now, attached is a delta patch that applies on top of your v1, which
does:

* Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
* Make qsort_partition_list_value_cmp simply call
partition_lbound_datum_cmp() instead of having its own logic to
compare input bounds
* Move partition_lbound_datum_cmp() into partbounds.c as a static
function (export seems unnecessary)
* Add a comment for PartitionBoundInfo.isnulls and remove that for null_index

Yes. You are right. The extra code added in partition_list_bsearch()
is not required and thanks for sharing the delta patch. It looks good
to me and I have incorporated the changes in the attached patch.

I guess you're perhaps trying to address the case where the caller
does not specify the values for all of the partition key columns,
which can happen when the partition pruning code needs to handle a set
of clauses matching only some of the partition key columns. But
that's a concern of the partition pruning code and so the special case
should be handled there (if at all), not in the binary search function
that is shared with other callers. Regarding that, I'm wondering if
we should require clauses matching all of the partition key columns to
be found for the pruning code to call the binary search, so do
something like get_matching_hash_bounds() does:

Do you think that trying to match list partitions even with fewer keys
is worth the complexity of the implementation? That is, is the use
case to search for only a subset of partition key columns common
enough with list partitioning?

If we do decide to implement the special case, remember that to do
that efficiently, we'd need to require that the subset of matched key
columns constitutes a prefix, because of the way the datums are
sorted. That is, match all partitions when the query only contains a
clause for b when the partition key is (a, b, c), but engage the
special case of pruning if the query contains clauses for a, or for a
and b.

Thanks for the suggestion. Below is the implementation details for the
partition pruning for multi column list partitioning.

In the existing code (For single column list partitioning)
1. In gen_partprune_steps_internal(), we try to match the where
clauses provided by the user with the partition key data using
match_clause_to_partition_key(). Based on the match, this function can
return many values like PARTCLAUSE_MATCH_CLAUSE,
PARTCLAUSE_MATCH_NULLNESS, PARTCLAUSE_NOMATCH, etc.
2. In case of PARTCLAUSE_MATCH_CLAUSE, we generate steps using
gen_prune_steps_from_opexps() (strategy-2) which generate and return a
list of PartitionPruneStepOp that are based on OpExpr and BooleanTest
clauses that have been matched to the partition key and it also takes
care handling prefix of the partition keys.
3. In case of PARTCLAUSE_MATCH_NULLNESS, we generate steps using
gen_prune_step_op() (strategy-1) which generates single
PartitionPruneStepOp since the earlier list partitioning supports
single column and there can be only one NULL value. In
get_matching_list_bounds(), if the nullkeys is not empty, we fetch the
partition index which accepts null and we used to return from here.

In case of multi column list partitioning, we have columns more than
one and hence there is a possibility of more than one NULL values in
the where clauses. The above mentioned steps are modified like below.

1. Modified the match_clause_to_partition_key() to generate an object
of PartClauseInfo structure and return PARTCLAUSE_MATCH_CLAUSE even in
case of clauses related to NULL. The information required to generate
PartClauseInfo is populated here like the constant expression
consisting of (Datum) 0, op_strategy, op_is_ne, etc.
2. Since I am returning PARTCLAUSE_MATCH_CLAUSE, now we use strategy-2
(gen_prune_steps_from_opexps) to generate partition pruning steps.
This function takes care of generating a list of pruning steps if
there are multiple clauses and also takes care of handling prefixes.
3. Modified perform_pruning_base_step() to generate the datum values
and isnulls data of the where clauses. In case if any of the key
contains NULL value then the corresponding datum value is 0.
4. Modified get_matching_list_bounds() to generate the minimum offset
and/or maximum offset of the matched values based on the difference
operation strategies. Now since the NULL containing bound values are
part of 'boundinfo', changed the code accordingly to include the NULL
containing partitions or not in different scenarios like
InvalidStrategy, etc.

I have done some cosmetic changes to
v1_multi_column_list_partitioning.patch. So all the above code changes
related to partition pruning are merged with the previous patch and
also included the delta patch shared by you. Hence sharing a single
patch.

Kindly have a look and share your thoughts.

Hi,

bq. Supported new syantx to allow mentioning multiple key information.

syantx -> syntax

+       isDuplicate = checkForDuplicates(result, values);
+       if (isDuplicate)
+           continue;

It seems the variable isDuplicate is not needed. The if statement can directly check the return value from checkForDuplicates().

+       //TODO: Handle for multi-column cases
+       for (j = 0; j < 1; j++)

Is this part going to be updated in the next patch?

Cheers

#15Rajkumar Raghuwanshi
rajkumar.raghuwanshi@enterprisedb.com
In reply to: Nitin Jadhav (#14)
Re: Multi-Column List Partitioning

Hi Nitin.

I have been testing these patches. Patches applied cleanly on the head.
While testing I found below a case where update row movement is not working
properly.
Please find the test case below.

postgres=# create table p0 (a int, b text, c bool) partition by list
(a,b,c);
CREATE TABLE
postgres=# create table p01 partition of p0 for values in ((1,1,true));
CREATE TABLE
postgres=# create table p02 partition of p0 for values in ((1,NULL,false));
CREATE TABLE
postgres=# insert into p0 values (1,'1',true);
INSERT 0 1
postgres=# insert into p0 values (1,NULL,false);
INSERT 0 1
postgres=# select tableoid::regclass,* from p0;
tableoid | a | b | c
----------+---+---+---
p01 | 1 | 1 | t
p02 | 1 | | f
(2 rows)

postgres=# update p0 set b = NULL;
UPDATE 2
postgres=# select tableoid::regclass,* from p0;
tableoid | a | b | c
----------+---+---+---
p01 | 1 | | t
p02 | 1 | | f
(2 rows)

I think this update should fail as there is no partition satisfying update
row (1,NULL,true).

Thanks & Regards,
Rajkumar Raghuwanshi

On Fri, Aug 27, 2021 at 12:53 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com>
wrote:

Show quoted text

+ * isnulls is an array of boolean-tuples with key->partnatts booleans

values

+ * each. Currently only used for list partitioning, it stores whether a

I think 'booleans' should be 'boolean'.
The trailing word 'each' is unnecessary.

bq. Supported new syantx to allow mentioning multiple key information.

syantx -> syntax

+       isDuplicate = checkForDuplicates(result, values);
+       if (isDuplicate)
+           continue;

It seems the variable isDuplicate is not needed. The if statement can

directly check the return value from checkForDuplicates().

I agree that isDuplicate is not required.
Thanks for sharing the comments. I will take care of these comments in
the next patch.

+       //TODO: Handle for multi-column cases
+       for (j = 0; j < 1; j++)

Is this part going to be updated in the next patch?

Yes. The code changes related to partition-wise join are in progress.
I will handle these in the next patch.

Thanks & Regards,
Nitin Jadhav

On Thu, Aug 26, 2021 at 2:40 AM Zhihong Yu <zyu@yugabyte.com> wrote:

On Wed, Aug 25, 2021 at 5:41 AM Nitin Jadhav <

nitinjadhavpostgres@gmail.com> wrote:

The new list bound binary search and related comparison support
function look a bit too verbose to me. I was expecting
partition_list_bsearch() to look very much like
partition_range_datum_bsearch(), but that is not the case. The
special case code that you wrote in partition_list_bsearch() seems
unnecessary, at least in that function. I'm talking about the code
fragment starting with this comment:

I will look at other parts of the patch next week hopefully. For
now, attached is a delta patch that applies on top of your v1, which
does:

* Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
* Make qsort_partition_list_value_cmp simply call
partition_lbound_datum_cmp() instead of having its own logic to
compare input bounds
* Move partition_lbound_datum_cmp() into partbounds.c as a static
function (export seems unnecessary)
* Add a comment for PartitionBoundInfo.isnulls and remove that for

null_index

Yes. You are right. The extra code added in partition_list_bsearch()
is not required and thanks for sharing the delta patch. It looks good
to me and I have incorporated the changes in the attached patch.

I guess you're perhaps trying to address the case where the caller
does not specify the values for all of the partition key columns,
which can happen when the partition pruning code needs to handle a set
of clauses matching only some of the partition key columns. But
that's a concern of the partition pruning code and so the special case
should be handled there (if at all), not in the binary search function
that is shared with other callers. Regarding that, I'm wondering if
we should require clauses matching all of the partition key columns to
be found for the pruning code to call the binary search, so do
something like get_matching_hash_bounds() does:

Do you think that trying to match list partitions even with fewer keys
is worth the complexity of the implementation? That is, is the use
case to search for only a subset of partition key columns common
enough with list partitioning?

If we do decide to implement the special case, remember that to do
that efficiently, we'd need to require that the subset of matched key
columns constitutes a prefix, because of the way the datums are
sorted. That is, match all partitions when the query only contains a
clause for b when the partition key is (a, b, c), but engage the
special case of pruning if the query contains clauses for a, or for a
and b.

Thanks for the suggestion. Below is the implementation details for the
partition pruning for multi column list partitioning.

In the existing code (For single column list partitioning)
1. In gen_partprune_steps_internal(), we try to match the where
clauses provided by the user with the partition key data using
match_clause_to_partition_key(). Based on the match, this function can
return many values like PARTCLAUSE_MATCH_CLAUSE,
PARTCLAUSE_MATCH_NULLNESS, PARTCLAUSE_NOMATCH, etc.
2. In case of PARTCLAUSE_MATCH_CLAUSE, we generate steps using
gen_prune_steps_from_opexps() (strategy-2) which generate and return a
list of PartitionPruneStepOp that are based on OpExpr and BooleanTest
clauses that have been matched to the partition key and it also takes
care handling prefix of the partition keys.
3. In case of PARTCLAUSE_MATCH_NULLNESS, we generate steps using
gen_prune_step_op() (strategy-1) which generates single
PartitionPruneStepOp since the earlier list partitioning supports
single column and there can be only one NULL value. In
get_matching_list_bounds(), if the nullkeys is not empty, we fetch the
partition index which accepts null and we used to return from here.

In case of multi column list partitioning, we have columns more than
one and hence there is a possibility of more than one NULL values in
the where clauses. The above mentioned steps are modified like below.

1. Modified the match_clause_to_partition_key() to generate an object
of PartClauseInfo structure and return PARTCLAUSE_MATCH_CLAUSE even in
case of clauses related to NULL. The information required to generate
PartClauseInfo is populated here like the constant expression
consisting of (Datum) 0, op_strategy, op_is_ne, etc.
2. Since I am returning PARTCLAUSE_MATCH_CLAUSE, now we use strategy-2
(gen_prune_steps_from_opexps) to generate partition pruning steps.
This function takes care of generating a list of pruning steps if
there are multiple clauses and also takes care of handling prefixes.
3. Modified perform_pruning_base_step() to generate the datum values
and isnulls data of the where clauses. In case if any of the key
contains NULL value then the corresponding datum value is 0.
4. Modified get_matching_list_bounds() to generate the minimum offset
and/or maximum offset of the matched values based on the difference
operation strategies. Now since the NULL containing bound values are
part of 'boundinfo', changed the code accordingly to include the NULL
containing partitions or not in different scenarios like
InvalidStrategy, etc.

I have done some cosmetic changes to
v1_multi_column_list_partitioning.patch. So all the above code changes
related to partition pruning are merged with the previous patch and
also included the delta patch shared by you. Hence sharing a single
patch.

Kindly have a look and share your thoughts.

Hi,

bq. Supported new syantx to allow mentioning multiple key information.

syantx -> syntax

+       isDuplicate = checkForDuplicates(result, values);
+       if (isDuplicate)
+           continue;

It seems the variable isDuplicate is not needed. The if statement can

directly check the return value from checkForDuplicates().

+       //TODO: Handle for multi-column cases
+       for (j = 0; j < 1; j++)

Is this part going to be updated in the next patch?

Cheers

#16Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Rajkumar Raghuwanshi (#15)
Re: Multi-Column List Partitioning

On Mon, Aug 30, 2021 at 4:51 PM Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:

Hi Nitin.

I have been testing these patches. Patches applied cleanly on the head. While testing I found below a case where update row movement is not working properly.
Please find the test case below.

postgres=# create table p0 (a int, b text, c bool) partition by list (a,b,c);
CREATE TABLE
postgres=# create table p01 partition of p0 for values in ((1,1,true));
CREATE TABLE
postgres=# create table p02 partition of p0 for values in ((1,NULL,false));
CREATE TABLE
postgres=# insert into p0 values (1,'1',true);
INSERT 0 1
postgres=# insert into p0 values (1,NULL,false);
INSERT 0 1
postgres=# select tableoid::regclass,* from p0;
tableoid | a | b | c
----------+---+---+---
p01 | 1 | 1 | t
p02 | 1 | | f
(2 rows)

postgres=# update p0 set b = NULL;
UPDATE 2
postgres=# select tableoid::regclass,* from p0;
tableoid | a | b | c
----------+---+---+---
p01 | 1 | | t
p02 | 1 | | f
(2 rows)

I think this update should fail as there is no partition satisfying update row (1,NULL,true).

Yeah, contrary to my earlier assessment, it seems the partition
constraint on each of those partitions fails to explicitly include an
IS NOT NULL test for each column that has a non-NULL value assigned.
So, for example, the constraint of p01 should actually be:

(a IS NOT NULL) AND (a = 1) AND (b IS NOT NULL) AND (b = 1) AND (c IS
NOT NULL) AND (c = true)

As per the patch's current implementation, tuple (1, NULL, true)
passes p01's partition constraint, because only (b = 1) is not
sufficient to reject a NULL value being assigned to b.

--
Amit Langote
EDB: http://www.enterprisedb.com

#17Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Amit Langote (#16)
Re: Multi-Column List Partitioning

I have been testing these patches. Patches applied cleanly on the head. While testing I found below a case where update row movement is not working properly.
Please find the test case below.

Thanks for testing and sharing the details of the issue.

Yeah, contrary to my earlier assessment, it seems the partition
constraint on each of those partitions fails to explicitly include an
IS NOT NULL test for each column that has a non-NULL value assigned.
So, for example, the constraint of p01 should actually be:

(a IS NOT NULL) AND (a = 1) AND (b IS NOT NULL) AND (b = 1) AND (c IS
NOT NULL) AND (c = true)

Yes. It should add an IS NOT NULL test for each column. I have
modified the patch accordingly and verified with the test case shared
by Rajkumar.

+ * isnulls is an array of boolean-tuples with key->partnatts booleans values
+ * each.  Currently only used for list partitioning, it stores whether a

I think 'booleans' should be 'boolean'.
The trailing word 'each' is unnecessary.

bq. Supported new syantx to allow mentioning multiple key information.

syantx -> syntax

+       isDuplicate = checkForDuplicates(result, values);
+       if (isDuplicate)
+           continue;

It seems the variable isDuplicate is not needed. The if statement can directly check the return value from checkForDuplicates().

The attached patch also fixes the above comments.

Thanks & Regards,
Nitin Jadhav

Show quoted text

On Tue, Aug 31, 2021 at 9:36 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Mon, Aug 30, 2021 at 4:51 PM Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:

Hi Nitin.

I have been testing these patches. Patches applied cleanly on the head. While testing I found below a case where update row movement is not working properly.
Please find the test case below.

postgres=# create table p0 (a int, b text, c bool) partition by list (a,b,c);
CREATE TABLE
postgres=# create table p01 partition of p0 for values in ((1,1,true));
CREATE TABLE
postgres=# create table p02 partition of p0 for values in ((1,NULL,false));
CREATE TABLE
postgres=# insert into p0 values (1,'1',true);
INSERT 0 1
postgres=# insert into p0 values (1,NULL,false);
INSERT 0 1
postgres=# select tableoid::regclass,* from p0;
tableoid | a | b | c
----------+---+---+---
p01 | 1 | 1 | t
p02 | 1 | | f
(2 rows)

postgres=# update p0 set b = NULL;
UPDATE 2
postgres=# select tableoid::regclass,* from p0;
tableoid | a | b | c
----------+---+---+---
p01 | 1 | | t
p02 | 1 | | f
(2 rows)

I think this update should fail as there is no partition satisfying update row (1,NULL,true).

Yeah, contrary to my earlier assessment, it seems the partition
constraint on each of those partitions fails to explicitly include an
IS NOT NULL test for each column that has a non-NULL value assigned.
So, for example, the constraint of p01 should actually be:

(a IS NOT NULL) AND (a = 1) AND (b IS NOT NULL) AND (b = 1) AND (c IS
NOT NULL) AND (c = true)

As per the patch's current implementation, tuple (1, NULL, true)
passes p01's partition constraint, because only (b = 1) is not
sufficient to reject a NULL value being assigned to b.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v3-0001-multi-column-list-partitioning.patchapplication/x-patch; name=v3-0001-multi-column-list-partitioning.patchDownload+1774-338
#18Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Nitin Jadhav (#17)
Re: Multi-Column List Partitioning

Hi Nitin,

On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

The attached patch also fixes the above comments.

I noticed that multi-column list partitions containing NULLs don't
work correctly with partition pruning yet.

create table p0 (a int, b text, c bool) partition by list (a, b, c);
create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
create table p02 partition of p0 for values in ((1, NULL, false));
explain select * from p0 where a is null;
QUERY PLAN
--------------------------------------------------------
Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
Filter: (a IS NULL)
(2 rows)

I guess that may be due to the following newly added code being incomplete:

+/*
+ * get_partition_bound_null_index
+ *
+ * Returns the partition index of the partition bound which accepts NULL.
+ */
+int
+get_partition_bound_null_index(PartitionBoundInfo boundinfo)
+{
+   int i = 0;
+   int j = 0;
+
+   if (!boundinfo->isnulls)
+       return -1;
-           if (!val->constisnull)
-               count++;
+   for (i = 0; i < boundinfo->ndatums; i++)
+   {
+       //TODO: Handle for multi-column cases
+       for (j = 0; j < 1; j++)
+       {
+           if (boundinfo->isnulls[i][j])
+               return boundinfo->indexes[i];
        }
    }

+ return -1;
+}

Maybe this function needs to return a "bitmapset" of indexes, because
multiple partitions can now contain NULL values.

Some other issues I noticed and suggestions for improvement:

+/*
+ * checkForDuplicates
+ *
+ * Returns TRUE if the list bound element is already present in the list of
+ * list bounds, FALSE otherwise.
+ */
+static bool
+checkForDuplicates(List *source, List *searchElem)

This function name may be too generic. Given that it is specific to
implementing list bound de-duplication, maybe the following signature
is more appropriate:

static bool
checkListBoundDuplicated(List *list_bounds, List *new_bound)

Also, better if the function comment mentions those parameter names, like:

"Returns TRUE if the list bound element 'new_bound' is already present
in the target list 'list_bounds', FALSE otherwise."

+/*
+ * transformPartitionListBounds
+ *
+ * Converts the expressions of list partition bounds from the raw grammar
+ * representation.

A sentence about the result format would be helpful, like:

The result is a List of Lists of Const nodes to account for the
partition key possibly containing more than one column.

+ int i = 0;
+ int j = 0;

Better to initialize such loop counters closer to the loop.

+           colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
+           colname[i] = get_attname(RelationGetRelid(parent),
+                                    key->partattrs[i], false);

The palloc in the 1st statement is wasteful, because the 2nd statement
overwrites its pointer by the pointer to the string palloc'd by
get_attname().

+ ListCell *cell2 = NULL;

No need to explicitly initialize the loop variable.

+           RowExpr     *rowexpr = NULL;
+
+           if (!IsA(expr, RowExpr))
+               ereport(ERROR,
+                       (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+                       errmsg("Invalid list bound specification"),
+                       parser_errposition(pstate, exprLocation((Node
*) spec))));
+
+           rowexpr = (RowExpr *) expr;

It's okay to assign rowexpr at the top here instead of the dummy
NULL-initialization and write the condition as:

if (!IsA(rowexpr, RowExpr))

+       if (isDuplicate)
+           continue;
+
+       result = lappend(result, values);

I can see you copied this style from the existing code, but how about
writing this simply as:

if (!isDuplicate)
result = lappend(result, values);

-/* One value coming from some (index'th) list partition */
+/* One bound of a list partition */
 typedef struct PartitionListValue
 {
    int         index;
-   Datum       value;
+   Datum      *values;
+   bool       *isnulls;
 } PartitionListValue;

Given that this is a locally-defined struct, I wonder if it makes
sense to rename the struct while we're at it. Call it, say,
PartitionListBound?

Also, please keep part of the existing comment that says that the
bound belongs to index'th partition.

Will send more comments in a bit...

--
Amit Langote
EDB: http://www.enterprisedb.com

#19Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#18)
Re: Multi-Column List Partitioning

On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

The attached patch also fixes the above comments.

I noticed that multi-column list partitions containing NULLs don't
work correctly with partition pruning yet.

create table p0 (a int, b text, c bool) partition by list (a, b, c);
create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
create table p02 partition of p0 for values in ((1, NULL, false));
explain select * from p0 where a is null;
QUERY PLAN
--------------------------------------------------------
Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
Filter: (a IS NULL)
(2 rows)

I guess that may be due to the following newly added code being incomplete:

+/*
+ * get_partition_bound_null_index
+ *
+ * Returns the partition index of the partition bound which accepts NULL.
+ */
+int
+get_partition_bound_null_index(PartitionBoundInfo boundinfo)
+{
+   int i = 0;
+   int j = 0;
+
+   if (!boundinfo->isnulls)
+       return -1;
-           if (!val->constisnull)
-               count++;
+   for (i = 0; i < boundinfo->ndatums; i++)
+   {
+       //TODO: Handle for multi-column cases
+       for (j = 0; j < 1; j++)
+       {
+           if (boundinfo->isnulls[i][j])
+               return boundinfo->indexes[i];
}
}

+ return -1;
+}

Maybe this function needs to return a "bitmapset" of indexes, because
multiple partitions can now contain NULL values.

Some other issues I noticed and suggestions for improvement:

+/*
+ * checkForDuplicates
+ *
+ * Returns TRUE if the list bound element is already present in the list of
+ * list bounds, FALSE otherwise.
+ */
+static bool
+checkForDuplicates(List *source, List *searchElem)

This function name may be too generic. Given that it is specific to
implementing list bound de-duplication, maybe the following signature
is more appropriate:

static bool
checkListBoundDuplicated(List *list_bounds, List *new_bound)

Also, better if the function comment mentions those parameter names, like:

"Returns TRUE if the list bound element 'new_bound' is already present
in the target list 'list_bounds', FALSE otherwise."

+/*
+ * transformPartitionListBounds
+ *
+ * Converts the expressions of list partition bounds from the raw grammar
+ * representation.

A sentence about the result format would be helpful, like:

The result is a List of Lists of Const nodes to account for the
partition key possibly containing more than one column.

+ int i = 0;
+ int j = 0;

Better to initialize such loop counters closer to the loop.

+           colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
+           colname[i] = get_attname(RelationGetRelid(parent),
+                                    key->partattrs[i], false);

The palloc in the 1st statement is wasteful, because the 2nd statement
overwrites its pointer by the pointer to the string palloc'd by
get_attname().

+ ListCell *cell2 = NULL;

No need to explicitly initialize the loop variable.

+           RowExpr     *rowexpr = NULL;
+
+           if (!IsA(expr, RowExpr))
+               ereport(ERROR,
+                       (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+                       errmsg("Invalid list bound specification"),
+                       parser_errposition(pstate, exprLocation((Node
*) spec))));
+
+           rowexpr = (RowExpr *) expr;

It's okay to assign rowexpr at the top here instead of the dummy
NULL-initialization and write the condition as:

if (!IsA(rowexpr, RowExpr))

+       if (isDuplicate)
+           continue;
+
+       result = lappend(result, values);

I can see you copied this style from the existing code, but how about
writing this simply as:

if (!isDuplicate)
result = lappend(result, values);

-/* One value coming from some (index'th) list partition */
+/* One bound of a list partition */
typedef struct PartitionListValue
{
int         index;
-   Datum       value;
+   Datum      *values;
+   bool       *isnulls;
} PartitionListValue;

Given that this is a locally-defined struct, I wonder if it makes
sense to rename the struct while we're at it. Call it, say,
PartitionListBound?

Also, please keep part of the existing comment that says that the
bound belongs to index'th partition.

Will send more comments in a bit...

+ * partition_bound_accepts_nulls
+ *
+ * Returns TRUE if partition bound has NULL value, FALSE otherwise.
  */

I suggest slight rewording, as follows:

"Returns TRUE if any of the partition bounds contains a NULL value,
FALSE otherwise."

-   PartitionListValue *all_values;
+   PartitionListValue **all_values;
...
-   all_values = (PartitionListValue *)
-       palloc(ndatums * sizeof(PartitionListValue));
+   ndatums = get_list_datum_count(boundspecs, nparts);
+   all_values = (PartitionListValue **)
+       palloc(ndatums * sizeof(PartitionListValue *));

I don't see the need to redefine all_values's pointer type. No need
to palloc PartitionListValue repeatedly for every datum as done
further down as follows:

+ all_values[j] = (PartitionListValue *)
palloc(sizeof(PartitionListValue));

You do need the following two though:

+           all_values[j]->values = (Datum *) palloc0(key->partnatts *
sizeof(Datum));
+           all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
sizeof(bool));

If you change the above the way I suggest, you'd also need to revert
the following change:

-   qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
+   qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
              qsort_partition_list_value_cmp, (void *) key);
+       int         orig_index = all_values[i]->index;
+       boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));

Missing a newline between these two statements.

BTW, I noticed that the boundDatums variable is no longer used in
create_list_bounds. I traced back its origin and found that a recent
commit 53d86957e98 introduced it to implement an idea to reduce the
finer-grained pallocs that were being done in create_list_bounds(). I
don't think that this patch needs to throw away that work. You can
make it work as the attached delta patch that applies on top of v3.
Please check.

@@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
*parttyplen, bool *parttypbyval,
if (b1->nindexes != b2->nindexes)
return false;

-   if (b1->null_index != b2->null_index)
+   if (get_partition_bound_null_index(b1) !=
get_partition_bound_null_index(b2))

As mentioned in the last message, this bit in partition_bounds_equal()
needs to be comparing "bitmapsets" of null bound indexes, that is
after fixing get_partition_bound_null_index() as previously mentioned.

But...

@@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
*parttyplen, bool *parttypbyval,
                 * context.  datumIsEqual() should be simple enough to be
                 * safe.
                 */
-               if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
+               if (b1->isnulls)
+                   b1_isnull = b1->isnulls[i][j];
+               if (b2->isnulls)
+                   b2_isnull = b2->isnulls[i][j];
+
+               /*
+                * If any of the partition bound has NULL value, then check
+                * equality for the NULL value instead of comparing the datums
+                * as it does not contain valid value in case of NULL.
+                */
+               if (b1_isnull || b2_isnull)
+               {
+                   if (b1_isnull != b2_isnull)
+                       return false;
+               }

...if you have this in the main loop, I don't think we need the above
code stanza which appears to implement a short-cut for this long-form
logic.

+               (key->strategy != PARTITION_STRATEGY_LIST ||
+                !src->isnulls[i][j]))

I think it's better to write this condition as follows just like the
accompanying condition involving src->kind:

(src->nulls == NULL || !src->isnulls[i][j])

(Skipped looking at merge_list_bounds() and related changes for now as
I see a lot of TODOs remain to be done.)

In check_new_partition_bound():

+                       Datum      *values = (Datum *)
palloc0(key->partnatts * sizeof(Datum));
+                       bool       *isnulls = (bool *)
palloc0(key->partnatts * sizeof(bool));

Doesn't seem like a bad idea to declare these as:

Datum values[PARTITION_MAX_KEYS];
bool isnulls[PARTITION_MAX_KEYS];

I looked at get_qual_for_list_multi_column() and immediately thought
that it may be a bad idea. I think it's better to integrate the logic
for multi-column case into the existing function even if that makes
the function appear more complex. Having two functions with the same
goal and mostly the same code is not a good idea mainly because it
becomes a maintenance burden.

I have attempted a rewrite such that get_qual_for_list() now handles
both the single-column and multi-column cases. Changes included in
the delta patch. The patch updates some outputs of the newly added
tests for multi-column list partitions, because the new code emits the
IS NOT NULL tests a bit differently than
get_qual_for_list_mutli_column() would. Notably, the old approach
would emit IS NOT NULL for every non-NULL datum matched to a given
column, not just once for the column. However, the patch makes a few
other tests fail, mainly because I had to fix
partition_bound_accepts_nulls() to handle the multi-column case,
though didn't bother to update all callers of it to also handle the
multi-column case correctly. I guess that's a TODO you're going to
deal with at some point anyway. :)

I still have more than half of v3 left to look at, so will continue
looking. In the meantime, please check the changes I suggested,
including the delta patch, and let me know your thoughts.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v3_delta_amit.patchapplication/octet-stream; name=v3_delta_amit.patchDownload+173-259
#20Rajkumar Raghuwanshi
rajkumar.raghuwanshi@enterprisedb.com
In reply to: Amit Langote (#19)
Re: Multi-Column List Partitioning

On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is failing
with below errors.

--inherit.sql is failing with error :"ERROR: negative bitmapset member not
allowed"
update mlparted_tab mlp set c = 'xxx'
from
(select a from some_tab union all select a+1 from some_tab) ss (a)
where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
ERROR: negative bitmapset member not allowed

--partition_join.sql is crashing with enable_partitionwise_join set to true.
CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0001',
'0003');
CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0004',
'0006');
CREATE TABLE plt1_adv_p3 PARTITION OF plt1_adv FOR VALUES IN ('0008',
'0009');
INSERT INTO plt1_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM
generate_series(1, 299) i WHERE i % 10 IN (1, 3, 4, 6, 8, 9);
ANALYZE plt1_adv;
CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002',
'0003');
CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0004',
'0006');
CREATE TABLE plt2_adv_p3 PARTITION OF plt2_adv FOR VALUES IN ('0007',
'0009');
INSERT INTO plt2_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM
generate_series(1, 299) i WHERE i % 10 IN (2, 3, 4, 6, 7, 9);
ANALYZE plt2_adv;
-- inner join
EXPLAIN (COSTS OFF)
SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON
(t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost

--stack-trace
Core was generated by `postgres: edb regression [local] EXPLAIN
'.
Program terminated with signal 6, Aborted.
#0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install
glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64
libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-12.el7.x86_64
openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64
zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
#1 0x00007f7d339bb968 in abort () from /lib64/libc.so.6
#2 0x0000000000b0fbc3 in ExceptionalCondition (conditionName=0xcbda10
"part_index >= 0", errorType=0xcbd1c3 "FailedAssertion", fileName=0xcbd2fe
"partbounds.c", lineNumber=1957)
at assert.c:69
#3 0x0000000000892aa1 in is_dummy_partition (rel=0x19b37c0, part_index=-1)
at partbounds.c:1957
#4 0x00000000008919bd in merge_list_bounds (partnatts=1,
partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0,
inner_rel=0x1922938, jointype=JOIN_INNER,
outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at
partbounds.c:1529
#5 0x00000000008910de in partition_bounds_merge (partnatts=1,
partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0,
inner_rel=0x1922938, jointype=JOIN_INNER,
outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at
partbounds.c:1223
#6 0x000000000082c41a in compute_partition_bounds (root=0x1a19ed0,
rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30,
parent_sjinfo=0x7fffd67752a0, parts1=0x7fffd67751b0,
parts2=0x7fffd67751a8) at joinrels.c:1644
#7 0x000000000082bc34 in try_partitionwise_join (root=0x1a19ed0,
rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30,
parent_sjinfo=0x7fffd67752a0, parent_restrictlist=0x1ab3318)
at joinrels.c:1402
#8 0x000000000082aea2 in populate_joinrel_with_paths (root=0x1a19ed0,
rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, sjinfo=0x7fffd67752a0,
restrictlist=0x1ab3318)
at joinrels.c:926
#9 0x000000000082a8f5 in make_join_rel (root=0x1a19ed0, rel1=0x19b37c0,
rel2=0x1922938) at joinrels.c:760
#10 0x0000000000829e03 in make_rels_by_clause_joins (root=0x1a19ed0,
old_rel=0x19b37c0, other_rels_list=0x1ab2970, other_rels=0x1ab2990) at
joinrels.c:312
#11 0x00000000008298d9 in join_search_one_level (root=0x1a19ed0, level=2)
at joinrels.c:123
#12 0x000000000080c566 in standard_join_search (root=0x1a19ed0,
levels_needed=2, initial_rels=0x1ab2970) at allpaths.c:3020
#13 0x000000000080c4df in make_rel_from_joinlist (root=0x1a19ed0,
joinlist=0x199d538) at allpaths.c:2951
#14 0x000000000080816b in make_one_rel (root=0x1a19ed0, joinlist=0x199d538)
at allpaths.c:228
#15 0x000000000084491d in query_planner (root=0x1a19ed0,
qp_callback=0x84a538 <standard_qp_callback>, qp_extra=0x7fffd6775630) at
planmain.c:276
#16 0x0000000000847040 in grouping_planner (root=0x1a19ed0,
tuple_fraction=0) at planner.c:1447
#17 0x0000000000846709 in subquery_planner (glob=0x19b39d8,
parse=0x1aaa290, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at
planner.c:1025
#18 0x0000000000844f3e in standard_planner (parse=0x1aaa290,
query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a,
t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c =
t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0)
at planner.c:406
#19 0x0000000000844ce9 in planner (parse=0x1aaa290,
query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a,
t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c =
t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0)
at planner.c:277
#20 0x0000000000978483 in pg_plan_query (querytree=0x1aaa290,
query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a,
t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c =
t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0)
at postgres.c:847
#21 0x00000000006937fc in ExplainOneQuery (query=0x1aaa290,
cursorOptions=2048, into=0x0, es=0x19b36f0,
queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a,
t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c =
t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
params=0x0, queryEnv=0x0) at explain.c:397
#22 0x0000000000693351 in ExplainQuery (pstate=0x197c410, stmt=0x1aaa0b0,
params=0x0, dest=0x197c378) at explain.c:281
#23 0x00000000009811fa in standard_ProcessUtility (pstmt=0x1a0bfc8,
queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a,
t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c =
t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:845
#24 0x00000000009809ec in ProcessUtility (pstmt=0x1a0bfc8,
queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a,
t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c =
t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:527
#25 0x000000000097f636 in PortalRunUtility (portal=0x1893b40,
pstmt=0x1a0bfc8, isTopLevel=true, setHoldSnapshot=true, dest=0x197c378,
qc=0x7fffd6775f90) at pquery.c:1147
#26 0x000000000097f3a5 in FillPortalStore (portal=0x1893b40,
isTopLevel=true) at pquery.c:1026
#27 0x000000000097ed11 in PortalRun (portal=0x1893b40,
count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1a0c0b8,
altdest=0x1a0c0b8, qc=0x7fffd6776150) at pquery.c:758
#28 0x0000000000978aa5 in exec_simple_query (

Thanks & Regards,
Rajkumar Raghuwanshi

On Fri, Sep 3, 2021 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:

Show quoted text

On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <amitlangote09@gmail.com>
wrote:

On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:

The attached patch also fixes the above comments.

I noticed that multi-column list partitions containing NULLs don't
work correctly with partition pruning yet.

create table p0 (a int, b text, c bool) partition by list (a, b, c);
create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1,

false));

create table p02 partition of p0 for values in ((1, NULL, false));
explain select * from p0 where a is null;
QUERY PLAN
--------------------------------------------------------
Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
Filter: (a IS NULL)
(2 rows)

I guess that may be due to the following newly added code being

incomplete:

+/*
+ * get_partition_bound_null_index
+ *
+ * Returns the partition index of the partition bound which accepts

NULL.

+ */
+int
+get_partition_bound_null_index(PartitionBoundInfo boundinfo)
+{
+   int i = 0;
+   int j = 0;
+
+   if (!boundinfo->isnulls)
+       return -1;
-           if (!val->constisnull)
-               count++;
+   for (i = 0; i < boundinfo->ndatums; i++)
+   {
+       //TODO: Handle for multi-column cases
+       for (j = 0; j < 1; j++)
+       {
+           if (boundinfo->isnulls[i][j])
+               return boundinfo->indexes[i];
}
}

+ return -1;
+}

Maybe this function needs to return a "bitmapset" of indexes, because
multiple partitions can now contain NULL values.

Some other issues I noticed and suggestions for improvement:

+/*
+ * checkForDuplicates
+ *
+ * Returns TRUE if the list bound element is already present in the

list of

+ * list bounds, FALSE otherwise.
+ */
+static bool
+checkForDuplicates(List *source, List *searchElem)

This function name may be too generic. Given that it is specific to
implementing list bound de-duplication, maybe the following signature
is more appropriate:

static bool
checkListBoundDuplicated(List *list_bounds, List *new_bound)

Also, better if the function comment mentions those parameter names,

like:

"Returns TRUE if the list bound element 'new_bound' is already present
in the target list 'list_bounds', FALSE otherwise."

+/*
+ * transformPartitionListBounds
+ *
+ * Converts the expressions of list partition bounds from the raw

grammar

+ * representation.

A sentence about the result format would be helpful, like:

The result is a List of Lists of Const nodes to account for the
partition key possibly containing more than one column.

+ int i = 0;
+ int j = 0;

Better to initialize such loop counters closer to the loop.

+           colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
+           colname[i] = get_attname(RelationGetRelid(parent),
+                                    key->partattrs[i], false);

The palloc in the 1st statement is wasteful, because the 2nd statement
overwrites its pointer by the pointer to the string palloc'd by
get_attname().

+ ListCell *cell2 = NULL;

No need to explicitly initialize the loop variable.

+           RowExpr     *rowexpr = NULL;
+
+           if (!IsA(expr, RowExpr))
+               ereport(ERROR,
+                       (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+                       errmsg("Invalid list bound specification"),
+                       parser_errposition(pstate, exprLocation((Node
*) spec))));
+
+           rowexpr = (RowExpr *) expr;

It's okay to assign rowexpr at the top here instead of the dummy
NULL-initialization and write the condition as:

if (!IsA(rowexpr, RowExpr))

+       if (isDuplicate)
+           continue;
+
+       result = lappend(result, values);

I can see you copied this style from the existing code, but how about
writing this simply as:

if (!isDuplicate)
result = lappend(result, values);

-/* One value coming from some (index'th) list partition */
+/* One bound of a list partition */
typedef struct PartitionListValue
{
int         index;
-   Datum       value;
+   Datum      *values;
+   bool       *isnulls;
} PartitionListValue;

Given that this is a locally-defined struct, I wonder if it makes
sense to rename the struct while we're at it. Call it, say,
PartitionListBound?

Also, please keep part of the existing comment that says that the
bound belongs to index'th partition.

Will send more comments in a bit...

+ * partition_bound_accepts_nulls
+ *
+ * Returns TRUE if partition bound has NULL value, FALSE otherwise.
*/

I suggest slight rewording, as follows:

"Returns TRUE if any of the partition bounds contains a NULL value,
FALSE otherwise."

-   PartitionListValue *all_values;
+   PartitionListValue **all_values;
...
-   all_values = (PartitionListValue *)
-       palloc(ndatums * sizeof(PartitionListValue));
+   ndatums = get_list_datum_count(boundspecs, nparts);
+   all_values = (PartitionListValue **)
+       palloc(ndatums * sizeof(PartitionListValue *));

I don't see the need to redefine all_values's pointer type. No need
to palloc PartitionListValue repeatedly for every datum as done
further down as follows:

+ all_values[j] = (PartitionListValue *)
palloc(sizeof(PartitionListValue));

You do need the following two though:

+           all_values[j]->values = (Datum *) palloc0(key->partnatts *
sizeof(Datum));
+           all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
sizeof(bool));

If you change the above the way I suggest, you'd also need to revert
the following change:

-   qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
+   qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
qsort_partition_list_value_cmp, (void *) key);
+       int         orig_index = all_values[i]->index;
+       boundinfo->datums[i] = (Datum *) palloc(key->partnatts *
sizeof(Datum));

Missing a newline between these two statements.

BTW, I noticed that the boundDatums variable is no longer used in
create_list_bounds. I traced back its origin and found that a recent
commit 53d86957e98 introduced it to implement an idea to reduce the
finer-grained pallocs that were being done in create_list_bounds(). I
don't think that this patch needs to throw away that work. You can
make it work as the attached delta patch that applies on top of v3.
Please check.

@@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
*parttyplen, bool *parttypbyval,
if (b1->nindexes != b2->nindexes)
return false;

-   if (b1->null_index != b2->null_index)
+   if (get_partition_bound_null_index(b1) !=
get_partition_bound_null_index(b2))

As mentioned in the last message, this bit in partition_bounds_equal()
needs to be comparing "bitmapsets" of null bound indexes, that is
after fixing get_partition_bound_null_index() as previously mentioned.

But...

@@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
*parttyplen, bool *parttypbyval,
* context.  datumIsEqual() should be simple enough to be
* safe.
*/
-               if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
+               if (b1->isnulls)
+                   b1_isnull = b1->isnulls[i][j];
+               if (b2->isnulls)
+                   b2_isnull = b2->isnulls[i][j];
+
+               /*
+                * If any of the partition bound has NULL value, then check
+                * equality for the NULL value instead of comparing the
datums
+                * as it does not contain valid value in case of NULL.
+                */
+               if (b1_isnull || b2_isnull)
+               {
+                   if (b1_isnull != b2_isnull)
+                       return false;
+               }

...if you have this in the main loop, I don't think we need the above
code stanza which appears to implement a short-cut for this long-form
logic.

+               (key->strategy != PARTITION_STRATEGY_LIST ||
+                !src->isnulls[i][j]))

I think it's better to write this condition as follows just like the
accompanying condition involving src->kind:

(src->nulls == NULL || !src->isnulls[i][j])

(Skipped looking at merge_list_bounds() and related changes for now as
I see a lot of TODOs remain to be done.)

In check_new_partition_bound():

+                       Datum      *values = (Datum *)
palloc0(key->partnatts * sizeof(Datum));
+                       bool       *isnulls = (bool *)
palloc0(key->partnatts * sizeof(bool));

Doesn't seem like a bad idea to declare these as:

Datum values[PARTITION_MAX_KEYS];
bool isnulls[PARTITION_MAX_KEYS];

I looked at get_qual_for_list_multi_column() and immediately thought
that it may be a bad idea. I think it's better to integrate the logic
for multi-column case into the existing function even if that makes
the function appear more complex. Having two functions with the same
goal and mostly the same code is not a good idea mainly because it
becomes a maintenance burden.

I have attempted a rewrite such that get_qual_for_list() now handles
both the single-column and multi-column cases. Changes included in
the delta patch. The patch updates some outputs of the newly added
tests for multi-column list partitions, because the new code emits the
IS NOT NULL tests a bit differently than
get_qual_for_list_mutli_column() would. Notably, the old approach
would emit IS NOT NULL for every non-NULL datum matched to a given
column, not just once for the column. However, the patch makes a few
other tests fail, mainly because I had to fix
partition_bound_accepts_nulls() to handle the multi-column case,
though didn't bother to update all callers of it to also handle the
multi-column case correctly. I guess that's a TODO you're going to
deal with at some point anyway. :)

I still have more than half of v3 left to look at, so will continue
looking. In the meantime, please check the changes I suggested,
including the delta patch, and let me know your thoughts.

--
Amit Langote
EDB: http://www.enterprisedb.com

#21Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Rajkumar Raghuwanshi (#20)
#22Zhihong Yu
zyu@yugabyte.com
In reply to: Amit Langote (#21)
#23Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Rajkumar Raghuwanshi (#20)
#24Rajkumar Raghuwanshi
rajkumar.raghuwanshi@enterprisedb.com
In reply to: Nitin Jadhav (#23)
#25Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Rajkumar Raghuwanshi (#24)
#26Rajkumar Raghuwanshi
rajkumar.raghuwanshi@enterprisedb.com
In reply to: Nitin Jadhav (#25)
#27Rajkumar Raghuwanshi
rajkumar.raghuwanshi@enterprisedb.com
In reply to: Rajkumar Raghuwanshi (#26)
#28Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Rajkumar Raghuwanshi (#26)
#29Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Rajkumar Raghuwanshi (#27)
#30Zhihong Yu
zyu@yugabyte.com
In reply to: Nitin Jadhav (#29)
#31Zhihong Yu
zyu@yugabyte.com
In reply to: Zhihong Yu (#30)
#32Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Nitin Jadhav (#29)
#33Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Zhihong Yu (#31)
#34Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Amit Langote (#32)
#35Amul Sul
sulamul@gmail.com
In reply to: Nitin Jadhav (#34)
#36Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Amul Sul (#35)
#37Amul Sul
sulamul@gmail.com
In reply to: Nitin Jadhav (#36)
#38Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Nitin Jadhav (#36)
#39Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#38)
#40Amul Sul
sulamul@gmail.com
In reply to: Amit Langote (#38)
#41Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amul Sul (#40)
#42Amul Sul
sulamul@gmail.com
In reply to: Amit Langote (#41)
#43Amul Sul
sulamul@gmail.com
In reply to: Amul Sul (#42)
#44Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Nitin Jadhav (#36)
#45Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Ashutosh Sharma (#44)
#46Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Amit Langote (#45)
#47Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Ashutosh Sharma (#46)
#48Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Amit Langote (#47)
#49Amul Sul
sulamul@gmail.com
In reply to: Nitin Jadhav (#48)
#50Nitin Jadhav
nitinjadhavpostgres@gmail.com
In reply to: Amul Sul (#49)
#51Amul Sul
sulamul@gmail.com
In reply to: Nitin Jadhav (#50)
#52Julien Rouhaud
rjuju123@gmail.com
In reply to: Nitin Jadhav (#50)