Question on Partition key
Hello Friends,
We are trying to create a monthly range-partitioned table, partitioned on
column PART_DATE. It will hold orders, and PART_DATE is simply the
invoice date. Some teammates are asking to define the "PART_DATE" column with
data type "INTEGER" in "YYYYMM" format and to define the partitions as
shown below. I would like to know experts' views on this: does the data type
of the partition key matter here? And does either approach have any downside
in the future?
-- With the date data type, it would look like this:
CREATE TABLE TAB1 (
    COLUMN1   VARCHAR(36) NOT NULL,
    PART_DATE DATE        NOT NULL
) PARTITION BY RANGE (PART_DATE);
CREATE TABLE TAB1_202309 PARTITION OF TAB1
    FOR VALUES FROM ('2023-09-01') TO ('2023-10-01');
CREATE TABLE TAB1_202310 PARTITION OF TAB1
    FOR VALUES FROM ('2023-10-01') TO ('2023-11-01');
CREATE TABLE TAB1_202311 PARTITION OF TAB1
    FOR VALUES FROM ('2023-11-01') TO ('2023-12-01');
ALTER TABLE TAB1 ADD CONSTRAINT PK_TAB1 PRIMARY KEY (COLUMN1, PART_DATE);
VS
-- With the integer data type, it would look like this:
CREATE TABLE TAB1 (
    COLUMN1         VARCHAR(36) NOT NULL,
    PART_DATE_YM_NM INTEGER     NOT NULL
) PARTITION BY RANGE (PART_DATE_YM_NM);
-- Integer bounds are written as plain numbers rather than quoted strings.
CREATE TABLE TAB1_202309 PARTITION OF TAB1
    FOR VALUES FROM (202309) TO (202310);
CREATE TABLE TAB1_202310 PARTITION OF TAB1
    FOR VALUES FROM (202310) TO (202311);
CREATE TABLE TAB1_202311 PARTITION OF TAB1
    FOR VALUES FROM (202311) TO (202312);
ALTER TABLE TAB1 ADD CONSTRAINT PK_TAB1 PRIMARY KEY (COLUMN1, PART_DATE_YM_NM);
On 03/09/2023 00:35 CEST veem v <veema0000@gmail.com> wrote:
> We are trying to create a monthly range partition table , partitioned on
> column PART_DATE. This will hold Orders and part_date is nothing but invoice
> date. Some Team mates are asking to use the "PART_DATE" column as data type
> "INTEGER" with "YYYYMM" format [...]
Why do your team mates favor integer over date?
> Want to know experts' views on this. If the data type of the partition key
> matters here or not?
Both integer and date are stored as 4 bytes. There should be no difference
regarding index size. I don't know if the data type makes a difference in
partition pruning performance in this case, but I'd be surprised if it were
the case.
> Or if there is any downside of each approach in future?
The downside of integer is that it allows invalid dates (e.g. 202313) unless
you also add check constraints. But then just use date if you want to store
dates. You get input validation and can use the date operators and functions
that Postgres offers.
--
Erik
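The validation point above can be sketched as follows. This is only an illustration under assumptions: the table name tab1_int and the constraint name are hypothetical and do not appear in the thread.

```sql
-- Integer key: month validity has to be enforced by hand.
-- tab1_int and the constraint name are hypothetical.
CREATE TABLE tab1_int (
    column1         varchar(36) NOT NULL,
    part_date_ym_nm integer     NOT NULL,
    -- Without this check, a value such as 202313 is an acceptable integer
    -- and would be stored if some partition covers it.
    CONSTRAINT part_date_ym_nm_valid_month
        CHECK (part_date_ym_nm % 100 BETWEEN 1 AND 12)
) PARTITION BY RANGE (part_date_ym_nm);

-- Date key: the type itself rejects an impossible month.
SELECT DATE '2023-13-01';   -- fails with an error

-- The YYYYMM form can still be derived from a date when it is needed.
SELECT to_char(DATE '2023-09-15', 'YYYYMM')::integer;   -- 202309
```

Going the other way, from a YYYYMM integer back to a usable date, needs extra conversion in every query that wants date arithmetic, which is part of why a real date column tends to be the simpler choice.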
Have your teammates also mentioned how converting the date field to an
integer is going to help?
On Sun, Sep 3, 2023 at 3:51 AM Erik Wienhold <ewie@ewie.name> wrote:
> [...]
Thank you so much for the clarification.
Actually, the team has used a similar partitioning strategy on integer columns
in the past, so they are inclined towards it. I will still double-check with
others whether any business restrictions exist. But as you already mentioned,
it is not good from a data quality perspective. I agree with this point.
Additionally, is it true that the optimizer will also be misled when doing the
math for cardinality estimates, because comparing or subtracting two date
values is very different from comparing or subtracting two number values, so
that storing dates in number columns poses this problem for the optimizer?
Is my understanding correct here?
On Sun, 3 Sept, 2023, 2:02 pm Deep, <biswachk@gmail.com> wrote:
> Have your friends also mentioned how it is going to help to convert date
> field to integer !???
On Sun, 3 Sept 2023 at 23:52, veem v <veema0000@gmail.com> wrote:
> Additionally, is it true that optimizer will also get fooled on getting the
> math correct during cardinality estimates, as because there is a big
> difference between comparing or subtracting two date values VS two number
> values. And storing the dates in the number columns will pose this problem
> for the optimizer. Is my understanding correct here?
The query planner does not do any subtracting of the values which are the
target of the statistics. There are comparisons, but comparing two DATE
values is just as cheap as comparing two INT values.
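One way to check this on your own data is to look at the plans directly. The queries below are a sketch, assuming the TAB1 definitions from the first post; since both variants use the same table name, run each query against whichever variant you have created.

```sql
-- Date-partitioned variant: a one-month window on the partition key.
-- If pruning applies, the plan should reference only TAB1_202310.
EXPLAIN
SELECT count(*)
FROM TAB1
WHERE PART_DATE >= DATE '2023-10-01'
  AND PART_DATE <  DATE '2023-11-01';

-- Integer-partitioned variant: the equivalent month filter.
EXPLAIN
SELECT count(*)
FROM TAB1
WHERE PART_DATE_YM_NM = 202310;
```

Comparing the estimated row counts in the two plans (after running ANALYZE on loaded tables) shows whether the estimates differ in practice for your data.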
To me, the design with the PART_DATE_YM_NM INT column looks very
strange. Why bother partitioning by RANGE when each partition holds just
a single value? The partition pruning done for LIST partitioning will work
equally well when given ranges of values. Also, don't they ever
want to store the day of the month anywhere in the table? The INT
column won't allow that, but the DATE one will.
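For reference, the LIST alternative mentioned above might look like the sketch below. The table and partition names (tab1_list and so on) are hypothetical, chosen only to avoid clashing with TAB1.

```sql
-- Hypothetical LIST-partitioned variant of the integer design:
-- one partition per YYYYMM value.
CREATE TABLE tab1_list (
    column1         varchar(36) NOT NULL,
    part_date_ym_nm integer     NOT NULL,
    PRIMARY KEY (column1, part_date_ym_nm)
) PARTITION BY LIST (part_date_ym_nm);

CREATE TABLE tab1_list_202309 PARTITION OF tab1_list FOR VALUES IN (202309);
CREATE TABLE tab1_list_202310 PARTITION OF tab1_list FOR VALUES IN (202310);
CREATE TABLE tab1_list_202311 PARTITION OF tab1_list FOR VALUES IN (202311);

-- A range predicate on the key; the plan shows which partitions are scanned.
EXPLAIN
SELECT *
FROM tab1_list
WHERE part_date_ym_nm BETWEEN 202309 AND 202310;
```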
Several jobs ago in a land far far away, I worked with someone who
would tell engineers not to use EXISTS clauses in their SQL as
"they're not optimised very well". I questioned him about this and, as
it turned out, some version of Oracle once didn't optimise these very
well. When he learned this, he took that knowledge and seemingly
applied it to all versions of all RDBMSs in the universe. Rather
bizarre, but perhaps that's what's going on here too.
David