Partitioning on Table with Foreign Key

Started by Yan Cheng Cheokabout 16 years ago4 messagesgeneral
Jump to latest
#1Yan Cheng Cheok
yccheok@yahoo.com

I am referring to http://www.postgresql.org/docs/current/static/ddl-partitioning.html

I have the follow table :

table lot
=========
id | date |
1 2010-01-19 13:53:57.713
2 2010-01-20 11:34:11.856

table measurement
=========
id | fk_lot_id |
1 2
2 2
3 2
4 2
5 2
6 2
7 2
8 2

table measurement will have a *lot* of row (millions). I want to speed up write and read access. Hence, I use partition technique.

CREATE TABLE measurement_y2006m02 (
CHECK ( date >= DATE '2006-02-01' AND date < DATE '2006-03-01' )
) INHERITS (measurement);

Opps! But measurement do not have date column. How I can refer to measurement's lot's date?

Thanks and Regards
Yan Cheng CHEOK

#2Vick Khera
vivek@khera.org
In reply to: Yan Cheng Cheok (#1)
Re: Partitioning on Table with Foreign Key

On Thu, Jan 21, 2010 at 3:46 AM, Yan Cheng Cheok <yccheok@yahoo.com> wrote:

table measurement will have a *lot* of row (millions). I want to speed up write and read access. Hence, I use partition technique.

CREATE TABLE measurement_y2006m02 (
   CHECK ( date >= DATE '2006-02-01' AND date < DATE '2006-03-01' )
) INHERITS (measurement);

Opps! But measurement do not have date column. How I can refer to measurement's lot's date?

Split on your lot ID number instead of date, using a modulo operation.
I've done this by splitting a table 100 ways using the constraing (id
% 100 == N) for N = 0 .. 99.

We don't know what your data retention or distribution needs are, so
it is hard to offer solid advice.

#3Yan Cheng Cheok
yccheok@yahoo.com
In reply to: Vick Khera (#2)
Re: Partitioning on Table with Foreign Key

Make sense to me. Thanks for the advice. I will try that out.

Thanks and Regards
Yan Cheng CHEOK

--- On Thu, 1/21/10, Vick Khera <vivek@khera.org> wrote:
Show quoted text

From: Vick Khera <vivek@khera.org>
Subject: Re: [GENERAL] Partitioning on Table with Foreign Key
To: "Yan Cheng Cheok" <yccheok@yahoo.com>
Cc: pgsql-general@postgresql.org
Date: Thursday, January 21, 2010, 11:30 PM
On Thu, Jan 21, 2010 at 3:46 AM, Yan
Cheng Cheok <yccheok@yahoo.com>
wrote:

table measurement will have a *lot* of row (millions).

I want to speed up write and read access. Hence, I use
partition technique.

CREATE TABLE measurement_y2006m02 (
   CHECK ( date >= DATE '2006-02-01' AND date

< DATE '2006-03-01' )

) INHERITS (measurement);

Opps! But measurement do not have date column. How I

can refer to measurement's lot's date?

Split on your lot ID number instead of date, using a modulo
operation.
I've done this by splitting a table 100 ways using the
constraing (id
% 100 == N) for N = 0 .. 99.

We don't know what your data retention or distribution
needs are, so
it is hard to offer solid advice.

#4Yan Cheng Cheok
yccheok@yahoo.com
In reply to: Yan Cheng Cheok (#3)
Re: Partitioning on Table with Foreign Key

Hello Vick,

Can I get some advice from your side?

Currently, I have table :

1 lot is pointing to many units
1 unit is pointing to many measurements

Say, let say, I want "Give me all the measurements result within this lot".

For a single lot, with 100,000 units, with each unit having 48 different measurements (total 4,800,000 measurements). It will takes ~30 seconds to return result.

I use the following stored procedure to help me do so.

======================================================
CREATE OR REPLACE FUNCTION get_measurements(IN bigint)
RETURNS TABLE(_type text, _value double precision, _unit text) AS
$BODY$DECLARE
_lotID ALIAS FOR $1;
BEGIN
RETURN QUERY EXECUTE 'SELECT measurement_type.value, measurement.value, measurement_unit.value
FROM
measurement INNER JOIN unit ON (fk_unit_id = unit_id)
WHERE fk_lot_id = $1'
USING _lotID;
END;$BODY$
LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION get_measurements(bigint) OWNER TO postgres;
======================================================

However, when time goes on and more lots being added, the performance for the above stored procedure getting worst.

I plan to use table partition, to solve the above problem.

(1) Store unit, according to their unit_id (primary key). For example,

unit_id from 0 till 999999 will store in table unit_0.
unit_id from 1000000 till 1999999 will store in table unit_1.

(2) Store measurement, according to their measurment_id (primary key). For example,

measurement_id from 0 till 999999 will store in table measurement_0.
measurement_id from 1000000 till 1999999 will store in table measurement_1.

However, does the above stored procedure, will gain advantage from my (1) and (2) strategy?

As from my point of view, PostgreSQL seems doesn't get enough hint, where he need to go to unit_? and measurement_? table to perform read. In my query, I didn't specific rule like

"WHERE unit_id / 1000000 = 0"

Advice and comment are very much appreciated.

The below are my table partition rule :
========================================
CREATE TABLE measurement_0 (
CONSTRAINT pk_measurement_id PRIMARY KEY (measurement_id),
CONSTRAINT fk_unit_id FOREIGN KEY (fk_unit_id)
REFERENCES unit (unit_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,

CHECK ( measurement_id / 1000000 = 0 )
) INHERITS (measurement);

CREATE TABLE measurement_1 (
CONSTRAINT pk_measurement_id PRIMARY KEY (measurement_id),
CONSTRAINT fk_unit_id FOREIGN KEY (fk_unit_id)
REFERENCES unit (unit_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,

CHECK ( measurement_id / 1000000 = 1 )
) INHERITS (measurement);

......

CREATE TABLE unit_0 (
CONSTRAINT pk_unit_id PRIMARY KEY (unit_id),
CONSTRAINT fk_lot_id FOREIGN KEY (fk_lot_id)
REFERENCES lot (lot_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,

CHECK ( unit_id / 1000000 = 0 )
) INHERITS (measurement);

CREATE TABLE unit_1 (
CONSTRAINT pk_unit_id PRIMARY KEY (unit_id),
CONSTRAINT fk_lot_id FOREIGN KEY (fk_lot_id)
REFERENCES lot (lot_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,

CHECK ( unit_id / 1000000 = 1 )
) INHERITS (measurement);

...

Here are my table defination :
==============================
IF NOT EXISTS(SELECT * FROM information_schema.tables WHERE table_name = 'unit') THEN
CREATE TABLE unit
(
unit_id bigserial NOT NULL,
fk_lot_id bigint NOT NULL,
"cycle" bigint NOT NULL,
CONSTRAINT pk_unit_id PRIMARY KEY (unit_id),
CONSTRAINT fk_lot_id FOREIGN KEY (fk_lot_id)
REFERENCES lot (lot_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE
);

CREATE INDEX idx_fk_lot_id
ON unit
USING btree
(fk_lot_id);
END IF;

IF NOT EXISTS(SELECT * FROM information_schema.tables WHERE table_name = 'measurement') THEN
CREATE TABLE measurement
(
measurement_id bigserial NOT NULL,
fk_unit_id bigint NOT NULL,
"value" double precision,
CONSTRAINT pk_measurement_id PRIMARY KEY (measurement_id),
CONSTRAINT fk_unit_id FOREIGN KEY (fk_unit_id)
REFERENCES unit (unit_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE
);

CREATE INDEX idx_fk_unit_id
ON measurement
USING btree
(fk_unit_id);

CREATE INDEX idx_measurement_value
ON measurement
USING btree
(value) WHERE value IS NULL;
END IF;

Thanks and Regards
Yan Cheng CHEOK

--- On Thu, 1/21/10, Yan Cheng Cheok <yccheok@yahoo.com> wrote:
Show quoted text

From: Yan Cheng Cheok <yccheok@yahoo.com>
Subject: Re: [GENERAL] Partitioning on Table with Foreign Key
To: "Vick Khera" <vivek@khera.org>
Cc: pgsql-general@postgresql.org
Date: Thursday, January 21, 2010, 11:31 PM
Make sense to me. Thanks for the
advice. I will try that out.

Thanks and Regards
Yan Cheng CHEOK

--- On Thu, 1/21/10, Vick Khera <vivek@khera.org>
wrote:

From: Vick Khera <vivek@khera.org>
Subject: Re: [GENERAL] Partitioning on Table with

Foreign Key

To: "Yan Cheng Cheok" <yccheok@yahoo.com>
Cc: pgsql-general@postgresql.org
Date: Thursday, January 21, 2010, 11:30 PM
On Thu, Jan 21, 2010 at 3:46 AM, Yan
Cheng Cheok <yccheok@yahoo.com>
wrote:

table measurement will have a *lot* of row

(millions).

I want to speed up write and read access. Hence, I

use

partition technique.

CREATE TABLE measurement_y2006m02 (
   CHECK ( date >= DATE '2006-02-01' AND

date

< DATE '2006-03-01' )

) INHERITS (measurement);

Opps! But measurement do not have date column.

How I

can refer to measurement's lot's date?

Split on your lot ID number instead of date, using a

modulo

operation.
  I've done this by splitting a table 100 ways

using the

constraing (id
% 100 == N) for N = 0 .. 99.

We don't know what your data retention or

distribution

needs are, so
it is hard to offer solid advice.