Distributed Table Partitioning

Started by Leonardo M. Raméabout 10 years ago7 messagesgeneral
Jump to latest
#1Leonardo M. Ramé
l.rame@griensu.com

I have this problem: a Master table containing records with a timestamp
column registering creation date-time, and one Detail table containing
info related to the Master table.

As time went by, those tables grew enormously, and I can't afford
expanding my SSD VPS. So I'm thinking about storing only NEW data into
it, and move OLD data to a cheaper SATA VPS.

The goal is using the SSD server as "main", and the other (or others?)
as "child", so queries still go to the main server, it somehow detects
which records must be fetched from it and what from the child servers,
then return the "composed" dataset to the caller.

I think this is called Distributed Horizontal Table Partitioning.

Is there a way to do this without changing my application code?.

Regards,
--
Leonardo M. Ram�
Medical IT - Griensu S.A.
Av. Col�n 636 - Piso 8 Of. A
X5000EPT -- C�rdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Alvaro Aguayo Garcia-Rada
aaguayo@opensysperu.com
In reply to: Leonardo M. Ramé (#1)
Re: Distributed Table Partitioning

Hi. I think pgpool-II can do that job for you. It's a middleware, so you can use it without even changing your app code(but your postgres configuration). It suppoerts many clustering functions, including replication, failover, and a lot more; it also supports partitioning. so that may be suitable for you. Check the tutorial, it even has some examples: http://www.pgpool.net/docs/latest/tutorial-en.html

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103 | RPC: (+51) 954183248
Website: www.ocs.pe

----- Original Message -----
From: "Leonardo M. Ramé" <l.rame@griensu.com>
To: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Saturday, 12 March, 2016 8:25:01 PM
Subject: [GENERAL] Distributed Table Partitioning

I have this problem: a Master table containing records with a timestamp
column registering creation date-time, and one Detail table containing
info related to the Master table.

As time went by, those tables grew enormously, and I can't afford
expanding my SSD VPS. So I'm thinking about storing only NEW data into
it, and move OLD data to a cheaper SATA VPS.

The goal is using the SSD server as "main", and the other (or others?)
as "child", so queries still go to the main server, it somehow detects
which records must be fetched from it and what from the child servers,
then return the "composed" dataset to the caller.

I think this is called Distributed Horizontal Table Partitioning.

Is there a way to do this without changing my application code?.

Regards,
--
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Melvin Davidson
melvin6925@gmail.com
In reply to: Alvaro Aguayo Garcia-Rada (#2)
Re: Distributed Table Partitioning

On Sat, Mar 12, 2016 at 8:33 PM, Alvaro Aguayo Garcia-Rada <
aaguayo@opensysperu.com> wrote:

Hi. I think pgpool-II can do that job for you. It's a middleware, so you
can use it without even changing your app code(but your postgres
configuration). It suppoerts many clustering functions, including
replication, failover, and a lot more; it also supports partitioning. so
that may be suitable for you. Check the tutorial, it even has some
examples: http://www.pgpool.net/docs/latest/tutorial-en.html

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103 | RPC: (+51)
954183248
Website: www.ocs.pe

----- Original Message -----
From: "Leonardo M. Ramé" <l.rame@griensu.com>
To: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Saturday, 12 March, 2016 8:25:01 PM
Subject: [GENERAL] Distributed Table Partitioning

I have this problem: a Master table containing records with a timestamp
column registering creation date-time, and one Detail table containing
info related to the Master table.

As time went by, those tables grew enormously, and I can't afford
expanding my SSD VPS. So I'm thinking about storing only NEW data into
it, and move OLD data to a cheaper SATA VPS.

The goal is using the SSD server as "main", and the other (or others?)
as "child", so queries still go to the main server, it somehow detects
which records must be fetched from it and what from the child servers,
then return the "composed" dataset to the caller.

I think this is called Distributed Horizontal Table Partitioning.

Is there a way to do this without changing my application code?.

Regards,
--
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Why don't you just make use of tablespaces and partition the child
tablespaces
so that the newer parttion is on the SSD and the older one is on SATA?
You will need a trigger and tg function to handle inserts

eg: {note: below is psuedo code}
child {master} (SSD) NO ROWS33
child1 (SSD) CONSTRAINT timestamp > {specified time}
tg_insert_child1 on insert execute tgf_split_data
child2 (SATA) CONSTRAINT timestamp <= {specified time}
tg_insert_child2 on insert execute tgf_split_data

tgf_split_data()
if timestamp > {specified time}
insert into child1
else
insert into child2
endif
--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#4Melvin Davidson
melvin6925@gmail.com
In reply to: Melvin Davidson (#3)
Re: Distributed Table Partitioning

oops! Better example
eg: {note: below is psuedo code}
child {master} (SSD) NO ROWS33
tg_insert_child before insert execute tgf_split_data
child1 (SSD) CONSTRAINT timestamp > {specified time}
child2 (SATA) CONSTRAINT timestamp <= {specified time}

tgf_split_data()
if timestamp > {specified time}
insert into child1
else
insert into child2
endif

On Sat, Mar 12, 2016 at 9:19 PM, Melvin Davidson <melvin6925@gmail.com>
wrote:

On Sat, Mar 12, 2016 at 8:33 PM, Alvaro Aguayo Garcia-Rada <
aaguayo@opensysperu.com> wrote:

Hi. I think pgpool-II can do that job for you. It's a middleware, so you
can use it without even changing your app code(but your postgres
configuration). It suppoerts many clustering functions, including
replication, failover, and a lot more; it also supports partitioning. so
that may be suitable for you. Check the tutorial, it even has some
examples: http://www.pgpool.net/docs/latest/tutorial-en.html

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103 | RPC: (+51)
954183248
Website: www.ocs.pe

----- Original Message -----
From: "Leonardo M. Ramé" <l.rame@griensu.com>
To: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Saturday, 12 March, 2016 8:25:01 PM
Subject: [GENERAL] Distributed Table Partitioning

I have this problem: a Master table containing records with a timestamp
column registering creation date-time, and one Detail table containing
info related to the Master table.

As time went by, those tables grew enormously, and I can't afford
expanding my SSD VPS. So I'm thinking about storing only NEW data into
it, and move OLD data to a cheaper SATA VPS.

The goal is using the SSD server as "main", and the other (or others?)
as "child", so queries still go to the main server, it somehow detects
which records must be fetched from it and what from the child servers,
then return the "composed" dataset to the caller.

I think this is called Distributed Horizontal Table Partitioning.

Is there a way to do this without changing my application code?.

Regards,
--
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Why don't you just make use of tablespaces and partition the child
tablespaces
so that the newer parttion is on the SSD and the older one is on SATA?
You will need a trigger and tg function to handle inserts

eg: {note: below is psuedo code}
child {master} (SSD) NO ROWS33
child1 (SSD) CONSTRAINT timestamp > {specified time}
tg_insert_child1 on insert execute tgf_split_data
child2 (SATA) CONSTRAINT timestamp <= {specified time}
tg_insert_child2 on insert execute tgf_split_data

tgf_split_data()
if timestamp > {specified time}
insert into child1
else
insert into child2
endif
--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#5Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Melvin Davidson (#3)
Re: Distributed Table Partitioning

On 2016-03-12 21:19:11 -0500, Melvin Davidson wrote:

----- Original Message -----
From: "Leonardo M. Ramé" <l.rame@griensu.com>
To: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Saturday, 12 March, 2016 8:25:01 PM
Subject: [GENERAL] Distributed Table Partitioning

I have this problem: a Master table containing records with a timestamp
column registering creation date-time, and one Detail table containing
info related to the Master table.

As time went by, those tables grew enormously, and I can't afford
expanding my SSD VPS. So I'm thinking about storing only NEW data into
it, and move OLD data to a cheaper SATA VPS.

[...]

Why don't you just make use of tablespaces and partition the child tablespaces
so that the newer parttion is on the SSD and the older one is on SATA?

Since he mentioned virtual private servers (VPS) the reason might be
that his hoster offers VPS with SSDs (of various sizes) and VPS with
rotating hard disks (of various sizes), but not VPS with both. So he
can't rent a VPS with a relatively small SSD and a larger hard disk.

That might be a reason to look for an alternate hoster, but if he's
otherwise happy, switching to an unknown provider might be considered
too large a risk.

hp

--
_ | Peter J. Holzer | I want to forget all about both belts and
|_|_) | | suspenders; instead, I want to buy pants
| | | hjp@hjp.at | that actually fit.
__/ | http://www.hjp.at/ | -- http://noncombatant.org/

#6Leonardo M. Ramé
l.rame@griensu.com
In reply to: Peter J. Holzer (#5)
Re: Distributed Table Partitioning

El 13/03/16 a las 10:04, Peter J. Holzer escribi�:

On 2016-03-12 21:19:11 -0500, Melvin Davidson wrote:

----- Original Message -----
From: "Leonardo M. Ram�" <l.rame@griensu.com>
To: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Saturday, 12 March, 2016 8:25:01 PM
Subject: [GENERAL] Distributed Table Partitioning

I have this problem: a Master table containing records with a timestamp
column registering creation date-time, and one Detail table containing
info related to the Master table.

As time went by, those tables grew enormously, and I can't afford
expanding my SSD VPS. So I'm thinking about storing only NEW data into
it, and move OLD data to a cheaper SATA VPS.

[...]

Why don't you just make use of tablespaces and partition the child tablespaces
so that the newer parttion is on the SSD and the older one is on SATA?

Since he mentioned virtual private servers (VPS) the reason might be
that his hoster offers VPS with SSDs (of various sizes) and VPS with
rotating hard disks (of various sizes), but not VPS with both. So he
can't rent a VPS with a relatively small SSD and a larger hard disk.

That might be a reason to look for an alternate hoster, but if he's
otherwise happy, switching to an unknown provider might be considered
too large a risk.

hp

Yes, Peter is right, I must store one table in one VPS and the other in
a 2nd VPS.

I'm thinking of partitioning the table, on local and the remote using a
Foreign Data Wrapper, what do you think about this approach?.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Melvin Davidson
melvin6925@gmail.com
In reply to: Leonardo M. Ramé (#6)
Re: Distributed Table Partitioning

I guess that would work if you used a view to combine both tables, but you
would still need a BEFORE trigger to make sure DML goes to the appropriate
table.
Another solution might be to use dblink if you need to access the SATA
table on the 2nd VPS,

On Sun, Mar 13, 2016 at 11:37 AM, Leonardo M. Ramé <l.rame@griensu.com>
wrote:

El 13/03/16 a las 10:04, Peter J. Holzer escribió:

On 2016-03-12 21:19:11 -0500, Melvin Davidson wrote:

----- Original Message -----
From: "Leonardo M. Ramé" <l.rame@griensu.com>
To: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Saturday, 12 March, 2016 8:25:01 PM
Subject: [GENERAL] Distributed Table Partitioning

I have this problem: a Master table containing records with a
timestamp
column registering creation date-time, and one Detail table
containing
info related to the Master table.

As time went by, those tables grew enormously, and I can't afford
expanding my SSD VPS. So I'm thinking about storing only NEW data
into
it, and move OLD data to a cheaper SATA VPS.

[...]

Why don't you just make use of tablespaces and partition the child
tablespaces
so that the newer parttion is on the SSD and the older one is on SATA?

Since he mentioned virtual private servers (VPS) the reason might be
that his hoster offers VPS with SSDs (of various sizes) and VPS with
rotating hard disks (of various sizes), but not VPS with both. So he
can't rent a VPS with a relatively small SSD and a larger hard disk.

That might be a reason to look for an alternate hoster, but if he's
otherwise happy, switching to an unknown provider might be considered
too large a risk.

hp

Yes, Peter is right, I must store one table in one VPS and the other in a
2nd VPS.

I'm thinking of partitioning the table, on local and the remote using a
Foreign Data Wrapper, what do you think about this approach?.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.