Which replication is the best for our case ?

Started by ben.playalmost 11 years ago16 messagesgeneral
Jump to latest
#1ben.play
benjamin.cohen@playrion.com

Hi guys,

We have a PG database with more than 400 GB of data.
At this moment, a cron runs each ten minutes and updates about 10 000 lines
with complex algorithms in PHP.

Each time the cron runs, the website is almost down because some queries
have to make an update on the FULL table...

Therefore, I'm asking if it's possible to duplicate my main database on a
slave server in order to run these cron on this second server... then,
replicate these changes on the main database (master).

Which replication is the best in this case ?

http://www.postgresql.org/docs/9.3/static/warm-standby.html ?

Do you have any links or tutorial which explain this kind of operation ?

Thanks a lot !

--
View this message in context: http://postgresql.nabble.com/Which-replication-is-the-best-for-our-case-tp5855685.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: ben.play (#1)
Re: Which replication is the best for our case ?

On 06/29/2015 06:02 AM, ben.play wrote:

Hi guys,

We have a PG database with more than 400 GB of data.
At this moment, a cron runs each ten minutes and updates about 10 000 lines
with complex algorithms in PHP.

Each time the cron runs, the website is almost down because some queries
have to make an update on the FULL table...

Therefore, I'm asking if it's possible to duplicate my main database on a
slave server in order to run these cron on this second server... then,
replicate these changes on the main database (master).

So you want the replication to go from the standby back to the master?
If that is the case, it is not possible.

Which replication is the best in this case ?

http://www.postgresql.org/docs/9.3/static/warm-standby.html ?

Do you have any links or tutorial which explain this kind of operation ?

Thanks a lot !

--
View this message in context: http://postgresql.nabble.com/Which-replication-is-the-best-for-our-case-tp5855685.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: ben.play (#1)
Re: Which replication is the best for our case ?

On 06/29/2015 06:02 AM, ben.play wrote:

Hi guys,

We have a PG database with more than 400 GB of data.
At this moment, a cron runs each ten minutes and updates about 10 000 lines
with complex algorithms in PHP.

Each time the cron runs, the website is almost down because some queries
have to make an update on the FULL table...

Therefore, I'm asking if it's possible to duplicate my main database on a
slave server in order to run these cron on this second server... then,
replicate these changes on the main database (master).

Which replication is the best in this case ?

http://www.postgresql.org/docs/9.3/static/warm-standby.html ?

Do you have any links or tutorial which explain this kind of operation ?

I should have been clearer in my original post. What you want is not
possible using the procedures found at the link you posted. What you are
looking for is Master to Master replication. This is not something I
have done, so I am not the person to offer detailed advice on that
particular style of replication. I do know that this list sees a lot of
activity with regard to
BDR(http://bdr-project.org/docs/stable/index.html) which provides Master
to Master replication.

Thanks a lot !

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Arthur Silva
arthurprs@gmail.com
In reply to: ben.play (#1)
Re: Which replication is the best for our case ?

On Mon, Jun 29, 2015 at 10:02 AM, ben.play <benjamin.cohen@playrion.com>
wrote:

Hi guys,

We have a PG database with more than 400 GB of data.
At this moment, a cron runs each ten minutes and updates about 10 000 lines
with complex algorithms in PHP.

Each time the cron runs, the website is almost down because some queries
have to make an update on the FULL table...

Therefore, I'm asking if it's possible to duplicate my main database on a
slave server in order to run these cron on this second server... then,
replicate these changes on the main database (master).

Which replication is the best in this case ?

http://www.postgresql.org/docs/9.3/static/warm-standby.html ?

Do you have any links or tutorial which explain this kind of operation ?

Thanks a lot !

--
View this message in context:
http://postgresql.nabble.com/Which-replication-is-the-best-for-our-case-tp5855685.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Hello Adrian, can you give us one example of such FULL table update queries?

By website down you mean slowed to a halt or read-only mode (due to the
update locks)?

Either way it doesn't look like replication is going to solve your problem.

#5Noname
Holger.Friedrich-Fa-Trivadis@it.nrw.de
In reply to: Arthur Silva (#4)
Re: Which replication is the best for our case ?

Arthur Silva wrote on Monday, June 29, 2015 5:23 PM:

Therefore, I'm asking if it's possible to duplicate my main database on a
slave server in order to run these cron on this second server... then,
replicate these changes on the main database (master).

http://www.postgresql.org/docs/9.3/static/warm-standby.html ?

If you do it the other way around, a hot standby (see: http://www.postgresql.org/docs/9.3/static/hot-standby.html) might work for you.

That is, your cron scripts would UPDATE the master server, and the Web site would do read-only queries against the slave server.

That’s for the built-in replication of PostgreSQL (sorry, I have no experience with the add-on replication solutions).

#6Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Arthur Silva (#4)
Re: Which replication is the best for our case ?

On 06/29/2015 08:23 AM, Arthur Silva wrote:

On Mon, Jun 29, 2015 at 10:02 AM, ben.play <benjamin.cohen@playrion.com
<mailto:benjamin.cohen@playrion.com>> wrote:

Hi guys,

We have a PG database with more than 400 GB of data.
At this moment, a cron runs each ten minutes and updates about 10
000 lines
with complex algorithms in PHP.

Each time the cron runs, the website is almost down because some queries
have to make an update on the FULL table...

Therefore, I'm asking if it's possible to duplicate my main database
on a
slave server in order to run these cron on this second server... then,
replicate these changes on the main database (master).

Which replication is the best in this case ?

http://www.postgresql.org/docs/9.3/static/warm-standby.html ?

Do you have any links or tutorial which explain this kind of operation ?

Thanks a lot !

--
View this message in context:
http://postgresql.nabble.com/Which-replication-is-the-best-for-our-case-tp5855685.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org
<mailto:pgsql-general@postgresql.org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Hello Adrian, can you give us one example of such FULL table update queries?

Actually it is the OP(Ben) that is going to have to supply that.

By website down you mean slowed to a halt or read-only mode (due to the
update locks)?

Either way it doesn't look like replication is going to solve your problem.

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Arthur Silva
arthurprs@gmail.com
In reply to: Adrian Klaver (#6)
Re: Which replication is the best for our case ?

On Mon, Jun 29, 2015 at 1:44 PM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 06/29/2015 08:23 AM, Arthur Silva wrote:

On Mon, Jun 29, 2015 at 10:02 AM, ben.play <benjamin.cohen@playrion.com
<mailto:benjamin.cohen@playrion.com>> wrote:

Hi guys,

We have a PG database with more than 400 GB of data.
At this moment, a cron runs each ten minutes and updates about 10
000 lines
with complex algorithms in PHP.

Each time the cron runs, the website is almost down because some
queries
have to make an update on the FULL table...

Therefore, I'm asking if it's possible to duplicate my main database
on a
slave server in order to run these cron on this second server... then,
replicate these changes on the main database (master).

Which replication is the best in this case ?

http://www.postgresql.org/docs/9.3/static/warm-standby.html ?

Do you have any links or tutorial which explain this kind of
operation ?

Thanks a lot !

--
View this message in context:

http://postgresql.nabble.com/Which-replication-is-the-best-for-our-case-tp5855685.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org
<mailto:pgsql-general@postgresql.org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Hello Adrian, can you give us one example of such FULL table update
queries?

Actually it is the OP(Ben) that is going to have to supply that.

By website down you mean slowed to a halt or read-only mode (due to the
update locks)?

Either way it doesn't look like replication is going to solve your
problem.

--
Adrian Klaver
adrian.klaver@aklaver.com

Oh of course Adrian! I must have confused the names at the time. Sorry!

#8Jeff Janes
jeff.janes@gmail.com
In reply to: ben.play (#1)
Re: Which replication is the best for our case ?

On Mon, Jun 29, 2015 at 6:02 AM, ben.play <benjamin.cohen@playrion.com>
wrote:

Hi guys,

We have a PG database with more than 400 GB of data.
At this moment, a cron runs each ten minutes and updates about 10 000 lines
with complex algorithms in PHP.

Each time the cron runs, the website is almost down because some queries
have to make an update on the FULL table...

Why is it updating the full table of 400GB if it only changes 10,000 lines?

If most of the rows are being updated degenerately (they update the column
to have the same value it already has) then just add a where clause to
filter out those degenerate updates, unless the degenerate update is needed
for locking purposes, which is rare.

Therefore, I'm asking if it's possible to duplicate my main database on a

slave server in order to run these cron on this second server... then,
replicate these changes on the main database (master).

Sounds like you are trying to use a bulldozer to change a lightbulb.

Improving queries (including the effect running some queries has on the
entire system) starts with "EXPLAIN (ANALYZE, BUFFERS)", not with
multimaster replication.

Cheers,

Jeff

#9Melvin Davidson
melvin6925@gmail.com
In reply to: Jeff Janes (#8)
Re: Which replication is the best for our case ?

I think it would help immensely if you provided details such as
table_structure, indexes the actual UPDATE query and the reason all rows of
the table must be updated.

On Mon, Jun 29, 2015 at 1:15 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Mon, Jun 29, 2015 at 6:02 AM, ben.play <benjamin.cohen@playrion.com>
wrote:

Hi guys,

We have a PG database with more than 400 GB of data.
At this moment, a cron runs each ten minutes and updates about 10 000
lines
with complex algorithms in PHP.

Each time the cron runs, the website is almost down because some queries
have to make an update on the FULL table...

Why is it updating the full table of 400GB if it only changes 10,000 lines?

If most of the rows are being updated degenerately (they update the column
to have the same value it already has) then just add a where clause to
filter out those degenerate updates, unless the degenerate update is needed
for locking purposes, which is rare.

Therefore, I'm asking if it's possible to duplicate my main database on a

slave server in order to run these cron on this second server... then,
replicate these changes on the main database (master).

Sounds like you are trying to use a bulldozer to change a lightbulb.

Improving queries (including the effect running some queries has on the
entire system) starts with "EXPLAIN (ANALYZE, BUFFERS)", not with
multimaster replication.

Cheers,

Jeff

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#10John R Pierce
pierce@hogranch.com
In reply to: Melvin Davidson (#9)
Re: Which replication is the best for our case ?

On 6/29/2015 10:41 AM, Melvin Davidson wrote:

I think it would help immensely if you provided details such as
table_structure, indexes the actual UPDATE query and the reason all
rows of the table must be updated.

indeed, the whole model of massaging the entire database every 10
minutes is highly suspect.

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11ben.play
benjamin.cohen@playrion.com
In reply to: Jeff Janes (#8)
Re: Which replication is the best for our case ?

Hi guys,

Thank you a lot for your answers.

In fact, I tried to write the easiest explanation of my problem in order to
be understood...
My project is developed with Symfony and Doctrine (BERK, i know ...).

The project has more than 2 years and Doctrine makes some bad decisions and
lock all the table for a while.
We are developing the project without Doctrine but it will not be available
within 1 year...

To be more precise : We have a database with more than 400 Gb and ONE table
with more than 100 Gb of data. This is huge for doctrine. When the cron
runs, it writes a lot on the disks in temporary file (although we have 128
GB of Ram...). Of course, each table is well indexes...

That is why I'm thinking about replication : My server A (master) is for my
users... and my server B is a server reserved for calculations (and this
server B which writes on the base)

This is a image of my dream system :
<http://postgresql.nabble.com/file/n5855916/9e41ce1f-38ea-4fba-a437-a43c598e655c.jpg&gt;
(If you can't see the image :
http://tof.canardpc.com/view/9e41ce1f-38ea-4fba-a437-a43c598e655c.jpg)

Thank you a lot for your help !

--
View this message in context: http://postgresql.nabble.com/Which-replication-is-the-best-for-our-case-tp5855685p5855916.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#12Arthur Silva
arthurprs@gmail.com
In reply to: ben.play (#11)
Re: Which replication is the best for our case ?

On Tue, Jun 30, 2015 at 1:57 PM, ben.play <benjamin.cohen@playrion.com>
wrote:

Hi guys,

Thank you a lot for your answers.

In fact, I tried to write the easiest explanation of my problem in order to
be understood...
My project is developed with Symfony and Doctrine (BERK, i know ...).

The project has more than 2 years and Doctrine makes some bad decisions and
lock all the table for a while.
We are developing the project without Doctrine but it will not be available
within 1 year...

To be more precise : We have a database with more than 400 Gb and ONE table
with more than 100 Gb of data. This is huge for doctrine. When the cron
runs, it writes a lot on the disks in temporary file (although we have 128
GB of Ram...). Of course, each table is well indexes...

That is why I'm thinking about replication : My server A (master) is for my
users... and my server B is a server reserved for calculations (and this
server B which writes on the base)

This is a image of my dream system :
<
http://postgresql.nabble.com/file/n5855916/9e41ce1f-38ea-4fba-a437-a43c598e655c.jpg

(If you can't see the image :
http://tof.canardpc.com/view/9e41ce1f-38ea-4fba-a437-a43c598e655c.jpg)

Thank you a lot for your help !

--
View this message in context:
http://postgresql.nabble.com/Which-replication-is-the-best-for-our-case-tp5855685p5855916.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Hello,

Streaming replication will do just fine from ServerA to ServerB, but as for
the rest of the data flow I'm afraid we will need more details.

#13John R Pierce
pierce@hogranch.com
In reply to: ben.play (#11)
Re: Which replication is the best for our case ?

On 6/30/2015 9:57 AM, ben.play wrote:

To be more precise : We have a database with more than 400 Gb and ONE table
with more than 100 Gb of data. This is huge for doctrine. When the cron
runs, it writes a lot on the disks in temporary file (although we have 128
GB of Ram...). Of course, each table is well indexes...

That is why I'm thinking about replication : My server A (master) is for my
users... and my server B is a server reserved for calculations (and this
server B which writes on the base)

This is a image of my dream system :
<http://postgresql.nabble.com/file/n5855916/9e41ce1f-38ea-4fba-a437-a43c598e655c.jpg&gt;
(If you can't see the image :
http://tof.canardpc.com/view/9e41ce1f-38ea-4fba-a437-a43c598e655c.jpg)

what happens when master A continues to update/insert into these tables
that your cron job is batch updating on the offline copy ? How would you
merge those changes in ?

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#14ben.play
benjamin.cohen@playrion.com
In reply to: John R Pierce (#13)
Re: Which replication is the best for our case ?

In fact, the cron job will :
-> select about 10 000 lines from a big table (>100 Gb of data). 1 user has
about 10 lines.
-> each line will be examinate by an algorithm
-> at the end of each line, the cron job updates a few parameters for the
user (add some points for example)
-> Then, it inserts a line in another table to indicate to the user each
transaction.

All updates and inserts can be inserted ONLY by the cron job ...
Therefore ... the merge can be done easily : no one can be update these new
datas.

But ... how big company like Facebook or Youtube can calculate on (a)
dedicated server(s) without impacting users ?

--
View this message in context: http://postgresql.nabble.com/Which-replication-is-the-best-for-our-case-tp5855685p5856062.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#15John R Pierce
pierce@hogranch.com
In reply to: ben.play (#14)
Re: Which replication is the best for our case ?

On 7/1/2015 3:08 AM, ben.play wrote:

In fact, the cron job will :
-> select about 10 000 lines from a big table (>100 Gb of data). 1 user has
about 10 lines.
-> each line will be examinate by an algorithm
-> at the end of each line, the cron job updates a few parameters for the
user (add some points for example)
-> Then, it inserts a line in another table to indicate to the user each
transaction.

All updates and inserts can be inserted ONLY by the cron job ...
Therefore ... the merge can be done easily : no one can be update these new
datas.

But ... how big company like Facebook or Youtube can calculate on (a)
dedicated server(s) without impacting users ?

that sort of batch processing is not normally done in database-centric
systems, rather, databases are usually updated continuously in realtime
as the events come in via transactions.

your cron task is undoubtably single threaded which means it runs on one
core only, so the whole system ends up waiting on a single task
crunching massive amounts of data, while your other processor cores have
nothing to do.

it sounds to me like whomever designed this system didn't have a solid
grip on transactional database processing.

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#16Arthur Silva
arthurprs@gmail.com
In reply to: ben.play (#14)
Re: Which replication is the best for our case ?

On Wed, Jul 1, 2015 at 7:08 AM, ben.play <benjamin.cohen@playrion.com>
wrote:

In fact, the cron job will :
-> select about 10 000 lines from a big table (>100 Gb of data). 1 user has
about 10 lines.
-> each line will be examinate by an algorithm
-> at the end of each line, the cron job updates a few parameters for the
user (add some points for example)
-> Then, it inserts a line in another table to indicate to the user each
transaction.

All updates and inserts can be inserted ONLY by the cron job ...
Therefore ... the merge can be done easily : no one can be update these new
datas.

But ... how big company like Facebook or Youtube can calculate on (a)
dedicated server(s) without impacting users ?

--
View this message in context:
http://postgresql.nabble.com/Which-replication-is-the-best-for-our-case-tp5855685p5856062.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

I'm assuming this query is really HUGE,
otherwise I can't see why it'd bring your database to halt, specially with
that amount of main memory.

That aside, I don't see why you can't send inserts in small batches back to
the master DB.

Regards.