Replication & web apps

Started by Leonardo Francalanciabout 20 years ago9 messagesgeneral
Jump to latest
#1Leonardo Francalanci
Leonardo.Francalanci@CommProve.com

Hi,

I still don't understand how replication can be used in web applications.
Given this scenario:

1) user updates his profile -> update to the db (master)
2) web app redirects to the "profile page" -> select from db (slave)

Since (2) is a select it is issued to the slave.

How can one be sure that the master propagates the update (1) to the
slave before data is requested from the slave (2)?
And: suppose there is a method to understand that the user made a change
to the db in the web request (as above) so that we have to issue all
queries of the same web request to the master, that is:

1) user updates his profile -> update to the db (master)
2) web app redirects to the "profile page" -> select from db (master
again because in this web-request user made a change to the db)

what if the user ask AGAIN for the "profile page" BEFORE write
propagates to the slave:

3) User ask for a refresh of the "profile page" -> select -> slave
(because user didn't make any change during THIS web request)

???

In other words: how can asynchronous replication be used in an
application???

#2Jeff Amiel
jamiel@istreamimaging.com
In reply to: Leonardo Francalanci (#1)
Re: Replication & web apps

Well, you've hit upon the difference between synchronous replication and
asynchronous replication (and touched on the idea of clustering/load
balancing)
In a synchronous replication scheme, the database update is made to BOTH
master and slave simultaneously....the transactions must commit to both
or it is rolled back.
In asynchronous replication, there can indeed be a delay between when
the master makes update and the slave receives the notification of it.

pg_cluster is an example of a synchronous replication method (although
it's really considered multi-master...not master-slave)
Slony is an example of asynchronous.

There are other techniques to balance the load of the database calls so
that some go to one box and some to others, yet keep the data in synch...
Continuent makes a commercial p/cluster product as well as an open
source product called Sequoia that sit in the JDBC layer and direct
traffic and control the load balancing.

Jeff Amiel

Leonardo Francalanci wrote:

Show quoted text

Hi,

I still don't understand how replication can be used in web applications.
Given this scenario:

1) user updates his profile -> update to the db (master)
2) web app redirects to the "profile page" -> select from db (slave)

Since (2) is a select it is issued to the slave.

How can one be sure that the master propagates the update (1) to the
slave before data is requested from the slave (2)?
And: suppose there is a method to understand that the user made a
change to the db in the web request (as above) so that we have to
issue all queries of the same web request to the master, that is:

1) user updates his profile -> update to the db (master)
2) web app redirects to the "profile page" -> select from db (master
again because in this web-request user made a change to the db)

what if the user ask AGAIN for the "profile page" BEFORE write
propagates to the slave:

3) User ask for a refresh of the "profile page" -> select -> slave
(because user didn't make any change during THIS web request)

???

In other words: how can asynchronous replication be used in an
application???

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

#3Jeff Amiel
jamiel@istreamimaging.com
In reply to: Jeff Amiel (#2)
Re: Replication & web apps

I just reread your post and see that you already understand that what
you are describing is async replication. Duh.
That being said, what you are trying to do is really combine the idea of
load balancing and async replication....not too feasible I would think.
Usually when people are using async replication, the slave nodes are
only used as backups or for read-only reporting systems)

Jeff Amiel wrote:

Show quoted text

Well, you've hit upon the difference between synchronous replication
and asynchronous replication (and touched on the idea of
clustering/load balancing)
In a synchronous replication scheme, the database update is made to
BOTH master and slave simultaneously....the transactions must commit
to both or it is rolled back.
In asynchronous replication, there can indeed be a delay between when
the master makes update and the slave receives the notification of it.

pg_cluster is an example of a synchronous replication method (although
it's really considered multi-master...not master-slave)
Slony is an example of asynchronous.

There are other techniques to balance the load of the database calls
so that some go to one box and some to others, yet keep the data in
synch...
Continuent makes a commercial p/cluster product as well as an open
source product called Sequoia that sit in the JDBC layer and direct
traffic and control the load balancing.

Jeff Amiel

Leonardo Francalanci wrote:

Hi,

I still don't understand how replication can be used in web
applications.
Given this scenario:

1) user updates his profile -> update to the db (master)
2) web app redirects to the "profile page" -> select from db (slave)

Since (2) is a select it is issued to the slave.

How can one be sure that the master propagates the update (1) to the
slave before data is requested from the slave (2)?
And: suppose there is a method to understand that the user made a
change to the db in the web request (as above) so that we have to
issue all queries of the same web request to the master, that is:

1) user updates his profile -> update to the db (master)
2) web app redirects to the "profile page" -> select from db (master
again because in this web-request user made a change to the db)

what if the user ask AGAIN for the "profile page" BEFORE write
propagates to the slave:

3) User ask for a refresh of the "profile page" -> select -> slave
(because user didn't make any change during THIS web request)

???

In other words: how can asynchronous replication be used in an
application???

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

#4Leonardo Francalanci
Leonardo.Francalanci@CommProve.com
In reply to: Jeff Amiel (#2)
Re: Replication & web apps

pg_cluster is an example of a synchronous replication method (although
it's really considered multi-master...not master-slave)

It looks like pgcluster is what I would need, I just don't understand
how it works... aren't there some "good" docs about it?
Which are its limits?

And: is there a way to search the mailing list? It looks like
Pgcluster-general and Pgpool-general aren't searchable...

Thank you in advance

#5Philip Hallstrom
postgresql@philip.pjkh.com
In reply to: Jeff Amiel (#2)
Re: Replication & web apps

There are other techniques to balance the load of the database calls so that
some go to one box and some to others, yet keep the data in synch...
Continuent makes a commercial p/cluster product as well as an open source
product called Sequoia that sit in the JDBC layer and direct traffic and
control the load balancing.

We use continuent at work (albeit for mysql...) on a three node cluster.

In their default setup they use what's called "strict" querying. By which
they mean any read to any node will first check to make sure there aren't
any pending writes before continuing.

The other mode is "weak". Where reads don't bother checking to see if
there are any pending writes before doing their read. In this mode, we
run the risk that a read might happen before a previous write it's
depending on. In our tests though the time for replication was really
small (don't remember offhand) but certainly much much smaller than a
round trip request for an end user would be.

That all said, we had to go with weak because strict killed performance.
Absolutely killed it.

Anyway, just something to keep in mind. Kind of like raid... you lose in
one area to gain in another....

-philip

Show quoted text

Jeff Amiel

Leonardo Francalanci wrote:

Hi,

I still don't understand how replication can be used in web applications.
Given this scenario:

1) user updates his profile -> update to the db (master)
2) web app redirects to the "profile page" -> select from db (slave)

Since (2) is a select it is issued to the slave.

How can one be sure that the master propagates the update (1) to the slave
before data is requested from the slave (2)?
And: suppose there is a method to understand that the user made a change to
the db in the web request (as above) so that we have to issue all queries
of the same web request to the master, that is:

1) user updates his profile -> update to the db (master)
2) web app redirects to the "profile page" -> select from db (master again
because in this web-request user made a change to the db)

what if the user ask AGAIN for the "profile page" BEFORE write propagates
to the slave:

3) User ask for a refresh of the "profile page" -> select -> slave (because
user didn't make any change during THIS web request)

???

In other words: how can asynchronous replication be used in an
application???

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#6Brad Nicholson
bnichols@ca.afilias.info
In reply to: Jeff Amiel (#2)
Re: Replication & web apps

Jeff Amiel wrote:

There are other techniques to balance the load of the database calls so
that some go to one box and some to others, yet keep the data in synch...
Continuent makes a commercial p/cluster product as well as an open
source product called Sequoia that sit in the JDBC layer and direct
traffic and control the load balancing.

pgpool allows you to load balance queries between two servers. While it
doesn't solve the issue of trying to read a row that hasn't been
replicated, it does offer you a bit of control that may be of use to
you. Any statement that is in a transaction will automatically go to
the master DB. If some queries are sensitive to this issue, and some
aren't, then you might be able to make pgpool work for you by wrapping
the sensitive ones in a transaction.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.

#7Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Brad Nicholson (#6)
Re: Replication & web apps

On Thu, 2006-03-16 at 13:18, Brad Nicholson wrote:

Jeff Amiel wrote:

There are other techniques to balance the load of the database calls so
that some go to one box and some to others, yet keep the data in synch...
Continuent makes a commercial p/cluster product as well as an open
source product called Sequoia that sit in the JDBC layer and direct
traffic and control the load balancing.

pgpool allows you to load balance queries between two servers. While it
doesn't solve the issue of trying to read a row that hasn't been
replicated, it does offer you a bit of control that may be of use to
you. Any statement that is in a transaction will automatically go to
the master DB. If some queries are sensitive to this issue, and some
aren't, then you might be able to make pgpool work for you by wrapping
the sensitive ones in a transaction.

Note that it's also possible to use pgpool as a synchronous clustering
too with automatic failover. Since it is statement based, you have to
make sure both DBs are in exactly the same state when you start it up,
and you can't rely on things like now() being exactly the same, nor on
random() and a few other things.

however, for many applications, this is plenty good enough, and it's
dirt simple to setup and use.

#8Chris Browne
cbbrowne@acm.org
In reply to: Leonardo Francalanci (#1)
Re: Replication & web apps

In the last exciting episode, Leonardo.Francalanci@CommProve.com (Leonardo Francalanci) wrote:

In other words: how can asynchronous replication be used in an
application???

Yes, this is an issue.

Asynchronous replication is NOT suitable in cases where you point
applications that need forcibly up-to-date information to a replica
that is (even just a little) behind.

We *have to* point all of the applications that need
up-to-the-instance balance information to the "master" system.

But there are doubtless plenty of places where you could use data that
isn't *forcibly* up to date.

- For instance, running temporal reports against a replica should turn
out fine.

- We run the WHOIS service against replicas; we don't promise
instantaneous propagation, and nobody worries too much if they get
information that, in a few cases, is a few seconds out of date.

- In a web sales system, I'd be inclined to run "normal" queries
against a replica for the portion of the workflow that involves
users assembling their "basket" of products. And move to
pointing to the "master" when it comes time to "check out" the
order, so that the final order gets the most up-to-date data.

Asynchronous replication isn't good for *everything*, but you'll
generally find ways to use replicas, when you have them...
--
let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];;
http://linuxfinances.info/info/lsf.html
"The best design is not predicated on how brain-dead you can be and
still operate it." -- David C. Wright

#9Leonardo Francalanci
Leonardo.Francalanci@CommProve.com
In reply to: Philip Hallstrom (#5)
Re: Replication & web apps

We use continuent at work (albeit for mysql...) on a three node cluster.

That's a good project, the only thing I don't like is that one is forced
to use Java which is not what I'd like to do (I'd prefer Ruby).

Thank you everybody for your answers.

Leonardo