|| PostgreSQL

Started by Duane Currie over 26 years ago, 3 messages
#1 Duane Currie
dcurrie@sandman.acadiau.ca

Hi Everybody.

We've gotten a few requests for a distributed version of PostgreSQL.
There are two models that I think could work for this. I'm looking
for two things:
1. Any opinions on which option to take.
2. Anyone willing to work on the project.

The features people are looking for are fault tolerance and load distribution.

The two models:
1. A separate server process which manages the parallelism.
Client connects to that server, which handles the request
using (nearly) unmodified backends on different machines.
Would make use of the fact that the vast majority of
commands are read-only.

Advantages:
Platform independent. (e.g. SPARCs and x86 in
same cluster)
Less inherently complex (simpler, separate component)
Fewer changes to current code

Potential Snag:
May need a different communication interface built.
(maybe not...)
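
A minimal sketch of how the front-end process in model 1 might route
statements, in Python. Everything here is an assumption made for
illustration: the Backend class is a hypothetical stand-in for a connection
to one unmodified backend, the Dispatcher name is invented, and the
read-only test (statement starts with SELECT) is deliberately crude.
Reads go to one backend in rotation, skipping any that are down; everything
else is sent to every live backend.

```python
import itertools

# Assumption: a statement is treated as read-only if it starts with SELECT.
# A real front end would need a proper parser (SELECT can have side effects).
READ_ONLY_PREFIXES = ("select",)


class Backend:
    """Hypothetical stand-in for a connection to one unmodified backend."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.log = []  # statements this backend has executed

    def execute(self, sql):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.log.append(sql)
        return f"{self.name}: ok"


class Dispatcher:
    """Hypothetical front-end server: clients talk to this, not to backends."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = itertools.cycle(range(len(self.backends)))

    def is_read_only(self, sql):
        return sql.lstrip().lower().startswith(READ_ONLY_PREFIXES)

    def execute(self, sql):
        if self.is_read_only(sql):
            # Load distribution: try each backend at most once, in rotation.
            for _ in range(len(self.backends)):
                backend = self.backends[next(self._next)]
                try:
                    return [backend.execute(sql)]
                except ConnectionError:
                    continue  # fault tolerance: fall through to the next one
            raise ConnectionError("no backend available")
        # Writes must reach every copy. This sketch skips dead backends and
        # does nothing to resynchronize them when they come back.
        live = [b for b in self.backends if b.alive]
        if not live:
            raise ConnectionError("no backend available")
        return [b.execute(sql) for b in live]
```

For example, with two backends a and b, Dispatcher([a, b]) sends an INSERT
to both, and sends successive SELECTs to a, then b. Bringing a failed
backend back up to date is the hard part and is left out here.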

2. Modifying the current server to handle the parallelism.
i.e. Start the server with another option and config file
(or something equivalent). All modifications built into
the current server.

Advantages:
Could more easily be made to run single queries in
parallel
Could be made more efficient by directly making
storage calls, instead of using an SQL
interface.

Potential Snag:
Adds a lot of complexity to the current backend.

My personal preference is toward option 1.
It sounds a lot easier to implement, and works through a well-defined
interface (SQL).
Platform independent, just in case someone decides to throw a
SPARC box into their room of Alphas.
Minimal changes to the current backend which keeps the backend simpler
for people to work on. (My next email's gonna go down that
line somewhere)

Any preferences or options I haven't thought of? (or any details which would
complicate the project?) As well, would anybody be interested in working on
this?

Duane

#2 Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Duane Currie (#1)
Re: [HACKERS] || PostgreSQL

Ingres happened to implement their distributed system using option
(1), having a distributed front-end which knew about remote servers
and could parse and optimize queries then send individual queries to
the actual servers.

Seemed to work pretty well, and you could reuse a large amount of
code. otoh, if you implemented option (2) (local and remote tables),
then you could choose to construct your database on the option (1)
model without penalty.

- Thomas

--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California

#3 Ross J. Reedstrom
reedstrm@wallace.ece.rice.edu
In reply to: Duane Currie (#1)
Re: [HACKERS] || PostgreSQL

Hi Duane -
I swear, the synchronicity is starting to get _real_ thick around here.
Take a look at the discussion between myself, Tom Lane, and Bruce
Momjian yesterday, under the helpful subject line of Mariposa. We discuss
implementing model 2, which, as Thomas points out in his reply to you,
can be a superset of model 1. I need to be able to set up a distributed,
heterogeneous database system. We've priced commercial offerings in this
field, and ones with sufficient flexibility to do what we need start at
$40,000 for a _two_ backend license.

Anyway, let's combine forces! I'm going to take a stab at this anyway, no
sense in duplicating effort. I could do a local CVS tree, to keep us synced.

Ross

P.S. Mariposa was a project out of Stonebraker's lab at Berkeley, to build a
distributed db. Check out http://mariposa.cs.berkeley.edu

On Tue, Aug 03, 1999 at 10:33:05AM +0000, Duane Currie wrote:

<SNIP>

Any preferences or options I haven't thought of? (or any details which would
complicate the project?) As well, would anybody be interested in working on
this?

--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005