Re: MySQL and BerkleyDB (fwd)

Started by The Hermit Hacker · almost 25 years ago · 26 messages
#1 The Hermit Hacker
scrappy@hub.org

Is anyone looking at doing this? Is this purely a MySQL-ism, or is it
something that everyone else has except us?

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

---------- Forwarded message ----------
Date: Mon, 22 Jan 2001 09:03:58 -0600
From: Dave Glowacki <dglo@sweetpea.ssec.wisc.edu>
To: The Hermit Hacker <scrappy@hub.org>
Cc: Radovan Gibala <gigi@agraf.sk>, ports@freebsd.org
Subject: Re: MySQL and BerkleyDB

The Hermit Hacker wrote:

On Mon, 22 Jan 2001, Radovan Gibala wrote:

Is there any possibility to get a port for MySQL with BerkleyDB support?
I really need the transaction support and I'd like to build MySQL from a
port.

why not just build PgSQL, and have transaction support *with* subselects
and everything else that mySQL doesn't have?

I'd *love* to use PgSQL, but it doesn't support cross-DB joins (or at
least I couldn't figure out how to do it.) MySQL handles this, so
I'm using MySQL and would also like to have transaction support...

#2 Marko Kreen
marko@l-t.ee
In reply to: The Hermit Hacker (#1)
Re: Re: MySQL and BerkleyDB (fwd)

On Mon, Jan 22, 2001 at 11:30:17AM -0400, The Hermit Hacker wrote:

Is anyone looking at doing this? Is this purely a MySQL-ism, or is it
something that everyone else has except us?

AFAIK either Informix or Sybase (both?) has it too. Dunno about
anything else.

--
marko

#3 Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Marko Kreen (#2)
AW: Re: MySQL and BerkleyDB (fwd)

Is anyone looking at doing this? Is this purely a MySQL-ism, or is it
something that everyone else has except us?

We should not only support access to all dbs under one postmaster,
but also remote access to other postmasters' databases.
All biggie dbs allow this in one way or another (synonyms,
qualified object names), including 2-phase commit.
Ideally this includes access to other db manufacturers, flat files, bdb ...
Meaning that this is a problem needing a generic approach.

Andreas


#4 Peter Eisentraut
peter_e@gmx.net
In reply to: The Hermit Hacker (#1)
Re: Re: MySQL and BerkleyDB (fwd)

The Hermit Hacker writes:

Is anyone looking at doing this? Is this purely a MySQL-ism, or is it
something that everyone else has except us?

It's not required by SQL, that's for sure. I think in 7.2 we'll tackle
schema support, which will accomplish the same thing. Many people
(including myself) are of the opinion that not allowing cross-db access is
in fact a feature.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#2)
Re: Re: MySQL and BerkleyDB (fwd)

On Mon, Jan 22, 2001 at 11:30:17AM -0400, The Hermit Hacker wrote:

Is anyone looking at doing this? Is this purely a MySQL-ism, or is it
something that everyone else has except us?

Think "schema". I don't foresee supporting joins across multiple
databases any time soon (unless someone wants to resurrect the old
Mariposa code), but once we have schemas you can put things into
different schemas of one database and get most of the user-level
benefit of multiple databases, while still being able to join to
things that are in other schemas.
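
For illustration, cross-schema access of the kind Tom describes might
look like this, in SQL92 schema syntax (hypothetical names; PostgreSQL
did not support any of this at the time):

    -- Two formerly separate databases become two schemas in one database:
    CREATE SCHEMA maillists;
    CREATE SCHEMA webperms;

    CREATE TABLE maillists.subscribers (email text);
    CREATE TABLE webperms.users (email text, may_login boolean);

    -- A join across schemas (impossible across separate databases):
    SELECT u.email
    FROM webperms.users u, maillists.subscribers s
    WHERE s.email = u.email AND u.may_login;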

I haven't looked hard at what it will take to do schemas, but it's
high on my priority list for 7.2. Ross Reedstrom has already done
some preliminary work ...

regards, tom lane

#6 Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Tom Lane (#5)
AW: Re: MySQL and BerkleyDB (fwd)

Is anyone looking at doing this? Is this purely a MySQL-ism, or is it
something that everyone else has except us?

It's not required by SQL, that's for sure. I think in 7.2 we'll tackle
schema support, which will accomplish the same thing.

It does not (e.g. remote access).

Many people
(including myself) are of the opinion that not allowing cross-db access is
in fact a feature.

Can you tell me what that "feature" gains you other than mere inconvenience?
And are you implying that all the other dbs are misfeatured in this regard?

Andreas

#7 The Hermit Hacker
scrappy@hub.org
In reply to: Zeugswetter Andreas SB (#3)
Re: AW: Re: MySQL and BerkleyDB (fwd)

sounds like something that should be handled at the application level
though ... at least the concept of 'access to other db manufacturers' ...
no?

On Mon, 22 Jan 2001, Zeugswetter Andreas SB wrote:

Is anyone looking at doing this? Is this purely a MySQL-ism, or is it
something that everyone else has except us?

We should not only support access to all dbs under one postmaster,
but also remote access to other postmasters' databases.
All biggie dbs allow this in one way or another (synonyms,
qualified object names), including 2-phase commit.
Ideally this includes access to other db manufacturers, flat files, bdb ...
Meaning that this is a problem needing a generic approach.

Andreas

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#8 Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Peter Eisentraut (#4)

It's not required by SQL, that's for sure. I think in 7.2 we'll tackle
schema support, which will accomplish the same thing. Many people
(including myself) are of the opinion that not allowing cross-db access is
in fact a feature.

I am hoping that when we get to query tree redesign we will have the
hooks to do distributed databases etc. Then "cross-db access" will come
nearly for free, and the user can choose whether to use it.

- Thomas

#9 Peter Eisentraut
peter_e@gmx.net
In reply to: Zeugswetter Andreas SB (#6)
Re: AW: Re: MySQL and BerkleyDB (fwd)

Zeugswetter Andreas SB writes:

It's not required by SQL, that's for sure. I think in 7.2 we'll tackle
schema support, which will accomplish the same thing.

It does not (e.g. remote access).

Maybe this is handled better by an external CORBA server or some such
thing.

Many people
(including myself) are of the opinion that not allowing cross-db access is
in fact a feature.

Can you tell me what that "feature" gains you other than mere inconvenience?
And are you implying that all the other dbs are misfeatured in this regard?

It's a safety/security measure to me. As long as one backend process can
only touch one database you can control things much better. This could be
overcome of course, but I'm quite happy with it.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

#10Joel Burton
jburton@scw.org
In reply to: Zeugswetter Andreas SB (#3)
Re: AW: Re: MySQL and BerkleyDB (fwd)

On Mon, 22 Jan 2001, Zeugswetter Andreas SB wrote:

Is anyone looking at doing this? Is this purely a MySQL-ism, or is it
something that everyone else has except us?

We should not only support access to all dbs under one postmaster,
but also remote access to other postmasters' databases.
All biggie dbs allow this in one way or another (synonyms,
qualified object names), including 2-phase commit.
Ideally this includes access to other db manufacturers, flat files, bdb ...
Meaning that this is a problem needing a generic approach.

Of course, a generic, powerful approach would be great.

However, a simple, limited approach would be a solution for (I
suspect) 97% of the cases: one software package creates a
database to store mailing list names, and another creates a database to
store web permissions, and you want to write a query that encompasses
both, without semi-tedious COPY TO FILEs to temporarily move a table back
and forth. And of course, a simple solution might be completed faster :-)

How could this be handled?

* a syntax for db-table names, such as mydb.mytable or something like
that. (do we have any unused punctuation? :-) )

* aliases, so that tblFoo in dbA can be called as ToFoo in dbB

* other ways?

The second might be easier from a conversion standpoint: the user wouldn't
have to understand that this was a 'link', but it might prove complicated
when there are many links to keep track of, etc.

--
Joel Burton <jburton@scw.org>
Director of Information Systems, Support Center of Washington

#11 Ross J. Reedstrom
reedstrm@rice.edu
In reply to: Joel Burton (#10)
Re: Re: AW: Re: MySQL and BerkleyDB (fwd)

On Mon, Jan 22, 2001 at 12:18:54PM -0500, Joel Burton wrote:

Of course, a generic, powerful approach would be great.

However, a simple, limited approach would be a solution for (I
suspect) 97% of the cases: one software package creates a
database to store mailing list names, and another creates a database to
store web permissions, and you want to write a query that encompasses
both, without semi-tedious COPY TO FILEs to temporarily move a table back
and forth. And of course, a simple solution might be completed faster :-)

How could this be handled?

And this case can be handled within one database by having multiple
schemas, one for each package. It's not there yet, but it's a simpler
solution than the generic solution. The problem (as others have mentioned)
is that we don't want to open the door to remote access until we have a
two-phase transaction commit mechanism in place. Doing it any other way
is not a 'partial solution', it's a corrupt database waiting to happen.

* a syntax for db-table names, such as mydb.mytable or something like
that. (do we have any unused punctuation? :-) )

This is the sort of syntax that SQL9* specifies for cross-schema access.
So far, it fits into the parser just fine.

* aliases, so that tblFoo in dbA can be called as ToFoo in dbB

This can be done with views, once schemas are in place.
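
For example, once schemas and qualified names exist, such an alias is
just a view over the qualified name (treating Joel's dbA and dbB as
schemas in one database; names hypothetical):

    -- Make dbA's tblFoo visible to dbB's users under another name:
    CREATE VIEW dbB.ToFoo AS
        SELECT * FROM dbA.tblFoo;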

Ross
--
Open source code is like a natural resource, it's the result of providing
food and sunshine to programmers, and then staying out of their way.
[...] [It] is not going away because it has utility for both the developers
and users independent of economic motivations. Jim Flynn, Sunnyvale, Calif.

#12 Joel Burton
jburton@scw.org
In reply to: Ross J. Reedstrom (#11)
Re: Re: AW: Re: MySQL and BerkleyDB (fwd)

On Mon, 22 Jan 2001, Ross J. Reedstrom wrote:

And this case can be handled within one database by having multiple
schemas, one for each package. It's not there yet, but it's a simpler
solution than the generic solution. The problem (as others have mentioned)
is that we don't want to open the door to remote access until we have a
two-phase transaction commit mechanism in place. Doing it any other way
is not a 'partial solution', it's a corrupt database waiting to happen.

What does '2-phase transaction commit mechanism' mean in this case?

--
Joel Burton <jburton@scw.org>
Director of Information Systems, Support Center of Washington

#13 Ross J. Reedstrom
reedstrm@rice.edu
In reply to: Joel Burton (#12)
Re: Re: AW: Re: MySQL and BerkleyDB (fwd)

On Mon, Jan 22, 2001 at 12:41:38PM -0500, Joel Burton wrote:

On Mon, 22 Jan 2001, Ross J. Reedstrom wrote:

And this case can be handled within one database by having multiple
schemas, one for each package. It's not there yet, but it's a simpler
solution than the generic solution. The problem (as others have mentioned)
is that we don't want to open the door to remote access until we have a
two-phase transaction commit mechanism in place. Doing it any other way
is not a 'partial solution', it's a corrupt database waiting to happen.

What does '2-phase transaction commit mechanism' mean in this case?

Same thing it means elsewhere. Typing "two phase commit" into Google gets me
this URL:

http://webopedia.internet.com/Computer_Science/Transaction_Processing/two_phase_commit.html

Which says:

A feature of transaction processing systems that enables databases
to be returned to the pre-transaction state if some error condition
occurs. A single transaction can update many different databases. The
two-phase commit strategy is designed to ensure that either all the
databases are updated or none of them, so that the databases remain
synchronized.

Database changes required by a transaction are initially stored
temporarily by each database. The transaction monitor then
issues a "pre-commit" command to each database which requires an
acknowledgment. If the monitor receives the appropriate response from
each database, the monitor issues the "commit" command, which causes
all databases to simultaneously make the transaction changes permanent.

This 'pre-commit' 'really commit' two-step (get 'yer cowboy hats, right
here) is what's needed, and is currently missing from pgsql.
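
For the flavor of it, a participant-side transcript of that two-step
might look like this; the PREPARE/COMMIT syntax below is purely
hypothetical (nothing like it existed in pgsql at the time):

    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    -- Phase 1: flush everything to disk and promise that commit cannot fail:
    PREPARE TRANSACTION 'tx42';
    -- Phase 2: once every participating database has acknowledged its
    -- phase 1, the transaction monitor tells each one to finish:
    COMMIT PREPARED 'tx42';
    -- (had any site failed phase 1, each would instead be told:
    --  ROLLBACK PREPARED 'tx42';)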

Ross

#14 Emmanuel Charpentier
charpent@bacbuc.dyndns.org
In reply to: Peter Eisentraut (#4)
Re: Re: MySQL and BerkleyDB (fwd)

Peter Eisentraut wrote:

The Hermit Hacker writes:

Is anyone looking at doing this? Is this purely a MySQL-ism, or is it
something that everyone else has except us?

It's not required by SQL, that's for sure. I think in 7.2 we'll tackle
schema support, which will accomplish the same thing. Many people
(including myself) are of the opinion that not allowing cross-db access is
in fact a feature.

I am of the inverse opinion: cross-DB joining is the only reasonable
way to cope with the unfortunate, ungraceful, unreasonable, but quite
inescapable real-life fact that all data do not live on the same server
at any but the smallest sites ...

I recently made a plea on this list ("A post-7.1 wishlist") in this
direction, and got an answer (Peter Eisentraut?) that was more or less
along the lines of "over our dead bodies!" ... Sigh ...

However, I *think* that it could be done by another tool, such as
Easysoft's (Nick Gorham's, I think) SQL Engine, which allows for joins
between any ODBC-reachable tools. This tool is unreasonably expensive
for private use ($800 + $200/year mandatory maintenance). A PostgreSQL
alternative would be, IMSAO, a huge benefit, even huger if able to
cross-join with ODBC data sources ...

M$ Access has had this since version 1, and that's a hell of a handy
feature for a lot of cases involving management of multiple data sources
...

why not just build PgSQL, and have transaction support *with* subselects
and everything else that mySQL doesn't have?

I'd *love* to use PgSQL, but it doesn't support cross-DB joins (or at
least I couldn't figure out how to do it.) MySQL handles this, so
I'm using MySQL and would also like to have transaction support...

I have to say that my daily work involves this kind of problem, with
data sources ranging from SAS datasets under MVS/XA to Excel files to
Oracle databases to you-name-it ... That's the kind of problem I would
*love* to have PostgreSQL cope with, and *not* M$ Access ...

[ Back to lurking mode ... ]

E. Charpentier

--
Emmanuel Charpentier

#15 Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Emmanuel Charpentier (#14)
AW: AW: Re: MySQL and BerkleyDB (fwd)

sounds like something that should be handled at the application level
though ... at least the concept of 'access to other db manufacturers' ...
no?

Well, I know of Informix for one, which allows transparent access to flat
files or other dbs. And this is a feature that often proves very handy.

Andreas

#16 Hannu Krosing
hannu@tm.ee
In reply to: The Hermit Hacker (#7)
Re: AW: Re: MySQL and BerkleyDB (fwd)

The Hermit Hacker wrote:

sounds like something that should be handled at the application level
though ... at least the concept of 'access to other db manufacturers' ...
no?

If and when we get functions that can return rowsets (IIRC Oracle's
RETURN AND CONTINUE), the simplest case can be easily implemented by
having a user-defined function that just does the query for the whole
table (or for rows where "field in (x,y,z)").

Then putting this in a view and then using it as a table should also
be quite simple (or at least possible ;).

Only after that should we start optimizing ...
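
A sketch of what Hannu suggests, in entirely hypothetical syntax
(set-returning functions did not exist yet, and remote_users is an
invented name):

    -- A function whose body does the query for the whole remote table:
    CREATE FUNCTION remote_users() RETURNS SETOF users AS '
        SELECT * FROM users;  -- imagine this running in the other database
    ' LANGUAGE 'sql';

    -- Put it in a view so it can be used like an ordinary table:
    CREATE VIEW users_remote AS SELECT * FROM remote_users();

    -- ... and join against it as usual:
    SELECT * FROM orders o, users_remote u WHERE u.id = o.user_id;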

--------------
Hannu

#17 Dominique Quatravaux
dom@idealx.com
In reply to: Ross J. Reedstrom (#13)
Re: Re: AW: Re: MySQL and BerkleyDB (fwd)

This 'pre-commit' 'really commit' two-step (get 'yer cowboy hats, right
here) is what's needed, and is currently missing from pgsql.

Hello,

I'm very interested in this topic since I am involved in a
distributed, several-PostgreSQLs-backed, open-source,
buzzword-compliant database replication middleware (still in the draft
stage though --- this is not an announcement :-).
I had thought that the pre-commit information could be stored in an
auxiliary table by the middleware program; we would then have
to re-implement some sort of higher-level WAL (I thought of the list
of the commands performed in the current transaction, with a sequence
number for each of them that would guarantee correct ordering between
concurrent transactions in case of a REDO). But I fear I am missing
a number of important issues there; so could you please comment on my
idea?
* what should I try not to forget to record in the higher-level WAL
if I want consistency?
* how could one collect consistent ordering information without
impacting performance too much? Will ordering suffice to guarantee
correctness of the REDO? (I mean, are there sources of
nondeterminism in PostgreSQL such as resource exhaustion etc. that I
should be aware of?)
* would it be easier or harder to help implement 2-phase commit
inside PostgreSQL (but I am not quite a PostgreSQL hacker yet!)
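
For concreteness, a minimal sketch of the auxiliary command log
described above (all names hypothetical):

    CREATE TABLE tx_command_log (
        seq_no    serial,   -- global ordering across concurrent transactions
        tx_id     integer,  -- middleware-assigned transaction identifier
        command   text      -- SQL text of the write command performed
    );

    -- REDO after a crash would replay, in order, the commands of the
    -- transactions known to have committed:
    --   SELECT command FROM tx_command_log
    --   WHERE tx_id IN (... committed set ...) ORDER BY seq_no;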

Many thanks in advance!

--
<< Not everything here is perfect, but the gardeners are certainly honored >>

Dominique Quatravaux <dom@kilimandjaro.dyndns.org>

#18 Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Dominique Quatravaux (#17)
RE: Re: AW: Re: MySQL and BerkleyDB (fwd)

I had thought that the pre-commit information could be stored in an
auxiliary table by the middleware program; we would then have
to re-implement some sort of higher-level WAL (I thought of the list
of the commands performed in the current transaction, with a sequence
number for each of them that would guarantee correct ordering between
concurrent transactions in case of a REDO). But I fear I am missing

This wouldn't work for the READ COMMITTED isolation level.
But why do you want to log commands into WAL, where each modification
is already logged in, hm, correct order?
Well, it makes sense if you're looking for async replication, but you
don't need two-phase commit for this, and you should be aware of the
problems with the READ COMMITTED isolevel.

Back to two-phase commit - it's the easiest part of the work required
for distributed transaction processing.
Currently we place a single commit record in the log, and the
transaction is committed when this record (and so all other transaction
records) is on disk.
Two-phase commit:

1. For the 1st phase we'll place a "prepared-to-commit" record into the
log, and this phase will be accomplished after the record is flushed to
disk. At this point the transaction may be committed at any time,
because all its modifications are logged. But it still may be rolled
back if this phase failed on other sites of the distributed system.

2. When all sites are prepared to commit, we'll place a "committed"
record into the log. No need to flush it, because in the event of a
crash the recoverer will have to communicate with the other sites
anyway to learn the status of all "prepared" transactions.

That's all! It is really hard to implement distributed lock and
communication managers, but there is no problem with logging two
records instead of one. Period.

Vadim

#19 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Mikheev, Vadim (#18)
Re: Re: AW: Re: MySQL and BerkleyDB (fwd)

I had thought that the pre-commit information could be stored in an
auxiliary table by the middleware program; we would then have
to re-implement some sort of higher-level WAL (I thought of the list
of the commands performed in the current transaction, with a sequence
number for each of them that would guarantee correct ordering between
concurrent transactions in case of a REDO). But I fear I am missing

This wouldn't work for the READ COMMITTED isolation level.
But why do you want to log commands into WAL, where each modification
is already logged in, hm, correct order?
Well, it makes sense if you're looking for async replication, but you
don't need two-phase commit for this, and you should be aware of the
problems with the READ COMMITTED isolevel.

I believe the issue here is that while SERIALIZABLE ISOLATION means all
queries can be run serially, our default is READ COMMITTED, meaning that
open transactions see committed transactions, even if the transaction
committed after our transaction started. (FYI, see my chapter on
transactions for help, http://www.postgresql.org/docs/awbook.html.)

To do higher-level WAL, you would have to record not only the queries,
but the other queries that were committed at the start of each command
in your transaction.

Ideally, you could number every commit by its XID in your log, and then
when processing the query, pass the "committed" transaction ids that
were visible at the time each command began.

In other words, you can replay the queries in transaction commit order,
except that you have to have some transactions committed at specific
points while other transactions are open, i.e.:

    XID   Open XIDs   Query
    500               UPDATE t SET col = 3;
    501   500         BEGIN;
    501   500         UPDATE t SET col = 4;
    501               UPDATE t SET col = 5;
    501               COMMIT;

This is a silly example, but it shows that 500 must commit after the
first command in transaction 501, but before the second command in the
transaction. This is because UPDATE t SET col = 5 actually sees the
changes made by transaction 500 in READ COMMITTED isolation level.

I am not advocating this. I think WAL is a better choice. I just
wanted to outline how replaying the queries in commit order is
insufficient.

Back to two-phase commit - it's the easiest part of the work required
for distributed transaction processing.
Currently we place a single commit record in the log, and the
transaction is committed when this record (and so all other transaction
records) is on disk.
Two-phase commit:

1. For the 1st phase we'll place a "prepared-to-commit" record into the
log, and this phase will be accomplished after the record is flushed to
disk. At this point the transaction may be committed at any time,
because all its modifications are logged. But it still may be rolled
back if this phase failed on other sites of the distributed system.

2. When all sites are prepared to commit, we'll place a "committed"
record into the log. No need to flush it, because in the event of a
crash the recoverer will have to communicate with the other sites
anyway to learn the status of all "prepared" transactions.

That's all! It is really hard to implement distributed lock and
communication managers, but there is no problem with logging two
records instead of one. Period.

Great.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#20 Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Bruce Momjian (#19)
AW: Re: AW: Re: MySQL and BerkleyDB (fwd)

1. For the 1st phase we'll place a "prepared-to-commit" record into the
log, and this phase will be accomplished after the record is flushed to
disk. At this point the transaction may be committed at any time,
because all its modifications are logged. But it still may be rolled
back if this phase failed on other sites of the distributed system.

The 1st phase will also need to do all the deferred constraint checks,
and all other work a commit currently does that could possibly lead
to a transaction abort. The 2nd phase of two-phase commit is not
allowed to return an error, except of course in case of catastrophe.
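
A small example of the kind of late failure Andreas means, using a
deferred foreign key (illustrative table names):

    CREATE TABLE parent (id integer PRIMARY KEY);
    CREATE TABLE child (
        parent_id integer REFERENCES parent (id)
                  DEFERRABLE INITIALLY DEFERRED
    );

    BEGIN;
    INSERT INTO child VALUES (42);  -- no such parent; the check is deferred
    COMMIT;  -- the FK check runs here and can still abort the transaction,
             -- so under two-phase commit it must belong to phase 1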

2. When all sites are prepared to commit, we'll place a "committed"
record into the log. No need to flush it, because in the event of a
crash the recoverer will have to communicate with the other sites
anyway to learn the status of all "prepared" transactions.

That's all! It is really hard to implement distributed lock and
communication managers, but there is no problem with logging two
records instead of one. Period.

yup :-) Maybe this could even be raised to the SQL level,
so applications could use this? I have not seen this elsewhere,
but why actually not?

Andreas

#21 Hannu Krosing
hannu@tm.ee
In reply to: Peter Eisentraut (#4)
Re: Re: MySQL and BerkleyDB (fwd)

Emmanuel Charpentier wrote:

Peter Eisentraut wrote:

Re: cross-database joins

However, I *think* that it could be done by another tool, such as
Easysoft's (Nick Gorham's, I think) SQL Engine, which allows for joins
between any ODBC-reachable tools. This tool is unreasonably expensive
for private use ($800 + $200/year mandatory maintenance). A PostgreSQL
alternative would be, IMSAO, a huge benefit, even huger if able to
cross-join with ODBC data sources ...

M$ Access has had this since version 1, and that's a hell of a handy
feature for a lot of cases involving management of multiple data sources

You should probably use some front-end tools for most of it (I'd
recommend Zope - http://www.zope.org/ )

Or you could try to make something up starting from Gadfly (an
all-Python SQL database engine) that is included with Zope and also
available separately.

...

I have to say that my daily work involves this kind of problem, with
data sources ranging from SAS datasets under MVS/XA to Excel files to
Oracle databases to you-name-it ... That's the kind of problem I would
*love* to have PostgreSQL cope with, and *not* M$ Access ...

OTOH, much of it could be done if postgres functions could return
datasets, which is planned for the not-too-distant future.

--------------------
Hannu

#22 Dominique Quatravaux
dom@idealx.com
In reply to: Ross J. Reedstrom (#13)
Re: 2-phase commit

[ sorry to repost this, but I didn't receive my mail back... Anything
wrong with the mailserver? ]

 I am involved in a project of open-source, PostgreSQL-backed,
buzzword-compliant replication/high availability software that would
act as an SQL "one-to-many" gateway (but still in the design phase
--- this is *not* an announcement :-). Of course, the topic of 2-phase
commit is important to us; we currently plan to record all write
commands issued during the transaction in an auxiliary table (some
sort of higher-level WAL). The first commit phase would then consist of
closing this record and ensuring it can be REDOne in the case of a
crash (UNDO would be done by just rolling back the current
transaction). But this is quite complicated and may require serializing
all accesses (both read and write) to a given database so as
to guarantee that REDO will yield the very same result.

I understand it would certainly be better and more profitable for
the community if I could help implement 2-phase commit inside
PostgreSQL. But I am not much of a PostgreSQL hacker yet. What do you
think?

--
<< Not everything here is perfect, but the gardeners are certainly honored >>

Dominique Quatravaux <dom@kilimandjaro.dyndns.org>

#23 Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Hannu Krosing (#21)
RE: Re: AW: Re: MySQL and BerkleyDB (fwd)

1. For the 1st phase we'll place a "prepared-to-commit" record into the
log, and this phase will be accomplished after the record is flushed to
disk. At this point the transaction may be committed at any time,
because all its modifications are logged. But it still may be rolled
back if this phase failed on other sites of the distributed system.

The 1st phase will also need to do all the deferred constraint checks,
and all other work a commit currently does that could possibly lead
to a transaction abort. The 2nd phase of two-phase commit is not

It was assumed.

Vadim

#24 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Mikheev, Vadim (#18)
Re: Re: AW: Re: MySQL and BerkleyDB (fwd)

Added to TODO.detail/replication.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#25 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#19)
Re: Re: AW: Re: MySQL and BerkleyDB (fwd)

Added to TODO.detail/replication.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#26 Dominique Quatravaux
dom@idealx.com
In reply to: Zeugswetter Andreas SB (#20)
Re: Re: AW: Re: MySQL and BerkleyDB (fwd)

yup :-) Maybe this could even be raised to the SQL level,
so applications could use this? I have not seen this elsewhere,
but why actually not?

Yes please :-) if someone can code this quicker than me (I suppose
so, since I have other projects to deal with concurrently).

--
<< Not everything here is perfect, but the gardeners are certainly honored >>

Dominique Quatravaux <dom@kilimandjaro.dyndns.org>