protocol change in 7.4

Started by Neil Conway, over 23 years ago. 34 messages. List: hackers
#1Neil Conway
neilc@samurai.com

There has been some previous discussion of changing the FE/BE protocol
in 7.4, in order to fix several problems. I think this is worth doing:
if we can resolve all these issues in a single release, it will lessen
the upgrade difficulties for users.

I'm aware of the following problems that need a protocol change to fix
them:

(1) Add an optional textual message to NOTIFY

(2) Remove the hard-coded limits on database and user names
(SM_USER, SM_DATABASE), replace them with variable-length
fields.

(3) Remove some legacy elements in the startup packet
('unused' can go -- perhaps 'tty' as well). I think the
'length' field of the password packet is also not used,
but I'll need to double-check that.

(4) Fix the COPY protocol (Tom?)

(5) Fix the Fastpath protocol (Tom?)

(6) Protocol-level support for prepared queries, in order to
bypass the parser (and maybe be more compatible with the
implementation of prepared queries in other databases).

(7) Include the current transaction status, since it's
difficult for the client app to determine it for certain
(Tom/Bruce?)

If I've missed anything or if there is something you think we should
add, please let me know.
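
To make item (2) concrete, here is a minimal sketch contrasting a fixed-width name layout with a variable-length one. The field widths (64 and 32 bytes), the length-prefixed layout, and all function names are assumptions chosen for illustration, not the actual packet definition.

```python
import struct
from typing import Tuple

# Field widths assumed for the fixed-layout packet (illustrative values only).
SM_DATABASE = 64
SM_USER = 32

def pack_fixed(database: str, user: str) -> bytes:
    """Fixed-width layout: each name is NUL-padded to its hard-coded size."""
    db, usr = database.encode(), user.encode()
    if len(db) >= SM_DATABASE or len(usr) >= SM_USER:
        raise ValueError("name does not fit in its fixed-width field")
    return db.ljust(SM_DATABASE, b"\0") + usr.ljust(SM_USER, b"\0")

def pack_variable(database: str, user: str) -> bytes:
    """Variable-length layout: a 4-byte length prefix, then NUL-terminated names."""
    body = database.encode() + b"\0" + user.encode() + b"\0"
    return struct.pack("!I", 4 + len(body)) + body

def unpack_variable(packet: bytes) -> Tuple[str, str]:
    """Read the length prefix, then split the two NUL-terminated names."""
    (length,) = struct.unpack("!I", packet[:4])
    database, user, _ = packet[4:length].split(b"\0")
    return database.decode(), user.decode()
```

With this layout a 100-character database name round-trips through `pack_variable`, while `pack_fixed` has to reject it.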

I can implement (1), (2), (3), and possibly (7), if someone can tell
me exactly what is required (my memory of the discussion relating to
this is fuzzy). The rest is up for grabs.

Finally, how should we manage the transition? I wasn't around for the
earlier protocol changes, so I'd appreciate any input on steps we can
take to improve backward-compatibility.

Cheers,

Neil

--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC

#2Mike Mascari
mascarm@mascari.com
In reply to: Neil Conway (#1)
Re: protocol change in 7.4

Neil Conway wrote:

There has been some previous discussion of changing the FE/BE protocol
in 7.4, in order to fix several problems. I think this is worth doing:
if we can resolve all these issues in a single release, it will lessen
the upgrade difficulties for users.

<snip>

If I've missed anything or if there is something you think we should
add, please let me know.

Is there any thought about changing the protocol to support
two-phase commit? Not that 2PC and distributed transactions
would be implemented in 7.4, but to prevent another protocol
change in the future?

Mike Mascari
mascarm@mascari.com

#3Neil Conway
neilc@samurai.com
In reply to: Mike Mascari (#2)
Re: protocol change in 7.4

Mike Mascari <mascarm@mascari.com> writes:

Is there any thought about changing the protocol to support
two-phase commit? Not that 2PC and distributed transactions would be
implemented in 7.4, but to prevent another protocol change in the
future?

My understanding is that 2PC is one way to implement multi-master
replication. If that's what you're referring to, then I'm not sure I
see the point: the multi-master replication project (pgreplication)
doesn't use 2PC, due to apparent scalability problems (not to mention
that it also uses a separate channel for communications between
backends on different nodes).

Cheers,

Neil

--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC

#4Darren Johnson
darren@up.hrcoxmail.com
In reply to: Neil Conway (#1)
Re: protocol change in 7.4

I'm now implementing 2PC replication and distributed transactions. My 2PC
needs some support in the startup packet to establish a replication session
or a recovery session.

BTW, my 2PC replication is working, and I'm implementing 2PC recovery now.

I would like to hear more about your implementation. Do you have some
documentation that I could read?

If not, perhaps (if you have the time) you could put together a post
describing your work. For example: Is it an internal or external solution?
Are you sending SQL or tuples in your update messages? How are you
handling failure detection? Is this partial or full replication?

Please forgive me for asking so many questions, but I'm rather intrigued
by database replication.

Darren

#5Mike Mascari
mascarm@mascari.com
In reply to: Neil Conway (#1)
Re: protocol change in 7.4

Neil Conway wrote:

Mike Mascari <mascarm@mascari.com> writes:

Is there any thought about changing the protocol to support
two-phase commit? Not that 2PC and distributed transactions would be
implemented in 7.4, but to prevent another protocol change in the
future?

My understanding is that 2PC is one way to implement multi-master
replication. If that's what you're referring to, then I'm not sure I
see the point: the multi-master replication project (pgreplication)
doesn't use 2PC, due to apparent scalability problems (not to mention
that it also uses a separate channel for communications between
backends on different nodes).

Actually, I was thinking along the lines of a true CREATE
DATABASE LINK implementation, where multiple databases could
participate in a distributed transaction. That would require the
backend in which the main query is executing to act as the
"coordinator" and each of the other participating databases to
act as "cohorts". And would require a protocol change to support
the PREPARE, COMMIT-VOTE/ABORT-VOTE reply, and an ACK message
following the completion of the distributed COMMIT or ABORT.
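
The coordinator/cohort exchange described above can be sketched as a small in-memory simulation. The class and function names are hypothetical, and `can_commit` simply stands in for whatever local work decides a cohort's vote; no networking or crash recovery is modelled.

```python
from enum import Enum

class Vote(Enum):
    COMMIT = "COMMIT-VOTE"
    ABORT = "ABORT-VOTE"

class Cohort:
    """A participating database; `can_commit` stands in for real local work."""
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Reply to PREPARE with a vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.COMMIT if self.can_commit else Vote.ABORT

    def finish(self, decision):
        # Apply the coordinator's decision and acknowledge it.
        self.state = decision
        return "ACK"

def run_two_phase_commit(cohorts):
    # Phase 1: send PREPARE to every cohort and collect the votes.
    votes = [c.prepare() for c in cohorts]
    decision = "committed" if all(v is Vote.COMMIT for v in votes) else "aborted"
    # Phase 2: broadcast the decision and wait for each ACK.
    acks = [c.finish(decision) for c in cohorts]
    assert all(a == "ACK" for a in acks)
    return decision
```

A single ABORT-VOTE is enough to abort the whole distributed transaction, which is why every cohort has to be asked before any of them commits.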

Mike Mascari
mascarm@mascari.com

#6Satoshi Nagayasu
snaga@snaga.org
In reply to: Mike Mascari (#2)
Re: protocol change in 7.4

Hi all,

Mike Mascari <mascarm@mascari.com> wrote:

Is there any thought about changing the protocol to support
two-phase commit? Not that 2PC and distributed transactions
would be implemented in 7.4, but to prevent another protocol
change in the future?

I'm now implementing 2PC replication and distributed transaction. My 2PC
needs some support in startup packet to establish a replication session
and a recovery session.

BTW, 2PC replication is working, and I'm implementing 2PC recovery now.

--
NAGAYASU Satoshi <snaga@snaga.org>

#7Ross J. Reedstrom
reedstrm@rice.edu
In reply to: Mike Mascari (#5)
Re: protocol change in 7.4

On Mon, Nov 04, 2002 at 08:10:29PM -0500, Mike Mascari wrote:

Actually, I was thinking along the lines of a true CREATE
DATABASE LINK implementation, where multiple databases could
participate in a distributed transaction. That would require the
backend in which the main query is executing to act as the
"coordinator" and each of the other participating databases to
act as "cohorts". And would require a protocol change to support
the PREPARE, COMMIT-VOTE/ABORT-VOTE reply, and an ACK message
following the completion of the distributed COMMIT or ABORT.

Right, you need 2PC in order for pgsql to participate in transactions
that span anything outside the DB proper. A DB link is one example;
another is an external transaction manager that coordinates DB and
filesystem updates. Zope could use this to coordinate the DB with
its internal object store.

Ross

#8Satoshi Nagayasu
pgsql@snaga.org
In reply to: Mike Mascari (#2)
Re: protocol change in 7.4

Hi,

Mike Mascari <mascarm@mascari.com> wrote:

Is there any thought about changing the protocol to support
two-phase commit? Not that 2PC and distributed transactions
would be implemented in 7.4, but to prevent another protocol
change in the future?

I'm now implementing 2PC replication and distributed transactions. My 2PC
needs some support in the startup packet to establish a replication session
or a recovery session.

BTW, my 2PC replication is working, and I'm implementing 2PC recovery now.

--
NAGAYASU Satoshi <snaga@snaga.org>

#9Satoshi Nagayasu
pgsql@snaga.org
In reply to: Darren Johnson (#4)
Re: protocol change in 7.4

Darren Johnson <darren@up.hrcoxmail.com> wrote:

I would like to hear more about your implementation. Do you have some
documentation that I could read?

Documentation is not available, but I have some slides for my presentation.

http://snaga.org/pgsql/20021018_2pc.pdf

Some answers for your questions may be in these slides.

The current source code is available from:
http://snaga.org/pgsql/pgsql-20021025.tgz

If not, perhaps (if you have the time) you could put together a post
describing your work. For example: Is it an internal or external solution?
Are you sending SQL or tuples in your update messages? How are you
handling failure detection? Is this partial or full replication?

It is an internal solution. 2PC requires both a pre-commit and a commit
step, so my implementation makes some internal modifications to transaction
handling, log recording, and so on.

--
NAGAYASU Satoshi <snaga@snaga.org>

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Satoshi Nagayasu (#9)
Re: protocol change in 7.4

I don't see why 2PC would require any protocol-level change. I would
think that the API would be something like

BEGIN;
issue some commands ...
PRECOMMIT;
-- if the above does not return an error, then
COMMIT;

In other words, 2PC would require some new commands, but a new command
doesn't affect the protocol layer.
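
As a rough illustration of that point, the sketch below frames each command the same way, assuming simple queries travel as a type byte 'Q' followed by a NUL-terminated string. PRECOMMIT is the hypothetical new command from the example above and `encode_query` is an illustrative helper, not a libpq function.

```python
def encode_query(sql: str) -> bytes:
    """Frame a command as a simple query: type byte 'Q', then a NUL-terminated string."""
    return b"Q" + sql.encode("ascii") + b"\0"

# PRECOMMIT is the hypothetical new command; BEGIN and COMMIT already exist.
commands = ["BEGIN", "PRECOMMIT", "COMMIT"]
frames = [encode_query(c) for c in commands]

# Every frame carries the same message type byte.
message_types = {frame[:1] for frame in frames}
```

The new command changes only the string payload; the set of message types on the wire stays the same.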

regards, tom lane

#11Grant Finnemore
grantf@guruhut.co.za
In reply to: Neil Conway (#1)
Re: protocol change in 7.4

Questions have arisen during discussions about errors relating
to how to support error codes without changing the FE/BE
protocols. (see TODO.detail/error)

Now that the protocol is up for revision, how about supporting
sql state strings, error codes, and other information directly in
the protocol.
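
One possible shape for this, sketched under assumptions: an error payload made of tagged fields, each a one-byte tag plus a NUL-terminated value, closed by a final NUL. The tags 'C' (code) and 'M' (message), the sample values, and both function names are hypothetical and were not proposed in this thread.

```python
def encode_error(fields: dict) -> bytes:
    """Encode tagged fields: one tag byte, a NUL-terminated value, then a final NUL."""
    out = b"".join(
        tag.encode("ascii") + value.encode("utf-8") + b"\0"
        for tag, value in fields.items()
    )
    return out + b"\0"

def decode_error(payload: bytes) -> dict:
    """Decode a well-formed payload produced by encode_error."""
    fields = {}
    pos = 0
    while payload[pos:pos + 1] != b"\0":
        end = payload.index(b"\0", pos)      # end of this field's value
        fields[chr(payload[pos])] = payload[pos + 1:end].decode("utf-8")
        pos = end + 1
    return fields
```

Because each field is tagged, a client can pick out the code without parsing the human-readable message, and unknown tags can be skipped by older clients.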

Regards,
Grant

Neil Conway wrote:


There has been some previous discussion of changing the FE/BE protocol
in 7.4, in order to fix several problems. I think this is worth doing:
if we can resolve all these issues in a single release, it will lessen
the upgrade difficulties for users.

I'm aware of the following problems that need a protocol change to fix
them:

(1) Add an optional textual message to NOTIFY

(2) Remove the hard-coded limits on database and user names
(SM_USER, SM_DATABASE), replace them with variable-length
fields.

(3) Remove some legacy elements in the startup packet
('unused' can go -- perhaps 'tty' as well). I think the
'length' field of the password packet is also not used,
but I'll need to double-check that.

(4) Fix the COPY protocol (Tom?)

(5) Fix the Fastpath protocol (Tom?)

(6) Protocol-level support for prepared queries, in order to
bypass the parser (and maybe be more compatible with the
implementation of prepared queries in other databases).

(7) Include the current transaction status, since it's
difficult for the client app to determine it for certain
(Tom/Bruce?)

If I've missed anything or if there is something you think we should
add, please let me know.

I can implement (1), (2), (3), and possibly (7), if someone can tell
me exactly what is required (my memory of the discussion relating to
this is fuzzy). The rest is up for grabs.

Finally, how should we manage the transition? I wasn't around for the
earlier protocol changes, so I'd appreciate any input on steps we can
take to improve backward-compatibility.

Cheers,

Neil

#12Neil Conway
neilc@samurai.com
In reply to: Grant Finnemore (#11)
Re: protocol change in 7.4

Grant Finnemore <grantf@guruhut.co.za> writes:

Now that the protocol is up for revision, how about supporting
sql state strings, error codes, and other information directly in
the protocol.

Ah, thanks for pointing that out. Error codes would be another thing
we can ideally support in 7.4, and we'd need a protocol change to do
it properly, AFAICS. IIRC, Peter E. expressed some interest in doing
this...

Cheers,

Neil

--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC

#13Satoshi Nagayasu
pgsql@snaga.org
In reply to: Tom Lane (#10)
Re: protocol change in 7.4

Tom Lane wrote:

I don't see why 2PC would require any protocol-level change. I would
think that the API would be something like

BEGIN;
issue some commands ...
PRECOMMIT;
-- if the above does not return an error, then
COMMIT;

In other words, 2PC would require some new commands, but a new command
doesn't affect the protocol layer.

I think the precommit-vote-commit phase of 2PC can be implemented at
either the command level or the protocol level.

With command-level 2PC, a user application (or application programmer)
must know whether the DBMS is clustered (in order to use the PRECOMMIT
command).

With protocol-level 2PC, no new SQL command is required. The
precommit-vote-commit phase is invoked implicitly, so a user application
can be used without any modification and can keep using the traditional
BEGIN...COMMIT sequence.

So I decided to use a protocol-level implementation.
It doesn't affect the SQL command layer.

--
NAGAYASU Satoshi <snaga@snaga.org>

#14Maarten Boekhold
Maarten.Boekhold@reuters.com
In reply to: Satoshi Nagayasu (#13)
Re: protocol change in 7.4

On 11/05/2002 04:42:55 AM Neil Conway wrote:

Mike Mascari <mascarm@mascari.com> writes:

Is there any thought about changing the protocol to support
two-phase commit? Not that 2PC and distributed transactions would be
implemented in 7.4, but to prevent another protocol change in the
future?

My understanding is that 2PC is one way to implement multi-master
replication. If that's what you're referring to, then I'm not sure I

Another use of two-phase commit is in messaging middleware (MOM,
message-oriented middleware), where both the middleware and the database
participate in the same transaction. Consider:

- DB: begin
- MOM: begin
- DB: insert
- MOM: send message
- DB: prepare
- MOM: prepare ==> fails
- DB: rollback
- MOM: rollback

just a simple example...

Maarten

#15Karel Zak
zakkr@zf.jcu.cz
In reply to: Neil Conway (#1)
Re: protocol change in 7.4

On Mon, Nov 04, 2002 at 07:22:54PM -0500, Neil Conway wrote:

(1) Add an optional textual message to NOTIFY

(2) Remove the hard-coded limits on database and user names
(SM_USER, SM_DATABASE), replace them with variable-length
fields.

(3) Remove some legacy elements in the startup packet
('unused' can go -- perhaps 'tty' as well). I think the
'length' field of the password packet is also not used,
but I'll need to double-check that.

(4) Fix the COPY protocol (Tom?)

(5) Fix the Fastpath protocol (Tom?)

(6) Protocol-level support for prepared queries, in order to
bypass the parser (and maybe be more compatible with the
implementation of prepared queries in other databases).

(7) Include the current transaction status, since it's
difficult for the client app to determine it for certain
(Tom/Bruce?)

(8) Error codes (may not need a protocol change)
- without this, PostgreSQL is useless in real DB applications

(9) Think about fully dynamic charset encoding (adding a new encoding on
the fly)

Karel

--
Karel Zak <zakkr@zf.jcu.cz>
http://home.zf.jcu.cz/~zakkr/

C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

#16Hannu Krosing
hannu@tm.ee
In reply to: Satoshi Nagayasu (#13)
Re: protocol change in 7.4

Satoshi Nagayasu wrote on Tue, 05.11.2002 at 08:05:

Tom Lane wrote:

I don't see why 2PC would require any protocol-level change. I would
think that the API would be something like

BEGIN;
issue some commands ...
PRECOMMIT;
-- if the above does not return an error, then
COMMIT;

In other words, 2PC would require some new commands, but a new command
doesn't affect the protocol layer.

I think a precommit-vote-commit phase of 2PC can be implemented in
command-level or protocol-level.

In command-level 2PC, an user application (or application programmer)
must know the DBMS is clustered or not (to use PRECOMMIT command).

In protocol-layer 2PC, no new SQL command is required.
A precommit-vote-commit phase will be called implicitly. It means an
user application can be used without any modification. An application
can use a traditional way (BEGIN...COMMIT).

If application continues to use just BEGIN/COMMIT, then the protocol
level must parse command stream and recognize COMMIT in order to replace
it with PRECOMMIT, COMMIT.

If the communication library has to do that anyway, it could still do
the replacement without affecting wire protocol, no ?
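
A minimal sketch of the library-side replacement described here, assuming commands arrive as separate strings and that PRECOMMIT is the hypothetical new command. The function name is illustrative, and the comparison is deliberately naive: it would miss spellings such as COMMIT WORK or END.

```python
def expand_commit(commands):
    """Yield the command stream with each COMMIT preceded by a PRECOMMIT."""
    for command in commands:
        # Naive match: ignore case, surrounding whitespace, and a trailing semicolon.
        if command.strip().rstrip(";").upper() == "COMMIT":
            yield "PRECOMMIT;"
        yield command
```

The rewrite happens entirely in the client library, so the bytes sent for each command keep their existing framing.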

------------------
Hannu

#17Satoshi Nagayasu
pgsql@snaga.org
In reply to: Hannu Krosing (#16)
Re: protocol change in 7.4

Hannu Krosing <hannu@tm.ee> wrote:

I think a precommit-vote-commit phase of 2PC can be implemented in
command-level or protocol-level.

In command-level 2PC, an user application (or application programmer)
must know the DBMS is clustered or not (to use PRECOMMIT command).

In protocol-layer 2PC, no new SQL command is required.
A precommit-vote-commit phase will be called implicitly. It means an
user application can be used without any modification. An application
can use a traditional way (BEGIN...COMMIT).

If application continues to use just BEGIN/COMMIT, then the protocol
level must parse command stream and recognize COMMIT in order to replace
it with PRECOMMIT, COMMIT.

If the communication library has to do that anyway, it could still do
the replacement without affecting wire protocol, no ?

In my implementation, 'the extended(2PC) FE/BE protocol' is used only in
the communication between the master and slave server(s), not between a
client app and the master server.

libpq <--Normal FE/BE--> (master)postgres <--Extended(2PC)FE/BE--> (slave)postgres
                                          <--Extended(2PC)FE/BE--> (slave)postgres
                                          <--Extended(2PC)FE/BE--> (slave)postgres

A client application and the client's libpq can keep working without
any modification. This is very important. And a protocol modification
between the master and slave server(s) is not such a serious issue (I think).

--
NAGAYASU Satoshi <snaga@snaga.org>

#18Ross J. Reedstrom
reedstrm@rice.edu
In reply to: Satoshi Nagayasu (#17)
Re: protocol change in 7.4

On Tue, Nov 05, 2002 at 08:54:46PM +0900, Satoshi Nagayasu wrote:

Hannu Krosing <hannu@tm.ee> wrote:

In protocol-layer 2PC, no new SQL command is required.
A precommit-vote-commit phase will be called implicitly. It means an
user application can be used without any modification. An application
can use a traditional way (BEGIN...COMMIT).

If application continues to use just BEGIN/COMMIT, then the protocol
level must parse command stream and recognize COMMIT in order to replace
it with PRECOMMIT, COMMIT.

If the communication library has to do that anyway, it could still do
the replacement without affecting wire protocol, no ?

No, I think Satoshi is suggesting that from the client's point of view,
he's embedded the entire precommit-vote-commit cycle inside the COMMIT
command.

In my implementation, 'the extended(2PC) FE/BE protocol' is used only in
the communication between the master and slave server(s), not between a
client app and the master server.

libpq <--Normal FE/BE--> (master)postgres <--Extended(2PC)FE/BE--> (slave)postgres
                                          <--Extended(2PC)FE/BE--> (slave)postgres
                                          <--Extended(2PC)FE/BE--> (slave)postgres

A client application and client's libpq can work continuously without
any modification. This is very important. And protocol modification
between master and slave server(s) is not so serious issue (I think).

Ah, but this limits your use of 2PC to transparent DB replication - since
the client doesn't have access to the PRECOMMIT phase (usually called
prepare phase, but that's another overloaded term in the DB world!) it
_can't_ serve as the transaction master, so the other use cases that
people have mentioned here (zope, MOMs, etc.) wouldn't be possible.

Hmm, unless a connection can be switched into 2PC mode, so something
other than a postgresql server can act as the transaction master.

Does your implementation cascade? Can slaves have slaves?

Ross

#19Christof Petig
christof@petig-baender.de
In reply to: Neil Conway (#1)
Re: protocol change in 7.4

Neil Conway wrote:

(6) Protocol-level support for prepared queries, in order to
bypass the parser (and maybe be more compatible with the
implementation of prepared queries in other databases).

Let me add
(6b) Protocol level support for query parameters. This would actually
make (6) more powerful and speed up non prepared (but similar)
queries via the query cache (which is already there IIRC).
[I talk about <statement> USING :var ... ]

(n) Platform independent binary representation of parameters and
results (like in CORBA). This can _really_ speed up
communication with compiled programs if you take the time to
implement it. This was previously planned for a future
CORBA fe/be protocol, but this does not seem to come any time
soon.

(n+1) Optional additional result qualifiers. E.g. dynamic embedded
SQL has a flag to indicate that a column is a key. Previously it was
impossible to set this flag to a meaningful value. Also,
the standard has additional statistical information about the
size of the column etc. If it's unclear what I'm talking about,
I will look up the exact location in the standard (it's embedded
SQL, dynamic SQL, GET DESCRIPTOR)
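
For item (n), one common way to get a platform-independent binary form is a fixed byte order with no padding. The sketch below uses network (big-endian) order for an int32 and a float64; the row layout and function names are assumptions for illustration, not a proposed wire format.

```python
import struct

def pack_row(int_value: int, float_value: float) -> bytes:
    """Pack an int32 and a float64 in network (big-endian) byte order, unpadded."""
    return struct.pack("!id", int_value, float_value)

def unpack_row(data: bytes):
    """Inverse of pack_row; gives the same values on any host architecture."""
    return struct.unpack("!id", data)
```

A compiled client can copy such values out with a byte swap at most, instead of parsing and formatting text for every column.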

Yours
Christof

#20Satoshi Nagayasu
pgsql@snaga.org
In reply to: Ross J. Reedstrom (#18)
Re: protocol change in 7.4

"Ross J. Reedstrom" <reedstrm@rice.edu> wrote:

If application continues to use just BEGIN/COMMIT, then the protocol
level must parse command stream and recognize COMMIT in order to replace
it with PRECOMMIT, COMMIT.

If the communication library has to do that anyway, it could still do
the replacement without affecting wire protocol, no ?

No, I think Satoshi is suggesting that from the client's point of view,
he's embedded the entire precommit-vote-commit cycle inside the COMMIT
command.

Exactly. When the user sends the COMMIT command to the master server, the
master talks to the slaves to run the precommit-vote-commit cycle using
2PC. The 2PC cycle is hidden from the user application, which just
speaks the normal FE/BE protocol.

In my implementation, 'the extended(2PC) FE/BE protocol' is used only in
the communication between the master and slave server(s), not between a
client app and the master server.

libpq <--Normal FE/BE--> (master)postgres <--Extended(2PC)FE/BE--> (slave)postgres
                                          <--Extended(2PC)FE/BE--> (slave)postgres
                                          <--Extended(2PC)FE/BE--> (slave)postgres

A client application and client's libpq can work continuously without
any modification. This is very important. And protocol modification
between master and slave server(s) is not so serious issue (I think).

Ah, but this limits your use of 2PC to transparent DB replication - since
the client doesn't have access to the PRECOMMIT phase (usually called
prepare phase, but that's another overloaded term in the DB world!) it
_can't_ serve as the transaction master, so the other use cases that
people have mentioned here (zope, MOMs, etc.) wouldn't be possible.

Hmm, unless a connection can be switched into 2PC mode, so something
other than a postgresql server can act as the transaction master.

I think the client should not act as the transaction master. But if it
is needed, the client can talk to postgres servers with the extended 2PC
FE/BE protocol, because all postgres servers (master and slave) understand
both the normal FE/BE protocol and the extended 2PC FE/BE protocol. The
protocol is selected in the startup packet.

See page 10.
http://snaga.org/pgsql/20021018_2pc.pdf

I embedded 'the connection type' in the startup packet to switch the postgres
backend's behavior (normal FE/BE protocol or 2PC FE/BE protocol).

In the current implementation, if the connection type is 'R', it is handled
as a 2PC FE/BE connection (replication connection).
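
The switch on connection type might look roughly like this. The position of the type byte (first byte of the packet) and the function name are assumptions made for the sketch; only the value 'R' comes from the description above.

```python
def session_kind(startup_packet: bytes) -> str:
    """Pick a session type from the first byte of a (hypothetical) startup packet."""
    connection_type = startup_packet[:1]
    if connection_type == b"R":
        # 'R' marks a replication connection that speaks the extended protocol.
        return "replication (2PC FE/BE)"
    # Anything else is treated as an ordinary client connection.
    return "normal FE/BE"
```

Since the choice is made once at startup, the rest of the session can use a single message parser per connection.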

Does your implementation cascade? Can slaves have slaves?

It is not implemented, but I hope so. :-)
And I think it is not so difficult.

--
NAGAYASU Satoshi <snaga@snaga.org>

#21Hannu Krosing
hannu@tm.ee
In reply to: Satoshi Nagayasu (#20)
#22Christof Petig
christof@petig-baender.de
In reply to: Neil Conway (#1)
#23Satoshi Nagayasu
pgsql@snaga.org
In reply to: Hannu Krosing (#21)
#24Ross J. Reedstrom
reedstrm@rice.edu
In reply to: Satoshi Nagayasu (#23)
#25Satoshi Nagayasu
pgsql@snaga.org
In reply to: Ross J. Reedstrom (#24)
#26snpe
snpe@snpe.co.yu
In reply to: Satoshi Nagayasu (#6)
#27korry
korry@starband.net
In reply to: Neil Conway (#1)
#28korry
korry@starband.net
In reply to: korry (#27)
#29snpe
snpe@snpe.co.yu
In reply to: korry (#27)
#30snpe
snpe@snpe.co.yu
In reply to: korry (#28)
#31Satoshi Nagayasu
pgsql@snaga.org
In reply to: snpe (#26)
#32Hannu Krosing
hannu@tm.ee
In reply to: snpe (#30)
#33snpe
snpe@snpe.co.yu
In reply to: Hannu Krosing (#32)
#34Bruce Momjian
bruce@momjian.us
In reply to: snpe (#30)