PG wire protocol question

Started by Boszormenyi Zoltan · almost 10 years ago · 6 messages · general

Boszormenyi Zoltan

zboszor@pr.hu

almost 10 years ago

Hi,

it has been a long time since I last read this list or wrote to it.

Now, I have a question. This blog post was written about 3 years ago:
https://aphyr.com/posts/282-jepsen-postgres

Basically, it talks about the client AND the server as a system
and if the network is cut between sending COMMIT and
receiving the answer for it, the client has no way to know
whether the transaction was actually committed.

The client connection may just time out, and a reconnect would
give it a new connection, but it cannot pick up its old connection
where it left off. So it cannot really know whether the old transaction
was committed or not, except possibly by running expensive queries first.

Has anything changed on that front?

There is a 10.0 debate on -hackers. If this problem posed by
the above article is not fixed yet and needs a new wire protocol
to get it fixed, 10.0 would be justified.

Thanks in advance,
Zoltán Böszörményi

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Laurenz Albe

laurenz.albe@cybertec.at

almost 10 years ago

In reply to: Boszormenyi Zoltan (#1)

Re: PG wire protocol question

Boszormenyi Zoltan wrote:

it has been a long time since I last read this list or wrote to it.

Now, I have a question. This blog post was written about 3 years ago:
https://aphyr.com/posts/282-jepsen-postgres

Basically, it talks about the client AND the server as a system
and if the network is cut between sending COMMIT and
receiving the answer for it, the client has no way to know
whether the transaction was actually committed.

The client connection may just time out, and a reconnect would
give it a new connection, but it cannot pick up its old connection
where it left off. So it cannot really know whether the old transaction
was committed or not, except possibly by running expensive queries first.

Has anything changed on that front?

That blog post seems ill-informed - that has nothing to do with
two-phase commit.

The problem - that the server may commit a transaction, but the client
never receives the server's response - is independent of whether
two-phase commit is used or not.

This is not a PostgreSQL problem; it is a generic problem of communication.

What would be the alternative?
That the server has to wait for the client to receive the commit response?
But what if the client received the message and the server or the network
goes down before the server learns of the fact?
You see that this would lead to an infinite regress.
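
The regress can be made concrete with a small sketch. The function below is a hypothetical model, not anything PostgreSQL does: the two sides exchange a chain of acknowledgements over a channel that may drop any message, and the function reports which side is left uncertain. However long the chain is, whoever sent the last message cannot know whether it arrived.

```python
def uncertain_party(messages: int) -> str:
    """Return which side sent the final, unconfirmed message.

    Message 1 is the server's commit response, message 2 is the client's
    ack of it, message 3 is the server's ack of that ack, and so on.
    """
    if messages < 1:
        raise ValueError("at least the commit response must be sent")
    # Odd-numbered messages come from the server, even-numbered from the client.
    return "server" if messages % 2 == 1 else "client"


for n in range(1, 5):
    print(f"{n} message(s): the {uncertain_party(n)} cannot know "
          "whether its last message arrived")
```

Adding one more acknowledgement only moves the uncertainty to the other side; it never removes it.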

Yours,
Laurenz Albe


Manuel Gómez

targen@gmail.com

almost 10 years ago

In reply to: Laurenz Albe (#2)

Re: PG wire protocol question

On Tue, May 17, 2016 at 9:29 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

That blog post seems ill-informed - that has nothing to do with
two-phase commit.

The problem - that the server may commit a transaction, but the client
never receives the server's response - is independent of whether
two-phase commit is used or not.

The author addresses this in a comment within the linked page:

«The database may be consistent, but the system isn’t. A concurrent
request to the db will get the answer “yes, the transaction has
committed”, but the same request of the remote client gets “no, the
transaction has not yet committed.” The system may eventually become
consistent, if the partition is healed and the acknowledgement reaches
the client. But it isn’t consistent until that point.

And the client can’t just wait indefinitely for acknowledgement–the
commit request may not have reached the server, in which case the
client would deadlock forever. Not to mention practical concerns (a
customer and clerk aren’t going to wait very long for a credit card
transaction to complete). Introducing timeouts then causes the
temporary inconsistency to become permanent.»
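
The two cases described in the quote can be modelled in a few lines. This is a toy simulation under stated assumptions: `Server`, `client_commit`, and the two `drop_*` flags are hypothetical names, and no real network or database is involved. It only shows that a lost request and a lost response look identical from the client's side.

```python
from dataclasses import dataclass


@dataclass
class Server:
    committed: bool = False

    def handle_commit(self) -> str:
        # The server makes the transaction durable, then answers.
        self.committed = True
        return "COMMIT"


def client_commit(server: Server, drop_request: bool, drop_response: bool) -> str:
    """Return what the client observes: 'COMMIT' or 'timeout'."""
    if drop_request:
        return "timeout"  # the server never saw the COMMIT
    response = server.handle_commit()
    if drop_response:
        return "timeout"  # the server committed, but the answer was lost
    return response


lost_request = Server()
lost_response = Server()
seen_a = client_commit(lost_request, drop_request=True, drop_response=False)
seen_b = client_commit(lost_response, drop_request=False, drop_response=True)
print(seen_a, lost_request.committed)
print(seen_b, lost_response.committed)
```

In both runs the client sees only a timeout, although the server state differs, so the timeout alone cannot tell the client which case it is in.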


George Neuner

gneuner2@comcast.net

almost 10 years ago

In reply to: Boszormenyi Zoltan (#1)

Re: PG wire protocol question

On Sat, 14 May 2016 21:58:48 +0200, Boszormenyi Zoltan <zboszor@pr.hu>
wrote:

Hi,

it has been a long time since I last read this list or wrote to it.

Now, I have a question. This blog post was written about 3 years ago:
https://aphyr.com/posts/282-jepsen-postgres

Basically, it talks about the client AND the server as a system
and if the network is cut between sending COMMIT and
receiving the answer for it, the client has no way to know
whether the transaction was actually committed.

The client connection may just time out, and a reconnect would
give it a new connection, but it cannot pick up its old connection
where it left off. So it cannot really know whether the old transaction
was committed or not, except possibly by running expensive queries first.

Has anything changed on that front?

There is a 10.0 debate on -hackers. If this problem posed by
the above article is not fixed yet and needs a new wire protocol
to get it fixed, 10.0 would be justified.

It isn't going to be fixed ... it is a basic *unsolvable* problem in
communication theory that affects coordination in any distributed
system. For a simple explanation, see

https://en.wikipedia.org/wiki/Two_Generals'_Problem

Thanks in advance,
Zoltán Böszörményi

George


Boszormenyi Zoltan

zboszor@pr.hu

almost 10 years ago

In reply to: Laurenz Albe (#2)

Re: PG wire protocol question

On 2016-05-17 15:29, Albe Laurenz wrote:

Boszormenyi Zoltan wrote:

it has been a long time since I last read this list or wrote to it.

Now, I have a question. This blog post was written about 3 years ago:
https://aphyr.com/posts/282-jepsen-postgres

Basically, it talks about the client AND the server as a system
and if the network is cut between sending COMMIT and
receiving the answer for it, the client has no way to know
whether the transaction was actually committed.

The client connection may just time out, and a reconnect would
give it a new connection, but it cannot pick up its old connection
where it left off. So it cannot really know whether the old transaction
was committed or not, except possibly by running expensive queries first.

Has anything changed on that front?

That blog post seems ill-informed - that has nothing to do with
two-phase commit.

In the blog post, 2PC was mentioned in relation to the communication
between client and server, not as a transaction control mechanism inside
the database. I wouldn't call it misinformed. After all, terminology can
mean different things in different contexts.

The problem - that the server may commit a transaction, but the client
never receives the server's response - is independent of whether
two-phase commit is used or not.

This is not a PostgreSQL problem; it is a generic problem of communication.

Indeed.

What would be the alternative?
That the server has to wait for the client to receive the commit response?

Not quite. That would mean constantly sending an ack that the other
side received the last ack, which would be silly.

If the network connection is cut, the client should be able to
reconnect to the old backend, query the last state and continue
where it left off, maybe confirming via some key or UUID that it was
indeed the client that connected previously.

But what if the client received the message and the server or the network
goes down before the server learns of the fact?
You see that this would lead to an infinite regress.

Yours,
Laurenz Albe


Merlin Moncure

mmoncure@gmail.com

almost 10 years ago

In reply to: Boszormenyi Zoltan (#5)

Re: PG wire protocol question

On Wed, May 18, 2016 at 5:05 AM, Boszormenyi Zoltan <zboszor@pr.hu> wrote:

On 2016-05-17 15:29, Albe Laurenz wrote:

Boszormenyi Zoltan wrote:

it has been a long time since I last read this list or wrote to it.

Now, I have a question. This blog post was written about 3 years ago:
https://aphyr.com/posts/282-jepsen-postgres

Basically, it talks about the client AND the server as a system
and if the network is cut between sending COMMIT and
receiving the answer for it, the client has no way to know
whether the transaction was actually committed.

The client connection may just time out, and a reconnect would
give it a new connection, but it cannot pick up its old connection
where it left off. So it cannot really know whether the old transaction
was committed or not, except possibly by running expensive queries first.

Has anything changed on that front?

That blog post seems ill-informed - that has nothing to do with
two-phase commit.

Not quite. That would mean constantly sending an ack that the other
side received the last ack, which would be silly.

If the network connection is cut, the client should be able to
reconnect to the old backend, query the last state and continue
where it left off, maybe confirming via some key or UUID that it was
indeed the client that connected previously.

I agree. It's the server's job to make sure that it is itself consistent.
If the client suspects it may have lost the ack for whatever reason,
it needs to verify against the database that the transaction
succeeded. This is an application problem, not a protocol problem.

merlin
