Odd query execution behavior with extended protocol
Hi hackers, some odd behavior has been reported with Npgsql and I'm sure
you can help.
Npgsql supports sending multiple SQL statements in a single packet via the
extended protocol. This works fine, but when the second query SELECTs a
value modified by the first's UPDATE, I'm getting a result as if the UPDATE
hasn't yet occurred.
The exact messages sent by Npgsql are:
Parse (UPDATE data SET name='foo' WHERE id=1), statement=unnamed
Describe (statement=unnamed)
Bind (statement=unnamed, portal=MQ0)
Parse (SELECT * FROM data WHERE id=1), statement=unnamed
Describe (statement=unnamed)
Bind (statement=unnamed, portal=MQ1)
Execute (portal=MQ0)
Close (portal=MQ0)
Execute (portal=MQ1)
Close (portal=MQ1)
Sync
Instead of returning the expected 'foo' value set in the first command's
UPDATE, I'm getting whatever value was previously there.
Note that this happens regardless of whether a transaction is already in
progress, and regardless of the isolation level.
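To make the framing concrete, here is a rough Python sketch of how that message chain looks on the wire, hand-written against the Message Formats page of the protocol docs. This is not Npgsql's actual code; the msg/cstr helpers are made up for the example, and the Bind is simplified to zero parameters with text-format results.

import struct

def msg(tag, *fields):
    body = b"".join(fields)
    return tag + struct.pack("!I", len(body) + 4) + body  # length field counts itself

def cstr(s):
    return s.encode("utf-8") + b"\x00"                    # null-terminated string

def parse(stmt, query):
    return msg(b"P", cstr(stmt), cstr(query), struct.pack("!H", 0))  # 0 parameter type OIDs

def describe_stmt(stmt):
    return msg(b"D", b"S", cstr(stmt))                    # 'S' = describe a statement

def bind(portal, stmt):
    # no parameter format codes, no parameter values, no result format codes (all text)
    return msg(b"B", cstr(portal), cstr(stmt), struct.pack("!HHH", 0, 0, 0))

def execute(portal):
    return msg(b"E", cstr(portal), struct.pack("!I", 0))  # 0 = no row limit

def close_portal(portal):
    return msg(b"C", b"P", cstr(portal))                  # 'P' = close a portal

packet = b"".join([
    parse("", "UPDATE data SET name='foo' WHERE id=1"),   # unnamed statement
    describe_stmt(""),
    bind("MQ0", ""),
    parse("", "SELECT * FROM data WHERE id=1"),           # overwrites the unnamed statement
    describe_stmt(""),
    bind("MQ1", ""),
    execute("MQ0"),
    close_portal("MQ0"),
    execute("MQ1"),
    close_portal("MQ1"),
    msg(b"S"),                                            # Sync
])
print(packet.hex())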
Is this the expected behavior, or have I misunderstood the protocol specs?
Thanks for your help, and please let me know if you need any more info.
Shay
Hello
Npgsql supports sending multiple SQL statements in a single packet via the extended protocol. This works fine, but when the second query SELECTs a value modified by the first's UPDATE, I'm getting a result as if the
UPDATE hasn't yet occurred.
Looks like the first updating statement is not committed, assuming that the two statements run in different transactions.
The exact messages sent by Npgsql are:
Parse (UPDATE data SET name='foo' WHERE id=1), statement=unnamed
Describe (statement=unnamed)
Bind (statement=unnamed, portal=MQ0)
Parse (SELECT * FROM data WHERE id=1), statement=unnamed
Describe (statement=unnamed)
Bind (statement=unnamed, portal=MQ1)
Execute (portal=MQ0)
Close (portal=MQ0)
Execute (portal=MQ1)
Close (portal=MQ1)
Sync
I've never used Npgsql, so I don't know if something is missing there. Would you need an explicit commit before closing MQ0?
Also, I'm not clear on what "statement=unnamed" means, but it is used twice. Is it possible that the UPDATE is overwritten by the SELECT before it executes?
Just some thoughts; as I said, I know nothing of Npgsql.
BTW: do you see the change after the UPDATE in your DB if you look at it with another tool (e.g. psql)?
Charles
Npgsql supports sending multiple SQL statements in a single packet via the
extended protocol. This works fine, but when the second query SELECTs a
value modified by the first's UPDATE, I'm getting a result as if the
UPDATE hasn't yet occurred.
Looks like the first updating statement is not committed, assuming that
the two statements run in different transactions.
I did try to prefix the message chain with an explicit transaction BEGIN
(with several different isolation levels), with no difference in behavior.
The exact messages sent by Npgsql are:
Parse (UPDATE data SET name='foo' WHERE id=1), statement=unnamed
Describe (statement=unnamed)
Bind (statement=unnamed, portal=MQ0)
Parse (SELECT * FROM data WHERE id=1), statement=unnamed
Describe (statement=unnamed)
Bind (statement=unnamed, portal=MQ1)
Execute (portal=MQ0)
Close (portal=MQ0)
Execute (portal=MQ1)
Close (portal=MQ1)
Sync
I've never used Npgsql, so I don't know if something is missing there.
Would you need an explicit commit before closing MQ0?
I guess this is exactly my question to PostgreSQL... But unless I'm
misunderstanding the transaction semantics I shouldn't need to commit the
first UPDATE in order to see its effect in the second SELECT...
Also, I'm not clear on what "statement=unnamed" means, but it is used
twice. Is it possible that the UPDATE is overwritten by the SELECT before it
executes?
statement=unnamed means that the destination statement is the unnamed
prepared statement (as described in
http://www.postgresql.org/docs/current/static/protocol-message-formats.html).
Right after the Parse, I bind the unnamed statement I just parsed to
portal MQ0. In other words, Npgsql first parses the two queries and binds
them to portals MQ0 and MQ1, and only then executes both portals.
BTW: do you see the change after the UPDATE in your DB if you look at it with
another tool (e.g. psql)?
That's a good suggestion, I'll try to check it out, thanks!
Try adding a sync before the second execute.
Andres
---
Please excuse brevity and formatting - I am writing this on my mobile phone.
Shay Rojansky <roji@roji.org> writes:
Npgsql supports sending multiple SQL statements in a single packet via the
extended protocol. This works fine, but when the second query SELECTs a
value modified by the first's UPDATE, I'm getting a result as if the UPDATE
hasn't yet occurred.
The exact messages sent by Npgsql are:
Parse (UPDATE data SET name='foo' WHERE id=1), statement=unnamed
Describe (statement=unnamed)
Bind (statement=unnamed, portal=MQ0)
Parse (SELECT * FROM data WHERE id=1), statement=unnamed
Describe (statement=unnamed)
Bind (statement=unnamed, portal=MQ1)
Execute (portal=MQ0)
Close (portal=MQ0)
Execute (portal=MQ1)
Close (portal=MQ1)
Sync
I'm fairly sure that the query snapshot is established at Bind time,
which means that this SELECT will run with a snapshot that indeed
does not see the effects of the UPDATE.
To my mind there is not a lot of value in performing Bind until you
are ready to do Execute. The only reason the operations are separated
in the protocol is so that you can do multiple Executes with a row limit
on each one, to retrieve a large query result in chunks.
regards, tom lane
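As a side note, the same Bind-time-snapshot effect should be reproducible at the SQL level with an explicit cursor, since a cursor is essentially a named portal. A minimal psycopg2 sketch (the DSN is a placeholder, the table is a throwaway temp table, the explicit transaction is there only because cursors require one, and the expected output rests on the documented default of insensitive cursors):

import psycopg2

conn = psycopg2.connect("dbname=test")   # placeholder DSN
conn.autocommit = True                   # issue BEGIN/ROLLBACK ourselves below
cur = conn.cursor()

cur.execute("CREATE TEMP TABLE data (id int PRIMARY KEY, name text)")
cur.execute("INSERT INTO data VALUES (1, 'bar')")

cur.execute("BEGIN")
# DECLARE creates a portal; the cursor's snapshot is taken here, before the UPDATE runs
cur.execute("DECLARE mq1 CURSOR FOR SELECT name FROM data WHERE id = 1")
cur.execute("UPDATE data SET name = 'foo' WHERE id = 1")
cur.execute("FETCH ALL FROM mq1")
print(cur.fetchall())                    # expected: [('bar',)] - the portal does not see 'foo'
cur.execute("ROLLBACK")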
Try adding a sync before the second execute.
I tried inserting a Sync right before the second Execute; this caused an
error with the message 'portal "MQ1" does not exist'.
This seems like problematic behavior on its own, regardless of my issues
here (Sync shouldn't be causing an implicit close of the portal, should
it?).
I'm fairly sure that the query snapshot is established at Bind time,
which means that this SELECT will run with a snapshot that indeed
does not see the effects of the UPDATE.
To my mind there is not a lot of value in performing Bind until you
are ready to do Execute. The only reason the operations are separated
in the protocol is so that you can do multiple Executes with a row limit
on each one, to retrieve a large query result in chunks.
So you would suggest changing my message chain to send Execute right after
Bind, right? This would yield the following messages:
P1/P2/D1/B1/E1/D2/B2/E2/S (rather than the current
P1/D1/B1/P2/D2/B2/E1/C1/E2/C2/S)
This would mean that I would switch to using named statements and the
unnamed portal, rather than the current unnamed statement
and named portals. If I recall correctly, I was under the impression that
there are some PostgreSQL performance benefits to using the
unnamed statement over named statements, although I admit I can't find any
documentation backing that. Can you confirm that the two
are equivalent performance-wise?
Shay
Shay Rojansky <roji@roji.org> writes:
To my mind there is not a lot of value in performing Bind until you
are ready to do Execute. The only reason the operations are separated
in the protocol is so that you can do multiple Executes with a row limit
on each one, to retrieve a large query result in chunks.
So you would suggest changing my message chain to send Execute right after
Bind, right? This would yield the following messages:
P1/P2/D1/B1/E1/D2/B2/E2/S (rather than the current
P1/D1/B1/P2/D2/B2/E1/C1/E2/C2/S)
This would mean that I would switch to using named statements and the
unnamed portal, rather than the current unnamed statement
and named portals. If I recall correctly, I was under the impression that
there are some PostgreSQL performance benefits to using the
unnamed statement over named statements, although I admit I can't find any
documentation backing that. Can you confirm that the two
are equivalent performance-wise?
Hmm. I do not recall exactly what performance optimizations apply to
those two cases; they're probably not "equivalent", though I do not think
the difference is major in either case. TBH I was a bit surprised on
reading your message to hear that the system would take that sequence at
all; it's not obvious that it should be allowed to replace a statement,
named or not, while there's an open portal that depends on it.
I think you might have more issues with lifespans, since portals go away
at commit whereas named statements don't.
regards, tom lane
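The lifespan difference can be seen at the SQL level as well: a prepared statement is session-scoped and survives COMMIT, while a cursor (a portal) created without WITH HOLD does not. A rough psycopg2 sketch along the same lines as above, with a placeholder DSN and made-up names (get_name, mq1):

import psycopg2

conn = psycopg2.connect("dbname=test")   # placeholder DSN
conn.autocommit = True                   # issue BEGIN/COMMIT ourselves below
cur = conn.cursor()

cur.execute("CREATE TEMP TABLE data (id int PRIMARY KEY, name text)")
cur.execute("INSERT INTO data VALUES (1, 'foo')")

cur.execute("BEGIN")
cur.execute("PREPARE get_name AS SELECT name FROM data WHERE id = $1")
cur.execute("DECLARE mq1 CURSOR FOR SELECT name FROM data WHERE id = 1")
cur.execute("COMMIT")                    # the transaction (and its portals) end here

cur.execute("EXECUTE get_name(1)")       # still works: prepared statements are session-scoped
print(cur.fetchall())                    # [('foo',)]

try:
    cur.execute("FETCH ALL FROM mq1")    # fails: the portal went away at COMMIT
except psycopg2.Error as e:
    print(e)                             # cursor "mq1" does not exist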
Shay Rojansky <roji@roji.org> writes:
Try adding a sync before the second execute.
I tried inserting a Sync right before the second Execute; this caused an
error with the message 'portal "MQ1" does not exist'.
This seems like problematic behavior on its own, regardless of my issues
here (Sync shouldn't be causing an implicit close of the portal, should
it?).
Sync results in closing the transaction, if you've not explicitly executed
a BEGIN.
regards, tom lane
Thanks for the help Tom and the others, I'll modify my sequence and report
if I encounter any further issues.
One more important piece of information...
The reason Npgsql currently sends P1/D1/B1/P2/D2/B2/E1/C1/E2/C2/S is to
avoid deadlocks; I've already discussed this with you in
/messages/by-id/CADT4RqB+fbtQpTE5YLZ0hKb-2K-nGZHM2YbVj0TMC8rQBGfUxA@mail.gmail.com.
Unfortunately, the alternative I proposed above, P1/P2/D1/B1/E1/D2/B2/E2/S,
suffers from the same issue: any sequence in which a Bind is sent after a
previous Execute is deadlock-prone - Execute causes PostgreSQL to start
writing a potentially large dataset, while Bind means the client may be
writing a potentially large parameter value.
In other words, unless I'm mistaken, it seems there's no alternative but to
implement non-blocking I/O on the client side - write until writing would
block, and switch to reading when that happens. This adds substantial
complexity, especially with .NET's SSL/TLS implementation layer.
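To illustrate, here is a rough sketch of the write-until-blocked-then-read loop I have in mind - in Python rather than .NET, with connection startup and TLS omitted; the pump helper and its signature are made up for the example, and sock is assumed to be an already-established connection to the backend:

import selectors
import socket

def pump(sock: socket.socket, outgoing: bytes) -> bytes:
    """Send the whole message chain, draining the server whenever our
    send buffer fills up, so neither side blocks on a full buffer."""
    sock.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ | selectors.EVENT_WRITE)
    received = b""
    while outgoing:
        for _, events in sel.select():
            if events & selectors.EVENT_WRITE and outgoing:
                try:
                    sent = sock.send(outgoing)
                    outgoing = outgoing[sent:]
                except BlockingIOError:
                    pass                            # send buffer full; read instead
            if events & selectors.EVENT_READ:
                chunk = sock.recv(65536)            # drain whatever the server produced
                if not chunk:
                    raise ConnectionError("server closed the connection")
                received += chunk
    sel.unregister(sock)
    return received  # caller keeps reading normally until ReadyForQuery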
Or does anyone see some sort of alternative which I've missed?