Fetch zero result rows when executing a query?

Started by Shay Rojanskyalmost 11 years ago22 messages
#1Shay Rojansky
roji@roji.org

Sorry if this has been asked before, couldn't find any mention...

I'm working on the Npgsql, the .NET driver for PostgreSQL, and am trying to
find a way to execute a query but without fetching any rows. The Execute
message has a maximum result-row count, but zero is documented to mean
"fetch all rows".

The use case would be sending a query which might modify or might not (e.g.
UPDATE), but we know that the user is uninterested in any result row.

My current workaround is to specify maxrows=1, was wondering if I missed a
better alternative.

Thanks,

Shay

#2Andres Freund
andres@2ndquadrant.com
In reply to: Shay Rojansky (#1)
Re: Fetch zero result rows when executing a query?

Hi,

On 2015-02-03 12:26:33 +0100, Shay Rojansky wrote:

Sorry if this has been asked before, couldn't find any mention...

I'm working on the Npgsql, the .NET driver for PostgreSQL, and am trying to
find a way to execute a query but without fetching any rows. The Execute
message has a maximum result-row count, but zero is documented to mean
"fetch all rows".

The use case would be sending a query which might modify or might not (e.g.
UPDATE), but we know that the user is uninterested in any result row.

My current workaround is to specify maxrows=1, was wondering if I missed a
better alternative.

Is this really a relevant optimization? If the user doesn't want
results, RETURNING shouldn't be specified... Sure, sometimes the same
query will be reused over cases where you want the results and those
where you don't, but I doubt this is worthy of optimization.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Shay Rojansky (#1)
Re: Fetch zero result rows when executing a query?

On 2/3/15 5:26 AM, Shay Rojansky wrote:

Sorry if this has been asked before, couldn't find any mention...

I'm working on the Npgsql, the .NET driver for PostgreSQL, and am trying
to find a way to execute a query but without fetching any rows. The
Execute message has a maximum result-row count, but zero is documented
to mean "fetch all rows".

The use case would be sending a query which might modify or might not
(e.g. UPDATE), but we know that the user is uninterested in any result row.

My current workaround is to specify maxrows=1, was wondering if I missed
a better alternative.

You might be able to add something like WHERE FALSE to the RETURNING
clause, but I agree with Andres; this seems like premature optimization.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Shay Rojansky (#1)
Re: Fetch zero result rows when executing a query?

Shay Rojansky wrote:

I'm working on the Npgsql, the .NET driver for PostgreSQL, and am trying to
find a way to execute a query but without fetching any rows. The Execute
message has a maximum result-row count, but zero is documented to mean
"fetch all rows".

The use case would be sending a query which might modify or might not (e.g.
UPDATE), but we know that the user is uninterested in any result row.

My current workaround is to specify maxrows=1, was wondering if I missed a
better alternative.

If you know beforehand the query generates at most one row, then
specifying maxrows=0 is best.
If you know beforehand the query might generate more than one row (SELECT)
yet you also know that you are not interested in those, then maxrows=1
is best; then again, modifying the query to include a LIMIT 1 is even
better, in which case maxrows can be zero again.
maxrows=1 for UPDATE/DELETE/INSERT does not make sense, use maxrows=0.
--
Stephen.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Andres Freund (#2)
Re: Fetch zero result rows when executing a query?

Andres Freund wrote:

On 2015-02-03 12:26:33 +0100, Shay Rojansky wrote:

find a way to execute a query but without fetching any rows. The Execute
message has a maximum result-row count, but zero is documented to mean
"fetch all rows".

Is this really a relevant optimization? If the user doesn't want

I believe he's talking about the network protocol of postgreSQL, not
about query optimisation (as you do).
--
Stephen.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Andres Freund
andres@2ndquadrant.com
In reply to: Stephen R. van den Berg (#5)
Re: Fetch zero result rows when executing a query?

On 2015-02-04 12:17:23 +0100, Stephen R. van den Berg wrote:

Andres Freund wrote:

On 2015-02-03 12:26:33 +0100, Shay Rojansky wrote:

find a way to execute a query but without fetching any rows. The Execute
message has a maximum result-row count, but zero is documented to mean
"fetch all rows".

Is this really a relevant optimization? If the user doesn't want

I believe he's talking about the network protocol of postgreSQL, not
about query optimisation (as you do).

I actually am not. There's no rows returned by a UPDATE without a
RETURNING. So the whole question doesn't really make much sense without
RETURNING... And I think it's premature optimization to ignore
RETURNING's results in the driver, even if the user doesn't ask for
them.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Marko Tiikkaja
marko@joh.to
In reply to: Stephen R. van den Berg (#4)
Re: Fetch zero result rows when executing a query?

On 2/4/15 12:13 PM, Stephen R. van den Berg wrote:

If you know beforehand the query might generate more than one row (SELECT)
yet you also know that you are not interested in those, then maxrows=1
is best; then again, modifying the query to include a LIMIT 1 is even
better, in which case maxrows can be zero again.

This seems to be a common pattern, and I think it's a *huge* mistake to
specify maxrows=1 and/or ignore rows after the first one in the driver
layer. If the user says "give me the only row returned by this query",
the interface should check that only one row is in reality returned by
the query. If the query returns more than one row, the user made a
mistake in formulating the query and she probably wants to know about
it. If she genuinely doesn't care about the rows after the first one,
she can always specify LIMIT 1.

For a sad example, look at PL/PgSQL's SELECT .. INTO ..; it's not
terribly difficult to write a query which returns more than one row *by
mistake* and have something really bad happen later on since it went
undetected during testing because you just happened to get the expected
row back first. And when you do want to specifically enforce it for
e.g. security critical code, you have to resort to really ugly hacks
like window functions.

.m

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Marko Tiikkaja
marko@joh.to
In reply to: Stephen R. van den Berg (#5)
Re: Fetch zero result rows when executing a query?

On 2/4/15 12:17 PM, Stephen R. van den Berg wrote:

Andres Freund wrote:

On 2015-02-03 12:26:33 +0100, Shay Rojansky wrote:

find a way to execute a query but without fetching any rows. The Execute
message has a maximum result-row count, but zero is documented to mean
"fetch all rows".

Is this really a relevant optimization? If the user doesn't want

I believe he's talking about the network protocol of postgreSQL, not
about query optimisation (as you do).

I don't believe so. If this is not about optimization, why can't the
driver just ignore the rows from the server?

.m

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Andres Freund
andres@2ndquadrant.com
In reply to: Marko Tiikkaja (#7)
Re: Fetch zero result rows when executing a query?

On 2015-02-04 12:23:51 +0100, Marko Tiikkaja wrote:

On 2/4/15 12:13 PM, Stephen R. van den Berg wrote:

If you know beforehand the query might generate more than one row (SELECT)
yet you also know that you are not interested in those, then maxrows=1
is best; then again, modifying the query to include a LIMIT 1 is even
better, in which case maxrows can be zero again.

This seems to be a common pattern, and I think it's a *huge* mistake to
specify maxrows=1 and/or ignore rows after the first one in the driver
layer. If the user says "give me the only row returned by this query", the
interface should check that only one row is in reality returned by the
query

I don't think these are what this thread is about. It's about a UPDATE
(=> no LIMIT) where the user uses a driver interface that doesn't return
rows generated by the UPDATE (the above error check doesn't make sense).

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Andres Freund
andres@2ndquadrant.com
In reply to: Marko Tiikkaja (#8)
Re: Fetch zero result rows when executing a query?

On 2015-02-04 12:25:04 +0100, Marko Tiikkaja wrote:

On 2/4/15 12:17 PM, Stephen R. van den Berg wrote:

Andres Freund wrote:

On 2015-02-03 12:26:33 +0100, Shay Rojansky wrote:

find a way to execute a query but without fetching any rows. The Execute
message has a maximum result-row count, but zero is documented to mean
"fetch all rows".

Is this really a relevant optimization? If the user doesn't want

I believe he's talking about the network protocol of postgreSQL, not
about query optimisation (as you do).

I don't believe so.

Did you read the original post?

I'm working on the Npgsql, the .NET driver for PostgreSQL, and am trying to
find a way to execute a query but without fetching any rows. The Execute
message has a maximum result-row count, but zero is documented to mean
"fetch all rows".

That's pretty unambiguously about a driver and the network protocol.

If this is not about optimization, why can't the driver just ignore
the rows from the server?

IIUC Shay wants to avoid the network traffic incurred by that. I think
that's premature, but it's certainly not completely bogus.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Marko Tiikkaja
marko@joh.to
In reply to: Andres Freund (#10)
Re: Fetch zero result rows when executing a query?

On 2/4/15 12:31 PM, Andres Freund wrote:

On 2015-02-04 12:25:04 +0100, Marko Tiikkaja wrote:

On 2/4/15 12:17 PM, Stephen R. van den Berg wrote:

I believe he's talking about the network protocol of postgreSQL, not
about query optimisation (as you do).

I don't believe so.

Did you read the original post?

Yes, I did. Now I realize that my quoted part there was very poorly worded.

What I meant is that "I don't believe [you] were talking about query
optimization". I hope the rest of my message makes more sense to you in
that light.

.m

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Marko Tiikkaja
marko@joh.to
In reply to: Andres Freund (#9)
Re: Fetch zero result rows when executing a query?

On 2/4/15 12:27 PM, Andres Freund wrote:

On 2015-02-04 12:23:51 +0100, Marko Tiikkaja wrote:

On 2/4/15 12:13 PM, Stephen R. van den Berg wrote:

If you know beforehand the query might generate more than one row (SELECT)
yet you also know that you are not interested in those, then maxrows=1
is best; then again, modifying the query to include a LIMIT 1 is even
better, in which case maxrows can be zero again.

This seems to be a common pattern, and I think it's a *huge* mistake to
specify maxrows=1 and/or ignore rows after the first one in the driver
layer. If the user says "give me the only row returned by this query", the
interface should check that only one row is in reality returned by the
query

I don't think these are what this thread is about. It's about a UPDATE
(=> no LIMIT) where the user uses a driver interface that doesn't return
rows generated by the UPDATE (the above error check doesn't make sense).

No, this wasn't what OP was on about. But I was merely responding to
the quoted paragraph, which suggested that maxrows=1 would be something
to consider for SELECT. Which I really strongly believe is not, and I'm
hoping we can eliminate it from all interfaces by 2025.

So slightly off-topic, for which I apologize.

.m

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Marko Tiikkaja (#7)
Re: Fetch zero result rows when executing a query?

Marko Tiikkaja wrote:

On 2/4/15 12:13 PM, Stephen R. van den Berg wrote:

If you know beforehand the query might generate more than one row (SELECT)
yet you also know that you are not interested in those, then maxrows=1
is best; then again, modifying the query to include a LIMIT 1 is even
better, in which case maxrows can be zero again.

This seems to be a common pattern, and I think it's a *huge* mistake
to specify maxrows=1 and/or ignore rows after the first one in the
driver layer. If the user says "give me the only row returned by

I guess it depends on the amount of backpressure you can excert on
the sql "programmer".
If you are writing a driver which just has to run applications
written by third parties in the most efficient way, the outline
I gave above is best.
If the driver has a way to communicate with the one writing
the SQL, then giving warnings is better.

At the driver level, you generally have to assume that communicating
with the programmer is not possible anymore unless the driver
API allows for appropriate backpressure (and if this
results in numerous warnings, they still might go largely
unnoticed on production sites).
--
Stephen.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Marko Tiikkaja
marko@joh.to
In reply to: Stephen R. van den Berg (#13)
Re: Fetch zero result rows when executing a query?

On 2/4/15 12:36 PM, Stephen R. van den Berg wrote:

Marko Tiikkaja wrote:

This seems to be a common pattern, and I think it's a *huge* mistake
to specify maxrows=1 and/or ignore rows after the first one in the
driver layer. If the user says "give me the only row returned by

I guess it depends on the amount of backpressure you can excert on
the sql "programmer".
If you are writing a driver which just has to run applications
written by third parties in the most efficient way, the outline
I gave above is best.
If the driver has a way to communicate with the one writing
the SQL, then giving warnings is better.

Screw warnings, that should be an error. "I tried to execute your
query, but something went wrong."

Anyway, this is getting really off topic now, and that's my fault, so I
should shut up.

.m

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15David G Johnston
david.g.johnston@gmail.com
In reply to: Shay Rojansky (#1)
Re: Fetch zero result rows when executing a query?

Shay Rojansky wrote

The use case would be sending a query which might modify or might not
(e.g.
UPDATE), but we know that the user is uninterested in any result row.

How do you intend to gain this knowledge if the query doesn't structure
itself so that it does or does not return actual rows? Can you give a real
concrete example for people to get their heads around?

David J.

--
View this message in context: http://postgresql.nabble.com/Fetch-zero-result-rows-when-executing-a-query-tp5836537p5836740.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Shay Rojansky
roji@roji.org
In reply to: Marko Tiikkaja (#14)
Re: Fetch zero result rows when executing a query?

Sorry everyone, I was unexpectedly very busy and couldn't respond
earlier... My apologies.

I'll clarify just a little... I am indeed talking about the PostgreSQL
network protocol, and not about query optimization (with LIMIT or omitting
RETURNING etc.). I am implementing ADO.NET's ExecuteNonQuery, through which
the user indicates they're not interested in any result rows whether those
exist or not. For the situation where a user does ExecuteNonQuery but the
query returns result rows, the driver can save the needless network
transfers. We can definitely say it's the user's fault for providing a
query with a resultset to ExecuteNonQuery, but we *do* have the user's
clear intention that no rows be fetched so why not act on it. I agree this
isn't a terribly important optimization, the trigger for this question was
first and foremost curiosity: it seems strange the protocol allows you to
specify max_rows for any value other than 0.

Here's a possible believable use-case which doesn't involve user neglect:
imagine some server-side function which has side-effects and also returns
some rows. In some situations the user is interested in the result rows,
but in others they only want the side-effect. The user would probably have
no control over the function, and their only way to *not* transfer the
result rows would be with a mechanism such as max_rows.

Marko, regarding your general criticism of max_rows:

This seems to be a common pattern, and I think it's a *huge* mistake to

specify maxrows=1 and/or ignore rows after the first one in

the driver layer. If the user says "give me the only row returned by

this query", the interface should check that only one row is in

reality returned by the query. If the query returns more than one row,

the user made a mistake in formulating the query and she

probably wants to know about it. If she genuinely doesn't care about the

rows after the first one, she can always specify LIMIT 1.

For a sad example, look at PL/PgSQL's SELECT .. INTO ..; it's not

terribly difficult to write a query which returns more than one

row *by mistake* and have something really bad happen later on since it

went undetected during testing because you just

happened to get the expected row back first. And when you do want to

specifically enforce it for e.g. security critical code,

you have to resort to really ugly hacks like window functions.

There are some problems with what you say... First, the ADO.NET API
provides a SingleRow API option which explicitly provides exactly this.
This API option doesn't at all mean that there *should* be only one row
(i.e. an error should be raised if otherwise), but simply that any other
rows beyond the first should be discarded. So regardless of what we think
best practices are on this, this behavior is mandated/specified by a major
API.

More to the point, doesn't max_rows=1 have exactly the same dangers as
LIMIT 1? The two seem to be identical, except that one is expressed in the
SQL query and the other at the network protocol level. The way I see it, if
the user specifies one of them without specifying ORDER BY, they are
explicitly saying they don't care which row comes out. And if their testing
code fails because this is wrong, then they've made a mistake - IMHo this
isn't a reason to kill the entire feature.

In general, in my view it's beneficial to separate between the SQL queries
and the features that the driver is supposed to provide in its API. The SQL
may be written or managed by one entity, reused in many places (some of
which want all rows and others which want only 1).

#17David G Johnston
david.g.johnston@gmail.com
In reply to: Shay Rojansky (#16)
Re: Fetch zero result rows when executing a query?

On Sat, Feb 7, 2015 at 10:41 AM, Shay Rojansky [via PostgreSQL] <
ml-node+s1045698n5837082h66@n5.nabble.com> wrote:

Sorry everyone, I was unexpectedly very busy and couldn't respond
earlier... My apologies.

I'll clarify just a little... I am indeed talking about the PostgreSQL
network protocol, and not about query optimization (with LIMIT or omitting
RETURNING etc.). I am implementing ADO.NET's ExecuteNonQuery
​ ​
through which the user indicates they're not interested in any result rows
whether those exist or not.

​​ExecuteNonQuery returns an integer while row-returning queries ​do not.
I'd argue that the API states that the user is declaring that the query
they are executing does not return any actual rows - just a count of
affected rows - not that they do not care to see what rows are returned.

For the situation where a user does ExecuteNonQuery but the query returns
result rows, the driver can save the needless network transfers. We can
definitely say it's the user's fault for providing a query with a resultset
to ExecuteNonQuery, but we *do* have the user's clear intention that no
rows be fetched so why not act on it. I agree this isn't a terribly
important optimization, the trigger for this question was first and
foremost curiosity: it seems strange the protocol allows you to specify
max_rows for any value other than 0.

​Yes, it does seem strange and, like Marko said, ideally would be
deprecated. The fact that it cannot handle "zero rows" seems like an
unnecessary limitation and I cannot image that any values other than 0 and
all would be of practical usage. In the case of zero returning instead the
number of rows would be more useful than simply refusing to return anything
so even if something like this is needed the current implementation is
flawed.

Here's a possible believable use-case which doesn't involve user neglect:
imagine some server-side function which has side-effects and also returns
some rows. In some situations the user is interested in the result rows,
but in others they only want the side-effect. The user would probably have
no control over the function, and their only way to *not* transfer the
result rows would be with a mechanism such as max_rows.

​Functions always return rows and so should not be executed using
"ExecuteNonQuery". In most cases action-oriented functions return a single
result-status row so ignoring that row, while likely not advisable, is not
exactly expensive.​

Marko, regarding your general criticism of max_rows:

This seems to be a common pattern, and I think it's a *huge* mistake to

specify maxrows=1 and/or ignore rows after the first one in

the driver layer. If the user says "give me the only row returned by

this query", the interface should check that only one row is in

reality returned by the query. If the query returns more than one row,

the user made a mistake in formulating the query and she

probably wants to know about it. If she genuinely doesn't care about

the rows after the first one, she can always specify LIMIT 1.

For a sad example, look at PL/PgSQL's SELECT .. INTO ..; it's not

terribly difficult to write a query which returns more than one

row *by mistake* and have something really bad happen later on since it

went undetected during testing because you just

happened to get the expected row back first. And when you do want to

specifically enforce it for e.g. security critical code,

you have to resort to really ugly hacks like window functions.

There are some problems with what you say... First, the ADO.NET API
provides a SingleRow API option which explicitly provides exactly this.
This API option doesn't at all mean that there *should* be only one row
(i.e. an error should be raised if otherwise), but simply that any other
rows beyond the first should be discarded. So regardless of what we think
best practices are on this, this behavior is mandated/specified by a major
API.

​"​The resultset may contain multiple rows, which are ignored by
ExecuteScalar"; so ignore them.

The basic question here becomes - the executor already must generate, in
memory, all of the rows so is there a way to properly interact with the
server where you can request the number of rows that were generated but not
be obligated to actually pull them down to the client. This doesn't seem
like an unreasonable request but assuming that it is not currently possible
(of which I have little clue) then the question becomes who cares enough to
design and implement such a protocol enhancement.

More to the point, doesn't max_rows=1 have exactly the same dangers as
LIMIT 1? The two seem to be identical, except that one is expressed in the
SQL query and the other at the network protocol level.

​The planner does not have access to network protocol level​ options while
it does know about LIMIT.

The way I see it, if the user specifies one of them without specifying
ORDER BY, they are explicitly saying they don't care which row comes out.
And if their testing code fails because this is wrong, then they've made a
mistake - IMHo this isn't a reason to kill the entire feature.

In general, in my view it's beneficial to separate between the SQL queries
and the features that the driver is supposed to provide in its API. The SQL
may be written or managed by one entity, reused in many places (some of
which want all rows and others which want only 1).

​Then the driver writers that need these special API behaviors are
reasonably expected to contribute to adding them to backend products that
do not already have them. The database developers are not going to take on
responsibility for the API decisions of others; and features deemed (or
that in reality are) of marginal usefulness are likely to be omitted -
intentionally or otherwise - from the official (in this case libpq)
protocol.

Expecting users to use an API without knowledge or control of the SQL that
is being executed seems like a stretch to me. Expecting the driver to
simply provide an easy way to access data from the common SQL idioms a user
might use seems like a reasonable goal and puts leaves the smarts in the
purvue of the planner. The optimization you require doesn't seem
unreasonable but also doesn't seem especially compelling - nor matter how
many people might be using ADO.NET (which provides no indication that they
are trying to use APIs that are incompatible with the queries that they are
sending.

David J.

--
View this message in context: http://postgresql.nabble.com/Fetch-zero-result-rows-when-executing-a-query-tp5836537p5837084.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#18Shay Rojansky
roji@roji.org
In reply to: David G Johnston (#17)
Re: Fetch zero result rows when executing a query?

First a general comment:

Then the driver writers that need these special API behaviors are
reasonably expected to contribute to adding them to backend products that
do not already have them. The database developers are not going to take

on

responsibility for the API decisions of others; and features deemed (or
that in reality are) of marginal usefulness are likely to be omitted -
intentionally or otherwise - from the official (in this case libpq)
protocol.

I absolutely agree with you there, I'm not trying to get anybody to
implement
something I need (i.e. fetch 0 rows). This is more of a general discussion
as
to whether that feature *makes sense* to you as a protocol feature (which
doesn't
seem to be the case, as some of you guys want to deprecate the whole
max_rows
thing).

I'll clarify just a little... I am indeed talking about the PostgreSQL
network protocol, and not about query optimization (with LIMIT or

omitting

RETURNING etc.). I am implementing ADO.NET's ExecuteNonQuery
through which the user indicates they're not interested in any result

rows

whether those exist or not.

ExecuteNonQuery returns an integer while row-returning queries do not.
I'd argue that the API states that the user is declaring that the query
they are executing does not return any actual rows - just a count of
affected rows - not that they do not care to see what rows are returned.

That's true. IMHO the count of affected rows isn't relevant to this
discussion
so I didn't mention it.

For the situation where a user does ExecuteNonQuery but the query returns
result rows, the driver can save the needless network transfers. We can
definitely say it's the user's fault for providing a query with a

resultset

to ExecuteNonQuery, but we *do* have the user's clear intention that no
rows be fetched so why not act on it. I agree this isn't a terribly
important optimization, the trigger for this question was first and
foremost curiosity: it seems strange the protocol allows you to specify
max_rows for any value other than 0.

Yes, it does seem strange and, like Marko said, ideally would be
deprecated. The fact that it cannot handle "zero rows" seems like an
unnecessary limitation and I cannot image that any values other than 0 and
all would be of practical usage. In the case of zero returning instead

the

number of rows would be more useful than simply refusing to return

anything

so even if something like this is needed the current implementation is
flawed.

Just to be precise: what is strange to me is that the max_rows feature
exists
but has no 0 value. You and Marko are arguing that the whole feature should
be
deprecated (i.e. always return all rows).

Here's a possible believable use-case which doesn't involve user neglect:
imagine some server-side function which has side-effects and also returns
some rows. In some situations the user is interested in the result rows,
but in others they only want the side-effect. The user would probably

have

no control over the function, and their only way to *not* transfer the
result rows would be with a mechanism such as max_rows.

Functions always return rows and so should not be executed using
"ExecuteNonQuery". In most cases action-oriented functions return a

single

result-status row so ignoring that row, while likely not advisable, is not
exactly expensive.

Your description of functions doesn't hold for all functions, this is why I
tried to provide a usecase. It is possible for some function to both have a
side-effect (i.e. modify some table) *and* return a large number of rows. It
may be legitimate for a user to want to have the side-effect but not care
about the rows. Ignoring one row isn't expensive, ignoring many could be.

The basic question here becomes - the executor already must generate, in
memory, all of the rows so is there a way to properly interact with the
server where you can request the number of rows that were generated but

not

be obligated to actually pull them down to the client. This doesn't seem
like an unreasonable request but assuming that it is not currently

possible

(of which I have little clue) then the question becomes who cares enough

to

design and implement such a protocol enhancement.

OK.

More to the point, doesn't max_rows=1 have exactly the same dangers as
LIMIT 1? The two seem to be identical, except that one is expressed in

the

SQL query and the other at the network protocol level.

The planner does not have access to network protocol level? options while
it does know about LIMIT.

That's an internal PostgreSQL matter (which granted, may impact efficiency).
My comment about max_rows being equivalent to LIMIT was meant to address
Marko's
argument that max_rows is dangerous because any row might come out and tests
may pass accidentally (but that holds for LIMIT 1 as well, doesn't it).

Expecting users to use an API without knowledge or control of the SQL that
is being executed seems like a stretch to me. Expecting the driver to
simply provide an easy way to access data from the common SQL idioms a

user

might use seems like a reasonable goal and puts leaves the smarts in the
purvue of the planner. The optimization you require doesn't seem
unreasonable but also doesn't seem especially compelling - nor matter how
many people might be using ADO.NET (which provides no indication that they
are trying to use APIs that are incompatible with the queries that they

are

sending.

Fair enough, I don't disagree with the above. The idea is less "no
knowledge or
control of the SQL", but rather a complicated prepared statement that is
executed
in some places to fetch all rows, and in others to fetch only 1 (maybe this
is the
most compelling usecase).

#19Marko Tiikkaja
marko@joh.to
In reply to: Shay Rojansky (#18)
Re: Fetch zero result rows when executing a query?

On 2015-02-08 09:56, Shay Rojansky wrote:

More to the point, doesn't max_rows=1 have exactly the same dangers as
LIMIT 1? The two seem to be identical, except that one is expressed in

the

SQL query and the other at the network protocol level.

The planner does not have access to network protocol level? options while
it does know about LIMIT.

That's an internal PostgreSQL matter (which granted, may impact efficiency).
My comment about max_rows being equivalent to LIMIT was meant to address
Marko's
argument that max_rows is dangerous because any row might come out and tests
may pass accidentally (but that holds for LIMIT 1 as well, doesn't it).

The point is that then the user gets to choose the behavior. LIMIT 1
without ORDER BY is very explicitly telling the reader of the code
"there might be more than one row returned by this query, but I'm okay
with getting only one of them, whichever it is". And when the LIMIT 1
is *not* there, you get the driver automatically checking your queries
for sanity. If the driver always throws away the rows after the first
one, it's difficult to go to behavior of enforcing that no more than one
row was returned.

Anyway, like you said somewhere upthread, the interface the driver
you're working on promises to implement right now can't be changed due
to backwards compatibility concerns. But I see new interfaces being
created all the time, and they all make this same mistake.

.m

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Robert Haas
robertmhaas@gmail.com
In reply to: Shay Rojansky (#18)
Re: Fetch zero result rows when executing a query?

On Sun, Feb 8, 2015 at 3:56 AM, Shay Rojansky <roji@roji.org> wrote:

Just to be precise: what is strange to me is that the max_rows feature
exists
but has no 0 value. You and Marko are arguing that the whole feature should
be
deprecated (i.e. always return all rows).

I think the fact that it has no zero value is probably just a
historical accident; most likely, whoever designed it originally
(probably twenty years ago) didn't think about queries with
side-effects and therefore didn't consider that wanting 0 rows would
ever be sensible. Meanwhile, a sentinel value was needed to request
all rows, so they used 0. If they'd thought of it, they might have
picked -1 and we'd not be having this discussion.

FWIW, I'm in complete agreement that it would be good if we had this
feature. I believe this is not the first report we've had of
PostgreSQL doing things in ways that mesh nicely with standardized
driver interfaces. Whether we think those interfaces are
well-designed or not, they are standardized. When people use $OTHERDB
and have a really great driver, and then they move to PostgreSQL and
get one with more warts, it does not encourage them to stick with
PostgreSQL.

.NET is not some fringe user community that we can dismiss as
irrelevant. We need users of all languages to want to use PostgreSQL,
not just users of languages any one of us happens to personally like.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Shay Rojansky
roji@roji.org
In reply to: Robert Haas (#20)
Re: Fetch zero result rows when executing a query?

Thanks for understanding Robert, that's more or less what I had in mind, I
was mainly wondering if I were missing some deeper explanation for the
absence of the possibility of requesting 0 rows.

Regardless of all of the above, it's really no big deal. I'll go ahead and
use max_rows=1 for now, hopefully you guys don't decide to deprecate it.

Shay

On Tue, Feb 10, 2015 at 3:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Show quoted text

On Sun, Feb 8, 2015 at 3:56 AM, Shay Rojansky <roji@roji.org> wrote:

Just to be precise: what is strange to me is that the max_rows feature
exists
but has no 0 value. You and Marko are arguing that the whole feature

should

be
deprecated (i.e. always return all rows).

I think the fact that it has no zero value is probably just a
historical accident; most likely, whoever designed it originally
(probably twenty years ago) didn't think about queries with
side-effects and therefore didn't consider that wanting 0 rows would
ever be sensible. Meanwhile, a sentinel value was needed to request
all rows, so they used 0. If they'd thought of it, they might have
picked -1 and we'd not be having this discussion.

FWIW, I'm in complete agreement that it would be good if we had this
feature. I believe this is not the first report we've had of
PostgreSQL doing things in ways that mesh nicely with standardized
driver interfaces. Whether we think those interfaces are
well-designed or not, they are standardized. When people use $OTHERDB
and have a really great driver, and then they move to PostgreSQL and
get one with more warts, it does not encourage them to stick with
PostgreSQL.

.NET is not some fringe user community that we can dismiss as
irrelevant. We need users of all languages to want to use PostgreSQL,
not just users of languages any one of us happens to personally like.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#22Shay Rojansky
roji@roji.org
In reply to: Shay Rojansky (#21)
Re: Fetch zero result rows when executing a query?

Sorry to revive this thread, I just had one additional thought...

To those advocating the deprecation of the max_rows parameter of Execute,
there's another argument to consider. max_rows isn't just there in order to
fetch, say, a single row of the result set and discard the rest (which is
what I originally asked about). There's also the function of retrieving the
resultset in chunks: getting 5 rows, then 10, etc. etc. Deprecating
max_rows would leave the user/driver only with the option of retrieving the
entire resultset in one go (for example, no option for the interleaving of
rows from several resultsets). And the lack of the ability to execute and
retrieve 0 rows hurts this scenario as well.

Just wanted to put it out there as another argument against deprecation.

On Wed, Feb 11, 2015 at 2:05 AM, Shay Rojansky <roji@roji.org> wrote:

Show quoted text

Thanks for understanding Robert, that's more or less what I had in mind, I
was mainly wondering if I were missing some deeper explanation for the
absence of the possibility of requesting 0 rows.

Regardless of all of the above, it's really no big deal. I'll go ahead and
use max_rows=1 for now, hopefully you guys don't decide to deprecate it.

Shay

On Tue, Feb 10, 2015 at 3:00 PM, Robert Haas <robertmhaas@gmail.com>
wrote:

On Sun, Feb 8, 2015 at 3:56 AM, Shay Rojansky <roji@roji.org> wrote:

Just to be precise: what is strange to me is that the max_rows feature
exists
but has no 0 value. You and Marko are arguing that the whole feature

should

be
deprecated (i.e. always return all rows).

I think the fact that it has no zero value is probably just a
historical accident; most likely, whoever designed it originally
(probably twenty years ago) didn't think about queries with
side-effects and therefore didn't consider that wanting 0 rows would
ever be sensible. Meanwhile, a sentinel value was needed to request
all rows, so they used 0. If they'd thought of it, they might have
picked -1 and we'd not be having this discussion.

FWIW, I'm in complete agreement that it would be good if we had this
feature. I believe this is not the first report we've had of
PostgreSQL doing things in ways that mesh nicely with standardized
driver interfaces. Whether we think those interfaces are
well-designed or not, they are standardized. When people use $OTHERDB
and have a really great driver, and then they move to PostgreSQL and
get one with more warts, it does not encourage them to stick with
PostgreSQL.

.NET is not some fringe user community that we can dismiss as
irrelevant. We need users of all languages to want to use PostgreSQL,
not just users of languages any one of us happens to personally like.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company