Is the PL/pgSQL refcursor useful in a modern three-tier app?

Started by Bryn Llewellynabout 3 years ago26 messagesgeneral

bryn@yugabyte.com

about 3 years ago

Section "43.7. Cursors” in the PL/pgSQL chapter of the doc (www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-DECLARATIONS) starts with this:

«
Rather than executing a whole query at once, it is possible to set up a cursor that encapsulates the query, and then read the query result a few rows at a time. One reason for doing this is to avoid memory overrun when the result contains a large number of rows. (However, PL/pgSQL users do not normally need to worry about that, since FOR loops automatically use a cursor internally to avoid memory problems.) A more interesting usage is to return a reference to a cursor that a function has created, allowing the caller to read the rows. This provides an efficient way to return large row sets from functions.
»

On its face, it seems to make sense. And I’ve written a few proof-of-concept tests. For example, I wrote a “security definer” function that's owned by a role that can select from the relevant table(s) that returns refcursor. And I called it from a subprogram that's owned by a role that cannot select from the relevant table(s) to loop through the rows. But I can't convince myself that this division of labor is useful. And especially I can't convince myself that the "pipeling" capability is relevant in a three-tier app with a stateless browser UI. Here, the paradigm has the client-side app checking out a connection from the pool, generating the entire response to the end-user's request, releasing the connection, and sending the response back to the browser. This paradigm isn't consistent with allowing the end user to navigate forwards and backwards in a scrollable cursor that is somehow held in its open state in in the sever by the the middle tier client on behalf of a browser session that comes back time and again to its dedicated middle tier client and thence yo its dedicated database server session. (Anyway, without anything like Oracle PL/SQL's packages, you have no mechanism to hold the opened cursor variable between successive server calls.)

Is it fair to say that the PL/pgSQL refcursor is useful, at best, only in very special use-cases?

adrian.klaver@aklaver.com

about 3 years ago

In reply to: Bryn Llewellyn (#1)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

On 3/14/23 17:50, Bryn Llewellyn wrote:

Section "43.7. Cursors” in the PL/pgSQL chapter of the doc (www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-DECLARATIONS) starts with this:

«
Rather than executing a whole query at once, it is possible to set up a cursor that encapsulates the query, and then read the query result a few rows at a time. One reason for doing this is to avoid memory overrun when the result contains a large number of rows. (However, PL/pgSQL users do not normally need to worry about that, since FOR loops automatically use a cursor internally to avoid memory problems.) A more interesting usage is to return a reference to a cursor that a function has created, allowing the caller to read the rows. This provides an efficient way to return large row sets from functions.
»

On its face, it seems to make sense. And I’ve written a few proof-of-concept tests. For example, I wrote a “security definer” function that's owned by a role that can select from the relevant table(s) that returns refcursor. And I called it from a subprogram that's owned by a role that cannot select from the relevant table(s) to loop through the rows. But I can't convince myself that this division of labor is useful. And especially I can't convince myself that the "pipeling" capability is relevant in a three-tier app with a stateless browser UI. Here, the paradigm has the client-side app checking out a connection from the pool, generating the entire response to the end-user's request, releasing the connection, and sending the response back to the browser. This paradigm isn't consistent with allowing the end user to navigate forwards and backwards in a scrollable cursor that is somehow held in its open state in in the sever by the the middle tier client on behalf of a browser session that comes back time and again to its dedicated middle tier client and thence yo its dedicated database server session. (Anyway, without anything like Oracle PL/SQL's packages, you have no mechanism to hold the opened cursor variable between successive server calls.)

I guess that would depend on how you define a server call:

https://www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-USING

"Once a cursor has been opened, it can be manipulated with the
statements described here.

These manipulations need not occur in the same function that opened the
cursor to begin with. You can return a refcursor value out of a function
and let the caller operate on the cursor. (Internally, a refcursor value
is simply the string name of a so-called portal containing the active
query for the cursor. This name can be passed around, assigned to other
refcursor variables, and so on, without disturbing the portal.)

All portals are implicitly closed at transaction end. Therefore a
refcursor value is usable to reference an open cursor only until the end
of the transaction."

Is it fair to say that the PL/pgSQL refcursor is useful, at best, only in very special use-cases?

--
Adrian Klaver
adrian.klaver@aklaver.com

bryn@yugabyte.com

about 3 years ago

In reply to: Adrian Klaver (#2)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

adrian.klaver@aklaver.com wrote:

bryn@yugabyte.com wrote:

Section "43.7. Cursors” in the PL/pgSQL chapter of the doc (www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-DECLARATIONS) starts with this:
«
Rather than executing a whole query at once, it is possible to set up a cursor that encapsulates the query, and then read the query result a few rows at a time. One reason for doing this is to avoid memory overrun when the result contains a large number of rows. (However, PL/pgSQL users do not normally need to worry about that, since FOR loops automatically use a cursor internally to avoid memory problems.) A more interesting usage is to return a reference to a cursor that a function has created, allowing the caller to read the rows. This provides an efficient way to return large row sets from functions.
»
On its face, it seems to make sense. And I’ve written a few proof-of-concept tests. For example, I wrote a “security definer” function that's owned by a role that can select from the relevant table(s) that returns refcursor. And I called it from a subprogram that's owned by a role that cannot select from the relevant table(s) to loop through the rows. But I can't convince myself that this division of labor is useful. And especially I can't convince myself that the "pipeling" capability is relevant in a three-tier app with a stateless browser UI. Here, the paradigm has the client-side app checking out a connection from the pool, generating the entire response to the end-user's request, releasing the connection, and sending the response back to the browser. This paradigm isn't consistent with allowing the end user to navigate forwards and backwards in a scrollable cursor that is somehow held in its open state in in the sever by the the middle tier client on behalf of a browser session that comes back time and again to its dedicated middle tier client and thence yo its dedicated database server session. (Anyway, without anything like Oracle PL/SQL's packages, you have no mechanism to hold the opened cursor variable between successive server calls.)

I guess that would depend on how you define a server call:

www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-USING

"Once a cursor has been opened, it can be manipulated with the statements described here.

These manipulations need not occur in the same function that opened the cursor to begin with. You can return a refcursor value out of a function and let the caller operate on the cursor. (Internally, a refcursor value is simply the string name of a so-called portal containing the active query for the cursor. This name can be passed around, assigned to other refcursor variables, and so on, without disturbing the portal.)

All portals are implicitly closed at transaction end. Therefore a refcursor value is usable to reference an open cursor only until the end of the transaction."

Is it fair to say that the PL/pgSQL refcursor is useful, at best, only in very special use-cases?

Ah… I see. I had read this wrongly:

« Internally, a refcursor value is simply the string name of a so-called portal containing the active query for the cursor. This name can be passed around, assigned to other refcursor variables, and so on, without disturbing the portal. »

I thought that it was an "under the hood" explanation and that the only thing that you could see after entry into the PL/pgSQL subprogram that will be the top of the stack would be an opaque value (sometimes called a handle in other contexts) that was accessible only from PL/pgSQL.

I hadn't yet tried this in psql:

create schema s;
create table s.t(k int primary key, v text not null);
insert into s.t(k, v) values (1, 'cat'), (2, 'dog');

create function s.f(k_in in int)
returns refcursor
language plpgsql
as $body$
declare
s_f_cur cursor(k_in int) for select v from s.t where k = k_in;
begin
open s_f_cur(k_in);
return s_f_cur;
end;
$body$;

create function s.g(cur in refcursor)
returns text
language plpgsql
as $body$
declare
v text;
begin
fetch cur into v;
return v;
end;
$body$;

begin;
select ''''||s.f(1)||'''' as cur
\gset
select s.g(:cur) as result;
end;

I just did. And the result of "select s.g(:cur)" was the expected "cat".

It requires a leap of imagination, or help from the pgsql-general list, to get to this. So thanks!

Might the doc add an example like this?

Of course, it all falls into place now. I can see how I could write a client app in, say, Python to write a humongous report to a file by fetching manageably-sized chunks, time and again until done with a function like my "g()" here, from a cursor that I'd opened using a function like my "f()".

B.t.w., when I said "top-level call", I meant the SQL statement that a client issues—in this case most likely "select my_plpgsql_function()" or "call my_plpgsql_procedure()". That top-of-stack subprogram can invoke other subprograms and so on ad infinitum. But eventually the whole stack empties and control passes back to the client. But all that falls away now with the exampe I showed in place.

adrian.klaver@aklaver.com

about 3 years ago

In reply to: Bryn Llewellyn (#3)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

On 3/14/23 20:29, Bryn Llewellyn wrote:

adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> wrote:

bryn@yugabyte.com <mailto:bryn@yugabyte.com> wrote:

Section "43.7. Cursors” in the PL/pgSQL chapter of the doc
(www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-
<http://www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR->DECLARATIONS) starts with this:
«
Rather than executing a whole query at once, it is possible to set up
a cursor that encapsulates the query, and then read the query result
a few rows at a time. One reason for doing this is to avoid memory
overrun when the result contains a large number of rows. (However,
PL/pgSQL users do not normally need to worry about that, since FOR
loops automatically use a cursor internally to avoid memory
problems.) A more interesting usage is to return a reference to a
cursor that a function has created, allowing the caller to read the
rows. This provides an efficient way to return large row sets from
functions.
»
On its face, it seems to make sense. And I’ve written a few
proof-of-concept tests. For example, I wrote a “security definer”
function that's owned by a role that can select from the relevant
table(s) that returns refcursor. And I called it from a subprogram
that's owned by a role that cannot select from the relevant table(s)
to loop through the rows. But I can't convince myself that this
division of labor is useful. And especially I can't convince myself
that the "pipeling" capability is relevant in a three-tier app with a
stateless browser UI. Here, the paradigm has the client-side app
checking out a connection from the pool, generating the entire
response to the end-user's request, releasing the connection, and
sending the response back to the browser. This paradigm isn't
consistent with allowing the end user to navigate forwards and
backwards in a scrollable cursor that is somehow held in its open
state in in the sever by the the middle tier client on behalf of a
browser session that comes back time and again to its dedicated
middle tier client and thence yo its dedicated database server
session. (Anyway, without anything like Oracle PL/SQL's packages, you
have no mechanism to hold the opened cursor variable between
successive server calls.)

I guess that would depend on how you define a server call:

www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-USING <http://www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-USING>

"Once a cursor has been opened, it can be manipulated with the
statements described here.

These manipulations need not occur in the same function that opened
the cursor to begin with. You can return a refcursor value out of a
function and let the caller operate on the cursor. (Internally, a
refcursor value is simply the string name of a so-called portal
containing the active query for the cursor. This name can be passed
around, assigned to other refcursor variables, and so on, without
disturbing the portal.)

All portals are implicitly closed at transaction end. Therefore a
refcursor value is usable to reference an open cursor only until the
end of the transaction."

Is it fair to say that the PL/pgSQL refcursor is useful, at best,
only in very special use-cases?

Ah… I see. I had read this wrongly:

« Internally, a refcursor value is simply the string name of a
so-called portal containing the active query for the cursor. This name
can be passed around, assigned to other refcursor variables, and so
on, without disturbing the portal. »

I thought that it was an "under the hood" explanation and that the only
thing that you could see after entry into the PL/pgSQL subprogram that
will be the top of the stack would be an opaque value (sometimes called
a handle in other contexts) that was accessible only from PL/pgSQL. >

I am not sure how this:

"These manipulations need not occur in the same function that opened the
cursor to begin with. You can return a refcursor value out of a function
and let the caller operate on the cursor. ..."

could be any clearer.

I just did. And the result of "select s.g(:cur)" was the expected "cat".

It requires a leap of imagination, or help from the pgsql-general list,
to get to this. So thanks!

Might the doc add an example like this?

43.7.3.5. Returning Cursors

"PL/pgSQL functions can return cursors to the caller. This is useful to
return multiple rows or columns, especially with very large result sets.
To do this, the function opens the cursor and returns the cursor name to
the caller (or simply opens the cursor using a portal name specified by
or otherwise known to the caller). The caller can then fetch rows from
the cursor. The cursor can be closed by the caller, or it will be closed
automatically when the transaction closes."

And then a series of examples on how to do that.

I have a hard time fathoming why someone who writes documentation does
not actually read documentation.

Of course, it all falls into place now. I can see how I could write a
client app in, say, Python to write a humongous report to a file by
fetching manageably-sized chunks, time and again until done with a
function like my "g()" here, from a cursor that I'd opened using a
function like my "f()".

B.t.w., when I said "top-level call", I meant the SQL statement that a
client issues—in this case most likely "select my_plpgsql_function()" or
"call my_plpgsql_procedure()". That top-of-stack subprogram can invoke
other subprograms and so on ad infinitum. But eventually the whole stack
empties and control passes back to the client. But all that falls away
now with the exampe I showed in place.

--
Adrian Klaver
adrian.klaver@aklaver.com

laurenz.albe@cybertec.at

about 3 years ago

In reply to: Bryn Llewellyn (#1)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

On Tue, 2023-03-14 at 17:50 -0700, Bryn Llewellyn wrote:

Section "43.7. Cursors” in the PL/pgSQL chapter of the doc
(www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-DECLARATIONS) starts with this:

«
[...]
A more interesting usage is to return a reference to a cursor that a function has created,
allowing the caller to read the rows. This provides an efficient way to return large row
sets from functions.
»

I can't convince myself that this division of labor is useful. And especially I can't convince
myself that the "pipeling" capability is relevant in a three-tier app with a stateless browser UI.

You seem to think that a client request corresponds to a single database request, but that
doesn't have to be the case. Satisfying a client request can mean iterating through a result set.

Cursors shine wherever you need procedural processing of query results, or where you don't
need the complete result set, but cannot tell in advance how much you will need, or where
you need to scroll and move forward and backward through a result set.

Yours,
Laurenz Albe

bryn@yugabyte.com

about 3 years ago

In reply to: Laurenz Albe (#5)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

laurenz.albe@cybertec.at wrote:

bryn@yugabyte.com wrote:

Section "43.7. Cursors” in the PL/pgSQL chapter of the doc
(www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-DECLARATIONS) starts with this:

«
[...]
A more interesting usage is to return a reference to a cursor that a function has created, allowing the caller to read the rows. This provides an efficient way to return large row sets from functions.
»

I can't convince myself that this division of labor is useful. And especially I can't convince myself that the "pipeling" capability is relevant in a three-tier app with a stateless browser UI.

You seem to think that a client request corresponds to a single database request, but that doesn't have to be the case. Satisfying a client request can mean iterating through a result set.

Cursors shine wherever you need procedural processing of query results, or where you don't need the complete result set, but cannot tell in advance how much you will need, or where you need to scroll and move forward and backward through a result set.

Re « You seem to think that a client request corresponds to a single database request », I meant no more than what psql models when you hit "return"after terminating an ordinary SQL statement with semi-colon (i.e. not a meta-command). I used "client" to mean a program that connects directly to the PostgreSQL server (using TCP/IP)—and not the ultimate human client who's using, say, a browser. Perhaps I should have said "top-level SQL call". (To keep this simple, let's assume that I don't use "backslash-semicolon" to send two or more top-level SQL calls in one round trip.)

So in this example:

begin;
declare cur cursor for select k, v from s.t order by k;
fetch forward 10 in cur;
fetch absolute 90 in cur;
fetch forward 10 in cur;
commit;

where I hit "return" after each semicolon, I'd say that I made six top-level SQL calls—and that lines up with the fact that each of "begin". "declare", "fetch", and "commit" has its own page in the "SQL Commands" section of the PG doc.

It sounds like you prefer "database request" for this. Is that right?

I was trying to get at the distinction between issuing a particular SQL statement as a top-level SQL call and issuing it from a PL/pgSQL subprogram.

I've read wording like what you wrote following "Cursors shine wherever you need …" elsewhere. But I can't picture a concrete use case where, not withstanding the "where" restriction that my "select" used, I can't tell how much of the result set I'll need or where reading result #n1 informs me that I next need to scroll and read result #n2. So I was looking for a convincing example.

adrian.klaver@aklaver.com

about 3 years ago

In reply to: Bryn Llewellyn (#6)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

On 3/15/23 13:37, Bryn Llewellyn wrote:

laurenz.albe@cybertec.at wrote:

Re « You seem to think that a client request corresponds to a single database request », I meant no more than what psql models when you hit "return"after terminating an ordinary SQL statement with semi-colon (i.e. not a meta-command). I used "client" to mean a program that connects directly to the PostgreSQL server (using TCP/IP)—and not the ultimate human client who's using, say, a browser. Perhaps I should have said "top-level SQL call". (To keep this simple, let's assume that I don't use "backslash-semicolon" to send two or more top-level SQL calls in one round trip.)

So in this example:

begin;
declare cur cursor for select k, v from s.t order by k;
fetch forward 10 in cur;
fetch absolute 90 in cur;
fetch forward 10 in cur;
commit;

where I hit "return" after each semicolon, I'd say that I made six top-level SQL calls—and that lines up with the fact that each of "begin". "declare", "fetch", and "commit" has its own page in the "SQL Commands" section of the PG doc.

It sounds like you prefer "database request" for this. Is that right?

I was trying to get at the distinction between issuing a particular SQL statement as a top-level SQL call and issuing it from a PL/pgSQL subprogram.

I've read wording like what you wrote following "Cursors shine wherever you need …" elsewhere. But I can't picture a concrete use case where, not withstanding the "where" restriction that my "select" used, I can't tell how much of the result set I'll need or where reading result #n1 informs me that I next need to scroll and read result #n2. So I was looking for a convincing example.

Huh?

You provided your own example earlier:

"Of course, it all falls into place now. I can see how I could write a
client app in, say, Python to write a humongous report to a file by
fetching manageably-sized chunks, time and again until done with a
function like my "g()" here, from a cursor that I'd opened using a
function like my "f()"."

--
Adrian Klaver
adrian.klaver@aklaver.com

bryn@yugabyte.com

about 3 years ago

In reply to: Adrian Klaver (#7)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

adrian.klaver@aklaver.com wrote:

bryn@yugabyte.com wrote:

laurenz.albe@cybertec.at wrote:

You seem to think that a client request corresponds to a single database request

…I can’t picture a concrete use case where, not withstanding the "where" restriction that my "select" used, I can't tell how much of the result set I'll need or where reading result #n1 informs me that I next need to scroll and read result #n2. So I was looking for a convincing example.

Huh?

You provided your own example earlier:

"Of course, it all falls into place now. I can see how I could write a client app in, say, Python to write a humongous report to a file by fetching manageably-sized chunks, time and again until done with a function like my "g()" here, from a cursor that I'd opened using a function like my "f()"."

My “Humongous report via client-side Python” example doesn’t call for me to abandon it part way through. Nor does it call for me to leap forwards as I discover facts along the way that make me realize that I need immediately to see a far distant fact by scrolling to where it is (and especially by scrolling backwards to what I’ve already seen). It was an example of this that I was asking for. The bare ability to do controlled piecewise materialization and fetch is clear.

bryn@yugabyte.com

about 3 years ago

In reply to: Adrian Klaver (#4)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

adrian.klaver@aklaver.com wrote:

I have a hard time fathoming why someone who writes documentation does not actually read documentation.

Ouch. In fact, I had read the whole of the "43.7. Cursors" section in the "PL/pgSQL" chapter (www.postgresql.org/docs/15/plpgsql-cursors.html). And the sections in the "SQL Commands" chapter for "declare", "fetch" and "close". But several of the key concepts didn't sink in and this prevented me not only from understanding what some of the examples showed but, worse, from being able to use the right vocabulary to express what confused me.

It's very much clearer now than when I started this thread, about twenty-four hours ago. Here's (some of) what I believe that I now understand.

"refcursor" is a base type, listed in pg_type. This sentence seems to be key:

«
A refcursor value is simply the string name of a so-called portal containing the active query for the cursor. This name can be passed around, assigned to other refcursor variables, and so on, without disturbing the portal.
»

Maybe it's better to say that a portal has a defining "select" statement and acts as a pointer to the potential result set that its select statement defines. A portal also represents the position of the current (next-to-be-fetched) row it that set. The doc that I've found doesn't make it clear how much of the entire result set is materialized at a time. But the implication is that it's materialized only in portions and that one portion is purged to make room for another.

You can create a portal instance using either top-level SQL (with the "declare" statement) or using PL/pgSQL by declaring a refcursor variable for its name and then using that as the argument of "open". Only in top-level SQL, the "with hold" option for "declare" lets you create a portal instance outside of a transaction block. This has session duration. (Or you can pre-empt this with the "close" statement.) Otherwise, you must use the "declare" statement within an ongoing transaction. With this choice, it vanishes when the transaction ends. You can also create a portal instance by using PL/pgSQL. (There's no "with hold" option here.)

A portal instance exists within the session as a whole, even though you can declare the refcursor to denote it as a PL/pgSQL subprogram's formal parameter or as a PL/pgSQL local variable. This means that you can create a portal instance using PL/pgSQL and (when you know its name) fetch from it using top-level SQL

The open portal instances in a particular session are listed in pg_cursors. (Why not pg_portals?) When the instance was created with the SQL "declare" statement, pg_cursors.statement shows the verbatim text that follows the "declare" keyword. (In other words, not a legal SQL statement.) When the instance was created using PL/pgSQL, pg_cursors.statement shows the verbatim text that follows (in one creation approach variant) "open <identifier> for" in the defining block statement's executable section. (In other words, and with a caveat about placeholders, this is a legal SQL statement.)

A portal instance is uniquely identified by just its name. (You cannot use a schema-qualified identifier to create it or to refer to it.) And (just like a prepared statement) the name must be unique only within a particular session.

There are many ways to set the name of a portal instance. Here are some examples. First top-level SQL:

begin;
declare "My Refcursor" cursor for select k, v from s.t order by k;
select name, statement from pg_cursors;
fetch forward 5 in "My Refcursor";
end;

I noticed that 'select pg_typeof("My Refcursor")' within the ongoing txn fails with '42703: column "My Refcursor" does not exist'.

Now, PL/pgSQL:

create function s.f(cur in refcursor = 'cur')
returns refcursor
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
begin
open cur for select k, v from s.t order by k;
return cur;
end;
$body$;

begin;
select s.f('My Cursor');
select name, statement from pg_cursors;
fetch forward 5 in "My Cursor";
end;

Arguably, it's pointless to use a function to return the name of the portal instance that you supplied as an input—and you might just as well write this:

create procedure s.p(cur in refcursor = 'cur')
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
begin
open cur for select k, v from s.t order by k;
end;
$body$;

begin;
call s.p('My Cursor');
select name, statement from pg_cursors;
fetch forward 5 in "My Cursor";
end;

You could sacrifice the ability to name the portal instance at runtime like this:

create procedure s.p()
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
cur refcursor := 'My Cursor';
begin
open cur for select k, v from s.t order by k;
end;
$body$;

begin;
call s.p();
select name, statement from pg_cursors;
fetch forward 5 in "My Cursor";
end;

You can even let the runtime system make up a name for you. But you need to go back to the function encapsulation to learn what was chosen:

create function s.f()
returns refcursor
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
cur refcursor;
begin
open cur for select k, v from s.t order by k;
return cur;
end;
$body$;

begin;
select s.f();
select name, statement from pg_cursors;
fetch forward 5 in "<unnamed portal 1>";
end;

Here's yet another variant:

create procedure s.p()
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
"My Refcursor" cursor for select k, v from s.t order by k;
begin
open "My Refcursor";
raise info '%', pg_typeof("My Refcursor")::text;
end;
$body$;

begin;
call s.p();
select name, statement from pg_cursors;
fetch forward 5 in "My Refcursor";
end;

(I included "pg_typeof()" just here to make the point that it reports "refcursor" and not the plain "cursor" that the declaration might lead you to expect. It reports "refcursor" in all the other PL/pgSQL examples too.

With all these variants (and there may be more), and with only some of the exemplified, I don't feel too stupid for getting confused.

laurenz.albe@cybertec.at

about 3 years ago

In reply to: Bryn Llewellyn (#8)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

Well, it is simple.

As we wrote, some of us think that cursors are useful, and we tried to
explain why we think that. If you don't think that cursors are useful,
don't use them. We are not out to convince you otherwise.

Yours,
Laurenz Albe

Dominique Devienne

ddevienne@gmail.com

about 3 years ago

In reply to: Laurenz Albe (#10)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

On Thu, Mar 16, 2023 at 6:48 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

As we wrote, some of us think that cursors are useful, and we tried to
explain why we think that. If you don't think that cursors are useful,
don't use them. We are not out to convince you otherwise.

Perhaps OT (I only skimed this thread) but when I compared Cursors to
regular Statements / Queries
from a LIBPQ client application perspective, on the same "streamable"
queries (i.e. w/o a sort), Cursor
shined in terms of time-to-first-row, compared to waiting for the whole
ResultSet, but getting the full result
OTOH was 2x as long with Cursor, compared to the regular SELECT Statement.

Thus in my mind, it really depends on what you value in a particular
situation, latency or throughput. --DD

PS: In my testing, I used forward-only cursors
PPS: I don't recall the ResultSet cardinality or byte size, nor the
batching used with the Cursor.

pavel.stehule@gmail.com

about 3 years ago

In reply to: Dominique Devienne (#11)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

čt 16. 3. 2023 v 9:18 odesílatel Dominique Devienne <ddevienne@gmail.com>
napsal:

On Thu, Mar 16, 2023 at 6:48 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

As we wrote, some of us think that cursors are useful, and we tried to
explain why we think that. If you don't think that cursors are useful,
don't use them. We are not out to convince you otherwise.

Perhaps OT (I only skimed this thread) but when I compared Cursors to
regular Statements / Queries
from a LIBPQ client application perspective, on the same "streamable"
queries (i.e. w/o a sort), Cursor
shined in terms of time-to-first-row, compared to waiting for the whole
ResultSet, but getting the full result
OTOH was 2x as long with Cursor, compared to the regular SELECT Statement.

Thus in my mind, it really depends on what you value in a particular
situation, latency or throughput. --DD

cursors are optimized for minimal cost of first row, queries are optimized
for minimal cost of last row

Regards

Pavel

Show quoted text

PS: In my testing, I used forward-only cursors
PPS: I don't recall the ResultSet cardinality or byte size, nor the
batching used with the Cursor.

Dominique Devienne

ddevienne@gmail.com

about 3 years ago

In reply to: Pavel Stehule (#12)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

On Thu, Mar 16, 2023 at 9:23 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

čt 16. 3. 2023 v 9:18 odesílatel Dominique Devienne <ddevienne@gmail.com>
napsal:

[...] depends on what you value in a particular situation, latency or
throughput. --DD

cursors are optimized for minimal cost of first row, queries are optimized
for minimal cost of last row

That's a nice way to put it Pavel.

And to have it both ways, use COPY in binary protocol? That way the rows
are streamed
to you in arbitrary chunks as soon as available (I hope), and the burden is
on you the
client to decode and use those rows in parallel as they are "streamed" to
you.

I've yet to test that (thus the 'i hope' above). I used COPY binary for
INSERTs,
and COPY text/json for SELECTs, not yet COPY binary for SELECTs. I'm hoping
the latency of COPY will be small compared to a regular SELECT where I have
to
wait for LIBPQ to assemble the whole ResultSet. Are my hopes unfounded? --DD

pavel.stehule@gmail.com

about 3 years ago

In reply to: Dominique Devienne (#13)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

čt 16. 3. 2023 v 9:55 odesílatel Dominique Devienne <ddevienne@gmail.com>
napsal:

On Thu, Mar 16, 2023 at 9:23 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

čt 16. 3. 2023 v 9:18 odesílatel Dominique Devienne <ddevienne@gmail.com>
napsal:

[...] depends on what you value in a particular situation, latency or
throughput. --DD

cursors are optimized for minimal cost of first row, queries are
optimized for minimal cost of last row

That's a nice way to put it Pavel.

And to have it both ways, use COPY in binary protocol? That way the rows
are streamed
to you in arbitrary chunks as soon as available (I hope), and the burden
is on you the
client to decode and use those rows in parallel as they are "streamed" to
you.

I've yet to test that (thus the 'i hope' above). I used COPY binary for
INSERTs,
and COPY text/json for SELECTs, not yet COPY binary for SELECTs. I'm hoping
the latency of COPY will be small compared to a regular SELECT where I
have to
wait for LIBPQ to assemble the whole ResultSet. Are my hopes unfounded?
--DD

COPY is a different creature - it has no execution plan, and it is not
interpreted by the executor.

Using COPY SELECT instead SELECT looks like premature optimization. The
performance benefit will be minimal (maybe there can be exceptions
depending on data, network properties or interface). Cursors, queries can
use binary protocol, if the client can support it.

Regards

Pavel

Dominique Devienne

ddevienne@gmail.com

about 3 years ago

In reply to: Pavel Stehule (#14)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

On Thu, Mar 16, 2023 at 11:15 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

čt 16. 3. 2023 v 9:55 odesílatel Dominique Devienne <ddevienne@gmail.com>
napsal:

On Thu, Mar 16, 2023 at 9:23 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

čt 16. 3. 2023 v 9:18 odesílatel Dominique Devienne <ddevienne@gmail.com>
napsal:

[...] depends on what you value in a particular situation, latency or
throughput. --DD

cursors are optimized for minimal cost of first row, queries are
optimized for minimal cost of last row

That's a nice way to put it Pavel.

And to have it both ways, use COPY in binary protocol?

COPY is a different creature - it has no execution plan, and it is not
interpreted by the executor.

OK. Not sure what that means exactly. There's still a SELECT, with possibly
WHERE clauses and/or JOINs, no?
Doesn't that imply an execution plan? I'm a bit confused.

Using COPY SELECT instead SELECT looks like premature optimization.

Possible. But this is not an e-commerce web-site with a PostgreSQL backend
here.
This is classical client-server with heavy weight desktop apps loading
heavy weight data
(in number and size) from PostgreSQL. So performance (throughput) does
matter a lot to us.
And I measure that performance in both rows/sec and MB/sec, not (itsy
bitsy) transactions / sec.

The performance benefit will be minimal ([...]).

COPY matters on INSERT for sure performance-wise.
So why wouldn't COPY matter for SELECTs too?

Cursors, queries can use binary protocol, if the client can support it.

I already do. But we need all the speed we can get.
In any case, I'll have to try and see/test for myself eventually.
We cannot afford to leave any performance gains on the table.

pavel.stehule@gmail.com

about 3 years ago

In reply to: Dominique Devienne (#15)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

čt 16. 3. 2023 v 11:52 odesílatel Dominique Devienne <ddevienne@gmail.com>
napsal:

On Thu, Mar 16, 2023 at 11:15 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

čt 16. 3. 2023 v 9:55 odesílatel Dominique Devienne <ddevienne@gmail.com>
napsal:

On Thu, Mar 16, 2023 at 9:23 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

čt 16. 3. 2023 v 9:18 odesílatel Dominique Devienne <
ddevienne@gmail.com> napsal:

[...] depends on what you value in a particular situation, latency or
throughput. --DD

cursors are optimized for minimal cost of first row, queries are
optimized for minimal cost of last row

That's a nice way to put it Pavel.

And to have it both ways, use COPY in binary protocol?

COPY is a different creature - it has no execution plan, and it is not
interpreted by the executor.

OK. Not sure what that means exactly. There's still a SELECT, with
possibly WHERE clauses and/or JOINs, no?
Doesn't that imply an execution plan? I'm a bit confused.

Using COPY SELECT instead SELECT looks like premature optimization.

Possible. But this is not an e-commerce web-site with a PostgreSQL backend
here.
This is classical client-server with heavy weight desktop apps loading
heavy weight data
(in number and size) from PostgreSQL. So performance (throughput) does
matter a lot to us.
And I measure that performance in both rows/sec and MB/sec, not (itsy
bitsy) transactions / sec.

The performance benefit will be minimal ([...]).

COPY matters on INSERT for sure performance-wise.
So why wouldn't COPY matter for SELECTs too?

Please, can you show some benchmarks :-) I don't believe it.

The protocol is already designed for massive reading by queries. If COPY
SELECT is significantly faster than SELECT, then some should be wrong on
some side (server or client).

Regards

Pavel

Show quoted text

Cursors, queries can use binary protocol, if the client can support it.

I already do. But we need all the speed we can get.
In any case, I'll have to try and see/test for myself eventually.
We cannot afford to leave any performance gains on the table.

adrian.klaver@aklaver.com

about 3 years ago

In reply to: Bryn Llewellyn (#9)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

On 3/15/23 18:41, Bryn Llewellyn wrote:

adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> wrote:

I have a hard time fathoming why someone who writes documentation does
not actually read documentation.

Ouch. In fact, I had read the whole of the "43.7. Cursors" section in
the "PL/pgSQL" chapter (www.postgresql.org/docs/15/plpgsql-cursors.html
<http://www.postgresql.org/docs/15/plpgsql-cursors.html>). And the
sections in the "SQL Commands" chapter for "declare", "fetch" and
"close". But several of the key concepts didn't sink in and this
prevented me not only from understanding what some of the examples
showed but, worse, from being able to use the right vocabulary to
express what confused me.

Given this from your original question:

" (Anyway, without anything like Oracle PL/SQL's packages, you have no
mechanism to hold the opened cursor variable between successive server
calls.)"

What part of this:

CREATE TABLE test (col text);
INSERT INTO test VALUES ('123');

CREATE FUNCTION reffunc(refcursor) RETURNS refcursor AS '
BEGIN
OPEN $1 FOR SELECT col FROM test;
RETURN $1;
END;
' LANGUAGE plpgsql;

BEGIN;
SELECT reffunc('funccursor');
FETCH ALL IN funccursor;
COMMIT;

did not make sense in that context?

The open portal instances in a particular session are listed in
pg_cursors. (Why not pg_portals?) When the instance was created with the

Why are tables also known as relations and you can look them up in
pg_class or pg_tables?

Answer: It is the rules of the game.

create procedure s.p()
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  "My Refcursor" cursor for select k, v from s.t order by k;
begin
  open "My Refcursor";
*  raise info '%', pg_typeof("My Refcursor")::text;*
end;
$body$;

begin;
call s.p();
select name, statement from pg_cursors;
fetch forward 5 in "My Refcursor";
end;

(I included "pg_typeof()" just here to make the point that it reports
"refcursor" and not the plain "cursor" that the declaration might lead
you to expect. It reports "refcursor" in all the other PL/pgSQL examples
too.

https://www.postgresql.org/docs/current/plpgsql-cursors.html

"All access to cursors in PL/pgSQL goes through cursor variables, which
are always of the special data type refcursor. One way to create a
cursor variable is just to declare it as a variable of type refcursor.
Another way is to use the cursor declaration syntax, which in general is:

name [ [ NO ] SCROLL ] CURSOR [ ( arguments ) ] FOR query;
"

Again, I would like to know how that is confusing?

With all these variants (and there may be more), and with only some of
the exemplified, I don't feel too stupid for getting confused.

Where you get confused is in moving the goal posts.

What starts out with:

"(Anyway, without anything like Oracle PL/SQL's packages, you have no
mechanism to hold the opened cursor variable between successive server
calls.)

Is it fair to say that the PL/pgSQL refcursor is useful, at best, only
in very special use-cases?"

evolves into deep dive into all thing cursors.

--
Adrian Klaver
adrian.klaver@aklaver.com

Peter J. Holzer

hjp-pgsql@hjp.at

about 3 years ago

In reply to: Dominique Devienne (#15)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

On 2023-03-16 11:52:47 +0100, Dominique Devienne wrote:

On Thu, Mar 16, 2023 at 11:15 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
čt 16. 3. 2023 v 9:55 odesílatel Dominique Devienne <ddevienne@gmail.com>
napsal:
That's a nice way to put it Pavel.
And to have it both ways, use COPY in binary protocol?

[...]

The performance benefit will be minimal ([...]).

COPY matters on INSERT for sure performance-wise.
So why wouldn't COPY matter for SELECTs too?

COPY is faster than a bunch of INSERTs because each INSERT has some
overhead: It needs to be parsed (if you PREPAREd the INSERT you need to
parse the EXECUTE command instead) and planned. But most importantly you
have a round trip time between the client and the server. With COPY you
incur that overhead only once.

(Which reminds me that I should benchmark INSERT with lots of VALUES
against COPY some time.)

With COPYing the output of a SELECT I don't see any savings. On the
contrary, it's an extra step.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

bryn@yugabyte.com

about 3 years ago

In reply to: Adrian Klaver (#17)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

adrian.klaver@aklaver.com wrote:

bryn@yugabyte.com wrote:

adrian.klaver@aklaver.com wrote:

I have a hard time fathoming why someone who writes documentation does not actually read documentation.

Ouch. In fact, I had read the whole of the "43.7. Cursors" section in the "PL/pgSQL" chapter. And the sections in the "SQL Commands" chapter for "declare", "fetch" and "close". But several of the key concepts didn't sink in and this prevented me not only from understanding what some of the examples showed but, worse, from being able to use the right vocabulary to express what confused me.

Given this from your original question:

« Anyway, without anything like Oracle PL/SQL's packages, you have no mechanism to hold the opened cursor variable between successive server calls. »

What part of this [a particular code example] did not make sense in that context?

First off, despite the fact that I've clearly annoyed you (for which I apologize), I have found these exchanges very helpful. So thank you very much.

Your questions can be summarized as "Why couldn't you understand the doc? And why did you mix questions about use-cases with questions about the mechanics?" The answer has to do with psychology. I probably can't explain this convincingly. That's why it's taken me a long time to respond. I also had to do lots of testing before responding to make sure that the mental model that I've formed for myself is consistent with these.

You may not be interested in what follows. But, anyway, here goes.

— I happen to have many years of experience with Oracle Database and PL/SQL. The latter has analogous features to PL/pgSQL's "refcursor". But the differences between the notions in the two environments are enormous. My attempt to understand the latter was hindered by my understanding of the former. I accept that this is *my* problem and that I could never expect that the PG doc would cater for such a reader.

— Oracle Database doesn't expose scrollability for PL/SQL's equivalent of "refcursor". So I had never come across use cases where this was beneficial. I wanted, therefore, to hear about some. I thought that insights here would help me understand the mechanics. But I didn't get anything beyond "Scrollability is what it is. If you don't need it, don't use it."

Anyway, never mind all that now. Here's what I now (think that) I understand—with some comments on what made it hard for me to grasp.

— The key notion is what is referred to sometimes as "portal" and sometimes as "cursor". This is the thing that's global within, and private to, a session, that's uniquely identified by a bare name, that, and that's listed in "pg_cursors". I believe that in typical use, a cursor has only transaction duration. But (and only when you use the SQL API) you can create a cursor with (up to) session duration

— The doc pages for the "declare", "fetch", and "close" SQL statements don't mention "portal" and use only "cursor". They use the term to mean the underlying phenomenon and use wording like: "DECLARE allows a user to create cursors"; "You can see all available cursors by querying the pg_cursors system view"; "FETCH retrieves rows using a previously-created cursor"; "CLOSE frees the resources associated with an open cursor. After the cursor is closed, no subsequent operations are allowed on it. A cursor should be closed when it is no longer needed." However, these pages leave the term "open" undefined, though it's used. It seems that it has no meaning. Rather, a cursor with a particular name either exists or not. You create it with "declare" and drop it with "close". And that's it. If "open" means anything, it's just another word for "exists". (The fact that "pg_cursors" doesn't have a boolean column called "open" supports this understanding.) The sentence "After the cursor is closed, no subsequent operations are allowed on it." is equivalent to "After a table is dropped, no subsequent operations are allowed on it." But who would bother to say that? Notice that "pg_cursors" has a column called "creation_time" — and not "declaration time".

— On the other hand, the doc page "43.7. Cursors" uses "portal" a lot—and never says that it means exactly the same as "cursor" qua term of art (and not qua keyword). It does say "...a so-called portal containing the active query for the cursor." This suggests a subtle difference in meaning between "portal" and "cursor" and a notion of containment. I can't make any sense of that. It says things like "Before a cursor can be used to retrieve rows, it must be opened. (This is the equivalent action to the SQL command DECLARE CURSOR.)" This is the closest that it comes to saying that the SQL API and the PL/pgSQL API both manipulate the same thing—what you see in "pg_cursors". The sentence that I quoted is equivalent to saying "Before you can insert a row into a table, the table has to exist." In other words, an unhelpful tautology. I believe that the sense is this: « A variable with the data type "refcursor" holds a bare name (which is governed by the usual rules for a SQL name). The name might be found in "pg_cursors" or it might not be. When, and only when, it is found in "pg_cursors", the refcursor variable acts as a handle to the denoted cursor and supports operations upon it using various PL/pgSQL statements that use the identifier for the refcursor variable's name.

— I (but maybe only I) would have appreciated being able to read a single generic account that explained the underlying concepts. This would have prepared me for understanding the operations that the SQL and PL/pgSQL APIs expose—and especially that they are interoperable. So I'd've liked to see a note at the start of the four relevant sections ("43.7. Cursors" and the "declare", "fetch", and "close" SQL statements) that x-ref'd to the generic account and said "read this first".

Here's some more detail of how I'd state the mental model that I've deduced. Please tell me if you think that some, or all, of my account is wrong. (When I say "cursor", I always mean what's listed in "pg_cursors". And I'll never mention "portal" because the term seems to means exactly the same as "cursor".)

(1) A cursor must have a defining "select" statement. It also always has a pointer to the next-to-be-fetched row in the result set that the "select" defines. The rows are (implicitly) numbered from 1 through N where N is the number of rows that the cursor's "select" defines. However (as you can see from "fetch :x" in SQL, where :x is less than 1 or more than N) the pointer can point outside of the result set and not cause an error.

(2) A cursor defines a read-consistent snapshot, as of its "pg_cursors.creation_time". (From the "declare" doc, « In PostgreSQL, all cursors are insensitive. ») The complete set of rows that the "select" defines may not all be concurrently materialized in the cursor. This implies some kind of aging out and replacement implementation. The details aren't described because they have no semantic significance.

(3) In top-level SQL, you create a cursor with the "declare" statement. This lets you name it, specify its “select”, and specify a few other boolean attributes like "[ no ] scroll" and "{ with | without } hold".

(4) In PL/pgSQL, you create a cursor with "open". The operand is the identifier for the refcursor variable that holds the cursor's name. You can test your mental model by using the equivalent SQL statements with the "execute" PL/SQL statement.

(5) In top-level SQL, you drop a cursor with "close" where the operand is the identifier for the cursor's name. In PL/pgSQL, you drop a cursor with "close" where the operand is the identifier for the refcursor variable that holds the cursor's name.

(6) In PL/pgSQL, the value of a refcursor variable is an ordinary "text" value. It might be null. It you assign the name of a cursor that's listed in "pg_cursors" to a refcursor variable, then you can fetch from it or close it. And as long as the name isn't currently found in "pg_cursors", you can create a new row with that name with the "open" statement, specifying any "select" that you want.

(7) I found the terms "bound cursor" and "unbound cursor" (as in the section "43.7.2.3. Opening A Bound Cursor") initially very confusing because the wording connotes a property of a cursor—and "pg_cursors" has no column for such a notion. But I presently came to understand that this was a careless shorthand for "[un]bound cursor variable" — which phrases are also used on the same page.

(8) I found it initially hard to understand that the "bound" property of a refcursor variable is not part of its value. (And nor, for that matter, is the SQL statement that you specify with the "open" statement.) I reasoned, eventually, that the "bound" property must be an annotation of the variable in the AST for the block statement where the variable is declared. (Here, "declare" is used in the PL/pgSQL sense, and not the SQL sense, of the term). This explains why, when a function returns a refcursor value where the variable was declared as "bound", it can only be seen as "unbound" in a subprogram that has a refcursor formal argument. The same reasoning applies if you assign a bound refcursor variable to an unbound refcursor variable. (But I can't see that you'd have a reason to do that unless, like I was, you were testing your mental model.)

It's the fact that the value that a refcursor variable holds is nothing other than the text of a (potential) cursor's name (and that the SQL text and "bound" status are represented elsewhere) that lead me to write « without anything like Oracle PL/SQL's packages, you have no mechanism to hold the opened cursor variable between successive server calls ». When I wrote that, I thought, wrongly as I now see, that a refcursor variable held a composite, opaque value (or an opaque pointer to such) like it does in Oracle.

(9) The upshot of #8 is that the "FOR recordvar IN bound_cursorvar" construct can be used only in the block statement that declares the bound cursor variable. And this seems to defeat the point. You may just as well use an ordinary "for" loop that has the SQL statement right after the "in" keyword.

(10) I discovered that this construct:

for ... in select ... from pg_cursors order by name loop
...
end loop;

sees a cursor with an automatically generated name like "<unnamed portal N>" for the loop itself. I suppose that this makes good sense. But it does seem to undermine the value of declaring and using a bound cursor variable—esp as the nominal value of the "cursor" concept is the scrollability and the ability to fetch a smallish set of rows from anywhere in a huge result set.

adrian.klaver@aklaver.com

about 3 years ago

In reply to: Bryn Llewellyn (#19)

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

On 3/20/23 1:46 PM, Bryn Llewellyn wrote:

adrian.klaver@aklaver.com wrote:

bryn@yugabyte.com wrote:

adrian.klaver@aklaver.com wrote:

I have a hard time fathoming why someone who writes documentation does not actually read documentation.

Ouch. In fact, I had read the whole of the "43.7. Cursors" section in the "PL/pgSQL" chapter. And the sections in the "SQL Commands" chapter for "declare", "fetch" and "close". But several of the key concepts didn't sink in and this prevented me not only from understanding what some of the examples showed but, worse, from being able to use the right vocabulary to express what confused me.

Given this from your original question:

« Anyway, without anything like Oracle PL/SQL's packages, you have no mechanism to hold the opened cursor variable between successive server calls. »

What part of this [a particular code example] did not make sense in that context?

First off, despite the fact that I've clearly annoyed you (for which I apologize), I have found these exchanges very helpful. So thank you very much.

Your questions can be summarized as "Why couldn't you understand the doc? And why did you mix questions about use-cases with questions about the mechanics?" The answer has to do with psychology. I probably can't explain this convincingly. That's why it's taken me a long time to respond. I also had to do lots of testing before responding to make sure that the mental model that I've formed for myself is consistent with these.

You may not be interested in what follows. But, anyway, here goes.

— I happen to have many years of experience with Oracle Database and PL/SQL. The latter has analogous features to PL/pgSQL's "refcursor". But the differences between the notions in the two environments are enormous. My attempt to understand the latter was hindered by my understanding of the former. I accept that this is *my* problem and that I could never expect that the PG doc would cater for such a reader.

— Oracle Database doesn't expose scrollability for PL/SQL's equivalent of "refcursor". So I had never come across use cases where this was beneficial. I wanted, therefore, to hear about some. I thought that insights here would help me understand the mechanics. But I didn't get anything beyond "Scrollability is what it is. If you don't need it, don't use it."

Anyway, never mind all that now. Here's what I now (think that) I understand—with some comments on what made it hard for me to grasp.

— The key notion is what is referred to sometimes as "portal" and sometimes as "cursor". This is the thing that's global within, and private to, a session, that's uniquely identified by a bare name, that, and that's listed in "pg_cursors". I believe that in typical use, a cursor has only transaction duration. But (and only when you use the SQL API) you can create a cursor with (up to) session duration

— The doc pages for the "declare", "fetch", and "close" SQL statements don't mention "portal" and use only "cursor". They use the term to mean the underlying phenomenon and use wording like: "DECLARE allows a user to create cursors"; "You can see all available cursors by querying the pg_cursors system view"; "FETCH retrieves rows using a previously-created cursor"; "CLOSE frees the resources associated with an open cursor. After the cursor is closed, no subsequent operations are allowed on it. A cursor should be closed when it is no longer needed." However, these pages leave the term "open" undefined, though it's used. It seems that it has no meaning. Rather, a cursor with a particular name either exists or not. You create it with "declare" and drop it with "close". And that's it. If "open" means anything, it's just another word for "exists". (The fact that "pg_cursors" doesn't have a boolean column called "open" supports this understanding.) The sentence "After the cursor is closed, no subsequent operations are allowed on it." is equivalent to "After a table is dropped, no subsequent operations are allowed on it." But who would bother to say that? Notice that "pg_cursors" has a column called "creation_time" — and not "declaration time".

— On the other hand, the doc page "43.7. Cursors" uses "portal" a lot—and never says that it means exactly the same as "cursor" qua term of art (and not qua keyword). It does say "...a so-called portal containing the active query for the cursor." This suggests a subtle difference in meaning between "portal" and "cursor" and a notion of containment. I can't make any sense of that. It says things like "Before a cursor can be used to retrieve rows, it must be opened. (This is the equivalent action to the SQL command DECLARE CURSOR.)" This is the closest that it comes to saying that the SQL API and the PL/pgSQL API both manipulate the same thing—what you see in "pg_cursors". The sentence that I quoted is equivalent to saying "Before you can insert a row into a table, the table has to exist." In other words, an unhelpful tautology. I believe that the sense is this: « A variable with the data type "refcursor" holds a bare name (which is governed by the usual rules for a SQL name). The name might be found in "pg_cursors" or it might not be. When, and only when, it is found in "pg_cursors", the refcursor variable acts as a handle to the denoted cursor and supports operations upon it using various PL/pgSQL statements that use the identifier for the refcursor variable's name.

— I (but maybe only I) would have appreciated being able to read a single generic account that explained the underlying concepts. This would have prepared me for understanding the operations that the SQL and PL/pgSQL APIs expose—and especially that they are interoperable. So I'd've liked to see a note at the start of the four relevant sections ("43.7. Cursors" and the "declare", "fetch", and "close" SQL statements) that x-ref'd to the generic account and said "read this first".

Here's some more detail of how I'd state the mental model that I've deduced. Please tell me if you think that some, or all, of my account is wrong. (When I say "cursor", I always mean what's listed in "pg_cursors". And I'll never mention "portal" because the term seems to means exactly the same as "cursor".)

(1) A cursor must have a defining "select" statement. It also always has a pointer to the next-to-be-fetched row in the result set that the "select" defines. The rows are (implicitly) numbered from 1 through N where N is the number of rows that the cursor's "select" defines. However (as you can see from "fetch :x" in SQL, where :x is less than 1 or more than N) the pointer can point outside of the result set and not cause an error.

(2) A cursor defines a read-consistent snapshot, as of its "pg_cursors.creation_time". (From the "declare" doc, « In PostgreSQL, all cursors are insensitive. ») The complete set of rows that the "select" defines may not all be concurrently materialized in the cursor. This implies some kind of aging out and replacement implementation. The details aren't described because they have no semantic significance.

(3) In top-level SQL, you create a cursor with the "declare" statement. This lets you name it, specify its “select”, and specify a few other boolean attributes like "[ no ] scroll" and "{ with | without } hold".

(4) In PL/pgSQL, you create a cursor with "open". The operand is the identifier for the refcursor variable that holds the cursor's name. You can test your mental model by using the equivalent SQL statements with the "execute" PL/SQL statement.

(5) In top-level SQL, you drop a cursor with "close" where the operand is the identifier for the cursor's name. In PL/pgSQL, you drop a cursor with "close" where the operand is the identifier for the refcursor variable that holds the cursor's name.

(6) In PL/pgSQL, the value of a refcursor variable is an ordinary "text" value. It might be null. It you assign the name of a cursor that's listed in "pg_cursors" to a refcursor variable, then you can fetch from it or close it. And as long as the name isn't currently found in "pg_cursors", you can create a new row with that name with the "open" statement, specifying any "select" that you want.

(7) I found the terms "bound cursor" and "unbound cursor" (as in the section "43.7.2.3. Opening A Bound Cursor") initially very confusing because the wording connotes a property of a cursor—and "pg_cursors" has no column for such a notion. But I presently came to understand that this was a careless shorthand for "[un]bound cursor variable" — which phrases are also used on the same page.

(8) I found it initially hard to understand that the "bound" property of a refcursor variable is not part of its value. (And nor, for that matter, is the SQL statement that you specify with the "open" statement.) I reasoned, eventually, that the "bound" property must be an annotation of the variable in the AST for the block statement where the variable is declared. (Here, "declare" is used in the PL/pgSQL sense, and not the SQL sense, of the term). This explains why, when a function returns a refcursor value where the variable was declared as "bound", it can only be seen as "unbound" in a subprogram that has a refcursor formal argument. The same reasoning applies if you assign a bound refcursor variable to an unbound refcursor variable. (But I can't see that you'd have a reason to do that unless, like I was, you were testing your mental model.)

It's the fact that the value that a refcursor variable holds is nothing other than the text of a (potential) cursor's name (and that the SQL text and "bound" status are represented elsewhere) that lead me to write « without anything like Oracle PL/SQL's packages, you have no mechanism to hold the opened cursor variable between successive server calls ». When I wrote that, I thought, wrongly as I now see, that a refcursor variable held a composite, opaque value (or an opaque pointer to such) like it does in Oracle.

(9) The upshot of #8 is that the "FOR recordvar IN bound_cursorvar" construct can be used only in the block statement that declares the bound cursor variable. And this seems to defeat the point. You may just as well use an ordinary "for" loop that has the SQL statement right after the "in" keyword.

(10) I discovered that this construct:

for ... in select ... from pg_cursors order by name loop
...
end loop;

sees a cursor with an automatically generated name like "<unnamed portal N>" for the loop itself. I suppose that this makes good sense. But it does seem to undermine the value of declaring and using a bound cursor variable—esp as the nominal value of the "cursor" concept is the scrollability and the ability to fetch a smallish set of rows from anywhere in a huge result set.

Pretty much all of the above can be explained by:

https://www.postgresql.org/docs/current/sql-declare.html

"Note

This page describes usage of cursors at the SQL command level. If you
are trying to use cursors inside a PL/pgSQL function, the rules are
different —"

"The SQL standard only makes provisions for cursors in embedded SQL. The
PostgreSQL server does not implement an OPEN statement for cursors; a
cursor is considered to be open when it is declared. However, ECPG, the
embedded SQL preprocessor for PostgreSQL, supports the standard SQL
cursor conventions, including those involving DECLARE and OPEN statements."

https://www.postgresql.org/docs/current/plpgsql-cursors.html

"Rather than executing a whole query at once, it is possible to set up a
cursor that encapsulates the query, and then read the query result a few
rows at a time. One reason for doing this is to avoid memory overrun
when the result contains a large number of rows. (However, PL/pgSQL
users do not normally need to worry about that, since FOR loops
automatically use a cursor internally to avoid memory problems.) A more
interesting usage is to return a reference to a cursor that a function
has created, allowing the caller to read the rows. This provides an
efficient way to return large row sets from functions."

As to portal, entering it in the documentation search leads to a first
result of:

https://www.postgresql.org/docs/current/protocol-flow.html

Do a page search for portal.

--
Adrian Klaver
adrian.klaver@aklaver.com

laurenz.albe@cybertec.at

about 3 years ago

In reply to: Bryn Llewellyn (#19)

bryn@yugabyte.com

about 3 years ago

In reply to: Laurenz Albe (#21)

laurenz.albe@cybertec.at

about 3 years ago

In reply to: Bryn Llewellyn (#22)

adrian.klaver@aklaver.com

about 3 years ago

In reply to: Bryn Llewellyn (#22)

bryn@yugabyte.com

about 3 years ago

In reply to: Laurenz Albe (#23)

bryn@yugabyte.com

about 3 years ago

In reply to: Adrian Klaver (#24)