ECPG FETCH readahead
Hi,
we improved ECPG quite a lot in 9.0 because we have been
working on an Informix to PostgreSQL migration project.
We came across a pretty big performance problem that can be seen in
every "naive" application that uses only FETCH 1, FETCH RELATIVE
or FETCH ABSOLUTE. These are almost the only FETCH variations
usable in Informix, i.e. it doesn't have the grammar for fetching N rows
at once. Instead, the Client SDK libraries do caching themselves
behind the scenes to reduce network turnaround time.
This is what we implemented for ECPG: by default it fetches 256 rows
at once if possible and serves the application from memory. The number
of cached rows can be changed using the ECPGFETCHSZ environment
variable. Cursor readahead is activated by "ecpg -r fetch_readahead".
The implementation splits ECPGdo() and ecpg_execute() in ecpglib/execute.c
so the different parts are callable from the newly introduced cursor.c code.
Three new API calls are introduced: ECPGopen(), ECPGfetch() and
ECPGclose(). Obviously, OPEN and CLOSE use ECPGopen() and
ECPGclose(), respectively. They build and tear down the cache structures
besides calling the main ECPGdo() behind the scenes.
ECPGopen() also discovers the total number of records in the result set,
so the previous ECPG "deficiency" (a backend limitation) that
sqlca.sqlerrd[2] didn't report the (possibly estimated) number of rows
in the result set is now overcome. This slows down OPEN for cursors
serving larger datasets, but it makes it possible to position the
readahead window using MOVE ABSOLUTE no matter which
FORWARD/BACKWARD/ABSOLUTE/RELATIVE variants are used by the
application. And it seems the caching more than makes up for the
slowdown in OPEN.
ECPGfetch() is the more interesting one: it handles FETCH and MOVE
statements and also tracks the absolute position of the cursor on the
client side.
In Informix, the DECLARE statement is used for creating a cursor
descriptor; it can be OPENed/CLOSEd several times and the "FREE curname"
statement tears down the cursor descriptor. In our implementation, OPEN
and CLOSE set up and tear down the caching structure; the DECLARE
statement keeps its declarative nature and the FREE statement is still
usable only for prepared statements. I chose this path because this way
the feature can be used in native mode as well. It is usable even if the
application itself uses FETCH N. The readahead window can be set
externally to the application to squeeze out more performance in batch
programs.
The patch size is over 2MB because I introduced a new regression test
called fetch2.pgc that does a lot of work on a recordset having 400 rows.
It browses the recordset back and forth with:
- FETCH FORWARD 1/FETCH BACKWARD 1
- FETCH FORWARD 5/FETCH BACKWARD 5
- FETCH ABSOLUTE +N/FETCH ABSOLUTE -N
- FETCH FORWARD 3+MOVE FORWARD 7, also backwards
- FETCH RELATIVE +2/FETCH RELATIVE -2
This test is compiled both with and without "-r fetch_readahead", so
I was able to verify that the two runs produce the same output.
fetch.pgc, dyntest.pgc and sqlda.pgc are also compiled with and
without "-r fetch_readahead", to verify that both SQL and SQLDA
descriptors work the same way as before. E.g. PGresults for SQL
descriptors are not simply assigned anymore; they are copied using
PQcopyResult() without tuples, plus a series of PQsetvalue() calls,
to copy only the proper rows from the cache (or all rows if there is
no cache).
The split parts of ecpg_execute() intentionally keep the original
wording (especially the "ecpg_execute" function name) in ecpg_log()
messages, to eliminate any impact on other regression tests. If this is
not desired, a patch for this can come later.
Because of the patch size, the compressed version is attached.
Comments?
Best regards,
Zoltán Böszörményi
--
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/
Boszormenyi Zoltan wrote:
Hi,
we improved ECPG quite a lot in 9.0 because we worked and
still working with an Informix to PostgreSQL migration project.
We came across a pretty big performance problem that can be seen in
every "naive" application that uses only FETCH 1, FETCH RELATIVE
or FETCH ABSOLUTE. These are almost the only FETCH variations
usable in Informix, i.e. it doesn't have the grammar for fetching N rows
at once. Instead, the Client SDK libraries do caching themselves
behind the scenes to reduce network turnaround time.
I assume our ecpg version supports >1 fetch values, even in Informix
mode. Does it make sense to add lots of code to our ecpg then?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ None of us is going to be here forever. +
Hi,
On 2010-06-23 22:42, Bruce Momjian wrote:
Boszormenyi Zoltan wrote:
Hi,
we improved ECPG quite a lot in 9.0 because we worked and
still working with an Informix to PostgreSQL migration project.
We came across a pretty big performance problem that can be seen in
every "naive" application that uses only FETCH 1, FETCH RELATIVE
or FETCH ABSOLUTE. These are almost the only FETCH variations
usable in Informix, i.e. it doesn't have the grammar for fetching N rows
at once. Instead, the Client SDK libraries do caching themselves
behind the scenes to reduce network turnaround time.
I assume our ecpg version supports >1 fetch values, even in Informix
mode. Does it make sense to add lots of code to our ecpg then?
I think yes, it does make sense, because we are talking
about porting a whole lot of COBOL applications.
The ESQL/C or ECPG connector was already written with
the Informix quirks in mind, so it fetches only one record
at a time, passing it to the application. Similar performance
is expected from ECPG - an expectation that is not fulfilled
currently, because libecpg doesn't do the same caching as
ESQL/C does.
And FYI, I haven't added a whole lot of code; most of the
code changes in the patch are execute.c refactoring.
ECPGdo() was split into several functions, the new parts
are still doing the same things.
I can make the test case much smaller, I only needed
to test crossing the readahead window boundary.
This would also make the patch much smaller.
And this readahead is not on by default, it's only activated
by "ecpg -r fetch_readahead".
Best regards,
Zoltán Böszörményi
On 24/06/10 10:27, Böszörményi Zoltán wrote:
And this readahead is not on by default, it's only activated
by "ecpg -r fetch_readahead".
Is there a reason not to enable it by default? I'm a bit worried that it
will receive no testing if it's not always on.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On 2010-06-24 11:04, Heikki Linnakangas wrote:
On 24/06/10 10:27, Böszörményi Zoltán wrote:
And this readahead is not on by default, it's only activated
by "ecpg -r fetch_readahead".
Is there a reason not to enable it by default? I'm a bit worried that
it will receive no testing if it's not always on.
Because in the first step I wanted to minimize the impact
on regression test stderr results. This is what I mentioned
in the initial mail: I stuck to the original wording of ecpg_log()
messages in the split-up parts of the original ECPGdo() and
ecpg_execute() exactly for this reason. The usual policy for
ecpg_log() is to report the name of the function that issued the message.
I was also thinking about a new feature for pg_regress,
to compare stdout results of two regression tests automatically
so a difference can be reported as an error. It would be good
for automated testing of features in ECPG that can be toggled,
like auto-prepare and fetch readahead. It might come in handy
in other subsystems, too.
Best regards,
Zoltán Böszörményi
On Wed, Jun 23, 2010 at 04:42:37PM -0400, Bruce Momjian wrote:
I assume our ecpg version supports >1 fetch values, even in Informix
mode. Does it make sense to add lots of code to our ecpg then?
Yes, it does. The big question that Zoltan and I haven't figured out yet is
whether it makes sense to add all this even for native ecpg mode.
Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ 179140304, AIM/Yahoo/Skype michaelmeskes, Jabber meskes@jabber.org
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
I think yes, it does make sense, because we are talking
about porting a whole lot of COBOL applications.
COBOL???
The ESQL/C or ECPG connector was already written with
the Informix quirks in mind, so it fetches only one record
at a time, passing it to the application. Similar performance
is expected from ECPG - an expectation that is not fulfilled
currently, because libecpg doesn't do the same caching as
ESQL/C does.
Eh, you are talking about a program you wrote for your customer or they wrote
themselves, right? I simply refuse to add this stuff only to fix this situation
for that one customer of yours if it only hits them. Now the thing to discuss
is how common this situation is.
Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ 179140304, AIM/Yahoo/Skype michaelmeskes, Jabber meskes@jabber.org
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
On Thu, Jun 24, 2010 at 12:04:30PM +0300, Heikki Linnakangas wrote:
Is there a reason not to enable it by default? I'm a bit worried
that it will receive no testing if it's not always on.
Yes, there is a reason, namely that you don't need it in native mode at all.
ECPG can read as many records as you want in one go, something ESQL/C
apparently cannot do. This patch is a workaround for that restriction. I still
do not really see that this feature gives us an advantage in native mode
though.
Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ 179140304, AIM/Yahoo/Skype michaelmeskes, Jabber meskes@jabber.org
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
On Jun 24, 2010, at 2:13 PM, Michael Meskes wrote:
I think yes, it does make sense, because we are talking
about porting a whole lot of COBOL applications.
COBOL???
Yes, COBOL :).
It is much more common than people think.
It is not the first COBOL request for PostgreSQL hitting my desk.
In our concrete example we are using a C module written with ECPG which is magically attached to tons of COBOL code ...
The ESQL/C or ECPG connector was already written
the Informix quirks in mind, so it fetches only one record
at a time passing it to the application. And similar performance
is expected from ECPG - an expectation that is not fulfilled
currently because libecpg doesn't do the same caching as
ESQL/C does.
Eh, you are talking about a program you wrote for your customer or they wrote
themselves, right? I simply refuse to add this stuff only to fix this situation
for that one customer of yours if it only hits them. Now the thing to discuss
is how common this situation is.
Michael
I think that this cursor issue is a pretty common thing in many codebases.
People are usually not aware of the fact that the network round trips and parsing that naturally come with "FETCH 1" are a lot more expensive than fetching one row somewhere deep inside the DB engine.
Out there, there are many applications which fetch data row by row. If an app fetches data row by row, it will be A LOT slower in PostgreSQL than in, say, Informix, because most commercial database clients cache data inside a cursor behind the scenes to avoid the problem we are trying to solve.
Currently we are talking about a performance penalty of a factor of 5 or so - it is not a small thing; it is a big difference.
regards,
hans
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
On 2010-06-24 14:13, Michael Meskes wrote:
I think yes, it does make sense, because we are talking
about porting a whole lot of COBOL applications.
COBOL???
Yes, OpenCOBOL...
The ESQL/C or ECPG connector was already written
the Informix quirks in mind, so it fetches only one record
at a time passing it to the application. And similar performance
is expected from ECPG - an expectation that is not fulfilled
currently because libecpg doesn't do the same caching as
ESQL/C does.
Eh, you are talking about a program you wrote for your customer or they wrote
themselves, right? I simply refuse to add this stuff only to fix this situation
for that one customer of yours if it only hits them. Now the thing to discuss
is how common this situation is.
The OpenCOBOL database connector was written by them,
but the problem is more generic. There are many "naive"
applications (elsewhere, too) that use cursors but fetch
one record at a time, perhaps for portability reasons.
This patch provides a big performance boost for those.
Best regards,
Zoltán Böszörményi
On Thu, Jun 24, 2010 at 8:19 AM, Michael Meskes <meskes@postgresql.org> wrote:
On Thu, Jun 24, 2010 at 12:04:30PM +0300, Heikki Linnakangas wrote:
Is there a reason not to enable it by default? I'm a bit worried
that it will receive no testing if it's not always on.
Yes, there is a reason, namely that you don't need it in native mode at all.
ECPG can read as many records as you want in one go, something ESQL/C
apparently cannot do. This patch is a workaround for that restriction. I still
do not really see that this feature gives us an advantage in native mode
though.
So, who has the next action on this patch? Does Zoltan need to revise
it, or does Michael need to review it, or where are we?
Thanks,
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
Robert Haas wrote:
On Thu, Jun 24, 2010 at 8:19 AM, Michael Meskes <meskes@postgresql.org> wrote:
On Thu, Jun 24, 2010 at 12:04:30PM +0300, Heikki Linnakangas wrote:
Is there a reason not to enable it by default? I'm a bit worried
that it will receive no testing if it's not always on.
Yes, there is a reason, namely that you don't need it in native mode at all.
ECPG can read as many records as you want in one go, something ESQL/C
apparently cannot do. This patch is a workaround for that restriction. I still
do not really see that this feature gives us an advantage in native mode
though.
So, who has the next action on this patch? Does Zoltan need to revise
it, or does Michael need to review it, or where are we?
Michael reviewed it briefly in private, and I need to send
a new revision anyway, regardless of his comments.
I will refresh it ASAP.
Best regards,
Zoltán Böszörményi
--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/
On 2010-10-14 11:56, Boszormenyi Zoltan wrote:
Hi,
Robert Haas wrote:
On Thu, Jun 24, 2010 at 8:19 AM, Michael Meskes <meskes@postgresql.org> wrote:
On Thu, Jun 24, 2010 at 12:04:30PM +0300, Heikki Linnakangas wrote:
Is there a reason not to enable it by default? I'm a bit worried
that it will receive no testing if it's not always on.
Yes, there is a reason, namely that you don't need it in native mode at all.
ECPG can read as many records as you want in one go, something ESQL/C
apparently cannot do. This patch is a workaround for that restriction. I still
do not really see that this feature gives us an advantage in native mode
though.
So, who has the next action on this patch? Does Zoltan need to revise
it, or does Michael need to review it, or where are we?
Michael reviewed it briefly in private, and I need to send
a new revision anyway, regardless of his comments.
I will refresh it ASAP.
The ASAP took a little long. The attached patch is in git diff format
because (1) the regression test intentionally doesn't call ECPGdebug(),
so the patch isn't dominated by a 2MB stderr file (this file is empty),
and (2) regular diff cannot cope with empty new files.
Anyway, here is the new version with a new feature.
Implementation details:
- New ecpglib/cursor.c handles the client-side accounting of cursors.
- Functions in ecpglib/execute.c are split into smaller functions
so useful operations can be used by the new cursor.c
- ecpg -r fetch_readahead enables readahead by default for all cursors.
- Default readahead size is 256 rows, it can be modified by an environment
variable, ECPGFETCHSZ.
- *NEW FEATURE* Readahead can be individually enabled or disabled
by ECPG-side grammar:
DECLARE curname [ [ NO ] READAHEAD ] CURSOR FOR ...
Without [ NO ] READAHEAD, the default behaviour is used for cursors.
- Since the server and the client may disagree on the cursor position
if readahead is used, ECPG preprocessor throws an error if
WHERE CURRENT OF is used on such cursors.
- The default assumed behaviour of cursors with readahead is NO SCROLL.
If you want readahead and backward scrolling, the SCROLL keyword is mandatory.
The regression test creates a table with 513 rows, so it spans two full
readahead windows and one incomplete one. It reads the table with two
cursors, one with readahead and one without, one record at a time
forward and backward. This is repeated reading 5 records at a time.
Then the whole table is read into a chain of sqlda structures, forward
and backward. All other regression tests pass as well.
The original regression tests also pass with these changes; the split of
execute.c was risky in this regard. Now the split is done more cleanly
than in the previous version, and the file is not as rearranged as before.
Best regards,
Zoltán Böszörményi
--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/
Attachments:
ecpg-cursor-readahead-9.2-git-v2.patch (text/plain, +2390 -242)
Hi,
On 2011-11-16 20:51, Boszormenyi Zoltan wrote:
On 2010-10-14 11:56, Boszormenyi Zoltan wrote:
Hi,
Robert Haas wrote:
On Thu, Jun 24, 2010 at 8:19 AM, Michael Meskes <meskes@postgresql.org> wrote:
On Thu, Jun 24, 2010 at 12:04:30PM +0300, Heikki Linnakangas wrote:
Is there a reason not to enable it by default? I'm a bit worried
that it will receive no testing if it's not always on.
Yes, there is a reason, namely that you don't need it in native mode at all.
ECPG can read as many records as you want in one go, something ESQL/C
apparently cannot do. This patch is a workaround for that restriction. I still
do not really see that this feature gives us an advantage in native mode
though.
So, who has the next action on this patch? Does Zoltan need to revise
it, or does Michael need to review it, or where are we?
Michael reviewed it briefly in private, and I need to send
a new revision anyway, regardless of his comments.
I will refresh it ASAP.
The ASAP took a little long. The attached patch is in git diff format
because (1) the regression test intentionally doesn't call ECPGdebug(),
so the patch isn't dominated by a 2MB stderr file (this file is empty),
and (2) regular diff cannot cope with empty new files.
Anyway, here is the new version with a new feature.
Implementation details:
- New ecpglib/cursor.c handles the client-side accounting of cursors.
- Functions in ecpglib/execute.c are split into smaller functions
so useful operations can be used by the new cursor.c
- ecpg -r fetch_readahead enables readahead by default for all cursors.
- Default readahead size is 256 rows, it can be modified by an environment
variable, ECPGFETCHSZ.
- *NEW FEATURE* Readahead can be individually enabled or disabled
by ECPG-side grammar:
DECLARE curname [ [ NO ] READAHEAD ] CURSOR FOR ...
Without [ NO ] READAHEAD, the default behaviour is used for cursors.
- Since the server and the client may disagree on the cursor position
if readahead is used, ECPG preprocessor throws an error if
WHERE CURRENT OF is used on such cursors.
- The default assumed behaviour of cursors with readahead is NO SCROLL.
If you want readahead and backward scrolling, the SCROLL keyword is mandatory.
The regression test creates a table with 513 rows, so it spans two full
readahead windows and one incomplete one. It reads the table with two
cursors, one with readahead and one without, one record at a time
forward and backward.
This is repeated with reading 5 records at a time. Then the whole table is
read into a chain of sqlda structures forward and backward. All other
regression tests pass as well.
The original regression tests also pass with these changes; the split of
execute.c was risky in this regard. Now the split is done more cleanly
than in the previous version, the file is not as rearranged as before.
New patch attached against yesterday's GIT tree. Changes:
- documented the new ECPG_INVALID_CURSOR error code
- consistently free everything in error paths in cursor.c
Best regards,
Zoltán Böszörményi
--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/
Attachments:
ecpg-cursor-readahead-9.2-git-v5.patch (text/x-patch, +2423 -242)
On Thu, Dec 29, 2011 at 10:46:23AM +0100, Boszormenyi Zoltan wrote:
On 2011-11-16 20:51, Boszormenyi Zoltan wrote:
On 2010-10-14 11:56, Boszormenyi Zoltan wrote:
On Thu, Jun 24, 2010 at 8:19 AM, Michael Meskes <meskes@postgresql.org> wrote:
On Thu, Jun 24, 2010 at 12:04:30PM +0300, Heikki Linnakangas wrote:
Is there a reason not to enable it by default? I'm a bit worried
that it will receive no testing if it's not always on.
Yes, there is a reason, namely that you don't need it in native mode at all.
ECPG can read as many records as you want in one go, something ESQL/C
apparently cannot do. This patch is a workaround for that restriction. I still
do not really see that this feature gives us an advantage in native mode
though.
We yet lack a consensus on whether native ECPG apps should have access to the
feature. My 2c is to make it available, because it's useful syntactic sugar.
If your program independently processes each row of an arbitrary-length result
set, current facilities force you to add an extra outer loop to batch the
FETCHes for every such code site. Applications could define macros to
abstract that pattern, but this seems common enough to justify bespoke
handling. Besides, minimalists already use libpq directly.
I suggest enabling the feature by default but drastically reducing the default
readahead chunk size from 256 to, say, 5. That still reduces the FETCH round
trip overhead by 80%, but it's small enough not to attract pathological
behavior on a workload where each row is a 10 MiB document. I would not offer
an ecpg-time option to disable the feature per se. Instead, let the user set
the default chunk size at ecpg time. A setting of 1 effectively disables the
feature, though one could later re-enable it with ECPGFETCHSZ.
The ASAP took a little long. The attached patch is in git diff format
because (1) the regression test intentionally doesn't call ECPGdebug(),
so the patch isn't dominated by a 2MB stderr file (this file is empty),
and (2) regular diff cannot cope with empty new files.
Avoid the empty file with fprintf(STDERR, "Dummy non-empty error output\n");
- *NEW FEATURE* Readahead can be individually enabled or disabled
by ECPG-side grammar:
DECLARE curname [ [ NO ] READAHEAD ] CURSOR FOR ...
Without [ NO ] READAHEAD, the default behaviour is used for cursors.
Likewise, this may as well take a chunk size rather than a yes/no.
The patch adds warnings:
error.c: In function `ecpg_raise':
error.c:281: warning: format not a string literal and no format arguments
error.c:281: warning: format not a string literal and no format arguments
The patch adds few comments and no larger comments explaining its higher-level
ideas. That makes it much harder to review. In this regard it follows the
tradition of the ECPG code, but let's depart from that tradition for new work.
I mention a few cases below where the need for commentary is acute.
I tested full reads of various "SELECT * FROM generate_series(1, $1)" commands
over a 50ms link, and the patch gives a sound performance improvement. With
no readahead, a mere N=100 takes 5s to drain. With readahead enabled, N=10000
takes only 3s. Performance was quite similar to that of implementing my own
batching with "FETCH 256 FROM cur". When I kicked up ECPGFETCHSZ (after
fixing its implementation -- see below) past the result set size, performance
was comparable to that of simply passing the query through psql.
--- a/doc/src/sgml/ecpg.sgml
+++ b/doc/src/sgml/ecpg.sgml
@@ -5289,6 +5315,17 @@ while (1)
 </varlistentry>

 <varlistentry>
+ <term>-231 (<symbol>ECPG_INVALID_CURSOR</symbol>)</term>
+ <listitem>
+  <para>
+   The cursor you are trying to use with readahead has not been opened yet (SQLSTATE 34000),
+   invalid values were passed to libecpg (SQLSTATE 42P11) hor not in prerequisite state, i.e.
Typo ("hor").
--- /dev/null
+++ b/src/interfaces/ecpg/ecpglib/cursor.c
@@ -0,0 +1,730 @@
cursor.c contains various >78-col lines. pgindent has limited ability to
improve those, so please split them.
+static struct cursor_descriptor *
+add_cursor(int lineno, struct connection *con, const char *name, bool scrollable, int64 n_tuples, bool *existing)
+{
+    struct cursor_descriptor *desc,
+               *ptr, *prev = NULL;
+    bool found = false;
+
+    if (!name || name[0] == '\0')
+    {
+        if (existing)
+            *existing = false;
+        return NULL;
+    }
+
+    ptr = con->cursor_desc;
+    while (ptr)
+    {
+        int ret = strcasecmp(ptr->name, name);
+
+        if (ret == 0)
+        {
+            found = true;
+            break;
+        }
+        if (ret > 0)
+            break;
+
+        prev = ptr;
+        ptr = ptr->next;
+    }
Any reason not to use find_cursor() here?
+static void
+del_cursor(struct connection *con, const char *name)
+{
+    struct cursor_descriptor *ptr, *prev = NULL;
+    bool found = false;
+
+    ptr = con->cursor_desc;
+    while (ptr)
+    {
+        int ret = strcasecmp(ptr->name, name);
+
+        if (ret == 0)
+        {
+            found = true;
+            break;
+        }
+        if (ret > 0)
+            break;
+
+        prev = ptr;
+        ptr = ptr->next;
+    }
Any reason not to use find_cursor() here?
+bool
+ECPGopen(const int lineno, const int compat, const int force_indicator,
+         const char *connection_name, const bool questionmarks,
+         const char *curname, const int st, const char *query, ...)
+{
+    va_list args;
+    bool ret, scrollable;
+    char *new_query, *ptr, *whold, *noscroll, *scroll, *dollar0;
+    struct sqlca_t *sqlca = ECPGget_sqlca();
+
+    if (!query)
+    {
+        ecpg_raise(lineno, ECPG_EMPTY, ECPG_SQLSTATE_ECPG_INTERNAL_ERROR, NULL);
+        return false;
+    }
+    ptr = strstr(query, "for ");
+    if (!ptr)
+    {
+        ecpg_raise(lineno, ECPG_INVALID_STMT, ECPG_SQLSTATE_ECPG_INTERNAL_ERROR, NULL);
+        return false;
+    }
+    whold = strstr(query, "with hold ");
+    dollar0 = strstr(query, "$0");
+
+    noscroll = strstr(query, "no scroll ");
+    scroll = strstr(query, "scroll ");
A query like 'SELECT 1 AS "with hold "' fools these lexical tests. Capture
that information in the parser rather than attempting to reconstruct it here.
+    scrollable = (noscroll == NULL) && (scroll != NULL) && (scroll < ptr);
+
+    new_query = ecpg_alloc(strlen(curname) + strlen(ptr) + (whold ? 10 : 0) + 32, lineno);
+    if (!new_query)
+        return false;
+    sprintf(new_query, "declare %s %s cursor %s%s",
+            (dollar0 && (dollar0 < ptr) ? "$0" : curname),
+            (scrollable ? "scroll" : "no scroll"),
+            (whold ? "with hold " : ""),
+            ptr);
+
+    /* Set the fetch size the first time we are called. */
+    if (fetch_size == 0)
+    {
+        char *fsize_str = getenv("ECPGFETCHSZ");
+        char *endptr = NULL;
+        int fsize;
+
+        if (fsize_str)
+        {
+            fsize = strtoul(fsize_str, &endptr, 10);
+            if (endptr || (fsize < 4))
+                fetch_size = DEFAULTFETCHSIZE;
"endptr" will never be NULL; the test must be on "*endptr". As it stands,
the code always ignores ECPGFETCHSZ. An unusable ECPGFETCHSZ should
produce an error, not silently have no effect. Why a minimum of 4?
+            else
+                fetch_size = fsize;
+        }
+        else
+            fetch_size = DEFAULTFETCHSIZE;
+    }
+
+    va_start(args, query);
+    ret = ecpg_do(lineno, compat, force_indicator, connection_name, questionmarks, st, new_query, args);
+    va_end(args);
+
+    ecpg_free(new_query);
+
+    /*
+     * If statement went OK, add the cursor and discover the
+     * number of rows in the recordset. This will slow down OPEN
+     * but we gain a lot with caching.
+     */
+    if (ret /* && sqlca->sqlerrd[2] == 0 */)
Why is the commented code there?
+    {
+        struct connection *con = ecpg_get_connection(connection_name);
+        struct cursor_descriptor *cur;
+        bool existing;
+        int64 n_tuples;
+
+        if (scrollable)
+        {
+            PGresult *res;
+            char *query;
+            char *endptr = NULL;
+
+            query = ecpg_alloc(strlen(curname) + strlen("move all in ") + 2, lineno);
+            sprintf(query, "move all in %s", curname);
+            res = PQexec(con->connection, query);
+            n_tuples = strtoull(PQcmdTuples(res), &endptr, 10);
+            PQclear(res);
+            ecpg_free(query);
+
+            /* Go back to the beginning of the resultset. */
+            query = ecpg_alloc(strlen(curname) + strlen("move absolute 0 in ") + 2, lineno);
+            sprintf(query, "move absolute 0 in %s", curname);
+            res = PQexec(con->connection, query);
+            PQclear(res);
+            ecpg_free(query);
+        }
+        else
+        {
+            n_tuples = 0;
+        }
You give this rationale for the above code:
On Thu, Jun 17, 2010 at 02:09:47PM +0200, Boszormenyi Zoltan wrote:
ECPGopen() also discovers the total number of records in the result set,
so the previous ECPG "deficiency" (a backend limitation) that
sqlca.sqlerrd[2] didn't report the (possibly estimated) number of rows
in the result set is now overcome. This slows down OPEN for cursors
serving larger datasets, but it makes it possible to position the
readahead window using MOVE ABSOLUTE no matter which
FORWARD/BACKWARD/ABSOLUTE/RELATIVE variants are used by the
application. And it seems the caching more than makes up for the
slowdown in OPEN.
From the documentation for Informix and Oracle, those databases do not
populate sqlerrd[2] this way:
http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.sqlt.doc/sqltmst189.htm
http://docs.oracle.com/cd/A57673_01/DOC/api/doc/PC_22/ch10.htm#toc139
The performance impact will vary widely depending on the query cost per row
and the fraction of rows the application will actually retrieve. Consider a
complex aggregate returning only a handful of rows. Consider SELECT * on a
1B-row table with the application ceasing reads after 1000 rows. Performance
aside, this will yield double execution of any volatile functions involved.
So, I think we ought to diligently avoid this step. (Failing that, the
documentation must warn about the extra full cursor scan and this feature must
stay disabled by default.)
+
+        /* Add the cursor */
+        cur = add_cursor(lineno, con, curname, scrollable, n_tuples, &existing);
+
+        /*
+         * Report the number of tuples for the [scrollable] cursor.
+         * The server didn't do it for us.
+         */
+        sqlca->sqlerrd[2] = (cur->n_tuples < LONG_MAX ? cur->n_tuples : LONG_MAX);
+    }
+
+    return ret;
+}
+
+static bool
+ecpg_cursor_execute(struct statement * stmt, struct cursor_descriptor *cur)
+{
+    char tmp[64];
+    char *query;
+    int64 start_pos;
+
+    if ((cur->cache_pos >= cur->start_pos) && cur->res && (cur->cache_pos < cur->start_pos + PQntuples(cur->res)))
+    {
+        stmt->results = cur->res;
+        ecpg_free_params(stmt, true, stmt->lineno);
+        return true;
+    }
Why does ecpg_cursor_execute() also call ecpg_free_params()? Offhand, it
seems that ECPGfetch() always takes care of that and is the more appropriate
place, seeing it's the one calling ecpg_build_params().
--- a/src/interfaces/ecpg/ecpglib/data.c
+++ b/src/interfaces/ecpg/ecpglib/data.c
@@ -120,7 +120,7 @@ check_special_value(char *ptr, double *retval, char **endptr)
 }
 
 bool
-ecpg_get_data(const PGresult *results, int act_tuple, int act_field, int lineno,
+ecpg_get_data(const PGresult *results, int var_index, int act_tuple, int act_field, int lineno,
              enum ECPGttype type, enum ECPGttype ind_type, char *var, char *ind,
              long varcharsize, long offset, long ind_offset, enum ARRAY_TYPE isarray, enum COMPAT_MODE compat, bool force_indicator)
This function could sure use a block comment such as would be customary in
src/backend. Compare the one at heap_update(), for example.
--- a/src/interfaces/ecpg/ecpglib/error.c
+++ b/src/interfaces/ecpg/ecpglib/error.c
@@ -268,6 +268,20 @@ ecpg_raise(int line, int code, const char *sqlstate, const char *str)
 					 ecpg_gettext("could not connect to database \"%s\" on line %d"),
 					 str, line);
 			break;

+		case ECPG_INVALID_CURSOR:
+			if (strcmp(sqlstate, ECPG_SQLSTATE_OBJECT_NOT_IN_PREREQUISITE_STATE) == 0)
+				snprintf(sqlca->sqlerrm.sqlerrmc, sizeof(sqlca->sqlerrm.sqlerrmc),
+
+				/*
+				 * translator: this string will be truncated at 149 characters
+				 * expanded.
+				 */
+						 ecpg_gettext("cursor can only scan forward"));
Every other message works in the line number somehow; this should do the same.
--- a/src/interfaces/ecpg/ecpglib/execute.c +++ b/src/interfaces/ecpg/ecpglib/execute.c
@@ -1707,46 +1809,20 @@ ecpg_execute(struct statement * stmt)
 }

 bool
-ECPGdo(const int lineno, const int compat, const int force_indicator, const char *connection_name, const bool questionmarks, const int st, const char *query,...)
+ecpg_do_prologue(int lineno, const int compat, const int force_indicator,
+				 const char *connection_name, const bool questionmarks,
+				 enum ECPG_statement_type statement_type, const char *query,
+				 va_list args, struct statement **stmt_out)
A block comment would especially help this function, considering the name
tells one so little.
--- a/src/interfaces/ecpg/ecpglib/extern.h
+++ b/src/interfaces/ecpg/ecpglib/extern.h
@@ -60,6 +60,12 @@ struct statement
 	bool		questionmarks;
 	struct variable *inlist;
 	struct variable *outlist;
+	char	   *oldlocale;
+	const char **dollarzero;
+	int			ndollarzero;
+	const char **param_values;
+	int			nparams;
+	PGresult   *results;
 };
Please comment the members of this struct like we do in most of src/include.
dollarzero has something to do with dynamic cursor names, right? Does it have
other roles?
--- a/src/interfaces/ecpg/preproc/ecpg.header +++ b/src/interfaces/ecpg/preproc/ecpg.header
@@ -111,6 +115,26 @@ mmerror(int error_code, enum errortype type, const char *error, ...)
 }

+/*
+ * set use_fetch_readahead based on the current cursor
+ * doesn't return if the cursor is not declared
+ */
+static void
+set_cursor_readahead(const char *curname)
+{
+	struct cursor *ptr;
+
+	for (ptr = cur; ptr != NULL; ptr = ptr->next)
+	{
+		if (strcmp(ptr->name, curname) == 0)
+			break;
+	}
+	if (!ptr)
+		mmerror(PARSE_ERROR, ET_FATAL, "cursor \"%s\" does not exist", curname);
+
+	use_fetch_readahead = ptr->fetch_readahead;
+}
Following add_additional_variables(), use strcasecmp() for literal cursor
names and strcmp() for cursor name host variables.
--- a/src/interfaces/ecpg/preproc/extern.h
+++ b/src/interfaces/ecpg/preproc/extern.h
@@ -24,12 +24,17 @@
 extern bool autocommit,
 			force_indicator,
 			questionmarks,
 			regression_mode,
-			auto_prepare;
+			auto_prepare,
+			fetch_readahead;
+extern bool use_fetch_readahead;
The names of the last two variables don't make clear the difference between
them. I suggest default_fetch_readahead and current_fetch_readahead.
--- /dev/null
+++ b/src/interfaces/ecpg/test/preproc/cursor-readahead.pgc
@@ -0,0 +1,244 @@
+#include <stdio.h>
+#include <malloc.h>
Why <malloc.h>? I only see this calling free(); use <stdlib.h> instead.
Thanks,
nm
Hi,
first, thank you for answering and for the review.
2012-03-02 17:41 keltezéssel, Noah Misch írta:
On Thu, Dec 29, 2011 at 10:46:23AM +0100, Boszormenyi Zoltan wrote:
2011-11-16 20:51 keltezéssel, Boszormenyi Zoltan írta:
2010-10-14 11:56 keltezéssel, Boszormenyi Zoltan írta:
On Thu, Jun 24, 2010 at 8:19 AM, Michael Meskes <meskes@postgresql.org> wrote:
On Thu, Jun 24, 2010 at 12:04:30PM +0300, Heikki Linnakangas wrote:
Is there a reason not to enable it by default? I'm a bit worried
that it will receive no testing if it's not always on.

Yes, there is a reason, namely that you don't need it in native mode at all.
ECPG can read as many records as you want in one go, something ESQL/C
apparently cannot do. This patch is a workaround for that restriction. I still
do not really see that this feature gives us an advantage in native mode
though.

We yet lack a consensus on whether native ECPG apps should have access to the
feature.
I don't even remember any opinion on this matter.
So, at this point I don't know whether it's lack of interest.
We also have a saying "silence means agreement". :-)
My 2c is to make it available, because it's useful syntactic sugar.
Thanks, we thought the same.
If your program independently processes each row of an arbitrary-length result
set, current facilities force you to add an extra outer loop to batch the
FETCHes for every such code site. Applications could define macros to
abstract that pattern, but this seems common-enough to justify bespoke
handling.
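For illustration, the extra outer loop Noah describes can be sketched in plain C; the fetch_batch() stub below is hypothetical and merely stands in for a real "FETCH 256 FROM cur" round trip:

```c
#define BATCH 256

/* Hypothetical stub standing in for FETCH 256: fills 'rows' with up to
 * 'max' rows, returns how many were fetched (0 when the cursor drains). */
static int fetch_batch(int *rows, int max, int *remaining)
{
	int		n = (*remaining < max) ? *remaining : max;
	int		i;

	for (i = 0; i < n; i++)
		rows[i] = i;
	*remaining -= n;
	return n;
}

/* Without readahead, batching forces two nested loops at every call
 * site; with readahead the inner loop disappears and the application
 * keeps its natural one-row-at-a-time shape. */
int process_all(int total)
{
	int		rows[BATCH];
	int		processed = 0;
	int		remaining = total;
	int		n, i;

	while ((n = fetch_batch(rows, BATCH, &remaining)) > 0)	/* outer: per batch */
		for (i = 0; i < n; i++)								/* inner: per row */
			processed++;									/* process one row */
	return processed;
}
```

With readahead, only the per-row loop survives in application code; the batching moves into ecpglib.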
We have similar opinions.
Besides, minimalists already use libpq directly.
Indeed. On the other hand, ECPG provides a safety net with syntax checking,
so it's useful for non-minimalist types. :-)
I suggest enabling the feature by default but drastically reducing the default
readahead chunk size from 256 to, say, 5.
That still reduces the FETCH round
trip overhead by 80%, but it's small enough not to attract pathological
behavior on a workload where each row is a 10 MiB document.
I see. How about 8? A nice "round" power-of-2 value, still small, and it
avoids 87.5% of the overhead.
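The percentages follow directly from how many FETCH round trips a chunk size eliminates; a small sketch (function name is mine) of the arithmetic:

```c
/* Round trips needed to drain 'rows' rows when fetching 'chunk' at a
 * time: a chunk of N cuts the trips to 1/N of FETCH 1, i.e. it saves
 * (N-1)/N of them -- 80% for chunk 5, 87.5% for chunk 8. */
static int round_trips(int rows, int chunk)
{
	return (rows + chunk - 1) / chunk;	/* ceiling division */
}
```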
BTW, the default disabled behaviour was to avoid "make check" breakage,
see below.
I would not offer
an ecpg-time option to disable the feature per se. Instead, let the user set
the default chunk size at ecpg time. A setting of 1 effectively disables the
feature, though one could later re-enable it with ECPGFETCHSZ.
This means all code previously going through ECPGdo() would go through
ECPGopen()/ECPGfetch()/ECPGclose(). This is more intrusive, and all
regression tests that were only testing certain features would also
test the readahead feature.
Also, the test for WHERE CURRENT OF at ecpg time would have to be done
at runtime, possibly making previously working code fail if ECPGFETCHSZ is enabled.
How about still allowing "NO READAHEAD" cursors that compile into plain ECPGdo()?
This way, ECPGFETCHSZ doesn't interfere with WHERE CURRENT OF. But this would
mean code changes everywhere where WHERE CURRENT OF is used.
Or how about a new feature in the backend, so ECPG can do
UPDATE/DELETE ... WHERE OFFSET N OF cursor
and the offset is computed from the actual cursor position and the position known
by the application? This way an app can do readahead and do work on rows collected
by the cursor with WHERE CURRENT OF which gets converted to WHERE OFFSET OF
behind the scenes.
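Since the proposed WHERE OFFSET N OF statement doesn't exist yet, the client-side bookkeeping it would need is only a sketch; the function name and sign convention below are mine:

```c
/* With readahead, the server-side cursor sits at the end of the cached
 * window while the application is logically on some row inside it.  A
 * hypothetical WHERE OFFSET N OF would need N computed as the distance
 * from the server's cursor position back to the application's row. */
static long current_of_offset(long server_pos, long app_pos)
{
	/* e.g. app on row 10, server parked at row 256: offset is -246 */
	return app_pos - server_pos;
}
```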
The ASAP took a little long. The attached patch is in git diff format,
because (1) the regression test intentionally doesn't do ECPGdebug(),
so the patch isn't dominated by a 2MB stderr file (this file is empty),
and (2) regular diff cannot cope with empty new files.

Avoid the empty file with fprintf(STDERR, "Dummy non-empty error output\n");
Fixed.
- *NEW FEATURE* Readahead can be individually enabled or disabled
by ECPG-side grammar:
DECLARE curname [ [ NO ] READAHEAD ] CURSOR FOR ...
Without [ NO ] READAHEAD, the default behaviour is used for cursors.

Likewise, this may as well take a chunk size rather than a yes/no.
Done.
The patch adds warnings:
error.c: In function `ecpg_raise':
error.c:281: warning: format not a string literal and no format arguments
error.c:281: warning: format not a string literal and no format arguments
Fixed.
The patch adds few comments and no larger comments explaining its higher-level
ideas. That makes it much harder to review. In this regard it follows the
tradition of the ECPG code, but let's depart from that tradition for new work.
I mention a few cases below where the need for commentary is acute.
Understood. Adding comments as I go over that code again.
I tested full reads of various "SELECT * FROM generate_series(1, $1)" commands
over a 50ms link, and the patch gives a sound performance improvement. With
no readahead, a mere N=100 takes 5s to drain. With readahead enabled, N=10000
takes only 3s. Performance was quite similar to that of implementing my own
batching with "FETCH 256 FROM cur". When I kicked up ECPGFETCHSZ (after
fixing its implementation -- see below) past the result set size, performance
was comparable to that of simply passing the query through psql.

--- a/doc/src/sgml/ecpg.sgml
+++ b/doc/src/sgml/ecpg.sgml
@@ -5289,6 +5315,17 @@ while (1)
      </varlistentry>

      <varlistentry>
+      <term>-231 (<symbol>ECPG_INVALID_CURSOR</symbol>)</term>
+      <listitem>
+       <para>
+        The cursor you are trying to use with readahead has not been opened yet (SQLSTATE 34000),
+        invalid values were passed to libecpg (SQLSTATE 42P11) hor not in prerequisite state, i.e.

Typo.
Fixed.
--- /dev/null
+++ b/src/interfaces/ecpg/ecpglib/cursor.c
@@ -0,0 +1,730 @@

cursor.c contains various >78-col lines. pgindent has limited ability to
improve those, so please split them.

+static struct cursor_descriptor *
+add_cursor(int lineno, struct connection *con, const char *name, bool scrollable, int64 n_tuples, bool *existing)
+{
+	struct cursor_descriptor *desc,
+			   *ptr, *prev = NULL;
+	bool		found = false;
+
+	if (!name || name[0] == '\0')
+	{
+		if (existing)
+			*existing = false;
+		return NULL;
+	}
+
+	ptr = con->cursor_desc;
+	while (ptr)
+	{
+		int			ret = strcasecmp(ptr->name, name);
+
+		if (ret == 0)
+		{
+			found = true;
+			break;
+		}
+		if (ret > 0)
+			break;
+
+		prev = ptr;
+		ptr = ptr->next;
+	}

Any reason not to use find_cursor() here?
Because both add_cursor() and del_cursor() needs the "prev" pointer.
I now modified find_cursor() and they use it.
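A shared lookup that also reports the predecessor might look like this sketch (struct trimmed to the fields needed here; the real one carries more state):

```c
#include <stddef.h>
#include <strings.h>

struct cursor_descriptor
{
	char		name[64];
	struct cursor_descriptor *next;
};

/* Hypothetical sketch of a find_cursor() usable by both add_cursor()
 * and del_cursor(): walk the name-sorted list, return the match (or
 * NULL) and hand back the preceding node via *prev_out, which is the
 * insertion/deletion point both callers need. */
static struct cursor_descriptor *
find_cursor(struct cursor_descriptor *head, const char *name,
			struct cursor_descriptor **prev_out)
{
	struct cursor_descriptor *ptr,
			   *prev = NULL;

	for (ptr = head; ptr != NULL; prev = ptr, ptr = ptr->next)
	{
		int			cmp = strcasecmp(ptr->name, name);

		if (cmp == 0)
			break;				/* exact match found */
		if (cmp > 0)
		{
			ptr = NULL;			/* passed the sorted position: not present */
			break;
		}
	}
	if (prev_out)
		*prev_out = prev;
	return ptr;
}
```

add_cursor() links a new node after *prev_out when the return is NULL; del_cursor() unlinks the returned node from *prev_out.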
+static void
+del_cursor(struct connection *con, const char *name)
+{
+	struct cursor_descriptor *ptr,
+			   *prev = NULL;
+	bool		found = false;
+
+	ptr = con->cursor_desc;
+	while (ptr)
+	{
+		int			ret = strcasecmp(ptr->name, name);
+
+		if (ret == 0)
+		{
+			found = true;
+			break;
+		}
+		if (ret > 0)
+			break;
+
+		prev = ptr;
+		ptr = ptr->next;
+	}

Any reason not to use find_cursor() here?
+bool
+ECPGopen(const int lineno, const int compat, const int force_indicator,
+		 const char *connection_name, const bool questionmarks,
+		 const char *curname, const int st, const char *query, ...)
+{
+	va_list		args;
+	bool		ret, scrollable;
+	char	   *new_query, *ptr, *whold, *noscroll, *scroll, *dollar0;
+	struct sqlca_t *sqlca = ECPGget_sqlca();
+
+	if (!query)
+	{
+		ecpg_raise(lineno, ECPG_EMPTY, ECPG_SQLSTATE_ECPG_INTERNAL_ERROR, NULL);
+		return false;
+	}
+	ptr = strstr(query, "for ");
+	if (!ptr)
+	{
+		ecpg_raise(lineno, ECPG_INVALID_STMT, ECPG_SQLSTATE_ECPG_INTERNAL_ERROR, NULL);
+		return false;
+	}
+	whold = strstr(query, "with hold ");
+	dollar0 = strstr(query, "$0");
+
+	noscroll = strstr(query, "no scroll ");
+	scroll = strstr(query, "scroll ");

A query like 'SELECT 1 AS "with hold "' fools these lexical tests.
But SELECT 1 AS "with hold" doesn't go through ECPGopen(), it's run by ECPGdo()
so no breakage there. ecpglib functions are not intended to be called from manually
constructed C code.
Capture
that information in the parser rather than attempting to reconstruct it here.
Okay, this makes sense anyway.
+	scrollable = (noscroll == NULL) && (scroll != NULL) && (scroll < ptr);
+
+	new_query = ecpg_alloc(strlen(curname) + strlen(ptr) + (whold ? 10 : 0) + 32, lineno);
+	if (!new_query)
+		return false;
+	sprintf(new_query, "declare %s %s cursor %s%s",
+			(dollar0 && (dollar0 < ptr) ? "$0" : curname),
+			(scrollable ? "scroll" : "no scroll"),
+			(whold ? "with hold " : ""),
+			ptr);
+
+	/* Set the fetch size the first time we are called. */
+	if (fetch_size == 0)
+	{
+		char	   *fsize_str = getenv("ECPGFETCHSZ");
+		char	   *endptr = NULL;
+		int			fsize;
+
+		if (fsize_str)
+		{
+			fsize = strtoul(fsize_str, &endptr, 10);
+			if (endptr || (fsize < 4))
+				fetch_size = DEFAULTFETCHSIZE;

"endptr" will never be NULL; use "*endptr". As it stands, the code always
ignores ECPGFETCHSZ.
You're right.
An unusable ECPGFETCHSZ should produce an error, not
silently give no effect.
Point taken. Which error handling do you imagine? abort(), or simply returning
false and raising an error in SQLCA?
Why a minimum of 4?
I forgot.
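For reference, a corrected parse along the lines Noah suggests might look like this sketch (function name and error convention are mine, not the patch's):

```c
#include <stdlib.h>
#include <limits.h>

#define DEFAULTFETCHSIZE 256

/* Hypothetical sketch of parsing ECPGFETCHSZ correctly: *endptr (not
 * the pointer itself) tells whether the whole string was consumed, and
 * an unusable value is reported to the caller instead of being
 * silently ignored.  Returns 0 on success, -1 on a bad value. */
static int parse_fetch_size(const char *str, long *out)
{
	char	   *endptr;
	long		val;

	if (str == NULL)
	{
		*out = DEFAULTFETCHSIZE;	/* unset: fall back to the default */
		return 0;
	}
	val = strtol(str, &endptr, 10);
	if (endptr == str || *endptr != '\0' || val < 1 || val > INT_MAX)
		return -1;					/* unusable value: let the caller raise an error */
	*out = val;
	return 0;
}
```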
+			else
+				fetch_size = fsize;
+		}
+		else
+			fetch_size = DEFAULTFETCHSIZE;
+	}
+
+	va_start(args, query);
+	ret = ecpg_do(lineno, compat, force_indicator, connection_name, questionmarks, st, new_query, args);
+	va_end(args);
+
+	ecpg_free(new_query);
+
+	/*
+	 * If statement went OK, add the cursor and discover the
+	 * number of rows in the recordset. This will slow down OPEN
+	 * but we gain a lot with caching.
+	 */
+	if (ret /* && sqlca->sqlerrd[2] == 0 */)

Why is the commented code there?
Some leftover from testing, it shouldn't be there.
+	{
+		struct connection *con = ecpg_get_connection(connection_name);
+		struct cursor_descriptor *cur;
+		bool		existing;
+		int64		n_tuples;
+
+		if (scrollable)
+		{
+			PGresult   *res;
+			char	   *query;
+			char	   *endptr = NULL;
+
+			query = ecpg_alloc(strlen(curname) + strlen("move all in ") + 2, lineno);
+			sprintf(query, "move all in %s", curname);
+			res = PQexec(con->connection, query);
+			n_tuples = strtoull(PQcmdTuples(res), &endptr, 10);
+			PQclear(res);
+			ecpg_free(query);
+
+			/* Go back to the beginning of the resultset. */
+			query = ecpg_alloc(strlen(curname) + strlen("move absolute 0 in ") + 2, lineno);
+			sprintf(query, "move absolute 0 in %s", curname);
+			res = PQexec(con->connection, query);
+			PQclear(res);
+			ecpg_free(query);
+		}
+		else
+		{
+			n_tuples = 0;
+		}

You give this rationale for the above code:
On Thu, Jun 17, 2010 at 02:09:47PM +0200, Boszormenyi Zoltan wrote:
ECPGopen() also discovers the total number of records in the recordset,
so the previous ECPG "deficiency" (backend limitation) that sqlca.sqlerrd[2]
didn't report the (possibly estimated) number of rows in the resultset
is now
overcome. This slows down OPEN for cursors serving larger datasets
but it makes possible to position the readahead window using MOVE
ABSOLUTE no matter what FORWARD/BACKWARD/ABSOLUTE/RELATIVE
variants are used by the application. And the caching more than outweighs
the slowdown in OPEN, it seems.

From the documentation for Informix and Oracle, those databases do not
populate sqlerrd[2] this way:
http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.sqlt.doc/sqltmst189.htm
http://docs.oracle.com/cd/A57673_01/DOC/api/doc/PC_22/ch10.htm#toc139
The problem here is that Informix in the field in fact returns the number of rows
in the cursor and the customer we developed this readahead code for relied on this.
Maybe this was eliminated in newer versions of Informix to make it faster.
The performance impact will vary widely depending on the query cost per row
and the fraction of rows the application will actually retrieve. Consider a
complex aggregate returning only a handful of rows.
Indeed.
Consider SELECT * on a
1B-row table with the application ceasing reads after 1000 rows. Performance
aside, this will yield double execution of any volatile functions involved.
So, I think we ought to diligently avoid this step. (Failing that, the
documentation must warn about the extra full cursor scan and this feature must
stay disabled by default.)
OK, how about enabling it for Informix-compat mode only, or only via an
environment variable? I agree it should be documented.
+
+		/* Add the cursor */
+		cur = add_cursor(lineno, con, curname, scrollable, n_tuples, &existing);
+
+		/*
+		 * Report the number of tuples for the [scrollable] cursor.
+		 * The server didn't do it for us.
+		 */
+		sqlca->sqlerrd[2] = (cur->n_tuples < LONG_MAX ? cur->n_tuples : LONG_MAX);
+	}
+
+	return ret;
+}
+
+static bool
+ecpg_cursor_execute(struct statement * stmt, struct cursor_descriptor *cur)
+{
+	char		tmp[64];
+	char	   *query;
+	int64		start_pos;
+
+	if ((cur->cache_pos >= cur->start_pos) && cur->res && (cur->cache_pos < cur->start_pos + PQntuples(cur->res)))
+	{
+		stmt->results = cur->res;
+		ecpg_free_params(stmt, true, stmt->lineno);
+		return true;
+	}

Why does ecpg_cursor_execute() also call ecpg_free_params()? Offhand, it
seems that ECPGfetch() always takes care of that and is the more appropriate
place, seeing it's the one calling ecpg_build_params().
I will look at it.
--- a/src/interfaces/ecpg/ecpglib/data.c
+++ b/src/interfaces/ecpg/ecpglib/data.c
@@ -120,7 +120,7 @@ check_special_value(char *ptr, double *retval, char **endptr)
 }

 bool
-ecpg_get_data(const PGresult *results, int act_tuple, int act_field, int lineno,
+ecpg_get_data(const PGresult *results, int var_index, int act_tuple, int act_field, int lineno,
 			  enum ECPGttype type, enum ECPGttype ind_type,
 			  char *var, char *ind, long varcharsize, long offset,
 			  long ind_offset, enum ARRAY_TYPE isarray, enum COMPAT_MODE compat,
 			  bool force_indicator)

This function could sure use a block comment such as would be customary in
src/backend. Compare the one at heap_update(), for example.
OK.
--- a/src/interfaces/ecpg/ecpglib/error.c
+++ b/src/interfaces/ecpg/ecpglib/error.c
@@ -268,6 +268,20 @@ ecpg_raise(int line, int code, const char *sqlstate, const char *str)
 					 ecpg_gettext("could not connect to database \"%s\" on line %d"),
 					 str, line);
 			break;

+		case ECPG_INVALID_CURSOR:
+			if (strcmp(sqlstate, ECPG_SQLSTATE_OBJECT_NOT_IN_PREREQUISITE_STATE) == 0)
+				snprintf(sqlca->sqlerrm.sqlerrmc, sizeof(sqlca->sqlerrm.sqlerrmc),
+
+				/*
+				 * translator: this string will be truncated at 149 characters
+				 * expanded.
+				 */
+						 ecpg_gettext("cursor can only scan forward"));

Every other message works in the line number somehow; this should do the same.
Fixed.
--- a/src/interfaces/ecpg/ecpglib/execute.c
+++ b/src/interfaces/ecpg/ecpglib/execute.c
@@ -1707,46 +1809,20 @@ ecpg_execute(struct statement * stmt)
 }

 bool
-ECPGdo(const int lineno, const int compat, const int force_indicator, const char *connection_name, const bool questionmarks, const int st, const char *query,...)
+ecpg_do_prologue(int lineno, const int compat, const int force_indicator,
+				 const char *connection_name, const bool questionmarks,
+				 enum ECPG_statement_type statement_type, const char *query,
+				 va_list args, struct statement **stmt_out)

A block comment would especially help this function, considering the name
tells one so little.
OK.
--- a/src/interfaces/ecpg/ecpglib/extern.h
+++ b/src/interfaces/ecpg/ecpglib/extern.h
@@ -60,6 +60,12 @@ struct statement
 	bool		questionmarks;
 	struct variable *inlist;
 	struct variable *outlist;
+	char	   *oldlocale;
+	const char **dollarzero;
+	int			ndollarzero;
+	const char **param_values;
+	int			nparams;
+	PGresult   *results;
 };

Please comment the members of this struct like we do in most of src/include.
OK.
dollarzero has something to do with dynamic cursor names, right? Does it have
other roles?
Yes, it had other roles. ECPG supports user variables in cases where the
PostgreSQL grammar doesn't. There's this rule:
ECPG: var_valueNumericOnly addon
if ($1[0] == '$')
{
free($1);
$1 = mm_strdup("$0");
}
The "var_value: NumericOnly" case in gram.y can show up in a lot of cases.
This feature was there before the dynamic cursor. You can even use them together
which means more than one $0 placeholders in the statement. E.g.:
FETCH :amount FROM :curname;
gets translated to
FETCH $0 FROM $0;
by ecpg, and both the amount and the cursor name are passed in user variables.
The value is needed by cursor.c, this is why this "dollarzero" pointer is needed.
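The substitution being described can be sketched as follows; expand_dollarzero() is a hypothetical illustration, not the actual ecpglib routine:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of how successive "$0" placeholders get replaced
 * by user-variable values, e.g. "fetch $0 from $0" with {"5", "mycur"}
 * becoming "fetch 5 from mycur".  The real code tracks these values in
 * the dollarzero array of struct statement. */
static void
expand_dollarzero(const char *query, const char **values, int nvalues,
				  char *out, size_t outsize)
{
	const char *p = query;
	size_t		len = 0;
	int			next = 0;

	while (*p && len < outsize - 1)
	{
		if (p[0] == '$' && p[1] == '0' && next < nvalues)
		{
			len += snprintf(out + len, outsize - len, "%s", values[next++]);
			if (len > outsize - 1)
				len = outsize - 1;	/* snprintf reports untruncated length */
			p += 2;
		}
		else
			out[len++] = *p++;
	}
	out[len] = '\0';
}
```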
--- a/src/interfaces/ecpg/preproc/ecpg.header
+++ b/src/interfaces/ecpg/preproc/ecpg.header
@@ -111,6 +115,26 @@ mmerror(int error_code, enum errortype type, const char *error, ...)
 }

+/*
+ * set use_fetch_readahead based on the current cursor
+ * doesn't return if the cursor is not declared
+ */
+static void
+set_cursor_readahead(const char *curname)
+{
+	struct cursor *ptr;
+
+	for (ptr = cur; ptr != NULL; ptr = ptr->next)
+	{
+		if (strcmp(ptr->name, curname) == 0)
+			break;
+	}
+	if (!ptr)
+		mmerror(PARSE_ERROR, ET_FATAL, "cursor \"%s\" does not exist", curname);
+
+	use_fetch_readahead = ptr->fetch_readahead;
+}

Following add_additional_variables(), use strcasecmp() for literal cursor
names and strcmp() for cursor name host variables.
After modifying the grammar to use numeric values for readahead window size,
this function and the "use_fetch_readahead" variable are not needed anymore.
--- a/src/interfaces/ecpg/preproc/extern.h
+++ b/src/interfaces/ecpg/preproc/extern.h
@@ -24,12 +24,17 @@
 extern bool autocommit,
 			force_indicator,
 			questionmarks,
 			regression_mode,
-			auto_prepare;
+			auto_prepare,
+			fetch_readahead;
+extern bool use_fetch_readahead;

The names of the last two variables don't make clear the difference between
them. I suggest default_fetch_readahead and current_fetch_readahead.
Now fetch_readahead is an int, and the other variable is gone.
--- /dev/null
+++ b/src/interfaces/ecpg/test/preproc/cursor-readahead.pgc
@@ -0,0 +1,244 @@
+#include <stdio.h>
+#include <malloc.h>

Why <malloc.h>? I only see this calling free(); use <stdlib.h> instead.
Fixed.
I had to update the code, the previous patch didn't apply cleanly to current GIT.
I will send the new patch some time next week after fixing the "make check"
breakage.
Thanks,
nm
--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/
On Fri, Mar 02, 2012 at 11:41:05AM -0500, Noah Misch wrote:
We yet lack a consensus on whether native ECPG apps should have access to the
feature. My 2c is to make it available, because it's useful syntactic sugar.
If your program independently processes each row of an arbitrary-length result
set, current facilities force you to add an extra outer loop to batch the
FETCHes for every such code site. Applications could define macros to
abstract that pattern, but this seems common-enough to justify bespoke
handling. Besides, minimalists already use libpq directly.
Sorry, I don't really understand what you're saying here. The program logic
won't change at all when using this feature, or what do I misunderstand?
I suggest enabling the feature by default but drastically reducing the default
readahead chunk size from 256 to, say, 5. That still reduces the FETCH round
trip overhead by 80%, but it's small enough not to attract pathological
behavior on a workload where each row is a 10 MiB document. I would not offer
an ecpg-time option to disable the feature per se. Instead, let the user set
the default chunk size at ecpg time. A setting of 1 effectively disables the
feature, though one could later re-enable it with ECPGFETCHSZ.
Using 1 to effectively disable the feature is fine with me, but I strongly
object to any default enabling of this feature. It's fairly easy to create cases with
pathological behaviour and this feature is not standard by any means. I figure
a normal programmer would expect only one row being transferred when fetching
one.
Other than that, thanks for the great review.
Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
2012-03-04 17:16 keltezéssel, Michael Meskes írta:
On Fri, Mar 02, 2012 at 11:41:05AM -0500, Noah Misch wrote:
We yet lack a consensus on whether native ECPG apps should have access to the
feature. My 2c is to make it available, because it's useful syntactic sugar.
If your program independently processes each row of an arbitrary-length result
set, current facilities force you to add an extra outer loop to batch the
FETCHes for every such code site. Applications could define macros to
abstract that pattern, but this seems common-enough to justify bespoke
handling. Besides, minimalists already use libpq directly.Sorry, I don't really understand what you're saying here. The program logic
won't change at all when using this feature or what do I misunderstand?
The program logic shouldn't change at all. He meant that extra coding effort
is needed if you want manual caching. It requires 2 loops instead of 1 if you use
FETCH N (N>1).
I suggest enabling the feature by default but drastically reducing the default
readahead chunk size from 256 to, say, 5. That still reduces the FETCH round
trip overhead by 80%, but it's small enough not to attract pathological
behavior on a workload where each row is a 10 MiB document. I would not offer
an ecpg-time option to disable the feature per se. Instead, let the user set
the default chunk size at ecpg time. A setting of 1 effectively disables the
feature, though one could later re-enable it with ECPGFETCHSZ.

Using 1 to effectively disable the feature is fine with me,
Something else would be needed. For DML with WHERE CURRENT OF cursor,
the feature should stay disabled even when the environment variable is set
without adding any decoration to the DECLARE statement. The safe thing
would be to distinguish between uncached (but cachable) and uncachable
cases. A single value cannot work.
but I strongly
object to any default enabling of this feature. It's fairly easy to create cases with
pathological behaviour and this feature is not standard by any means. I figure
a normal programmer would expect only one row being transferred when fetching
one.

Other than that, thanks for the great review.
Michael
Best regards,
Zoltán Böszörményi
--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/
On Sun, Mar 04, 2012 at 05:34:50PM +0100, Boszormenyi Zoltan wrote:
The program logic shouldn't change at all. He meant that extra coding effort
is needed if you want manual caching. It requires 2 loops instead of 1 if you use
FETCH N (N>1).
Ah, thanks for the explanation.
Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
On Sun, Mar 04, 2012 at 05:16:06PM +0100, Michael Meskes wrote:
On Fri, Mar 02, 2012 at 11:41:05AM -0500, Noah Misch wrote:
I suggest enabling the feature by default but drastically reducing the default
readahead chunk size from 256 to, say, 5. That still reduces the FETCH round
trip overhead by 80%, but it's small enough not to attract pathological
behavior on a workload where each row is a 10 MiB document. I would not offer
an ecpg-time option to disable the feature per se. Instead, let the user set
the default chunk size at ecpg time. A setting of 1 effectively disables the
feature, though one could later re-enable it with ECPGFETCHSZ.

Using 1 to effectively disable the feature is fine with me, but I strongly
object to any default enabling of this feature. It's fairly easy to create cases with
pathological behaviour and this feature is not standard by any means. I figure
a normal programmer would expect only one row being transferred when fetching
one.
On further reflection, I agree with you here. The prospect for queries that
call volatile functions changed my mind; they would exhibit different
functional behavior under readahead. We mustn't silently give affected
programs different semantics.
Thanks,
nm