FETCH FIRST clause PERCENT option
FETCH FIRST with the PERCENT option is part of the SQL standard; it allows the
limit count to be specified as a percentage in addition to specifying it as an
exact count. It is listed under "Major features simply not implemented yet" on
a recent wiki page [1].
I implemented it by executing the query plan twice: once without the limit
filter to find the total number of rows that will be fed to the limit node, so
that the exact limit count can be calculated, and then again with the limit
filter using the newly calculated exact count.
[1]: https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL_Standard
Regards
Surafel
Attachments:
fetch-wth-percent-v1.patch (text/x-patch; charset=US-ASCII)
Hi,
On 2018-08-16 17:27:45 +0300, Surafel Temesgen wrote:
FETCH FIRST with the PERCENT option is part of the SQL standard; it allows the
limit count to be specified as a percentage in addition to specifying it as an
exact count. It is listed under "Major features simply not implemented yet" on
a recent wiki page [1].

I implemented it by executing the query plan twice: once without the limit
filter to find the total number of rows that will be fed to the limit node, so
that the exact limit count can be calculated, and then again with the limit
filter using the newly calculated exact count.
Won't that have rather massive issues with multiple evaluations of
clauses in the query? Besides being really expensive?
I think you'd have to instead spill the query results into a tuplestore
(like we do for FOR HOLD queries at end of xact), and then do the
computations based on that.
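The spill-then-limit idea can be sketched roughly as follows (Python used purely as illustrative pseudocode for the executor logic; `fetch_first_percent` is an invented name, and an ordinary list stands in for the tuplestore):

```python
import math

def fetch_first_percent(rows, percent):
    """Illustrative sketch: materialize every input row first (the
    tuplestore step), then compute the exact limit from the now-known
    total and return only that many rows."""
    spilled = list(rows)                       # spill all results
    limit = math.ceil(len(spilled) * percent / 100)
    return spilled[:limit]
```

For example, fetch_first_percent(range(1, 11), 50) returns the first five of the ten rows. Note that each input clause is evaluated exactly once, which is the point of materializing rather than re-executing the plan.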
- Andres
"Andres" == Andres Freund <andres@anarazel.de> writes:
Andres> I think you'd have to instead spill the query results into a
Andres> tuplestore (like we do for FOR HOLD queries at end of xact),
Andres> and then do the computations based on that.
The approach I've suggested to some people who wanted to look into this
was to inject a window function call into the query, and teach the Limit
node to be able to stop on a boolean condition (which would be based on
that function result). This method should also work for WITH TIES by
using a different window function.
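That approach could look roughly like this (a hypothetical sketch in Python rather than executor code; the rn/total pair stands in for the injected window function results, and the loop's stop test plays the role of the Limit node's boolean condition):

```python
import math

def limit_percent_via_window(rows, percent):
    """Sketch: tag each row with row_number() and count(*) OVER ()
    style values, then stop as soon as the boolean condition
    row_number > ceil(percent * total / 100) becomes true."""
    rows = list(rows)
    total = len(rows)                          # count(*) OVER ()
    out = []
    for rn, row in enumerate(rows, start=1):   # row_number() OVER ()
        if rn > math.ceil(percent * total / 100):
            break                              # Limit stops on the condition
        out.append(row)
    return out
```

For WITH TIES, the stop condition would presumably be derived from a rank-style window function instead, so that peer rows of the last row are not cut off.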
--
Andrew (irc:RhodiumToad)
On Thu, Aug 16, 2018 at 5:34 PM, Andres Freund <andres@anarazel.de> wrote:
Won't that have rather massive issues with multiple evaluations of
clauses in the query? Besides being really expensive?
The plan is rescanned after the first execution; I can't see an issue with
multiple executions of a clause in this case.
I think you'd have to instead spill the query results into a tuplestore
The downside of it is that all the results have to be stored even if the
needed tuples are only a fraction of them, and they are stored for longer, so
the choice becomes memory versus CPU utilization.
Regards
Surafel
On 2018-08-21 15:47:07 +0300, Surafel Temesgen wrote:
On Thu, Aug 16, 2018 at 5:34 PM, Andres Freund <andres@anarazel.de> wrote:
Won't that have rather massive issues with multiple evaluations of
clauses in the query? Besides being really expensive?

The plan is rescanned after the first execution; I can't see an issue with
multiple executions of a clause in this case.
Imagine volatile functions being used. That can be problematic because
of repeated side-effects, but also will lead to plainly wrong
results. Consider what happens with your approach where somebody does
something like WHERE value < random().
Greetings,
Andres Freund
On Tue, Aug 21, 2018 at 3:50 PM Andres Freund <andres@anarazel.de> wrote:
Imagine volatile functions being used. That can be problematic because
of repeated side-effects, but also will lead to plainly wrong
results. Consider what happens with your approach where somebody does
something like WHERE value < random().
Thanks for explaining. The attached patch uses a tuplestore instead.
Regards
Surafel
Attachments:
fetch-wth-percent-v2.patch (text/x-patch; charset=US-ASCII)
On 2018-08-28 14:14, Surafel Temesgen wrote:
On Tue, Aug 21, 2018 at 3:50 PM Andres Freund <andres@anarazel.de> wrote:

Imagine volatile functions being used. That can be problematic because
of repeated side-effects, but also will lead to plainly wrong
results. Consider what happens with your approach where somebody does
something like WHERE value < random().

Thanks for explaining. The attached patch uses a tuplestore instead.
[fetch-wth-percent-v2.patch]
I had a quick look at this. Apply, compile, make check are ok.
In straightforward usage seems to work ok.
But I also ran across this crash:
create table if not exists t as
select i from generate_series(1, 100) as f(i);
select * from t
offset 60 rows
fetch next 3 percent rows only
FOR UPDATE
;
TRAP: FailedAssertion("!(slot != ((void *)0))", File: "execTuples.c",
Line: 428)
Erik Rijkers
On Tue, Aug 28, 2018 at 7:33 PM Erik Rijkers <er@xs4all.nl> wrote:
;
TRAP: FailedAssertion("!(slot != ((void *)0))", File: "execTuples.c",
Line: 428)

The attached patch includes a fix for the crash. Can you check it again?
Regards
Surafel
Attachments:
fetch-wth-percent-v3.patch (text/x-patch; charset=US-ASCII)
On Fri, Aug 31, 2018 at 11:42 PM Surafel Temesgen <surafel3000@gmail.com> wrote:
On Tue, Aug 28, 2018 at 7:33 PM Erik Rijkers <er@xs4all.nl> wrote:
;
TRAP: FailedAssertion("!(slot != ((void *)0))", File: "execTuples.c",
Line: 428)

The attached patch includes a fix for the crash. Can you check it again?
FYI this fails [1] in src/test/modules/test_ddl_deparse's create_table
test because of the keyword:
CREATE TABLE stud_emp (
percent int4
) INHERITS (emp, student);
! ERROR: syntax error at or near "percent"
[1]: https://travis-ci.org/postgresql-cfbot/postgresql/builds/430777123
--
Thomas Munro
http://www.enterprisedb.com
On Aug 16, 2018, at 7:34 AM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-08-16 17:27:45 +0300, Surafel Temesgen wrote:
FETCH FIRST with the PERCENT option is part of the SQL standard; it allows the
limit count to be specified as a percentage in addition to specifying it as an
exact count. It is listed under "Major features simply not implemented yet" on
a recent wiki page [1].

I implemented it by executing the query plan twice: once without the limit
filter to find the total number of rows that will be fed to the limit node, so
that the exact limit count can be calculated, and then again with the limit
filter using the newly calculated exact count.
Surafel, there are no regression tests that I can see in your patch. It
would help if you added some, as then I could see precisely what behavior you
are expecting. As it is, I'm just guessing here, but here goes....
Won't that have rather massive issues with multiple evaluations of
clauses in the query? Besides being really expensive?

I think you'd have to instead spill the query results into a tuplestore
(like we do for FOR HOLD queries at end of xact), and then do the
computations based on that.
I should think that spilling anything to a tuplestore would only be needed
if the query contains an ORDER BY expression. If you query
FETCH FIRST 50 PERCENT * FROM foo;
you should just return every other row, discarding the rest, right? It's
only when an explicit ordering is given that the need to store the results
arises. Even with
FETCH FIRST 50 PERCENT name FROM foo ORDER BY name;
you can return one row for every two rows that you get back from the
sort node, reducing the maximum number you need to store at any time to
no more than 25% of all rows.
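For the 50-percent-without-ORDER-BY case, the every-other-row scheme described above might look like this (an illustrative Python generator, not proposed executor code):

```python
def every_other_row(rows):
    """Stream one row out of every two and discard the rest, so
    nothing ever needs to be materialized."""
    for i, row in enumerate(rows):
        if i % 2 == 0:
            yield row
```

Without an ORDER BY, any 50% of the input is as valid as any other, which is what makes this streaming shortcut plausible at all.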
Or am I missing something?
mark
Hi,
On 2018-09-20 17:06:36 -0700, Mark Dilger wrote:
I should think that spilling anything to a tuplestore would only be needed
if the query contains an ORDER BY expression. If you query

FETCH FIRST 50 PERCENT * FROM foo;

you should just return every other row, discarding the rest, right? It's
only when an explicit ordering is given that the need to store the results
arises. Even with

FETCH FIRST 50 PERCENT name FROM foo ORDER BY name;
you can return one row for every two rows that you get back from the
sort node, reducing the maximum number you need to store at any time to
no more than 25% of all rows.
I'm doubtful about the validity of these optimizations, particularly
around being surprising. But I think more importantly, we should focus
on the basic implementation that's needed anyway.
Greetings,
Andres Freund
On Sep 20, 2018, at 5:29 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-09-20 17:06:36 -0700, Mark Dilger wrote:
I should think that spilling anything to a tuplestore would only be needed
if the query contains an ORDER BY expression. If you query

FETCH FIRST 50 PERCENT * FROM foo;

you should just return every other row, discarding the rest, right? It's
only when an explicit ordering is given that the need to store the results
arises. Even with

FETCH FIRST 50 PERCENT name FROM foo ORDER BY name;

you can return one row for every two rows that you get back from the
sort node, reducing the maximum number you need to store at any time to
no more than 25% of all rows.

I'm doubtful about the validity of these optimizations, particularly
around being surprising. But I think more importantly, we should focus
on the basic implementation that's needed anyway.
You may be right that getting the basic implementation finished first
is better than optimizing at this stage. So the rest of what I'm going
to say is just in defense of the optimization, and not an argument for
needing to optimize right away.
As for reducing the surprise factor, I think that it would be surprising
if I ask for a smallish percentage of rows and it takes significantly longer
and significantly more memory or disk than asking for all the rows takes.
If I'm including an explicit ORDER BY, then that explains it, but otherwise,
I'd be surprised. Note that I'm not saying I'd be surprised by it taking
roughly the same length of time / memory / disk. I'd only be surprised if
it took a lot more.
There are plenty of SQL generation engines that people put in their software.
I'd expect something like
sprintf("FETCH FIRST %d PERCENT %s FROM %s", percentage, columns, tablename)
to show up in such engines, and percentage to sometimes be 100. At least
in that case you should just return all rows rather than dumping them into
a tuplestore. Likewise, if the percentage is 0, you'll want to finish quickly.
Actually, I don't know if the SQL spec would require side effects to still
happen, in which case you'd still have to generate all rows for their side
effects to happen, and then just not return them. But still, no tuplestore.
So the implementation of FETCH FIRST would at least need to think about what
percentage is being requested, rather than just mindlessly adding a node to
the tree for storing everything, then computing the LIMIT based on the number
of rows stored, and then returning that number of rows.
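The special cases described above could be dispatched along these lines (a hypothetical sketch; the function and strategy names are invented for illustration):

```python
def choose_fetch_percent_strategy(percent):
    """Pick a plan shape from the requested percentage: 100 needs no
    tuplestore at all, 0 can stop immediately (side-effect questions
    aside), and anything in between needs the general approach of
    storing rows and computing the limit from the stored count."""
    if percent >= 100:
        return "pass-through"
    if percent <= 0:
        return "stop-early"
    return "materialize-and-count"
```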
mark
hey
On 9/21/18, Mark Dilger <hornschnorter@gmail.com> wrote:
Surafel, there are no regression tests that I can see in your patch. It
would help if you added some, as then I could see precisely what behavior you
are expecting.
Thank you for looking at it. The attached patch adds regression tests.
Attachments:
fetch-wth-percent-v4.patch (text/x-patch; charset=US-ASCII)
On Sep 25, 2018, at 5:07 AM, Surafel Temesgen <surafel3000@gmail.com> wrote:
hey
On 9/21/18, Mark Dilger <hornschnorter@gmail.com> wrote:
Surafel, there are no regression tests that I can see in your patch. It
would help if you added some, as then I could see precisely what behavior you
are expecting.

Thank you for looking at it. The attached patch adds regression tests.
<fetch-wth-percent-v4.patch>
Surafel,
I messed around with your changes to the grammar and it seems you don't
need to add PERCENT as a reserved keyword. Moving this to the unreserved
keyword section does not cause any shift/reduce errors, and the regression
tests still pass. Relative to your patch v4, these changes help:
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a97ee30924..9872cf7257 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -15186,6 +15186,7 @@ unreserved_keyword:
| PARTITION
| PASSING
| PASSWORD
+ | PERCENT
| PLANS
| POLICY
| PRECEDING
@@ -15465,7 +15466,6 @@ reserved_keyword:
| ONLY
| OR
| ORDER
- | PERCENT
| PLACING
| PRIMARY
| REFERENCES
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 150d51df8f..f23fee0653 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -300,7 +300,7 @@ PG_KEYWORD("partial", PARTIAL, UNRESERVED_KEYWORD)
PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD)
PG_KEYWORD("passing", PASSING, UNRESERVED_KEYWORD)
PG_KEYWORD("password", PASSWORD, UNRESERVED_KEYWORD)
-PG_KEYWORD("percent", PERCENT, RESERVED_KEYWORD)
+PG_KEYWORD("percent", PERCENT, UNRESERVED_KEYWORD)
PG_KEYWORD("placing", PLACING, RESERVED_KEYWORD)
PG_KEYWORD("plans", PLANS, UNRESERVED_KEYWORD)
PG_KEYWORD("policy", POLICY, UNRESERVED_KEYWORD)
On Sep 25, 2018, at 8:08 AM, Mark Dilger <hornschnorter@gmail.com> wrote:
On Sep 25, 2018, at 5:07 AM, Surafel Temesgen <surafel3000@gmail.com> wrote:
hey
On 9/21/18, Mark Dilger <hornschnorter@gmail.com> wrote:
Surafel, there are no regression tests that I can see in your patch. It
would help if you added some, as then I could see precisely what behavior you
are expecting.

Thank you for looking at it. The attached patch adds regression tests.

<fetch-wth-percent-v4.patch>

Surafel,
I messed around with your changes to the grammar and it seems you don't
need to add PERCENT as a reserved keyword. Moving this to the unreserved
keyword section does not cause any shift/reduce errors, and the regression
tests still pass. Relative to your patch v4, these changes help:
I spoke too soon. The main regression tests pass, but your change to
src/test/modules/test_ddl_deparse/sql/create_table.sql per Thomas's
suggestion is no longer needed, since PERCENT no longer needs to be
quoted.
I recommend you also apply the following to your v4 patch, which just
rolls back that one change you made, and at least for me, is enough
to get `make check-world` to pass:
diff --git a/src/test/modules/test_ddl_deparse/sql/create_table.sql b/src/test/modules/test_ddl_deparse/sql/create_table.sql
index 4325de2d04..5e78452729 100644
--- a/src/test/modules/test_ddl_deparse/sql/create_table.sql
+++ b/src/test/modules/test_ddl_deparse/sql/create_table.sql
@@ -94,7 +94,7 @@ CREATE TABLE student (
) INHERITS (person);
CREATE TABLE stud_emp (
- "percent" int4
+ percent int4
) INHERITS (emp, student);
I messed around with your changes to the grammar and it seems you don't
need to add PERCENT as a reserved keyword. Moving this to the unreserved
keyword section does not cause any shift/reduce errors, and the regression
tests still pass. Relative to your patch v4, these changes help:
In the SQL standard PERCENT is listed as a reserved word, so I don't think it
is something I can change.

I think the main reason create_table fails in test_ddl_deparse is that I
didn't quote the PERCENT keyword in
src/test/regress/expected/create_table.out too.
regards
Surafel
Attachments:
fetch-wth-percent-v5.patch (text/x-patch; charset=US-ASCII)
On 25/11/2018 10:00, Surafel Temesgen wrote:
I messed around with your changes to the grammar and it seems you don't
need to add PERCENT as a reserved keyword. Moving this to the unreserved
keyword section does not cause any shift/reduce errors, and the regression
tests still pass. Relative to your patch v4, these changes help:

In the SQL standard PERCENT is listed as a reserved word, so I don't think it
is something I can change.
Yes it absolutely is. In PostgreSQL we only make words as reserved as
they need to be, and PERCENT doesn't need to be reserved at all.
Also, this query returns 210 rows instead of the expected 208:
select *
from generate_series(1, 1000)
fetch first 20.8 percent rows only
As for the design, I think you should put some serious consideration
into Andrew Gierth's approach. I would rather see something like that.
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Sun, Nov 25, 2018 at 1:24 PM Vik Fearing <vik.fearing@2ndquadrant.com>
wrote:
Also, this query returns 210 rows instead of the expected 208:
select *
from generate_series(1, 1000)
fetch first 20.8 percent rows only
This is because FETCH FIRST values work with integers, and a fractional
number is changed to the nearest integer. For example, select * from
generate_series(1, 1000) fetch first 20.3 rows only; is not an error; rather,
it returns 20 rows.
As for the design, I think you should put some serious consideration
into Andrew Gierth's approach. I would rather see something like that.
I didn't go in that direction because I don't understand why this approach
is inferior to using a window function, and I don't fully understand how
to implement it.
regards
Surafel
On 25/11/2018 12:49, Surafel Temesgen wrote:
On Sun, Nov 25, 2018 at 1:24 PM Vik Fearing <vik.fearing@2ndquadrant.com> wrote:

Also, this query returns 210 rows instead of the expected 208:

select *
from generate_series(1, 1000)
fetch first 20.8 percent rows only

This is because FETCH FIRST values work with integers, and a fractional
number is changed to the nearest integer. For example, select * from
generate_series(1, 1000) fetch first 20.3 rows only; is not an error;
rather, it returns 20 rows.
I don't see how this behavior is justified by reading the SQL standard.
Obviously only an integer number of rows is going to be returned, but
the percentage should be calculated correctly.
I assume you meant 200 rows there, but the correct number of rows to
return is 203 for that query. Please fix it.
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
"Surafel" == Surafel Temesgen <surafel3000@gmail.com> writes:
Surafel> This is because FETCH FIRST values work with integers, and a
Surafel> fractional number is changed to the nearest integer. For example,
Surafel> select * from generate_series(1, 1000) fetch first 20.3 rows only;
Surafel> is not an error; rather, it returns 20 rows.
31) The declared type of the <simple value specification> simply
contained in <fetch first percentage> shall be numeric.
Note that unlike <fetch first row count> there is no requirement for the
type to be _exact_ numeric or to have scale 0.
later:
ii) If <fetch first percentage> is specified, then let FFP be the
<simple value specification> simply contained in <fetch first
percentage>, and let LOCT be a <literal> whose value is OCT. Let
FFRC be the value of
CEILING ( FFP * LOCT / 100.0E0 )
NOTE 333 -- The percentage is computed using the number
of rows before removing the rows specified by <offset
row count>.
ceil(20.3 * 1000 / 100.0E0) is definitely 203, not 200.
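The standard's formula is easy to check directly; here is a small sketch (the function name is invented) using exact decimal arithmetic, which matches the "shall be numeric" requirement rather than relying on binary floating point:

```python
from decimal import Decimal
from math import ceil

def fetch_first_row_count(ffp, loct):
    """CEILING(FFP * LOCT / 100.0E0) from the standard's general rules,
    computed with exact decimal arithmetic: FFP is the <fetch first
    percentage>, LOCT the row count before OFFSET is applied."""
    return ceil(Decimal(ffp) * loct / 100)
```

With this, fetch_first_row_count("20.3", 1000) gives 203, matching the computation above, and fetch_first_row_count("20.8", 1000) gives 208, the count expected earlier in the thread.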
--
Andrew.