Get more from indices.
Hello, This is the last episode of the 'dance with indices'?
series.
Unique indexes can sort the tuples in corresponding tables
prefectly. So this query might can use index.
uniquetest=# create table t (a int, b int, c int, d text);
uniquetest=# create unique index i_t_pkey on t(a, b);
uniquetest=# insert into t
(select a % 10, a / 10, a, 't' from generate_series(0, 100000) a);
uniquetest=# analyze;uniquetest=# explain (costs off, analyze on) select distinct * from t;
QUERY PLAN
---------------------------------------------------------------------------
Unique (actual time=149.625..245.978 rows=100001 loops=1)
-> Sort (actual time=149.624..199.806 rows=100001 loops=1)
Sort Key: a, b, c, d
Sort Method: external merge Disk: 2328kB
-> Seq Scan on t (actual time=0.016..15.597 rows=100001 loops=1)
Total runtime: 251.272 ms
With this patch, the plan for this query becomes as follows,
uniquetest=# explain (costs off, analyze on) select distinct * from t;
QUERY PLAN
---------------------------------------------------------------------
Index Scan using i_t_pkey on t
(actual time=0.019..32.457 rows=100001 loops=1)
Total runtime: 37.488 ms
This only seems not so valuable to archive, but with my previous
MergeAppend for UNION patch, this will have more value.
uniquetest=# create table pu1 (a int, b int, c int, d text);
uniquetest=# create unique index pu1_pkey_idx on pu1 (a, b);
uniquetest=# create table cu11 (like pu1 including all) inherits (pu1);
uniquetest=# create table cu12 (like pu1 including all) inherits (pu1);
uniquetest=# create table pu2 (like pu1 including all);
uniquetest=# create table cu21 (like pu1 including all) inherits (pu2);
uniquetest=# create table cu22 (like pu1 including all) inherits (pu2);
uniquetest=# insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11'
from generate_series(000000, 099999) a);
uniquetest=# insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12'
from generate_series(100000, 199999) a);
uniquetest=# insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21'
from generate_series(150000, 249999) a);
uniquetest=# insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22'
from generate_series(250000, 349999) a);
uniquetest=# explain (analyze on, costs off)
select * from pu1 union select * from pu2 order by a, b limit 100;
QUERY PLAN
--------------------------------------------------------------------------
Limit (actual time=0.226..0.591 rows=100 loops=1)
-> Unique (actual time=0.223..0.561 rows=100 loops=1)
-> Merge Append (actual time=0.223..0.470 rows=100 loops=1)
Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
-> Limit (actual time=0.125..0.326 rows=100 loops=1)
-> Unique (actual time=0.125..0.303 rows=100 loops=1)
-> Merge Append (actual time=0.123..0.220 rows=100 loops=1)
Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
-> Index Scan using..on pu1 (actual time=0.005..0.005 rows=0 loops=1)
-> Index Scan using..on cu11(actual time=0.071..0.128 rows=100 loops=1)
-> Index Scan using..on cu12 (actual time=0.045..0.045 rows=1 loops=1)
-> Limit (actual time=0.096..0.096 rows=1 loops=1)
-> Unique (actual time=0.096..0.096 rows=1 loops=1)
-> Merge Append (actual time=0.094..0.094 rows=1 loops=1)
Sort Key: pu2.a, pu2.b, pu2.c, pu2.d
-> Index Scan using..on pu2 (actual time=0.002..0.002 rows=0 loops=1)
-> Index Scan using..on cu21 (actual time=0.047..0.047 rows=1 loops=1)
-> Index Scan using..on cu22 (actual time=0.043..0.043 rows=1 loops=1)
Total runtime: 1.987 ms
On the other hand, what we will get from the unpatched PostgreSQL is,
uniquetest=# explain (analyze on, costs off)
select * from pu1 union select * from pu2 order by a, b limit 100;
QUERY PLAN
-------------------------------------------------------------------------
Limit (actual time=535.620..535.706 rows=100 loops=1)
-> Unique (actual time=535.618..535.695 rows=100 loops=1)
-> Sort (actual time=535.617..535.637 rows=100 loops=1)
Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
Sort Method: external sort Disk: 10568kB
-> Append (actual time=0.012..144.144 rows=400000 loops=1)
-> Append (actual time=0.012..49.570 rows=200000 loops=1)
-> Seq Scan on pu1 (actual time=0.001..0.001 rows=0 loops=1)
-> Seq Scan on cu11 (actual time=0.011..16.202 rows=100000 loops=1)
-> Seq Scan on cu12 (actual time=0.008..16.334 rows=100000 loops=1)
-> Append (actual time=0.010..54.052 rows=200000 loops=1)
-> Seq Scan on pu2 (actual time=0.000..0.000 rows=0 loops=1)
-> Seq Scan on cu21 (actual time=0.009..16.444 rows=100000 loops=1)
-> Seq Scan on cu22 (actual time=0.021..18.923 rows=100000 loops=1)
Total runtime: 537.765 ms
What do you think about this?
This could be realized by following modifications,
- Consider a query pathekeys is useful for ordering when a index
pathkey is a subset of that when the index is UNIQUE.
(build_index_paths, pathkeys_useful_for_ordering,
truncate_useless_pathkeys, grouping_planner)
- Judge a path is orderd or not counting the uniqueness of the
path.
- Regard IndexPath on UNIQUE index is reagarded uniquely
ordered.
- Regard MergeAppendPath of which all children is uniquely
orderd as (not necessarily uniquely) ordered.
(path_is_ordered)
- Judge the necessity of sorting on above
premises. (grouping_planner)
Needless to say, theses patches have no effect on any other set
operations than UNION(ALL). They are, INTERSECT(ALL),
EXCEPT(ALL).
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
pathkey_and_uniqueidx_v1_20131031.patchtext/x-patch; charset=us-asciiDownload+151-25
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
Unique indexes can sort the tuples in corresponding tables
prefectly. So this query might can use index.
uniquetest=# create table t (a int, b int, c int, d text);
uniquetest=# create unique index i_t_pkey on t(a, b);
uniquetest=# insert into t
(select a % 10, a / 10, a, 't' from generate_series(0, 100000) a);
uniquetest=# analyze;uniquetest=# explain (costs off, analyze on) select distinct * from t;
ISTM the right way to deal with this is not what you've done here, but
just to deem the DISTINCT a no-op if there's a unique index on some subset
of the distinct-ed columns. This query is actually legally satisfied by
a simple seqscan, which would be faster than either of the plans you
mention. In any case, it seems like a bad idea to me to conflate
distinct-ness with ordering, so I don't like what you did to PathKeys.
Having said that, there is the kernel of a useful idea here, I think.
The reason you don't get an indexscan already is that the DISTINCT
assumes it needs to sort by (a,b,c,d), which an index on just (a,b)
doesn't appear to satisfy. However, if the index is unique, wouldn't
scanning the index produce data that actually satisfies the longer sort
key? It doesn't matter what the values of c,d are if there are no
duplicates in the a,b columns. So maybe as a separate patch, we could
look at claiming that a unique index satisfies the entire query_pathkeys
if it matches the first N columns of that.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Oct 31, 2013 at 10:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
However, if the index is unique, wouldn't
scanning the index produce data that actually satisfies the longer sort
key? It doesn't matter what the values of c,d are if there are no
duplicates in the a,b columns. So maybe as a separate patch, we could
look at claiming that a unique index satisfies the entire query_pathkeys
if it matches the first N columns of that.
That would be really spiffy.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I wrote:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
Unique indexes can sort the tuples in corresponding tables
prefectly. So this query might can use index.
BTW, how much of any of this is correct if the "unique" index contains
multiple NULL values? We might need to restrict the optimization(s)
to cases where the unique-ified columns are all marked NOT NULL.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/31/13, 6:43 AM, Kyotaro HORIGUCHI wrote:
Hello, This is the last episode of the 'dance with indices'?
series.
Your patch fails the isolation test because of changed query plans:
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thank you,
In any case, it seems like a bad idea to me to conflate
distinct-ness with ordering, so I don't like what you did to
PathKeys.
Hmm, that sounds quite resonable in general. But the conflation
is already found in grouping_planner to some extent. The name
distinct_pathkey itself asserts that it is the ordering used for
distinct. And actually is used for sorting if hashed-distinct is
not selected.
Plus, these modifications in grouping_planner is required by the
patch for pathkeys.c to be effective.
I suppose the main cause of nastiness of the patch for
grouping_planner comes from the adheration of the usage of the
variable for path uniqueness with the existent code.
The additional members to Plan, say, pathkeys and ordered could
help the code to look less ugly by taking in the related code
currently appears nakedly in grouping_planner into make(or
generate)_xxxxplan() functions. Although the new attributes
somewhat look out of place..
However, if the index is unique, wouldn't
scanning the index produce data that actually satisfies the longer sort
key? It doesn't matter what the values of c,d are if there are no
duplicates in the a,b columns. So maybe as a separate patch, we could
look at claiming that a unique index satisfies the entire query_pathkeys
if it matches the first N columns of that.That would be really spiffy.
# Putting aside the trueness of the assumption for unique-index
# and pathkeys.
The "expanded" sufficiency check can be archieved by involving
'unique-indexed' attribute for pathkeys_contained_in(),say
pathkeys_satisfies(pathkeys, pathkeys, is_uniq), but finally
could have no effect without some extent of assist in the process
in grouping_planner like my preveous patch to be in effect, I
believe.
I'll try to rewrite the path to be as following considering less
conflating lookings in grouping_planner.
- is_unique and pathkeys is added to the type Path. (umm...)
- create function like pathkeys_satisfies(check_pathkeys,
pathkeys, isuniq) or path_ordered_by(pathkeys, path) as
needed.
- Plan will be set ordered and pathkeys derived from source path
and its process and grouping_planner consults it to deceide
whether to do sort (to hide the currently naked code).
- Add check for NULLuble columns :-)
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello,
Your patch fails the isolation test because of changed query plans:
Thank you for pointing out. I wasn't aware of that..
# Because it is not launched from the top-level make check...
I'll count that in next pach.
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello, this is the revised patch.
Hmm, that sounds quite resonable in general. But the conflation
is already found in grouping_planner to some extent.
Although this new patch is not split into unique-index sutff and
others, it removes current_pathkeys from grouping_planner, and
adds pathkeys and is_unique into struct Plan instead. This
eliminates independent and longstanding current_pathkeys variable
and calculus involving it from grouping_planner() so it would be
made clearer and easier to read, I expect.
- is_unique and pathkeys is added to the type Path. (umm...)
- create function like pathkeys_satisfies(check_pathkeys,
pathkeys, isuniq) or path_ordered_by(pathkeys, path) as
needed.
This was different from my thought. Finally I added
plan_is_ordered(plan, path) and some of make_<nodename>()
functions take pathkeys and/or uniquely_ordered as parameter and
some of others take them from given leftree plan. Plan nodes
propagate the attributes in autonomous way so explicit
current_pathkeys handling could be thrown away.
- Plan will be set ordered and pathkeys derived from source path
and its process and grouping_planner consults it to deceide
whether to do sort (to hide the currently naked code).- Add check for NULLuble columns :-)
Now IndexOptInfo has new attribute full_ordered and it is set
true if the index does not cover any nullAble columns in
get_relation_info().
And expect file of isolation test is modified to fit the change
in result plan.
Finally, this version became to conflict with the another UNION
patch so I detached from it and rebased to current master.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
20131112_unique_orderby.patchtext/x-patch; charset=us-asciiDownload+224-106
On Tue, 2013-11-12 at 17:48 +0900, Kyotaro HORIGUCHI wrote:
Hello, this is the revised patch.
Since you're using git, please check your patch for trailing whitespace
with git diff --check.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello, I might made insufficient explanation. Surely it is
overdone for the example.
uniquetest=# create table t (a int, b int, c int, d text);
uniquetest=# create unique index i_t_pkey on t(a, b);
uniquetest=# insert into t
(select a % 10, a / 10, a, 't' from generate_series(0, 100000) a);
uniquetest=# analyze;uniquetest=# explain (costs off, analyze on) select distinct * from t;
This is the simplest example for this patch.
The main intention of this patch is to widen application of my
another 'Index for UNION' patch.
https://commitfest.postgresql.org/action/patch_view?id=1279
ISTM the right way to deal with this is not what you've done
here, but just to deem the DISTINCT a no-op if there's a unique
index on some subset of the distinct-ed columns.
It is not sufficient only to deem DISTINCT no-op in order to get
more utility of MergeAppends on IndexScans for UNION. The
following example (omitting data insertion..)
| create table cu11 (a int not null, b int not null, c int, d text);
| create unique index i_cu11_pk on cu11 (a, b);
| create table cu12 (like cu11 including all);
| explain select a, b from cu11 union select a, b from cu12
| order by a, b limit 100;
yields following plan without this patch.
| Limit
| -> Sort (Sort Key: cu11.a, cu11.b)
| -> HashAggregate
| -> Append
| -> Limit
| -> Unique
| -> Index Only Scan using cu11_a_b_idx on cu11
| -> Limit
| -> Unique
| -> Index Only Scan using cu12_a_b_idx on cu12
But, this could be far more effecient when the plan were as
follows if the rows are fully-ordered on the sort key.
| Limit
| -> Unique
| -> Merge Append (Sort Key: cu11.a, cu11.b)
| -> Index Only Scan using cu11_a_b_idx on cu11
| -> Index Only Scan using cu12_a_b_idx on cu12
Propagation of uniqueness on MergeAppend is required to do this
but not for the first expample on this thread.
This query is actually legally satisfied by a simple seqscan,
which would be faster than either of the plans you mention. In
any case, it seems like a bad idea to me to conflate
distinct-ness with ordering, so I don't like what you did to
PathKeys.
Umm. Reconsidering on that and the discussion of the purose above
and the last patch of this thread, I've been wrong in where to
split the patches. I'll repost the reorganized patches on both
threads.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello, I've totally refactored the series of patches and cut out
the appropriate portion as 'unique (and non-nullable) index
stuff'.
As the discussion before, it got rid of path distinctness. This
patch works only on index 'full-orederedness', i.e., unique index
on non-nullable columns.
This patch itself does not so much. Will have power applied with
'Using indices for UNION' patch.
=== Making test table
create table t (a int not null, b int not null, c int, d text);
create unique index i_t_ab on t (a, b);
insert into t (select a / 5, 4 - (a % 5), a, 't' from generate_series(000000, 099999) a);
=== Example 1.
not-patched=# explain select * from t order by a, b ,c limit 1;
QUERY PLAN
----------------------------------------------------------------------
Limit (cost=2041.00..2041.00 rows=1 width=14)
-> Sort (cost=2041.00..2291.00 rows=100000 width=14)
Sort Key: a, b, c
-> Seq Scan on t (cost=0.00..1541.00 rows=100000 width=14)
patched=# explain select * from t order by a, b ,c limit 1;
QUERY PLAN
-----------------------------------------------------------------------------
Limit (cost=0.29..0.33 rows=1 width=14)
-> Index Scan using i_t_ab on t (cost=0.29..3857.04 rows=100000 width=14)
=== Example 2.
not-patched=# explain select distinct * from t order by a limit 1;
QUERY PLAN
---------------------------------------------------------------------------
Limit (cost=1820.46..1820.47 rows=1 width=44)
-> Sort (cost=1820.46..1835.34 rows=5951 width=44)
Sort Key: a
-> HashAggregate (cost=1731.20..1790.71 rows=5951 width=44)
-> Seq Scan on t (cost=0.00..1136.10 rows=59510 width=44)
patched=# explain select distinct * from t order by a limit 1;
QUERY PLAN
------------------------------------------------------------------------------------
Limit (cost=0.29..1.09 rows=1 width=44)
-> Unique (cost=0.29..4756.04 rows=5951 width=44)
-> Index Scan using i_t_ab on t (cost=0.29..4160.94 rows=59510 width=44)
The unique node could be removed technically but it requires to
care the distinctness of path/plan. So it has been put out to
"Using indeces for UNION" patch.
Any comments?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
pathkey_and_uniqueindx_v3_20131119.patchtext/x-patch; charset=us-asciiDownload+79-13
Kyotaro HORIGUCHI wrote:
Hello, I've totally refactored the series of patches and cut out the
appropriate portion as 'unique (and non-nullable) index stuff'.
As the discussion before, it got rid of path distinctness. This patch
works only on index 'full-orederedness', i.e., unique index on non-nullable
columns.
This is interesting!
I took a quick look at the patch. Here is my initial comment. I don't
think it's so good to set the pathkeys of a unique-index path to
query_pathkeys after the scan/join optimization because it can't exploit the
full-orderedness property in implementing mergejoins, i.e., it can't skip
doing an explicit sort that is considered unnecessary from the property.
So, I think the path's pathkeys should be set to query_pathkeys before the
scan/join optimization, i.e., at the index-path creation time.
If you wouldn't mind, I'd like to rework on the patch.
Thanks,
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello. I found a bug(?) in thsi patch as I considered on another
patch.
In truncate_useless_pathkeys, if query_pathkeys is longer than
pathkeys made from index columns old patch picked up the latter
as IndexPath's pathkeys. But the former is more suitable
according to the context here.
the attched pathkey_and_uniqueindx_v4_20131122.patch is changed
as follows.
- Rebased to current master.
- truncate_useless_pathkeys returns root->query_pathkeys when
the index is fully-ordered and query_pathkeys contains the
pathkeys made from index columns.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
pathkey_and_uniqueindx_v4_20131122.patchtext/x-patch; charset=us-asciiDownload+80-14
Kyotaro HORIGUCHI wrote:
Hello. I found a bug(?) in thsi patch as I considered on another patch.
In truncate_useless_pathkeys, if query_pathkeys is longer than pathkeys
made
from index columns old patch picked up the latter as IndexPath's pathkeys.
But the former is more suitable according to the context here.
the attched pathkey_and_uniqueindx_v4_20131122.patch is changed as
follows.
- Rebased to current master.
- truncate_useless_pathkeys returns root->query_pathkeys when
the index is fully-ordered and query_pathkeys contains the
pathkeys made from index columns.
OK, I'd like to look at this version of the patch more closely, then.
Thanks,
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Kyotaro HORIGUCHI wrote:
the attched pathkey_and_uniqueindx_v4_20131122.patch is changed as
follows.
The patch is compiled successfully and passes all regression tests. Also,
the patch works well for the tests shown in an earlier email from
Horiguchi-san. But in this version I found an incorrect behavior.
postgres=# CREATE TABLE t (a int not null, b int not null, c int, d text);
CREATE TABLE
postgres=# CREATE UNIQUE INDEX i_t_ab ON t (a, b);
CREATE INDEX
postgres=# INSERT INTO t (SELECT a / 5, 4 - (a % 5), a, 't' FROM
generate_series(000000, 099999) a);
INSERT 0 100000
postgres=# ANALYZE t;
ANALYZE
postgres=# CREATE TABLE t2 (e text, f int);
CREATE TABLE
postgres=# INSERT INTO t2 VALUES ('t', 2);
INSERT 0 1
postgres=# INSERT INTO t2 VALUES ('t', 1);
INSERT 0 1
postgres=# ANALYZE t2;
ANALYZE
postgres=# EXPLAIN SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e ORDER
BY t.a, t.b, t.c, t.d, t2.f LIMIT 10;
QUERY PLAN
----------------------------------------------------------------------------
----
Limit (cost=0.29..9.30 rows=10 width=20)
-> Nested Loop (cost=0.29..129.99 rows=144 width=20)
Join Filter: (t.d = t2.e)
-> Index Scan using i_t_ab on t (cost=0.29..126.80 rows=72
width=14)
Index Cond: (a < 10)
-> Materialize (cost=0.00..1.03 rows=2 width=6)
-> Seq Scan on t2 (cost=0.00..1.02 rows=2 width=6)
(7 rows)
postgres=# SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e ORDER BY t.a,
t.b, t.c, t.d, t2.f LIMIT 10;
a | b | c | d | e | f
---+---+---+---+---+---
0 | 0 | 4 | t | t | 2
0 | 0 | 4 | t | t | 1
0 | 1 | 3 | t | t | 2
0 | 1 | 3 | t | t | 1
0 | 2 | 2 | t | t | 2
0 | 2 | 2 | t | t | 1
0 | 3 | 1 | t | t | 2
0 | 3 | 1 | t | t | 1
0 | 4 | 0 | t | t | 2
0 | 4 | 0 | t | t | 1
(10 rows)
(Note the column f is sorted in the descending order.)
ISTM this was brought by the following change.
In truncate_useless_pathkeys, if query_pathkeys is longer than pathkeys
made from index columns old patch picked up the latter as IndexPath's
pathkeys. But the former is more suitable according to the context here.
- truncate_useless_pathkeys returns root->query_pathkeys when
the index is fully-ordered and query_pathkeys contains the
pathkeys made from index columns.
I think it would be required to modify the patch so that the transformation
of the pathkeys is performed only when each pathkey of query_pathkeys
references the indexed relation. (The above change might have been made
according to my comment in an earlier email I sent. But I have to admit my
explanation there was not adequate. I'm sorry for that.)
Here are random comments.
* In grouping_planner(), the patch resets the pathkeys of the cheapest
presorted index-path to query_pathkeys through
get_cheapest_fractional_path_for_pathkeys(). Is this necessary? ISTM the
transformation of the pathkeys after the scan/join optimization would be no
longer necessary once it has been done at the index-path creation time, ie,
build_index_paths(). Why does the patch do that thing?
* In get_relation_info(), the patch determines the branch condition
"keyattno != ObjectIdAttributeNumber". I fail to understand the meaning of
this branch condition. Could you explain about it?
Thanks,
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thank you for pointing out.
the attched pathkey_and_uniqueindx_v4_20131122.patch is changed as
follows.The patch is compiled successfully and passes all regression tests. Also,
the patch works well for the tests shown in an earlier email from
Horiguchi-san. But in this version I found an incorrect behavior.postgres=# CREATE TABLE t (a int not null, b int not null, c int, d text);
CREATE TABLE
postgres=# CREATE UNIQUE INDEX i_t_ab ON t (a, b);
CREATE INDEX
postgres=# INSERT INTO t (SELECT a / 5, 4 - (a % 5), a, 't' FROM
generate_series(000000, 099999) a);
INSERT 0 100000
postgres=# ANALYZE t;
ANALYZE
postgres=# CREATE TABLE t2 (e text, f int);
CREATE TABLE
postgres=# INSERT INTO t2 VALUES ('t', 2);
INSERT 0 1
postgres=# INSERT INTO t2 VALUES ('t', 1);
INSERT 0 1
postgres=# ANALYZE t2;
ANALYZE
postgres=# EXPLAIN SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e ORDER
BY t.a, t.b, t.c, t.d, t2.f LIMIT 10;
QUERY PLAN
----------------------------------------------------------------------------
----
Limit (cost=0.29..9.30 rows=10 width=20)
- -> Sort (cost=92.33..92.57 rows=94 width=20)
- Sort Key: t.a, t.b, t.c, t.d, t2.f
-> Nested Loop (cost=0.29..129.99 rows=144 width=20)
Join Filter: (t.d = t2.e)
-> Index Scan using i_t_ab on t (cost=0.29..126.80 rows=72
width=14)
Index Cond: (a < 10)
-> Materialize (cost=0.00..1.03 rows=2 width=6)
-> Seq Scan on t2 (cost=0.00..1.02 rows=2 width=6)
(7 rows)
Auch. Outermost sort is omitted but necessary (Prefixed by '-' above).
ISTM this was brought by the following change.
In truncate_useless_pathkeys, if query_pathkeys is longer than pathkeys
made from index columns old patch picked up the latter as IndexPath's
pathkeys. But the former is more suitable according to the context here.- truncate_useless_pathkeys returns root->query_pathkeys when
the index is fully-ordered and query_pathkeys contains the
pathkeys made from index columns.
I implicitly put a wrong assumption that query_pathkeys is
dedicated for the relation under work, not whole subquery.
I think it would be required to modify the patch so that the transformation
of the pathkeys is performed only when each pathkey of query_pathkeys
references the indexed relation.
What is wrong here is that IndexPath declares that it is ordered
in the pathkeys which includes the columns not belongs to the
table.
(The above change might have been made according to my comment
in an earlier email I sent. But I have to admit my explanation
there was not adequate. I'm sorry for that.)
Nothing to be sorry for. It's my fault.
Anyway, I've put restriction on pathkeys_useful_for_ordering that
pick up longest pathkeys consists only ec members which matches
the required rel bitmap. With the new patch the planner makes
plan with the omitted sort and the query returns the following
result.
uniontest=# SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e
| ORDER BY t.a, t.b, t.c, t.d, t2.f LIMIT 10;
| a | b | c | d | e | f
| ---+---+---+---+---+---
| 0 | 0 | 4 | t | t | 1 <-+-- Correct order by t2.f.
| 0 | 0 | 4 | t | t | 2 <-+
| 0 | 1 | 3 | t | t | 1
(snipped.)
Here are random comments.
* In grouping_planner(), the patch resets the pathkeys of the cheapest
presorted index-path to query_pathkeys through
get_cheapest_fractional_path_for_pathkeys(). Is this necessary? ISTM the
transformation of the pathkeys after the scan/join optimization would be no
longer necessary once it has been done at the index-path creation time, ie,
build_index_paths(). Why does the patch do that thing?
You're appliciated for pointing out. It is utterly useless code
of older patch. ("Have you totally scrapped the older verions?" >me :-(
Removed it in this version.
* In get_relation_info(), the patch determines the branch condition
"keyattno != ObjectIdAttributeNumber". I fail to understand the meaning of
this branch condition. Could you explain about it?
Literally answering, it means oid cannot be NULL (if it exists).
Precisely, it reflects that I believed (or wished) it cannot be
NULL if oid column exists. (Other sysattrs are dismissed because
they are hardly to be a unique index column:-)
Oid in a tuple is retrived using HeapTupleGetOid() in
heap_getsysattr().
| result = ObjectIdGetDatum(HeapTupleGetOid(tup));
Then HeapTupleHeaderGetOid,
| #define HeapTupleGetOid(tuple) \
| HeapTupleHeaderGetOid((tuple)->t_data)
Finally asking HeapTupleHeaderGetOid for the value.
htup_details.h:354
| #define HeapTupleHeaderGetOid(tup) \
| ( \
| ((tup)->t_infomask & HEAP_HASOID) ? \
| *((Oid *) ((char *)(tup) + (tup)->t_hoff - sizeof(Oid))) \
| : \
| InvalidOid \
| )
So oid cannot be NULL after all even though can be InvalidOid.
On the otherhand, it can be NULL according to the definition in
heap.c.
heap.c:146
| static FormData_pg_attribute a2 = {
| 0, {"oid"}, OIDOID, 0, sizeof(Oid),
| ObjectIdAttributeNumber, 0, -1, -1,
| true, 'p', 'i', true, false, false, true, 0
| }; ~~~~
Can we set this to false safely?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
pathkey_and_uniqueindx_v5_20131126.patchtext/x-patch; charset=us-asciiDownload+108-14
On Fri, 2013-11-22 at 16:59 +0900, Kyotaro HORIGUCHI wrote:
Hello. I found a bug(?) in thsi patch as I considered on another
patch.
src/backend/optimizer/util/plancat.c:256: trailing whitespace.
src/backend/optimizer/util/plancat.c:261: space before tab in indent.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Kyotaro HORIGUCHI wrote:
* In get_relation_info(), the patch determines the branch condition
"keyattno != ObjectIdAttributeNumber". I fail to understand the
meaning of this branch condition. Could you explain about it?
Literally answering, it means oid cannot be NULL (if it exists).
Understood. Thank you for the detailed information. But I'm not sure it's
worth complicating the code. What use cases are you thinking?
Thanks,
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I wrote:
Kyotaro HORIGUCHI wrote:
* In get_relation_info(), the patch determines the branch condition
"keyattno != ObjectIdAttributeNumber". I fail to understand the
meaning of this branch condition. Could you explain about it?
Literally answering, it means oid cannot be NULL (if it exists).
Understood. Thank you for the detailed information. But I'm not sure
it's
worth complicating the code. What use cases are you thinking?
Having said that, I've reconsidered about this, and start to think it would
be better that all system attributes are processed in the same way as are
user attributes because that makes the code more simple. Yes, as you
mention, it's not necessarily guaranteed that system attributes have the
uniqueness property in general, but that's another problem.
Thanks,
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I wrote:
I wrote:
Kyotaro HORIGUCHI wrote:
* In get_relation_info(), the patch determines the branch
condition "keyattno != ObjectIdAttributeNumber". I fail to
understand the meaning of this branch condition. Could you explainabout it?
Literally answering, it means oid cannot be NULL (if it exists).
Understood. Thank you for the detailed information. But I'm not sure
it's worth complicating the code. What use cases are you thinking?
Having said that, I've reconsidered about this, and start to think it
would
be better that all system attributes are processed in the same way as are
user attributes because that makes the code more simple. Yes, as you
mention, it's not necessarily guaranteed that system attributes have the
uniqueness property in general, but that's another problem.
I've modified the patch to work in such a way. Also, as ISTM the patch is
more complicated than what the patch really does, I've simplified the patch.
Attached is an updated version of the patch. Could you review the patch?
Thanks,
Best regards,
Etsuro Fujita