BRIN indexes for MAX, MIN, ORDER BY?
It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
just find the page range with the largest/smallest value, and then only
scan that one. Would that be hard to implement? I'm interested in working
on it if someone can give me some pointers.
Somewhat harder but still possible would be using BRIN indexes to
accelerate ORDER BY. This would require a sorting algorithm that can take
advantage of mostly-sorted inputs. You would sort the page ranges by their
minimum or maximum value, then feed the sorting algorithm in that order.
Gavin Wahl wrote:
It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
just find the page range with the largest/smallest value, and then only
scan that one. Would that be hard to implement? I'm interested in working
on it if someone can give me some pointers.
I think this means you need to represent this operation as a specific
Path in some way. See build_minmax_path() and its callers in planagg;
you probably need to tweak preprocess_minmax_aggregates() to consider
this.
This doesn't look like a simple project to me, mind.
Somewhat harder but still possible would be using BRIN indexes to
accelerate ORDER BY. This would require a sorting algorithm that can take
advantage of mostly-sorted inputs. You would sort the page ranges by their
minimum or maximum value, then feed the sorting algorithm in that order.
I wouldn't know where to start for this. Maybe once Tom is done with
planner rejiggering it would make sense to consider looking at how to do
it.
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 28, 2015 at 9:58 AM, Gavin Wahl <gavinwahl@gmail.com> wrote:
It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
just find the page range with the largest/smallest value, and then only scan
that one.
You might need to scan more than that if you don't find any rows that
are visible.
Would that be hard to implement? I'm interested in working on it
if someone can give me some pointers.Somewhat harder but still possible would be using BRIN indexes to accelerate
ORDER BY. This would require a sorting algorithm that can take advantage of
mostly-sorted inputs. You would sort the page ranges by their minimum or
maximum value, then feed the sorting algorithm in that order.
Currently you get a Bitmap Index Scan and Bitmap Heap Scan, and then a
Sort node (quicksort or external sort). So the sort is already
receiving data sorted in BRIN-block order, isn't it? Are you saying
that the sort node should switch to something like insertion sort in
this case?
http://www.sorting-algorithms.com/nearly-sorted-initial-order
--
Thomas Munro
http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
Gavin Wahl wrote:
It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
just find the page range with the largest/smallest value, and then only
scan that one. Would that be hard to implement? I'm interested in working
on it if someone can give me some pointers.
I think this means you need to represent this operation as a specific
Path in some way. See build_minmax_path() and its callers in planagg;
you probably need to tweak preprocess_minmax_aggregates() to consider
this.
This doesn't look like a simple project to me, mind.
Somewhat harder but still possible would be using BRIN indexes to
accelerate ORDER BY. This would require a sorting algorithm that can take
advantage of mostly-sorted inputs. You would sort the page ranges by their
minimum or maximum value, then feed the sorting algorithm in that order.
I wouldn't know where to start for this. Maybe once Tom is done with
planner rejiggering it would make sense to consider looking at how to do
it.
Yeah. I would urgently recommend that people *not* try to build new
things like planagg.c right now. A large part of the point of upper
planner path-ification is to have a less grotty way of dealing with
things like specialized aggregate implementations.
(And yes, I am working on that. Honest.)
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Yeah. I would urgently recommend that people *not* try to build new
things like planagg.c right now. A large part of the point of upper
planner path-ification is to have a less grotty way of dealing with
things like specialized aggregate implementations.
Ok. I will wait and ask again later.
Hi Gavin
Note that Alexander Korotkov already started work in 2013 on a
somewhat similar feature called partial sort:
/messages/by-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com
In particular, see the 2nd patch for KNN sort -- it uses known
bounding boxes from the GiST index for sorting, which in many ways is
similar to min/max BRIN ranges.
*partial-knn-1.patch*
KNN-GiST provides ability to get ordered results from index, but this order
is based only on index information. For instance, GiST index contains
bounding rectangles for polygons, and we can't get exact distance to
polygon from index (similar situation is in PostGIS). In attached patch,
GiST distance method can set recheck flag (similar to consistent method).
This flag means that distance method returned lower bound of distance and
we should recheck it from heap.
Unfortunatley this work has stalled.
Regards,
Marti
On Sun, Sep 27, 2015 at 11:58 PM, Gavin Wahl <gavinwahl@gmail.com> wrote:
It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
just find the page range with the largest/smallest value, and then only scan
that one. Would that be hard to implement? I'm interested in working on it
if someone can give me some pointers.Somewhat harder but still possible would be using BRIN indexes to accelerate
ORDER BY. This would require a sorting algorithm that can take advantage of
mostly-sorted inputs. You would sort the page ranges by their minimum or
maximum value, then feed the sorting algorithm in that order.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 09/28/2015 05:28 PM, Marti Raudsepp wrote:
Note that Alexander Korotkov already started work in 2013 on a
somewhat similar feature called partial sort:
/messages/by-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.comIn particular, see the 2nd patch for KNN sort -- it uses known
bounding boxes from the GiST index for sorting, which in many ways is
similar to min/max BRIN ranges.*partial-knn-1.patch*
KNN-GiST provides ability to get ordered results from index, but this order
is based only on index information. For instance, GiST index contains
bounding rectangles for polygons, and we can't get exact distance to
polygon from index (similar situation is in PostGIS). In attached patch,
GiST distance method can set recheck flag (similar to consistent method).
This flag means that distance method returned lower bound of distance and
we should recheck it from heap.Unfortunatley this work has stalled.
No, that was actually committed for 9.5. It's this item in the release
notes:
Allow queries to perform accurate distance filtering of
bounding-box-indexed objects (polygons, circles) using GiST indexes
(Alexander Korotkov, Heikki Linnakangas)Previously, a common table expression was required to return a large
number of rows ordered by bounding-box distance, and then filtered
further with a more accurate non-bounding-box distance calculation.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 27/09/15 21:58, Gavin Wahl wrote:
Somewhat harder but still possible would be using BRIN indexes to
accelerate ORDER BY. This would require a sorting algorithm that can take
advantage of mostly-sorted inputs. You would sort the page ranges by their
minimum or maximum value, then feed the sorting algorithm in that order.
An internal merge sort does well with partially-sorted input.
--
Cheers,
Jeremy
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 29 September 2015 at 13:20, Jeremy Harris <jgh@wizmail.org> wrote:
On 27/09/15 21:58, Gavin Wahl wrote:
Somewhat harder but still possible would be using BRIN indexes to
accelerate ORDER BY. This would require a sorting algorithm that can take
advantage of mostly-sorted inputs. You would sort the page ranges bytheir
minimum or maximum value, then feed the sorting algorithm in that order.
An internal merge sort does well with partially-sorted input.
Yes, the $SUBJECT would be a valid use for BRIN.
And also in general for sorts, leading to an optimization of merge joins
using BRIN indexes.
All this was foreseen in the original design; if it really was trivial it
would be in 9.5 already...
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services