BRIN indexes for MAX, MIN, ORDER BY?

Started by Gavin Wahlover 10 years ago9 messages
#1Gavin Wahl
gavinwahl@gmail.com

It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
just find the page range with the largest/smallest value, and then only
scan that one. Would that be hard to implement? I'm interested in working
on it if someone can give me some pointers.

Somewhat harder but still possible would be using BRIN indexes to
accelerate ORDER BY. This would require a sorting algorithm that can take
advantage of mostly-sorted inputs. You would sort the page ranges by their
minimum or maximum value, then feed the sorting algorithm in that order.

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Gavin Wahl (#1)
Re: BRIN indexes for MAX, MIN, ORDER BY?

Gavin Wahl wrote:

It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
just find the page range with the largest/smallest value, and then only
scan that one. Would that be hard to implement? I'm interested in working
on it if someone can give me some pointers.

I think this means you need to represent this operation as a specific
Path in some way. See build_minmax_path() and its callers in planagg;
you probably need to tweak preprocess_minmax_aggregates() to consider
this.

This doesn't look like a simple project to me, mind.

Somewhat harder but still possible would be using BRIN indexes to
accelerate ORDER BY. This would require a sorting algorithm that can take
advantage of mostly-sorted inputs. You would sort the page ranges by their
minimum or maximum value, then feed the sorting algorithm in that order.

I wouldn't know where to start for this. Maybe once Tom is done with
planner rejiggering it would make sense to consider looking at how to do
it.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Gavin Wahl (#1)
Re: BRIN indexes for MAX, MIN, ORDER BY?

On Mon, Sep 28, 2015 at 9:58 AM, Gavin Wahl <gavinwahl@gmail.com> wrote:

It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
just find the page range with the largest/smallest value, and then only scan
that one.

You might need to scan more than that if you don't find any rows that
are visible.

Would that be hard to implement? I'm interested in working on it
if someone can give me some pointers.

Somewhat harder but still possible would be using BRIN indexes to accelerate
ORDER BY. This would require a sorting algorithm that can take advantage of
mostly-sorted inputs. You would sort the page ranges by their minimum or
maximum value, then feed the sorting algorithm in that order.

Currently you get a Bitmap Index Scan and Bitmap Heap Scan, and then a
Sort node (quicksort or external sort). So the sort is already
receiving data sorted in BRIN-block order, isn't it? Are you saying
that the sort node should switch to something like insertion sort in
this case?

http://www.sorting-algorithms.com/nearly-sorted-initial-order

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#2)
Re: BRIN indexes for MAX, MIN, ORDER BY?

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Gavin Wahl wrote:

It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
just find the page range with the largest/smallest value, and then only
scan that one. Would that be hard to implement? I'm interested in working
on it if someone can give me some pointers.

I think this means you need to represent this operation as a specific
Path in some way. See build_minmax_path() and its callers in planagg;
you probably need to tweak preprocess_minmax_aggregates() to consider
this.

This doesn't look like a simple project to me, mind.

Somewhat harder but still possible would be using BRIN indexes to
accelerate ORDER BY. This would require a sorting algorithm that can take
advantage of mostly-sorted inputs. You would sort the page ranges by their
minimum or maximum value, then feed the sorting algorithm in that order.

I wouldn't know where to start for this. Maybe once Tom is done with
planner rejiggering it would make sense to consider looking at how to do
it.

Yeah. I would urgently recommend that people *not* try to build new
things like planagg.c right now. A large part of the point of upper
planner path-ification is to have a less grotty way of dealing with
things like specialized aggregate implementations.

(And yes, I am working on that. Honest.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Gavin Wahl
gavinwahl@gmail.com
In reply to: Tom Lane (#4)
Re: BRIN indexes for MAX, MIN, ORDER BY?

Yeah. I would urgently recommend that people *not* try to build new
things like planagg.c right now. A large part of the point of upper
planner path-ification is to have a less grotty way of dealing with
things like specialized aggregate implementations.

Ok. I will wait and ask again later.

#6Marti Raudsepp
marti@juffo.org
In reply to: Gavin Wahl (#1)
Re: BRIN indexes for MAX, MIN, ORDER BY?

Hi Gavin

Note that Alexander Korotkov already started work in 2013 on a
somewhat similar feature called partial sort:
/messages/by-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com

In particular, see the 2nd patch for KNN sort -- it uses known
bounding boxes from the GiST index for sorting, which in many ways is
similar to min/max BRIN ranges.

*partial-knn-1.patch*

KNN-GiST provides ability to get ordered results from index, but this order
is based only on index information. For instance, GiST index contains
bounding rectangles for polygons, and we can't get exact distance to
polygon from index (similar situation is in PostGIS). In attached patch,
GiST distance method can set recheck flag (similar to consistent method).
This flag means that distance method returned lower bound of distance and
we should recheck it from heap.

Unfortunatley this work has stalled.

Regards,
Marti

On Sun, Sep 27, 2015 at 11:58 PM, Gavin Wahl <gavinwahl@gmail.com> wrote:

It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
just find the page range with the largest/smallest value, and then only scan
that one. Would that be hard to implement? I'm interested in working on it
if someone can give me some pointers.

Somewhat harder but still possible would be using BRIN indexes to accelerate
ORDER BY. This would require a sorting algorithm that can take advantage of
mostly-sorted inputs. You would sort the page ranges by their minimum or
maximum value, then feed the sorting algorithm in that order.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Marti Raudsepp (#6)
Re: BRIN indexes for MAX, MIN, ORDER BY?

On 09/28/2015 05:28 PM, Marti Raudsepp wrote:

Note that Alexander Korotkov already started work in 2013 on a
somewhat similar feature called partial sort:
/messages/by-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com

In particular, see the 2nd patch for KNN sort -- it uses known
bounding boxes from the GiST index for sorting, which in many ways is
similar to min/max BRIN ranges.

*partial-knn-1.patch*

KNN-GiST provides ability to get ordered results from index, but this order
is based only on index information. For instance, GiST index contains
bounding rectangles for polygons, and we can't get exact distance to
polygon from index (similar situation is in PostGIS). In attached patch,
GiST distance method can set recheck flag (similar to consistent method).
This flag means that distance method returned lower bound of distance and
we should recheck it from heap.

Unfortunatley this work has stalled.

No, that was actually committed for 9.5. It's this item in the release
notes:

Allow queries to perform accurate distance filtering of
bounding-box-indexed objects (polygons, circles) using GiST indexes
(Alexander Korotkov, Heikki Linnakangas)

Previously, a common table expression was required to return a large
number of rows ordered by bounding-box distance, and then filtered
further with a more accurate non-bounding-box distance calculation.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Jeremy Harris
jgh@wizmail.org
In reply to: Gavin Wahl (#1)
Re: BRIN indexes for MAX, MIN, ORDER BY?

On 27/09/15 21:58, Gavin Wahl wrote:

Somewhat harder but still possible would be using BRIN indexes to
accelerate ORDER BY. This would require a sorting algorithm that can take
advantage of mostly-sorted inputs. You would sort the page ranges by their
minimum or maximum value, then feed the sorting algorithm in that order.

An internal merge sort does well with partially-sorted input.
--
Cheers,
Jeremy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Simon Riggs
simon@2ndQuadrant.com
In reply to: Jeremy Harris (#8)
Re: BRIN indexes for MAX, MIN, ORDER BY?

On 29 September 2015 at 13:20, Jeremy Harris <jgh@wizmail.org> wrote:

On 27/09/15 21:58, Gavin Wahl wrote:

Somewhat harder but still possible would be using BRIN indexes to
accelerate ORDER BY. This would require a sorting algorithm that can take
advantage of mostly-sorted inputs. You would sort the page ranges by

their

minimum or maximum value, then feed the sorting algorithm in that order.

An internal merge sort does well with partially-sorted input.

Yes, the $SUBJECT would be a valid use for BRIN.

And also in general for sorts, leading to an optimization of merge joins
using BRIN indexes.

All this was foreseen in the original design; if it really was trivial it
would be in 9.5 already...

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services