PostgreSQL - 'SKYLINE OF' clause added!

Started by ranbeer makinabout 19 years ago27 messageshackers
Jump to latest
#1ranbeer makin
ranbeer@gmail.com

We at International Institute of Information Technology (IIIT) Hyderabad,
India, have extended the Postgres database
system with the skyline operation. For this work, we were guided by our
Prof. Kamalakar Karlapalem
(http://www.iiit.ac.in/~kamal/).

We have extended SQL 'SELECT' clause by an optional 'SKYLINE OF' clause in
version 8.0.3. The changes are done in parser, transformation,
planner/optimizer (a bit) and executor stages. For its execution, two novel
algorithms - BNL (Block Nested Loop) and SFS
(Sort Filter Skyline) - are also implemented.

Can this piece of work contribute to PostgreSQL? If yes, then we'll send out
a detailed report of this project including changes
made, issues involved/need to be solved, limitations, future work, and the
source code etc.

Thanks very much.

Regards,
Nikita
Ranbeer

--
http://students.iiit.ac.in/~nikita/
http://students.iiit.ac.in/~ranbeer/

#2Martijn van Oosterhout
kleptog@svana.org
In reply to: ranbeer makin (#1)
Re: PostgreSQL - 'SKYLINE OF' clause added!

On Sat, Mar 03, 2007 at 07:02:41PM +0530, ranbeer makin wrote:

We at International Institute of Information Technology (IIIT) Hyderabad,
India, have extended the Postgres database
system with the skyline operation. For this work, we were guided by our
Prof. Kamalakar Karlapalem
(http://www.iiit.ac.in/~kamal/).

<snip>

Can this piece of work contribute to PostgreSQL? If yes, then we'll send out
a detailed report of this project including changes
made, issues involved/need to be solved, limitations, future work, and the
source code etc.

Well, that kind of depends. I have no idea what "Skyline" means so
telling us what it is would be a good start

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Martijn van Oosterhout (#2)
Re: PostgreSQL - 'SKYLINE OF' clause added!

Martijn van Oosterhout wrote:

Well, that kind of depends. I have no idea what "Skyline" means so
telling us what it is would be a good start

Also, showing us the diffs (maybe on a web site) would give us an idea
of how intrusive it would be.

But I agree with Martijn - the first thing is to describe the
functionality properly.

cheers

andrew

#4ranbeer makin
ranbeer@gmail.com
In reply to: Martijn van Oosterhout (#2)
Re: PostgreSQL - 'SKYLINE OF' clause added!

Here is a description of what the SKYLINE operator is:
---
Suppose you wish to purchase books and you are looking for books with high
rating and low price. However, both the criteria of selecting books are
complementary since books of higher rating are generally more expensive. For
finding such books, you'll query the database system of the book store which
will return a set of interesting books. The word 'interesting' implies all
the books which are as good or better in both the dimensions (rating and
price) and better in at least one dimension. This set of interesting points
forms the Skyline.

Skyline operator finds points which are not dominated by other data points.
A point dominates another point if it is as good or better in all dimensions
and better in at least one dimension.

For specifying the Skyline queries, we extend SQL SELECT statement by an
optional *SKYLINE OF* clause as given below:

SELECT ... FROM ... WHERE...

GROUP BY ... HAVING...

*SKYLINE OF [DISTINCT] d1 [MIN | MAX | DIFF], .., dm [MIN | MAX | DIFF]*

ORDER BY...

Where, d1, d2 ,…, dm denote the dimensions of the Skyline, and MIN, MAX,
DIFF specify whether the value in that dimension should be minimized,
maximized, or simply be different. When DIFF is specified, two tuples are
compared only if the value of the attribute on which DIFF is applied is
different.

When DISTINCT clause is specified and if there are two or more tuples with
the same values of skyline attributes, then only one of them is retained in
the skyline set. Otherwise, all of them are retained.

Let's consider the above example of purchasing books with high rating and
low price.

*Book Name*

*Rating (out of 5)*

*Price (Rs)*

Prodigal Daughter

3

250

The city of Joy

5

400

Vanishing Acts

2

250

The Notebook

4

300

Fountain Head

5

350

Dear John

5

500

*Table1. Sample of book database*

Now, in order to get books with high rating and low price, you simply can
issue the following query:

SELECT *

FROM Books

SKYLINE OF rating MAX, price MIN;

The Skyline set returned will be:

*Book Name*

*Rating (out of 5)*

*Price (Rs)*

Prodigal Daughter

3

250

The Notebook

4

300

Fountain Head

5

350

*Table2. Skyline set*

From this set, you can now make your choice of books, by weighing your
personal preferences for price and rating of the books.

For more information, you can refer to:
S. Borzsonyi, D. Kossmann, and K. Stocker. The skyline operator. In *ICDE*,
pages 421.430, 2001

---

Thanks.

Show quoted text

On 3/3/07, Martijn van Oosterhout <kleptog@svana.org> wrote:

On Sat, Mar 03, 2007 at 07:02:41PM +0530, ranbeer makin wrote:

We at International Institute of Information Technology (IIIT)

Hyderabad,

India, have extended the Postgres database
system with the skyline operation. For this work, we were guided by our
Prof. Kamalakar Karlapalem
(http://www.iiit.ac.in/~kamal/).

<snip>

Can this piece of work contribute to PostgreSQL? If yes, then we'll send

out

a detailed report of this project including changes
made, issues involved/need to be solved, limitations, future work, and

the

source code etc.

Well, that kind of depends. I have no idea what "Skyline" means so
telling us what it is would be a good start

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

From each according to his ability. To each according to his ability to

litigate.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFF6XrkIB7bNG8LQkwRAqw8AJ0UKAy41OMxdgLUdY1G+e7R6/jGPwCZAQY4
9uCKFUW65UBIx7fpogR75Yo=
=6Yc0
-----END PGP SIGNATURE-----

#5Shane Ambler
pgsql@Sheeky.Biz
In reply to: ranbeer makin (#1)
Re: PostgreSQL - 'SKYLINE OF' clause added!

ranbeer makin wrote:

We at International Institute of Information Technology (IIIT) Hyderabad,
India, have extended the Postgres database
system with the skyline operation. For this work, we were guided by our
Prof. Kamalakar Karlapalem
(http://www.iiit.ac.in/~kamal/).

We have extended SQL 'SELECT' clause by an optional 'SKYLINE OF' clause in
version 8.0.3. The changes are done in parser, transformation,
planner/optimizer (a bit) and executor stages. For its execution, two novel
algorithms - BNL (Block Nested Loop) and SFS
(Sort Filter Skyline) - are also implemented.

From what I read on your web pages it sounds interesting and may be a
worthwhile addition to PostgreSQL. I'll have a look at it when it is
available.

Can this piece of work contribute to PostgreSQL? If yes, then we'll send
out
a detailed report of this project including changes
made, issues involved/need to be solved, limitations, future work, and the
source code etc.

I am not one making the choice of accepting your work but from what I
know you would want to make your patch available so others can review
the stability/quality of your work and decide whether there is enough
demand for the feature to have it included in the main distribution
either as part of the main code or within the contrib section.

One option you have is to start a project at pgfoundry.org so others can
access and try your contribution. This will allow your work to be
available and to be tested by those interested in this feature. If your
work proves to be worthwhile and in demand it can progress from there
into the main distribution.

You most probably want to look at porting your changes to the latest
postgresql release as well.

Thanks very much.

Regards,
Nikita
Ranbeer

--
http://students.iiit.ac.in/~nikita/
http://students.iiit.ac.in/~ranbeer/

--

Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz

#6Joshua D. Drake
jd@commandprompt.com
In reply to: Shane Ambler (#5)
Re: PostgreSQL - 'SKYLINE OF' clause added!

You most probably want to look at porting your changes to the latest
postgresql release as well.

I believe many people would be interested, but to get the feature
accepted we would need a patch against -head as that is the latest
version of PostgreSQL under development.

You can see more information about that here:

http://www.postgresql.org/developer/

The above link will also help you with our required coding styles,
acceptance policies etc...

Sincerely,

Joshua D. Drake

Thanks very much.

Regards,
Nikita
Ranbeer

--
http://students.iiit.ac.in/~nikita/
http://students.iiit.ac.in/~ranbeer/

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

#7Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: ranbeer makin (#4)
Re: PostgreSQL - 'SKYLINE OF' clause added!

FWIW, this sounds like a subset of the Query By Example stuff that
someone is working on. I don't have a URL handy since I'm on a plane,
but I think google can find it.

On Mar 3, 2007, at 8:12 AM, ranbeer makin wrote:

Here is a description of what the SKYLINE operator is:
---
Suppose you wish to purchase books and you are looking for books
with high rating and low price. However, both the criteria of
selecting books are complementary since books of higher rating are
generally more expensive. For finding such books, you'll query the
database system of the book store which will return a set of
interesting books. The word 'interesting' implies all the books
which are as good or better in both the dimensions (rating and
price) and better in at least one dimension. This set of
interesting points forms the Skyline.
Skyline operator finds points which are not dominated by other data
points. A point dominates another point if it is as good or better
in all dimensions and better in at least one dimension.

For specifying the Skyline queries, we extend SQL SELECT statement
by an optional SKYLINE OF clause as given below:

SELECT ... FROM ... WHERE...

GROUP BY ... HAVING...

SKYLINE OF [DISTINCT] d1 [MIN | MAX | DIFF], .., dm [MIN | MAX |
DIFF]

ORDER BY...

Where, d1, d2 ,…, dm denote the dimensions of the Skyline, and MIN,
MAX, DIFF specify whether the value in that dimension should be
minimized, maximized, or simply be different. When DIFF is
specified, two tuples are compared only if the value of the
attribute on which DIFF is applied is different.

When DISTINCT clause is specified and if there are two or more
tuples with the same values of skyline attributes, then only one of
them is retained in the skyline set. Otherwise, all of them are
retained.

Let's consider the above example of purchasing books with high
rating and low price.

Book Name

Rating (out of 5)

Price (Rs)

Prodigal Daughter

3

250

The city of Joy

5

400

Vanishing Acts

2

250

The Notebook

4

300

Fountain Head

5

350

Dear John

5

500

Table1. Sample of book database

Now, in order to get books with high rating and low price, you
simply can issue the following query:

SELECT *

FROM Books

SKYLINE OF rating MAX, price MIN;

The Skyline set returned will be:

Book Name

Rating (out of 5)

Price (Rs)

Prodigal Daughter

3

250

The Notebook

4

300

Fountain Head

5

350

Table2. Skyline set

From this set, you can now make your choice of books, by weighing
your personal preferences for price and rating of the books.

For more information, you can refer to:
S. Borzsonyi, D. Kossmann, and K. Stocker. The skyline operator. In
ICDE, pages 421.430, 2001

---

Thanks.

On 3/3/07, Martijn van Oosterhout <kleptog@svana.org> wrote: On
Sat, Mar 03, 2007 at 07:02:41PM +0530, ranbeer makin wrote:

We at International Institute of Information Technology (IIIT)

Hyderabad,

India, have extended the Postgres database
system with the skyline operation. For this work, we were guided

by our

Prof. Kamalakar Karlapalem
(http://www.iiit.ac.in/~kamal/).

<snip>

Can this piece of work contribute to PostgreSQL? If yes, then

we'll send out

a detailed report of this project including changes
made, issues involved/need to be solved, limitations, future

work, and the

source code etc.

Well, that kind of depends. I have no idea what "Skyline" means so
telling us what it is would be a good start

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/
kleptog/

From each according to his ability. To each according to his

ability to litigate.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFF6XrkIB7bNG8LQkwRAqw8AJ0UKAy41OMxdgLUdY1G+e7R6/jGPwCZAQY4
9uCKFUW65UBIx7fpogR75Yo=
=6Yc0
-----END PGP SIGNATURE-----

--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#8David Fetter
david@fetter.org
In reply to: Jim Nasby (#7)
Re: PostgreSQL - 'SKYLINE OF' clause added!

On Mon, Mar 05, 2007 at 09:04:46PM -0600, Jim Nasby wrote:

FWIW, this sounds like a subset of the Query By Example stuff that
someone is working on. I don't have a URL handy since I'm on a plane,
but I think google can find it.

It's now called ObelisQ <http://pgfoundry.org/projects/qbe&gt;

Cheers,
D

On Mar 3, 2007, at 8:12 AM, ranbeer makin wrote:

Here is a description of what the SKYLINE operator is:
---
Suppose you wish to purchase books and you are looking for books
with high rating and low price. However, both the criteria of
selecting books are complementary since books of higher rating are
generally more expensive. For finding such books, you'll query the
database system of the book store which will return a set of
interesting books. The word 'interesting' implies all the books
which are as good or better in both the dimensions (rating and
price) and better in at least one dimension. This set of
interesting points forms the Skyline.
Skyline operator finds points which are not dominated by other data
points. A point dominates another point if it is as good or better
in all dimensions and better in at least one dimension.

For specifying the Skyline queries, we extend SQL SELECT statement
by an optional SKYLINE OF clause as given below:

SELECT ... FROM ... WHERE...

GROUP BY ... HAVING...

SKYLINE OF [DISTINCT] d1 [MIN | MAX | DIFF], .., dm [MIN | MAX |
DIFF]

ORDER BY...

Where, d1, d2 ,���, dm denote the dimensions of the Skyline, and MIN,
MAX, DIFF specify whether the value in that dimension should be
minimized, maximized, or simply be different. When DIFF is
specified, two tuples are compared only if the value of the
attribute on which DIFF is applied is different.

When DISTINCT clause is specified and if there are two or more
tuples with the same values of skyline attributes, then only one of
them is retained in the skyline set. Otherwise, all of them are
retained.

Let's consider the above example of purchasing books with high
rating and low price.

Book Name

Rating (out of 5)

Price (Rs)

Prodigal Daughter

3

250

The city of Joy

5

400

Vanishing Acts

2

250

The Notebook

4

300

Fountain Head

5

350

Dear John

5

500

Table1. Sample of book database

Now, in order to get books with high rating and low price, you
simply can issue the following query:

SELECT *

FROM Books

SKYLINE OF rating MAX, price MIN;

The Skyline set returned will be:

Book Name

Rating (out of 5)

Price (Rs)

Prodigal Daughter

3

250

The Notebook

4

300

Fountain Head

5

350

Table2. Skyline set

From this set, you can now make your choice of books, by weighing
your personal preferences for price and rating of the books.

For more information, you can refer to:
S. Borzsonyi, D. Kossmann, and K. Stocker. The skyline operator. In
ICDE, pages 421.430, 2001

---

Thanks.

On 3/3/07, Martijn van Oosterhout <kleptog@svana.org> wrote: On
Sat, Mar 03, 2007 at 07:02:41PM +0530, ranbeer makin wrote:

We at International Institute of Information Technology (IIIT)

Hyderabad,

India, have extended the Postgres database
system with the skyline operation. For this work, we were guided

by our

Prof. Kamalakar Karlapalem
(http://www.iiit.ac.in/~kamal/).

<snip>

Can this piece of work contribute to PostgreSQL? If yes, then

we'll send out

a detailed report of this project including changes
made, issues involved/need to be solved, limitations, future

work, and the

source code etc.

Well, that kind of depends. I have no idea what "Skyline" means so
telling us what it is would be a good start

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/
kleptog/

From each according to his ability. To each according to his

ability to litigate.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFF6XrkIB7bNG8LQkwRAqw8AJ0UKAy41OMxdgLUdY1G+e7R6/jGPwCZAQY4
9uCKFUW65UBIx7fpogR75Yo=
=6Yc0
-----END PGP SIGNATURE-----

--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate

#9Chris Browne
cbbrowne@acm.org
In reply to: ranbeer makin (#1)
Re: PostgreSQL - 'SKYLINE OF' clause added!

decibel@decibel.org (Jim Nasby) writes:

FWIW, this sounds like a subset of the Query By Example stuff that
someone is working on. I don't have a URL handy since I'm on a plane,
but I think google can find it.

The pgFoundry project is here...
http://pgfoundry.org/projects/qbe

And yes, indeed, this sounds quite a lot like what Meredith Patterson
presented at the Toronto conference.
--
(reverse (concatenate 'string "ofni.secnanifxunil" "@" "enworbbc"))
http://linuxfinances.info/info/finances.html
Rules of the Evil Overlord #189. "I will never tell the hero "Yes I
was the one who did it, but you'll never be able to prove it to that
incompetent old fool." Chances are, that incompetent old fool is
standing behind the curtain." <http://www.eviloverlord.com/&gt;

#10Josh Berkus
josh@agliodbs.com
In reply to: Chris Browne (#9)
Re: PostgreSQL - 'SKYLINE OF' clause added!

And yes, indeed, this sounds quite a lot like what Meredith Patterson
presented at the Toronto conference.

This would be good to have, though, since Meredith's work has some problematic
IP encumbrances.

Question, though: is the SKYLINE syntax part of a standard anywhere?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#10)
Re: PostgreSQL - 'SKYLINE OF' clause added!

Josh Berkus <josh@agliodbs.com> writes:

Question, though: is the SKYLINE syntax part of a standard anywhere?

There's certainly not anything like that in SQL2003.

I'm also kind of wondering if the main use-cases couldn't be met with
suitable multi-input custom aggregates, which is something we already
have as of 8.2.

regards, tom lane

#12Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#11)
Re: PostgreSQL - 'SKYLINE OF' clause added!

Tom,

I'm also kind of wondering if the main use-cases couldn't be met with
suitable multi-input custom aggregates, which is something we already
have as of 8.2.

Actually, given that "skyline of" is *only* for aggregate sorting (as far as I
can tell) it doesn't present the complications which QBE did for using a
function interface.

Ranbeer, would it be possible to use an aggregate function syntax instead of
the SKYLINE OF syntax extension? This allows us to sidestep the issue of
non-standard additions to the reserved word list.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Josh Berkus (#12)
Re: PostgreSQL - 'SKYLINE OF' clause added!

Josh Berkus wrote:

Tom,

I'm also kind of wondering if the main use-cases couldn't be met with
suitable multi-input custom aggregates, which is something we already
have as of 8.2.

Actually, given that "skyline of" is *only* for aggregate sorting (as far as I
can tell) it doesn't present the complications which QBE did for using a
function interface.

There is people on a Venezuelan university working on SKYLINE OF and
other operators on Postgres. I had some looks at their work because
they asked for help in the spanish list.

Not only they added the SKYLINE OF clause, but they also had some mods
to the ORDER BY clause, and a couple of other grammar changes as well.
While SKYLINE OF itself could probably be folded as aggregates, the
other stuff is not likely to be amenable to such treatment.

Also, keep in mind that there were plenty of changes in the executor.
This stuff is not likely to be very easy to implement efficiently using
our extant executor machinery; note that Ranbeer mentioned
implementation of "block nested loop" and other algorithms. Not sure
how easy would be to fold that stuff into the optimizer for multi-input
aggregates, instead of hardwiring it to the SKYLINE OF syntax.

There's a certain group in the Venezuelan Uni that was about to finish
their thesis. They promised me a look into their report; maybe I can
give further input from that and maybe merge Ranbeer's stuff with it.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#14Gavin Sherry
swm@linuxworld.com.au
In reply to: Alvaro Herrera (#13)
Re: PostgreSQL - 'SKYLINE OF' clause added!

On Tue, 6 Mar 2007, Alvaro Herrera wrote:

Also, keep in mind that there were plenty of changes in the executor.
This stuff is not likely to be very easy to implement efficiently using
our extant executor machinery; note that Ranbeer mentioned
implementation of "block nested loop" and other algorithms. Not sure
how easy would be to fold that stuff into the optimizer for multi-input
aggregates, instead of hardwiring it to the SKYLINE OF syntax.

Yes, there's been a lot of working on calculating skyline efficiently,
with different sorting techniques and so on. This is the most interesting
part of the idea. You could calculate the query Ranbeer gave using pure
SQL and, perhaps, use of some covariance aggregates or something already.
Of course, it gets harder when you want to calculate across many
dimensions.

Personally, I'd love to see some of these newer data analysis
capabilities added to PostgreSQL -- or at least put out there as
interesting patches.

Thanks,

Gavin

#15Josh Berkus
josh@agliodbs.com
In reply to: Gavin Sherry (#14)
Re: PostgreSQL - 'SKYLINE OF' clause added!

Gavin,

Personally, I'd love to see some of these newer data analysis
capabilities added to PostgreSQL -- or at least put out there as
interesting patches.

I think if the code is good enough, and we can avoid horrible non-standard
syntax extensions, they should go in. We have to defend our title as "most
advanced database" and having stuff like Skyline first (before DB2 or MS)
goes a long way for that.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#15)
Re: PostgreSQL - 'SKYLINE OF' clause added!

Josh Berkus <josh@agliodbs.com> writes:

I think if the code is good enough, and we can avoid horrible non-standard
syntax extensions, they should go in. We have to defend our title as "most
advanced database" and having stuff like Skyline first (before DB2 or MS)
goes a long way for that.

Well, whether it's horrible or not is in the eye of the beholder, but
this is certainly a non-standard syntax extension.

My questions about whether to adopt it have more to do with
cost/benefit. I haven't seen the patch, but it sounds like it will be
large and messy; and it's for a feature that nobody ever heard of before,
let alone one that the community has developed a consensus it wants.
I'm not interested in adopting stuff just "because DB2 hasn't got it".

It's also worth noting that what we've got here is a large patch
developed, by students, completely outside our normal development
process; so the odds that it's going to be anywhere near acceptable are
low. I think the last time we applied a patch that met that description
was the INTERSECT/EXCEPT patch in 1999 ... maybe you don't remember
what a fiasco that was, but I do.

Sorry to be a thrower of cold water, but I just don't see that this
comes anywhere near being something we should be eager to accept.

regards, tom lane

#17David Fuhry
dfuhry@cs.kent.edu
In reply to: Gavin Sherry (#14)
Re: PostgreSQL - 'SKYLINE OF' clause added!

The query Ranbeer gave - as with any skyline query - can be solved with
just pure SQL:

select * from books b where not exists(
select * from books b2 where
b2.rating >= b.rating and b2.price <= b.price and
(b2.rating > b.rating or b2.price < b.price)
);
book_name | rating | price
-------------------+--------+-------
Prodigal Daughter | 3 | 250
The Notebook | 4 | 300
Fountain Head | 5 | 350
(3 rows)

The idea of the BNL (block nested loop) skyline algorithm is to avoid
the nested loop by storing "dominating" records as the query proceeds -
in the above example, records which are relatively high in rating and
low in price - and comparing each candidate record to those first.

BNL is the most reasonable skyline algorithm in the absence of a
multidimensional (usually R-Tree) index on the columns. For answering
skyline queries where such an index exists over all query columns, the
most broadly used generalized algorithm is BBS [1]Papadias, D., Tao, Y., Fu, G., and Seeger, B. 2005. Progressive skyline computation in database systems. ACM Trans. Database Syst. 30, 1 (Mar. 2005), 41-82. DOI= http://doi.acm.org/10.1145/1061318.1061320.

Thanks,

Dave Fuhry

[1]: Papadias, D., Tao, Y., Fu, G., and Seeger, B. 2005. Progressive skyline computation in database systems. ACM Trans. Database Syst. 30, 1 (Mar. 2005), 41-82. DOI= http://doi.acm.org/10.1145/1061318.1061320
skyline computation in database systems. ACM Trans. Database Syst. 30, 1
(Mar. 2005), 41-82. DOI= http://doi.acm.org/10.1145/1061318.1061320

Gavin Sherry wrote:

Show quoted text

On Tue, 6 Mar 2007, Alvaro Herrera wrote:

Also, keep in mind that there were plenty of changes in the executor.
This stuff is not likely to be very easy to implement efficiently using
our extant executor machinery; note that Ranbeer mentioned
implementation of "block nested loop" and other algorithms. Not sure
how easy would be to fold that stuff into the optimizer for multi-input
aggregates, instead of hardwiring it to the SKYLINE OF syntax.

Yes, there's been a lot of working on calculating skyline efficiently,
with different sorting techniques and so on. This is the most interesting
part of the idea. You could calculate the query Ranbeer gave using pure
SQL and, perhaps, use of some covariance aggregates or something already.
Of course, it gets harder when you want to calculate across many
dimensions.

Personally, I'd love to see some of these newer data analysis
capabilities added to PostgreSQL -- or at least put out there as
interesting patches.

Thanks,

Gavin

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#18Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#16)
Re: PostgreSQL - 'SKYLINE OF' clause added!

Tom,

My questions about whether to adopt it have more to do with
cost/benefit. I haven't seen the patch, but it sounds like it will be
large and messy; and it's for a feature that nobody ever heard of before,
let alone one that the community has developed a consensus it wants.
I'm not interested in adopting stuff just "because DB2 hasn't got it".

OK, to make it a clearer case: we have an increasing user base using
PostgreSQL for decision support. One of the reasons for this is that PG is
the *only* OSDB which does a decent job of DSS. Adding unique DSS features
will make PostgreSQL attractive to a lot more DSS application developers, and
help make up for the things which we don't have yet (parallel query, async
I/O, windowing functions).

"Approximate queries" is something with DSS users *want*. Jim Grey addressed
this in his ACM editiorial on the databases of the future. It's something
that *I* want, and if the Greenplum people aren't speaking up here, it's
because they're not paying atttention.

Now, I don't know if this Skyline patch is our answer for approximate queries.
Maybe I should pester Meredith about getting QBE free of its IP issues; it
certainly looked more flexible than Skyline. In either case, the code
probably needs a complete refactor.

But I think that approximate queries ought to be on our TODO list.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

#19Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Josh Berkus (#18)
Re: PostgreSQL - 'SKYLINE OF' clause added!

Josh Berkus wrote:

Now, I don't know if this Skyline patch is our answer for approximate queries.
Maybe I should pester Meredith about getting QBE free of its IP issues; it
certainly looked more flexible than Skyline. In either case, the code
probably needs a complete refactor.

But I think that approximate queries ought to be on our TODO list.

+1

#20Gavin Sherry
swm@linuxworld.com.au
In reply to: Josh Berkus (#18)
Re: PostgreSQL - 'SKYLINE OF' clause added!

On Wed, 7 Mar 2007, Josh Berkus wrote:

"Approximate queries" is something with DSS users *want*. Jim Grey addressed
this in his ACM editiorial on the databases of the future. It's something
that *I* want, and if the Greenplum people aren't speaking up here, it's
because they're not paying atttention.

Now, I don't know if this Skyline patch is our answer for approximate queries.
Maybe I should pester Meredith about getting QBE free of its IP issues; it
certainly looked more flexible than Skyline. In either case, the code
probably needs a complete refactor.

What people want from "approximate queries" is different to this: the
desire is usually to balance run time with level of accuracy/quality (some
times the desire is to have accurate results as well as similar results).
Neither skyline or QBE are about this. The only thing in the spec which
addresses this is 'tablesample'.

Thanks,

Gavin

#21Luke Lonergan
llonergan@greenplum.com
In reply to: Gavin Sherry (#20)
#22Shane Ambler
pgsql@Sheeky.Biz
In reply to: Tom Lane (#16)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Shane Ambler (#22)
#24Nikita
nikita.p@gmail.com
In reply to: Tom Lane (#23)
#25Naz Gassiep
naz@mira.net
In reply to: Nikita (#24)
#26Shane Ambler
pgsql@Sheeky.Biz
In reply to: Naz Gassiep (#25)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Shane Ambler (#26)