space for optimalization: DISTINCT without index

Started by Pavel Stehuleabout 20 years ago3 messages

pavel.stehule@hotmail.com

about 20 years ago

Hello

I did some test and I can see so DISTINCT works well on indexed columns, but
is slow on derived tables without indexes. If I use without distinct group
by I get much better times.

SELECT DISTINCT a, b FROM tab
SELECT a,b FROM tab GROUP BY a, b.

Can You Explain it.

Thank You
Pavel Stehule

_________________________________________________________________
Citite se osamele? Poznejte nekoho vyjmecneho diky Match.com.
http://www.msn.cz/

Bruno Wolff III

bruno@wolff.to

about 20 years ago

In reply to: Pavel Stehule (#1)

Re: space for optimalization: DISTINCT without index

On Mon, Dec 12, 2005 at 18:35:07 +0100,
Pavel Stehule <pavel.stehule@hotmail.com> wrote:

Hello

I did some test and I can see so DISTINCT works well on indexed columns,
but is slow on derived tables without indexes. If I use without distinct
group by I get much better times.

SELECT DISTINCT a, b FROM tab
SELECT a,b FROM tab GROUP BY a, b.

Can You Explain it.

DISTINCT will require a sort step to remove duplicates. GROUP BY can use
either a sort or hash aggregate plan. If there are few distinct values,
the hash aggregate plan can be much faster.

Tom Lane

tgl@sss.pgh.pa.us

about 20 years ago

In reply to: Bruno Wolff III (#2)

Re: space for optimalization: DISTINCT without index

Bruno Wolff III <bruno@wolff.to> writes:

DISTINCT will require a sort step to remove duplicates. GROUP BY can use
either a sort or hash aggregate plan. If there are few distinct values,
the hash aggregate plan can be much faster.

The DISTINCT code hasn't been revisited in a long time. One obstacle to
improving it is that it's very tightly intertwined with ORDER BY. While
fixing that might be just a Small Matter Of Programming, it's not clear
how to decouple them without breaking DISTINCT ON.

regards, tom lane