DISTINCT is not quite distinct

Started by Florian Weimerover 19 years ago4 messagesgeneral
Jump to latest
#1Florian Weimer
fweimer@bfk.de

I run this innocent query

CREATE TABLE foo AS SELECT DISTINCT bar FROM baz ORDER BY bar;

and the resulting table contains duplicate rows. 8-(

According to EXPLAIN, an index scan on the bar column is used (using
the underlying B-tree index). This is with PostgreSQL 8.1.4 (Debian
package 8.1.4-6). Is this a known problem?

If I drop the DISTINCT, the output is not correctly ordered, either.
Perhaps this is an index corruption issue? The hardware itself seems
fine.

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

#2Richard Huxton
dev@archonet.com
In reply to: Florian Weimer (#1)
Re: DISTINCT is not quite distinct

Florian Weimer wrote:

I run this innocent query

CREATE TABLE foo AS SELECT DISTINCT bar FROM baz ORDER BY bar;

and the resulting table contains duplicate rows. 8-(

According to EXPLAIN, an index scan on the bar column is used (using
the underlying B-tree index). This is with PostgreSQL 8.1.4 (Debian
package 8.1.4-6). Is this a known problem?

If I drop the DISTINCT, the output is not correctly ordered, either.
Perhaps this is an index corruption issue? The hardware itself seems
fine.

Could be index corruption perhaps.

I take it SELECT DISTINCT bar... shows the same problem?

If so, can you do:
SELECT OID,xmin,cmin,xmax,cmax,bar FROM baz
WHERE bar = <something with duplicates>

--
Richard Huxton
Archonet Ltd

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Weimer (#1)
Re: DISTINCT is not quite distinct

Florian Weimer <fweimer@bfk.de> writes:

I run this innocent query
CREATE TABLE foo AS SELECT DISTINCT bar FROM baz ORDER BY bar;
and the resulting table contains duplicate rows. 8-(

According to EXPLAIN, an index scan on the bar column is used (using
the underlying B-tree index).

Do you mean an indexscan followed immediately by a Unique node? If
so, yeah, that would depend entirely on correct ordering of the
indexscan output to produce distinct results.

If I drop the DISTINCT, the output is not correctly ordered, either.
Perhaps this is an index corruption issue? The hardware itself seems
fine.

Perhaps. Do you want to save off a physical copy of the index and then
try REINDEXing? If that fixes it, I'd be interested to compare the two
versions of the index.

regards, tom lane

#4Florian Weimer
fweimer@bfk.de
In reply to: Tom Lane (#3)
Re: DISTINCT is not quite distinct

* Tom Lane:

According to EXPLAIN, an index scan on the bar column is used (using
the underlying B-tree index).

Do you mean an indexscan followed immediately by a Unique node? If
so, yeah, that would depend entirely on correct ordering of the
indexscan output to produce distinct results.

Yes.

If I drop the DISTINCT, the output is not correctly ordered, either.
Perhaps this is an index corruption issue? The hardware itself seems
fine.

Perhaps. Do you want to save off a physical copy of the index and then
try REINDEXing?

The duplicate row is gone.

If that fixes it, I'd be interested to compare the two versions of
the index.

The index files are about 155 MB and 98 MB, compressed. How shall we
transfer them? (Their contents is not super-secret, but I don't want
to distribute them widely, either.)

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99