DISTINCT is not quite distinct
I run this innocent query
CREATE TABLE foo AS SELECT DISTINCT bar FROM baz ORDER BY bar;
and the resulting table contains duplicate rows. 8-(
According to EXPLAIN, an index scan on the bar column is used (using
the underlying B-tree index). This is with PostgreSQL 8.1.4 (Debian
package 8.1.4-6). Is this a known problem?
If I drop the DISTINCT, the output is not correctly ordered, either.
Perhaps this is an index corruption issue? The hardware itself seems
fine.
--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
Florian Weimer wrote:
I run this innocent query
CREATE TABLE foo AS SELECT DISTINCT bar FROM baz ORDER BY bar;
and the resulting table contains duplicate rows. 8-(
According to EXPLAIN, an index scan on the bar column is used (using
the underlying B-tree index). This is with PostgreSQL 8.1.4 (Debian
package 8.1.4-6). Is this a known problem?If I drop the DISTINCT, the output is not correctly ordered, either.
Perhaps this is an index corruption issue? The hardware itself seems
fine.
Could be index corruption perhaps.
I take it SELECT DISTINCT bar... shows the same problem?
If so, can you do:
SELECT OID,xmin,cmin,xmax,cmax,bar FROM baz
WHERE bar = <something with duplicates>
--
Richard Huxton
Archonet Ltd
Florian Weimer <fweimer@bfk.de> writes:
I run this innocent query
CREATE TABLE foo AS SELECT DISTINCT bar FROM baz ORDER BY bar;
and the resulting table contains duplicate rows. 8-(
According to EXPLAIN, an index scan on the bar column is used (using
the underlying B-tree index).
Do you mean an indexscan followed immediately by a Unique node? If
so, yeah, that would depend entirely on correct ordering of the
indexscan output to produce distinct results.
If I drop the DISTINCT, the output is not correctly ordered, either.
Perhaps this is an index corruption issue? The hardware itself seems
fine.
Perhaps. Do you want to save off a physical copy of the index and then
try REINDEXing? If that fixes it, I'd be interested to compare the two
versions of the index.
regards, tom lane
* Tom Lane:
According to EXPLAIN, an index scan on the bar column is used (using
the underlying B-tree index).Do you mean an indexscan followed immediately by a Unique node? If
so, yeah, that would depend entirely on correct ordering of the
indexscan output to produce distinct results.
Yes.
If I drop the DISTINCT, the output is not correctly ordered, either.
Perhaps this is an index corruption issue? The hardware itself seems
fine.Perhaps. Do you want to save off a physical copy of the index and then
try REINDEXing?
The duplicate row is gone.
If that fixes it, I'd be interested to compare the two versions of
the index.
The index files are about 155 MB and 98 MB, compressed. How shall we
transfer them? (Their contents is not super-secret, but I don't want
to distribute them widely, either.)
--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99