BUG #4748: hash join and sort-merge join make different results
The following bug has been logged online:
Bug reference: 4748
Logged by: Roman Kononov
Email address: kononov@ftml.net
PostgreSQL version: 8.3.7
Operating system: GNU/Linux x86_64
Description: hash join and sort-merge join make different results
Details:
test-std=# create table t(s int,i interval);
CREATE TABLE
test-std=# insert into t values (0,'30 days'), (1,'1 month');
INSERT 0 2
test-std=# select * from t as a, t as b where a.i=b.i;
s | i | s | i
---+---------+---+---------
0 | 30 days | 0 | 30 days
0 | 30 days | 1 | 1 mon
1 | 1 mon | 0 | 30 days
1 | 1 mon | 1 | 1 mon
(4 rows)
test-std=# analyze;
ANALYZE
test-std=# select * from t as a, t as b where a.i=b.i;
s | i | s | i
---+---------+---+---------
0 | 30 days | 0 | 30 days
1 | 1 mon | 1 | 1 mon
(2 rows)
Roman Kononov wrote:
Description: hash join and sort-merge join make different results
Details:test-std=# create table t(s int,i interval);
CREATE TABLE
test-std=# insert into t values (0,'30 days'), (1,'1 month');
INSERT 0 2
test-std=# select * from t as a, t as b where a.i=b.i;
Reproducible in 8.2.13 as well ..
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Apr 3, 2009 at 2:10 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
Roman Kononov wrote:
Description: hash join and sort-merge join make different results
Details:test-std=# create table t(s int,i interval);
CREATE TABLE
test-std=# insert into t values (0,'30 days'), (1,'1 month');
INSERT 0 2
test-std=# select * from t as a, t as b where a.i=b.i;Reproducible in 8.2.13 as well ..
and the same in HEAD
PS: the analyze not always brings the problems, i had to turn off
enable_mergejoin and enable_nestloop
--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
Em Sex, 2009-04-03 às 15:10 -0400, Alvaro Herrera escreveu:
Roman Kononov wrote:
Description: hash join and sort-merge join make different results
Details:test-std=# create table t(s int,i interval);
CREATE TABLE
test-std=# insert into t values (0,'30 days'), (1,'1 month');
INSERT 0 2
test-std=# select * from t as a, t as b where a.i=b.i;Reproducible in 8.2.13 as well ..
I could reproduce this once in a database that already have a table
named "t", then after i did dropped it i couldn't anymore.
I'm using 8.3.7.
# create table tt_01(s int,i interval);
CREATE TABLE
# insert into tt_01 values (0,'30 days'), (1,'1 month');
INSERT 0 2
# select * from tt_01 as a, tt_01 as b where a.i=b.i;
s | i | s | i
---+---------+---+---------
0 | 30 days | 0 | 30 days
0 | 30 days | 1 | 1 mon
1 | 1 mon | 0 | 30 days
1 | 1 mon | 1 | 1 mon
(4 registros)
# ANALYZE ;
ANALYZE
# select * from tt_01 as a, tt_01 as b where a.i=b.i;
s | i | s | i
---+---------+---+---------
0 | 30 days | 0 | 30 days
0 | 30 days | 1 | 1 mon
1 | 1 mon | 0 | 30 days
1 | 1 mon | 1 | 1 mon
(4 registros)
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://planeta.postgresql.org.br
Alvaro Herrera <alvherre@commandprompt.com> writes:
Roman Kononov wrote:
Description: hash join and sort-merge join make different results
Details:test-std=# create table t(s int,i interval);
CREATE TABLE
test-std=# insert into t values (0,'30 days'), (1,'1 month');
INSERT 0 2
test-std=# select * from t as a, t as b where a.i=b.i;
Reproducible in 8.2.13 as well ..
The problem is that interval_cmp_internal(), and hence interval_eq(),
says that '1 month' = '30 days'. But they don't hash to the same thing.
I think we could fix this by having interval_hash() duplicate the
total-span calculation done by interval_cmp_internal, and then return
the hash of the resulting TimeOffset. This is going to break existing
hash indexes on intervals, but there seems little choice...
regards, tom lane
"Dickson S. Guedes" <listas@guedesoft.net> writes:
I could reproduce this once in a database that already have a table
named "t", then after i did dropped it i couldn't anymore.
As noted, you might have to force use of a hash join (my machine
preferred a mergejoin before the ANALYZE and a nestloop after).
It's definitely broken :-(
regards, tom lane
On 2009-04-03 14:57 Tom Lane said the following:
I think we could fix this by having interval_hash() duplicate the
total-span calculation done by interval_cmp_internal, and then return
the hash of the resulting TimeOffset. This is going to break existing
hash indexes on intervals, but there seems little choice...
Consider hashing the result of justify_interval().
Roman Kononov <kononov@ftml.net> writes:
On 2009-04-03 14:57 Tom Lane said the following:
I think we could fix this by having interval_hash() duplicate the
total-span calculation done by interval_cmp_internal, and then return
the hash of the resulting TimeOffset. This is going to break existing
hash indexes on intervals, but there seems little choice...
Consider hashing the result of justify_interval().
Uh, what's your point? We have to match interval_eq, not
justify_interval.
regards, tom lane
"Roman Kononov" <kononov@ftml.net> writes:
Description: hash join and sort-merge join make different results
I've applied a patch for this:
http://archives.postgresql.org/pgsql-committers/2009-04/msg00048.php
Thanks for the report!
regards, tom lane
On 2009-04-03 23:32 Tom Lane said the following:
Roman Kononov <kononov@ftml.net> writes:
On 2009-04-03 14:57 Tom Lane said the following:
I think we could fix this by having interval_hash() duplicate the
total-span calculation done by interval_cmp_internal, and then return
the hash of the resulting TimeOffset. This is going to break existing
hash indexes on intervals, but there seems little choice...Consider hashing the result of justify_interval().
Uh, what's your point? We have to match interval_eq, not
justify_interval.
For any two intervals a and b, saying that interval_cmp_interval(a,b)==0
is exactly the same as saying that (aj.month==bj.month && aj.day==bj.day
&& aj.time==bj.time), where aj=justify_interval(a) and
bj=justify_interval(b). Therefore, instead of hashing
interval_cmp_value() you can hash justify_interval(), where
interval_cmp_value() is the transformation of intervals in
interval_cmp_interval().
You said that hashing interval_cmp_value() breaks existing hash indexes.
Hashing "justified" intervals avoids such breaking in some cases.
Roman Kononov <kononov@ftml.net> writes:
On 2009-04-03 23:32 Tom Lane said the following:
Uh, what's your point? We have to match interval_eq, not
justify_interval.
For any two intervals a and b, saying that interval_cmp_interval(a,b)==0
is exactly the same as saying that (aj.month==bj.month && aj.day==bj.day
&& aj.time==bj.time), where aj=justify_interval(a) and
bj=justify_interval(b).
I doubt that that's exactly true when you take floating-point roundoff
into account ...
regards, tom lane