BUG #4748: hash join and sort-merge join make different results

Started by Roman Kononovabout 17 years ago11 messagesbugs
Jump to latest
#1Roman Kononov
kononov@ftml.net

The following bug has been logged online:

Bug reference: 4748
Logged by: Roman Kononov
Email address: kononov@ftml.net
PostgreSQL version: 8.3.7
Operating system: GNU/Linux x86_64
Description: hash join and sort-merge join make different results
Details:

test-std=# create table t(s int,i interval);
CREATE TABLE
test-std=# insert into t values (0,'30 days'), (1,'1 month');
INSERT 0 2
test-std=# select * from t as a, t as b where a.i=b.i;
s | i | s | i
---+---------+---+---------
0 | 30 days | 0 | 30 days
0 | 30 days | 1 | 1 mon
1 | 1 mon | 0 | 30 days
1 | 1 mon | 1 | 1 mon
(4 rows)

test-std=# analyze;
ANALYZE
test-std=# select * from t as a, t as b where a.i=b.i;
s | i | s | i
---+---------+---+---------
0 | 30 days | 0 | 30 days
1 | 1 mon | 1 | 1 mon
(2 rows)

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Roman Kononov (#1)
Re: BUG #4748: hash join and sort-merge join make different results

Roman Kononov wrote:

Description: hash join and sort-merge join make different results
Details:

test-std=# create table t(s int,i interval);
CREATE TABLE
test-std=# insert into t values (0,'30 days'), (1,'1 month');
INSERT 0 2
test-std=# select * from t as a, t as b where a.i=b.i;

Reproducible in 8.2.13 as well ..

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#3Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Alvaro Herrera (#2)
Re: BUG #4748: hash join and sort-merge join make different results

On Fri, Apr 3, 2009 at 2:10 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Roman Kononov wrote:

Description:        hash join and sort-merge join make different results
Details:

test-std=# create table t(s int,i interval);
CREATE TABLE
test-std=# insert into t values (0,'30 days'), (1,'1 month');
INSERT 0 2
test-std=# select * from t as a, t as b where a.i=b.i;

Reproducible in 8.2.13 as well ..

and the same in HEAD

PS: the analyze not always brings the problems, i had to turn off
enable_mergejoin and enable_nestloop

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

#4Dickson S. Guedes
listas@guedesoft.net
In reply to: Alvaro Herrera (#2)
Re: BUG #4748: hash join and sort-merge join make different results

Em Sex, 2009-04-03 às 15:10 -0400, Alvaro Herrera escreveu:

Roman Kononov wrote:

Description: hash join and sort-merge join make different results
Details:

test-std=# create table t(s int,i interval);
CREATE TABLE
test-std=# insert into t values (0,'30 days'), (1,'1 month');
INSERT 0 2
test-std=# select * from t as a, t as b where a.i=b.i;

Reproducible in 8.2.13 as well ..

I could reproduce this once in a database that already have a table
named "t", then after i did dropped it i couldn't anymore.

I'm using 8.3.7.

# create table tt_01(s int,i interval);
CREATE TABLE

# insert into tt_01 values (0,'30 days'), (1,'1 month');
INSERT 0 2

# select * from tt_01 as a, tt_01 as b where a.i=b.i;
s | i | s | i
---+---------+---+---------
0 | 30 days | 0 | 30 days
0 | 30 days | 1 | 1 mon
1 | 1 mon | 0 | 30 days
1 | 1 mon | 1 | 1 mon
(4 registros)

# ANALYZE ;
ANALYZE

# select * from tt_01 as a, tt_01 as b where a.i=b.i;
s | i | s | i
---+---------+---+---------
0 | 30 days | 0 | 30 days
0 | 30 days | 1 | 1 mon
1 | 1 mon | 0 | 30 days
1 | 1 mon | 1 | 1 mon
(4 registros)

--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://planeta.postgresql.org.br

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#2)
Re: BUG #4748: hash join and sort-merge join make different results

Alvaro Herrera <alvherre@commandprompt.com> writes:

Roman Kononov wrote:

Description: hash join and sort-merge join make different results
Details:

test-std=# create table t(s int,i interval);
CREATE TABLE
test-std=# insert into t values (0,'30 days'), (1,'1 month');
INSERT 0 2
test-std=# select * from t as a, t as b where a.i=b.i;

Reproducible in 8.2.13 as well ..

The problem is that interval_cmp_internal(), and hence interval_eq(),
says that '1 month' = '30 days'. But they don't hash to the same thing.

I think we could fix this by having interval_hash() duplicate the
total-span calculation done by interval_cmp_internal, and then return
the hash of the resulting TimeOffset. This is going to break existing
hash indexes on intervals, but there seems little choice...

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dickson S. Guedes (#4)
Re: BUG #4748: hash join and sort-merge join make different results

"Dickson S. Guedes" <listas@guedesoft.net> writes:

I could reproduce this once in a database that already have a table
named "t", then after i did dropped it i couldn't anymore.

As noted, you might have to force use of a hash join (my machine
preferred a mergejoin before the ANALYZE and a nestloop after).
It's definitely broken :-(

regards, tom lane

#7Roman Kononov
kononov@ftml.net
In reply to: Tom Lane (#5)
Re: BUG #4748: hash join and sort-merge join make different results

On 2009-04-03 14:57 Tom Lane said the following:

I think we could fix this by having interval_hash() duplicate the
total-span calculation done by interval_cmp_internal, and then return
the hash of the resulting TimeOffset. This is going to break existing
hash indexes on intervals, but there seems little choice...

Consider hashing the result of justify_interval().

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Roman Kononov (#7)
Re: BUG #4748: hash join and sort-merge join make different results

Roman Kononov <kononov@ftml.net> writes:

On 2009-04-03 14:57 Tom Lane said the following:

I think we could fix this by having interval_hash() duplicate the
total-span calculation done by interval_cmp_internal, and then return
the hash of the resulting TimeOffset. This is going to break existing
hash indexes on intervals, but there seems little choice...

Consider hashing the result of justify_interval().

Uh, what's your point? We have to match interval_eq, not
justify_interval.

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Roman Kononov (#1)
Re: BUG #4748: hash join and sort-merge join make different results

"Roman Kononov" <kononov@ftml.net> writes:

Description: hash join and sort-merge join make different results

I've applied a patch for this:
http://archives.postgresql.org/pgsql-committers/2009-04/msg00048.php

Thanks for the report!

regards, tom lane

#10Roman Kononov
kononov@ftml.net
In reply to: Tom Lane (#8)
Re: BUG #4748: hash join and sort-merge join make different results

On 2009-04-03 23:32 Tom Lane said the following:

Roman Kononov <kononov@ftml.net> writes:

On 2009-04-03 14:57 Tom Lane said the following:

I think we could fix this by having interval_hash() duplicate the
total-span calculation done by interval_cmp_internal, and then return
the hash of the resulting TimeOffset. This is going to break existing
hash indexes on intervals, but there seems little choice...

Consider hashing the result of justify_interval().

Uh, what's your point? We have to match interval_eq, not
justify_interval.

For any two intervals a and b, saying that interval_cmp_interval(a,b)==0
is exactly the same as saying that (aj.month==bj.month && aj.day==bj.day
&& aj.time==bj.time), where aj=justify_interval(a) and
bj=justify_interval(b). Therefore, instead of hashing
interval_cmp_value() you can hash justify_interval(), where
interval_cmp_value() is the transformation of intervals in
interval_cmp_interval().

You said that hashing interval_cmp_value() breaks existing hash indexes.
Hashing "justified" intervals avoids such breaking in some cases.

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Roman Kononov (#10)
Re: BUG #4748: hash join and sort-merge join make different results

Roman Kononov <kononov@ftml.net> writes:

On 2009-04-03 23:32 Tom Lane said the following:

Uh, what's your point? We have to match interval_eq, not
justify_interval.

For any two intervals a and b, saying that interval_cmp_interval(a,b)==0
is exactly the same as saying that (aj.month==bj.month && aj.day==bj.day
&& aj.time==bj.time), where aj=justify_interval(a) and
bj=justify_interval(b).

I doubt that that's exactly true when you take floating-point roundoff
into account ...

regards, tom lane