BUG #13592: Optimizer throws out join constraint causing incorrect result
The following bug has been logged on the website:
Bug reference: 13592
Logged by: Kristoffer Gustafsson
Email address: kristoffer.gustafsson@yves-rocher.se
PostgreSQL version: 9.4.3
Operating system: Linux
Description:
"A program produces the wrong output for any given input."
I’m including a description of the encountered issue because I can’t provide
you with a reproducible set of data. Both because of the data in question
but also because the last step ”corrected” the problem at hand and removed
the information showcasing the issue.
Three tables:
TableA: A_PK_BI, B_PK_BI_FK, A_DEF01_BI, A_VAL01_DP, A_VAL02_DP, A_VAL03_DP,
A_TOT_DP
TableB: B_PK_BI, C_PK_BI_FK
TableC: C_PK_BI, C_VAL01_DP, C_VAL02_DP
TableA is very large and reside in schema S01.
TableB small, around 6-9 entries and reside in schema S01.
TableC small, around 30-60 entries and reside in schema S02.
PK = Primary key
FK = Foreign key
BI = BigInt
DP = Double precision
select coalesce(sum(a.A_TOT_DP) / 1000, 0) as someVal
from
TableA a
inner join TableB b on a.B_PK_BI = b.B_PK_BI
inner join TableC c on b.C_PK_BI = c.C_PK_BI
where
a.A_DEF01_BI in (:LIST_OF_VALUES)
and a.B_PK_BI_FK = :B_PRIMARY_KEY
and (
(((a.A_VAL01_DP + a.A_VAL02_DP) / c.C_VAL01_DP) > :THRESHOLD_01)
or
(((a.A_VAL03_DP - a.A_VAL02_DP) / c.C_VAL01_DP) > :THRESHOLD_02)
)
Described as:
The selected entries from TableA
which can be linked to an entry in TableB
which has related information in TableC
And that fulfills these criteria.
Except suddenly that resulted in division by zero. Division is done in three
places, one fixed value and twice from TableC where each entry linked by
TableB has values greater than 0. I.e. none of the supposedly included
entries should be able to result in division by zero.
But, it seems the optimizer decided to restructure the whole thing to skip
the join relationship and explode TableA with TableC before using TableB as
a filter for the final aggregate.
I.e. appear to change the from clause to this in an early step:
select *
from
TableA a,
TableC c
where
a.A_DEF01_BI in (:LIST_OF_VALUES)
and a.B_PK_BI_FK = :B_PRIMARY_KEY
and (
(((a.A_VAL01_DP + a.A_VAL02_DP) / c.C_VAL01_DP) > :THRESHOLD_01)
or
(((a.A_VAL03_DP - a.A_VAL02_DP) / c.C_VAL01_DP) > :THRESHOLD_02)
)
TableA and TableC have no direct relation to each other. Only the entries in
TableC which can be bound via the bridge of TableB contain valid values for
TableA and the query as a whole. Meaning when the optimizer throws away the
relation and combines TableA with TableC it is using values which contain
invalid values for the where criteria. TableC in this case contains multiple
entries of 0 which in the calculated threshold criteria results in division
by zero error, but those entries should not be touched.
During investigation when TableB was checked in pgAdmin it indicated it was
in need of vaccum/analyze after which the query which had been throwing
division by zero was re-arranged by the optimizer to again work as intended
by the original description. Regardless of TableB requiring vacuum/analyze,
having the optimizer basically throw out the specified relationship and then
use the incorrectly gathered result in calculations seems rather incorrect.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Wed, Aug 26, 2015 at 3:50 PM, <kristoffer.gustafsson@yves-rocher.se>
wrote:
The following bug has been logged on the website:
Bug reference: 13592
Logged by: Kristoffer Gustafsson
Email address: kristoffer.gustafsson@yves-rocher.se
PostgreSQL version: 9.4.3
Operating system: Linux
Description:"A program produces the wrong output for any given input."
I’m including a description of the encountered issue because I can’t
provide
you with a reproducible set of data. Both because of the data in question
but also because the last step ”corrected” the problem at hand and removed
the information showcasing the issue.[...]
Except suddenly that resulted in division by zero. Division is done in
three
places, one fixed value and twice from TableC where each entry linked by
TableB has values greater than 0. I.e. none of the supposedly included
entries should be able to result in division by zero.But, it seems the optimizer decided to restructure the whole thing to skip
the join relationship and explode TableA with TableC before using TableB as
a filter for the final aggregate.[...]
TableA and TableC have no direct relation to each other. Only the entries
in
TableC which can be bound via the bridge of TableB contain valid values for
TableA and the query as a whole. Meaning when the optimizer throws away the
relation and combines TableA with TableC it is using values which contain
invalid values for the where criteria. TableC in this case contains
multiple
entries of 0 which in the calculated threshold criteria results in division
by zero error, but those entries should not be touched.During investigation when TableB was checked in pgAdmin it indicated it was
in need of vaccum/analyze after which the query which had been throwing
division by zero was re-arranged by the optimizer to again work as intended
by the original description. Regardless of TableB requiring vacuum/analyze,
having the optimizer basically throw out the specified relationship and
then
use the incorrectly gathered result in calculations seems rather incorrect.
I do not follow but the fact that a division-by-zero exception occurs in
some execution plans but not others is not a bug.
If you do not want any rows where (c.C_VAL01_DP = 0) to be considered you
should alter the query so that instead of linking to everything in "c" you
only consider those rows having a non-zero C_VAL01_DP attribute.
At worse this is a performance-related issue that happens to manifest as a
division-by-zero. however, your report is inadequate to consider that
particular dynamic.
David J.
DJ:
I do not follow but the fact that a division-by-zero exception occurs in some execution plans but not others is not a bug. If you do not want any rows where (c.C_VAL01_DP = 0) to be considered you should alter the query so that instead of linking to everything in "c" you only consider those rows having a non-zero C_VAL01_DP attribute.
KG:
Sorry, I can't provide more info/working sample. The analyze fixed the evaluation order.
Should I interpret your response that the inner joins are of no actual value in determining the result sets used by the optimizer?
The FROM has set A which is joined with set B which links set C.
There is a path with keys going from A to B to C.
There is no link from A to C.
C is only valid as part of B.
Selecting the full set of A and C without the limit from B includes all entries from C even when only the subset linked by B should be considered as valid according to FROM.
Of course you can write the WHERE criteria to not contain expressions/calculations and duplicate the constraint expressed by the FROM section, but doesn't that defeat the purpose of declaring the joins in FROM?
"c.C_VAL01_DP = 0" should per definition of the FROM never be included since none of the valid entries specified by the relationship has that value.
Doing the evaluation in this other order is one way of generating the result, but it is ignoring the relationship expressed by the FROM section and using it as any other filter.
But if this is working as intended so be it, will just have to cover for it with additional criteria in queries.
________________________________
Detta meddelande och alla bilagor är konfidentiella och avsedda för den namngivna mottagaren. Om du har fått detta meddelande av misstag, vänligen meddela omedelbart avsändaren och ta sedan bort meddelandet. All otillåten modifiering, användning eller spridning är förbjuden. Avsändaren är inte ansvarig för detta meddelande om det har ändrats, förfalskats, redigerats, smittat av ett virus eller spridits utan tillstånd. Skriv inte ut detta meddelande om det inte är nödvändigt, tänk på miljön.
This message and any attachments are confidential and intended for the named addressee(s) only. If you have received this message in error, please notify immediately the sender, then delete the message. Any unauthorized modification, edition, use or dissemination is prohibited. The sender is not liable for this message if it has been modified, altered, falsified, infected by a virus or even edited or disseminated without authorization. Do not print this message unless it is necessary, consider the environment.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Kristoffer Gustafsson <kristoffer.gustafsson@yves-rocher.se> writes:
The FROM has set A which is joined with set B which links set C.
There is a path with keys going from A to B to C.
There is no link from A to C.
Actually, there is: the WHERE clause involving A and C is itself a join
clause, since it allows filtering out some pairs of A and C rows, albeit
only after performing a nestloop join. It seems somewhat unlikely that
the optimizer would choose that approach in preference to equijoins, but
since you've not shown us any concrete details, it can't be ruled out.
For example, if both A and C are small and both could usefully be used in
an indexscan on a large B table, joining A to C first would make perfect
sense.
Of course you can write the WHERE criteria to not contain expressions/calculations and duplicate the constraint expressed by the FROM section, but doesn't that defeat the purpose of declaring the joins in FROM?
There is *no* semantic difference between writing a join clause in WHERE
and writing it in an (inner) JOIN/ON clause. There is certainly no
promise about the execution order. See
http://www.postgresql.org/docs/9.4/static/sql-expressions.html#SYNTAX-EXPRESS-EVAL
Personally I'd dodge the whole problem by reformulating the WHERE to avoid
division, that is
(
((a.A_VAL01_DP + a.A_VAL02_DP) > (:THRESHOLD_01 * c.C_VAL01_DP))
or
((a.A_VAL03_DP - a.A_VAL02_DP) > (:THRESHOLD_02 * c.C_VAL01_DP))
)
If you can't fix it in that sort of way, the traditional approach to
forcing the join order in Postgres is to put the desired innermost join
in a sub-SELECT with OFFSET 0, which works as an optimization fence.
(The optimizer can't push joins or WHERE clauses past a LIMIT/OFFSET
for fear of changing the set of rows returned.) But JOIN/ON is most
certainly not an optimization fence.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Tom, David,
You are correct. I'm sorry, gotten accustomed to the join criteria/table relationship taking first seat over other expression.
Just to fill in,
For example, if both A and C are small and both could usefully be used in an indexscan on a large B table, joining A to C first would make perfect sense.
A is much larger than B and C; B contains the least number of entries (~6 in this DB); C slightly larger than B (~60). Report indicated B required Vacuum/Analyze so I guess optimizer saw more than those ~6.
Guess fix will be either similar to example or force the B/C relation to evaluate first into D.
Also, I apologize for the blob message included at end of mail, it is attached by server on outgoing mail.
Regards
Kristoffer Gustafsson
________________________________
Detta meddelande och alla bilagor är konfidentiella och avsedda för den namngivna mottagaren. Om du har fått detta meddelande av misstag, vänligen meddela omedelbart avsändaren och ta sedan bort meddelandet. All otillåten modifiering, användning eller spridning är förbjuden. Avsändaren är inte ansvarig för detta meddelande om det har ändrats, förfalskats, redigerats, smittat av ett virus eller spridits utan tillstånd. Skriv inte ut detta meddelande om det inte är nödvändigt, tänk på miljön.
This message and any attachments are confidential and intended for the named addressee(s) only. If you have received this message in error, please notify immediately the sender, then delete the message. Any unauthorized modification, edition, use or dissemination is prohibited. The sender is not liable for this message if it has been modified, altered, falsified, infected by a virus or even edited or disseminated without authorization. Do not print this message unless it is necessary, consider the environment.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs